Data Models
nplinker.genomics
BGC
Class to model BGC (biosynthetic gene cluster) data.
BGC data include both annotations and sequence data. This class is mainly designed to model the annotations or metadata.
The raw BGC data is stored in GenBank format (.gbk). Additional GenBank features could be added to the GenBank file to annotate BGCs, e.g. antiSMASH has some self-defined features (like region) in its output GenBank files.
BGC annotations can be stored in JSON format, as defined and used by MIBiG.
Attributes:
- id – BGC identifier, e.g. MIBiG accession, GenBank accession.
- product_prediction – A tuple of (predicted) natural products or product classes of the BGC. For antiSMASH's GenBank data, the feature region /product gives product information. For MIBiG metadata, its biosynthetic class provides such info.
- mibig_bgc_class (tuple[str] | None) – A tuple of MIBiG biosynthetic classes to which the BGC belongs. Defaults to None, which means the class is unknown.
  MIBiG defines 6 major biosynthetic classes for natural products, including NRP, Polyketide, RiPP, Terpene, Saccharide and Alkaloid. Note that natural products created by other biosynthetic mechanisms fall under the category Other. For more details see the paper.
- description (str | None) – Brief description of the BGC. Defaults to None.
- smiles (tuple[str] | None) – A tuple of SMILES formulas of the BGC's products. Defaults to None.
- antismash_file (str | None) – The path to the antiSMASH GenBank file. Defaults to None.
- antismash_id (str | None) – Identifier of the antiSMASH BGC, referring to the feature VERSION of the GenBank file. Defaults to None.
- antismash_region (int | None) – antiSMASH BGC region number, referring to the feature region of the GenBank file. Defaults to None.
- parents (set[GCF]) – The set of GCFs that contain the BGC.
- strain (Strain | None) – The strain of the BGC.
Parameters:
- id (str) – BGC identifier, e.g. MIBiG accession, GenBank accession.
- product_prediction (str, default: ()) – BGC's (predicted) natural products or product classes.
Examples:
>>> bgc = BGC("Unique_BGC_ID", "Polyketide", "NRP")
>>> bgc.id
'Unique_BGC_ID'
>>> bgc.product_prediction
('Polyketide', 'NRP')
>>> bgc.is_mibig()
False
Source code in src/nplinker/genomics/bgc.py
bigscape_classes
property
Get BiG-SCAPE's BGC classes.
BiG-SCAPE's BGC classes are similar to those defined in MIBiG but have more categories (7 classes), including:
- NRPS
- PKS-NRP_Hybrids
- PKSI
- PKSother
- RiPPs
- Saccharides
- Terpene
For BGCs that fall outside these categories, the value is "Others".
Default is None, which means the class is unknown.
For more details, see: https://doi.org/10.1038%2Fs41589-019-0400-9.
__repr__
__str__
is_mibig
is_mibig() -> bool
Check if the BGC is a MIBiG reference BGC or not.
Warning
This method identifies a MIBiG BGC by checking whether the BGC name starts with "BGC", the pattern that MIBiG BGC names follow. It may give false positives.
Returns:
- bool – True if it is a MIBiG reference BGC.
Source code in src/nplinker/genomics/bgc.py
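The warning above can be illustrated with a small stand-alone sketch. It is not the library's source: `looks_like_mibig` is a hypothetical helper that assumes the check is a plain prefix test on the ID, and the IDs below are made-up examples.

```python
def looks_like_mibig(bgc_id: str) -> bool:
    """Hypothetical stand-in: treat any ID starting with "BGC" as MIBiG."""
    return bgc_id.startswith("BGC")


mibig_id = "BGC0000001"         # MIBiG-style accession
lab_id = "BGC_from_my_lab"      # not from MIBiG, but shares the prefix
genbank_id = "NZ_AZWO01000004"  # GenBank-style accession (made up)

for bgc_id in (mibig_id, lab_id, genbank_id):
    print(bgc_id, looks_like_mibig(bgc_id))
```

Under this assumption, `lab_id` is reported as MIBiG even though it is not, which is the kind of false positive the warning describes. Avoiding the "BGC" prefix for non-MIBiG identifiers sidesteps the problem.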
GCF
GCF(id: str)
Class to model gene cluster family (GCF).
A GCF is a group of similar BGCs, generated by clustering BGCs with tools such as BiG-SCAPE and BiG-SLICE.
Attributes:
- id – ID of the GCF object.
- bgc_ids (set[str]) – A set of BGC ids that belong to the GCF.
- bigscape_class (str | None) – BiG-SCAPE's BGC class. BiG-SCAPE's BGC classes are similar to those defined in MIBiG but have more categories (7 classes), including:
  - NRPS
  - PKS-NRP_Hybrids
  - PKSI
  - PKSother
  - RiPPs
  - Saccharides
  - Terpene
  For BGCs that fall outside these categories, the value is "Others".
  Default is None, which means the class is unknown.
  For more details, see: https://doi.org/10.1038%2Fs41589-019-0400-9.
Parameters:
- id (str) – ID of the GCF object.
Examples:
Source code in src/nplinker/genomics/gcf.py
__hash__
__hash__() -> int
Hash function for GCF.
Note that the GCF class is a mutable container. Only the GCF id is hashed, so that the hash value does not change when self._bgcs is updated.
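Why hashing only the id matters can be seen in a minimal sketch. `FamilySketch` is a hypothetical stand-in, not the GCF class itself; in particular, comparing equality by id alone is an assumption made here to keep `__eq__` consistent with `__hash__`.

```python
class FamilySketch:
    """Hypothetical mutable container that is hashed by its id only."""

    def __init__(self, id: str) -> None:
        self.id = id
        self._bgcs: set[str] = set()  # mutable content, excluded from the hash

    def __eq__(self, other: object) -> bool:
        # Assumption for this sketch: two families are equal if their ids match.
        if isinstance(other, FamilySketch):
            return self.id == other.id
        return NotImplemented

    def __hash__(self) -> int:
        # Only the immutable id contributes to the hash.
        return hash(self.id)


family = FamilySketch("GCF_1")
families = {family}                # store the object in a set
hash_before = hash(family)

family._bgcs.add("BGC0000001")     # mutate the container afterwards
hash_after = hash(family)
still_found = family in families   # lookup still works after mutation
```

If the hash depended on the mutable content, an object stored in a set or used as a dict key could become unreachable after the content changed.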
add_bgc
add_bgc(bgc: BGC) -> None
Add a BGC object to the GCF.
Source code in src/nplinker/genomics/gcf.py
detach_bgc
detach_bgc(bgc: BGC) -> None
Remove a child BGC object.
Source code in src/nplinker/genomics/gcf.py
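The parent/child relationship between GCFs and BGCs can be sketched as follows. `BGCSketch` and `GCFSketch` are hypothetical stand-ins, and keeping `parents` and `bgc_ids` in sync inside `add_bgc` and `detach_bgc` is an assumption of this sketch rather than a statement about the library's implementation.

```python
class BGCSketch:
    """Hypothetical child object that tracks the families containing it."""

    def __init__(self, id: str) -> None:
        self.id = id
        self.parents: set["GCFSketch"] = set()


class GCFSketch:
    """Hypothetical family object that tracks the ids of its children."""

    def __init__(self, id: str) -> None:
        self.id = id
        self.bgc_ids: set[str] = set()

    def add_bgc(self, bgc: BGCSketch) -> None:
        # Assumed behaviour: update both sides of the relationship.
        self.bgc_ids.add(bgc.id)
        bgc.parents.add(self)

    def detach_bgc(self, bgc: BGCSketch) -> None:
        # Assumed behaviour: undo both sides of the relationship.
        self.bgc_ids.discard(bgc.id)
        bgc.parents.discard(self)


gcf = GCFSketch("GCF_1")
bgc = BGCSketch("BGC0000001")

gcf.add_bgc(bgc)
linked = bgc.id in gcf.bgc_ids and gcf in bgc.parents

gcf.detach_bgc(bgc)
unlinked = bgc.id not in gcf.bgc_ids and gcf not in bgc.parents
```

Updating both sides in one method keeps the two views of the relationship from drifting apart.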
has_strain
has_mibig_only
has_mibig_only() -> bool
Check if the GCF's children are only MIBiG BGCs.
Returns:
- bool – True if GCF.bgc_ids are only MIBiG BGC ids.
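A check of this kind can be sketched over a plain set of ids. `has_mibig_only_sketch` is a hypothetical function that assumes MIBiG ids are recognised by the "BGC" prefix; it is not the library's implementation.

```python
def has_mibig_only_sketch(bgc_ids: set[str]) -> bool:
    """Hypothetical check: every id in the set starts with "BGC"."""
    return all(bgc_id.startswith("BGC") for bgc_id in bgc_ids)


only_mibig = has_mibig_only_sketch({"BGC0000001", "BGC0000002"})
mixed = has_mibig_only_sketch({"BGC0000001", "NZ_AZWO01000004"})
```

Note that `all()` returns True for an empty iterable, so this sketch reports True for a family with no members; whether the library treats that case the same way is not stated in this reference.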