Data Models

nplinker.genomics

BGC

BGC(id: str, /, *product_prediction: str)

Class to model BGC (biosynthetic gene cluster) data.

BGC data include both annotations and sequence data. This class is mainly designed to model the annotations or metadata.

The raw BGC data is stored in GenBank format (.gbk). Additional GenBank features could be added to the GenBank file to annotate BGCs, e.g. antiSMASH has some self-defined features (like region) in its output GenBank files.

BGC annotations can also be stored in the JSON format defined and used by MIBiG.

Attributes:

  • id

    BGC identifier, e.g. MIBiG accession, GenBank accession.

  • product_prediction

    A tuple of (predicted) natural products or product classes of the BGC. For antiSMASH's GenBank data, the feature region /product gives product information. For MIBiG metadata, its biosynthetic class provides such info.

  • mibig_bgc_class (tuple[str] | None) –

    A tuple of MIBiG biosynthetic classes to which the BGC belongs. Defaults to None, which means the class is unknown.

    MIBiG defines 6 major biosynthetic classes for natural products: NRP, Polyketide, RiPP, Terpene, Saccharide and Alkaloid. Natural products created by other biosynthetic mechanisms fall under the category Other. For more details see the paper.

  • description (str | None) –

    Brief description of the BGC. Defaults to None.

  • smiles (tuple[str] | None) –

    A tuple of SMILES formulas of the BGC's products. Defaults to None.

  • antismash_file (str | None) –

    The path to the antiSMASH GenBank file. Defaults to None.

  • antismash_id (str | None) –

    Identifier of the antiSMASH BGC, taken from the VERSION field of the GenBank file. Defaults to None.

  • antismash_region (int | None) –

    antiSMASH BGC region number, taken from the region feature of the GenBank file. Defaults to None.

  • parents (set[GCF]) –

    The set of GCFs that contain the BGC.

  • strain (Strain | None) –

    The strain of the BGC.

Parameters:

  • id (str) –

    BGC identifier, e.g. MIBiG accession, GenBank accession.

  • product_prediction (str, default: () ) –

    BGC's (predicted) natural products or product classes.

Examples:

>>> bgc = BGC("Unique_BGC_ID", "Polyketide", "NRP")
>>> bgc.id
'Unique_BGC_ID'
>>> bgc.product_prediction
('Polyketide', 'NRP')
>>> bgc.is_mibig()
False
Source code in src/nplinker/genomics/bgc.py
def __init__(self, id: str, /, *product_prediction: str):
    """Initialize the BGC object.

    Args:
        id: BGC identifier, e.g. MIBiG accession, GenBank accession.
        product_prediction: BGC's (predicted) natural products or product classes.

    Examples:
        >>> bgc = BGC("Unique_BGC_ID", "Polyketide", "NRP")
        >>> bgc.id
        'Unique_BGC_ID'
        >>> bgc.product_prediction
        ('Polyketide', 'NRP')
        >>> bgc.is_mibig()
        False
    """
    # BGC metadata
    self.id = id
    self.product_prediction = product_prediction

    self.mibig_bgc_class: tuple[str] | None = None
    self.description: str | None = None
    self.smiles: tuple[str] | None = None

    # antismash related attributes
    self.antismash_file: str | None = None
    self.antismash_id: str | None = None  # version in .gbk, id in SeqRecord
    self.antismash_region: int | None = None  # antismash region number

    # other attributes
    self.parents: set[GCF] = set()
    self._strain: Strain | None = None

id instance-attribute

id = id

product_prediction instance-attribute

product_prediction = product_prediction

mibig_bgc_class instance-attribute

mibig_bgc_class: tuple[str] | None = None

description instance-attribute

description: str | None = None

smiles instance-attribute

smiles: tuple[str] | None = None

antismash_file instance-attribute

antismash_file: str | None = None

antismash_id instance-attribute

antismash_id: str | None = None

antismash_region instance-attribute

antismash_region: int | None = None

parents instance-attribute

parents: set[GCF] = set()

strain property writable

strain: Strain | None

Get the strain of the BGC.

bigscape_classes property

bigscape_classes: set[str | None]

Get BiG-SCAPE's BGC classes.

BiG-SCAPE's BGC classes are similar to those defined in MIBiG but have more categories (7 classes):

  • NRPS
  • PKS-NRP_Hybrids
  • PKSI
  • PKSother
  • RiPPs
  • Saccharides
  • Terpene

For BGCs that fall outside these categories, the value is "Others".

Default is None, which means the class is unknown.

For more details, see: https://doi.org/10.1038%2Fs41589-019-0400-9.

aa_predictions property

aa_predictions: list

Amino acids as predicted monomers of product.

Returns:

  • list

    List of dicts with key as amino acid and value as prediction probability.

__repr__

__repr__()
Source code in src/nplinker/genomics/bgc.py
def __repr__(self):
    return str(self)

__str__

__str__()
Source code in src/nplinker/genomics/bgc.py
def __str__(self):
    return "{}(id={}, strain={}, asid={}, region={})".format(
        self.__class__.__name__,
        self.id,
        self.strain,
        self.antismash_id,
        self.antismash_region,
    )

__eq__

__eq__(other) -> bool
Source code in src/nplinker/genomics/bgc.py
def __eq__(self, other) -> bool:
    if isinstance(other, BGC):
        return self.id == other.id and self.product_prediction == other.product_prediction
    return NotImplemented

__hash__

__hash__() -> int
Source code in src/nplinker/genomics/bgc.py
def __hash__(self) -> int:
    return hash((self.id, self.product_prediction))
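
The equality and hash rules above can be tried out with a minimal sketch. `MiniBGC` is a hypothetical stand-in that copies only the `__init__`, `__eq__` and `__hash__` logic shown in the listings; it is not the real `nplinker` class.

```python
class MiniBGC:
    """Hypothetical stand-in for BGC with only the fields used for comparison."""

    def __init__(self, id: str, /, *product_prediction: str):
        self.id = id
        self.product_prediction = product_prediction
        self.description = None  # not considered by __eq__ or __hash__

    def __eq__(self, other) -> bool:
        if isinstance(other, MiniBGC):
            return self.id == other.id and self.product_prediction == other.product_prediction
        return NotImplemented

    def __hash__(self) -> int:
        return hash((self.id, self.product_prediction))


a = MiniBGC("BGC_1", "Polyketide", "NRP")
b = MiniBGC("BGC_1", "Polyketide", "NRP")
b.description = "a different description"
c = MiniBGC("BGC_1", "NRP", "Polyketide")  # same products, different order

same_ab = a == b          # other attributes do not affect equality
same_ac = a == c          # tuple comparison is order-sensitive
unique = len({a, b, c})   # a and b collapse into one set element
```

Under these rules, two objects with the same id and the same ordered product predictions are interchangeable in sets and dict keys, even if their other attributes differ.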

__reduce__

__reduce__() -> tuple

Reduce function for pickling.

Source code in src/nplinker/genomics/bgc.py
def __reduce__(self) -> tuple:
    """Reduce function for pickling."""
    return (self.__class__, (self.id, *self.product_prediction), self.__dict__)
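
As a rough illustration of what the returned tuple means, the sketch below rebuilds an object by hand in the way the pickle protocol treats a `(callable, args, state)` triple when the state is a dict and no `__setstate__` is defined: call the callable with the arguments, then update `__dict__` with the state. `MiniBGC` is a hypothetical stand-in, not the real class.

```python
class MiniBGC:
    """Hypothetical stand-in reproducing only __init__ and __reduce__."""

    def __init__(self, id: str, /, *product_prediction: str):
        self.id = id
        self.product_prediction = product_prediction
        self.description = None

    def __reduce__(self) -> tuple:
        return (self.__class__, (self.id, *self.product_prediction), self.__dict__)


bgc = MiniBGC("Unique_BGC_ID", "Polyketide", "NRP")
bgc.description = "example description"

# Unpack the triple and replay it manually.
factory, args, state = bgc.__reduce__()
clone = factory(*args)        # re-runs __init__ with id and the unpacked products
clone.__dict__.update(state)  # restores the remaining attributes
```

Because the variadic `product_prediction` is unpacked into the argument tuple, the constructor receives it in the same form as in the original call.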

add_parent

add_parent(gcf: GCF) -> None

Add a parent GCF to the BGC.

Parameters:

  • gcf (GCF) –

    gene cluster family

Source code in src/nplinker/genomics/bgc.py
def add_parent(self, gcf: GCF) -> None:
    """Add a parent GCF to the BGC.

    Args:
        gcf: gene cluster family
    """
    gcf.add_bgc(self)

detach_parent

detach_parent(gcf: GCF) -> None

Remove a parent GCF.

Source code in src/nplinker/genomics/bgc.py
def detach_parent(self, gcf: GCF) -> None:
    """Remove a parent GCF."""
    gcf.detach_bgc(self)
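
Both methods delegate to the GCF, so the link is always maintained on both sides. The sketch below uses hypothetical `MiniBGC` and `MiniGCF` stand-ins that keep only the bookkeeping lines from the listings (strain handling and logging are omitted, and the stand-ins use default identity-based hashing).

```python
class MiniGCF:
    """Hypothetical stand-in for GCF without strain handling."""

    def __init__(self, id: str):
        self.id = id
        self.bgc_ids = set()
        self._bgcs = set()

    def add_bgc(self, bgc) -> None:
        bgc.parents.add(self)
        self._bgcs.add(bgc)
        self.bgc_ids.add(bgc.id)

    def detach_bgc(self, bgc) -> None:
        bgc.parents.remove(self)
        self._bgcs.remove(bgc)
        self.bgc_ids.remove(bgc.id)


class MiniBGC:
    """Hypothetical stand-in for BGC with only the parent links."""

    def __init__(self, id: str):
        self.id = id
        self.parents = set()

    def add_parent(self, gcf: MiniGCF) -> None:
        gcf.add_bgc(self)

    def detach_parent(self, gcf: MiniGCF) -> None:
        gcf.detach_bgc(self)


bgc = MiniBGC("BGC_X")
gcf = MiniGCF("GCF_1")

bgc.add_parent(gcf)
linked = (gcf in bgc.parents) and ("BGC_X" in gcf.bgc_ids)

bgc.detach_parent(gcf)
unlinked = (not bgc.parents) and (not gcf.bgc_ids)

# set.remove raises KeyError for a missing member, so detaching twice fails.
try:
    bgc.detach_parent(gcf)
    second_detach_failed = False
except KeyError:
    second_detach_failed = True
```

Since the listings use `set.remove`, detaching a GCF that is not currently a parent raises `KeyError` rather than being silently ignored.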

is_mibig

is_mibig() -> bool

Check if the BGC is a MIBiG reference BGC or not.

Warning

This method identifies a MIBiG BGC solely by checking whether its id starts with "BGC". It may therefore give false-positive results.

Returns:

  • bool

    True if it is a MIBiG reference BGC.

Source code in src/nplinker/genomics/bgc.py
def is_mibig(self) -> bool:
    """Check if the BGC is a MIBiG reference BGC or not.

    Warning:
        This method identifies a MIBiG BGC solely by checking whether its
        id starts with "BGC". It may therefore give false-positive results.

    Returns:
        True if it is a MIBiG reference BGC.
    """
    return self.id.startswith("BGC")
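
The false-positive case from the warning is easy to reproduce with the same prefix test applied to plain strings. The stricter pattern in the sketch is only an assumption for illustration (MIBiG accessions such as `BGC0000001`, optionally followed by a version suffix); it is not part of `nplinker`, and the example ids are made up.

```python
import re


def is_mibig_prefix(bgc_id: str) -> bool:
    """Same check as the listing above, applied to a plain string."""
    return bgc_id.startswith("BGC")


# Assumption for illustration only: "BGC" + 7 digits, optional ".<version>".
STRICT_PATTERN = re.compile(r"BGC\d{7}(\.\d+)?")


def is_mibig_strict(bgc_id: str) -> bool:
    """Hypothetical stricter check; not provided by the library."""
    return STRICT_PATTERN.fullmatch(bgc_id) is not None


ids = ["BGC0000001", "BGC_from_my_lab", "NZ_AZWO01000004.region001"]
prefix_flags = {i: is_mibig_prefix(i) for i in ids}
strict_flags = {i: is_mibig_strict(i) for i in ids}
```

Here the made-up id `BGC_from_my_lab` passes the prefix check although it is not a MIBiG accession, which is the situation the warning describes.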

GCF

GCF(id: str, /)

Class to model gene cluster family (GCF).

A GCF is a group of similar BGCs, generated by clustering BGCs with tools such as BiG-SCAPE and BiG-SLiCE.

Attributes:

  • id

    id of the GCF object.

  • bgc_ids (set[str]) –

    a set of BGC ids that belong to the GCF.

  • bigscape_class (str | None) –

    BiG-SCAPE's BGC class. BiG-SCAPE's BGC classes are similar to those defined in MIBiG but have more categories (7 classes):

    • NRPS
    • PKS-NRP_Hybrids
    • PKSI
    • PKSother
    • RiPPs
    • Saccharides
    • Terpene

    For BGCs that fall outside these categories, the value is "Others".

    Default is None, which means the class is unknown.

    For more details, see: https://doi.org/10.1038%2Fs41589-019-0400-9.

Parameters:

  • id (str) –

    id of the GCF object.

Examples:

>>> gcf = GCF("Unique_GCF_ID")
>>> gcf.id
'Unique_GCF_ID'
Source code in src/nplinker/genomics/gcf.py
def __init__(self, id: str, /) -> None:
    """Initialize the GCF object.

    Args:
        id: id of the GCF object.

    Examples:
        >>> gcf = GCF("Unique_GCF_ID")
        >>> gcf.id
        'Unique_GCF_ID'
    """
    self.id = id
    self.bgc_ids: set[str] = set()
    self.bigscape_class: str | None = None
    self._bgcs: set[BGC] = set()
    self._strains: StrainCollection = StrainCollection()

id instance-attribute

id = id

bgc_ids instance-attribute

bgc_ids: set[str] = set()

bigscape_class instance-attribute

bigscape_class: str | None = None

bgcs property

bgcs: set[BGC]

Get the BGC objects.

strains property

Get the strains in the GCF.

__str__

__str__() -> str
Source code in src/nplinker/genomics/gcf.py
def __str__(self) -> str:
    return (
        f"GCF(id={self.id}, #BGC_objects={len(self.bgcs)}, #bgc_ids={len(self.bgc_ids)},"
        f"#strains={len(self._strains)})."
    )

__repr__

__repr__() -> str
Source code in src/nplinker/genomics/gcf.py
def __repr__(self) -> str:
    return str(self)

__eq__

__eq__(other) -> bool
Source code in src/nplinker/genomics/gcf.py
def __eq__(self, other) -> bool:
    if isinstance(other, GCF):
        return self.id == other.id and self.bgcs == other.bgcs
    return NotImplemented

__hash__

__hash__() -> int

Hash function for GCF.

Note that the GCF class is a mutable container. Only the GCF id is hashed, so that the hash value does not change when self._bgcs is updated.

Source code in src/nplinker/genomics/gcf.py
def __hash__(self) -> int:
    """Hash function for GCF.

    Note that the GCF class is a mutable container. Only the GCF id is hashed,
    so that the hash value does not change when `self._bgcs` is updated.
    """
    return hash(self.id)
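
The effect of hashing only the id can be seen in a small sketch. `MiniGCF` is a hypothetical stand-in with the same `__eq__` and `__hash__` logic as the listings, and plain strings stand in for BGC objects.

```python
class MiniGCF:
    """Hypothetical stand-in for GCF: id-only hash, id-and-members equality."""

    def __init__(self, id: str):
        self.id = id
        self._bgcs = set()

    def __eq__(self, other) -> bool:
        if isinstance(other, MiniGCF):
            return self.id == other.id and self._bgcs == other._bgcs
        return NotImplemented

    def __hash__(self) -> int:
        return hash(self.id)


gcf = MiniGCF("GCF_1")
registry = {gcf: "some annotation"}  # use the GCF as a dict key

hash_before = hash(gcf)
gcf._bgcs.add("BGC_A")               # mutate the container after insertion
hash_after = hash(gcf)

still_found = gcf in registry        # lookup still works after the mutation

# Same id but different members: equal hashes, yet not equal objects.
other = MiniGCF("GCF_1")
same_hash = hash(other) == hash(gcf)
equal = other == gcf
```

Two GCFs with the same id therefore land in the same hash bucket, while `__eq__` still distinguishes them by their member BGCs.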

__reduce__

__reduce__() -> tuple

Reduce function for pickling.

Source code in src/nplinker/genomics/gcf.py
def __reduce__(self) -> tuple:
    """Reduce function for pickling."""
    return (self.__class__, (self.id,), self.__dict__)

add_bgc

add_bgc(bgc: BGC) -> None

Add a BGC object to the GCF.

Source code in src/nplinker/genomics/gcf.py
def add_bgc(self, bgc: BGC) -> None:
    """Add a BGC object to the GCF."""
    bgc.parents.add(self)
    self._bgcs.add(bgc)
    self.bgc_ids.add(bgc.id)
    if bgc.strain is not None:
        self._strains.add(bgc.strain)
    else:
        logger.warning("No strain specified for the BGC %s", bgc.id)

detach_bgc

detach_bgc(bgc: BGC) -> None

Remove a child BGC object.

Source code in src/nplinker/genomics/gcf.py
def detach_bgc(self, bgc: BGC) -> None:
    """Remove a child BGC object."""
    bgc.parents.remove(self)
    self._bgcs.remove(bgc)
    self.bgc_ids.remove(bgc.id)
    if bgc.strain is not None:
        for other_bgc in self._bgcs:
            if other_bgc.strain == bgc.strain:
                return
        self._strains.remove(bgc.strain)
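
The strain bookkeeping in `detach_bgc` keeps a strain for as long as at least one remaining BGC still carries it. The sketch below reproduces that logic with hypothetical stand-ins: plain strings replace `Strain` objects, a built-in `set` replaces `StrainCollection`, and logging is omitted.

```python
class MiniBGC:
    """Hypothetical stand-in for BGC with a plain-string strain."""

    def __init__(self, id: str, strain=None):
        self.id = id
        self.strain = strain
        self.parents = set()


class MiniGCF:
    """Hypothetical stand-in for GCF using a set instead of StrainCollection."""

    def __init__(self, id: str):
        self.id = id
        self.bgc_ids = set()
        self._bgcs = set()
        self._strains = set()

    def add_bgc(self, bgc: MiniBGC) -> None:
        bgc.parents.add(self)
        self._bgcs.add(bgc)
        self.bgc_ids.add(bgc.id)
        if bgc.strain is not None:
            self._strains.add(bgc.strain)

    def detach_bgc(self, bgc: MiniBGC) -> None:
        bgc.parents.remove(self)
        self._bgcs.remove(bgc)
        self.bgc_ids.remove(bgc.id)
        if bgc.strain is not None:
            # Keep the strain if any remaining BGC still belongs to it.
            if any(other.strain == bgc.strain for other in self._bgcs):
                return
            self._strains.remove(bgc.strain)


gcf = MiniGCF("GCF_1")
bgc1 = MiniBGC("BGC_A", strain="strain_1")
bgc2 = MiniBGC("BGC_B", strain="strain_1")
gcf.add_bgc(bgc1)
gcf.add_bgc(bgc2)

gcf.detach_bgc(bgc1)
strains_after_first = set(gcf._strains)   # strain_1 is still backed by bgc2

gcf.detach_bgc(bgc2)
strains_after_second = set(gcf._strains)  # no BGC left for strain_1
```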

has_strain

has_strain(strain: Strain) -> bool

Check if the given strain exists.

Parameters:

  • strain (Strain) –

    Strain object.

Returns:

  • bool

    True when the given strain exists.

Source code in src/nplinker/genomics/gcf.py
def has_strain(self, strain: Strain) -> bool:
    """Check if the given strain exists.

    Args:
        strain: `Strain` object.

    Returns:
        True when the given strain exists.
    """
    return strain in self._strains

has_mibig_only

has_mibig_only() -> bool

Check if the GCF's children are only MIBiG BGCs.

Returns:

  • bool

    True if GCF.bgc_ids are only MIBiG BGC ids.

Source code in src/nplinker/genomics/gcf.py
def has_mibig_only(self) -> bool:
    """Check if the GCF's children are only MIBiG BGCs.

    Returns:
        True if `GCF.bgc_ids` are only MIBiG BGC ids.
    """
    return all(map(lambda id: id.startswith("BGC"), self.bgc_ids))
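
One edge case follows from the use of `all()`: on an empty iterable it returns True, so a GCF without any BGC ids is reported as MIBiG-only. The sketch applies the same expression to plain sets of ids; the function name and the example ids are only illustrative.

```python
def has_mibig_only(bgc_ids: set) -> bool:
    """Same expression as the listing above, applied to a plain set of ids."""
    return all(map(lambda id: id.startswith("BGC"), bgc_ids))


only_mibig = has_mibig_only({"BGC0000001", "BGC0000002"})
mixed = has_mibig_only({"BGC0000001", "NZ_AZWO01000004.region001"})
empty = has_mibig_only(set())  # all() of an empty iterable is True
```

Callers that need to exclude empty GCFs may want to combine this check with a test on the size of `bgc_ids`.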

is_singleton

is_singleton() -> bool

Check if the GCF contains only one BGC.

Returns:

  • bool

    True if GCF.bgc_ids contains only one BGC id.

Source code in src/nplinker/genomics/gcf.py
def is_singleton(self) -> bool:
    """Check if the GCF contains only one BGC.

    Returns:
        True if `GCF.bgc_ids` contains only one BGC id.
    """
    return len(self.bgc_ids) == 1