MIBiG

nplinker.genomics.mibig

MibigLoader

MibigLoader(data_dir: str | PathLike)

Bases: BGCLoaderBase

Parse MIBiG metadata files and return BGC objects.

A MIBiG metadata file (json) contains annotations and metadata for a single BGC. See https://mibig.secondarymetabolites.org/download.

The MIBiG accession is used as both the BGC id and the strain name. Each loaded BGC object has a Strain object as its strain attribute (i.e. BGC.strain).

Parameters:

  • data_dir (str | PathLike) –

    Path to the directory of MIBiG metadata json files

Examples:

>>> loader = MibigLoader("path/to/mibig/data/dir")
>>> loader.data_dir
'path/to/mibig/data/dir'
>>> loader.get_bgcs()
[BGC('BGC0000001', 'NRP'), BGC('BGC0000002', 'Polyketide')]

Source code in src/nplinker/genomics/mibig/mibig_loader.py
def __init__(self, data_dir: str | PathLike):
    """Initialize the MIBiG metadata loader.

    Args:
        data_dir: Path to the directory of MIBiG metadata json files

    Examples:
        >>> loader = MibigLoader("path/to/mibig/data/dir")
        >>> loader.data_dir
        'path/to/mibig/data/dir'
        >>> loader.get_bgcs()
        [BGC('BGC0000001', 'NRP'), BGC('BGC0000002', 'Polyketide')]
    """
    self.data_dir = str(data_dir)
    self._file_dict = self.parse_data_dir(self.data_dir)
    self._metadata_dict = self._parse_metadata()
    self._bgcs = self._parse_bgcs()

data_dir instance-attribute

data_dir = str(data_dir)

get_files

get_files() -> dict[str, str]

Get the paths of all MIBiG metadata json files.

Returns:

  • dict[str, str]

    The key is the metadata file name (BGC accession), and the value is the path to the metadata json file

Source code in src/nplinker/genomics/mibig/mibig_loader.py
def get_files(self) -> dict[str, str]:
    """Get the path of all MIBiG metadata json files.

    Returns:
        The key is metadata file name (BGC accession), and the value is path to the metadata
        json file
    """
    return self._file_dict

parse_data_dir staticmethod

parse_data_dir(data_dir: str | PathLike) -> dict[str, str]

Parse metadata directory and return paths to all metadata json files.

Parameters:

  • data_dir (str | PathLike) –

    path to the directory of MIBiG metadata json files

Returns:

  • dict[str, str]

    The key is metadata file name (BGC accession), and the value is path to the metadata json file

Source code in src/nplinker/genomics/mibig/mibig_loader.py
@staticmethod
def parse_data_dir(data_dir: str | PathLike) -> dict[str, str]:
    """Parse metadata directory and return paths to all metadata json files.

    Args:
        data_dir: path to the directory of MIBiG metadata json files

    Returns:
        The key is metadata file name (BGC accession), and the value is path to the metadata
        json file
    """
    file_dict = {}
    json_files = list_files(data_dir, prefix="BGC", suffix=".json")
    for file in json_files:
        fname = Path(file).stem
        file_dict[fname] = file
    return file_dict
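
The stem-to-path mapping built by parse_data_dir can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: scan_metadata_dir is a hypothetical name, pathlib's glob stands in for the list_files helper, and the sketch only looks at the top level of the directory (whether list_files descends into subdirectories is not shown on this page).

```python
import tempfile
from pathlib import Path


def scan_metadata_dir(data_dir):
    """Map each file stem (the BGC accession) to the path of its json file."""
    # Hypothetical stand-in for parse_data_dir: only files named BGC*.json
    # that sit directly inside data_dir are picked up.
    return {p.stem: str(p) for p in sorted(Path(data_dir).glob("BGC*.json"))}


# Build a throwaway directory with two metadata files and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("BGC0000001.json", "BGC0000002.json", "notes.txt"):
        (Path(tmp) / name).write_text("{}")
    file_dict = scan_metadata_dir(tmp)
    accessions = sorted(file_dict)
```

Because the key is the file stem, a file named BGC0000001.json is looked up as file_dict["BGC0000001"], and files that do not match the BGC prefix and .json suffix are ignored.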

get_metadata

get_metadata() -> dict[str, MibigMetadata]

Get MibigMetadata objects.

Returns:

  • dict[str, MibigMetadata]

    The key is the BGC accession (file name) and the value is the corresponding MibigMetadata object

Source code in src/nplinker/genomics/mibig/mibig_loader.py
def get_metadata(self) -> dict[str, MibigMetadata]:
    """Get MibigMetadata objects.

    Returns:
        The key is BGC accession (file name) and the value is MibigMetadata object
    """
    return self._metadata_dict

get_bgcs

get_bgcs() -> list[BGC]

Get BGC objects.

Each BGC object uses the MIBiG accession as its id and has a Strain object as its strain attribute (i.e. BGC.strain). The name of that Strain object is also the MIBiG accession.

Returns:

  • list[BGC]

    A list of BGC objects

Source code in src/nplinker/genomics/mibig/mibig_loader.py
def get_bgcs(self) -> list[BGC]:
    """Get BGC objects.

    The BGC objects use MiBIG accession as id and have Strain object as
    their strain attribute (i.e. `BGC.strain`), where the name of the Strain
    object is also MiBIG accession.

    Returns:
        A list of BGC objects
    """
    return self._bgcs

MibigMetadata

MibigMetadata(file: str | PathLike)

Class to model the BGC metadata/annotations defined in MIBiG.

MIBiG is a specification of BGC metadata and uses a JSON schema to represent it. For more details, see https://mibig.secondarymetabolites.org/download.

Parameters:

  • file (str | PathLike) –

    Path to the json file of MIBiG BGC metadata

Examples:

>>> metadata = MibigMetadata("/data/BGC0000001.json")

Source code in src/nplinker/genomics/mibig/mibig_metadata.py
def __init__(self, file: str | PathLike) -> None:
    """Initialize the MIBiG metadata object.

    Args:
        file: Path to the json file of MIBiG BGC metadata

    Examples:
        >>> metadata = MibigMetadata("/data/BGC0000001.json")
    """
    self.file = str(file)
    with open(self.file, "rb") as f:
        self.metadata = json.load(f)

    self._mibig_accession: str
    self._biosyn_class: tuple[str]
    self._parse_metadata()

file instance-attribute

file = str(file)

metadata instance-attribute

metadata = load(f)

mibig_accession property

mibig_accession: str

Get the value of metadata item 'mibig_accession'.

biosyn_class property

biosyn_class: tuple[str]

Get the value of metadata item 'biosyn_class'.

The 'biosyn_class' item holds the biosynthetic class(es) of the BGC, i.e. the type of natural product or secondary metabolite.

MIBiG defines six major biosynthetic classes for natural products: NRP, Polyketide, RiPP, Terpene, Saccharide and Alkaloid. Natural products created by other biosynthetic mechanisms fall under the category Other. For more details see the paper.
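
As a rough sketch of how the two properties could be read from a metadata file, the snippet below loads a json record and returns the accession and the classes as a tuple. Two things are assumptions here and are not confirmed by this page: the record layout (both items under a top-level "cluster" key) and the helper name read_accession_and_classes. The record itself is a toy example, not real MIBiG data.

```python
import json
import tempfile
from pathlib import Path

# Toy record for illustration only. Assumed layout: accession and classes
# live under a top-level "cluster" key.
record = {
    "cluster": {
        "mibig_accession": "BGC0000001",
        "biosyn_class": ["NRP", "Polyketide"],
    }
}


def read_accession_and_classes(file):
    """Return (accession, classes) from one metadata json file."""
    with open(file, "rb") as f:
        data = json.load(f)
    cluster = data["cluster"]
    # A tuple keeps the classes immutable and allows more than one class.
    return cluster["mibig_accession"], tuple(cluster["biosyn_class"])


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "BGC0000001.json"
    path.write_text(json.dumps(record))
    accession, classes = read_accession_and_classes(path)
```

A BGC can carry several classes at once, which is why the value is a tuple rather than a single string.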

download_and_extract_mibig_metadata

download_and_extract_mibig_metadata(
    download_root: str | PathLike,
    extract_path: str | PathLike,
    version: str = "3.1",
)

Download and extract MIBiG metadata json files.

It does not matter whether the metadata json files sit in nested folders inside the archive: all json files are extracted to the same location, i.e. extract_path. Nested folders are removed if they exist, so extract_path ends up containing only json files.

Parameters:

  • download_root (str | PathLike) –

    Path to the directory in which to place the downloaded archive.

  • extract_path (str | PathLike) –

    Path to an empty directory where the json files will be extracted. The directory must be empty if it exists. If it doesn't exist, the directory will be created.

  • version (str, default: '3.1' ) –

    Version of the MIBiG metadata to download. Defaults to "3.1".

Examples:

>>> download_and_extract_mibig_metadata("/data/download", "/data/mibig_metadata")

Source code in src/nplinker/genomics/mibig/mibig_downloader.py
def download_and_extract_mibig_metadata(
    download_root: str | os.PathLike,
    extract_path: str | os.PathLike,
    version: str = "3.1",
):
    """Download and extract MIBiG metadata json files.

    Note that it does not matter whether the metadata json files are in nested folders or not in the archive,
    all json files will be extracted to the same location, i.e. `extract_path`. The nested
    folders will be removed if they exist. So the `extract_path` will have only json files.

    Args:
        download_root: Path to the directory in which to place the downloaded archive.
        extract_path: Path to an empty directory where the json files will be extracted.
            The directory must be empty if it exists. If it doesn't exist, the directory will be created.
        version: Version of the MIBiG metadata to download. Defaults to "3.1".

    Examples:
        >>> download_and_extract_mibig_metadata("/data/download", "/data/mibig_metadata")
    """
    download_root = Path(download_root)
    extract_path = Path(extract_path)

    if download_root == extract_path:
        raise ValueError("Identical path of download directory and extract directory")

    # check if extract_path is empty
    if not extract_path.exists():
        extract_path.mkdir(parents=True)
    else:
        if len(list(extract_path.iterdir())) != 0:
            raise ValueError(f'Nonempty directory: "{extract_path}"')

    # download and extract
    md5 = _MD5_MIBIG_METADATA[version]
    download_and_extract_archive(
        url=MIBIG_METADATA_URL.format(version=version),
        download_root=download_root,
        extract_root=extract_path,
        md5=md5,
    )

    # After extracting mibig archive, it's either one dir or many json files,
    # if it's a dir, then move all json files from it to extract_path
    subdirs = list_dirs(extract_path)
    if len(subdirs) > 1:
        raise ValueError(f"Expected one extracted directory, got {len(subdirs)}")

    if len(subdirs) == 1:
        subdir_path = subdirs[0]
        for fname in list_files(subdir_path, prefix="BGC", suffix=".json", keep_parent=False):
            shutil.move(os.path.join(subdir_path, fname), os.path.join(extract_path, fname))
        # delete subdir
        if subdir_path != extract_path:
            shutil.rmtree(subdir_path)
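
The flattening step at the end of the function can be tried in isolation, without downloading anything. The sketch below is an approximation under stated assumptions: flatten_single_subdir is a hypothetical name, pathlib replaces the list_dirs and list_files helpers, and the folder name archive_contents is made up for the demonstration.

```python
import shutil
import tempfile
from pathlib import Path


def flatten_single_subdir(extract_path):
    """Move BGC*.json files out of a lone subdirectory, then delete it."""
    extract_path = Path(extract_path)
    subdirs = [p for p in extract_path.iterdir() if p.is_dir()]
    if len(subdirs) > 1:
        raise ValueError(f"Expected one extracted directory, got {len(subdirs)}")
    if len(subdirs) == 1:
        subdir = subdirs[0]
        # Materialise the listing first so files are not moved mid-iteration.
        for src in list(subdir.glob("BGC*.json")):
            shutil.move(str(src), str(extract_path / src.name))
        shutil.rmtree(subdir)


# Simulate an extracted archive that wraps its files in one folder.
with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "archive_contents"  # made-up folder name
    nested.mkdir()
    (nested / "BGC0000001.json").write_text("{}")
    (nested / "BGC0000002.json").write_text("{}")
    flatten_single_subdir(tmp)
    remaining = sorted(p.name for p in Path(tmp).iterdir())
```

After the call, the wrapper folder is gone and only the json files remain at the top level. Anything else that was inside the wrapper folder is deleted along with it.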

parse_bgc_metadata_json

parse_bgc_metadata_json(file: str | PathLike) -> BGC

Parse MIBiG metadata file and return BGC object.

Note that the MIBiG accession is used as both the BGC id and the strain name. The BGC object has a Strain object as its strain attribute.

Parameters:

  • file (str | PathLike) –

    Path to the MIBiG metadata json file

Returns:

  • BGC

    BGC object

Source code in src/nplinker/genomics/mibig/mibig_loader.py
def parse_bgc_metadata_json(file: str | PathLike) -> BGC:
    """Parse MIBiG metadata file and return BGC object.

    Note that the MiBIG accession is used as the BGC id and strain name. The BGC
    object has Strain object as its strain attribute.

    Args:
        file: Path to the MIBiG metadata json file

    Returns:
        BGC object
    """
    metadata = MibigMetadata(str(file))
    mibig_bgc = BGC(metadata.mibig_accession, *metadata.biosyn_class)
    mibig_bgc.mibig_bgc_class = metadata.biosyn_class
    mibig_bgc.strain = Strain(metadata.mibig_accession)
    return mibig_bgc
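
The star-unpacking in BGC(metadata.mibig_accession, *metadata.biosyn_class) passes each biosynthetic class as a separate positional argument. The sketch below illustrates that with minimal stand-in classes. These are hypothetical: they are not nplinker's BGC and Strain, and the attribute name product_prediction and the helper build_bgc are assumptions made for the example.

```python
class Strain:
    """Hypothetical stand-in for a strain that only stores a name."""

    def __init__(self, name):
        self.name = name


class BGC:
    """Hypothetical stand-in: an id followed by any number of class labels."""

    def __init__(self, bgc_id, *product_prediction):
        self.id = bgc_id
        self.product_prediction = product_prediction  # collected as a tuple
        self.mibig_bgc_class = None
        self.strain = None


def build_bgc(accession, biosyn_class):
    """Mirror the assembly steps: id, classes, then a strain named after the id."""
    bgc = BGC(accession, *biosyn_class)  # one positional argument per class
    bgc.mibig_bgc_class = biosyn_class
    bgc.strain = Strain(accession)
    return bgc


bgc = build_bgc("BGC0000001", ("NRP", "Polyketide"))
```

With this shape, the accession serves as both bgc.id and bgc.strain.name, which matches the note above about the accession doubling as the strain name.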