Dataset Loader

nplinker.loader

DatasetLoader

DatasetLoader(config: Dynaconf)

Load datasets from the working directory with the given configuration.

Concept and Diagram

Working Directory Structure

Dataset Loading Pipeline

Loaded data are stored in data containers (instance attributes), such as self.bgcs and self.gcfs.

Attributes:

Parameters:

  • config (Dynaconf) –

    A Dynaconf object that contains the configuration settings.

Examples:

>>> from nplinker.config import load_config
>>> from nplinker.loader import DatasetLoader
>>> config = load_config("nplinker.toml")
>>> loader = DatasetLoader(config)
>>> loader.load()

See Also

DatasetArranger: Download, generate and/or validate datasets to ensure they are ready for loading.
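
Once load() has run, each data container can be inspected like any other sized collection. The sketch below is a minimal illustration, not part of nplinker: summarize_loader is a hypothetical helper, and a SimpleNamespace with plain lists stands in for a real DatasetLoader (and for StrainCollection) so that the snippet runs without nplinker or a dataset. It assumes only that every container supports len(), as load() itself does for self.strains.

```python
from types import SimpleNamespace

# Container names taken from the DatasetLoader attributes documented on this page.
CONTAINERS = (
    "bgcs", "gcfs", "spectra", "mfs",
    "mibig_bgcs", "mibig_strains_in_use", "product_types", "strains",
)


def summarize_loader(loader) -> dict[str, int]:
    """Hypothetical helper (not part of nplinker): count items per container."""
    return {name: len(getattr(loader, name)) for name in CONTAINERS}


# Stand-in object: plain lists replace the real containers (an assumption
# made only so the sketch is runnable on its own).
fake_loader = SimpleNamespace(
    bgcs=["bgc1", "bgc2"], gcfs=["gcf1"], spectra=[], mfs=[],
    mibig_bgcs=[], mibig_strains_in_use=[], product_types=[], strains=["s1"],
)

summary = summarize_loader(fake_loader)
for name, count in summary.items():
    print(f"{name}: {count}")
```

With a real loader, the same helper could be called as summarize_loader(loader) after loader.load() returns True.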

Source code in src/nplinker/loader.py
def __init__(self, config: Dynaconf) -> None:
    """Initialize the DatasetLoader.

    Args:
        config: A Dynaconf object that contains the configuration settings.

    Examples:
        >>> from nplinker.config import load_config
        >>> from nplinker.loader import DatasetLoader
        >>> config = load_config("nplinker.toml")
        >>> loader = DatasetLoader(config)
        >>> loader.load()

    See Also:
        [DatasetArranger][nplinker.arranger.DatasetArranger]: Download, generate and/or validate
            datasets to ensure they are ready for loading.
    """
    self.config = config

    self.bgcs: list[BGC] = []
    self.gcfs: list[GCF] = []
    self.spectra: list[Spectrum] = []
    self.mfs: list[MolecularFamily] = []
    self.mibig_bgcs: list[BGC] = []
    self.mibig_strains_in_use: StrainCollection = StrainCollection()
    self.product_types: list = []
    self.strains: StrainCollection = StrainCollection()

config instance-attribute

config = config

bgcs instance-attribute

bgcs: list[BGC] = []

gcfs instance-attribute

gcfs: list[GCF] = []

spectra instance-attribute

spectra: list[Spectrum] = []

mfs instance-attribute

mfs: list[MolecularFamily] = []

mibig_bgcs instance-attribute

mibig_bgcs: list[BGC] = []

mibig_strains_in_use instance-attribute

mibig_strains_in_use: StrainCollection = StrainCollection()

product_types instance-attribute

product_types: list = []

strains instance-attribute

strains: StrainCollection = StrainCollection()

load

load() -> bool

Load all data from data files in the working directory.

See Dataset Loading Pipeline for the detailed steps.

Returns:

  • bool

    True if all data are loaded successfully; False if loading the strain mappings, the metabolomics data, or the genomics data fails.

Source code in src/nplinker/loader.py
def load(self) -> bool:
    """Load all data from data files in the working directory.

    See [Dataset Loading Pipeline][dataset-loading-pipeline] for the detailed steps.

    Returns:
        True if all data are loaded successfully.
    """
    if not self._load_strain_mappings():
        return False

    if not self._load_metabolomics():
        return False

    if not self._load_genomics():
        return False

    # set self.strains with all strains from input plus mibig strains in use
    self.strains = self.strains + self.mibig_strains_in_use

    if len(self.strains) == 0:
        raise Exception("Failed to find *ANY* strains.")

    return True
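
The control flow of load() has two distinct failure modes: a failing stage returns False immediately and skips the remaining stages, while an empty strain set after merging raises an exception. The sketch below mirrors only that control flow. SketchLoader, stage_ok, and stages_run are hypothetical names introduced for illustration, and plain lists stand in for StrainCollection; none of this is the nplinker implementation.

```python
class SketchLoader:
    """Stand-in that mirrors only the control flow of load(); not the nplinker class."""

    def __init__(self, stage_ok: dict[str, bool], strains: list[str],
                 mibig_strains: list[str]) -> None:
        self.stage_ok = stage_ok              # hypothetical: outcome of each stage
        self.strains = strains
        self.mibig_strains_in_use = mibig_strains
        self.stages_run: list[str] = []       # records which stages were attempted

    def _run(self, stage: str) -> bool:
        self.stages_run.append(stage)
        return self.stage_ok.get(stage, True)

    def load(self) -> bool:
        # Stages run in the same order as in load(); the first failure stops everything.
        for stage in ("strain_mappings", "metabolomics", "genomics"):
            if not self._run(stage):
                return False
        # Merge input strains with the MIBiG strains in use, then sanity-check.
        self.strains = self.strains + self.mibig_strains_in_use
        if len(self.strains) == 0:
            raise Exception("Failed to find *ANY* strains.")
        return True


ok = SketchLoader({}, ["s1"], ["m1"])
result_ok = ok.load()                         # all stages pass

failing = SketchLoader({"metabolomics": False}, ["s1"], [])
result_fail = failing.load()                  # genomics stage is never attempted

empty = SketchLoader({}, [], [])
try:
    empty.load()
    raised = False
except Exception:
    raised = True                             # no strains at all
```

A caller therefore needs to check the boolean return value and also be prepared for the exception; the two signal different problems.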