Working Directory Structure

NPLinker requires a fixed working directory structure, with fixed names for the input and output data.

root_dir # (1)!
    │
    ├── nplinker.toml                           # (2)!
    ├── strain_mappings.json                [F] # (3)!
    ├── strains_selected.json               [F][O] # (4)!
    │
    ├── gnps                                [F] # (5)!
    │       ├── spectra.mgf                 [F]
    │       ├── molecular_families.tsv      [F]
    │       ├── annotations.tsv             [F]
    │       └── file_mappings.tsv (.csv)    [F] # (6)!
    │
    ├── antismash                           [F] # (7)!
    │   ├── GCF_000514975.1                     # (8)!
    │   │   ├── xxx.region001.gbk
    │   │   └── ...
    │   ├── GCF_000016425.1
    │   │   ├── xxxx.region001.gbk
    │   │   └── ...
    │   └── ...
    │
    ├── bigscape                            [F][O] # (9)!
    │   ├── mix_clustering_c0.30.tsv           [O] # (10)!
    │   ├── data_sqlite.db                     [O] # (11)!
    │   └── bigscape_running_output            [A] # (12)!
    │       └── ...
    │
    ├── downloads                           [F][A] # (13)!
    │       ├── GCF_000016425.1.zip
    │       ├── GCF_000514975.1.zip
    │       ├── c22f44b14a3d450eb836d607cb9521bb.zip
    │       ├── genome_status.json
    │       ├── mibig_json_3.1.tar.gz
    │       └── ...
    │
    ├── mibig                               [F][A] # (14)!
    │   ├── BGC0000001.json
    │   ├── BGC0000002.json
    │   └── ...
    │
    ├── output                              [F][A] # (15)!
    │   └── ...
    │
    └── ...                                        # (16)!

(1) root_dir is the working directory you created, used as the root directory for NPLinker.
(2) nplinker.toml is the configuration file (TOML format) provided by the user for running NPLinker.
(3) strain_mappings.json contains the mappings from strain to genomics and metabolomics data. It is generated by NPLinker in podp mode; in local mode, users need to create it manually. [F] means the file name strain_mappings.json is fixed (including the extension) and must be named as required.
(4) strains_selected.json is an optional file containing the list of strains to be used in the analysis. If it is not provided, NPLinker will use all strains detected from the input data. [O] means optional: users may choose whether to provide strains_selected.json.
(5) The gnps directory contains the GNPS data. The files in this directory must be named as shown. See gnps data for more information.
(6) This file can be in either .tsv or .csv format.
(7) The antismash directory contains a collection of AntiSMASH BGC data. The BGC data (*.region*.gbk files) must be stored in subdirectories named after the NCBI accession number of the genome (e.g. GCF_000514975.1).
(8) The name GCF_000514975.1 has nothing to do with BigScape GCFs (gene cluster families); it is just the NCBI accession number of the genome.
(9) This directory contains the output of BigScape. If the directory is not provided, NPLinker will run BigScape automatically to generate it using the AntiSMASH BGC data. If you provide the BigScape output, you only need to provide the output of v1 or v2, not both.
(10) mix_clustering_c0.30.tsv is an example output of BigScape v1. The file name must follow the pattern mix_clustering_c{cutoff}.tsv, where {cutoff} is the cutoff value used in the BigScape run.
(11) data_sqlite.db is the output of BigScape v2.
(12) The bigscape_running_output directory is automatically created and managed by NPLinker. It stores the output data of BigScape. Users should not modify this directory or its contents. [A] means the directory is automatically created and/or managed by NPLinker.
(13) The downloads directory is automatically created and managed by NPLinker. It stores data downloaded from the internet. Users can also use it to store their own downloaded data.
(14) The mibig directory contains the MIBiG metadata, which is automatically downloaded by NPLinker into this automatically created directory. Users should not modify this directory or its contents.
(15) The output directory is automatically created by NPLinker. It stores the output data of NPLinker.
(16) NPLinker can be extended flexibly by adding other types of data.
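
The layout rules above can be checked before running NPLinker. The following is a minimal sketch, not part of NPLinker itself: the function check_working_dir and its constants are hypothetical names, and it assumes a local-mode setup in which the user provides strain_mappings.json, the gnps files, and the antismash data. The file names are copied from the directory tree above.

```python
import re
from pathlib import Path

# File names copied from the directory tree; the helper itself is hypothetical.
GNPS_FILES = ("spectra.mgf", "molecular_families.tsv", "annotations.tsv")
BIGSCAPE_V1_PATTERN = re.compile(r"^mix_clustering_c(\d+(?:\.\d+)?)\.tsv$")
BIGSCAPE_V2_FILE = "data_sqlite.db"


def check_working_dir(root):
    """Return a list of problems found in an assumed local-mode layout."""
    root = Path(root)
    problems = []

    # Top-level files the user is assumed to provide in local mode.
    for name in ("nplinker.toml", "strain_mappings.json"):
        if not (root / name).is_file():
            problems.append(f"missing file: {name}")

    # GNPS inputs: three fixed names plus file_mappings as .tsv or .csv.
    gnps = root / "gnps"
    for name in GNPS_FILES:
        if not (gnps / name).is_file():
            problems.append(f"missing file: gnps/{name}")
    if not any((gnps / f"file_mappings{ext}").is_file() for ext in (".tsv", ".csv")):
        problems.append("missing file: gnps/file_mappings.tsv (or .csv)")

    # AntiSMASH inputs: one subdirectory per genome with *.region*.gbk files.
    antismash = root / "antismash"
    genome_dirs = []
    if antismash.is_dir():
        genome_dirs = sorted(p for p in antismash.iterdir() if p.is_dir())
    if not genome_dirs:
        problems.append("antismash has no genome subdirectories")
    for genome_dir in genome_dirs:
        if not any(genome_dir.glob("*.region*.gbk")):
            problems.append(f"no *.region*.gbk files in antismash/{genome_dir.name}")

    # The bigscape directory is optional; if present, expect v1 or v2 output.
    bigscape = root / "bigscape"
    if bigscape.is_dir():
        has_v1 = any(BIGSCAPE_V1_PATTERN.match(p.name) for p in bigscape.iterdir())
        has_v2 = (bigscape / BIGSCAPE_V2_FILE).is_file()
        if not (has_v1 or has_v2):
            problems.append("bigscape contains neither v1 nor v2 output")

    return problems
```

Calling check_working_dir("path/to/root_dir") returns an empty list when nothing is missing. The directories marked [A] are deliberately not checked, since NPLinker creates them itself.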

Tip

[F] means the file or directory name is fixed and must be named as shown. The names are defined in the defaults module.
[O] means the file or directory is optional for users to provide. It does not mean the file or directory is optional for NPLinker to use. If it's not provided by the user, NPLinker may generate it.
[A] means the directory is automatically created and/or managed by NPLinker.
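
Because [F] names must match exactly, a small helper can catch near-misses such as GNPS or strain-mappings.json. The sketch below is illustrative only: find_misnamed and FIXED_NAMES are hypothetical names, the list is copied by hand from the tree above, and the sketch does not import NPLinker's defaults module.

```python
# Top-level names marked [F] in the directory tree (copied by hand; hypothetical constant).
FIXED_NAMES = (
    "strain_mappings.json",
    "strains_selected.json",
    "gnps",
    "antismash",
    "bigscape",
    "downloads",
    "mibig",
    "output",
)


def _key(name):
    # Compare names while ignoring case, hyphens and underscores.
    return name.lower().replace("-", "").replace("_", "")


def find_misnamed(names):
    """Map each near-miss name to the fixed name it probably stands for."""
    expected_by_key = {_key(fixed): fixed for fixed in FIXED_NAMES}
    misnamed = {}
    for name in names:
        expected = expected_by_key.get(_key(name))
        if expected is not None and name != expected:
            misnamed[name] = expected
    return misnamed
```

The function takes plain names, so it can be fed the entries of a real directory, for example the name attribute of each path yielded by pathlib.Path.iterdir(). Names that match no fixed name are ignored, since other data may be added to the working directory.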