Working Directory Structure
NPLinker requires a working directory with a fixed structure and fixed names for the input and output data.
root_dir # (1)!
│
├── nplinker.toml # (2)!
├── strain_mappings.json [F] # (3)!
├── strains_selected.json [F][O] # (4)!
│
├── gnps [F] # (5)!
│ ├── spectra.mgf [F]
│ ├── molecular_families.tsv [F]
│ ├── annotations.tsv [F]
│ └── file_mappings.tsv (.csv) [F] # (6)!
│
├── antismash [F] # (7)!
│ ├── GCF_000514975.1 # (8)!
│ │ ├── xxx.region001.gbk
│ │ └── ...
│ ├── GCF_000016425.1
│ │ ├── xxxx.region001.gbk
│ │ └── ...
│ └── ...
│
├── bigscape [F][O] # (9)!
│ ├── mix_clustering_c0.30.tsv [O] # (10)!
│ ├── data_sqlite.db [O] # (11)!
│ └── bigscape_running_output [A] # (12)!
│   └── ...
│
├── downloads [F][A] # (13)!
│ ├── GCF_000016425.1.zip
│ ├── GCF_000514975.1.zip
│ ├── c22f44b14a3d450eb836d607cb9521bb.zip
│ ├── genome_status.json
│ ├── mibig_json_3.1.tar.gz
│ └── ...
│
├── mibig [F][A] # (14)!
│ ├── BGC0000001.json
│ ├── BGC0000002.json
│ └── ...
│
├── output [F][A] # (15)!
│ └── ...
│
└── ... # (16)!
1. root_dir is the working directory you created, used as the root directory for NPLinker.
2. nplinker.toml is the configuration file (toml format) provided by the user for running NPLinker.
3. strain_mappings.json contains the mappings from strain to genomics and metabolomics data. It is generated by NPLinker for podp mode; for local mode, users need to create it manually. [F] means the file name strain_mappings.json is fixed (including the extension) and must be named as required.
4. strains_selected.json is an optional file containing the list of strains to be used in the analysis. If it is not provided, NPLinker will use all strains detected from the input data. [O] means it is optional for users to provide the file strains_selected.json.
5. The gnps directory contains the GNPS data. The files in this directory must be named as shown. See gnps data for more information.
6. This file can be in .tsv or .csv format.
7. The antismash directory contains a collection of AntiSMASH BGC data. The BGC data (*.region*.gbk files) must be stored in subdirectories named after the NCBI accession number of the genome (e.g. GCF_000514975.1).
8. GCF_000514975.1 has nothing to do with a BigScape GCF; it is just the NCBI accession number of the genome.
9. This directory contains the output of BigScape. If the directory is not provided, NPLinker will run BigScape automatically to generate it using the AntiSMASH BGC data. If you provide the BigScape output, you only need to provide output from v1 or v2, not both.
10. mix_clustering_c0.30.tsv is an example output of BigScape v1. The file name must follow the pattern mix_clustering_c{cutoff}.tsv, where {cutoff} is the cutoff value used in the BigScape run.
11. data_sqlite.db is the output of BigScape v2.
12. The bigscape_running_output directory is automatically created and managed by NPLinker. It stores the output data of BigScape. Users should not interfere with this directory and its content. [A] means the directory is automatically created and/or managed by NPLinker.
13. The downloads directory is automatically created and managed by NPLinker. It stores the data downloaded from the internet. Users can also use it to store their own downloaded data.
14. The mibig directory contains the MIBiG metadata, which is automatically created and downloaded by NPLinker. Users should not interfere with this directory and its content.
15. The output directory is automatically created by NPLinker. It stores the output data of NPLinker.
16. NPLinker can be extended by adding other types of data to the working directory.
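If you provide your own bigscape directory, it can help to confirm which kind of BigScape output it holds before running NPLinker. The following is a minimal sketch and not part of NPLinker: the function name summarize_bigscape_dir is hypothetical, and the regular expression assumes that {cutoff} is written as a plain decimal number, as in mix_clustering_c0.30.tsv.

```python
import re
from pathlib import Path

# File name pattern for BigScape v1 output: mix_clustering_c{cutoff}.tsv
# Assumption: {cutoff} is a plain decimal number such as 0.30.
V1_PATTERN = re.compile(r"^mix_clustering_c(\d+(?:\.\d+)?)\.tsv$")

# File name of the BigScape v2 output.
V2_FILENAME = "data_sqlite.db"


def summarize_bigscape_dir(bigscape_dir):
    """Report which BigScape outputs are present in a directory.

    Hypothetical helper. Returns a dict with the v1 cutoffs found
    (kept as strings so that "0.30" is not turned into 0.3) and a
    flag telling whether the v2 database file exists.
    """
    bigscape_dir = Path(bigscape_dir)
    summary = {"v1_cutoffs": [], "has_v2": False}
    if not bigscape_dir.is_dir():
        # The directory is optional, so its absence is not an error here.
        return summary

    for path in sorted(bigscape_dir.iterdir()):
        match = V1_PATTERN.match(path.name)
        if match and path.is_file():
            summary["v1_cutoffs"].append(match.group(1))

    summary["has_v2"] = (bigscape_dir / V2_FILENAME).is_file()
    return summary
```

A result with both a non-empty v1_cutoffs list and has_v2 set to True means the directory holds output from both versions, where only one is needed.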
Tip
- [F] means the file or directory name is fixed and must be named as shown. The names are defined in the defaults module.
- [O] means the file or directory is optional for users to provide. It does not mean the file or directory is optional for NPLinker to use. If it's not provided by the user, NPLinker may generate it.
- [A] means the directory is automatically created and/or managed by NPLinker.
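The markers can be turned into a quick pre-flight check of a working directory. The sketch below is not NPLinker's own validation: the function name find_layout_problems is hypothetical, and it assumes local mode, where the user supplies every entry marked [F] that is not also marked [O] or [A]. It only checks that names exist, not that file contents are valid.

```python
from pathlib import Path

# Entries marked [F] only (fixed name, not optional, not auto-managed).
# Assumption: local mode, so strain_mappings.json is created by the user.
REQUIRED_FILES = [
    "strain_mappings.json",
    "gnps/spectra.mgf",
    "gnps/molecular_families.tsv",
    "gnps/annotations.tsv",
]

# file_mappings may be provided as either .tsv or .csv.
FILE_MAPPINGS_CANDIDATES = ["gnps/file_mappings.tsv", "gnps/file_mappings.csv"]


def find_layout_problems(root_dir):
    """Return a list of human-readable problems; an empty list means none found."""
    root = Path(root_dir)
    problems = []

    for rel in REQUIRED_FILES:
        if not (root / rel).is_file():
            problems.append(f"missing file: {rel}")

    if not any((root / rel).is_file() for rel in FILE_MAPPINGS_CANDIDATES):
        problems.append("missing file: gnps/file_mappings.tsv (or .csv)")

    antismash = root / "antismash"
    if not antismash.is_dir():
        problems.append("missing directory: antismash")
        return problems

    # Each genome gets its own subdirectory holding *.region*.gbk files.
    genome_dirs = [p for p in sorted(antismash.iterdir()) if p.is_dir()]
    if not genome_dirs:
        problems.append("antismash has no genome subdirectories")
    for genome_dir in genome_dirs:
        if not any(genome_dir.glob("*.region*.gbk")):
            problems.append(f"no *.region*.gbk files in antismash/{genome_dir.name}")

    return problems
```

Calling find_layout_problems("path/to/root_dir") and printing each returned string gives a list of names to fix before starting a run.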