Working Directory Structure
NPLinker requires a working directory with a fixed structure and fixed names for the input and output data.
root_dir # (1)
│
├── nplinker.toml # (2)
├── strain_mappings.json [F] # (3)
├── strains_selected.json [F][O] # (4)
│
├── gnps [F] # (5)
│   ├── spectra.mgf [F]
│   ├── molecular_families.tsv [F]
│   ├── annotations.tsv [F]
│   └── file_mappings.tsv (.csv) [F] # (6)
│
├── antismash [F] # (7)
│   ├── GCF_000514975.1
│   │   ├── xxx.region001.gbk
│   │   └── ...
│   ├── GCF_000016425.1
│   │   ├── xxxx.region001.gbk
│   │   └── ...
│   └── ...
│
├── bigscape [F][O] # (8)
│   ├── mix_clustering_c0.30.tsv [F] # (9)
│   └── bigscape_running_output
│       └── ...
│
├── downloads [F][A] # (10)
│   ├── paired_datarecord_4b29ddc3-26d0-40d7-80c5-44fb6631dbf9.4.json # (11)
│   ├── GCF_000016425.1.zip
│   ├── GCF_000514975.1.zip
│   ├── c22f44b14a3d450eb836d607cb9521bb.zip
│   ├── genome_status.json
│   └── mibig_json_3.1.tar.gz
│
├── mibig [F][A] # (12)
│   ├── BGC0000001.json
│   ├── BGC0000002.json
│   └── ...
│
├── output [F][A] # (13)
│   └── ...
│
└── ... # (14)
(1) root_dir is the working directory you create; NPLinker uses it as its root directory.
(2) nplinker.toml is the configuration file (TOML format) that the user provides for running NPLinker.
(3) strain_mappings.json contains the mappings from strains to the genomics and metabolomics data. NPLinker generates it in podp mode; in local mode, users must create it manually. [F] means the file name strain_mappings.json is fixed (including the extension) and must be used exactly as shown.
(4) strains_selected.json is an optional file listing the strains to use in the analysis. If it is not provided, NPLinker uses all strains detected in the input data. [O] means the file strains_selected.json is optional for users to provide.
(5) The gnps directory contains the GNPS data. The files in this directory must be named as shown. See XXX for more information about the GNPS data.
(6) This file can be in either .tsv or .csv format.
(7) The antismash directory contains a collection of antiSMASH BGC data. The BGC data (*.region*.gbk files) must be stored in subdirectories named after the NCBI accession number (e.g., GCF_000514975.1).
(8) The bigscape directory is optional and contains the output of BiG-SCAPE. If the directory is not provided, NPLinker runs BiG-SCAPE automatically on the antiSMASH BGC data to generate it.
(9) mix_clustering_c0.30.tsv is an example of BiG-SCAPE output. The file name must follow the pattern mix_clustering_c{cutoff}.tsv, where {cutoff} is the cutoff value used in the BiG-SCAPE run.
(10) The downloads directory is automatically created and managed by NPLinker. It stores data downloaded from the internet; users can also use it to store their own downloaded data. [A] means the directory is automatically created and/or managed by NPLinker.
(11) This is an example file; the actual file will differ. The same applies to the other files in the downloads directory.
(12) The mibig directory contains the MIBiG metadata, which NPLinker creates and downloads automatically. Users should not modify this directory or its contents.
(13) The output directory is automatically created by NPLinker and stores NPLinker's output data.
(14) NPLinker can be extended by adding other types of data to the working directory.
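In local mode, strain_mappings.json has to be written by hand. The Python sketch below shows one way to generate it, together with the optional strains_selected.json. The JSON layout is an assumption based on the schemas bundled with NPLinker at the time of writing (a "strain_mappings" list of objects with "strain_id" and "strain_alias", a "strain_ids" list for the selected strains, and a "version" of "1.0"); check it against the schema shipped with your installed version. The helper names write_strain_mappings and write_strains_selected are hypothetical and not part of NPLinker's API.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path

# ASSUMPTION: these JSON layouts mirror the schemas bundled with NPLinker
# ("strain_mappings" / "strain_ids" plus "version": "1.0"). Verify them
# against the schema files of the NPLinker version you use.


def write_strain_mappings(root_dir: Path, mappings: dict[str, list[str]]) -> Path:
    """Hypothetical helper: write root_dir/strain_mappings.json.

    `mappings` maps each strain ID to the aliases under which that strain
    appears in the genomics and metabolomics data.
    """
    data = {
        "strain_mappings": [
            # dict.fromkeys drops duplicate aliases but keeps their order.
            {"strain_id": strain_id, "strain_alias": list(dict.fromkeys(aliases))}
            for strain_id, aliases in mappings.items()
        ],
        "version": "1.0",
    }
    path = root_dir / "strain_mappings.json"  # fixed name [F]
    path.write_text(json.dumps(data, indent=2))
    return path


def write_strains_selected(root_dir: Path, strain_ids: list[str]) -> Path:
    """Hypothetical helper: write the optional root_dir/strains_selected.json."""
    data = {"strain_ids": list(dict.fromkeys(strain_ids)), "version": "1.0"}
    path = root_dir / "strains_selected.json"  # fixed name [F], optional [O]
    path.write_text(json.dumps(data, indent=2))
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical strain ID and aliases, for illustration only.
        write_strain_mappings(root, {"strain1": ["genome_alias", "ms_file_alias"]})
        write_strains_selected(root, ["strain1"])
        print((root / "strain_mappings.json").read_text())
```

Writing strains_selected.json is only needed when the analysis should be restricted to a subset of strains.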
Tip
[F] means the file or directory name is fixed and must be used exactly as shown. The names are defined in the defaults module.
[O] means the file or directory is optional for users to provide. It does not mean the file or directory is optional for NPLinker to use: if the user does not provide it, NPLinker may generate it.
[A] means the directory is automatically created and/or managed by NPLinker.
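Because the names marked [F] are fixed, a quick pre-flight check can catch misnamed inputs before NPLinker runs. The Python sketch below is a hypothetical helper (check_working_dir and bigscape_cutoff are not NPLinker functions). It assumes local mode, where the user supplies strain_mappings.json, gnps and antismash up front, and it hard-codes the names from the tree above instead of reading them from the defaults module.

```python
from __future__ import annotations

import re
import tempfile
from pathlib import Path

# Fixed GNPS file names [F] from the tree above.
GNPS_FIXED = ("spectra.mgf", "molecular_families.tsv", "annotations.tsv")
# The file mappings may be .tsv or .csv.
GNPS_FILE_MAPPINGS = ("file_mappings.tsv", "file_mappings.csv")
# BiG-SCAPE pattern mix_clustering_c{cutoff}.tsv.
# ASSUMPTION: {cutoff} is written as a plain decimal number such as 0.30.
BIGSCAPE_RE = re.compile(r"^mix_clustering_c(\d+(?:\.\d+)?)\.tsv$")


def bigscape_cutoff(filename: str) -> float | None:
    """Return the cutoff encoded in a mix_clustering_c{cutoff}.tsv name, else None."""
    match = BIGSCAPE_RE.match(filename)
    return float(match.group(1)) if match else None


def check_working_dir(root_dir: Path) -> list[str]:
    """Return a list of problems found in root_dir (empty if none were found)."""
    problems: list[str] = []

    for name in ("nplinker.toml", "strain_mappings.json"):
        if not (root_dir / name).is_file():
            problems.append(f"missing {name}")

    gnps = root_dir / "gnps"
    for name in GNPS_FIXED:
        if not (gnps / name).is_file():
            problems.append(f"missing gnps/{name}")
    if not any((gnps / name).is_file() for name in GNPS_FILE_MAPPINGS):
        problems.append("missing gnps/file_mappings.tsv (or .csv)")

    # Each accession subdirectory must hold at least one *.region*.gbk file.
    antismash = root_dir / "antismash"
    if antismash.is_dir():
        genome_dirs = sorted(p for p in antismash.iterdir() if p.is_dir())
    else:
        genome_dirs = []
    if not genome_dirs:
        problems.append("antismash/ has no accession subdirectories")
    for sub in genome_dirs:
        if not any(sub.glob("*.region*.gbk")):
            problems.append(f"antismash/{sub.name} has no *.region*.gbk files")

    # bigscape/ is optional [O]. This sketch flags a bigscape/ directory that
    # lacks a matching clustering file; NPLinker's own handling may differ.
    bigscape = root_dir / "bigscape"
    if bigscape.is_dir():
        if not any(bigscape_cutoff(p.name) is not None for p in bigscape.iterdir()):
            problems.append("bigscape/ has no mix_clustering_c{cutoff}.tsv file")

    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # An empty directory reports every missing input.
        for problem in check_working_dir(Path(tmp)):
            print(problem)
```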