Working Directory Structure

NPLinker requires a working directory with a fixed structure and fixed names for the input and output data.

root_dir                                        # (1)!
├── nplinker.toml                               # (2)!
├── strain_mappings.json                [F]     # (3)!
├── strains_selected.json               [F][O]  # (4)!
├── gnps                                [F]     # (5)!
│   ├── spectra.mgf                     [F]
│   ├── molecular_families.tsv          [F]
│   ├── annotations.tsv                 [F]
│   └── file_mappings.tsv (.csv)        [F]     # (6)!
├── antismash                           [F]     # (7)!
│   ├── GCF_000514975.1
│   │   ├── xxx.region001.gbk
│   │   └── ...
│   ├── GCF_000016425.1
│   │   ├── xxxx.region001.gbk
│   │   └── ...
│   └── ...
├── bigscape                            [F][O]  # (8)!
│   ├── mix_clustering_c0.30.tsv        [F]     # (9)!
│   └── bigscape_running_output
│       └── ...
├── downloads                           [F][A]  # (10)!
│   ├── paired_datarecord_4b29ddc3-26d0-40d7-80c5-44fb6631dbf9.4.json # (11)!
│   ├── GCF_000016425.1.zip
│   ├── GCF_0000514975.1.zip
│   ├── c22f44b14a3d450eb836d607cb9521bb.zip
│   ├── genome_status.json
│   └── mibig_json_3.1.tar.gz
├── mibig                               [F][A]  # (12)!
│   ├── BGC0000001.json
│   ├── BGC0000002.json
│   └── ...
├── output                              [F][A]  # (13)!
│   └── ...
└── ...                                         # (14)!

  1. root_dir is the working directory you created, used as the root directory for NPLinker.
  2. nplinker.toml is the configuration file (toml format) provided by the user for running NPLinker.
  3. strain_mappings.json contains the mappings from strain to genomics and metabolomics data. It is generated by NPLinker for podp mode; for local mode, users need to create it manually.
    [F] means the file name strain_mappings.json is a fixed name (including the extension) and must be named as shown.
  4. strains_selected.json is an optional file containing the list of strains to be used in the analysis. If it is not provided, NPLinker will use all strains detected from the input data.
    [O] means the file strains_selected.json is optional for users to provide.
  5. gnps directory contains the GNPS data. The files in this directory must be named as shown. See XXX for more information about the GNPS data.
  6. This file could be .tsv or .csv format.
  7. antismash directory contains a collection of AntiSMASH BGC data. The BGC data (*.region*.gbk files) must be stored in subdirectories named after the NCBI accession number of the genome (e.g. GCF_000514975.1).
  8. bigscape directory is optional and contains the output of BigScape. If the directory is not provided, NPLinker will run BigScape automatically to generate the data using the AntiSMASH BGC data.
  9. mix_clustering_c0.30.tsv is an example output of BigScape. The file name must follow the pattern mix_clustering_c{cutoff}.tsv, where {cutoff} is the cutoff value used in the BigScape run.
  10. downloads directory is automatically created and managed by NPLinker. It stores data downloaded from the internet. Users can also use it to store their own downloaded data.
    [A] means the directory is automatically created and/or managed by NPLinker.
  11. This is an example file; the actual file name will differ. The same applies to the other files in the downloads directory.
  12. mibig directory contains the MIBiG metadata, which is automatically created and downloaded by NPLinker. Users should not interfere with this directory and its content.
  13. output directory is automatically created by NPLinker. It stores the output data of NPLinker.
  14. NPLinker can be extended by adding other types of data to the working directory.
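
The naming rules in annotations 7 and 9 can be illustrated with a short Python sketch. It is not part of NPLinker: the helper names `bigscape_cutoff` and `bgc_files_by_genome` are hypothetical, and the regular expression assumes the cutoff is written as a plain decimal number such as 0.30.

```python
import re
from pathlib import Path

# Assumption: the cutoff in mix_clustering_c{cutoff}.tsv is a plain decimal.
CUTOFF_PATTERN = re.compile(r"^mix_clustering_c(\d+\.\d+)\.tsv$")


def bigscape_cutoff(file_name):
    """Return the cutoff part of a file name as a string, or None if it does not match."""
    match = CUTOFF_PATTERN.match(file_name)
    return match.group(1) if match else None


def bgc_files_by_genome(antismash_dir):
    """Group *.region*.gbk file names by the subdirectory (accession) that holds them."""
    result = {}
    for gbk in sorted(Path(antismash_dir).glob("*/*.region*.gbk")):
        result.setdefault(gbk.parent.name, []).append(gbk.name)
    return result


print(bigscape_cutoff("mix_clustering_c0.30.tsv"))  # 0.30
```

The cutoff is returned as a string so that a value like 0.30 keeps its trailing zero and can be matched against the file name again.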

Tip

  • [F] means the file or directory name is fixed and must be named as shown. The names are defined in the defaults module.
  • [O] means the file or directory is optional for users to provide. It does not mean the file or directory is optional for NPLinker to use. If it's not provided by the user, NPLinker may generate it.
  • [A] means the directory is automatically created and/or managed by NPLinker.
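
To make the markers concrete, the following Python sketch encodes them as flags and lists the entries a user would have to supply. It is an illustration only and not NPLinker's own validation: the `ENTRIES` table and both function names are hypothetical, the table covers only the top-level entries that carry a marker, and the result applies to local mode (in podp mode NPLinker generates strain_mappings.json itself).

```python
from pathlib import Path

# Top-level entries and their markers, copied from the directory tree.
ENTRIES = {
    "strain_mappings.json": "F",
    "strains_selected.json": "FO",
    "gnps": "F",
    "antismash": "F",
    "bigscape": "FO",
    "downloads": "FA",
    "mibig": "FA",
    "output": "FA",
}


def user_provided(entries=ENTRIES):
    """Entries that are neither optional [O] nor automatically managed [A]."""
    return [name for name, flags in entries.items() if not set(flags) & {"O", "A"}]


def missing_entries(root_dir):
    """User-provided entries that do not yet exist under root_dir."""
    root = Path(root_dir)
    return [name for name in user_provided() if not (root / name).exists()]


print(user_provided())  # ['strain_mappings.json', 'gnps', 'antismash']
```

Calling `missing_entries("path/to/root_dir")` before a run gives a quick list of what still needs to be put in place.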