GNPS
nplinker.metabolomics.gnps
GNPSFormat
GNPSDownloader
Download GNPS zip archive for the given task id.
Concept
Note that only GNPS workflows listed in the GNPSFormat enum are supported.
Attributes:
- GNPS_DATA_DOWNLOAD_URL (str) – URL template for downloading GNPS data.
- GNPS_DATA_DOWNLOAD_URL_FBMN (str) – URL template for downloading GNPS data for FBMN.
Parameters:
- task_id (str) – GNPS task id, identifying the data to be downloaded.
- download_root (str | PathLike) – Path where to store the downloaded archive.
- gnps_version (str, default: '1') – Version of GNPS platform that has been used to run the task. Available values are "1" and "2". Choose "1" if the platform https://gnps.ucsd.edu/ has been used; or "2" for the platform https://gnps2.org/.
Raises:
- ValueError – If the given task id does not correspond to a supported GNPS workflow.
- ValueError – If the given GNPS version is not valid.
Examples:
Download GNPS1 job
Download GNPS2 job
Source code in src/nplinker/metabolomics/gnps/gnps_downloader.py
GNPS_DATA_DOWNLOAD_URL
class-attribute
instance-attribute
GNPS_DATA_DOWNLOAD_URL: str = (
    "https://gnps.ucsd.edu/ProteoSAFe/DownloadResult?task={}&view=download_clustered_spectra"
)
GNPS_DATA_DOWNLOAD_URL_FBMN
class-attribute
instance-attribute
GNPS_DATA_DOWNLOAD_URL_FBMN: str = (
    "https://gnps.ucsd.edu/ProteoSAFe/DownloadResult?task={}&view=download_cytoscape_data"
)
GNPS2_DATA_DOWNLOAD_URL
class-attribute
instance-attribute
GNPS2_DATA_DOWNLOAD_URL: str = (
    "https://gnps2.org/taskzip?task={}"
)
download
Download GNPS data.
Source code in src/nplinker/metabolomics/gnps/gnps_downloader.py
get_download_file
get_download_file() -> str
get_url
get_url() -> str
Get the download URL.
Returns:
- str – URL pointing to the GNPS data to be downloaded.
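As a rough illustration of how the URL templates documented above can be filled in with a task id, consider the sketch below. The helper name build_download_url and its is_fbmn flag are hypothetical and not part of nplinker; this page does not show how GNPSDownloader selects a template, so the selection logic here is an assumption.

```python
# URL templates copied from the class attributes documented above.
GNPS_DATA_DOWNLOAD_URL = (
    "https://gnps.ucsd.edu/ProteoSAFe/DownloadResult?task={}&view=download_clustered_spectra"
)
GNPS_DATA_DOWNLOAD_URL_FBMN = (
    "https://gnps.ucsd.edu/ProteoSAFe/DownloadResult?task={}&view=download_cytoscape_data"
)
GNPS2_DATA_DOWNLOAD_URL = "https://gnps2.org/taskzip?task={}"


def build_download_url(task_id: str, gnps_version: str = "1", is_fbmn: bool = False) -> str:
    """Hypothetical helper: pick a template and insert the task id."""
    if gnps_version == "2":
        template = GNPS2_DATA_DOWNLOAD_URL
    elif gnps_version == "1":
        # Assumption: FBMN tasks use the cytoscape view, all other
        # GNPS1 tasks use the clustered spectra view.
        template = GNPS_DATA_DOWNLOAD_URL_FBMN if is_fbmn else GNPS_DATA_DOWNLOAD_URL
    else:
        raise ValueError(f"Invalid GNPS version: {gnps_version!r}")
    return template.format(task_id)
```

An unsupported version string raises ValueError, mirroring the behaviour listed under Raises for GNPSDownloader.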
Source code in src/nplinker/metabolomics/gnps/gnps_downloader.py
GNPSExtractor
Extract files from a GNPS molecular networking archive.
Concept
Four files are extracted and renamed to the following names:
- file_mappings(.tsv/.csv)
- spectra.mgf
- molecular_families.tsv
- annotations.tsv
The files to be extracted are selected based on the GNPS workflow type, as described below (in the order of the files above):
- METABOLOMICS-SNETS
    - clusterinfosummarygroup_attributes_withIDs_withcomponentID/*.tsv
    - METABOLOMICS-SNETS*.mgf
    - networkedges_selfloop/*.pairsinfo
    - result_specnets_DB/*.tsv
- METABOLOMICS-SNETS-V2
    - clusterinfosummarygroup_attributes_withIDs_withcomponentID/*.clustersummary
    - METABOLOMICS-SNETS-V2*.mgf
    - networkedges_selfloop/*.selfloop
    - result_specnets_DB/*.tsv
- FEATURE-BASED-MOLECULAR-NETWORKING
    - quantification_table/*.csv
    - spectra/*.mgf
    - networkedges_selfloop/*.selfloop
    - DB_result/*.tsv
- GNPS2 classical_networking_workflow
    - nf_output/clustering/featuretable_reformatted_presence.csv
    - nf_output/clustering/specs_ms.mgf
    - nf_output/networking/filtered_pairs.tsv
    - nf_output/library/merged_results_with_gnps.tsv
- GNPS2 feature_based_molecular_networking_workflow
    - nf_output/clustering/featuretable_reformated.csv
    - nf_output/clustering/specs_ms.mgf
    - nf_output/networking/filtered_pairs.tsv
    - nf_output/library/merged_results_with_gnps.tsv
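The pattern-based selection described above can be sketched with the standard library as follows. This is a minimal illustration for the METABOLOMICS-SNETS workflow only; the names SNETS_PATTERNS and select_members are hypothetical, and the actual matching logic of GNPSExtractor may differ.

```python
from fnmatch import fnmatchcase

# Patterns for the METABOLOMICS-SNETS workflow, keyed by the name
# each extracted file is renamed to.
SNETS_PATTERNS = {
    "file_mappings.tsv": "clusterinfosummarygroup_attributes_withIDs_withcomponentID/*.tsv",
    "spectra.mgf": "METABOLOMICS-SNETS*.mgf",
    "molecular_families.tsv": "networkedges_selfloop/*.pairsinfo",
    "annotations.tsv": "result_specnets_DB/*.tsv",
}


def select_members(member_names, patterns):
    """Map each target name to the first archive member matching its pattern."""
    selected = {}
    for target, pattern in patterns.items():
        matches = [name for name in member_names if fnmatchcase(name, pattern)]
        if not matches:
            raise ValueError(f"No archive member matches {pattern!r}")
        selected[target] = matches[0]
    return selected
```

In practice the member names could come from zipfile.ZipFile.namelist(); a missing file raises ValueError, in the spirit of the "invalid GNPS archive" error documented below.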
Attributes:
- gnps_format (GNPSFormat) – The GNPS workflow type.
- extract_dir (str) – The path of the directory the files are extracted to.
Parameters:
- file (str | PathLike) – The path to the GNPS archive file.
- extract_dir (str | PathLike) – The path of the directory to extract the files to.
Raises:
- ValueError – If the given file is an invalid GNPS archive.
Examples:
>>> gnps_extractor = GNPSExtractor("path/to/gnps_archive.zip", "path/to/extract_dir")
>>> gnps_extractor.gnps_format
<GNPSFormat.SNETS: 'METABOLOMICS-SNETS'>
>>> gnps_extractor.extract_dir
'path/to/extract_dir'
Source code in src/nplinker/metabolomics/gnps/gnps_extractor.py
gnps_format
property
gnps_format: GNPSFormat
GNPSSpectrumLoader
Bases: SpectrumLoaderBase
Load mass spectra from the given GNPS MGF file.
Concept
The MGF file is from the GNPS output archive, as described below for each GNPS workflow type:
- METABOLOMICS-SNETS
    - METABOLOMICS-SNETS*.mgf
- METABOLOMICS-SNETS-V2
    - METABOLOMICS-SNETS-V2*.mgf
- FEATURE-BASED-MOLECULAR-NETWORKING
    - spectra/*.mgf
- GNPS2 classical_networking_workflow
    - nf_output/clustering/specs_ms.mgf
- GNPS2 feature_based_molecular_networking_workflow
    - nf_output/clustering/specs_ms.mgf
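To give a feel for what loading spectra from an MGF file involves, here is a minimal, self-contained parser sketch. It is not the implementation used by GNPSSpectrumLoader; the function name parse_mgf and the returned dictionary layout are assumptions made for illustration only.

```python
def parse_mgf(text):
    """Parse MGF text into a list of {"params": dict, "peaks": list} records."""
    spectra = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "BEGIN IONS":
            # Start a new spectrum block.
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line:
                # Header line such as PEPMASS=164.107
                key, value = line.split("=", 1)
                current["params"][key] = value
            else:
                # Peak line: m/z followed by intensity.
                mz, intensity = line.split()[:2]
                current["peaks"].append((float(mz), float(intensity)))
    return spectra
```

Lines outside a BEGIN IONS / END IONS block are ignored in this sketch; parameter values are kept as strings.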
Parameters:
Raises:
- ValueError – If the file is not valid.
Examples:
Source code in src/nplinker/metabolomics/gnps/gnps_spectrum_loader.py
GNPSMolecularFamilyLoader
Bases: MolecularFamilyLoaderBase
Load molecular families from GNPS data.
Concept
The molecular family file is from the GNPS output archive, as described below for each GNPS workflow type:
- METABOLOMICS-SNETS
    - networkedges_selfloop/*.pairsinfo
- METABOLOMICS-SNETS-V2
    - networkedges_selfloop/*.selfloop
- FEATURE-BASED-MOLECULAR-NETWORKING
    - networkedges_selfloop/*.selfloop
- GNPS2 classical_networking_workflow
    - nf_output/networking/filtered_pairs.tsv
- GNPS2 feature_based_molecular_networking_workflow
    - nf_output/networking/filtered_pairs.tsv
The ComponentIndex column in the GNPS molecular family file is treated as the family id.
However, molecular families that have only one member (i.e. one spectrum), called singleton molecular families, all share the same value of -1 in the ComponentIndex column. To make the family id unique, the spectrum id with the prefix "singleton-" is used as the family id of singleton molecular families.
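The family id rule described above can be sketched as follows. The helper names family_id and group_spectra, and the (spectrum id, ComponentIndex) tuple input, are hypothetical simplifications; the real loader reads these values from the molecular family file.

```python
def family_id(component_index: str, spectrum_id: str) -> str:
    """Return a unique family id, following the singleton rule."""
    if component_index == "-1":
        # Singletons all share ComponentIndex -1, so derive the id
        # from the spectrum id instead.
        return f"singleton-{spectrum_id}"
    return component_index


def group_spectra(rows):
    """Group (spectrum_id, component_index) pairs into families."""
    families = {}
    for spectrum_id, component_index in rows:
        fid = family_id(component_index, spectrum_id)
        families.setdefault(fid, set()).add(spectrum_id)
    return families
```

With this rule, two singleton spectra with ids 8 and 9 end up in the separate families singleton-8 and singleton-9 rather than in one shared family -1.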
Parameters:
Raises:
- ValueError – If the file is not valid.
Examples:
>>> loader = GNPSMolecularFamilyLoader("gnps_molecular_families.tsv")
>>> print(loader.families)
[<MolecularFamily 1>, <MolecularFamily 2>, ...]
>>> print(loader.families[0].spectra_ids)
{'1', '3', '7', ...}
Source code in src/nplinker/metabolomics/gnps/gnps_molecular_family_loader.py
get_mfs
get_mfs(keep_singleton: bool = False) -> list[MolecularFamily]
Get MolecularFamily objects.
Parameters:
- keep_singleton (bool, default: False) – True to keep singleton molecular families. A singleton molecular family is a molecular family that contains only one spectrum.
Returns:
- list[MolecularFamily] – A list of MolecularFamily objects with their spectra ids.
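The effect of keep_singleton can be illustrated with a small stand-in that works on plain dictionaries instead of MolecularFamily objects. The function filter_families is hypothetical and only mirrors the documented behaviour: singletons (families with exactly one spectrum) are dropped unless keep_singleton is True.

```python
def filter_families(families, keep_singleton=False):
    """Return families, optionally dropping those with a single spectrum.

    `families` maps a family id to a set of spectrum ids.
    """
    if keep_singleton:
        return dict(families)
    return {fid: ids for fid, ids in families.items() if len(ids) > 1}
```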
Source code in src/nplinker/metabolomics/gnps/gnps_molecular_family_loader.py
GNPSAnnotationLoader
Bases: AnnotationLoaderBase
Load annotations from GNPS output file.
Concept
The annotation file is a .tsv file from the GNPS output archive, as described below for each GNPS workflow type:
- METABOLOMICS-SNETS
    - result_specnets_DB/*.tsv
- METABOLOMICS-SNETS-V2
    - result_specnets_DB/*.tsv
- FEATURE-BASED-MOLECULAR-NETWORKING
    - DB_result/*.tsv
- GNPS2 classical_networking_workflow
    - nf_output/library/merged_results_with_gnps.tsv
- GNPS2 feature_based_molecular_networking_workflow
    - nf_output/library/merged_results_with_gnps.tsv
Parameters:
Examples:
>>> loader = GNPSAnnotationLoader("gnps_annotations.tsv")
>>> print(loader.annotations["100"])
{'#Scan#': '100',
'Adduct': 'M+H',
'CAS_Number': 'N/A',
'Charge': '1',
'Compound_Name': 'MLS002153841-01!Iobenguane sulfate',
'Compound_Source': 'NIH Pharmacologically Active Library',
'Data_Collector': 'VP/LMS',
'ExactMass': '274.992',
'INCHI': 'N/A',
'INCHI_AUX': 'N/A',
'Instrument': 'qTof',
'IonMode': 'Positive',
'Ion_Source': 'LC-ESI',
'LibMZ': '276.003',
'LibraryName': 'lib-00014.mgf',
'LibraryQualityString': 'Gold',
'Library_Class': '1',
'MQScore': '0.704152',
'MZErrorPPM': '405416',
'MassDiff': '111.896',
'Organism': 'GNPS-NIH-SMALLMOLECULEPHARMACOLOGICALLYACTIVE',
'PI': 'Dorrestein',
'Precursor_MZ': '276.003',
'Pubmed_ID': 'N/A',
'RT_Query': '795.979',
'SharedPeaks': '7',
'Smiles': 'NC(=N)NCc1cccc(I)c1.OS(=O)(=O)O',
'SpecCharge': '1',
'SpecMZ': '164.107',
'SpectrumFile': 'spectra/specs_ms.pklbin',
'SpectrumID': 'CCMSLIB00000086167',
'TIC_Query': '986.997',
'UpdateWorkflowName': 'UPDATE-SINGLE-ANNOTATED-GOLD',
'tags': ' ',
'png_url': 'https://metabolomics-usi.gnps2.org/png/?usi1=mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00000086167',
'json_url': 'https://metabolomics-usi.gnps2.org/json/?usi1=mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00000086167',
'svg_url': 'https://metabolomics-usi.gnps2.org/svg/?usi1=mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00000086167',
'spectrum_url': 'https://metabolomics-usi.gnps2.org/spectrum/?usi1=mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00000086167'}
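The structure shown in the example above (one dictionary per row, looked up by scan number) can be approximated with the standard csv module. This is a sketch under the assumption that the annotations are keyed by the "#Scan#" column, as the example output suggests; read_annotations is a hypothetical helper, and the sample data below is a trimmed-down stand-in for a real annotation file.

```python
import csv
import io


def read_annotations(handle):
    """Read a tab-separated annotation table into {scan: row_dict}."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["#Scan#"]: row for row in reader}


# Trimmed-down sample with three of the columns shown above.
sample = io.StringIO("#Scan#\tAdduct\tCharge\n100\tM+H\t1\n")
annotations = read_annotations(sample)
print(annotations["100"]["Adduct"])
```

All values stay strings in this sketch, which matches the quoted values in the example output.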
Source code in src/nplinker/metabolomics/gnps/gnps_annotation_loader.py
GNPSFileMappingLoader
Bases: FileMappingLoaderBase
Class to load file mappings from GNPS output file.
Concept
File mappings refer to the mapping from a spectrum id to the files in which that spectrum occurs.
The file mappings file is from the GNPS output archive, as described below for each GNPS workflow type:
- METABOLOMICS-SNETS
    - clusterinfosummarygroup_attributes_withIDs_withcomponentID/*.tsv
- METABOLOMICS-SNETS-V2
    - clusterinfosummarygroup_attributes_withIDs_withcomponentID/*.clustersummary (.tsv file)
- FEATURE-BASED-MOLECULAR-NETWORKING
    - quantification_table/*.csv
- GNPS2 classical_networking_workflow
    - nf_output/clustering/featuretable_reformatted_presence.csv
- GNPS2 feature_based_molecular_networking_workflow
    - nf_output/clustering/featuretable_reformated.csv
The .tsv files from different workflows have different headers, while the .csv files from different workflows have consistent headers.
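The relationship between the forward mapping (spectrum id to files) and the reversed mapping (file to spectrum ids) can be sketched as a simple inversion. The function reverse_mappings is a hypothetical illustration, not the loader's own code.

```python
def reverse_mappings(mappings):
    """Invert {spectrum_id: [file, ...]} into {file: {spectrum_id, ...}}."""
    reversed_mappings = {}
    for spectrum_id, files in mappings.items():
        for file_name in files:
            reversed_mappings.setdefault(file_name, set()).add(spectrum_id)
    return reversed_mappings
```

Using sets for the reversed side means a spectrum id is recorded only once per file, even if the forward mapping lists the same file twice.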
Parameters:
Raises:
- ValueError – If the file is not valid.
Examples:
>>> loader = GNPSFileMappingLoader("gnps_file_mappings.tsv")
>>> print(loader.mappings["1"])
['26c.mzXML']
>>> print(loader.mapping_reversed["26c.mzXML"])
{'1', '3', '7', ...}
Source code in src/nplinker/metabolomics/gnps/gnps_file_mapping_loader.py
mappings
property
gnps_format_from_archive
gnps_format_from_archive(file: str | PathLike) -> GNPSFormat
Detect GNPS format or workflow from GNPS archive file.
GNPS archive files can be in two formats: GNPS1 (.zip) and GNPS2 (.tar).
For GNPS1 data, the detection of workflow format is based on the filename of the zip archive and the names of the files contained in the zip archive.
For GNPS2 data, the workflow format is taken from the submission_parameters.yaml file in the tar archive, which has a key workflowname.
Parameters:
Returns:
- GNPSFormat – The format identified in the GNPS archive file.
Examples:
>>> gnps_format_from_archive("ProteoSAFe-METABOLOMICS-SNETS-c22f44b1-download_clustered_spectra.zip")
<GNPSFormat.SNETS: 'METABOLOMICS-SNETS'>
>>> gnps_format_from_archive("ProteoSAFe-METABOLOMICS-SNETS-V2-189e8bf1-download_clustered_spectra.zip")
<GNPSFormat.SNETSV2: 'METABOLOMICS-SNETS-V2'>
>>> gnps_format_from_archive("ProteoSAFe-FEATURE-BASED-MOLECULAR-NETWORKING-672d0a53-download_cytoscape_data.zip")
<GNPSFormat.FBMN: 'FEATURE-BASED-MOLECULAR-NETWORKING'>
>>> gnps_format_from_archive("206a7b40b7ed41c1ae6b4fbd2def3636.tar")
<GNPSFormat.GNPS2CN: 'classical_networking_workflow'>
>>> gnps_format_from_archive("2014f321d72542afb5216c932e0d5079.tar")
<GNPSFormat.GNPS2FBMN: 'feature_based_molecular_networking_workflow'>
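The filename-based part of the GNPS1 detection can be sketched as below. This is a simplified illustration that returns the workflow name as a plain string rather than a GNPSFormat member; workflow_from_filename is a hypothetical helper, and it covers neither the check of the files inside the zip archive nor the GNPS2 lookup in submission_parameters.yaml.

```python
from pathlib import Path

# The longer name must be tested first, because "METABOLOMICS-SNETS"
# is a prefix of "METABOLOMICS-SNETS-V2".
GNPS1_WORKFLOWS = (
    "METABOLOMICS-SNETS-V2",
    "FEATURE-BASED-MOLECULAR-NETWORKING",
    "METABOLOMICS-SNETS",
)


def workflow_from_filename(file):
    """Return the GNPS1 workflow name found in the archive filename, or None."""
    name = Path(file).name
    for workflow in GNPS1_WORKFLOWS:
        if workflow in name:
            return workflow
    return None
```

For a GNPS2 .tar archive, whose filename is just the task id, this sketch returns None, which is why the archive contents have to be consulted in that case.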