GNPS
nplinker.metabolomics.gnps
GNPSFormat
Bases: Enum
Enum class for GNPS formats or workflows.
Concept
The name of the enum is a short name for the workflow, and the value of the enum is the workflow name used on the GNPS website.
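The name/value convention can be sketched with a small stand-in enum. `GNPSFormatSketch` is a hypothetical name used only for illustration; its members mirror the ones that appear in the examples on this page, and the real `GNPSFormat` class may define them differently.

```python
from enum import Enum


class GNPSFormatSketch(Enum):
    """Hypothetical stand-in for GNPSFormat, for illustration only."""

    # Member name = short name; member value = workflow name used on the GNPS website.
    SNETS = "METABOLOMICS-SNETS"
    SNETSV2 = "METABOLOMICS-SNETS-V2"
    FBMN = "FEATURE-BASED-MOLECULAR-NETWORKING"
    Unknown = "Unknown-GNPS-Workflow"


# Look up a member by its workflow name (the enum value).
fmt = GNPSFormatSketch("METABOLOMICS-SNETS-V2")
print(fmt.name)   # the short name
print(fmt.value)  # the workflow name
```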
GNPSDownloader
Download GNPS zip archive for the given task id.
Concept
Note that only GNPS workflows listed in the GNPSFormat enum are supported.
Attributes:
- GNPS_DATA_DOWNLOAD_URL (str) – URL template for downloading GNPS data.
- GNPS_DATA_DOWNLOAD_URL_FBMN (str) – URL template for downloading GNPS data for FBMN.
- gnps_format (GNPSFormat) – GNPS workflow type.
Parameters:
- task_id (str) – GNPS task id, identifying the data to be downloaded.
- download_root (str | PathLike) – Path where to store the downloaded archive.
Raises:
- ValueError – If the given task id does not correspond to a supported GNPS workflow.
Examples:
Source code in src/nplinker/metabolomics/gnps/gnps_downloader.py
GNPS_DATA_DOWNLOAD_URL
class-attribute
instance-attribute
GNPS_DATA_DOWNLOAD_URL: str = (
"https://gnps.ucsd.edu/ProteoSAFe/DownloadResult?task={}&view=download_clustered_spectra"
)
GNPS_DATA_DOWNLOAD_URL_FBMN
class-attribute
instance-attribute
GNPS_DATA_DOWNLOAD_URL_FBMN: str = (
"https://gnps.ucsd.edu/ProteoSAFe/DownloadResult?task={}&view=download_cytoscape_data"
)
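As a minimal sketch of how such a template might be filled in, the `{}` placeholder can be replaced with a task id via `str.format`. The helper `build_download_url` and its `is_fbmn` flag are hypothetical and only illustrate the idea; the task id is taken from the examples on this page.

```python
# URL templates as listed above; "{}" is the placeholder for the task id.
GNPS_DATA_DOWNLOAD_URL = (
    "https://gnps.ucsd.edu/ProteoSAFe/DownloadResult?task={}&view=download_clustered_spectra"
)
GNPS_DATA_DOWNLOAD_URL_FBMN = (
    "https://gnps.ucsd.edu/ProteoSAFe/DownloadResult?task={}&view=download_cytoscape_data"
)


def build_download_url(task_id: str, is_fbmn: bool) -> str:
    """Hypothetical helper: pick a template and insert the task id."""
    template = GNPS_DATA_DOWNLOAD_URL_FBMN if is_fbmn else GNPS_DATA_DOWNLOAD_URL
    return template.format(task_id)


url = build_download_url("c22f44b14a3d450eb836d607cb9521bb", is_fbmn=False)
print(url)
```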
gnps_format
property
gnps_format: GNPSFormat
download
Download GNPS data.
Note: GNPS data is downloaded using the POST method (empty payload is OK).
Source code in src/nplinker/metabolomics/gnps/gnps_downloader.py
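To illustrate the note about POST with an empty payload, here is a standard-library sketch that only builds the request object and does not send it. The HTTP client actually used by `GNPSDownloader` is not stated here, so `urllib` and the helper `build_post_request` are assumptions made for the example.

```python
import urllib.request


def build_post_request(url: str) -> urllib.request.Request:
    """Hypothetical helper: build a POST request with an empty body."""
    # An empty bytes payload is enough; the method is set explicitly for clarity.
    return urllib.request.Request(url, data=b"", method="POST")


req = build_post_request(
    "https://gnps.ucsd.edu/ProteoSAFe/DownloadResult"
    "?task=c22f44b14a3d450eb836d607cb9521bb&view=download_clustered_spectra"
)
print(req.get_method())
```

Sending the request (for example with `urllib.request.urlopen(req)`) and writing the response body to disk is left out so the sketch runs without network access.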
get_download_file
get_download_file() -> str
GNPSExtractor
Extract files from a GNPS molecular networking archive (.zip).
Concept
Four files are extracted and renamed to the following names:
- file_mappings(.tsv/.csv)
- spectra.mgf
- molecular_families.tsv
- annotations.tsv
The files to be extracted are selected based on the GNPS workflow type, as described below (in the order of the files above):
- METABOLOMICS-SNETS
  - clusterinfosummarygroup_attributes_withIDs_withcomponentID/*.tsv
  - METABOLOMICS-SNETS*.mgf
  - networkedges_selfloop/*.pairsinfo
  - result_specnets_DB/*.tsv
- METABOLOMICS-SNETS-V2
  - clusterinfosummarygroup_attributes_withIDs_withcomponentID/*.clustersummary
  - METABOLOMICS-SNETS-V2*.mgf
  - networkedges_selfloop/*.selfloop
  - result_specnets_DB/*.tsv
- FEATURE-BASED-MOLECULAR-NETWORKING
  - quantification_table/*.csv
  - spectra/*.mgf
  - networkedges_selfloop/*.selfloop
  - DB_result/*.tsv
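The selection step can be sketched with glob-style matching over archive member names. This is an illustration under stated assumptions, not the extractor's actual code: the helper `select_members` is hypothetical, the member names are made up, only the METABOLOMICS-SNETS patterns are shown, and the `.tsv` extension for the file mappings target is assumed.

```python
import fnmatch

# Patterns for the METABOLOMICS-SNETS workflow, keyed by the name each
# extracted file is given. The ".tsv" target extension is an assumption.
SNETS_PATTERNS = {
    "file_mappings.tsv": "clusterinfosummarygroup_attributes_withIDs_withcomponentID/*.tsv",
    "spectra.mgf": "METABOLOMICS-SNETS*.mgf",
    "molecular_families.tsv": "networkedges_selfloop/*.pairsinfo",
    "annotations.tsv": "result_specnets_DB/*.tsv",
}


def select_members(member_names, patterns):
    """Map each target name to the first archive member matching its pattern."""
    selected = {}
    for target, pattern in patterns.items():
        matches = [n for n in member_names if fnmatch.fnmatchcase(n, pattern)]
        if matches:
            selected[target] = matches[0]
    return selected


# Made-up member names, shaped like the contents of a GNPS archive.
members = [
    "METABOLOMICS-SNETS-c22f44b1-download_clustered_spectra-main.mgf",
    "clusterinfosummarygroup_attributes_withIDs_withcomponentID/d69356c8.tsv",
    "networkedges_selfloop/6da5be36.pairsinfo",
    "result_specnets_DB/885e4c5f.tsv",
    "params.xml",
]
selected = select_members(members, SNETS_PATTERNS)
for target, member in selected.items():
    print(f"{member} -> {target}")
```

With a real archive, the member names could come from `zipfile.ZipFile(path).namelist()`; the other workflows would use their own pattern tables of the same shape.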
Attributes:
- gnps_format (GNPSFormat) – The GNPS workflow type.
- extract_dir (str) – The path to extract the files to.
Parameters:
- file (str | PathLike) – The path to the GNPS zip file.
- extract_dir (str | PathLike) – The path to the directory to extract the files to.
Raises:
- ValueError – If the given file is an invalid GNPS archive.
Examples:
>>> gnps_extractor = GNPSExtractor("path/to/gnps_archive.zip", "path/to/extract_dir")
>>> gnps_extractor.gnps_format
<GNPSFormat.SNETS: 'METABOLOMICS-SNETS'>
>>> gnps_extractor.extract_dir
'path/to/extract_dir'
Source code in src/nplinker/metabolomics/gnps/gnps_extractor.py
gnps_format
property
gnps_format: GNPSFormat
GNPSSpectrumLoader
Bases: SpectrumLoaderBase
Load mass spectra from the given GNPS MGF file.
Concept
The MGF file is from the GNPS output archive, as described below for each GNPS workflow type:
- METABOLOMICS-SNETS
  - METABOLOMICS-SNETS*.mgf
- METABOLOMICS-SNETS-V2
  - METABOLOMICS-SNETS-V2*.mgf
- FEATURE-BASED-MOLECULAR-NETWORKING
  - spectra/*.mgf
Parameters:
Raises:
- ValueError – If the file is not valid.
Examples:
Source code in src/nplinker/metabolomics/gnps/gnps_spectrum_loader.py
GNPSMolecularFamilyLoader
Bases: MolecularFamilyLoaderBase
Load molecular families from GNPS data.
Concept
The molecular family file is from the GNPS output archive, as described below for each GNPS workflow type:
- METABOLOMICS-SNETS
  - networkedges_selfloop/*.pairsinfo
- METABOLOMICS-SNETS-V2
  - networkedges_selfloop/*.selfloop
- FEATURE-BASED-MOLECULAR-NETWORKING
  - networkedges_selfloop/*.selfloop
The ComponentIndex column in the GNPS molecular family file is treated as the family id. Molecular families that have only one member (i.e. one spectrum), called singleton molecular families, all share the same value of -1 in the ComponentIndex column. To make their family ids unique, the spectrum id prefixed with singleton- is used as the family id of a singleton molecular family.
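The family id rule can be sketched as follows. The input is simplified to `(spectrum_id, component_index)` string pairs rather than real GNPS rows, and `assign_family_ids` is a hypothetical helper, so this shows the rule itself rather than the loader's parsing code.

```python
from collections import defaultdict


def assign_family_ids(rows):
    """Group spectrum ids by family id.

    rows: iterable of (spectrum_id, component_index) string pairs.
    """
    families = defaultdict(set)
    for spectrum_id, component_index in rows:
        if component_index == "-1":
            # All singletons share ComponentIndex -1, so derive a unique id.
            family_id = f"singleton-{spectrum_id}"
        else:
            family_id = component_index
        families[family_id].add(spectrum_id)
    return dict(families)


# Made-up rows: three spectra in family "1" and two singletons.
rows = [("1", "1"), ("3", "1"), ("7", "1"), ("12", "-1"), ("15", "-1")]
families = assign_family_ids(rows)
print(sorted(families))
```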
Parameters:
Raises:
- ValueError – If the file is not valid.
Examples:
>>> loader = GNPSMolecularFamilyLoader("gnps_molecular_families.tsv")
>>> print(loader.families)
[<MolecularFamily 1>, <MolecularFamily 2>, ...]
>>> print(loader.families[0].spectra_ids)
{'1', '3', '7', ...}
Source code in src/nplinker/metabolomics/gnps/gnps_molecular_family_loader.py
get_mfs
get_mfs(
keep_singleton: bool = False,
) -> list[MolecularFamily]
Get MolecularFamily objects.
Parameters:
- keep_singleton (bool, default: False) – True to keep singleton molecular families. A singleton molecular family is a molecular family that contains only one spectrum.
Returns:
- list[MolecularFamily] – A list of MolecularFamily objects with their spectra ids.
Source code in src/nplinker/metabolomics/gnps/gnps_molecular_family_loader.py
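The effect of `keep_singleton` can be illustrated with a plain dictionary of family id to spectrum ids. `filter_families` is a hypothetical helper that treats a family with one spectrum as a singleton, as defined above; it does not use the real MolecularFamily class.

```python
def filter_families(families, keep_singleton=False):
    """Return families, dropping single-spectrum ones unless keep_singleton is True."""
    if keep_singleton:
        return dict(families)
    return {fid: ids for fid, ids in families.items() if len(ids) > 1}


# Made-up data: one regular family and one singleton.
example = {"1": {"1", "3", "7"}, "singleton-12": {"12"}}
without_singletons = filter_families(example)
with_singletons = filter_families(example, keep_singleton=True)
print(sorted(without_singletons), sorted(with_singletons))
```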
GNPSAnnotationLoader
Bases: AnnotationLoaderBase
Load annotations from GNPS output file.
Concept
The annotation file is a .tsv file from the GNPS output archive, as described below for each GNPS workflow type:
- METABOLOMICS-SNETS
  - result_specnets_DB/*.tsv
- METABOLOMICS-SNETS-V2
  - result_specnets_DB/*.tsv
- FEATURE-BASED-MOLECULAR-NETWORKING
  - DB_result/*.tsv
Parameters:
Examples:
>>> loader = GNPSAnnotationLoader("gnps_annotations.tsv")
>>> print(loader.annotations["100"])
{'#Scan#': '100',
'Adduct': 'M+H',
'CAS_Number': 'N/A',
'Charge': '1',
'Compound_Name': 'MLS002153841-01!Iobenguane sulfate',
'Compound_Source': 'NIH Pharmacologically Active Library',
'Data_Collector': 'VP/LMS',
'ExactMass': '274.992',
'INCHI': 'N/A',
'INCHI_AUX': 'N/A',
'Instrument': 'qTof',
'IonMode': 'Positive',
'Ion_Source': 'LC-ESI',
'LibMZ': '276.003',
'LibraryName': 'lib-00014.mgf',
'LibraryQualityString': 'Gold',
'Library_Class': '1',
'MQScore': '0.704152',
'MZErrorPPM': '405416',
'MassDiff': '111.896',
'Organism': 'GNPS-NIH-SMALLMOLECULEPHARMACOLOGICALLYACTIVE',
'PI': 'Dorrestein',
'Precursor_MZ': '276.003',
'Pubmed_ID': 'N/A',
'RT_Query': '795.979',
'SharedPeaks': '7',
'Smiles': 'NC(=N)NCc1cccc(I)c1.OS(=O)(=O)O',
'SpecCharge': '1',
'SpecMZ': '164.107',
'SpectrumFile': 'spectra/specs_ms.pklbin',
'SpectrumID': 'CCMSLIB00000086167',
'TIC_Query': '986.997',
'UpdateWorkflowName': 'UPDATE-SINGLE-ANNOTATED-GOLD',
'tags': ' ',
'png_url': 'https://metabolomics-usi.gnps2.org/png/?usi1=mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00000086167',
'json_url': 'https://metabolomics-usi.gnps2.org/json/?usi1=mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00000086167',
'svg_url': 'https://metabolomics-usi.gnps2.org/svg/?usi1=mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00000086167',
'spectrum_url': 'https://metabolomics-usi.gnps2.org/spectrum/?usi1=mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00000086167'}
Source code in src/nplinker/metabolomics/gnps/gnps_annotation_loader.py
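The shape of the result above, a dictionary keyed by the `#Scan#` value with one row per annotation, can be reproduced with the standard `csv` module. This is a sketch only: `load_annotations` is a hypothetical helper, and the sample uses just three of the columns shown in the example.

```python
import csv
import io


def load_annotations(handle):
    """Read a tab-separated annotation table and key each row by its '#Scan#' value."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["#Scan#"]: row for row in reader}


# A reduced, in-memory sample with values taken from the example above.
sample = (
    "#Scan#\tCompound_Name\tAdduct\n"
    "100\tMLS002153841-01!Iobenguane sulfate\tM+H\n"
)
annotations = load_annotations(io.StringIO(sample))
print(annotations["100"]["Compound_Name"])
```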
GNPSFileMappingLoader
Bases: FileMappingLoaderBase
Class to load file mappings from GNPS output file.
Concept
File mappings refer to the mapping from a spectrum id to the files in which that spectrum occurs.
The file mappings file is from the GNPS output archive, as described below for each GNPS workflow type:
- METABOLOMICS-SNETS
  - clusterinfosummarygroup_attributes_withIDs_withcomponentID/*.tsv
- METABOLOMICS-SNETS-V2
  - clusterinfosummarygroup_attributes_withIDs_withcomponentID/*.clustersummary
- FEATURE-BASED-MOLECULAR-NETWORKING
  - quantification_table/*.csv
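The mapping from spectrum id to files, and its inverse from file name to spectrum ids, can be sketched with plain dictionaries. `reverse_mappings` is a hypothetical helper and the second file name is made up; parsing of the GNPS file itself is not shown.

```python
from collections import defaultdict


def reverse_mappings(mappings):
    """Invert {spectrum_id: [file, ...]} into {file: {spectrum_id, ...}}."""
    reversed_mappings = defaultdict(set)
    for spectrum_id, files in mappings.items():
        for file_name in files:
            reversed_mappings[file_name].add(spectrum_id)
    return dict(reversed_mappings)


# Made-up mappings: spectrum "3" occurs in two files.
mappings = {
    "1": ["26c.mzXML"],
    "3": ["26c.mzXML", "another_sample.mzXML"],
    "7": ["26c.mzXML"],
}
reversed_mappings = reverse_mappings(mappings)
print(sorted(reversed_mappings["26c.mzXML"]))
```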
Parameters:
Raises:
- ValueError – If the file is not valid.
Examples:
>>> loader = GNPSFileMappingLoader("gnps_file_mappings.tsv")
>>> print(loader.mappings["1"])
['26c.mzXML']
>>> print(loader.mapping_reversed["26c.mzXML"])
{'1', '3', '7', ...}
Source code in src/nplinker/metabolomics/gnps/gnps_file_mapping_loader.py
mappings
property
gnps_format_from_archive
gnps_format_from_archive(
zip_file: str | PathLike,
) -> GNPSFormat
Detect GNPS format from GNPS zip archive.
The detection is based on the filename of the zip file and the names of the files contained in the zip file.
Parameters:
Returns:
- GNPSFormat – The format identified in the GNPS zip file.
Examples:
>>> gnps_format_from_archive("ProteoSAFe-METABOLOMICS-SNETS-c22f44b1-download_clustered_spectra.zip")
<GNPSFormat.SNETS: 'METABOLOMICS-SNETS'>
>>> gnps_format_from_archive("ProteoSAFe-METABOLOMICS-SNETS-V2-189e8bf1-download_clustered_spectra.zip")
<GNPSFormat.SNETSV2: 'METABOLOMICS-SNETS-V2'>
>>> gnps_format_from_archive("ProteoSAFe-FEATURE-BASED-MOLECULAR-NETWORKING-672d0a53-download_cytoscape_data.zip")
<GNPSFormat.FBMN: 'FEATURE-BASED-MOLECULAR-NETWORKING'>
Source code in src/nplinker/metabolomics/gnps/gnps_format.py
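The filename-based part of the detection can be sketched as a substring check. This is an illustration, not the function's actual logic: `workflow_from_filename` is a hypothetical helper, it returns plain workflow names instead of GNPSFormat members, and the check against the files contained in the archive is left out.

```python
from pathlib import Path

# Check the longer name first: "METABOLOMICS-SNETS" is a prefix of
# "METABOLOMICS-SNETS-V2", so the order matters.
WORKFLOW_NAMES = (
    "METABOLOMICS-SNETS-V2",
    "FEATURE-BASED-MOLECULAR-NETWORKING",
    "METABOLOMICS-SNETS",
)


def workflow_from_filename(zip_file):
    """Return the workflow name found in the archive's file name, if any."""
    name = Path(zip_file).name
    for workflow in WORKFLOW_NAMES:
        if workflow in name:
            return workflow
    return "Unknown-GNPS-Workflow"


detected = workflow_from_filename(
    "ProteoSAFe-METABOLOMICS-SNETS-V2-189e8bf1-download_clustered_spectra.zip"
)
print(detected)
```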
gnps_format_from_file_mapping
gnps_format_from_file_mapping(
file: str | PathLike,
) -> GNPSFormat
Detect GNPS format from the given file mapping file.
The GNPS file mapping file is located in different folders depending on the GNPS workflow. Here are the locations in corresponding GNPS zip archives:
- METABOLOMICS-SNETS workflow: the .tsv file in the folder clusterinfosummarygroup_attributes_withIDs_withcomponentID
- METABOLOMICS-SNETS-V2 workflow: the .clustersummary file (tsv) in the folder clusterinfosummarygroup_attributes_withIDs_withcomponentID
- FEATURE-BASED-MOLECULAR-NETWORKING workflow: the .csv file in the folder quantification_table
Parameters:
Returns:
- GNPSFormat – GNPS format identified in the file.
Source code in src/nplinker/metabolomics/gnps/gnps_format.py
gnps_format_from_task_id
gnps_format_from_task_id(task_id: str) -> GNPSFormat
Detect GNPS format for the given task id.
Parameters:
- task_id (str) – GNPS task id.
Returns:
- GNPSFormat – The format identified in the GNPS task.
Examples:
>>> gnps_format_from_task_id("c22f44b14a3d450eb836d607cb9521bb")
<GNPSFormat.SNETS: 'METABOLOMICS-SNETS'>
>>> gnps_format_from_task_id("189e8bf16af145758b0a900f1c44ff4a")
<GNPSFormat.SNETSV2: 'METABOLOMICS-SNETS-V2'>
>>> gnps_format_from_task_id("92036537c21b44c29e509291e53f6382")
<GNPSFormat.FBMN: 'FEATURE-BASED-MOLECULAR-NETWORKING'>
>>> gnps_format_from_task_id("0ad6535e34d449788f297e712f43068a")
<GNPSFormat.Unknown: 'Unknown-GNPS-Workflow'>