AntiSMASH
nplinker.genomics.antismash
AntismashBGCLoader
Bases: BGCLoaderBase
Data loader for AntiSMASH BGC genbank (.gbk) files.
Parameters:
- data_dir (str | PathLike) – Path to the AntiSMASH directory that contains a collection of AntiSMASH outputs.
Notes
The input data_dir must follow the structure defined in the Working Directory Structure for AntiSMASH data.
Source code in src/nplinker/genomics/antismash/antismash_loader.py
get_bgc_genome_mapping
Get the mapping from BGC to genome.
Info
The directory name of the gbk files is treated as genome id.
Returns:
- dict[str, str] – The key is the BGC name (gbk file name) and the value is the genome id (the directory name of the gbk file).
Source code in src/nplinker/genomics/antismash/antismash_loader.py
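The mapping rule described above (the file name gives the BGC name, the parent directory name gives the genome id) can be sketched with the standard library alone. This is a minimal illustration, not the NPLinker implementation: the helper name `map_bgc_to_genome` is hypothetical, and dropping the `.gbk` suffix from the BGC name is an assumption.

```python
import tempfile
from pathlib import Path


def map_bgc_to_genome(data_dir):
    """Map each BGC name (gbk file stem) to the name of its parent directory."""
    mapping = {}
    for gbk in sorted(Path(data_dir).rglob("*.gbk")):
        # Assumption: the BGC name is the file name without the ".gbk" suffix.
        mapping[gbk.stem] = gbk.parent.name
    return mapping


# Build a tiny throwaway layout: <data_dir>/<genome_id>/<bgc_name>.gbk
with tempfile.TemporaryDirectory() as tmp:
    genome_dir = Path(tmp) / "GCF_000016425.1"
    genome_dir.mkdir()
    (genome_dir / "NC_009380.1.region001.gbk").touch()
    example_mapping = map_bgc_to_genome(tmp)

print(example_mapping)
```

Because the genome id comes from the directory name alone, renaming a genome directory changes the genome id of every BGC inside it.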
get_files
Get BGC gbk files.
Returns:
GenomeStatus
GenomeStatus(
original_id: str,
resolved_refseq_id: str = "",
resolve_attempted: bool = False,
bgc_path: str = "",
)
Class to represent the status of a single genome.
The status of genomes is tracked in the file GENOME_STATUS_FILENAME.
Parameters:
- original_id (str) – The original ID of the genome.
- resolved_refseq_id (str, default: '') – The resolved RefSeq ID of the genome. Defaults to "".
- resolve_attempted (bool, default: False) – A flag indicating whether an attempt to resolve the RefSeq ID has been made. Defaults to False.
- bgc_path (str, default: '') – The path to the downloaded BGC file for the genome. Defaults to "".
Source code in src/nplinker/genomics/antismash/podp_antismash_downloader.py
resolved_refseq_id
instance-attribute
resolved_refseq_id = (
""
if resolved_refseq_id == "None"
else resolved_refseq_id
)
read_json
staticmethod
Get a dict of GenomeStatus objects by loading given genome status file.
Note that an empty dict is returned if the given file doesn't exist.
Parameters:
Returns:
- dict[str, 'GenomeStatus'] – Dict keys are the original genome IDs and values are GenomeStatus objects. An empty dict is returned if the given file doesn't exist.
Source code in src/nplinker/genomics/antismash/podp_antismash_downloader.py
to_json
staticmethod
to_json(
genome_status_dict: Mapping[str, "GenomeStatus"],
file: str | PathLike | None = None,
) -> str | None
Convert the genome status dictionary to a JSON string.
If a file path is provided, the JSON string is written to the file. If the file already exists, it is overwritten.
Parameters:
- genome_status_dict (Mapping[str, 'GenomeStatus']) – A dictionary of genome status objects. The keys are the original genome IDs and the values are GenomeStatus objects.
- file (str | PathLike | None, default: None) – The path to the output JSON file. If None, the JSON string is returned but not written to a file.
Returns:
- str | None – The JSON string if file is None, otherwise None.
Source code in src/nplinker/genomics/antismash/podp_antismash_downloader.py
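The two behaviours documented for GenomeStatus (the string "None" is stored as "" for resolved_refseq_id, and to_json either returns a string or writes a file and returns None) can be sketched as follows. This is a stand-in under stated assumptions, not the NPLinker code: `StatusSketch` and `statuses_to_json` are hypothetical names, and the flat list-of-records JSON layout is an assumption, since the real file layout is not described here.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class StatusSketch:
    """Stand-in with the same four fields as GenomeStatus (illustrative only)."""

    original_id: str
    resolved_refseq_id: str = ""
    resolve_attempted: bool = False
    bgc_path: str = ""

    def __post_init__(self):
        # Mirrors the documented rule: the string "None" is stored as "".
        if self.resolved_refseq_id == "None":
            self.resolved_refseq_id = ""


def statuses_to_json(status_dict, file=None):
    """Return a JSON string, or write it to `file` and return None."""
    # Assumption: a flat list of records; the real file layout may differ.
    text = json.dumps([asdict(status) for status in status_dict.values()])
    if file is None:
        return text
    Path(file).write_text(text)  # overwrites an existing file
    return None


statuses = {"genome1": StatusSketch("genome1", resolved_refseq_id="None")}
as_text = statuses_to_json(statuses)

with tempfile.TemporaryDirectory() as tmp:
    out_file = Path(tmp) / "status.json"
    written = statuses_to_json(statuses, out_file)
    round_trip = json.loads(out_file.read_text())
```

Returning None when a file is given makes the two modes easy to tell apart at the call site: a caller either keeps the string or relies on the file, never both.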
download_and_extract_antismash_data
download_and_extract_antismash_data(
antismash_id: str,
download_root: str | PathLike,
extract_root: str | PathLike,
) -> None
Download and extract antiSMASH BGC archive for a specified genome.
The BGC archive is downloaded from the antiSMASH database (https://antismash-db.secondarymetabolites.org/). antiSMASH uses the RefSeq assembly id of a genome as the id of the archive.
Parameters:
- antismash_id (str) – The id used to download the BGC archive from the antiSMASH database. If the id is versioned (e.g., "GCF_004339725.1"), be sure to specify the version as well.
- download_root (str | PathLike) – Path to the directory to place the downloaded archive in.
- extract_root (str | PathLike) – Path to the directory data files will be extracted to. Note that an antismash directory will be created in the specified extract_root if it doesn't exist. The files will be extracted to the <extract_root>/antismash/<antismash_id> directory.
Raises:
- ValueError – if the <extract_root>/antismash/<refseq_assembly_id> directory is not empty.
Examples:
Source code in src/nplinker/genomics/antismash/antismash_downloader.py
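The directory layout and the ValueError condition described for download_and_extract_antismash_data can be illustrated without any network access. The sketch below only reproduces the path handling; `prepare_extract_dir` is a hypothetical helper, and where exactly the real function performs this check is an assumption.

```python
import tempfile
from pathlib import Path


def prepare_extract_dir(extract_root, antismash_id):
    """Return <extract_root>/antismash/<antismash_id>, creating it if needed.

    Raises ValueError if the target directory already contains files.
    """
    target = Path(extract_root) / "antismash" / antismash_id
    target.mkdir(parents=True, exist_ok=True)
    if any(target.iterdir()):
        raise ValueError(f"Extract directory {target} is not empty")
    return target


with tempfile.TemporaryDirectory() as tmp:
    first = prepare_extract_dir(tmp, "GCF_004339725.1")
    relative = first.relative_to(tmp).as_posix()
    (first / "region001.gbk").touch()  # simulate previously extracted content
    try:
        prepare_extract_dir(tmp, "GCF_004339725.1")
        second_call_failed = False
    except ValueError:
        second_call_failed = True
```

Refusing to extract into a non-empty directory avoids silently mixing files from two different downloads of the same genome.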
parse_bgc_genbank
Parse a single BGC gbk file to BGC object.
Parameters:
Returns:
- BGC – BGC object
Examples:
>>> bgc = parse_bgc_genbank(
...     "/data/antismash/GCF_000016425.1/NC_009380.1.region001.gbk")
Source code in src/nplinker/genomics/antismash/antismash_loader.py
get_best_available_genome_id
Get the best available ID from genome_id_data dict.
Parameters:
- genome_id_data (Mapping[str, str]) – Dictionary containing information for each genome record present.
Returns:
- str | None – ID for the genome, if present, otherwise None.
Source code in src/nplinker/genomics/antismash/podp_antismash_downloader.py
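One way to read "best available ID" is as a first-match lookup over a fixed preference order. The sketch below shows that pattern only. The helper name `pick_best_id`, the key names, and the RefSeq-then-GenBank-then-JGI order are all assumptions made for illustration; this page does not state them.

```python
def pick_best_id(genome_id_data):
    """Return the first non-empty ID following an assumed preference order."""
    # Assumption: these key names and this order are illustrative only.
    for key in ("RefSeq_accession", "GenBank_accession", "JGI_Genome_ID"):
        value = genome_id_data.get(key, "")
        if value:
            return value
    # No usable ID was found in the record.
    return None


with_refseq = pick_best_id(
    {"RefSeq_accession": "GCF_004339725.1", "GenBank_accession": "GCA_004339725.1"}
)
without_refseq = pick_best_id({"GenBank_accession": "GCA_004339725.1"})
nothing = pick_best_id({"genome_type": "genome"})
```

Returning None rather than raising lets a caller record that resolution was attempted and move on to the next genome.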
podp_download_and_extract_antismash_data
podp_download_and_extract_antismash_data(
genome_records: Sequence[
Mapping[str, Mapping[str, str]]
],
project_download_root: str | PathLike,
project_extract_root: str | PathLike,
)
Download and extract antiSMASH BGC archive for the given genome records.
Parameters:
- genome_records (Sequence[Mapping[str, Mapping[str, str]]]) – List of dicts representing genome records. The dict of each genome record contains a key of genome ID with a value of another dict containing information about genome type, label and accession ids (RefSeq, GenBank, and/or JGI).
- project_download_root (str | PathLike) – Path to the directory to place the downloaded archive in.
- project_extract_root (str | PathLike) – Path to the directory the downloaded archive will be extracted to. Note that an antismash directory will be created in the specified extract_root if it doesn't exist. The files will be extracted to the <extract_root>/antismash/<antismash_id> directory.
Warns:
- UserWarning – when no antiSMASH data is found for some genomes.
Source code in src/nplinker/genomics/antismash/podp_antismash_downloader.py