
Data Models




LinkGraph

Class to represent the links between objects in NPLinker.

This class wraps the networkx.Graph class to provide a more user-friendly interface for working with the links.

The links between objects are stored as edges in a graph, while the objects themselves are stored as nodes.

The scoring data for each link (or link data) is stored as the key/value attributes of the edge.
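The storage model described above can be sketched without networkx. The following is a minimal, hypothetical stand-in (the `TinyLinkGraph` class and the string node names are illustrative assumptions, not part of NPLinker) that uses a dict of dicts the way an undirected graph would: objects are keys, links are nested entries, and the scoring data is the attribute dictionary attached to each link.

```python
from collections import defaultdict


class TinyLinkGraph:
    """Hypothetical stand-in for an undirected graph with edge attributes."""

    def __init__(self):
        # node -> {neighbor -> {scoring method name -> score data}}
        self._adj = defaultdict(dict)

    def add_link(self, u, v, **data):
        # Both directions share one attribute dict, as in an undirected edge.
        attrs = self._adj[u].get(v, {})
        attrs.update(data)
        self._adj[u][v] = attrs
        self._adj[v][u] = attrs

    def has_link(self, u, v):
        return v in self._adj.get(u, {})

    def get_link_data(self, u, v):
        # Returns None when there is no link between u and v.
        return self._adj.get(u, {}).get(v)

    def __len__(self):
        # Counts objects (nodes), not links (edges).
        return len(self._adj)


lg = TinyLinkGraph()
lg.add_link("gcf1", "spectrum1", metcalf=1.0)
lg.add_link("gcf1", "spectrum2", metcalf=0.7)
```

Note that the length counts objects rather than links: two links that share `gcf1` give three objects in this sketch.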


Examples:

Create a LinkGraph object:

>>> lg = LinkGraph()

Display the empty LinkGraph object:

>>> lg
|   index |   genomic_object_id |   genomic_object_type |   metabolomic_object_id |   metabolomic_object_type |   metcalf_score |   rosetta_score |

Add a link between a GCF and a Spectrum object:

>>> lg.add_link(gcf, spectrum, metcalf=Score("metcalf", 1.0, {"cutoff": 0.5}))

Display all links in LinkGraph object:

>>> lg
|   index |   genomic_object_id |   genomic_object_type |   metabolomic_object_id |   metabolomic_object_type |   metcalf_score |   rosetta_score |
|       1 |                   1 |                   GCF |                       1 |                  Spectrum |            1.00 |                 |

Get all links for a given object:

>>> lg[gcf]
{spectrum: {"metcalf": Score("metcalf", 1.0, {"cutoff": 0.5})}}

Get all links in the LinkGraph:

>>> lg.links
[(gcf, spectrum, {"metcalf": Score("metcalf", 1.0, {"cutoff": 0.5})})]

Check if there is a link between two objects:

>>> lg.has_link(gcf, spectrum)
True

Get the link data between two objects:

>>> lg.get_link_data(gcf, spectrum)
{"metcalf": Score("metcalf", 1.0, {"cutoff": 0.5})}

Filter the links for gcf1 and gcf2:

>>> new_lg = lg.filter([gcf1, gcf2])

Filter the links for spectrum1 and spectrum2:

>>> new_lg = lg.filter([spectrum1, spectrum2])

Filter the links between two lists of objects:

>>> new_lg = lg.filter([gcf1, gcf2], [spectrum1, spectrum2])

Export the links to a file:

>>> lg.to_tsv("links.tsv")
Source code in src/nplinker/scoring/
def __init__(self) -> None:
    """Initialize a LinkGraph object.

    Examples:
        Create a LinkGraph object:
        >>> lg = LinkGraph()

        Display the empty LinkGraph object:
        >>> lg
        |   index |   genomic_object_id |   genomic_object_type |   metabolomic_object_id |   metabolomic_object_type |   metcalf_score |   rosetta_score |

        Add a link between a GCF and a Spectrum object:
        >>> lg.add_link(gcf, spectrum, metcalf=Score("metcalf", 1.0, {"cutoff": 0.5}))

        Display all links in LinkGraph object:
        >>> lg
        |   index |   genomic_object_id |   genomic_object_type |   metabolomic_object_id |   metabolomic_object_type |   metcalf_score |   rosetta_score |
        |       1 |                   1 |                   GCF |                       1 |                  Spectrum |            1.00 |                 |

        Get all links for a given object:
        >>> lg[gcf]
        {spectrum: {"metcalf": Score("metcalf", 1.0, {"cutoff": 0.5})}}

        Get all links in the LinkGraph:
        >>> lg.links
        [(gcf, spectrum, {"metcalf": Score("metcalf", 1.0, {"cutoff": 0.5})})]

        Check if there is a link between two objects:
        >>> lg.has_link(gcf, spectrum)
        True

        Get the link data between two objects:
        >>> lg.get_link_data(gcf, spectrum)
        {"metcalf": Score("metcalf", 1.0, {"cutoff": 0.5})}

        Filter the links for `gcf1` and `gcf2`:
        >>> new_lg = lg.filter([gcf1, gcf2])

        Filter the links for `spectrum1` and `spectrum2`:
        >>> new_lg = lg.filter([spectrum1, spectrum2])

        Filter the links between two lists of objects:
        >>> new_lg = lg.filter([gcf1, gcf2], [spectrum1, spectrum2])

        Export the links to a file:
        >>> lg.to_tsv("links.tsv")
    """
    self._g: Graph = Graph()
links: list[LINK]

Get all links.


Returns:

  • list[LINK]

    A list of tuples containing the links between objects.

Examples:

>>> lg.links
[(gcf, spectrum, {"metcalf": Score("metcalf", 1.0, {"cutoff": 0.5})})]


__repr__() -> str

Return a string representation of the LinkGraph.

Source code in src/nplinker/scoring/
def __repr__(self) -> str:
    """Return a string representation of the LinkGraph."""
    return self._get_table_repr()


__len__() -> int

Get the number of objects.

Source code in src/nplinker/scoring/
def __len__(self) -> int:
    """Get the number of objects."""
    return len(self._g)


__getitem__(u: Entity) -> dict[Entity, LINK_DATA]

Get all links for a given object.


Parameters:

  • u (Entity) –

    the given object

Returns:

  • dict[Entity, LINK_DATA]

    A dictionary of links for the given object.

Raises:

  • KeyError

    if the input object is not found in the link graph.

Source code in src/nplinker/scoring/
def __getitem__(self, u: Entity) -> dict[Entity, LINK_DATA]:
    """Get all links for a given object.

    Args:
        u: the given object

    Returns:
        A dictionary of links for the given object.

    Raises:
        KeyError: if the input object is not found in the link graph.
    """
    try:
        links = self._g[u]
    except KeyError:
        raise KeyError(f"{u} not found in the link graph.")

    # return a shallow copy so callers cannot modify the graph's neighbor mapping
    return {**links}  # type: ignore
add_link(u: Entity, v: Entity, **data: Score) -> None

Add a link between two objects.

The objects u and v must be different types, i.e. one must be a GCF and the other must be a Spectrum or MolecularFamily.


Parameters:

  • u (Entity) –

    the first object, either a GCF, Spectrum, or MolecularFamily

  • v (Entity) –

    the second object, either a GCF, Spectrum, or MolecularFamily

  • data (Score, default: {}) –

    keyword arguments. At least one scoring method and its data must be provided. The key must be the name of the scoring method defined in ScoringMethod, and the value is a Score object, e.g. metcalf=Score("metcalf", 1.0, {"cutoff": 0.5}).

Examples:

>>> lg.add_link(gcf, spectrum, metcalf=Score("metcalf", 1.0, {"cutoff": 0.5}))
Source code in src/nplinker/scoring/
def add_link(
    self,
    u: Entity,
    v: Entity,
    **data: Score,
) -> None:
    """Add a link between two objects.

    The objects `u` and `v` must be different types, i.e. one must be a GCF and the other must be
    a Spectrum or MolecularFamily.

    Args:
        u: the first object, either a GCF, Spectrum, or MolecularFamily
        v: the second object, either a GCF, Spectrum, or MolecularFamily
        data: keyword arguments. At least one scoring method and its data must be provided.
            The key must be the name of the scoring method defined in `ScoringMethod`, and the
            value is a `Score` object, e.g. `metcalf=Score("metcalf", 1.0, {"cutoff": 0.5})`.

    Examples:
        >>> lg.add_link(gcf, spectrum, metcalf=Score("metcalf", 1.0, {"cutoff": 0.5}))
    """
    # validate the data
    if not data:
        raise ValueError("At least one scoring method and its data must be provided.")
    for key, value in data.items():
        if not ScoringMethod.has_value(key):
            raise ValueError(
                f"{key} is not a valid name of scoring method. See `ScoringMethod` for valid names."
            )
        if not isinstance(value, Score):
            raise TypeError(f"{value} is not a Score object.")

    self._g.add_edge(u, v, **data)
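The validation step in add_link can be exercised on its own. The sketch below is a hedged illustration, not NPLinker code: `ScoringMethod` and `Score` are simplified stand-ins, the member values "metcalf" and "rosetta" are assumed from the examples on this page, and `validate_link_data` and `check` are hypothetical helper names.

```python
from dataclasses import dataclass
from enum import Enum


class ScoringMethod(Enum):
    """Hypothetical stand-in; member values are assumed from the examples."""

    METCALF = "metcalf"
    ROSETTA = "rosetta"

    @classmethod
    def has_value(cls, value):
        return any(value == member.value for member in cls)


@dataclass
class Score:
    """Simplified stand-in for the Score dataclass (no validation here)."""

    name: str
    value: float
    parameter: dict


def validate_link_data(**data):
    """Apply the three checks from the validation step: non-empty, known key, Score value."""
    if not data:
        raise ValueError("At least one scoring method and its data must be provided.")
    for key, value in data.items():
        if not ScoringMethod.has_value(key):
            raise ValueError(f"{key} is not a valid name of scoring method.")
        if not isinstance(value, Score):
            raise TypeError(f"{value} is not a Score object.")


def check(**data):
    """Return the name of the raised exception, or "ok" if validation passes."""
    try:
        validate_link_data(**data)
    except (ValueError, TypeError) as exc:
        return type(exc).__name__
    return "ok"


results = [
    check(metcalf=Score("metcalf", 1.0, {"cutoff": 0.5})),  # valid
    check(),                                                # no scoring data
    check(unknown=Score("metcalf", 1.0, {})),               # unknown method name
    check(metcalf=1.0),                                     # value is not a Score
]
```

In this sketch the key is checked before the value, so an unknown method name is reported even when the value is also not a Score.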
has_link(u: Entity, v: Entity) -> bool

Check if there is a link between two objects.


Parameters:

  • u (Entity) –

    the first object, either a GCF, Spectrum, or MolecularFamily

  • v (Entity) –

    the second object, either a GCF, Spectrum, or MolecularFamily

Returns:

  • bool

    True if there is a link between the two objects, False otherwise

Examples:

>>> lg.has_link(gcf, spectrum)
Source code in src/nplinker/scoring/
def has_link(self, u: Entity, v: Entity) -> bool:
    """Check if there is a link between two objects.

    Args:
        u: the first object, either a GCF, Spectrum, or MolecularFamily
        v: the second object, either a GCF, Spectrum, or MolecularFamily

    Returns:
        True if there is a link between the two objects, False otherwise

    Examples:
        >>> lg.has_link(gcf, spectrum)
    """
    return self._g.has_edge(u, v)  # type: ignore
get_link_data(u: Entity, v: Entity) -> LINK_DATA | None

Get the data for a link between two objects.


Parameters:

  • u (Entity) –

    the first object, either a GCF, Spectrum, or MolecularFamily

  • v (Entity) –

    the second object, either a GCF, Spectrum, or MolecularFamily

Returns:

  • LINK_DATA | None

    A dictionary of scoring methods and their data for the link between the two objects, or None if there is no link between the two objects.

Examples:

>>> lg.get_link_data(gcf, spectrum)
{"metcalf": Score("metcalf", 1.0, {"cutoff": 0.5})}
Source code in src/nplinker/scoring/
def get_link_data(
    self,
    u: Entity,
    v: Entity,
) -> LINK_DATA | None:
    """Get the data for a link between two objects.

    Args:
        u: the first object, either a GCF, Spectrum, or MolecularFamily
        v: the second object, either a GCF, Spectrum, or MolecularFamily

    Returns:
        A dictionary of scoring methods and their data for the link between the two objects, or
        None if there is no link between the two objects.

    Examples:
        >>> lg.get_link_data(gcf, spectrum)
        {"metcalf": Score("metcalf", 1.0, {"cutoff": 0.5})}
    """
    return self._g.get_edge_data(u, v)  # type: ignore


filter(
    u_nodes: Sequence[Entity],
    v_nodes: Sequence[Entity] = [],
    /,
) -> LinkGraph

Return a new LinkGraph object with the filtered links between the given objects.

The new LinkGraph object will only contain the links between u_nodes and v_nodes.

If u_nodes or v_nodes is empty, the new LinkGraph object will contain the links for the given objects in v_nodes or u_nodes, respectively. If both are empty, return an empty LinkGraph object.

Note that not all objects in u_nodes and v_nodes need to be present in the original LinkGraph.


Parameters:

  • u_nodes (Sequence[Entity]) –

    a sequence of objects used as the first object in the links

  • v_nodes (Sequence[Entity], default: []) –

    a sequence of objects used as the second object in the links

Returns:

  • LinkGraph

    A new LinkGraph object with the filtered links between the given objects.

Examples:

Filter the links for gcf1 and gcf2:

>>> new_lg = lg.filter([gcf1, gcf2])

Filter the links for spectrum1 and spectrum2:

>>> new_lg = lg.filter([spectrum1, spectrum2])

Filter the links between two lists of objects:

>>> new_lg = lg.filter([gcf1, gcf2], [spectrum1, spectrum2])
Source code in src/nplinker/scoring/
def filter(self, u_nodes: Sequence[Entity], v_nodes: Sequence[Entity] = [], /) -> LinkGraph:
    """Return a new LinkGraph object with the filtered links between the given objects.

    The new LinkGraph object will only contain the links between `u_nodes` and `v_nodes`.

    If `u_nodes` or `v_nodes` is empty, the new LinkGraph object will contain the links for
    the given objects in `v_nodes` or `u_nodes`, respectively. If both are empty, return an
    empty LinkGraph object.

    Note that not all objects in `u_nodes` and `v_nodes` need to be present in the original
    LinkGraph.

    Args:
        u_nodes: a sequence of objects used as the first object in the links
        v_nodes: a sequence of objects used as the second object in the links

    Returns:
        A new LinkGraph object with the filtered links between the given objects.

    Examples:
        Filter the links for `gcf1` and `gcf2`:
        >>> new_lg = lg.filter([gcf1, gcf2])
        Filter the links for `spectrum1` and `spectrum2`:
        >>> new_lg = lg.filter([spectrum1, spectrum2])
        Filter the links between two lists of objects:
        >>> new_lg = lg.filter([gcf1, gcf2], [spectrum1, spectrum2])
    """
    lg = LinkGraph()

    # exchange u_nodes and v_nodes if u_nodes is empty but v_nodes not
    if len(u_nodes) == 0 and len(v_nodes) != 0:
        u_nodes = v_nodes
        v_nodes = []

    # one-sided filter: keep all links of each given object
    if len(v_nodes) == 0:
        for u in u_nodes:
            self._filter_one_node(u, lg)

    # two-sided filter: keep only links between u_nodes and v_nodes
    # (this loop does nothing when v_nodes is empty)
    for u in u_nodes:
        for v in v_nodes:
            self._filter_two_nodes(u, v, lg)

    return lg
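The filtering rules can be illustrated on a plain dictionary of links. This is a sketch under stated assumptions, not the LinkGraph implementation: `filter_links` is a hypothetical function, links are stored as a mapping from `frozenset({u, v})` to link data, and objects are represented by strings.

```python
def filter_links(links, u_nodes, v_nodes=()):
    """Return the subset of `links` selected by the rules described above.

    `links` maps frozenset({u, v}) to link data (an assumed, simplified layout).
    """
    # Swap the arguments if only the second sequence is given.
    if not u_nodes and v_nodes:
        u_nodes, v_nodes = v_nodes, ()

    selected = {}
    if not v_nodes:
        # One-sided filter: keep every link that touches an object in u_nodes.
        for pair, data in links.items():
            if any(u in pair for u in u_nodes):
                selected[pair] = data
    else:
        # Two-sided filter: keep only links joining u_nodes to v_nodes.
        for u in u_nodes:
            for v in v_nodes:
                pair = frozenset((u, v))
                if pair in links:
                    selected[pair] = links[pair]
    return selected


links = {
    frozenset(("gcf1", "spectrum1")): {"metcalf": 1.0},
    frozenset(("gcf1", "spectrum2")): {"metcalf": 0.7},
    frozenset(("gcf2", "spectrum2")): {"metcalf": 0.4},
}

one_sided = filter_links(links, ["gcf1"])
two_sided = filter_links(links, ["gcf1", "gcf2"], ["spectrum2"])
swapped = filter_links(links, [], ["spectrum1"])
unknown = filter_links(links, ["gcf3"])  # objects absent from the links are ignored
```

Objects that do not occur in any link simply contribute nothing, which matches the note that not all given objects need to be present in the original LinkGraph.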
link_to_dict(link: LINK) -> dict[str, Any]

Convert a link to a dictionary representation.


Parameters:

  • link (LINK) –

    A tuple containing the link information (u, v, data).

Returns:

  • dict[str, Any]

    A dictionary containing the link information with the following keys:

    • genomic_object_id (str): The ID of the genomic object.
    • genomic_object_type (str): The type of the genomic object.
    • metabolomic_object_id (str): The ID of the metabolomic object.
    • metabolomic_object_type (str): The type of the metabolomic object.
    • metcalf_score (float | str): The Metcalf score, rounded to 2 decimal places.
    • rosetta_score (float | str): The Rosetta score, rounded to 2 decimal places.
Source code in src/nplinker/scoring/
def link_to_dict(link: LINK) -> dict[str, Any]:
    """Convert a link to a dictionary representation.

    Args:
        link: A tuple containing the link information (u, v, data).

    Returns:
        A dictionary containing the link information with the following keys:

            - genomic_object_id (str): The ID of the genomic object.
            - genomic_object_type (str): The type of the genomic object.
            - metabolomic_object_id (str): The ID of the metabolomic object.
            - metabolomic_object_type (str): The type of the metabolomic object.
            - metcalf_score (float | str): The Metcalf score, rounded to 2 decimal places.
            - rosetta_score (float | str): The Rosetta score, rounded to 2 decimal places.
    """
    u, v, data = link
    genomic_types = (GCF,)
    genomic_object = u if isinstance(u, genomic_types) else v
    metabolomic_object = v if isinstance(u, genomic_types) else u
    metcalf_score = data.get("metcalf")
    rosetta_score = data.get("rosetta")
    # the two *_id entries assume each object exposes an `id` attribute
    return {
        "genomic_object_id": genomic_object.id,
        "genomic_object_type": genomic_object.__class__.__name__,
        "metabolomic_object_id": metabolomic_object.id,
        "metabolomic_object_type": metabolomic_object.__class__.__name__,
        "metcalf_score": round(metcalf_score.value, 2) if metcalf_score else "",
        "rosetta_score": round(rosetta_score.value, 2) if rosetta_score else "",
    }
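The conversion from a link tuple to a flat dictionary can be tried with small stand-in classes. In this hedged sketch, `GCF`, `Spectrum`, and `Score` are hypothetical minimal dataclasses with an assumed `id` attribute, and `link_to_row` is an illustrative name. It shows that the genomic object ends up in the genomic columns regardless of its position in the tuple, and that a missing score becomes an empty string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GCF:
    """Hypothetical minimal genomic object."""

    id: str


@dataclass(frozen=True)
class Spectrum:
    """Hypothetical minimal metabolomic object."""

    id: str


@dataclass
class Score:
    """Simplified stand-in for the Score dataclass."""

    name: str
    value: float
    parameter: dict


def link_to_row(link):
    u, v, data = link
    # Normalize the orientation: the GCF is always the genomic object.
    genomic, metabolomic = (u, v) if isinstance(u, GCF) else (v, u)
    metcalf = data.get("metcalf")
    rosetta = data.get("rosetta")
    return {
        "genomic_object_id": genomic.id,
        "genomic_object_type": type(genomic).__name__,
        "metabolomic_object_id": metabolomic.id,
        "metabolomic_object_type": type(metabolomic).__name__,
        # Missing scores are written as empty strings.
        "metcalf_score": round(metcalf.value, 2) if metcalf else "",
        "rosetta_score": round(rosetta.value, 2) if rosetta else "",
    }


# The Spectrum comes first in the tuple, yet it lands in the metabolomic columns.
row = link_to_row((Spectrum("s1"), GCF("g1"), {"metcalf": Score("metcalf", 0.876, {})}))
```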


to_tsv(file: str | PathLike) -> None

Exports the links in the LinkGraph to a TSV file.


Parameters:

  • file (str | PathLike) –

    the path to the output TSV file.

Examples:

>>> lg.to_tsv("links.tsv")
Source code in src/nplinker/scoring/
def to_tsv(self, file: str | PathLike) -> None:
    """Exports the links in the LinkGraph to a TSV file.

    Args:
        file: the path to the output TSV file.

    Examples:
        >>> lg.to_tsv("links.tsv")
    """
    table_data = self._links_to_dicts()
    headers = table_data[0].keys()
    with open(file, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=headers, delimiter="\t")
        # write the header row followed by one row per link
        writer.writeheader()
        writer.writerows(table_data)
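The TSV export pattern, a list of dictionaries written through `csv.DictWriter` with a tab delimiter, can be sketched in isolation. The rows below are made-up illustrative data with a reduced set of columns, and an in-memory buffer stands in for the output file.

```python
import csv
import io

# Illustrative rows; a real export would carry all six documented columns.
rows = [
    {"genomic_object_id": "g1", "metabolomic_object_id": "s1", "metcalf_score": 1.0},
    {"genomic_object_id": "g2", "metabolomic_object_id": "s2", "metcalf_score": ""},
]

# newline="" leaves line endings to the csv module, as recommended for csv output.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), delimiter="\t")
writer.writeheader()    # column names taken from the first row
writer.writerows(rows)  # one line per link
tsv_text = buffer.getvalue()
```

Because the header is taken from the first row, this pattern needs at least one row; an empty list would fail at `rows[0]`.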

Score dataclass

Score(name: str, value: float, parameter: dict)

A data class to represent score data.


  • name (str) –

    the name of the scoring method. See ScoringMethod for valid values.

  • value (float) –

    the score value.

  • parameter (dict) –

    the parameters used for the scoring method.

name instance-attribute

name: str

value instance-attribute

value: float

parameter instance-attribute

parameter: dict


__post_init__() -> None

Check if the value of name is valid.


Raises:

  • ValueError

    if the value of name is not valid.

Source code in src/nplinker/scoring/
def __post_init__(self) -> None:
    """Check if the value of `name` is valid.

    Raises:
        ValueError: if the value of `name` is not valid.
    """
    if ScoringMethod.has_value(self.name) is False:
        raise ValueError(
            f"{self.name} is not a valid value. Valid values are: {[e.value for e in ScoringMethod]}"
        )


__getitem__(key)
Source code in src/nplinker/scoring/
def __getitem__(self, key):
    if key in {field.name for field in fields(self)}:
        return getattr(self, key)
    else:
        raise KeyError(f"{key} not found in {self.__class__.__name__}")


__setitem__(key, value)
Source code in src/nplinker/scoring/
def __setitem__(self, key, value):
    # validate the value of `name`
    if key == "name" and ScoringMethod.has_value(value) is False:
        raise ValueError(
            f"{value} is not a valid value. Valid values are: {[e.value for e in ScoringMethod]}"
        )

    if key in {field.name for field in fields(self)}:
        setattr(self, key, value)
    else:
        raise KeyError(f"{key} not found in {self.__class__.__name__}")
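The combination of a validated dataclass with dictionary-style access can be tried out in a self-contained form. This is a minimal sketch, not the NPLinker class: `VALID_NAMES` is an assumed stand-in for the values of `ScoringMethod`, and the error messages are shortened.

```python
from dataclasses import dataclass, fields

# Assumed stand-in for the values defined in ScoringMethod.
VALID_NAMES = {"metcalf", "rosetta"}


@dataclass
class Score:
    name: str
    value: float
    parameter: dict

    def __post_init__(self):
        # Runs right after the generated __init__; rejects unknown method names.
        if self.name not in VALID_NAMES:
            raise ValueError(f"{self.name} is not a valid value.")

    def __getitem__(self, key):
        # Only declared dataclass fields are readable by key.
        if key in {field.name for field in fields(self)}:
            return getattr(self, key)
        raise KeyError(f"{key} not found in {self.__class__.__name__}")

    def __setitem__(self, key, value):
        # Re-validate `name` so item assignment cannot bypass __post_init__.
        if key == "name" and value not in VALID_NAMES:
            raise ValueError(f"{value} is not a valid value.")
        if key in {field.name for field in fields(self)}:
            setattr(self, key, value)
        else:
            raise KeyError(f"{key} not found in {self.__class__.__name__}")


score = Score("metcalf", 1.0, {"cutoff": 0.5})
score["value"] = 2.5  # same effect as score.value = 2.5
```

Plain attribute assignment such as `score.name = "bogus"` is not intercepted in this sketch; only construction and item assignment are validated.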