RIAssigner.data

Submodules

Classes

Data

Base class for data managers.

PandasData

Class to handle data from filetypes which can be imported into a pandas dataframe.

MatchMSData

Class to handle data from filetypes which can be imported using ‘matchms’.

SimpleData

Class to handle data from numpy arrays

ValidateSimpleData

Class to handle data from numpy arrays

Package Contents

class RIAssigner.data.Data(filename: str, filetype: str, rt_unit: str)[source]

Bases: abc.ABC

Base class for data managers.

RetentionTimeType
RetentionIndexType
CommentFieldType
URegistry
_keys_conversions
_rt_possible_keys
_ri_possible_keys
static is_valid(value: RetentionTimeType | RetentionIndexType) bool[source]

Determine whether a retention time or retention index value is valid.

Parameters:

value (RetentionTimeType | RetentionIndexType) – Value to check for validity.

Returns:

State of validity (True/False).

Return type:

bool

static can_be_float(rt: pint.Quantity | float | int) bool[source]

Determine whether a value can be converted to a float.

This function checks if the provided input is an instance of either Quantity, float, or int.

Parameters:

rt (Union[Quantity, float, int]) – Value to check for float conversion.

Returns:

True if the input is an instance of Quantity, float, or int, False otherwise.

Return type:

bool
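
The type check described above can be sketched as follows. This is a minimal illustration, not the library's source: the `Quantity` class here is a hypothetical stand-in for `pint.Quantity`, used only so the snippet runs without extra dependencies.

```python
class Quantity:
    """Hypothetical stand-in for pint.Quantity (assumption for this sketch)."""

    def __init__(self, magnitude, units):
        self.magnitude = magnitude
        self.units = units


def can_be_float(rt):
    """Return True if rt is an instance of Quantity, float, or int."""
    return isinstance(rt, (Quantity, float, int))


print(can_be_float(1.5))                   # a float passes the check
print(can_be_float(Quantity(2.0, "min")))  # a Quantity passes the check
print(can_be_float("1.5"))                 # a string does not, even if numeric
```

With a plain `isinstance` check like this one, `bool` values also pass, because `bool` is a subclass of `int` in Python.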

classmethod add_possible_rt_keys(keys: List[str]) None[source]

A method that adds new identifiers to get retention time information.

Parameters:

keys (List[str]) – A list of new identifiers (keys) to be added to the _rt_possible_keys.

Returns:

None

classmethod add_possible_ri_keys(keys: List[str]) None[source]

A method that adds new identifiers to get retention index information.

Parameters:

keys (List[str]) – A list of new identifiers (keys) to be added to the _ri_possible_keys.

Returns:

None

classmethod get_possible_rt_keys() List[str][source]

A method that returns the possible keys to get retention times.

Returns:

A list of possible keys to get retention times.

Return type:

List[str]

classmethod get_possible_ri_keys() List[str][source]

A method that returns the possible keys to get retention indices.

Returns:

A list of possible keys to get retention indices.

Return type:

List[str]
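
The four key-management classmethods share one pattern: a class-level list that is extended and read through classmethods. A minimal sketch of that pattern, assuming a hypothetical `KeyRegistry` class and made-up initial keys (neither is taken from RIAssigner):

```python
from typing import List


class KeyRegistry:
    # Class-level list shared by all instances; the initial keys are assumptions.
    _rt_possible_keys: List[str] = ["rt", "retention_time"]

    @classmethod
    def add_possible_rt_keys(cls, keys: List[str]) -> None:
        """Append new identifiers to the shared list of retention time keys."""
        cls._rt_possible_keys.extend(keys)

    @classmethod
    def get_possible_rt_keys(cls) -> List[str]:
        """Return a copy so callers cannot mutate the shared list by accident."""
        return list(cls._rt_possible_keys)


KeyRegistry.add_possible_rt_keys(["RT (min)"])
print(KeyRegistry.get_possible_rt_keys())
```

Returning a copy from the getter is a design choice of this sketch; whether the library returns the list itself or a copy is not stated in this reference.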

_filename
_filetype
_rt_unit
_unit
abstractmethod write(filename: str) None[source]

Store current content to disk.

Parameters:

filename (str) – Path to output filename.

property filename: str

Getter for filename property.

Returns:

Filename of originally loaded data.

Return type:

str

property retention_times: Iterable[RetentionTimeType]
Abstractmethod:

Getter for retention_times property.

Returns:

RT values contained in data.

Return type:

Iterable[RetentionTimeType]

property retention_indices: Iterable[RetentionIndexType]
Abstractmethod:

Getter for retention_indices property.

Returns:

RI values stored in data.

Return type:

Iterable[RetentionIndexType]

has_retention_indices() bool[source]

Check if all retention indices in the spectra exist.

This method iterates over the retention indices in the spectra. If it encounters a value that is None, it immediately returns False. If it iterates over all retention indices without finding a None value, it returns True.

Returns:

True if all retention indices exist, False otherwise.

Return type:

bool

has_retention_times() bool[source]

Check if all retention times in the spectra exist.

This method iterates over the retention times in the spectra. If it encounters a value that is None, it immediately returns False. If it iterates over all retention times without finding a None value, it returns True.

Returns:

True if all retention times exist, False otherwise.

Return type:

bool
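
The early-exit logic described for has_retention_indices() and has_retention_times() can be sketched with a standalone helper. The name `all_present` is hypothetical and only illustrates the described behaviour:

```python
from typing import Iterable, Optional


def all_present(values: Iterable[Optional[float]]) -> bool:
    """Return False at the first None, True if no None is encountered."""
    for value in values:
        if value is None:
            return False
    return True


print(all_present([120.5, 240.0, 360.2]))
print(all_present([120.5, None, 360.2]))
```

Under this logic an empty sequence yields True, because the loop finishes without encountering a None value.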

property comment: Iterable[CommentFieldType]
Abstractmethod:

Getter for comment property.

Returns:

Comment field values stored in data.

Return type:

Iterable[CommentFieldType]

init_ri_from_comment(ri_source: str) None[source]

Extract RI from comment field.

Extracts the RI from the comment field of the data file. The RI is expected to be in the format ‘ri_source=RI_value’. The function extracts the RI value and sets it on the retention_indices property.

Parameters:

ri_source (str) – String that is expected to be in the comment field before the RI value.
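
Parsing the ‘ri_source=RI_value’ format could look roughly like the sketch below. The helper name `extract_ri`, the regular expression, and the example keys are assumptions made for illustration; the library's actual parsing rules (whitespace handling, number format) are not documented here.

```python
import re
from typing import Optional


def extract_ri(comment: Optional[str], ri_source: str) -> Optional[float]:
    """Return the number following 'ri_source=' in comment, or None if absent."""
    if comment is None:
        return None
    # The lookbehind keeps a key such as 'StdNP' from matching inside 'SemiStdNP'.
    pattern = r"(?<!\w)" + re.escape(ri_source) + r"=(\d+(?:\.\d+)?)"
    match = re.search(pattern, comment)
    return float(match.group(1)) if match else None


comment = "SemiStdNP=1234 StdNP=1250"  # made-up comment string
print(extract_ri(comment, "SemiStdNP"))
print(extract_ri(comment, "StdNP"))
```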

class RIAssigner.data.PandasData(filename: str, filetype: str, rt_unit: str)[source]

Bases: RIAssigner.data.Data.Data

Class to handle data from filetypes which can be imported into a pandas dataframe.

_carbon_number_column_names
_rt_key = 'rt'
_read()[source]

Load content from file into PandasData object.

_read_into_dataframe() None[source]

Read the data from file into dataframe.

write(filename: str) None[source]

Write data on disk. Supports ‘csv’, ‘tsv’, ‘tabular’ and ‘parquet’ formats.

_init_carbon_number_index() None[source]

Find key of carbon number column and store it.

_init_rt_column_info() None[source]

Find key of retention time column and store it.

_init_ri_column_info() None[source]

Initialize retention index column name and set its position next to the retention time column.

_init_ri_indices() None[source]

Initialize retention indices to 100 times the carbon numbers, or to None if carbon numbers are not present.

_sort_by_rt() None[source]

Sort peaks by their retention times.

_replace_nans_with_0s() None[source]

Replace NaN values (including blank strings and invalid values) with 0s.
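
As a dependency-free illustration of this kind of cleanup, the sketch below maps NaN, blank strings, and values that cannot be parsed as numbers to 0. It works on a plain list rather than a dataframe, and the helper name `replace_invalid_with_zero` is hypothetical; the library's own implementation is not shown in this reference.

```python
import math
from typing import Any, Iterable, List


def replace_invalid_with_zero(values: Iterable[Any]) -> List[float]:
    """Convert each value to float, using 0.0 for NaN, blanks, and unparsable input."""
    cleaned = []
    for value in values:
        try:
            number = float(value)
        except (TypeError, ValueError):
            # Blank strings, None, and non-numeric text end up here.
            number = 0.0
        cleaned.append(0.0 if math.isnan(number) else number)
    return cleaned


print(replace_invalid_with_zero([1.5, "", "abc", float("nan"), None, "2"]))
```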

__eq__(o: object) bool[source]

Comparison operator ==.

Parameters:

o (object) – Object to compare with.

Returns:

State of equality.

Return type:

bool

property retention_times: Iterable[RIAssigner.data.Data.Data.RetentionTimeType]

Get retention times in seconds.
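
Returning values in seconds implies a conversion from the rt_unit given at construction. A minimal sketch of such a conversion with a fixed lookup table follows; the unit names and the helper `to_seconds` are assumptions, and the library itself may handle units differently (the URegistry attribute suggests a unit registry is involved).

```python
from typing import Iterable, List, Optional

# Assumed unit names and their length in seconds (illustrative only).
_SECONDS_PER_UNIT = {"seconds": 1.0, "min": 60.0}


def to_seconds(values: Iterable[Optional[float]], rt_unit: str) -> List[Optional[float]]:
    """Scale each retention time to seconds, leaving missing values as None."""
    factor = _SECONDS_PER_UNIT[rt_unit]
    return [None if value is None else value * factor for value in values]


print(to_seconds([1.0, 2.5, None], "min"))
```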

property retention_indices: Iterable[RIAssigner.data.Data.Data.RetentionIndexType]

Get retention indices from data or computed from carbon numbers.

_ri_from_carbon_numbers() Iterable[int][source]

Returns the RI of compound based on carbon number.
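
For n-alkane references the retention index is by convention 100 times the carbon number, which is the relationship this method relies on. A minimal sketch on a plain list (the function name mirrors the method but is a standalone illustration, not the library code):

```python
from typing import Iterable, List


def ri_from_carbon_numbers(carbon_numbers: Iterable[int]) -> List[int]:
    """Map each carbon number to its retention index (carbon number * 100)."""
    return [int(carbon_number) * 100 for carbon_number in carbon_numbers]


# Decane (C10), undecane (C11), and dodecane (C12)
print(ri_from_carbon_numbers([10, 11, 12]))  # [1000, 1100, 1200]
```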

property comment: Iterable[RIAssigner.data.Data.Data.CommentFieldType]

Get comments.

Returns:

Comments.

Return type:

Iterable[Data.CommentFieldType]

class RIAssigner.data.MatchMSData(filename: str, filetype: str, rt_unit: str)[source]

Bases: RIAssigner.data.Data.Data

Class to handle data from filetypes which can be imported using ‘matchms’.

_read()[source]

Load data into object and initialize properties.

write(filename: str) None[source]

Write data back to the spectra file.

Parameters:

filename (str) – Path to filename under which to store the data.

_write_RIs_to_spectra() None[source]

Write the RI values stored in the object to the spectra metadata.

_read_retention_times() None[source]

Read retention times from spectrum metadata.

_read_retention_indices() None[source]

Read retention indices from spectrum metadata.

_sort_spectra_by_rt() None[source]

Sort objects (peaks) in spectra list by their retention times.

__eq__(o: object) bool[source]

Comparison operator ==.

Parameters:

o (object) – Object to compare with.

Returns:

State of equality.

Return type:

bool

property retention_times: Iterable[RIAssigner.data.Data.Data.RetentionTimeType]

Get retention times in seconds.

property retention_indices: Iterable[RIAssigner.data.Data.Data.RetentionIndexType]

Get retention indices.

property comment: Iterable[RIAssigner.data.Data.Data.CommentFieldType]

Get comments.

property spectra_metadata: Tuple[numpy.array, List[str]]

class RIAssigner.data.SimpleData(retention_times: Iterable[float], rt_unit: str, retention_indices: Iterable[float] = None)[source]

Bases: RIAssigner.data.Data.Data

Class to handle data from numpy arrays

_read(retention_times, retention_indices)[source]
abstractmethod write()[source]

Store current content to disk.

Parameters:

filename (str) – Path to output filename.

property retention_indices: Iterable[RIAssigner.data.Data.Data.RetentionIndexType]

Getter for retention_indices property.

Returns:

RI values stored in data.

Return type:

Iterable[RetentionIndexType]

property retention_times: Iterable[RIAssigner.data.Data.Data.RetentionTimeType]

Getter for retention_times property.

Returns:

RT values contained in data.

Return type:

Iterable[RetentionTimeType]

property comment: Iterable[str | None]

Getter for comment property.

Returns:

Comment field values stored in data.

Return type:

Iterable[CommentFieldType]

class RIAssigner.data.ValidateSimpleData(retention_times: Iterable[float], rt_unit: str, retention_indices: Iterable[float] = None)[source]

Bases: RIAssigner.data.SimpleData.SimpleData

Class to handle data from numpy arrays

_validate_input(retention_times, retention_indices)[source]