NEWS

Version 1.5.2 (2025-12-07)

Uplot layout

Introduced theme_uplots() to unify the layout of plot functions.

Documentation improvements

Removed all examples and replaced them with CRAN-compliant examples (using tempdir() where writing is required).
Added or improved @return documentation for many functions.
Expanded and standardized parameter descriptions for int_col, grp, z_var, palname, and others.
Added full roxygen documentation for all plotting functions and internal helpers.

CRAN compliance: file writing and network access

All functions now avoid writing to the user’s home directory unless explicitly requested.
All examples and tests write only to tempdir().
download_library() was rewritten for full CRAN compliance:
- no automatic downloads at install/load time,
- no downloads in non-interactive sessions (e.g. CRAN checks),
- explicit user confirmation required before downloading libraries,
- SHA256 checksums verified for all downloads,
- safe local caching under ~/.ume/.

CRAN compliance: graphical parameters and working directory

Removed all global changes to par()

Plotting system overhaul

Updated legacy base-graphics plots to ggplot2, or ensured safe usage without altering global par().
Added optional Plotly output to multiple plotting functions.
Standardized UME logo placement; uplot_layout() is now internal (non-exported).
Improved colour handling for:
- uplot_vk()
- uplot_kmd()
- uplot_dbe_vs_c()
- isotope precision plots.
Added optional data-reduction binning to speed up rendering of large scatterplots.

Stability and bug fixes

Rewrote remove_empty_columns() to avoid data.table warnings related to .. scoping.
Updated calc_neutral_mass() error messages to satisfy unit tests; removed unnecessary options() calls.
Improved uplot_ratios():
- corrected intensity-ratio computation,
- fixed conservative and non-conservative unique-MF logic,
- enhanced VK colour mapping and Plotly conversion.
Strengthened robustness of uplot_vk() when z_var is missing or not numeric.
Removed all hidden side effects from exported functions (options, par(), working directory, filesystem writes).

Internal quality improvements

Simplified internal helpers:
- .msg()
- .prepare_peaklist_columns()
- .normalize_column_aliases()
Removed deprecated internal utilities (e.g. wcsv()).
Strengthened validation logic in peaklist and data-cleaning functions.
Harmonized column checking and naming behaviour across the entire package.

Version 1.5.1 (2025-12-01)

Modifications

download_library() now follows CRAN rules for downloading external data.
As calc_norm_int() always recalculates n_occurrence and n_assignments, the argument ms_id is passed to calc_number_assignments().

Bug fixes

calc_norm_int(): Fixed a corrupt message when using verbose = TRUE.

Version 1.5.0 (2025-11-23)

This version is the first submission to CRAN.

Depricated and changed arguments:

filter_mf_data: The select_file_ids argument is deprecated.
msg is depricated and replaced verbose.
add_known_mf now only adds a single column (categories) to mfd
check_peaklist was renamed as as_peaklist and now allows the import of external peaklists from e.g. csv-files.

External Formula Libraries via Zenodo

UME’s large molecular formula libraries (15–125 MB) are now hosted on Zenodo (https://doi.org/10.5281/zenodo.17606457) for open and persistent access.

lib_02.rds: medium-sized balanced library
lib_05.rds: extended high-coverage library (default)

These library objects are now S3 class objects.

New functions

download_library(), allows users to download and load molecular formula libraries from Zenodo:
- Downloads missing libraries automatically
- Verifies file integrity via SHA256 checksums
- Caches libraries in memory to avoid repeated loading
- Avoids repeated downloads unless overwrite = TRUE
- Loads the library directly as a data.table
Lookup function .f_label() looks for pretty labels in the table ume::nice_labels_dt.

Improvements

Added detailed documentation and examples for working with external libraries.
Some functions were declared as internal and not exported.
uplot_cluster() now returns a list object with cluster and mds results and figures.
uplot_pca() now returns a list object that includes a plotly PCA figure.
several documentation elements transferred to main_doc.R
References added to function descriptions.

Version 1.4.2 (2025-11-06)

New features

Added centralized validation for core ume data.table types: peaklist, formula_table, and formula_library to ensure consistent structure, types, and column names.
Introduced schema definitions and generic check_table_schema() helper to ensure consistent column names and types.
Legacy check_*() functions (check_peaklist(), check_mfd(), check_formula_library()) kept as type-specific validators, now routed through a unified system.
Package functions (ume_assign_formulas(), ume_filter_formulas(), etc.) now automatically verify input table structures to prevent runtime errors.
Provides clearer error messages and allows easy extension for new table types in the future.
New internal function .msg() for handling messages (verbose).
ume_vignette.pdf added to repository and can be accessed via gitlab readme.
New helper function classify_files() to automatically group files into categories (e.g., blanks, standards, pools, samples) based on pattern rules. The function is fully flexible: both the search column and the returned ID column can be specified by the user.

Bug fix

calc_norm_int(): Normalization via “sum_rank” fixed so that the sum is always based on the exact number of argument n_rank.

Version 1.4.1 (2025-10-30)

Enhancements

Updates of documentation.
Added a new internal helper normalize_verbose() to standardize message control across functions. Both verbose (preferred) and the legacy msg argument are supported. If both are provided, verbose takes precedence and a warning is issued.

Deprecated

The argument msg is now deprecated in favor of verbose. Existing code using msg will continue to work but may trigger a warning.

Version 1.4.0 (2025-10-20)

This version introduces a new nomenclature. All columns carrying information on isotopes are now named according to the official IUPAC nomenclature to avoid ambiguities.

For example, the column ‘c’ that contains the number of atoms of 12C is now called ‘12C’ (capital “C”!).

This had implications for the entire ume data pipeline. Functions such as check_mfd(), check_formula_library(), and check_peaklist() can now help to enforce the new nomenclature.

Function updates

calc_recalibrate_ms.R() now expects a filename (file). The new argument insufficient_calibrants (valid argument values: “extrapolate”, “remove_spectrum”) handles spectra, in which no calibrant masses were identified. The argument value “extrapolate” takes the median of calibration slope and intercept for all spectra that could be calibrated with at least two masses and uses these values to calibrate the spectra that showed no calibration masses. The argument value “remove_spectrum” deletes all peaks of those spectra for which no calibrant masses were identified.
get_isotope_info() is now a fundamental function that identifies element / isotope information in any table. It returns the original names of the isotope columns of the table and related IUPAC information on the isotope.
identify_isotope_columns() is depricated and merged into get_isotope_info().
add_known_mf() now provides a column that contains all category labels. In future versions separate columns for each category (such as “CRAM”) will be depricated.
assign_formulas(): pl (peaklist) can now also be a numeric mass vector or a single mass. For numeric input, a minimal peaklist is constructed internally. The result data.table is now returned visibly (before: return(invisible(mfd))). The consistency of the numeric peaklist is now checked by the function check- _peaklist().
check_peaklist() now allows manual assignment of the column names containing the mass spectrum filename, file identifier (numeric column), the m/z values, and peak magnitude. The columns will be renamed according to the internal naming of these column in ume.
check_mfd() and check_formula_library() now enforce the new isotope nomenclature.
calc_data_summary(), calc_eval_params(), calc_norm_int(), calc_recalibrate(), convert_molecular_formula_to_data_table(), create_custom_formula_library(), eval_isotopes(), eval_isotopes(), order_columns(), filter functions, and plotting functions were all modified to match the new isotope nomenclature.
calc_data_summary(): hard-coded conversion of i_magnitude column to data type numeric.
calc_dbe() now stops if the valence of an element in the formula is not provide. Function modified to match new isotope nomenclature.
convert_data_table_to_molecular_formulas has a new argument keep_element_sums that provides columns for the count of atoms for each element (sum of isotope counts such as ‘C_tot’). Function modified to match new isotope nomenclature.

Other

Documentations and Vignette updated.
All functions that perform calculations are now summarized in the function family ‘calculations’.
Unit tests added.

Version 1.3.1 (2025-09-03)

Bug fixes

Fixed a bug in assign_formulas() that was introduced in version 1.2.1 (April 7 2025).

Function updates

convert_molecular_formula_to_data_table: nominal mass (nm) is now also returned.
calc_exact_mass: Now returns a single numeric vector. If mfd is a character value, it is interpreted as a molecular formula and evaluated:
- calc_exact_mass("C2H4") returns 28.031300129.
calc_nm: Now always returns a single numeric vector. If mfd is a character value, it is interpreted as a molecular formula and evaluated:
- calc_nm("C2H4") returns 28.
uplot_ms(): the column specified by the argument label is now internally converted by as.factor().

Data update

ume::peaklist_demo: integer column file_id added (to be consistent with changes in version 0.2.4. Column file now contains the names of the MS spectra. Columns m_min, m_max, and m were removed because they can be calculated using calc_neutral_mass() and calc_ma_abs().
Other:
- Vignette updated
- Documentation of package data updated

Version 1.3.0

Function updates

assign_formulas(): The arguments memory_efficient (FALSE / TRUE) and chunk_n (number of peaks in each chunk) allow processing in chunks to be more memory efficient.
remove_blanks(): if a column for retention time is detected or provided (via the ret_time_col argument), blanks will be removed only for a given retention time and not for the entire spectrum. The argument LCMS is deprecated.
main_docu.R updated.

Version 1.2.2

internal table known_mf updated (corrected one false formula from Hertkorn paper)
calc_dbe now also accepts molecular formula strings or character vectors as input

Version 1.2.1

Function updates:

- `calc_recalibrate_ms()`: argument `formula_library` was removed.
  The calibration is now only based on lists of molecular formulas provided 
  either by `calibr_list` or by `custom_calibr_list`.
- `assign_formulas()` is now much faster for small libraries (n<=10 entries), 
  because the peaklist is pre-filtered before matching with the library.
- `identify_isotope_columns()`: column names "sn" and "sc" are explicitly 
  excluded to avoid confusions with element names.

Unit tests added.

Version 1.2

New Functions

identify_isotope_columns(): Apply this function to a data.table to identify columns that have element or isotope information.
convert_data_table_to_molecular_formulas(): Create molecular formula strings for a table that has element or isotope information.
create_ume_formula_library() completely renovated. The function now excepts any element and isotope by providing two molecular formulas for the upper and lower limit of each isotope in the final library.
Functions calc_dbe(), calc_nm(), calc_exact_mass() now consider all element and isotopes and a flexible usage of spelling.
Major update in package documentation: The internal helper function main_docu() documents arguments that occur in many ume functions (@inheritParams main_docu).

Version 1.1.2

Function update

convert_molecular_formula_to_data_table() is now fundamentally faster, recognizes isotopes in a formula (square brackets), and also returns the exact mass of a formula. The function can now be used to build small custom formula libraries. #### Minor changes
assign_formulas() now checks if all required function arguments are available.

Version 1.1.1

Documentation improved.
All internal function moved to R folder and declared as internal.
uplot_cluster() now supports custom column names.

Version 1.1.0

New and updated plot functions

- `uplot_cluster()`: Cluster + NMDS function added: `uplot_cluster()`
- `uplot_pca()`: 
- `uplot_ms()`: Revised and a `data_reduction` argument was added 
   to accelerate plotting. 
- `uplot_ratios()`: For comparing peak intensities of molecular formulas 
  between two spectra.
- `uplot_cvm()`: new                          
- `uplot_freq()`: new
- `uplot_freq_ma()`: new
- `uplot_freq_vs_ppm()`: new
- `uplot_hc_vs_m()`: new
- `uplot_heteroatoms()`: new
- `uplot_isotope_precision()`: update
- `uplot_kmd()`: new
- `uplot_layout()`: new
- `uplot_lcms()`: update
- `uplot_ma_vs_mz()`: new
- `uplot_n_mf_per_sample()`: new
- `uplot_pca()`: new
- `uplot_ratios()`: new
- `uplot_reproducibility()`: new
- `uplot_ri_vs_sample()`: new
- `uplot_vk()`: update
- `uplot_ppm_average()`: new
- `uplot_dbe_vs_o()`: new
- `uplot_dbe_vs_c()`: new
- `uplot_dbe_vs_ppm()`: new
- `uplot_dbe_minus_o()`: new
- `ustats_outlier()`: moved from `stats.R`

Plot functions are now in separate R file in the repository

Version 1.0.6

package xml2 has been removed as a dependency. It is only required for the function read_xml_peaklist(), which now checks for the xml2 package installation specifically.
known_mf updated to UTF-8 encoding.
Results from functions assign_formulas() and check_formula_library() are now returned invisibly.

Version 1.0.5

Bug fixes
- calc_neutral_mass(): now takes a vector as function argument.
- remove_blanks(): There was an error for LC data.

Version 1.0.4

assign_formulas() adapted for formula libraries containing only one formula (e.g. when assigning the post-column standard in LCMS)
calc_recalibrate_ms() adapted for formula libraries containing only one formula (e.g. when assigning the post-column standard in LCMS)
Check added for ume:::extract_metadata_from_ufz_files()
Bug fix solved in check_formula_library()
calc_recalibrate_ms() udated because of changes in calc_neutral_mass()

Version 1.0.3

New plot added for LCMS data (as provided by Dr. Xianyu Kong)
Post-column standard (Naproxen) added to ume::known_mf.
Error handling in add_known_mf() improved if a molecular formula column is not existing in the source table mfd.
Documentations updated:
- add_missing_element_columns()
- assign_formulas()

Version 1.0.2

Vignette updated
Improvement of internal function read_xml_files(). Default folder_path now is NULL, which opens a dialogue box for folder selection.
Improvement of internal function extract_metadata_from_UFZ_files(). Default folder_path now is NULL, which opens a dialogue box for folder selection.
calc_db() error handling updated and argument element_names added, which handles the style of element / isotope symbols (“lower case” (default) / “upper case”)

Version 1.0.1

Internal function added to retrieve metadata from UFZ filenames: extract_metadata_from_UFZ_files()

Notes

New structure of News.md

Version 1.0.0

BREAKING CHANGES

First draft version for upload to CRAN
Unnecessary dependencies removed
Unit tests added

Notes

Improved documentation for many functions

Version 0.3.2

Internal functions separated from package.

Version 0.3.1

Main changes

This version now includes unit tests. Following versions will allow that functions can be applied to single values and vectors in tables. - Updates for calc_ma() and calc_ma_abs(): - checks added in functions - unit tests added - application of functions now - Dependencies DT and pander removed from package.

Version 0.2.20

New functions

Create a custom molecular formula library from a list of molecular formulas (create_custom_formula_library())
Internal function (database access required): Search for molecular formula targets in database (search_for_mf_target())

Bugfix

eval_isotopes(): Provide warning, if there is no isotope information available in molecular formula data.

Version 0.2.19

First implementation of function by Shuxian Gao (UFZ Leipzig) for the elimination of molecular formula multiple assignments.
Function added for converting a vector of molecular formula strings into a data.table: convert_molecular_formulas()
pdf manual added to git repository (https://gitlab.awi.de/bkoch/ume)
Help documentation for data objects expanded.

Version 0.2.18

Function added for reading a set of xml peaklists and store them into a single data.table: read_xml_peaklist()
Function added for searching for molecular formulas and molecular masses in MarChem Database: search_for_mf_target()

Version 0.2.17

List of metabolome target formulas (courtesy to F. Bussmann) added to ume::known_mf
List of formulas added to ume::known_mf that indicates photo- and biodegradation (Seibt, 2017; PhD thesis)

Version 0.2.16

Calibration function calc_recalibrate_ms() now allows for customized formula lists as reference for calibration.

Version 0.2.15

Diversity calculations added (calc_shannon_index(), calc_simpson_index(), calc_pielou_eveness()).
Function for automated recalibration improved.
Documentation updated.

Version 0.2.14

Formula assignment procedure updated. The library search now uses data.table::foverlap(), which doubles the speed of formula assignment.

Version 0.2.13

First release of function process_orbi_data() that reads scans from a list of mzML files and assigns formulas to each scan.

Version 0.2.12

function create_ume_formula_library() updated
Vignette updated (installation procedure)

Version 0.2.11

Wrapper function for Orbitrap added for internal use

Version 0.2.10

Changes in calibration procedure (calc_recalibrate_ms.R):
calibr_list was extended by “E_coli_metabolome” for the calibration of metabolome samples
Mass accuracy plot (uplot.freq_ma in plot.R):
bug fix: calculation of median and quantile mass accuracies now ignores missing values.
Updates in documentation

Version 0.2.9

Calculation of aromaticity modified:
calc_eval_params.R: AI will only be calculated for molecular formulas in which C > O + N + P calc_data_summary.R: wa(AI) is now calculated from intensity weighted average element numbers.

Version 0.2.8

Automated re-calibration:
calc_recalibrate_ms.R: At least 5 calibrants must now be detected in an analyses for recalibration. Otherwise the respective analysis (file_id) will be removed from the recalibration peaklist.
Documentation
Minor bug fixes

Version 0.2.7

Added a NEWS.md file to track changes to the package.
Added a test version of assign_formulas_new.R that is more memory efficient and faster (now based on data.table::foverlap())
Applied pkgdown to generate a website for the ume package

Version 0.2.6

Major changes in function documentations. Using @inheritDotParams, wrapper functions now have argument descriptions of the sub-functions available.

Version 0.2.4

file_id now ALWAYS has to be numeric.