| Title: | Cytokine Profiling Analysis Tool |
| Version: | 0.2.4 |
| Description: | Provides comprehensive cytokine profiling analysis through quality control using biologically meaningful cutoffs on raw cytokine measurements and by testing for distributional symmetry to recommend appropriate transformations. Offers exploratory data analysis with summary statistics, enhanced boxplots, and barplots, along with univariate and multivariate analytical capabilities for in-depth cytokine profiling such as Principal Component Analysis based on Andrzej Maćkiewicz and Waldemar Ratajczak (1993) <doi:10.1016/0098-3004(93)90090-R>, Sparse Partial Least Squares Discriminant Analysis based on Lê Cao K-A, Boitard S, and Besse P (2011) <doi:10.1186/1471-2105-12-253>, Random Forest based on Breiman, L. (2001) <doi:10.1023/A:1010933404324>, and Extreme Gradient Boosting based on Tianqi Chen and Carlos Guestrin (2016) <doi:10.1145/2939672.2939785>. |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| URL: | https://github.com/saraswatsh/CytoProfile, https://cytoprofile.cytokineprofile.org/ |
| Depends: | R (≥ 4.3) |
| Imports: | mixOmics, dplyr, tidyr, pROC, plot3D, caret, xgboost, randomForest, pheatmap, e1071, ggplot2, ggrepel, gridExtra, reshape2, lifecycle, data.table |
| Suggests: | spelling, BiocManager, Ckmeans.1d.dp, prodlim, testthat, knitr, rmarkdown, devtools |
| NeedsCompilation: | no |
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
| LazyData: | true |
| VignetteBuilder: | knitr |
| BugReports: | https://github.com/saraswatsh/CytoProfile/issues |
| Language: | en-US |
| Packaged: | 2026-02-27 16:01:55 UTC; ssa390 |
| Author: | Shubh Saraswat |
| Maintainer: | Shubh Saraswat <shubh.saraswat00@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-02-27 16:32:05 UTC |
CytoProfile: Cytokine Profiling Analysis Tool
Description
Provides comprehensive cytokine profiling analysis through quality control using biologically meaningful cutoffs on raw cytokine measurements and by testing for distributional symmetry to recommend appropriate transformations. Offers exploratory data analysis with summary statistics, enhanced boxplots, and barplots, along with univariate and multivariate analytical capabilities for in-depth cytokine profiling such as Principal Component Analysis based on Andrzej Maćkiewicz and Waldemar Ratajczak (1993) doi:10.1016/0098-3004(93)90090-R, Sparse Partial Least Squares Discriminant Analysis based on Lê Cao K-A, Boitard S, and Besse P (2011) doi:10.1186/1471-2105-12-253, Random Forest based on Breiman, L. (2001) doi:10.1023/A:1010933404324, and Extreme Gradient Boosting based on Tianqi Chen and Carlos Guestrin (2016) doi:10.1145/2939672.2939785.
Author(s)
Maintainer: Shubh Saraswat shubh.saraswat00@gmail.com (ORCID) [copyright holder]
Authors:
Xiaohua Douglas Zhang douglas.zhang@uky.edu (ORCID)
See Also
Useful links:
Report bugs at https://github.com/saraswatsh/CytoProfile/issues
Example Cytokine Profiling Data 1.
Description
Contains observed concentrations of cytokines and their respective treatment and groups, derived from:
Usage
ExampleData1
Format
A data frame with 297 rows and 29 columns:
- Group
Group assigned to the subjects.
- Treatment
Treatment received by subjects.
- Time
Time point of the measurement.
- IL-17F
Observed concentration of IL-17F cytokine.
- GM-CSF
Observed concentration of GM-CSF cytokine.
- IFN-G
Observed concentration of IFN-G cytokine.
- IL-10
Observed concentration of IL-10 cytokine.
- CCL-20/MIP-3A
Observed concentration of CCL-20/MIP-3A cytokine.
- IL-12/P70
Observed concentration of IL-12/P70 cytokine.
- IL-13
Observed concentration of IL-13 cytokine.
- IL-15
Observed concentration of IL-15 cytokine.
- IL-17A
Observed concentration of IL-17A cytokine.
- IL-22
Observed concentration of IL-22 cytokine.
- IL-9
Observed concentration of IL-9 cytokine.
- IL-1B
Observed concentration of IL-1B cytokine.
- IL-33
Observed concentration of IL-33 cytokine.
- IL-2
Observed concentration of IL-2 cytokine.
- IL-21
Observed concentration of IL-21 cytokine.
- IL-4
Observed concentration of IL-4 cytokine.
- IL-23
Observed concentration of IL-23 cytokine.
- IL-5
Observed concentration of IL-5 cytokine.
- IL-6
Observed concentration of IL-6 cytokine.
- IL-17E/IL-25
Observed concentration of IL-17E/IL-25 cytokine.
- IL-27
Observed concentration of IL-27 cytokine.
- IL-31
Observed concentration of IL-31 cytokine.
- TNF-A
Observed concentration of TNF-A cytokine.
- TNF-B
Observed concentration of TNF-B cytokine.
- IL-28A
Observed concentration of IL-28A cytokine.
Source
Example data compiled for cytokine profiling.
References
Pugh GH, Fouladvand S, SantaCruz-Calvo S, Agrawal M, Zhang XD, Chen J, Kern PA, Nikolajczyk BS. T cells dominate peripheral inflammation in a cross-sectional analysis of obesity-associated diabetes. Obesity (Silver Spring). 2022;30(10): 1983–1994. doi:10.1002/oby.23528.
Examples
data(ExampleData1)
Example Cytokine Profiling Data 2.
Description
Contains observed concentrations of cytokines and their respective treatment and groups, derived from:
Usage
ExampleData2
Format
A data frame with 66 rows and 20 columns:
- Stimulation
Stimulation assigned to the subjects.
- Group
Group assigned to the subjects.
- IL-17F
Observed concentration of IL-17F cytokine.
- GM-CSF
Observed concentration of GM-CSF cytokine.
- IFN-G
Observed concentration of IFN-G cytokine.
- IL-10
Observed concentration of IL-10 cytokine.
- CCL-20
Observed concentration of CCL-20 cytokine.
- IL-12
Observed concentration of IL-12 cytokine.
- IL-13
Observed concentration of IL-13 cytokine.
- IL-17A
Observed concentration of IL-17A cytokine.
- IL-22
Observed concentration of IL-22 cytokine.
- IL-9
Observed concentration of IL-9 cytokine.
- IL-1B
Observed concentration of IL-1B cytokine.
- IL-2
Observed concentration of IL-2 cytokine.
- IL-21
Observed concentration of IL-21 cytokine.
- IL-4
Observed concentration of IL-4 cytokine.
- IL-5
Observed concentration of IL-5 cytokine.
- IL-6
Observed concentration of IL-6 cytokine.
- TNF-A
Observed concentration of TNF-A cytokine.
- TNF-B
Observed concentration of TNF-B cytokine.
Source
Example data compiled for cytokine profiling.
References
SantaCruz-Calvo S, Saraswat S, Hasturk H, Dawson DR, Zhang XD, Nikolajczyk BS. Periodontitis and Diabetes Differentially Affect Inflammation in Obesity. J Dent Res. 2024;103(12):1313-1322. doi:10.1177/00220345241280743
Examples
data(ExampleData2)
Example Cytokine Profiling Data 3.
Description
Contains observed concentrations of cytokines and their respective treatment and groups, derived from:
Usage
ExampleData3
Format
A data frame with 64 rows and 14 columns:
- Stimulation
Stimulation assigned to the subjects.
- Group
Group assigned to the subjects.
- GM-CSF
Observed concentration of GM-CSF cytokine.
- IFN-G
Observed concentration of IFN-G cytokine.
- IL-10
Observed concentration of IL-10 cytokine.
- CCL-20/MIP-3A
Observed concentration of CCL-20/MIP-3A cytokine.
- IL-12/P70
Observed concentration of IL-12/P70 cytokine.
- IL-13
Observed concentration of IL-13 cytokine.
- IL-15
Observed concentration of IL-15 cytokine.
- IL-9
Observed concentration of IL-9 cytokine.
- IL-1B
Observed concentration of IL-1B cytokine.
- IL-21
Observed concentration of IL-21 cytokine.
- IL-6
Observed concentration of IL-6 cytokine.
- TNF-A
Observed concentration of TNF-A cytokine.
Source
Example data compiled for cytokine profiling.
References
SantaCruz-Calvo S, Saraswat S, Hasturk H, Dawson DR, Zhang XD, Nikolajczyk BS. Periodontitis and Diabetes Differentially Affect Inflammation in Obesity. J Dent Res. 2024;103(12):1313-1322. doi:10.1177/00220345241280743
Examples
data(ExampleData3)
Example Cytokine Profiling Data 4.
Description
Contains observed concentrations of cytokines and their respective treatment and groups, derived from:
Usage
ExampleData4
Format
A data frame with 64 rows and 14 columns:
- Group
Group assigned to the subjects.
- Treatment
Treatment received by subjects.
- IL-17F
Observed concentration of IL-17F cytokine.
- GM-CSF
Observed concentration of GM-CSF cytokine.
- IFN-G
Observed concentration of IFN-G cytokine.
- IL-10
Observed concentration of IL-10 cytokine.
- CCL-20
Observed concentration of CCL-20 cytokine.
- IL-12
Observed concentration of IL-12 cytokine.
- IL-13
Observed concentration of IL-13 cytokine.
- IL-17A
Observed concentration of IL-17A cytokine.
- IL-22
Observed concentration of IL-22 cytokine.
- IL-9
Observed concentration of IL-9 cytokine.
- IL-2
Observed concentration of IL-2 cytokine.
- IL-21
Observed concentration of IL-21 cytokine.
- IL-4
Observed concentration of IL-4 cytokine.
- IL-23
Observed concentration of IL-23 cytokine.
- IL-5
Observed concentration of IL-5 cytokine.
- IL-6
Observed concentration of IL-6 cytokine.
- TNF-A
Observed concentration of TNF-A cytokine.
- TNF-B
Observed concentration of TNF-B cytokine.
Source
Example data compiled for cytokine profiling.
References
SantaCruz-Calvo, S., Saraswat, S., Kalantar, G. H., Zukowski, E., Marszalkowski, H., Javidan, A., Gholamrezaeinejad, F., Bharath, L. P., Kern, P. A., Zhang, X. D., & Nikolajczyk, B. S. (2024). A unique inflammaging profile generated by T cells from people with obesity is metformin resistant. GeroScience, 10.1007/s11357-024-01441-4. Advance online publication. https://doi.org/10.1007/s11357-024-01441-4
Examples
data(ExampleData4)
Example Cytokine Profiling Data 5.
Description
Contains observed concentrations of cytokines and their respective treatment and groups, derived from:
Usage
ExampleData5
Format
A data frame with 297 rows and 29 columns:
- Group
Group assigned to the subjects.
- Treatment
Treatment received by subjects.
- Batch
Batch number corresponding to the sample.
- Time
Time point of the measurement.
- IL-17F
Observed concentration of IL-17F cytokine.
- GM-CSF
Observed concentration of GM-CSF cytokine.
- IFN-G
Observed concentration of IFN-G cytokine.
- IL-10
Observed concentration of IL-10 cytokine.
- CCL-20/MIP-3A
Observed concentration of CCL.20.MIP.3A cytokine.
- IL-12/P70
Observed concentration of IL-12/P70 cytokine.
- IL-13
Observed concentration of IL-13 cytokine.
- IL-15
Observed concentration of IL-15 cytokine.
- IL-17A
Observed concentration of IL-17A cytokine.
- IL-22
Observed concentration of IL-22 cytokine.
- IL-9
Observed concentration of IL-9 cytokine.
- IL-1B
Observed concentration of IL-1B cytokine.
- IL-33
Observed concentration of IL-33 cytokine.
- IL-2
Observed concentration of IL-2 cytokine.
- IL-21
Observed concentration of IL-21 cytokine.
- IL-4
Observed concentration of IL-4 cytokine.
- IL-23
Observed concentration of IL-23 cytokine.
- IL-5
Observed concentration of IL-5 cytokine.
- IL-6
Observed concentration of IL-6 cytokine.
- IL-17E/IL-25
Observed concentration of IL-17E/IL-25 cytokine.
- IL-27
Observed concentration of IL-27 cytokine.
- IL-31
Observed concentration of IL-31 cytokine.
- TNF-A
Observed concentration of TNF-A cytokine.
- TNF-B
Observed concentration of TNF-B cytokine.
- IL-28A
Observed concentration of IL-28A cytokine.
Note
The ExampleData5 dataset is the same data as ExampleData1 but with a new column "Batch" added to indicate the batch number corresponding to each sample. The "Batch" column was randomly generated to simulate batch effects in the data.
Source
Example data compiled for cytokine profiling.
References
Pugh GH, Fouladvand S, SantaCruz-Calvo S, Agrawal M, Zhang XD, Chen J, Kern PA, Nikolajczyk BS. T cells dominate peripheral inflammation in a cross-sectional analysis of obesity-associated diabetes. Obesity (Silver Spring). 2022;30(10): 1983–1994. doi:10.1002/oby.23528.
Examples
data(ExampleData5)
Adjust p-values using a specified method
Description
A thin wrapper around stats::p.adjust that defaults to the
Benjamini–Hochberg procedure. Useful for unifying multiple
testing adjustments across the package.
Usage
adjust_p(p_values, method = "BH")
Arguments
p_values |
A numeric vector of raw p-values. |
method |
A character string specifying the p‑value adjustment
method. Passed directly to |
Value
A numeric vector of adjusted p-values of the same length as
p_values.
Apply a scale transformation to numeric columns
Description
This helper function applies a chosen scaling or transformation to specified numeric columns in a data frame. Supported built‑in transformations include no transformation ("none"), log2, log10, and z‑score scaling. A custom function can also be supplied to perform arbitrary transformations.
Usage
apply_scale(
data,
columns = NULL,
scale = c("none", "log2", "log10", "zscore", "custom"),
custom_fn = NULL
)
Arguments
data |
A data.frame or matrix containing the data to be transformed. |
columns |
A character vector of column names to transform. If NULL (default) all numeric columns will be transformed. |
scale |
A character string specifying the transformation to
apply. Possible values are "none", "log2", "log10",
"zscore", or "custom". When set to "custom" the function
specified in |
custom_fn |
A function that takes a numeric vector and returns a
transformed numeric vector. Only used when |
Value
A data.frame with the same dimensions as data with
transformed numeric columns.
ANOVA Analysis on Continuous Variables. ![[Deprecated]](./figures/lifecycle-deprecated.svg)
Description
This function performs an analysis of variance (ANOVA) for each continuous variable against every categorical predictor in the input data. Character columns are automatically converted to factors; all factor columns are used as predictors while numeric columns are used as continuous outcomes. For each valid predictor (i.e., with more than one level and no more than 10 levels), Tukey's Honest Significant Difference (HSD) test is conducted and the adjusted p-values for pairwise comparisons are extracted.
Usage
cyt_anova(data, format_output = FALSE)
Arguments
data |
A data frame or matrix containing both categorical and continuous variables. Character columns will be converted to factors and used as predictors, while numeric columns will be used as continuous outcomes. |
format_output |
Logical. If TRUE, returns the results as a tidy data frame instead of a list. Default is FALSE. |
Value
If format_output is FALSE (default), a list of adjusted p-values from Tukey's HSD tests
for each combination of continuous outcome and categorical predictor. List elements are named
in the format "Outcome_Categorical".
If format_output is TRUE, a data frame in a tidy format.
Author(s)
Shubh Saraswat
Examples
data("ExampleData1")
cyt_anova(ExampleData1[, c(1:2, 5:6)], format_output = TRUE)
Boxplots for Continuous Variables with Optional Grouping
Description
This function generates boxplots for numeric variables in a data frame or matrix. It supports optional grouping by one or more categorical variables. Numeric variables can be scaled using various transformations before plotting. When grouping is not used, boxplots are arranged in pages with a specified maximum number of plots per page. Plots can be saved to a PDF file or displayed on the current graphics device.
Usage
cyt_bp(
data,
output_file = NULL,
group_by = NULL,
bin_size = 25,
y_lim = NULL,
scale = c("none", "log2", "log10", "zscore", "custom"),
custom_fn = NULL
)
Arguments
data |
A matrix or data frame containing numeric and categorical variables. |
output_file |
Optional string specifying the name of the file
to be created. When |
group_by |
Optional character vector specifying one or more
columns to use for grouping. If |
bin_size |
Integer. Maximum number of boxplots per page when
grouping is not used. Default is 25, as in the original
|
y_lim |
Optional numeric vector giving y‑axis limits for the plots. Applies to all plots. |
scale |
Character specifying a transformation for numeric
variables. Accepts |
custom_fn |
A user supplied function to transform numeric
columns when |
Value
Invisibly returns a list of ggplot objects. When
output_file is provided, plots are written to the PDF file.
Examples
data("ExampleData1")
# Boxplots without grouping
cyt_bp(ExampleData1[, -c(1:3)], output_file = NULL, scale = "log2")
# Boxplots grouped by Group
cyt_bp(ExampleData1[, -c(3,5:28)], group_by = "Group", scale = "zscore")
Boxplot Function Enhanced for Specific Group Comparisons. ![[Deprecated]](./figures/lifecycle-deprecated.svg)
Description
This function generates a PDF file containing boxplots for each combination of numeric and factor variables in the provided data. It first converts any character columns to factors and checks that the data contains at least one numeric and one factor column. If the scale argument is set to "log2", all numeric columns are log2-transformed. The function then creates boxplots using ggplot2 for each numeric variable grouped by each factor variable.
Usage
cyt_bp2(data, pdf_title, scale = NULL, y_lim = NULL)
Arguments
data |
A matrix or data frame of raw data. |
pdf_title |
A string representing the title
(and filename) of the PDF file. If |
scale |
Transformation option for continuous variables. Options are NULL (default) and "log2". When set to "log2", numeric columns are transformed using the log2 function. |
y_lim |
An optional numeric vector defining the y-axis limits for the plots. |
Value
A PDF file containing the boxplots.
Author(s)
Shubh Saraswat
Examples
# Loading data
data_df <- ExampleData1[, -c(3, 5:28)]
data_df <- dplyr::filter(data_df, Group == "T2D", Treatment == "Unstimulated")
cyt_bp2(data_df, pdf_title = NULL, scale = "log2")
Dual-flashlight Plot
Description
This function reshapes the input data and computes summary statistics (mean and variance) for each variable grouped by a specified factor column. It then calculates the SSMD (Strictly Standardized Mean Difference) and log2 fold change between two groups (group1 and group2) and categorizes the effect strength as "Strong Effect", "Moderate Effect", or "Weak Effect". A dual flash plot is generated using ggplot2 where the x-axis represents the average log2 fold change and the y-axis represents the SSMD. Additionally, the function prints the computed statistics to the console. A ggplot object is returned for further modification.
Usage
cyt_dualflashplot(
data,
group_var,
group1,
group2,
ssmd_thresh = 1,
log2fc_thresh = 1,
top_labels = 15,
category_labels = c(Strong = "Strong Effect", Moderate = "Moderate Effect", Weak =
"Weak Effect"),
colors = c(`Strong Effect` = "red", `Moderate Effect` = "orange", `Weak Effect` =
"blue"),
shapes = c(`FALSE` = 16, `TRUE` = 17),
verbose = FALSE
)
Arguments
data |
A data frame containing at least one numeric column and a grouping column. |
group_var |
Character. Name of the grouping column. |
group1 |
Character strings identifying the two levels
of |
group2 |
Character strings identifying the two levels
of |
ssmd_thresh |
Numeric. Absolute SSMD threshold for highlighting observations as significant. Default is 1. |
log2fc_thresh |
Numeric. Absolute log2 fold change threshold for significance. Default is 1. |
top_labels |
Integer. Number of variables with the largest absolute SSMD to label on the plot. Default is 15. |
category_labels |
Optional named character vector of length
three providing alternative labels for the SSMD effect
categories. Names must include "Strong", "Moderate" and
"Weak". Defaults are |
colors |
Optional named character vector mapping the effect
categories to colors. Names must match those in
|
shapes |
Optional named numeric vector of length two giving
the plotting characters for non‑significant and significant
points. Defaults are |
verbose |
Logical. If |
Value
A ggplot object representing the dual‑flashlight plot.
When verbose is TRUE, the summary statistics data frame is
printed to the console.
Author(s)
Xiaohua Douglas Zhang and Shubh Saraswat
Examples
# Loading data
data_df <- ExampleData1[, -c(2:3)]
cyt_dualflashplot(data_df, group_var = "Group", group1 = "T2D", group2 = "ND",
ssmd_thresh = 0.2, log2fc_thresh = 1,
top_labels = 10)
Error-bar Plot
Description
This function generates an error-bar plot to visually compare different groups against a designated baseline group. It displays the central tendency (mean or median) as a bar and overlays error bars to represent the data's spread (e.g., standard deviation, MAD, or standard error). The plot can also include p-value and effect size labels (based on SSMD), presented either as symbols or numeric values, to highlight significant differences and the magnitude of effects. When an output filename is provided the plot is saved to disk; otherwise the ggplot object is returned and drawn on the current graphics device.
Usage
cyt_errbp(
data,
group_col = NULL,
p_lab = TRUE,
es_lab = TRUE,
class_symbol = FALSE,
x_lab = "",
y_lab = "",
title = "",
stat = c("mean", "median"),
error = c("se", "sd", "mad", "ci"),
scale = c("none", "log2", "log10", "zscore", "custom"),
custom_fn = NULL,
method = c("auto", "ttest", "wilcox"),
p_adjust_method = NULL,
output_file = NULL,
label_size = 4
)
Arguments
data |
A data frame containing at least one numeric column and a grouping column. |
group_col |
Character string naming the column that defines groups. This column will be coerced to a factor. |
p_lab |
Logical. If |
es_lab |
Logical. If |
class_symbol |
Logical. If |
x_lab |
Character string for the x-axis label. If empty a default label is generated. |
y_lab |
Character string for the y-axis label. If empty a default label is generated. |
title |
Character string for the plot title. If empty a default title is generated. |
stat |
Character. Central tendency statistic to use. Choices are "mean" or "median"; default is "mean". Added to support non‑mean summaries. |
error |
Character. Error measure visualized around the statistic. Options are "se" (standard error; default), "sd" (standard deviation), "mad" (median absolute deviation) or "ci" (approximate 95 % confidence interval). |
scale |
Character controlling data transformation before analysis. Accepts "none" (default), "log2", "log10", "zscore" or "custom". |
custom_fn |
A user‑supplied function applied to numeric
columns when |
method |
Character controlling the statistical test used for pairwise comparisons. Options are "auto" (default; choose between t‑test and Wilcoxon based on a normality test), "ttest" or "wilcox". |
p_adjust_method |
Character. If non‑NULL, specifies the
method used by |
output_file |
Optional file path. If provided, the plot is
saved using |
label_size |
Numeric. Font size for p‑value and effect‑size labels. Default is 4. |
Value
An error-bar plot (a ggplot object) is produced and optionally
saved as a PDF. If output_file is specified, the function returns
returns the ggplot object.
Author(s)
Xiaohua Douglas Zhang and Shubh Saraswat
Examples
# Basic usage with default settings
df <- ExampleData1[, c("Group", "CCL-20/MIP-3A", "IL-10")]
cyt_errbp(df, group_col = "Group")
# Use mean and SD, log2 transform and show significance
cyt_errbp(df, group_col = "Group", stat = "mean", error = "sd",
scale = "log2", class_symbol = TRUE, method = "ttest")
Generic export function for Cytokine plots
Description
The purpose of this helper is to save a collection of plots
generated throughout the CytoProfile package to a user specified
format. It is designed to handle objects created with ggplot2
(or other grid based plots), recorded base plots (see
grDevices::recordPlot()), or custom drawing functions stored
as closures. If a multi-page PDF is requested, all plots are
written into a single file; for raster formats (PNG, JPEG and
TIFF), each element of the plot list is saved to its own file
with a numbered suffix. Nothing is returned except invisibly.
Usage
cyt_export(
plots,
filename = "cyto_output",
format = c("pdf", "png", "jpeg", "tiff", "svg"),
width = 7,
height = 5,
dpi = 300,
which = NULL
)
Arguments
plots |
A single plot object or a list of plots. Each
element may be a |
filename |
Base file name (without extension). The
appropriate extension is added automatically depending on
|
format |
Character string giving the desired output format. One of "pdf", "png", "jpeg" or "tiff". Partial matching is performed. The default is "pdf". |
width, height |
Width and height of the device in inches. These arguments are passed directly to the graphics device. |
dpi |
Resolution in dots per inch used for raster formats (ignored for PDF). |
which |
Optional character vector naming the subset of
|
Examples
# Example usage within CytoProfile:
# res <- cyt_splsda(...)
# cyt_export(res$plots, filename = "splsda", format = "png")
Heat Map
Description
This function creates a heatmap using the numeric columns from the
provided data frame. It provides the ability to hide row and
column names, adjust font sizes and clustering, and apply
additional transformations such as log10 or combined z‑scoring. A
file name with extension may be provided via title to save the
heat map to disk; otherwise the plot is drawn on the active
graphics device.
Usage
cyt_heatmap(
data,
scale = c(NULL, "log2", "log10", "row_zscore", "col_zscore", "zscore"),
annotation_col = NULL,
annotation_side = c("auto", "row", "col"),
show_row_names = FALSE,
show_col_names = TRUE,
fontsize_row = 10,
fontsize_col = 10,
cluster_rows = TRUE,
cluster_cols = TRUE,
title = NULL
)
Arguments
data |
A data frame. Only numeric columns are used to construct the heat map. |
scale |
Character specifying an optional scaling. Accepts
|
annotation_col |
Optional. Either the name of a column in
|
annotation_side |
Character. One of "auto", "row" or
"col". When "auto" (default) the side is determined by
matching the length of |
show_row_names |
Logical. If |
show_col_names |
Logical. If |
fontsize_row |
Numeric. Font size for row names. Default is 10. |
fontsize_col |
Numeric. Font size for column names. Default is 10. |
cluster_rows |
Logical. If |
cluster_cols |
Logical. If |
title |
Character. The heat map title or file name. If
|
Value
Invisibly returns the pheatmap object created by
pheatmap::pheatmap().
Author(s)
Shubh Saraswat
Examples
# Load sample data
data("ExampleData1")
data_df <- ExampleData1
# Generate a heatmap with log2 scaling and annotation based on
# the "Group" column
cyt_heatmap(
data = data_df[, -c(2:3)],
scale = "log2", # Optional scaling
annotation_col = "Group",
annotation_side = "auto",
title = NULL
)
Analyze data with Multivariate INTegration (MINT) Sparse Partial Least Squares Discriminant Analysis (sPLS-DA).
Description
This function performs a MINT (Multivariate INTegrative) sPLS-DA to handle
batch effects by modeling a global biological signal across different studies or batches.
If a second grouping column (group_col2) is provided, the analysis is stratified
and performed for each level of that column.
Usage
cyt_mint_splsda(
data,
group_col,
batch_col,
group_col2 = NULL,
colors = NULL,
output_file = NULL,
ellipse = TRUE,
bg = FALSE,
var_num = 20,
comp_num = 2,
scale = c("none", "log2", "log10", "zscore", "custom"),
custom_fn = NULL,
tune = FALSE,
cim = FALSE,
roc = FALSE,
verbose = FALSE
)
Arguments
data |
A matrix or data frame containing the variables. Columns not
specified by |
group_col |
A string specifying the first grouping column name that contains grouping
information. If |
batch_col |
A string specifying the column that identifies the batch or study for each sample. |
group_col2 |
A string specifying the second grouping column name. Default is
|
colors |
A vector of splsda_colors for the groups or treatments. If
|
output_file |
Optional string specifying the name of the file
to be created. When |
ellipse |
Logical. Whether to draw a 95\
Default is |
bg |
Logical. Whether to draw the prediction background in the figures.
Default is |
var_num |
Numeric. The number of variables to be used in the PLS-DA model. |
comp_num |
Numeric. The number of components to calculate in the sPLS-DA model. Default is 2. |
scale |
Character string specifying a transformation to apply to the
numeric predictor columns prior to model fitting. Options are
"none", "log2", "log10", "zscore", or "custom". When
"custom" is selected a user defined function must be supplied via
|
custom_fn |
A custom transformation function used when
|
tune |
Logical. If |
cim |
Logical. Whether to compute and plot the Clustered Image Map (CIM) heatmap. Default is |
roc |
Logical. Whether to compute and plot the ROC curve for the model.
Default is |
verbose |
A logical value indicating whether to print additional
informational output to the console. When |
Details
When verbose is set to TRUE, additional information about the analysis and confusion matrices
are printed to the console. These can be suppressed by keeping verbose = FALSE.
Value
Plots consisting of the classification figures, ROC curves, correlation circle plots, and heatmaps.
Author(s)
Shubh Saraswat
References
Rohart F, Eslami A, Matigian, N, Bougeard S, Lê Cao K-A (2017). MINT: A multivariate integrative approach to identify a reproducible biomarker signature across multiple experiments and platforms. BMC Bioinformatics 18:128.
Examples
# Loading ExampleData5 dataset with batch column
data_df <- ExampleData5[,-c(2,4)]
data_df <- dplyr::filter(data_df, Group != "ND")
cyt_mint_splsda(data_df, group_col = "Group",
batch_col = "Batch", colors = c("black", "purple"),
ellipse = TRUE, var_num = 25, comp_num = 2,
scale = "log2", verbose = FALSE)
Analyze Data with Principal Component Analysis (PCA) for Cytokines.
Description
This function performs Principal Component Analysis (PCA) on cytokine data and generates several types of plots, including:
2D PCA plots using mixOmics'
plotIndivfunction,3D scatter plots (if
styleis "3d" or "3D" andcomp_numis 3) via the plot3D package,Scree plots showing both individual and cumulative explained variance,
Loadings plots, and
Biplots and correlation circle plots.
Usage
cyt_pca(
data,
group_col = NULL,
group_col2 = NULL,
colors = NULL,
output_file,
ellipse = FALSE,
comp_num = 2,
scale = c("none", "log2", "log10", "zscore", "custom"),
custom_fn = NULL,
pch_values = NULL,
style = NULL
)
Arguments
data |
A data frame containing cytokine data. It should include at least one column representing grouping information and optionally a second column representing treatment or stimulation. |
group_col |
A string specifying the column name that contains the first group
information. If |
group_col2 |
A string specifying the second grouping column. Default is
|
colors |
A vector of colors corresponding to the groups.
If set to NULL, a palette is generated using |
output_file |
Optional string specifying the name of the file
to be created. When |
ellipse |
Logical. If TRUE, a 95% confidence ellipse is drawn on the PCA individuals plot. Default is FALSE. |
comp_num |
Numeric. The number of principal components to compute and display. Default is 2. |
scale |
Character string specifying a transformation to apply to
numeric variables before PCA. Options are "none" (no
transformation), "log2", "log10", "zscore", or "custom". When
"custom" is selected, a user supplied function must be given via
|
custom_fn |
A custom function used when |
pch_values |
A vector of plotting symbols (pch values) to be used in the PCA plots. Default is NULL. |
style |
Character. If set to "3d" or "3D" and |
Value
A PDF file containing the PCA plots is generated and saved when
output_file is provided. Otherwise, plots are displayed on the current
graphics device.
Author(s)
Shubh Saraswat
Examples
# Load sample data
data <- ExampleData1[, -c(3,23)]
data_df <- dplyr::filter(data, Group != "ND" & Treatment != "Unstimulated")
# Run PCA analysis and save plots to a PDF file
cyt_pca(
data = data_df,
output_file = NULL,
colors = c("black", "red2"),
scale = "log2",
comp_num = 3,
pch_values = c(16, 4),
style = "3D",
group_col = "Group",
group_col2 = "Treatment",
ellipse = FALSE
)
Run Random Forest Classification on Cytokine Data,
Description
This function trains and evaluates a Random Forest classification model on cytokine data. It includes feature importance visualization, cross- validation for feature selection, and performance metrics such as accuracy, sensitivity, and specificity. Optionally, for binary classification, the function also plots the ROC curve and computes the AUC.
Usage
cyt_rf(
data,
group_col,
ntree = 500,
mtry = 5,
train_fraction = 0.7,
plot_roc = FALSE,
k_folds = 5,
step = 0.5,
run_rfcv = TRUE,
verbose = FALSE,
seed = 123,
cv = FALSE,
cv_folds = 5,
scale = c("none", "log2", "log10", "zscore", "custom"),
custom_fn = NULL
)
Arguments
data |
A data frame containing the cytokine measurements. One column should correspond to the grouping variable (the outcome) and the remaining columns should be numeric predictors. |
group_col |
A string naming the column in |
ntree |
Integer specifying the number of trees to grow. Default is 500. |
mtry |
Integer specifying the number of variables randomly sampled at each split. Default is 5. |
train_fraction |
Numeric between 0 and 1 giving the proportion of data used for training. The remainder is used for testing. Default is 0.7. |
plot_roc |
Logical. If |
k_folds |
Integer specifying the number of folds for |
step |
Numeric specifying the fraction of variables removed at each
step during |
run_rfcv |
Logical indicating whether to run Random Forest
cross-validation for feature selection. Default is |
verbose |
Logical indicating whether to print intermediate results.
When |
seed |
Optional integer seed for reproducibility. Default is 123. |
cv |
Logical indicating whether to perform a separate k-fold
classification cross-validation using |
cv_folds |
Integer specifying the number of folds for classification
cross-validation when |
scale |
Character string specifying a transformation to apply to the
numeric predictor columns prior to model fitting. Options are
"none" (no transformation), "log2", "log10", "zscore", or
"custom". When "custom" is selected a user defined function
must be supplied via |
custom_fn |
A custom transformation function used when |
Details
The function first coerces the grouping variable to a factor and splits
the dataset into training and test subsets according to
train_fraction. A Random Forest classifier is fit to the training
data using the specified ntree and mtry parameters. The model
performance is assessed on both the training and test sets, and
results are printed when verbose = TRUE. If plot_roc = TRUE and
the grouping variable has exactly two levels, an ROC curve is computed
on the test set and a plot is returned. Variable importance is
extracted and visualized with a bar plot. Optionally, cross-
validation for feature selection (rfcv) is performed and the error
curve is plotted. A separate k-fold classification cross-
validation using caret::train can be requested via cv = TRUE.
Value
An invisible list with components:
model |
The fitted |
confusion_matrix |
Confusion matrix on the test set. |
importance_plot |
A |
importance_data |
A data frame of variable importance values. |
rfcv_result |
The |
rfcv_plot |
A |
rfcv_data |
A data frame summarizing the |
roc_plot |
A |
cv_results |
A |
Examples
data.df0 <- ExampleData1
data.df <- data.frame(data.df0[, 1:3], log2(data.df0[, -c(1:3)]))
data.df <- data.df[, -c(2:3)]
data.df <- dplyr::filter(data.df, Group != "ND")
cyt_rf(
data = data.df, group_col = "Group", k_folds = 5, ntree = 1000,
mtry = 4, run_rfcv = TRUE, plot_roc = TRUE, verbose = FALSE
)
Distribution of the Data Set Shown by Skewness and Kurtosis.
Description
This function computes summary statistics — including sample
size, mean, standard error, skewness, and kurtosis — for each numeric
measurement column in a data set. If grouping columns are provided via
group_cols, the function computes the metrics separately for each group
defined by the combination of these columns (using the first element as
the treatment variable and the second as the grouping variable, or
the same column for both if only one is given). When no grouping columns
are provided, the entire data set is treated as a single group ("Overall").
A log2 transformation (using a cutoff equal to one-tenth of the smallest
positive value in the data) is applied to generate alternative metrics.
Histograms showing the distribution of skewness and kurtosis for both raw
and log2-transformed data are then generated and saved to a PDF if a file
name is provided.
Usage
cyt_skku(
data,
group_cols = NULL,
output_file = NULL,
print_res_raw = FALSE,
print_res_log = FALSE
)
Arguments
data |
A matrix or data frame containing the raw data. If
|
group_cols |
A character vector specifying the names of the grouping columns. When provided, the first element is treated as the treatment variable and the second as the group variable. If not provided, the entire data set is treated as one group. |
output_file |
Optional string specifying the name of the file
to be created. When |
print_res_raw |
Logical. If |
print_res_log |
Logical. If |
Details
A cutoff is computed as one-tenth of the minimum positive value among all numeric measurement columns to avoid taking logarithms of zero. When grouping columns are provided, the function loops over unique grouping columns and computes the metrics for each measurement column within each subgroup. Without grouping columns, the entire data set is analyzed as one overall group.
Value
The function generates histograms of skewness and kurtosis for both
raw and log2-transformed data. Additionally, if either
printResRaw and/or printResLog is TRUE, the function
returns the corresponding summary statistics as a data frame or a list of
data frames.
Author(s)
Xiaohua Douglas Zhang and Shubh Saraswat
Examples
# Example with grouping columns (e.g., "Group" and "Treatment")
data(ExampleData1)
cyt_skku(ExampleData1[, -c(2:3)], output_file = NULL,
group_cols = c("Group")
)
# Example without grouping columns (analyzes the entire data set)
cyt_skku(ExampleData1[, -c(1:3)], output_file = NULL)
Analyze data with Sparse Partial Least Squares Discriminant Analysis (sPLS-DA).
Description
This function conducts Sparse Partial Least Squares Discriminant Analysis
(sPLS-DA) on the provided data. It uses the specified group_col (and
optionally group_col2) to define class labels while assuming the remaining
columns contain continuous variables. The function supports transformations
via the scale parameter and generates a series of plots,
including classification plots, scree plots, loadings plots, and VIP score
plots. Optionally, ROC curves are produced when roc is TRUE.
Additionally, cross-validation is supported via LOOCV or Mfold methods. When
both group_col and group_col2 are provided and differ, the function
analyzes each treatment level separately.
Usage
cyt_splsda(
data,
group_col = NULL,
group_col2 = NULL,
multilevel_col = NULL,
batch_col = NULL,
ind_names = FALSE,
colors = NULL,
output_file = NULL,
ellipse = FALSE,
bg = FALSE,
conf_mat = FALSE,
var_num,
cv_opt = NULL,
fold_num = 5,
scale = c("none", "log2", "log10", "zscore", "custom"),
custom_fn = NULL,
tune = FALSE,
tune_folds = 5,
comp_num = 2,
pch_values,
style = NULL,
roc = FALSE,
verbose = FALSE,
seed = 123
)
Arguments
data |
A matrix or data frame containing the variables. Columns not
specified by |
group_col |
A string specifying the column name that contains the first group
information. If |
group_col2 |
A string specifying the second grouping column. Default is
|
multilevel_col |
A string specifying the column name that identifies
repeated measurements (e.g., patient or sample IDs). If provided, a
multilevel analysis will be performed. Default is |
batch_col |
A string specifying the column that identifies the batch or study for each sample. |
ind_names |
If |
colors |
A vector of colors for the groups or treatments. If
|
output_file |
Optional string specifying the name of the file
to be created. When |
ellipse |
Logical. Whether to draw a 95\
figures. Default is |
bg |
Logical. Whether to draw the prediction background in the figures.
Default is |
conf_mat |
Logical. Whether to print the confusion matrix for the
classifications. Default is |
var_num |
Numeric. The number of variables to be used in the PLS-DA model. |
cv_opt |
Character. Option for cross-validation method: either
"loocv" or "Mfold". Default is |
fold_num |
Numeric. The number of folds to use if |
scale |
Character string specifying a transformation to apply to the
numeric predictor columns prior to model fitting. Options are
"none", "log2", "log10", "zscore", or "custom". When
"custom" is selected a user defined function must be supplied via
|
custom_fn |
A custom transformation function used when
|
tune |
Logical. If |
tune_folds |
Integer. Number of folds in cross‑validation when tuning. Default is 5. |
comp_num |
Numeric. The number of components to calculate in the sPLS-DA model. Default is 2. |
pch_values |
A vector of integers specifying the plotting characters (pch values) to be used in the plots. |
style |
Character. If set to |
roc |
Logical. Whether to compute and plot the ROC curve for the model.
Default is |
verbose |
A logical value indicating whether to print additional
informational output to the console. When |
seed |
An integer specifying the seed for reproducibility (default is 123). |
Details
When verbose is set to TRUE, additional information about the analysis and confusion matrices
are printed to the console. These can be suppressed by keeping verbose = FALSE.
Value
Plots consisting of the classification figures, component figures with Variable of Importance in Projection (VIP) scores, and classifications based on VIP scores greater than 1. ROC curves and confusion matrices are also produced if requested.
Author(s)
Xiaohua Douglas Zhang and Shubh Saraswat
References
Lê Cao, K.-A., Boitard, S. and Besse, P. (2011). Sparse PLS Discriminant Analysis: biologically relevant feature selection and graphical displays for multiclass problems. BMC Bioinformatics 12:253.
Examples
# Loading Sample Data
data_df <- ExampleData1[,-c(3)]
data_df <- dplyr::filter(data_df, Group != "ND", Treatment != "Unstimulated")
cyt_splsda(data_df, output_file = NULL,
colors = c("black", "purple"), bg = FALSE, scale = "log2",
conf_mat = FALSE, var_num = 25, cv_opt = NULL, comp_num = 2,
pch_values = c(16, 4), style = NULL, ellipse = TRUE,
group_col = "Group", group_col2 = "Treatment", roc = FALSE, verbose = FALSE)
Two Sample T-test Comparisons. ![[Deprecated]](./figures/lifecycle-deprecated.svg)
Description
This function performs pairwise comparisons between two groups for each combination
of a categorical predictor (with exactly two levels) and a continuous outcome variable.
It first converts any character variables in data to factors and, if specified,
applies a log2 transformation to the continuous variables. Depending on the value of
scale, the function conducts either a two-sample t-test (if scale = "log2")
or a Mann-Whitney U test (if scale is NULL). The resulting p-values are printed
and returned.
Usage
cyt_ttest(data, scale = NULL, verbose = TRUE, format_output = FALSE)
Arguments
data |
A matrix or data frame containing continuous and categorical variables. |
scale |
A character specifying a transformation for continuous variables.
Options are |
verbose |
A logical indicating whether to print the p-values of the statistical tests.
Default is |
format_output |
Logical. If TRUE, returns the results as a tidy data frame.
Default is |
Value
If format_output is FALSE, returns a list of p-values (named by Outcome and Categorical variable).
If TRUE, returns a data frame in a tidy format.
Author(s)
Shubh Saraswat
Examples
data_df <- ExampleData1[, -c(3)]
data_df <- dplyr::filter(data_df, Group != "ND", Treatment != "Unstimulated")
# Test example
cyt_ttest(
data_df[, c(1:2, 5:6)],
scale = "log2",
verbose = TRUE,
format_output = TRUE
)
Pairwise Univariate Tests Between Two Groups
Description
cyt_univariate supports additional scaling options and explicit
choice of statistical test.
For each categorical predictor with exactly two levels and each
numeric outcome, a two‑sample t‑test or Wilcoxon rank–sum test is
performed. Results are returned either as a list of test objects
or, if format_output = TRUE, as a tidy data frame with one
row per comparison.
Usage
cyt_univariate(
data,
scale = NULL,
method = c("auto", "ttest", "wilcox"),
verbose = TRUE,
format_output = FALSE,
custom_fn = NULL,
p_adjust_method = NULL
)
Arguments
data |
A data frame or matrix containing both categorical and numeric variables. |
scale |
A character specifying a transformation to apply to
numeric variables prior to testing. Choices are |
method |
Character specifying the test to perform. Use "auto" (default) to select between t‑test and Wilcoxon based on Shapiro–Wilk normality tests for each outcome; "ttest" to always use Student’s t‑test; or "wilcox" to always use the Wilcoxon rank–sum test. |
verbose |
Logical indicating whether to return the results. Provided for backward compatibility but has no effect on printing. |
format_output |
Logical. If |
custom_fn |
A function to apply when |
p_adjust_method |
Character or |
Value
If format_output = FALSE, a named list of test objects
keyed by "Outcome_Categorical". If format_output = TRUE, a
data frame with columns Outcome, Categorical, Comparison,
Test, Estimate, Statistic, and P_value.
Author(s)
Shubh Saraswat
Examples
data_df <- ExampleData1[, -c(3)]
data_df <- dplyr::filter(data_df, Group != "ND", Treatment != "Unstimulated")
cyt_univariate(data_df[, c(1:2, 5:6)], scale = "log2",
method = "auto", format_output = TRUE)
Univariate Tests for Multi‑Level Categorical Predictors
Description
cyt_univariate_multiprovides univariate statistical testing for
categorical predictors with more than two levels. For each
categorical predictor and numeric outcome pair, a global test is
performed, followed by pairwise comparisons if the global test is
significant. Users may choose between two methods,
classical ANOVA with Tukey’s Honest Significant Difference (HSD)
or a non‑parametric Kruskal–Wallis test followed by pairwise
Wilcoxon rank–sum tests. The return format can either be a list
of adjusted p‑values for each outcome–predictor pair or, if
format_output = TRUE, a tidy data frame summarizing all
pairwise comparisons.
Usage
cyt_univariate_multi(
data,
method = c("anova", "kruskal"),
cat_vars = NULL,
cont_vars = NULL,
p_adjust_method = "BH",
format_output = FALSE
)
Arguments
data |
A data frame or matrix containing both categorical and continuous variables. Character columns will be converted to factors. |
method |
Character specifying the type of global test to perform. Use "anova" (default) for one‑way ANOVA with Tukey HSD or "kruskal" for Kruskal–Wallis with pairwise Wilcoxon tests. |
cat_vars |
Optional character vector of predictor column
names. When |
cont_vars |
Optional character vector of numeric outcome
variable names. When |
p_adjust_method |
Character string specifying the method for
p‑value adjustment across pairwise comparisons. Passed to
|
format_output |
Logical. If |
Value
Either a list (if format_output = FALSE) or a data
frame (if format_output = TRUE).
Author(s)
Shubh Saraswat
Examples
data("ExampleData1")
cyt_univariate_multi(ExampleData1[, c(1:2, 5:6)], method = "kruskal",
format_output = TRUE)
Violin Plots for Continuous Variables with Optional Grouping
Description
cyt_violin produces violin plots for each numeric variable in
data, optionally grouped by one or more categorical variables.
When grouping is not specified, the function behaves similarly to
cyt_bp but uses violins instead of boxplots and supports
pagination via the bin_size argument. When grouping is
provided, a separate violin is drawn for each level (or
interaction of levels) of the grouping variables. Users may
optionally overlay boxplots within each violin to visualize the
median and interquartile range.
Usage
cyt_violin(
data,
output_file = NULL,
group_by = NULL,
bin_size = 25,
y_lim = NULL,
scale = c("none", "log2", "log10", "zscore", "custom"),
custom_fn = NULL,
boxplot_overlay = FALSE
)
Arguments
data |
A matrix or data frame containing numeric and categorical variables. |
output_file |
Optional string specifying the name of the file
to be created. When |
group_by |
Optional character vector specifying one or more
columns to use for grouping. If |
bin_size |
Integer. Maximum number of violins per page when
grouping is not used. Default is 25, mirroring |
y_lim |
Optional numeric vector giving y‑axis limits for the plots. Applies to all plots. |
scale |
Character specifying a transformation for numeric
variables. Accepts "none", "log2", "log10",
"zscore", or "custom". When "custom", supply a
function via |
custom_fn |
A user supplied function to transform numeric
columns when |
boxplot_overlay |
Logical. When |
Value
Invisibly returns a list of ggplot objects. When
output_file is provided, plots are written to the PDF file.
Author(s)
Shubh Saraswat
Examples
# Violin plots without grouping
cyt_violin(ExampleData1[, -c(1:3)], output_file = NULL, scale = "zscore")
# Violin plots grouped by Group with boxplot overlay
cyt_violin(ExampleData1[, -c(3,5:28)], group_by = "Group",
boxplot_overlay = TRUE, scale = "log2")
Volcano plot
Description
This function subsets the numeric columns from the input data and compares them based on a selected grouping column. It computes the fold changes (as the ratio of means) and associated p-values for each numeric variable between two groups. The results are log2-transformed (for fold change) and -log10-transformed (for p-values) to generate a volcano plot. Additionally, there is a choice between t‑tests and Wilcoxon rank‑sum tests and adjusting p‑values for multiple comparisons.
Usage
cyt_volc(
data,
group_col,
cond1 = NULL,
cond2 = NULL,
fold_change_thresh = 2,
p_value_thresh = 0.05,
top_labels = 10,
method = c("ttest", "wilcox"),
p_adjust_method = NULL,
add_effect = FALSE,
verbose = FALSE
)
Arguments
data |
A data frame containing numeric variables and a grouping column. |
group_col |
Character. Name of the grouping column. |
cond1 |
Character string specifying the level of |
cond2 |
Character strings specifying the second level of
|
fold_change_thresh |
Numeric. Threshold for absolute fold change (in original scale). Default is 2. |
p_value_thresh |
Numeric. Threshold for the p‑value (raw or adjusted). Default is 0.05. |
top_labels |
Integer. Number of top points to label in each plot. Default is 10. |
method |
Character. Statistical test to use. "ttest" (default) uses two‑sample t‑tests; "wilcox" uses Wilcoxon rank‑sum tests. |
p_adjust_method |
Character or |
add_effect |
Logical. If |
verbose |
Logical. If |
Value
A list of ggplot objects (one per comparison). Each plot
visualizes log2 fold change on the x‑axis and –log10 of the
(adjusted) p‑value on the y‑axis. The underlying data used to
construct the final plot are printed when verbose = TRUE.
Note
If cond1 and cond2 are not provided, the function
automatically generates all possible pairwise combinations of groups from
the specified group_col for comparisons.
Author(s)
Xiaohua Douglas Zhang and Shubh Saraswat
Examples
# Loading data
data_df <- ExampleData1[,-c(2:3)]
cyt_volc(data_df, "Group", cond1 = "T2D", cond2 = "ND", fold_change_thresh = 2.0, top_labels= 15)
Run XGBoost Classification on Cytokine Data.
Description
This function trains and evaluates an XGBoost classification model on cytokine data. It allows for hyperparameter tuning, cross-validation, and visualizes feature importance.
Usage
cyt_xgb(
data,
group_col,
train_fraction = 0.7,
nrounds = 500,
max_depth = 6,
learning_rate = 0.1,
nfold = 5,
cv = FALSE,
objective = "multi:softprob",
early_stopping_rounds = NULL,
eval_metric = "mlogloss",
min_split_loss = 0,
colsample_bytree = 1,
subsample = 1,
min_child_weight = 1,
top_n_features = 10,
verbose = 0,
plot_roc = FALSE,
print_results = FALSE,
seed = 123,
scale = c("none", "log2", "log10", "zscore", "custom"),
custom_fn = NULL
)
Arguments
data |
A data frame containing numeric predictor variables and one grouping column. Non-numeric predictor columns will be coerced to numeric if possible. |
group_col |
Character string naming the column that contains the class labels (target variable). Required. |
train_fraction |
Numeric between 0 and 1 specifying the proportion of samples used for model training. The remainder is reserved for testing. Default is 0.7. |
nrounds |
Integer specifying the number of boosting rounds. Default is 500. |
max_depth |
Integer specifying the maximum depth of each tree. Default is 6. |
learning_rate |
Numeric specifying the learning rate. Default is 0.1. |
nfold |
Integer specifying the number of folds for
cross-validation when |
cv |
Logical indicating whether to perform cross-validation
using |
objective |
Character string specifying the objective function. For multi-class classification use "multi:softprob"; for binary classification use "binary:logistic". Default is "multi:softprob". |
early_stopping_rounds |
Integer specifying the number of rounds without improvement to trigger early stopping. Default is NULL (no early stopping). |
eval_metric |
Character specifying the evaluation metric used during training. Default is "mlogloss". |
min_split_loss |
Numeric specifying the minimum loss reduction required to make a further partition. Default is 0. |
colsample_bytree |
Numeric specifying the subsample ratio of columns when constructing each tree. Default is 1. |
subsample |
Numeric specifying the subsample ratio of the training instances. Default is 1. |
min_child_weight |
Numeric specifying the minimum sum of instance weight needed in a child. Default is 1. |
top_n_features |
Integer specifying the number of top features to display in the importance plot. Default is 10. |
verbose |
Integer (0, 1 or 2) controlling the verbosity of
|
plot_roc |
Logical indicating whether to plot the ROC curve and
compute the AUC for binary classification. Default is |
print_results |
Logical. If |
seed |
Optional integer seed for reproducibility. Default is 123. |
scale |
Character string specifying a transformation to apply
to numeric predictors prior to model fitting. Possible values
are "none", "log2", "log10", "zscore" or "custom". When set
to "custom" a user defined function must be supplied via
|
custom_fn |
A custom transformation function used when
|
Value
An invisible list with elements: model (the trained
xgboost object), confusion_matrix (test set confusion
matrix), importance (variable importance matrix),
class_mapping (mapping from class names to numeric labels),
cv_results (cross-validation results when cv=TRUE),
importance_plot (a ggplot2 object of the top feature
importance), and roc_plot (a ROC curve for binary
classification when plot_roc=TRUE). All plots are printed
automatically.
Examples
data_df0 <- ExampleData1
data_df <- data.frame(data_df0[, 1:3], log2(data_df0[, -c(1:3)]))
data_df <- data_df[, -c(2:3)]
data_df <- dplyr::filter(data_df, Group != "ND")
cyt_xgb(
data = data_df,
group_col = "Group",
nrounds = 250,
max_depth = 4,
min_split_loss = 0,
learning_rate = 0.05,
nfold = 5,
cv = FALSE,
objective = "multi:softprob",
eval_metric = "auc",
plot_roc = TRUE,
print_results = FALSE)