| Type: | Package |
| Title: | Hierarchical Adaptive 'RT-QuIC' Classification for Complex Matrices |
| Version: | 1.0.0 |
| Description: | Extends 'RT-QuIC' (Real-Time Quaking-Induced Conversion) statistical analysis to complex environmental matrices through hierarchical adaptive classification. 'KWELA' is named after a deity of the Fore people of Papua New Guinea, among whom Kuru, a notable human prion disease, was identified. Implements a 6-layer architecture: hard gate biological constraints, per-well adaptive scoring, separation-aware combination, Youden-optimized cutoffs, replicate consensus, and matrix instability detection. Features dual-mode operation (diagnostic/research), auto-profile selection (Standard/Sensitive/Matrix-Robust), RAF integration for artifact detection, matrix-aware baseline correction, and multiple consensus rules. Methods include energy distance (Szekely and Rizzo (2013) <doi:10.1016/j.jspi.2013.03.018>), CRPS (Gneiting and Raftery (2007) <doi:10.1198/016214506000001437>), SSMD (Zhang (2007) <doi:10.1016/j.ygeno.2007.01.005>), and Jensen-Shannon divergence (Lin (1991) <doi:10.1109/18.61115>). This package implements methodology currently under peer review; please contact the author before publication using this approach. Development followed an iterative human-machine collaboration where all algorithmic design, statistical methodologies, and biological validation logic were conceptualized, tested, and iteratively refined by Richard A. Feiss through repeated cycles of running experimental data, evaluating analytical outputs, and selecting among candidate algorithms and approaches. AI systems ('Anthropic Claude' and 'OpenAI GPT') served as coding assistants and analytical sounding boards under continuous human direction. The selection of statistical methods, evaluation of biological plausibility, and all final methodology decisions were made by the human author. AI systems did not independently originate algorithms, statistical approaches, or scientific methodologies. |
| License: | MIT + file LICENSE |
| URL: | https://github.com/RFeissIV/KWELA |
| BugReports: | https://github.com/RFeissIV/KWELA/issues |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.2 |
| Depends: | R (≥ 4.0.0) |
| Imports: | stats, graphics |
| Suggests: | testthat (≥ 3.0.0) |
| Config/testthat/edition: | 3 |
| NeedsCompilation: | no |
| Packaged: | 2026-02-23 21:10:41 UTC; root |
| Author: | Richard A. Feiss IV
|
| Maintainer: | Richard A. Feiss IV <feiss026@umn.edu> |
| Repository: | CRAN |
| Date/Publication: | 2026-02-28 21:10:02 UTC |
KWELA: Hierarchical Adaptive 'RT-QuIC' Classification
Description
Extends 'RT-QuIC' (Real-Time Quaking-Induced Conversion) statistical analysis to complex environmental matrices through hierarchical adaptive classification. 'KWELA' is named after a deity of the Fore people of Papua New Guinea, among whom Kuru, a notable human prion disease, was identified.
Value
No return value. This is a package documentation page.
6-Layer Architecture
- Layer 1: Hard Gate
Biological constraint filter
- Layer 2: Per-Well Scoring
Profile-dependent adaptive transforms
- Layer 3: Adaptive Combination
Separation-aware score combiner
- Layer 4: Adaptive Cutoff
Youden-optimized threshold
- Layer 5: Replicate Consensus
Treatment-level classification
- Layer 6: Instability Detection
Matrix interference override
Dual-Mode Operation
- diagnostic
Default mode. Deterministic classification from TTT/MP/RAF evidence only. No stochastic rescue or score blending.
- research
Full adaptive architecture with stochastic rescue and distributional scoring.
Main Functions
kwela_analyzeMain analysis function
kwela_summarizeTreatment-level summary
kwela_diagnosticsDiagnostic information
compute_instability_flagsMatrix instability detection
Plate/Batch Functions
kwela_plate_normalizeMulti-plate normalization
kwela_bootstrap_summaryBootstrap confidence intervals
Separation Quality Assessment
Description
cohens_d computes Cohen's d effect size between two distributions.
assess_separation evaluates PC/NC separation quality and recommends
a classification profile. Cohen's d thresholds are heuristic, calibrated
against three RT-QuIC datasets.
Usage
cohens_d(x, y)
assess_separation(pc_mp, nc_mp, pc_ttt, nc_ttt)
Arguments
x |
First distribution (numeric vector) |
y |
Second distribution (numeric vector) |
pc_mp |
Positive control MP values |
nc_mp |
Negative control MP values |
pc_ttt |
Positive control TTT values |
nc_ttt |
Negative control TTT values |
Value
cohens_d returns a single numeric value (positive = x > y),
or NA_real_ if insufficient data.
assess_separation returns a list with:
- d_mp
Cohen's d for MP
- d_ttt
Cohen's d for TTT
- d_combined
Median of absolute d values
- regime
Separation regime: "poor_separation", "moderate_separation", or "strong_separation"
- recommended_profile
Recommended profile: "matrix_robust", "sensitive", or "standard"
Examples
set.seed(42)
pc_mp <- rnorm(8, 100, 10)
nc_mp <- rnorm(8, 20, 5)
pc_ttt <- rnorm(8, 8, 1)
nc_ttt <- rnorm(8, 72, 5)
cohens_d(pc_mp, nc_mp)
assess_separation(pc_mp, nc_mp, pc_ttt, nc_ttt)
Compute Instability Flags for a Treatment
Description
Evaluates 6 deterministic metrics to detect matrix interference that compromises classification reliability. A treatment is flagged as unstable when it behaves unlike both positive and negative controls.
Usage
compute_instability_flags(
trt_mp,
trt_ttt,
pc_mp,
nc_mp,
pc_ttt,
nc_ttt,
trt_raf = NULL,
trt_mp_raf = NULL,
pc_raf_mp_ratios = NULL,
crossing_threshold = NA_real_,
strictness = "moderate",
has_raf = FALSE
)
Arguments
trt_mp |
Finite MP values for treatment wells |
trt_ttt |
TTT values for treatment wells (may contain NA/Inf) |
pc_mp, nc_mp |
Positive/negative control MP values (finite) |
pc_ttt, nc_ttt |
Positive/negative control TTT values |
trt_raf |
RAF values for treatment wells (NULL if unavailable) |
trt_mp_raf |
MP values corresponding to RAF wells (NULL if unavailable) |
pc_raf_mp_ratios |
RAF/MP ratios from positive controls (NULL if unavailable) |
crossing_threshold |
TTT crossing threshold for this treatment's group |
strictness |
"moderate" (2+ flags), "strict" (1+), or "lenient" (3+) |
has_raf |
Whether RAF data is available |
Details
The six metrics evaluated are:
Fano factor deviation from positive controls
Crossing variability (ambiguous threshold crossing rate)
Wasserstein distance from BOTH controls (normalized by NC spread)
Energy distance from BOTH controls (normalized by NC spread)
TTT dispersion ratio vs positive controls
RAF-MP ratio inconsistency vs positive controls
All metrics are deterministic (no random number generation).
Value
List with components: unstable (logical), reasons
(character vector), n_flags (integer), min_flags_required
(integer), metrics (named list).
Examples
set.seed(42)
flags <- compute_instability_flags(
trt_mp = rnorm(8, 50, 20),
trt_ttt = rnorm(8, 40, 15),
pc_mp = rnorm(8, 100, 10),
nc_mp = rnorm(8, 20, 5),
pc_ttt = rnorm(8, 8, 1),
nc_ttt = rnorm(8, 72, 5),
crossing_threshold = 40,
strictness = "moderate"
)
Compute Stochastic Profile
Description
Computes comprehensive stochastic metrics for a sample compared to controls.
Usage
compute_stochastic_profile(sample_values, pc_values, nc_values)
Arguments
sample_values |
Numeric vector of sample measurements |
pc_values |
Numeric vector of positive control measurements |
nc_values |
Numeric vector of negative control measurements |
Value
List containing comprehensive stochastic metrics:
- mean, var, cv, fano
Basic sample statistics
- cv_ratio_pc, cv_ratio_nc
CV ratios vs controls
- fano_ratio_pc
Fano factor ratio vs PC
- stochastic_index
Normalized position between NC and PC variance
- ssmd_vs_nc, ssmd_vs_pc
SSMD vs controls
- snr
Signal-to-noise ratio vs NC
- js_vs_pc, js_vs_nc, js_ratio
Jensen-Shannon metrics
- energy_vs_pc, energy_vs_nc, energy_ratio
Energy distance metrics
- wasserstein_vs_pc, wasserstein_vs_nc, wasserstein_ratio
Wasserstein metrics
- log_dist_pc, log_dist_nc
Log-space Euclidean distances
Examples
set.seed(42)
pc <- rnorm(20, 100, 10)
nc <- rnorm(20, 20, 5)
sample <- rnorm(20, 80, 15)
profile <- compute_stochastic_profile(sample, pc, nc)
Proper Scoring Rules
Description
Proper scoring rules for evaluating probabilistic forecasts.
Usage
crps_empirical(forecast_samples, observation)
dawid_sebastiani(observation, predicted_mean, predicted_var)
log_predictive_score(observation, predicted_mean, predicted_sd)
interval_score(observation, lower, upper, alpha = 0.1)
Arguments
forecast_samples |
Numeric vector of samples from forecast distribution |
observation |
Single observed value |
predicted_mean |
Predicted mean |
predicted_var |
Predicted variance |
predicted_sd |
Predicted standard deviation |
lower |
Lower bound of prediction interval |
upper |
Upper bound of prediction interval |
alpha |
Nominal miscoverage rate (default 0.1 for 90% interval) |
Value
All functions return a single numeric value or NA_real_ if inputs
are invalid.
crps_empirical returns a non-negative score (lower is better).
dawid_sebastiani returns the DS score (calibration metric).
log_predictive_score returns negative log density.
interval_score returns width plus penalties for miscoverage.
References
Gneiting T, Raftery AE (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association 102(477):359-378.
Examples
set.seed(42)
forecast <- rnorm(50, 5, 1)
crps_empirical(forecast, 5.5)
dawid_sebastiani(5.5, 5, 1)
log_predictive_score(5.5, 5, 1)
interval_score(5.5, 3, 7, alpha = 0.1)
Stochasticity Metrics
Description
Functions for quantifying stochastic variability in RT-QuIC data.
Usage
cv(x)
fano_factor(x)
ssmd(treatment, control)
normalized_stochastic_index(treatment_var, pc_var, nc_var)
snr_vs_control(treatment, control)
Arguments
x |
Numeric vector |
treatment |
Numeric vector of treatment values |
control |
Numeric vector of control values |
treatment_var |
Treatment variance |
pc_var |
Positive control variance |
nc_var |
Negative control variance |
Value
All functions return a single numeric value or NA_real_ if
insufficient data.
cv returns the coefficient of variation (MAD-based).
fano_factor returns the variance-to-mean ratio.
ssmd returns the strictly standardized mean difference.
normalized_stochastic_index returns a value in [0, 1].
snr_vs_control returns the signal-to-noise ratio.
References
Zhang XHD (2007). A pair of new statistical parameters for quality control in RNA interference high-throughput screening assays. Genomics 89(4):552-61.
Examples
set.seed(42)
x <- rnorm(30, 10, 2)
y <- rnorm(30, 5, 2)
cv(x)
fano_factor(x + 10)
ssmd(x, y)
snr_vs_control(x, y)
normalized_stochastic_index(var(x), 4, 1)
Distance Metrics
Description
Geometric distance metrics for comparing distributions.
energy_distance uses deterministic quantile subsampling for
large datasets (no RNG dependency).
Usage
energy_distance(x, y, max_n = NULL)
wasserstein_1d(x, y)
log_euclidean_distance(x, y)
mahalanobis_distance(sample_params, reference_mean, reference_cov)
Arguments
x |
First distribution (numeric vector) |
y |
Second distribution (numeric vector) |
max_n |
Maximum sample size before subsampling (NULL = no limit) |
sample_params |
Named numeric vector of sample parameters |
reference_mean |
Named numeric vector of reference means |
reference_cov |
Covariance matrix of reference distribution |
Value
All functions return a single non-negative numeric value or
NA_real_ if insufficient data or computation fails.
References
Szekely GJ, Rizzo ML (2013). Energy statistics: A class of statistics based on distances. Journal of Statistical Planning and Inference 143(8):1249-72.
Examples
set.seed(42)
x <- rnorm(30, 0, 1)
y <- rnorm(30, 2, 1)
energy_distance(x, y)
wasserstein_1d(x, y)
log_euclidean_distance(exp(x), exp(y))
mahalanobis_distance(c(1, 2), c(0, 0), diag(2))
Information-Theoretic Metrics
Description
entropy computes Shannon entropy from discretized distribution.
jensen_shannon computes symmetric Jensen-Shannon divergence
with Laplace smoothing.
Usage
entropy(x, n_bins = 10)
jensen_shannon(p, q, n_bins = 15)
Arguments
x |
Numeric vector |
n_bins |
Number of histogram bins |
p |
First distribution (numeric vector) |
q |
Second distribution (numeric vector) |
Value
entropy returns entropy in nats, or NA_real_ if fewer
than 3 values.
jensen_shannon returns JSD bounded in [0, ln(2)], or NA_real_
if either distribution has fewer than 5 values.
References
Lin J (1991). Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory 37(1):145-151.
Examples
entropy(rnorm(100))
set.seed(42)
p <- rnorm(50, 0, 1)
q <- rnorm(50, 2, 1)
jensen_shannon(p, q)
KWELA Main Analysis Function
Description
Implements hierarchical adaptive classification for 'RT-QuIC' data:
- Layer 1: Hard Gate
Biological constraint filter
- Layer 2: Per-Well Scoring
Profile-dependent transforms
- Layer 3: Adaptive Combination
Separation-aware combiner
- Layer 4: Adaptive Cutoff
Youden-optimized threshold
- Layer 5: Replicate Consensus
Treatment-level classification
- Layer 6: Instability Detection
Matrix interference override
Usage
kwela_analyze(
data,
pc_pattern = "\\bPositive\\s*Control\\b|^POS\\b|\\bPC\\b",
nc_pattern = "\\bNegative\\s*Control\\b|^NEG\\b|\\bNC\\b|\\bTDB\\b|\\bBlank\\b",
spiked_pattern = "\\+\\s*Pos|Pos\\s*\\d*%|spiked|\\+\\s*CWD",
profile = c("auto", "standard", "sensitive", "matrix_robust"),
consensus = c("majority", "strict", "flexible", "threshold"),
consensus_threshold = 0.5,
matrix_groups = NULL,
use_raf = TRUE,
mode = c("diagnostic", "research"),
instability_check = TRUE,
instability_strictness = c("moderate", "strict", "lenient"),
verbose = TRUE
)
Arguments
data |
Data frame with Treatment, TTT, MP columns (RAF optional) |
pc_pattern |
Regex for positive controls |
nc_pattern |
Regex for negative controls |
spiked_pattern |
Regex for spiked samples |
profile |
Classification profile: "auto", "standard", "sensitive", or "matrix_robust". "auto" selects based on separation quality. |
consensus |
Replicate consensus rule: "majority", "strict", "flexible", or "threshold" |
consensus_threshold |
For "threshold" consensus: minimum mean well score to classify treatment as positive (default 0.5) |
matrix_groups |
Optional: column name for matrix grouping (enables per-group baseline correction). NULL = global baselines. |
use_raf |
Logical: include RAF in scoring (default TRUE if present) |
mode |
"diagnostic" (deterministic, no stochastic rescue) or "research" (full adaptive architecture). Default: "diagnostic". |
instability_check |
Logical: run instability detection (default TRUE) |
instability_strictness |
"moderate" (2+ flags), "strict" (1+), or "lenient" (3+). Controls override sensitivity. |
verbose |
Print progress |
Value
Data frame with per-well results including:
- Type
Well type: "PC", "NC", or "Sample"
- is_spiked
Logical indicating if sample is spiked
- crossed_threshold
Logical for TTT threshold crossing
- has_signal
Logical for signal above NC threshold
- hard_gate_pass
Logical for Layer 1 gate passage
- artifact_flag
Logical for suspected artifacts
- stoch_rescue
Logical for stochastic rescue (research mode only)
- ttt_score
TTT component score (0-1)
- signal_score
Signal component score (0-1)
- raf_score
RAF component score (0-1, if available)
- stoch_score
Stochastic component score (0-1)
- well_score
Combined well score (0-1)
- well_positive
Logical for well-level positive classification
- seed_score
Alias for well_score
- classification
Well classification: "POSITIVE", "NEGATIVE", "INCONCLUSIVE", "ARTIFACT_SUSPECT", or "INVALID"
- confidence
Confidence level: "VERY_HIGH", "HIGH", "MODERATE", "LOW", or "CONTROL"
- trt_classification
Treatment-level consensus classification (may be "INCONCLUSIVE_MATRIX_EFFECT" if instability detected)
- trt_confidence
Treatment-level confidence
- trt_positive_rate
Fraction of positive wells in treatment
- trt_mean_score
Mean well score for treatment
- matrix_instability
Logical for Layer 6 instability override
Plus attributes: pc_stats, nc_stats, separation, profile, consensus, optimal_cutoff, youden_j, thresholds, trt_summary, control_check, score_semantics, version, mode, instability_check, instability_strictness, instability_summary, group_controls, group_profiles, group_separation.
Examples
set.seed(42)
df <- data.frame(
Treatment = c(rep("Positive Control", 8), rep("Negative Control", 8),
rep("Sample_A", 8)),
TTT = c(rnorm(8, 8, 1), rnorm(8, 72, 5), rnorm(8, 12, 3)),
MP = c(rnorm(8, 100, 10), rnorm(8, 20, 5), rnorm(8, 85, 15))
)
result <- kwela_analyze(df)
summary <- kwela_summarize(result)
Plate Normalization and Bootstrap
Description
kwela_plate_normalize normalizes TTT and MP values within each
plate using negative control baselines.
kwela_bootstrap_summary computes bootstrap confidence intervals
for treatment scores. Set seed externally for reproducibility.
Usage
kwela_plate_normalize(
data,
plate_col = "Plate",
pc_pattern = "\\bPositive\\s*Control\\b|^POS\\b|\\bPC\\b",
nc_pattern = "\\bNegative\\s*Control\\b|^NEG\\b|\\bNC\\b|\\bTDB\\b|\\bBlank\\b",
method = c("zscore", "median_ratio")
)
kwela_bootstrap_summary(result, B = 1000, conf = 0.95)
Arguments
data |
Data frame with Treatment, TTT, MP columns |
plate_col |
Name of plate identifier column |
pc_pattern |
Regex for positive controls |
nc_pattern |
Regex for negative controls |
method |
Normalization method: "zscore" or "median_ratio" |
result |
Output from |
B |
Number of bootstrap replicates (default 1000) |
conf |
Confidence level (default 0.95) |
Value
kwela_plate_normalize returns a data frame with added columns:
- plate_id
Plate identifier
- TTT_norm
Normalized TTT values
- MP_norm
Normalized MP values
kwela_bootstrap_summary returns a data frame with:
- Treatment
Treatment name
- n_wells
Number of wells
- mean_score
Mean well score
- score_lo
Lower CI bound for mean score
- score_hi
Upper CI bound for mean score
- positive_rate
Observed positive rate
- pr_lo
Lower CI bound for positive rate
- pr_hi
Upper CI bound for positive rate
Examples
set.seed(42)
df <- data.frame(
Treatment = c(rep("Positive Control", 4), rep("Negative Control", 4),
rep("Sample_A", 4)),
TTT = c(rnorm(4, 8, 1), rnorm(4, 72, 5), rnorm(4, 12, 3)),
MP = c(rnorm(4, 100, 10), rnorm(4, 20, 5), rnorm(4, 85, 15)),
Plate = rep("Plate1", 12)
)
normalized <- kwela_plate_normalize(df)
# Bootstrap example
result <- kwela_analyze(df, verbose = FALSE)
set.seed(123)
boot <- kwela_bootstrap_summary(result, B = 500)
KWELA Summary and Diagnostics
Description
kwela_summarize extracts treatment-level consensus results.
kwela_diagnostics returns detailed diagnostic information.
Usage
kwela_summarize(result)
kwela_diagnostics(result)
Arguments
result |
Output from |
Value
kwela_summarize returns a data frame with treatment-level results:
- Treatment
Treatment name
- n_wells
Number of wells
- n_positive
Number of positive wells
- positive_rate
Fraction positive
- mean_score
Mean well score
- classification
Treatment classification
- confidence
Confidence level
- consensus_rule
Consensus rule used
- is_spiked
Whether treatment is spiked
kwela_diagnostics returns a list with:
- version
KWELA version string
- mode
Analysis mode ("diagnostic" or "research")
- profile
Profile used
- consensus
Consensus rule used
- optimal_cutoff
Youden-optimized cutoff
- youden_j
Youden J statistic
- separation
Separation quality metrics
- well_level
Well-level statistics
- treatment_level
Treatment-level statistics
- instability
Instability detection results: check_enabled, strictness, n_inconclusive_matrix, treatments_flagged
- n_samples
Number of sample wells
- n_spiked
Number of spiked wells
- n_unspiked
Number of unspiked wells
Examples
set.seed(42)
df <- data.frame(
Treatment = c(rep("Positive Control", 8), rep("Negative Control", 8),
rep("Sample_A", 8)),
TTT = c(rnorm(8, 8, 1), rnorm(8, 72, 5), rnorm(8, 12, 3)),
MP = c(rnorm(8, 100, 10), rnorm(8, 20, 5), rnorm(8, 85, 15))
)
result <- kwela_analyze(df, verbose = FALSE)
summary <- kwela_summarize(result)
diag <- kwela_diagnostics(result)
Utility Functions
Description
robust_scale provides MAD-based scale estimation.
safe_sd provides a guaranteed-positive standard deviation.
Usage
robust_scale(x)
safe_sd(x, na.rm = TRUE)
Arguments
x |
Numeric vector |
na.rm |
Logical; should NA values be removed? |
Value
robust_scale returns a robust scale estimate using MAD with fallback
to IQR/1.349 and SD. Returns NA_real_ if fewer than 2 values.
safe_sd returns standard deviation, guaranteed to be positive
(returns machine epsilon if zero or NA).
Examples
robust_scale(rnorm(50))
robust_scale(c(1, 2, 3, 100))
safe_sd(c(5, 5, 5))