---
title: "Methodology"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Methodology}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(respondeR)
```

This vignette sets out the statistics behind respondeR: the cut-point approach,
each pooling method and its variance, the relative effect measures, the
threshold-free common-language effect size, the standardized-mean-difference
bridge, random effects, the refinement options, and the assumptions and their
limits. It closes with a guide to choosing a method.

## The cut-point approach

For one study arm with mean change $\mu$, standard deviation $\sigma$ and a
minimal important difference (MID) threshold $m$, assume the patient-level change
$X$ is Normally distributed. A *responder* is a patient whose change crosses the
threshold. The responder probability is

$$
p = \Pr(X > m) = \Phi\!\left(\frac{\mu - m}{\sigma}\right)
\quad\text{(higher change is better),}
$$

or $p = \Phi\!\left(\frac{m - \mu}{\sigma}\right)$ when a *lower* change is
better. This is the cut-point ("dichotomization") method reviewed by Thorlund and
colleagues (2011) and detailed by Anzures-Cabrera, Sarpatwari & Higgins (2011).
The between-arm contrast is then a familiar binary effect measure: by default
the **risk difference** $\mathrm{RD} = p_e - p_c$.

respondeR keeps proportions on the $[0, 1]$ scale internally and converts to
percentages only for display.
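
As a minimal sketch of the cut-point calculation in base R: the arm summaries below are hypothetical values chosen for illustration, and `cutpoint_prob()` is an illustrative helper, not a respondeR function.

```{r}
# Hypothetical helper: responder probability under a Normal model.
cutpoint_prob <- function(mu, sigma, mid, higher_better = TRUE) {
  z <- (mu - mid) / sigma
  if (higher_better) pnorm(z) else pnorm(-z)
}

# Hypothetical arm summaries (mean change, SD) and MID.
cp_pe <- cutpoint_prob(mu = 2.0, sigma = 2.5, mid = 1)
cp_pc <- cutpoint_prob(mu = 0.8, sigma = 2.4, mid = 1)

# Proportions stay on the [0, 1] scale; the risk difference is their contrast.
c(p_e = cp_pe, p_c = cp_pc, rd = cp_pe - cp_pc)
```

When the mean change equals the MID the responder probability is one half, whichever direction counts as better.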

## The pooling methods

Studies report per-arm summaries; the methods differ in how those are combined.
Throughout, study $i$ contributes $(\bar d_{e,i}, s_{e,i}, n_{e,i})$ for the
experimental arm and $(\bar d_{c,i}, s_{c,i}, n_{c,i})$ for the control arm.

### Individual (the default workhorse)

Dichotomize each study, form its risk difference, then pool. With
$p_{e,i} = \Phi((\bar d_{e,i} - m)/s_{e,i})$ and likewise $p_{c,i}$,

$$
\mathrm{RD}_i = p_{e,i} - p_{c,i}, \qquad
\widehat{\mathrm{RD}} = \frac{\sum_i w_i \mathrm{RD}_i}{\sum_i w_i}, \quad
w_i = 1/\widehat{\mathrm{Var}}(\mathrm{RD}_i).
$$

The per-study variance follows `se_method`:

* `"binomial"` (default): $\widehat{\mathrm{Var}}(\mathrm{RD}_i) =
  \frac{p_{e,i}(1 - p_{e,i})}{n_{e,i}} + \frac{p_{c,i}(1 - p_{c,i})}{n_{c,i}}$.
* `"delta"`: propagates the uncertainty in the estimated mean and SD through the
  Normal CDF. For each arm, $\widehat{\mathrm{Var}}(p) = \phi(a)^2\left[\frac{1}{n} +
  \frac{a^2}{2(n-1)}\right]$ with $a = (\bar d - m)/s$, and the two arm variances
  are added to give $\widehat{\mathrm{Var}}(\mathrm{RD}_i)$.

The `"binomial"` form is a *pseudo-binomial* approximation: $p_{e,i}$ and
$p_{c,i}$ are probabilities implied by the estimated mean and SD, not proportions
of observed dichotomized patients, so it does not carry the uncertainty in the
reported mean and SD. The `"delta"` form does, and is generally preferable for
summary-statistic inputs; `"binomial"` is the default only for continuity with
earlier results.

Whichever variance is used, the individual method is the most defensible of the
pooling methods because it dichotomizes each study on that study's own scale
before pooling.
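
The individual method can be sketched in base R as follows. The study summaries are hypothetical, the helper names (`delta_var()`, `pool_iv()`) are illustrative rather than part of respondeR, and the arms are assumed independent so that the two arm variances add.

```{r}
# Hypothetical per-study summaries: mean change, SD and n for each arm.
ind <- data.frame(
  mean_e = c(2.1, 1.6, 2.4), sd_e = c(2.5, 2.2, 2.8), n_e = c(60, 45, 80),
  mean_c = c(0.9, 0.7, 1.2), sd_c = c(2.4, 2.3, 2.6), n_c = c(58, 47, 82)
)
ind_mid <- 1

# Delta-method variance of one arm's responder probability.
delta_var <- function(mu, sigma, n, mid) {
  a <- (mu - mid) / sigma
  dnorm(a)^2 * (1 / n + a^2 / (2 * (n - 1)))
}

# Dichotomize each study, then form its risk difference.
ind_pe <- pnorm((ind$mean_e - ind_mid) / ind$sd_e)
ind_pc <- pnorm((ind$mean_c - ind_mid) / ind$sd_c)
ind_rd <- ind_pe - ind_pc

# Per-study variance of the risk difference under each se_method.
v_binom <- ind_pe * (1 - ind_pe) / ind$n_e + ind_pc * (1 - ind_pc) / ind$n_c
v_delta <- delta_var(ind$mean_e, ind$sd_e, ind$n_e, ind_mid) +
  delta_var(ind$mean_c, ind$sd_c, ind$n_c, ind_mid)

# Fixed-effect inverse-variance pooling.
pool_iv <- function(y, v) {
  w <- 1 / v
  c(estimate = sum(w * y) / sum(w), se = sqrt(1 / sum(w)))
}

rbind(binomial = pool_iv(ind_rd, v_binom), delta = pool_iv(ind_rd, v_delta))
```

The two variance choices change the weights, so the pooled estimates can differ slightly as well as their standard errors.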

### Weighted mean

Pool *before* dichotomizing. The mean is combined by inverse variance and the SD
by the within-study pooled SD:

$$
\bar d^{\star} = \frac{\sum_i \bar d_i / v_i}{\sum_i 1/v_i},\;
v_i = \frac{s_i^2}{n_i}; \qquad
s^{\star} = \sqrt{\frac{\sum_i (n_i - 1)\, s_i^2}{\sum_i (n_i - 1)}}.
$$

Then $p^{\star} = \Phi((\bar d^{\star} - m)/s^{\star})$ is computed for each arm,
and the risk-difference variance is built from the per-arm delta-method
variances, which propagate uncertainty in **both** the pooled mean and the
pooled SD,
$$
\mathrm{Var}(p^{\star}) \approx
\left(\frac{\partial p^{\star}}{\partial \mu}\right)^2 \mathrm{Var}(\bar d^{\star})
+ \left(\frac{\partial p^{\star}}{\partial \sigma}\right)^2 \mathrm{Var}(s^{\star}),
\qquad
\mathrm{Var}(s^{\star}) \approx \frac{s^{\star 2}}{2 \sum_i (n_i - 1)} .
$$
Including the SD term keeps this method consistent with the individual delta
method and avoids intervals that are too narrow. This is the paper-aligned
"pool-then-dichotomize" estimator.

### Unweighted mean and median

Replace the pooled summaries with the arithmetic mean or the median of the study
means and SDs. These are useful robustness summaries but have **no variance
model**, so respondeR reports the point estimate with `NA` intervals rather than
a spurious confidence interval.

```{r}
responder_analysis(sample_responder_data, mid = 1)[,
  c("method", "p_e", "p_c", "rd", "rd_lb", "rd_ub")]
```
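
For a single arm, the pool-then-dichotomize summaries described above can be sketched in base R. The inputs are hypothetical, and the variance of the pooled mean is taken to be the usual inverse-variance result $1/\sum_i (1/v_i)$, which is an assumption of this sketch rather than something stated above.

```{r}
# Hypothetical single-arm summaries from three studies.
pd_mean <- c(2.1, 1.6, 2.4)
pd_sd   <- c(2.5, 2.2, 2.8)
pd_n    <- c(60, 45, 80)
pd_mid  <- 1

# Weighted: inverse-variance pooled mean and within-study pooled SD.
pd_v         <- pd_sd^2 / pd_n
pd_mean_star <- sum(pd_mean / pd_v) / sum(1 / pd_v)
pd_var_mean  <- 1 / sum(1 / pd_v)          # assumed variance of the pooled mean
pd_df        <- sum(pd_n - 1)
pd_sd_star   <- sqrt(sum((pd_n - 1) * pd_sd^2) / pd_df)
pd_var_sd    <- pd_sd_star^2 / (2 * pd_df)

# Delta-method variance of the weighted responder probability.
pd_a     <- (pd_mean_star - pd_mid) / pd_sd_star
pd_dmu   <- dnorm(pd_a) / pd_sd_star           # partial derivative in the mean
pd_dsig  <- -dnorm(pd_a) * pd_a / pd_sd_star   # partial derivative in the SD
pd_var_p <- pd_dmu^2 * pd_var_mean + pd_dsig^2 * pd_var_sd

# Unweighted and median: point estimates only, no variance model.
data.frame(
  method = c("weighted", "unweighted", "median"),
  p      = c(pnorm(pd_a),
             pnorm((mean(pd_mean) - pd_mid) / mean(pd_sd)),
             pnorm((median(pd_mean) - pd_mid) / median(pd_sd))),
  se     = c(sqrt(pd_var_p), NA, NA)
)
```

Only the weighted row carries a standard error, mirroring the `NA` intervals reported for the unweighted and median summaries.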

### Baseline risk: matched or median control

By default (`control = "matched"`) the control responder proportion is pooled the
same way as the experimental arm, so each summary method contrasts like with like.

The simulation study that motivated this package (Sofi-Mahmudi, 2024) instead held
the baseline risk fixed at the **median control arm** for every summary method,
varying only how the experimental arm was pooled. That choice is available via
`control = "median"`. It treats the control event rate as a single nuisance
baseline, much as a GRADE summary-of-findings table takes one representative
control risk, and reports the experimental pooling against it. Because the median
control arm carries no sampling-variance model, this option returns point
estimates only.

```{r}
matched <- responder_analysis(sample_responder_data, mid = 1)
medbase <- responder_analysis(sample_responder_data, mid = 1, control = "median")
keep <- matched$method %in% c("median", "unweighted", "weighted")
data.frame(
  method     = matched$method[keep],
  pc_matched = round(matched$p_c[keep], 3),
  pc_median  = round(medbase$p_c[keep], 3),
  rd_matched = round(matched$rd[keep], 3),
  rd_median  = round(medbase$rd[keep], 3)
)
```

Under `control = "median"` every summary method shares one control proportion (the
median control arm); the `median` method is unchanged, and the `individual` and
`smd` methods, which pool per-study contrasts, ignore the option.

## Relative effect measures

From $p_e$ and $p_c$ (and their variances) respondeR also reports the risk ratio
(RR), the odds ratio (OR) and the number needed to treat (NNT):

$$
\mathrm{RR} = \frac{p_e}{p_c}, \quad
\mathrm{OR} = \frac{p_e/(1 - p_e)}{p_c/(1 - p_c)}, \quad
\mathrm{NNT} = \frac{1}{\mathrm{RD}}.
$$

Confidence intervals for RR and OR are formed on the log scale and
back-transformed. Following Altman (1998), when the risk-difference interval
*excludes* zero the NNT bounds are the reciprocals of the RD bounds; when it
*includes* zero the NNT is unbounded and respondeR returns `NA` bounds to flag
it.

```{r}
responder_analysis(sample_responder_data, mid = 1, method = "individual")[,
  c("rd", "rr", "rr_lb", "rr_ub", "or", "nnt")]
```
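
A minimal base R sketch of these measures follows. The proportions and variances are hypothetical, and the log-scale standard errors use a first-order delta method, which is an assumption about the form rather than a statement of respondeR's internals.

```{r}
# Hypothetical pooled responder proportions and their variances.
rel_pe <- 0.62; rel_var_pe <- 0.0016
rel_pc <- 0.48; rel_var_pc <- 0.0018
rel_z  <- qnorm(0.975)

# Risk difference with a Wald interval.
rel_rd    <- rel_pe - rel_pc
rel_rd_ci <- rel_rd + c(-1, 1) * rel_z * sqrt(rel_var_pe + rel_var_pc)

# Log-scale standard errors by the delta method (assumed form).
se_log_rr <- sqrt(rel_var_pe / rel_pe^2 + rel_var_pc / rel_pc^2)
se_log_or <- sqrt(rel_var_pe / (rel_pe * (1 - rel_pe))^2 +
                  rel_var_pc / (rel_pc * (1 - rel_pc))^2)

# Intervals are formed on the log scale and back-transformed.
rel_rr <- rel_pe / rel_pc
rel_or <- (rel_pe / (1 - rel_pe)) / (rel_pc / (1 - rel_pc))
rr_ci  <- exp(log(rel_rr) + c(-1, 1) * rel_z * se_log_rr)
or_ci  <- exp(log(rel_or) + c(-1, 1) * rel_z * se_log_or)

# NNT bounds: reciprocals of the RD bounds only when the RD interval excludes 0.
nnt_ci <- if (prod(sign(rel_rd_ci)) > 0) {
  sort(1 / rel_rd_ci)
} else {
  c(NA_real_, NA_real_)
}

rbind(rd = c(rel_rd, rel_rd_ci), rr = c(rel_rr, rr_ci),
      or = c(rel_or, or_ci), nnt = c(1 / rel_rd, nnt_ci))
```

Each row holds the estimate followed by its lower and upper bounds.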

## Common-language effect size (threshold-free)

Choosing a MID can be contentious. The common-language effect size (CLES, also
called the probabilistic index; McGraw & Wong, 1992) is the probability that a
randomly chosen treated patient has a better change than a randomly chosen
control. Under a Normal model it has an exact closed form (shown here for the
case where a higher change is better):

$$
\mathrm{CLES} = \Phi(\delta), \qquad
\delta = \frac{\mu_e - \mu_c}{\sqrt{\sigma_e^2 + \sigma_c^2}}.
$$

Per-study $\delta_i$ are pooled by inverse variance (with a delta-method
variance) and back-transformed. No threshold is required.

```{r}
cles <- responder_cles(sample_responder_data)
c(cles = cles$cles, lb = cles$cles_lb, ub = cles$cles_ub)
```
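
The closed form can be checked against simulation for a single study. The arm summaries below are hypothetical, and the Monte Carlo step is only a sanity check of the formula, not part of the package.

```{r}
# Hypothetical arm summaries for one study (higher change is better).
cl_mu_e <- 2.0; cl_sd_e <- 2.5
cl_mu_c <- 0.8; cl_sd_c <- 2.4

# Closed form under independent Normal changes in each arm.
cl_delta  <- (cl_mu_e - cl_mu_c) / sqrt(cl_sd_e^2 + cl_sd_c^2)
cl_closed <- pnorm(cl_delta)

# Monte Carlo check: share of random treated/control pairs where treated is better.
set.seed(1)
cl_sim <- mean(rnorm(1e5, cl_mu_e, cl_sd_e) > rnorm(1e5, cl_mu_c, cl_sd_c))

c(closed_form = cl_closed, simulated = cl_sim)
```

The two values should agree up to simulation error, and equal arm means give a CLES of one half.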

## The SMD bridge (`method = "smd"`)

The second approach of Anzures-Cabrera et al. (2011) pools the standardized mean
difference and maps it to an odds ratio. respondeR pools Hedges' $g$, applies the
logistic link $\ln\mathrm{OR} = \frac{\pi}{\sqrt 3}\, g$ (Chinn, 2000), and
combines the result with the weighted-pooled control responder rate to recover
risks. It is a useful cross-check on the cut-point methods because it bridges to
risks through a different distributional assumption (logistic rather than
Normal).

```{r}
responder_analysis(sample_responder_data, mid = 1, method = "smd")[,
  c("method", "p_e", "p_c", "rd", "or", "or_lb", "or_ub")]
```
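
The bridge can be sketched in base R under stated assumptions: the study summaries and the control responder rate are hypothetical, Hedges' $g$ uses the common small-sample correction $J = 1 - 3/(4\,\mathrm{df} - 1)$, and pooling is fixed-effect inverse variance.

```{r}
# Hypothetical per-study summaries (higher change is better).
sb <- data.frame(
  mean_e = c(2.1, 1.6, 2.4), sd_e = c(2.5, 2.2, 2.8), n_e = c(60, 45, 80),
  mean_c = c(0.9, 0.7, 1.2), sd_c = c(2.4, 2.3, 2.6), n_c = c(58, 47, 82)
)

# Hedges' g per study with the small-sample correction factor J.
sb_df <- sb$n_e + sb$n_c - 2
sb_sp <- sqrt(((sb$n_e - 1) * sb$sd_e^2 + (sb$n_c - 1) * sb$sd_c^2) / sb_df)
sb_d  <- (sb$mean_e - sb$mean_c) / sb_sp
sb_j  <- 1 - 3 / (4 * sb_df - 1)
sb_g  <- sb_j * sb_d
sb_vg <- sb_j^2 * ((sb$n_e + sb$n_c) / (sb$n_e * sb$n_c) +
                   sb_d^2 / (2 * (sb$n_e + sb$n_c)))

# Fixed-effect pooled g, mapped to an odds ratio by ln(OR) = pi / sqrt(3) * g.
sb_w      <- 1 / sb_vg
sb_g_pool <- sum(sb_w * sb_g) / sum(sb_w)
sb_or     <- exp(pi / sqrt(3) * sb_g_pool)

# Recover the experimental risk from an assumed control responder rate.
sb_pc <- 0.48
sb_pe <- sb_or * sb_pc / (1 - sb_pc + sb_or * sb_pc)

c(g = sb_g_pool, or = sb_or, p_c = sb_pc, p_e = sb_pe, rd = sb_pe - sb_pc)
```

The last step simply inverts the odds-ratio definition, so the odds ratio implied by `sb_pe` and `sb_pc` equals `sb_or`.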

## Random effects and heterogeneity

The individual and SMD methods pool per-study effect estimates and so can use
random effects (`pooling = "random"`). respondeR offers DerSimonian-Laird
(closed-form, dependency-free) or REML (`tau_method = "REML"`, via *metafor*),
and reports Cochran's $Q$, $I^2$, $\tau^2$ and a prediction interval.

```{r}
responder_analysis(sample_responder_data, mid = 1, method = "individual",
                   pooling = "random")[, c("tau2", "i2", "q", "q_p",
                                           "pi_lb", "pi_ub")]
```

Prediction intervals use a $t_{k-2}$ critical value, so they need at least three
studies and are unstable when $k$ is small; interpret them cautiously.
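
The DerSimonian-Laird quantities and the prediction interval can be sketched in base R. The per-study risk differences and variances below are hypothetical, and the code follows the textbook closed forms rather than respondeR's source.

```{r}
# Hypothetical per-study risk differences and their variances.
re_y <- c(0.30, 0.05, 0.22, -0.02)
re_v <- c(0.0040, 0.0050, 0.0035, 0.0045)
re_k <- length(re_y)

# Fixed-effect weights, Cochran's Q and the DerSimonian-Laird tau^2.
re_w    <- 1 / re_v
re_fix  <- sum(re_w * re_y) / sum(re_w)
re_q    <- sum(re_w * (re_y - re_fix)^2)
re_c    <- sum(re_w) - sum(re_w^2) / sum(re_w)
re_tau2 <- max(0, (re_q - (re_k - 1)) / re_c)
re_i2   <- max(0, (re_q - (re_k - 1)) / re_q)

# Random-effects pooled estimate and standard error.
re_ws  <- 1 / (re_v + re_tau2)
re_est <- sum(re_ws * re_y) / sum(re_ws)
re_se  <- sqrt(1 / sum(re_ws))

# Prediction interval with a t critical value on k - 2 degrees of freedom.
re_pi <- re_est + c(-1, 1) * qt(0.975, re_k - 2) * sqrt(re_tau2 + re_se^2)

c(tau2 = re_tau2, i2 = re_i2, q = re_q,
  q_p = pchisq(re_q, re_k - 1, lower.tail = FALSE),
  pi_lb = re_pi[1], pi_ub = re_pi[2])
```

With only four studies the $t_{k-2}$ quantile is large, which is why the prediction interval is much wider than a confidence interval for the pooled estimate.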

For the pooled confidence interval itself, the default Normal (Wald) interval can
under-cover when $k$ is small, because $\tau^2$ is poorly estimated. Set
`ci_method = "hksj"` for the Hartung-Knapp-Sidik-Jonkman interval, a $t$-based
interval whose width adapts to the observed dispersion of the study estimates and
which is better calibrated for few-study meta-analyses (Röver, Knapp & Friede,
2015). The example below has only three studies, exactly where this matters.

```{r}
rbind(
  wald = responder_analysis(sample_responder_data, mid = 1, method = "individual",
                            pooling = "random", ci_method = "wald")[, c("rd", "rd_lb", "rd_ub")],
  hksj = responder_analysis(sample_responder_data, mid = 1, method = "individual",
                            pooling = "random", ci_method = "hksj")[, c("rd", "rd_lb", "rd_ub")]
)
```
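
The difference between the two intervals can be seen in a small base R sketch. The inputs are hypothetical, $\tau^2$ is fixed at an illustrative value instead of being estimated, and the HKSJ variance is the standard unmodified form.

```{r}
# Hypothetical per-study risk differences, variances and a tau^2 value.
hk_y    <- c(0.30, 0.05, 0.22)
hk_v    <- c(0.0040, 0.0050, 0.0035)
hk_tau2 <- 0.01
hk_k    <- length(hk_y)

# Random-effects weights and pooled estimate (shared by both intervals).
hk_w   <- 1 / (hk_v + hk_tau2)
hk_est <- sum(hk_w * hk_y) / sum(hk_w)

# Wald: Normal quantile with the usual random-effects standard error.
hk_se_wald <- sqrt(1 / sum(hk_w))
hk_wald    <- hk_est + c(-1, 1) * qnorm(0.975) * hk_se_wald

# HKSJ: variance rescaled by the weighted dispersion, t quantile on k - 1 df.
hk_se_hksj <- sqrt(sum(hk_w * (hk_y - hk_est)^2) / ((hk_k - 1) * sum(hk_w)))
hk_hksj    <- hk_est + c(-1, 1) * qt(0.975, hk_k - 1) * hk_se_hksj

rbind(wald = c(rd = hk_est, lb = hk_wald[1], ub = hk_wald[2]),
      hksj = c(rd = hk_est, lb = hk_hksj[1], ub = hk_hksj[2]))
```

Both intervals share the same point estimate; with three studies the $t_{2}$ quantile makes the HKSJ interval the wider of the two for these inputs.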

## Refinements

* **Bounded intervals** (`ci_type = "logit"`). Proportion intervals are formed on
  the logit scale and risk-difference intervals by Newcombe's MOVER method, so
  they stay within $[0, 1]$ and $[-1, 1]$ even for extreme proportions.
* **MID uncertainty** (`mid_sd`). If the threshold is itself estimated, supplying
  its SD propagates that uncertainty into the effect-measure variances, with the
  correct between-arm correlation through the shared threshold.
* **Alternative distributions** (`dist`). The change scores can be modeled as
  lognormal or Student-$t$ instead of Normal, as a sensitivity analysis for
  skewed or heavy-tailed data (variances are obtained numerically).
* **Boundary handling.** A MID far from the observed means can make a responder
  probability equal to exactly 0 or 1, which would make log ratios, logits and
  inverse-variance weights non-finite. respondeR reports the proportions and the
  risk difference unclamped, but clamps the probabilities that feed ratios, logs
  and variances away from 0 and 1 by a tiny amount, so a sensitivity sweep over
  the MID returns finite (if wide) results instead of failing.
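
One way to see how a shared, uncertain threshold enters the risk-difference variance is a first-order delta-method sketch. All inputs are hypothetical, and this is an illustration of the idea under a Normal model, not respondeR's implementation of `mid_sd`.

```{r}
# Hypothetical arm summaries, threshold and threshold SD.
ms_mu_e <- 2.0; ms_sd_e <- 2.5; ms_n_e <- 60
ms_mu_c <- 0.8; ms_sd_c <- 2.4; ms_n_c <- 58
ms_mid  <- 1;   ms_mid_sd <- 0.2

ms_a_e <- (ms_mu_e - ms_mid) / ms_sd_e
ms_a_c <- (ms_mu_c - ms_mid) / ms_sd_c

# Sampling variance of each arm's responder probability (delta method).
ms_arm_var <- function(a, n) dnorm(a)^2 * (1 / n + a^2 / (2 * (n - 1)))
ms_var_sampling <- ms_arm_var(ms_a_e, ms_n_e) + ms_arm_var(ms_a_c, ms_n_c)

# Sensitivity of each arm to the threshold: dp/dm = -dnorm(a) / sd.
ms_dpe_dm <- -dnorm(ms_a_e) / ms_sd_e
ms_dpc_dm <- -dnorm(ms_a_c) / ms_sd_c

# Shared threshold: the arm sensitivities are differenced before squaring.
ms_var_shared <- ms_var_sampling + (ms_dpe_dm - ms_dpc_dm)^2 * ms_mid_sd^2

# Ignoring the shared threshold treats the two contributions as independent.
ms_var_indep <- ms_var_sampling + (ms_dpe_dm^2 + ms_dpc_dm^2) * ms_mid_sd^2

c(se_no_mid = sqrt(ms_var_sampling), se_shared = sqrt(ms_var_shared),
  se_independent = sqrt(ms_var_indep))
```

Because both arms move in the same direction when the threshold shifts, part of the threshold uncertainty cancels in the difference, so the shared-threshold variance is smaller than the one that ignores the correlation.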

```{r}
responder_analysis(sample_responder_data, mid = 1, method = "weighted",
                   ci_type = "logit", mid_sd = 0.2)[, c("rd", "rd_lb", "rd_ub")]
```
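
The bounded intervals and the boundary clamp can be sketched in base R. The proportions and standard errors are hypothetical, the clamp tolerance is an assumed value, and the MOVER step uses Newcombe's square-and-add combination of the two arm intervals.

```{r}
# Hypothetical responder proportions and standard errors.
bi_pe <- 0.97; bi_se_e <- 0.030
bi_pc <- 0.60; bi_se_c <- 0.060
bi_z  <- qnorm(0.975)

# Clamp away from 0 and 1 so logits stay finite (tolerance is an assumption).
bi_clamp <- function(p, eps = 1e-8) pmin(pmax(p, eps), 1 - eps)

# Interval for one proportion formed on the logit scale and back-transformed.
bi_logit_ci <- function(p, se) {
  p <- bi_clamp(p)
  plogis(qlogis(p) + c(-1, 1) * bi_z * se / (p * (1 - p)))
}
bi_ci_e <- bi_logit_ci(bi_pe, bi_se_e)
bi_ci_c <- bi_logit_ci(bi_pc, bi_se_c)

# MOVER: combine the two arm intervals into one for the difference.
bi_rd <- bi_pe - bi_pc
bi_rd_ci <- c(
  bi_rd - sqrt((bi_pe - bi_ci_e[1])^2 + (bi_ci_c[2] - bi_pc)^2),
  bi_rd + sqrt((bi_ci_e[2] - bi_pe)^2 + (bi_pc - bi_ci_c[1])^2)
)

# Wald bounds for the experimental arm, for comparison.
bi_wald_e <- bi_pe + c(-1, 1) * bi_z * bi_se_e

rbind(logit_pe = bi_ci_e, wald_pe = bi_wald_e, mover_rd = bi_rd_ci)
```

For these inputs the Wald upper bound for the experimental arm exceeds 1, while the logit interval stays inside $(0, 1)$ and the MOVER interval inside $[-1, 1]$.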

## Assumptions and limitations

* **Normality of change scores.** The cut-point probabilities assume the
  patient-level change is Normal within each arm. Skewed outcomes can bias the
  responder proportions; try `dist = "lognormal"`/`"t"` as a sensitivity check.
* **Summary-statistic input.** Only means, SDs and sample sizes are used; the
  method cannot recover information lost in aggregation.
* **Choice of MID.** Results depend on the threshold. Report the MID, and
  consider the threshold-free CLES alongside.
* **Normal-approximation intervals.** Wald intervals can fall outside valid
  bounds for extreme proportions or tiny samples; prefer `ci_type = "logit"`
  there.

## Choosing a method

| If you want… | Use |
|--------------|-----|
| A defensible default that respects each study's scale | `individual` (fixed or random) |
| The paper's pool-then-dichotomize estimator | `weighted` |
| A robustness or sensitivity summary | `median` / `unweighted` (point estimates) |
| A cross-check via a different bridge to risks | `smd` |
| To avoid choosing a threshold altogether | `responder_cles()` |
| Relative rather than absolute effects | the `rr` / `or` columns; `nnt` for impact |
| Between-study heterogeneity quantified | `pooling = "random"` |

## References

Sofi-Mahmudi, A. (2024). Identifying an optimal strategy for converting pain as
a continuous outcome to a responder analysis [Master's thesis, McMaster
University]. MacSphere. https://hdl.handle.net/11375/30210

Thorlund, K., Walter, S. D., Johnston, B. C., Furukawa, T. A., & Guyatt, G. H.
(2011). Pooling health-related quality of life outcomes in meta-analysis: a
tutorial and review of methods for enhancing interpretability. *Research
Synthesis Methods*, 2(3), 188-203. doi:10.1002/jrsm.46

Altman, D. G. (1998). Confidence intervals for the number needed to treat.
*BMJ*, 317(7168), 1309-1312.

Anzures-Cabrera, J., Sarpatwari, A., & Higgins, J. P. T. (2011). Expressing
findings from meta-analyses of continuous outcomes in terms of risks.
*Statistics in Medicine*, 30(25), 2867-2880. doi:10.1002/sim.4298

Chinn, S. (2000). A simple method for converting an odds ratio to effect size for
use in meta-analysis. *Statistics in Medicine*, 19(22), 3127-3131.

McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic.
*Psychological Bulletin*, 111(2), 361-365.

Röver, C., Knapp, G., & Friede, T. (2015). Hartung-Knapp-Sidik-Jonkman approach
and its modification for random-effects meta-analysis with few studies. *BMC
Medical Research Methodology*, 15, 99. doi:10.1186/s12874-015-0091-1