This vignette gives details of the log-likelihood functions in this package.

The likelihood functions are expressed in terms of survival functions; *S*(*t*), *f*(*t*), etc...

The qualitative and quantitative behaviour of these survival functions depends on the probability distribution from which they arise (*Weibull, Gumbel*, etc...) and the value of the location and scale parameters (*a, b*) determining where and over what range mortality is distributed along the time axis. The location and scale parameters can be made functions themselves, e.g., of infectious dose, host age or gender, etc...

The next sections outline how the likelihood functions are derived. These are followed by details of each likelihood function in this package and the assumptions they make about the data analysed.

Jump to: Likelihood functions in this package

Data collected during the course of a survival experiment provides information on the frequency of individuals dying between sampling intervals. These can be used to estimate the probability density function, *f*(*t*), for the probability an individual alive at time *t*_{0} will die at time *t*.

Maximum likelihood techniques can be applied to this expression to find the combination of which probability distribution and values of their associated location and scale parameters are most likely to describe the observed pattern for the frequency of individuals dying at times *t* in an experiment.

The observed data will provide maximum information for estimating the likelihood when the time of death of all individuals in an experiment is known. In this case, the frequencies of individuals dying at each time *t* will sum to 1. However empirical studies are frequently terminated before all, or even most, individuals die. Consequently these individuals do not experience the event of interest and do contribute towards the estimation of *f*(*t*) as the time of their death is unknown. These are known as censored individuals, or more precisely, as right-censored individuals. They include individuals removed from populations during the course of an experiment, e.g. to control for infection success, those that escape, or are accidently killed, where the timing of the event is known. The next section indicates how likelihood models can adapt to right-censored individuals.

Although the time of death of right-censored individuals is unknown, it is known they survived at least until the time *t* when they were censored. This latter information can contribute towards likelihood expressions as the probability density function *f*(*t*) equals the probability of surviving until time *t*, *S*(*t*), multiplied by the probability of dying at time *t* for those surviving until time *t*, *h*(*t*),

This expression can allow for right-censored individuals,

\[\begin{equation} f(t) = \left[ h(t) \right]^d \cdot S(t) \end{equation}\] wherewhich can be used to analyse survival data, allowing for right-censored individuals.

The likelihood expression above are suitable for situations in which the observed patterns of mortality in a host population are assumed to be determined by single source of mortality, background mortality. This is assumed to be the case for individuals in uninfected or control populations. Hence the **observed** pattern of mortality in an uninfected treatment is expected to equal,

Under the assumptions of relative survival, infected individuals are assumed to experience two independent and mutually exclusive sources of mortality; background mortality and mortality due to infection. Consequently the **observed** pattern of mortality in an infected treatment is expected to equal,

where indices '1' and '2' indicate background mortality and mortality due to infection, respectively. Here it assumed all the individuals in an infected treatment are infected and virulence is homogeneous, i.e., as for background mortality, a single hazard function with a particular combination of values for its location and scale parameters can describe the pattern of mortality due to infection experienced by every member of the infected treatment or population.

These expressions can be subtituted into the likelihood expression above to get an expression for the likelihood of observing mortality in an infected treatment at time *t*. As infected and uninfected treatments are assumed to experience the same background mortality, the likelihood expressions for the observed mortality in each treatment can be combined as,

where *g* is an infection-treatment indicator taking a value of '1' for infected treatments and '0' for uninfected treatments. Hence individuals in the infected treatments contribute towards estimates of background mortality and mortality due to infection, whereas the uninfected treatments only contribute towards the estimation of background mortality.

The following describes the negative log-likelihood (*nll*) functions available in this package for estimating the rate of background mortality and virulence. The negative of the log-likelihood is returned as this is required as input for the maximum likelihood estimation of parameters by the function *mle2* in the package *bbmle* by Ben Bolker and the R Core Development team.

In each case ;

- Indices '1' and '2' denote background mortality and mortality due to infection, respectively
*d*a death indicator; '1' died during experiment, '0' right-censored during or at end of experiment*g*an infection-treatment indicator; '1' infected treatment, '0' uninfected treatment.- All models assume each member of an uninfected treatment is uninfected, but not all models assume each member of an infected treatment is infected.
- Some models assume all members of an infected treatment are infected, but allow for variation in virulence among members for the population; this variation can be discreately or continously distributed, and be directly observed or assumed to be present.
- There is also a model allowing for recovery from infection.

nll_basic | nll_basic_logscale | nll_controls | nll_exposed_infected | nll_frailty | nll_frailty_correlated | nll_frailty_logscale | nll_frailty_shared | nll_proportional_virulence | nll_recovery | nll_recovery_II | nll_two_inf_subpops_obs | nll_two_inf_subpops_unobs

This function is for the 'basic' log-likelihood expression described above. It assumes all the individuals in the infected population are infected and they all experience the same pattern of mortality due to infection, i.e. virulence is homogeneous for the population.

\[\begin{align} S_{OBS.INF}(t) &= S_1(t) \cdot S_2(t) \\ \\ f_{OBS.INF}(t) &= f_1(t) \cdot S_2(t) + f_2(t) \cdot S_1(t) \\ \\ h_{OBS.INF}(t) &= f_{OBS.INF}(t) \;/\; S_{OBS.INF}(t) \\ \\ &= h_1(t) + h_2(t) \end{align}\]Hence the observed rate of mortality in the infected population at time *t* is equal to the the rate of background mortality at time time *t*, *h*_{1}(*t*), plus the rate of mortality due to infection at time *t*, *h*_{2}(*t*), that is, the pathogen's virulence at time *t*.

The log-likelihood expression for the probability of dying at time *t* in the infected and uninfected treatments is,

As for *nll_basic*, except input values of location and scale parameters are assumed to be on a logscale.

This function is for uninfected or control data only. It assumes the observed pattern of mortality is due only to background mortality,

\[\begin{align} S_{OBS.UNINF}(t) &= S_1(t) \\ \\ f_{OBS.UNINF}(t) &= f_1(t) \\ \\ h_{OBS.UNINF}(t) &= f_1(t) \;/\; S_1(t) \\ \\ &= h_1(t) \end{align}\]The log-likelihood expression for the probability of dying at time *t* in the uninfected treatment is,

Exposure to infection does not garantee infection. This function assumes only a proportion *p* of the individuals exposed to infection experience both background mortality and an increased rate of mortality due to infection, while the remaining proportion (1 - *p*) experience only background mortality.

The expressions describing the observed patterns of mortality in the infected treatment are,

\[\begin{align} S_{OBS.INF}(t) &= p \cdot [S_1(t) \cdot S_2(t)] + (1-p) \cdot [S_1(t)] \\ \\ f_{OBS.INF}(t) &= p \cdot [f_1(t) \cdot S_2(t) + f_2(t) \cdot S_1(t)] + (1-p) \cdot f_1(t) \\ \\ h_{OBS.INF}(t) &= S_{OBS.INF}(t) \;/\; f_{OBS.INF}(t) \\ \\ \end{align}\]where *p* is a constant to be estimated; 0 ≤ *p* ≤ 1. Here the pattern of mortality experienced by members of the treatment exposed to infection is no longer homogeneous as it varies according to whether an individual is infected or not.

The overall log-likelihood expression for the observed pattern of mortality in the infected and uninfected treatments is,

\[\begin{align} \log L=\sum_{i=1}^{n}match(treatment) \begin{cases} infected & \Rightarrow d \log \left[ h_{OBS.INF}(t_i)\right] +\log\left[ S_{OBS.INF}(t_i) \right] \\ \\ uninfected & \Rightarrow d \log \left[ h_1(t_i)\right] +\log\left[ S_1(t_i) \right] \\ \end{cases} \\ \\ \end{align}\]This type of model is sometimes referred to as a 'cure' model [1]. This is not because infected hosts recover from infection, but rather because there will be proportionately fewer infected individuals in the 'infected' population over time due to their higher rate of mortality, i.e., the 'infected' population is progressively 'cured' of its infected members.

This function assumes all the individuals in an infected treatment are infected, but there is unobserved variation in the pathogen's virulence. It is assumed there is an underlying rate of mortality due to infection, *h _{V}*(

In this package \(\lambda\) is assumed to follow either the gamma or inverse Gaussian distribution. In the case of the gamma distribution, the hazard function for mortality due to infection, *h*_{2}(*t*), is

where *H _{V}*(

The corresponding cumulative survival function for mortality due to infection, *S*_{2}(*t*), is

In the case of the inverse Gaussian distribution, the hazard function for mortality due to infection, *h*_{2}(*t*), is

and the cumulative survival function, *S*_{2}(*t*),

In both cases the overall log-likelihood expression for the observed mortality in infected and uninfected treatments is,

\[\begin{equation} \log L=\sum_{i=1}^{n}\left\lbrace d \log \left[ h_1(t_i) + g \cdot h_2(t_i) \right] +\log\left[ S_1(t_i) \right] + g \cdot \log[S_2(t_i)] \right\rbrace \end{equation}\]with the expressions for *h*_{2}(*t*) and *S*_{2}(*t*) corresponding with the probability distribution describing the unobserved variation in virulence.

This function is identical to the function *nll_frailty* except it values for location and scale parameters are on a logscale; this can help convergence during maximum likelihood estimation.

This function assumes a proportional hazards relationship for virulence among infected treatments within an experiment, such that,

\[\begin{equation} h_A(t) \;/\; h_B(t) = c \end{equation}\]where *h _{A}*(

A treatment of infected hosts is chosen as the reference population. The observed patterns of survival in this treatment are described in the same way as for the function *nll_basic*,

where the indice *REF* indicates the reference treatment or population of infected hosts.

The patterns of survival and mortality in other infected treatments are estimated as a function of those for the reference population,

\[\begin{align} S_{OBS.ALT}(t) &= S_1(t) \cdot \left[ S_{REF}(t) \right]^\theta \\ \\ f_{OBS.ALT}(t) &= f_1(t) \cdot \left[ S_{REF}(t) \right]^\theta + \theta \cdot \left[ S_{REF}(t) \right]^{\theta - 1} \cdot f_{REF}(t) \cdot S_1(t) \\ \\ h_{OBS.ALT}(t) &= f_{OBS.ALT}(t) \;/\; S_{OBS.ALT}(t) \\ \\ &= h_1(t) + \theta \cdot h_{REF}(t) \end{align}\]where the indice *ALT* indicates and alternative infected treatment, and \(\theta\) is a constant scaling the pathogen's virulence relative to that in the reference population; \(\theta > 0\)

The overall log-likelihood expression for the observed pattern of mortality in the infected and uninfected treatments is,

\[\begin{align} \log L=\sum_{i=1}^{n}match(treatment) \begin{cases} uninf & \Rightarrow d \log \left[ h_1(t_i)\right] +\log \left[ S_1(t_i) \right] \\ \\ ref & \Rightarrow d \log \left[ h_1(t_i) + h_{REF}(t_i) \right] +\log \left[ S_1(t_i) \right] + \log \left[ S_{REF}(t_i) \right] \\ \\ alt & \Rightarrow d \log \left[ h_1(t_i) + \theta h_{REF}(t_i) \right] +\log \left[ S_1(t_i) \right] + \theta \log \left[ S_{REF}(t_i) \right] \\ \end{cases} \\ \\ \end{align}\]where *uninf, ref* and *alt* refer to the uninfected, infected reference and alternative infected treatments, respectively.

This function allows for infected hosts that recover from infection. It assumes that all hosts in an infected treatment are initially infected and experience increased mortality due to infection. However hosts can recover from infection, such that, recovered individuals only experience background mortality equal to that of hosts in a matching uninfected or control treatment.

The timing of recovery from infection is assumed to follow a probability distribution and to be independent of the probability distributions for the timing of background mortality and mortality due to infection. Hence the pattern of events in an infected treatment at time *t*, *S _{INF.POP}*(

where *S*_{1}(*t*) is the cumulative survival function for background mortality at time *t*, *S*_{2}(*t*) is the cumulative survival function for mortality due to infection at time *t*, and *S*_{3}(*t*) is the cumulative probaility that an infection 'survives' until time *t*. Here the index 'INF.POP' is used rather than 'OBS.INF' as recovery from infection may not be an observed event.

where the sum of the first two expressions gives the probability an infected host is infected and dies at time *t*, from either background mortality or mortality due to infection, and corresponds with data collected on the time of death of infected hosts.

The third expression describes the probability an infected host is alive and recovers from infection at time *t*, and corresponds with data collected on the timing of recovery of infected hosts. Hence the expression can be estimated when the timing of recovery is known. This will not be the case if a host's recovery status is only determined after the host has died or been censored, as the data collected correspond with the time hosts recovered and subsequently survived before dying or being censored. However it is assumed recovered individuals experience the same background mortality as uninfected hosts. This information can be used to estimate the likelihood a recovered individual dying at time *t*, recovered at an earlier time and subsequently survived until time *t*, when it died or was censored. For example, in an experiment recording survival daily, the probability a recovered individual dies on the second day *t*_{2} can be estimated from,

where the first line gives the probability an individual survives until and recovers on the second day, multiplied by the background rate of mortality on day 2. The second line gives the probability an individual recovered on the first day, survived background mortality from day 1 to day 2, *S*_{1}(*t*_{2}) / *S*_{1}(*t*_{1}), and died of background mortality on day 2. Multiplication by *h*_{1}(*t*_{2}) would be omitted in cases where a recovered individual was censored at the end of day 2. Hence observed data for the times when recovered individuals die or are censored can be used to estimate the unobserved distribution of recovery times.

The *nll_recovery* function assumes all the individuals in an infected treatment were initially infected and their recovery status is known at the time of their death or censoring, i.e., still infected vs. recovered. This information is used along with data on the timing of death or censoring in a matching uninfected or control treatment to calculate the likelihood of the events being described by the model.

NB This function requires the data to be specified in a specific format different to that of other functions.

This is essentially the same model *nll_recovery*, except it assumes there was no background mortality, such that, *S*_{1}(*t*) = 1 and *f*_{1}(*t*) = 0 throughout the experiment.

In such cases the only observed dynamics will be the death of infected hosts due to infection, before they recover from infection, *f*_{2}(*t*)*S*_{1}(*t*)*S*_{3}(*t*), or *f*_{2}(*t*)*S*_{3}(*t*) as *S*_{1}(*t*) = 1.

As recovered hosts are assumed to only experience the same background mortality as individuals in a matching control treatment, any recovered individuals will all survive to be right-censored at the end of the experiment.

This function assumes all the individuals in an infected treatment are infected but there is data identifying the presence of two discrete subpopulations, e.g., *A* and *B* in proportions *p* and (1 - *p*), respectively, where the additional rate of mortality due to infection in the two subpopulations is different, e.g., hosts dying with/without visible signs of infection or those with viral titres above/below a threshold value.

The observed patterns of mortality in the infected treatment as a whole will be,

\[\begin{align} S_{OBS.INF}(t) &= p \cdot [ S_1(t) \cdot S_{2A}(t)] + (1-p) \cdot [S_1(t) \cdot S_{2B}(t)] \\ \\ f_{OBS.INF}(t) &= p \cdot [f_1(t) \cdot S_{2A}(t) + f_{2A}(t) \cdot S_1(t)] + (1-p) \cdot [f_1(t) \cdot S_{2B}(t) + f_{2B}(t) \cdot S_1(t)] \\ \\ h_{OBS.INF}(t) &= S_{OBS.INF}(t) \;/\; f_{OBS.INF}(t) \\ \\ \end{align}\]where the indices *2A* and *2B* denote the two discrete subpopulations of infected hosts.

Mortality in the two subpopulations is assumed to act independently, consequently the likelihood expressions for each population can be estimated separately based on their membership of subpopulation *A* or *B*,

where each subpopulation shares the same background mortality as a matching control treatment.

This function is similar to the function *nll_two_obs_inf_subpops*, except the presence of two discrete subpopulations in proportions *p* and (1 - *p*) is assumed rather than based on evidence, e.g., data on visual signs of infection or viral titres were not recorded. In this case, *p* is a parameter to be estimated (0 ≤ *p* ≤ 1).

hence the patterns of mortality and the proportions of the two supposed subpopulations needs to be inferred from the pattern of mortality for the infected population as a whole.

1. Lambert PC. 2007 Modeling of the cure fraction in survival studies. *Stata Journal* **7**, 351–375.

2. Aalen OO. 1988 Heterogeneity in survival analysis. *Statistics in Medicine* **7**, 1121–1137. (doi:10.1002/sim.4780071105)

3. Hougaard P. 1984 Life table methods for heterogeneous populations - distributions describing the heterogeneity. *Biometrika* **71**, 75–83. (doi:10.1093/biomet/71.1.75)

4. Zahl PH. 1997 Frailty modelling for the excess hazard. *Statistics in Medicine* **16**, 1573–1585. (doi:10.1002/(sici)1097-0258(19970730)16:14<1573::aid-sim585>3.0.co;2-q)