In this vignette, the fullfact
package is explored using
expert (functions designated by the number 3) for the standard model
with the ability of the user to include additional fixed and/or random
effects, such as a model including environment treatments and their
interaction, for normal data type or error structures.
Simple (functions designated by no number) for the standard model only is explored in the vignette Simple Normal Data Example.
Advanced (functions designated by the number 2) for the standard model with the options of including additional random effects for one position (e.g. tank) and/or one block effect (e.g. several blocks of 2 \(\times\) 2 factorial matings) is explored in the vignette Advanced Normal Data Example.
Non-normal error structures (e.g. binary, proportion, and/or count data types) are explored in another three vignettes: (1) Simple Non-normal Data Example, (2) Advanced Non-normal Data Example, and (3) Expert Non-normal Data Example.
The example data set is an 11 \(\times\) 11 full factorial mating: 11 dams and 11 sires with all combinations resulting in 121 families. There are 10 observations per family or 5 observations for each of two replicates per family.
library("fullfact")
data(chinook_length)
head(chinook_length)
#> family repli dam sire tray cell length egg_size
#> 1 f1 r1 d1 s1 t7 1A 22.7 7.27
#> 2 f1 r2 d1 s1 t8 1A 22.0 7.27
#> 3 f1 r1 d1 s1 t7 1A 23.8 7.27
#> 4 f1 r2 d1 s1 t8 1A 21.8 7.27
#> 5 f1 r1 d1 s1 t7 1A 22.6 7.27
#> 6 f1 r2 d1 s1 t8 1A 22.4 7.27
Displayed are columns for family identities (ID), replicate ID, dam ID, sire ID, incubation tray ID, incubation cell ID (within tray), Chinook salmon length (mm) at hatch, and dam egg size (mm).
Model random effects are dam, sire, dam by sire, and any additional fixed and/or random effects. Extracts the dam, sire, dam, dam by sire, and residual variance components. Extracts any additional fixed effect and/or random effect variance components. The fixed effect variance component is as a single group using the method described by Nakagawa and Schielzeth (2013). Calculates the total variance component. Calculates the additive genetic, non-additive genetic, and maternal variance components.
Assuming the effects of epistasis are of negligible importance, the additive genetic variance (VA) component is calculated as four times the sire (VS), the non-additive genetic variance (VN) component as four times the dam by sire interaction (VD\(\times\)S), and the maternal variance component (VM) as the dam (VD) – sire (VS) (Lynch and Walsh 1998, p. 603). When there is epistasis, those variance components will be overestimated and this may explain why the percentage of phenotypic variance explained by the components can add up to more than 100% in certain cases.
Significance values for the random effects are determined using likelihood ratio tests (Bolker et al. 2009).
Significance values for any fixed effects are determined using
likelihood ratio tests (LRT) and a parametric bootstrap method (Bolker
et al. 2009) from the mixed
function of the
afex
package. LRT is not generally recommended for fixed
effects as there are issues calculating the denominator degrees of
freedom.
remain
is the remaining formula using lme4
package format.
iter
is the number of iterations for computing the
parametric bootstrap significance value for any fixed effects, typically
1,000. For the example, 100 iterations was used.
Nakagawa S, Schielzeth H. 2013. A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods in Ecology and Evolution 4(2): 133-142. DOI: 10.1111/j.2041-210x.2012.00261.x
Lynch M, Walsh B. 1998. Genetics and Analysis of Quantitative Traits. Sinauer Associates, Massachusetts.
Bolker BM, Brooks ME, Clark CJ, Geange SW, Poulsen JR, Stevens MHH, White J-SS. 2009. Generalized linear mixed models: a practical guide for ecology and evolution. Trends in Ecology and Evolution 24(3): 127-135. DOI: 10.1016/j.tree.2008.10.008
#>length_mod3<- observLmer3(observ=chinook_length,dam="dam",sire="sire",response="length",
#>remain="egg_size + (1|tray)",iter=1000) #full
length_mod3<- observLmer3(observ=chinook_length,dam="dam",sire="sire",response="length",
remain="egg_size + (1|tray)",iter=100)
#> [1] "2024-02-04 15:49:06 PST"
#> Time difference of 11.26486 secs
length_mod3
#> $fixed
#> effect variance percent Chi.sq p.value
#> 1 egg_size NA NA 3.273533 0.07920792
#> 2 Fix_Tot 0.05967965 5.660793 NA NA
#>
#> $LRT.fixed
#> term d.AIC d.BIC Chi.sq p.value
#> 1 egg_size 1.273533 -3.824843 3.273533 0.07040586
#>
#> $random
#> effect effect2 variance percent
#> 1 dam:sire (Intercept) 1.788054e-01 1.696022e+01
#> 2 tray (Intercept) 1.197791e-01 1.136141e+01
#> 3 sire (Intercept) 2.251875e-09 2.135970e-07
#> 4 dam (Intercept) 1.461075e-01 1.385873e+01
#>
#> $LRT.random
#> term d.AIC d.BIC Chi.sq p.value
#> 1 (1 | dam) 14.15463 9.056258 16.15463 5.837558e-05
#> 2 (1 | sire) -2.00000 -7.098376 0.00000 1.000000e+00
#> 3 (1 | dam:sire) 141.43393 136.335557 143.43393 4.724486e-33
#> 4 (1 | tray) 122.90989 117.811511 124.90989 5.325918e-29
#>
#> $other
#> component variance percent
#> 1 Residual 0.5498915 52.15885
#> 2 Total 1.0542631 100.00000
#>
#> $calculation
#> component variance percent
#> 1 additive 9.007498e-09 8.543881e-07
#> 2 nonadd 7.152215e-01 6.784089e+01
#> 3 maternal 1.461075e-01 1.385873e+01
Produces a list object containing up to six data frames. Fixed, random, other, and calculation data frames contain the raw variance components and the variance components as a percentage of the total variance component. The fixed data frame also contains the parametric bootstrap Chi-square and p-value for any fixed effects. LRT.random and LRT.fixed data frames contain the difference in AIC and BIC, and likelihood ratio test Chi-square and p-value for random and any fixed effects, respectively.
Note
Default is Restricted maximum likelihood (REML) as
ml = F
. Option for maximum likelihood (ML) is
ml = T
.
Maximum likelihood (ML) estimates the parameters that maximize the likelihood of the observed data and has the advantage of using all the data and accounting for non-independence (Lynch and Walsh 1998, p. 779; Bolker et al. 2009). On the other hand, ML has the disadvantage of assuming that all fixed effects are known without error, producing a downward bias in the estimation of the residual variance component. This bias can be large if there are lots of fixed effects, especially if sample sizes are small. Restricted maximum likelihood (REML) has the advantage of not assuming the fixed effects are known and averages over the uncertainty, so there can be less bias in the estimation of the residual variance component. However, REML only maximizes a portion of the likelihood to estimate the effect parameters, but is the preferred method for analyzing large data sets with complex structure.
Power values are calculated by stochastically simulating data for a number of iterations and then calculating the proportion of P-values less than \(\alpha\) (e.g. 0.05) for each component (Bolker 2008). Simulated data are specified by inputs for variance component values and the sample sizes.
Bolker BM. 2008. Ecological Models and Data in R. Princeton University Press, Princeton.
Defaults are alpha = 0.05
for 5%,
nsim = 100
for 100 simulations, and ml = F
for
REML. Other default is ftest = "LR"
for likelihood ratio
test for fixed effects; option of “PB” for parametric bootstrap for
which the number of iterations need to be specified,
e.g. iter=1000
.
var_rand
is a vector of dam, sire, dam by sire,
residual, and remaining random variance components, i.e. c(dam,sire,dam
\(\times\)
sire,residual,rand1,rand2,etc.).
n_rand
is a vector of dam, sire, family (i.e. dam \(\times\) sire), and remaining random sample
sizes, i.e. c(dam,sire,family,rand1,rand2,etc.).
design
is a data frame of the experimental design, using
only integers. First three columns must contain and be named “dam”,
“sire”, “family”. Remaining columns are the random effects followed by
the fixed effects. Continuous fixed effects are a column containing the
values 1:nrow(design).
remain
is the remaining formula using lme4
package format. Must be random effects followed by fixed effects. No
interactions or random slopes; formulate as intercepts in design.
var_fix
is a vector of known fixed variance components,
i.e. c(fix1,fix2,etc.). Continuous fixed random values are sorted to
match column values.
n_fix
is a vector of known fixed sample sizes,
i.e. c(fix1,fix2,etc.). Continuous fixed effects must have a sample size
of 1.
For this example, the random effect variance components of
observLmer3
above are used (i.e. dam= 0.1461, sire= 0, dam
\(\times\) sire= 0.1788, residual=
0.5499, tray= 0.1198) and the sample size of the Chinook salmon data set
(i.e. dam= 11, sire= 11, family= 121 (11 \(\times\) 11), tray= 11). The actual design
was composed of 16 trays with 55–80 offspring each. However,
powerLmer3
uses an equal number of offspring per position,
so the number of trays was decreased from 16 to 11. The fixed effect
variance component (egg_size) is 0.0597 and sample size is 11 as the
mean egg size per dam was used for each offspring. If egg size was more
of a continuous effect, i.e. differed among the offspring within dam,
then the sample size would be 1 for the simulation.
The Chinook salmon data set is reworked to contain only integers for
design
.
Full analysis is 100 simulations. Example has 25 simulations.
#Reworking the Chinook salmon data set to contain only integers for `design`
#dam ID, sire ID, family ID
desn0<- data.frame(dam=rep(1:11,each=11),sire=rep(1:11,11),family=1:(11*11))
#replicate for offspring sample size (10)
desn<- do.call("rbind", replicate(10,desn0,simplify=F)); rm(desn0)
desn$tray<- rep(1:11,each=nrow(desn)/11) #equal number of offspring per tray
desn$tray<- sample(desn$tray,nrow(desn)) #shuffle tray numbers
desn$egg_size<- desn$dam #egg size is related to dam
head(desn)
#> dam sire family tray egg_size
#> 1 1 1 1 8 1
#> 2 1 2 2 3 1
#> 3 1 3 3 7 1
#> 4 1 4 4 3 1
#> 5 1 5 5 7 1
#> 6 1 6 6 1 1
#>powerLmer3(var_rand=c(0.1461,0,0.1788,0.5499,0.1198),n_rand=c(11,11,121,11),var_fix=0.0597,
#>n_fix=11,design=desn,remain="(1|tray) + egg_size") #full with LR
#full with PB and 1000 iterations
#>powerLmer3(var_rand=c(0.1461,0,0.1788,0.5499,0.1198),n_rand=c(11,11,121,11),var_fix=0.0597,
#>n_fix=11,design=desn,remain="(1|tray) + egg_size",ftest="PB",iter=1000)
powerLmer3(var_rand=c(0.1461,0,0.1788,0.5499,0.1198),n_rand=c(11,11,121,11),var_fix=0.0597,
n_fix=11,design=desn,remain="(1|tray) + egg_size",nsim=25) #25 simulations with LR
#> [1] "2024-02-04 15:49:18 PST"
#> [1] "Starting simulation: 1"
#> [1] "Starting simulation: 2"
#> [1] "Starting simulation: 3"
#> [1] "Starting simulation: 4"
#> [1] "Starting simulation: 5"
#> [1] "Starting simulation: 6"
#> [1] "Starting simulation: 7"
#> [1] "Starting simulation: 8"
#> [1] "Starting simulation: 9"
#> [1] "Starting simulation: 10"
#> [1] "Starting simulation: 11"
#> [1] "Starting simulation: 12"
#> [1] "Starting simulation: 13"
#> [1] "Starting simulation: 14"
#> [1] "Starting simulation: 15"
#> [1] "Starting simulation: 16"
#> [1] "Starting simulation: 17"
#> [1] "Starting simulation: 18"
#> [1] "Starting simulation: 19"
#> [1] "Starting simulation: 20"
#> [1] "Starting simulation: 21"
#> [1] "Starting simulation: 22"
#> [1] "Starting simulation: 23"
#> [1] "Starting simulation: 24"
#> [1] "Starting simulation: 25"
#> Time difference of 12.43282 secs
#> $group
#> group var_in var_out
#> 1 fix_eff 0.0597 0.0222310889901148
#>
#> $fixed
#> term n power
#> 1 egg_size 11 0.08
#>
#> $random
#> term n var_in var_out power
#> 1 dam 11 0.1461 0.224844825853844 1
#> 2 sire 11 0 0.0060524340096173 0
#> 3 dam.sire 121 0.1788 0.173533124995259 1
#> 4 tray 11 0.1198 0.136720128827227 1
#> 5 residual <NA> 0.5499 0.535497995336952 <NA>
There is sufficient power (\(\ge\) 0.8) for dam and dam by sire variance components, whereas there is insufficient power (< 0.8) for the sire variance component. Albeit, the sire component is near zero, so the low power may be an artifact. There was also sufficient power for tray and egg size variance components.In the cases of insufficient power, the sample size of dam, sire, and/or offspring per family can be increased until there is sufficient power.
Taking the reverse approach (can the sample size of dam, sire or offspring per family be reduced while maintaining sufficient power?) using the same variance components and offspring per family sample size, dam and sire sample sizes could be reduced from 11 to 7. The tray sample size was reduced accordingly to have an equal number of offspring per tray, i.e. 7 dams \(\times\) 7 sires \(\times\) 10 offspring = 490, which can divided equally by 10 trays. The egg size sample size was only reduced accordingly to match the change in dam sample size.
#dam ID, sire ID, family ID
desn0_2<- data.frame(dam=rep(1:7,each=7),sire=rep(1:7,7),family=1:(7*7))
#replicate for offspring sample size (10)
desn_2<- do.call("rbind", replicate(10,desn0_2,simplify=F)); rm(desn0)
desn_2$tray<- rep(1:10,each=nrow(desn_2)/10) #equal number of offspring per tray
desn_2$tray<- sample(desn_2$tray,nrow(desn_2)) #shuffle tray numbers
desn_2$egg_size<- desn_2$dam #egg size is related to dam
#>powerLmer3(var_rand=c(0.1461,0,0.1788,0.5499,0.1198),n_rand=c(7,7,49,10),var_fix=0.0597,
#>n_fix=7,design=desn_2,remain="(1|tray) + egg_size") #full with LR
#full with PB and 1000 iterations
#>powerLmer3(var_rand=c(0.1461,0,0.1788,0.5499,0.1198),n_rand=c(7,7,49,10),var_fix=0.0597,
#>n_fix=7,design=desn_2,remain="(1|tray) + egg_size",ftest="PB",iter=1000)
powerLmer3(var_rand=c(0.1461,0,0.1788,0.5499,0.1198),n_rand=c(7,7,49,10),var_fix=0.0597,
n_fix=7,design=desn_2,remain="(1|tray) + egg_size",nsim=25) #25 simulations with LR
#> [1] "2024-02-04 15:49:30 PST"
#> [1] "Starting simulation: 1"
#> [1] "Starting simulation: 2"
#> [1] "Starting simulation: 3"
#> [1] "Starting simulation: 4"
#> [1] "Starting simulation: 5"
#> [1] "Starting simulation: 6"
#> [1] "Starting simulation: 7"
#> [1] "Starting simulation: 8"
#> [1] "Starting simulation: 9"
#> [1] "Starting simulation: 10"
#> [1] "Starting simulation: 11"
#> [1] "Starting simulation: 12"
#> [1] "Starting simulation: 13"
#> [1] "Starting simulation: 14"
#> [1] "Starting simulation: 15"
#> [1] "Starting simulation: 16"
#> [1] "Starting simulation: 17"
#> [1] "Starting simulation: 18"
#> [1] "Starting simulation: 19"
#> [1] "Starting simulation: 20"
#> [1] "Starting simulation: 21"
#> [1] "Starting simulation: 22"
#> [1] "Starting simulation: 23"
#> [1] "Starting simulation: 24"
#> [1] "Starting simulation: 25"
#> Time difference of 7.957783 secs
#> $group
#> group var_in var_out
#> 1 fix_eff 0.0597 0.0415828511017653
#>
#> $fixed
#> term n power
#> 1 egg_size 7 0.16
#>
#> $random
#> term n var_in var_out power
#> 1 dam 7 0.1461 0.168882750794075 0.8
#> 2 sire 7 0 0.00548358319542521 0
#> 3 dam.sire 49 0.1788 0.174208501240021 1
#> 4 tray 10 0.1198 0.117153234207914 1
#> 5 residual <NA> 0.5499 0.545370141358096 <NA>
Confidence intervals for the additive genetic, non-additive genetic, and maternal variance components can be produced using the bootstrap-t resampling method described by Efron and Tibshirani (1993, p. 160‒162). Observations are resampled with replacement until the original sample size is reproduced. The resampled data are then used in the model and the additive genetic, non-additive genetic, and maternal variance components are extracted. The process is repeated for a number of iterations, typically 1,000 times, to produce a distribution for each component. The confidence interval lower and upper limits and median are extracted from the distribution.
Efron B, Tibshirani R. 1993. An Introduction to the Bootstrap. Chapman and Hall, New York.
The resampRepli
function is used to bootstrap resample
observations grouped by replicate ID within family ID for a specified
number of iterations to create the resampled data set. A similar
resampFamily
function is able to resample observations
grouped by family ID only.
copy
is a vector of column numbers (to copy the
contents). Does not need to contain the family and/or replicate
columns.
Full analysis is 1000 iterations. Example has 5 iterations.
#>resampRepli(dat=chinook_length,copy=c(3:8),family="family",replicate="repli",iter=1000) #full
#>resampFamily(dat=chinook_length,copy=c(3:8),family="family",iter=1000) #family only
resampRepli(dat=chinook_length,copy=c(3:8),family="family",replicate="repli",iter=5) #5 iterations
Because of the large file sizes that can be produced, the resampling of each replicate Y per family X is saved separately as a common separated (X_Y_resampR.csv) file in the working directory. These files are merged to create the final resampled data set (resamp_datR.csv).
If using resampFamily
, the file names are X_resampF.csv
per family and resamp_datF.csv for the final resampled data set.
The equivalent to observLmer3
is available for the final
bootstrap resampled data set, i.e. resampLmer3
.
Default is ml = F
for REML. The starting model number
start =
and ending model number end =
need to
be specified.
remain
is the remaining formula using lme4
package format with # sign, e.g. fixed# + (1|random#).
Full analysis is 1000 iterations. Example has 5 iterations.
#>length_datR<- read.csv("resamp_datR.csv") #1000 iterations
#>length_rcomp3<- resampLmer3(resamp=length_datR,dam="dam",sire="sire",response="length",
#>remain="egg_size# + (1|tray#)",start=1,end=1000) #full
data(chinook_resampL) #5 iterations
head(chinook_resampL)
#> dam1 sire1 tray1 cell1 length1 egg_size1 dam2 sire2 tray2 cell2 length2
#> 1 d1 s1 t7 1A 23.1 7.27 d1 s1 t7 1A 23.1
#> 2 d1 s1 t7 1A 22.7 7.27 d1 s1 t7 1A 22.7
#> 3 d1 s1 t7 1A 22.9 7.27 d1 s1 t7 1A 22.9
#> 4 d1 s1 t7 1A 23.1 7.27 d1 s1 t7 1A 22.7
#> 5 d1 s1 t7 1A 23.1 7.27 d1 s1 t7 1A 22.9
#> 6 d1 s1 t8 1A 22.4 7.27 d1 s1 t8 1A 22.3
#> egg_size2 dam3 sire3 tray3 cell3 length3 egg_size3 dam4 sire4 tray4 cell4
#> 1 7.27 d1 s1 t7 1A 23.8 7.27 d1 s1 t7 1A
#> 2 7.27 d1 s1 t7 1A 23.8 7.27 d1 s1 t7 1A
#> 3 7.27 d1 s1 t7 1A 23.8 7.27 d1 s1 t7 1A
#> 4 7.27 d1 s1 t7 1A 23.8 7.27 d1 s1 t7 1A
#> 5 7.27 d1 s1 t7 1A 23.8 7.27 d1 s1 t7 1A
#> 6 7.27 d1 s1 t8 1A 22.4 7.27 d1 s1 t8 1A
#> length4 egg_size4 dam5 sire5 tray5 cell5 length5 egg_size5
#> 1 22.6 7.27 d1 s1 t7 1A 22.9 7.27
#> 2 23.8 7.27 d1 s1 t7 1A 22.6 7.27
#> 3 23.1 7.27 d1 s1 t7 1A 22.6 7.27
#> 4 22.9 7.27 d1 s1 t7 1A 22.9 7.27
#> 5 23.8 7.27 d1 s1 t7 1A 22.7 7.27
#> 6 22.0 7.27 d1 s1 t8 1A 22.4 7.27
length_rcomp3<- resampLmer3(resamp=chinook_resampL,dam="dam",sire="sire",response="length",
remain="egg_size# + (1|tray#)",start=1,end=5)
#> [1] "2024-02-04 15:49:38 PST"
#> [1] "Working on model: 1"
#> [1] "Working on model: 2"
#> [1] "Working on model: 3"
#> [1] "Working on model: 4"
#> [1] "Working on model: 5"
#> Time difference of 0.4564121 secs
length_rcomp3[1:5,]
#> dam:sire tray sire dam Residual Fixed Total
#> 1 0.2138559 0.1160011 0.000000e+00 0.1340714 0.4995956 0.06131461 1.0248386
#> 2 0.2402589 0.1242010 6.029765e-09 0.1376951 0.5203867 0.06188805 1.0844298
#> 3 0.2194606 0.1161735 4.287067e-08 0.1357622 0.4844443 0.04275863 0.9985993
#> 4 0.1960044 0.1330174 1.204095e-03 0.1341200 0.4919724 0.05875469 1.0150730
#> 5 0.2247567 0.1466172 0.000000e+00 0.1305175 0.4787519 0.08000904 1.0606523
#> additive nonadd maternal
#> 1 0.000000e+00 0.8554237 0.1340714
#> 2 2.411906e-08 0.9610355 0.1376951
#> 3 1.714827e-07 0.8778425 0.1357622
#> 4 4.816382e-03 0.7840176 0.1329159
#> 5 0.000000e+00 0.8990270 0.1305175
The function provides a data frame with columns containing the raw variance components for dam, sire, dam by sire, residual, total, additive genetic, non-additive genetic, and maternal. Also columns containing the raw variance components for any additional fixed and/or random effects. The number of rows in the data frame matches the number of iterations in the resampled data set and each row represents a model number.
Extract the bootstrap-t confidence intervals (CI) and median for the
additive genetic, non-additive genetic, and maternal values from the
data frame of models produced using resampLmer3
. Also
extracts intervals for additional fixed and/or random effects.
Default confidence interval is 95% as level = 95
.
remain
is a vector of column names for additional
effects.
#>ciMANA3(comp=length_rcomp3,remain=c("tray","Fixed")) #full, with egg size as Fixed
data(chinook_bootL) #similar to length_rcomp3 1000 models, but has no Fixed
ciMANA3(comp=chinook_bootL,remain=c("tray","Residual"))
#> $raw
#> component lower median upper
#> 1 additive 0 0 0.03
#> 2 nonadd 0.713 0.844 0.996
#> 3 maternal 0.168 0.201 0.233
#> 4 tray 0.091 0.112 0.133
#> 5 Residual 0.466 0.514 0.562
#>
#> $percentage
#> component lower median upper
#> 1 additive 0 0 2.9
#> 2 nonadd 69.4 81.3 94.2
#> 3 maternal 16.7 19.3 22
#> 4 tray 9 10.8 12.6
#> 5 Residual 46.5 49.4 52.7
The raw values are presented and are converted to a percentage of the
total variance for each model. Defaults are the number of decimal places
to round CI raw values as rnd_r = 3
and to round the CI
percent values as rnd_p = 1
.
The bootstrap-t method may produce medians that are largely different from the observed values. However, the chinook_bootL example data for the bootstrap-t CI were produced using another model including tray but not egg size (see the vignette for Advanced Normal Data Example), which may explain differences between the 95% CI and observed values. Nonetheless, options are provided below for 95% CI that are a poor fit.
The BCa method (bias and acceleration) described by Efron and Tibshirani (1993, p.184‒188) can be used for the correction of bootstrap-t confidence intervals.
bias
is a vector of additive, non-additive, and maternal
variance components. Followed by any other components in the order of
the vector remain
,
i.e. c(additive,non-additive,maternal,component1,component2,etc.), from
the raw observed variance components of observLmer3
.
The raw variance components of the chinook_bootL model were additive=
0, non-additive= 0.7192, maternal= 0.2030, tray= 0.1077, and residual=
0.5499. Typically the variance components would be from
observLmer3
above for the analysis pipeline.
The ‘bias fail’ warning is if the bias calculation is infinity (negative or positive), e.g. bias contains a zero value, so the uncorrected confidence interval is displayed for the component.
#bias only
ciMANA3(comp=chinook_bootL,remain=c("tray","Residual"),bias=c(0,0.7192,0.2030,0.1077,0.5499))
#> $raw
#> component lower median upper change
#> 1 additive 0 0 0.03 bias fail
#> 2 nonadd 0.619 0.624 0.735 <NA>
#> 3 maternal 0.174 0.206 0.24 <NA>
#> 4 tray 0.081 0.103 0.125 <NA>
#> 5 Residual 0.534 0.587 0.6 <NA>
#>
#> $percentage
#> component lower median upper change
#> 1 additive 0 0 2.9 bias fail
#> 2 nonadd 62.5 62.6 71.5 <NA>
#> 3 maternal 17 19.7 22.5 <NA>
#> 4 tray 8 10 11.9 <NA>
#> 5 Residual 50.8 54 54.5 <NA>
#full, with egg size as Fixed, observLmer3 components
#>ciMANA3(comp=length_rcomp3,remain=c("tray","Fixed"),bias=c(0,0.7152,0.1461,0.1198,0.0567))
accel
for acceleration correction uses the delete-one
observation jackknife data set. See length_jack3 all observations in the
next section.
data(chinook_jackL)
#bias and acceleration
ciMANA3(comp=chinook_bootL,remain=c("tray","Residual"),bias=c(0,0.7192,0.2030,0.1077,0.5499),
accel=chinook_jackL)
#> $raw
#> component lower median upper change
#> 1 additive 0 0 0.03 bias fail
#> 2 nonadd 0.619 0.624 0.735 <NA>
#> 3 maternal 0.174 0.206 0.24 <NA>
#> 4 tray 0.081 0.103 0.125 <NA>
#> 5 Residual 0.534 0.587 0.6 <NA>
#>
#> $percentage
#> component lower median upper change
#> 1 additive 0 0 2.9 bias fail
#> 2 nonadd 62.5 62.6 71.5 <NA>
#> 3 maternal 17 19.7 22.5 <NA>
#> 4 tray 8 10 11.9 <NA>
#> 5 Residual 50.8 54 54.5 <NA>
#full, with egg size as Fixed, observLmer3 components
#>ciMANA3(comp=length_rcomp3,remain=c("tray","Fixed"),bias=c(0,0.7152,0.1461,0.1198,0.0567),
#>accel=length_jack3)
Jackknife resampling is another method for producing confidence intervals.
The equivalent to observLmer3
is available for jackknife
resampling, i.e. JackLmer3
, using the observed data
frame.
Default is delete-one jackknife resampling as size = 1
and REML as ml = F
.
remain
is the remaining formula using lme4
package format.
Full analysis uses all observations. Example has the first 10 observations.
#>length_jack3<- JackLmer3(observ=chinook_length,dam="dam",sire="sire",response="length",
#>remain="egg_size + (1|tray)") #full, all observations
length_jack3<- JackLmer3(observ=chinook_length,dam="dam",sire="sire",response="length",
remain="egg_size + (1|tray)",first=10) #first 10 observations
#> [1] "2024-02-04 15:49:39 PST"
#> [1] "Removing observation: 1 of 1210"
#> [1] "Removing observation: 2 of 1210"
#> [1] "Removing observation: 3 of 1210"
#> [1] "Removing observation: 4 of 1210"
#> [1] "Removing observation: 5 of 1210"
#> [1] "Removing observation: 6 of 1210"
#> [1] "Removing observation: 7 of 1210"
#> [1] "Removing observation: 8 of 1210"
#> [1] "Removing observation: 9 of 1210"
#> [1] "Removing observation: 10 of 1210"
#> Time difference of 0.9954109 secs
head(length_jack3)
#> dam:sire tray sire dam Residual Fixed Total
#> 1 0.1788309 0.1198896 5.116505e-09 0.1461743 0.5503600 0.05955364 1.054808
#> 2 0.1788610 0.1195792 0.000000e+00 0.1463643 0.5503169 0.05941014 1.054531
#> 3 0.1788239 0.1189139 0.000000e+00 0.1455017 0.5495047 0.06005521 1.052799
#> 4 0.1789238 0.1194246 0.000000e+00 0.1465669 0.5501662 0.05924982 1.054331
#> 5 0.1788550 0.1199743 1.619127e-10 0.1462468 0.5503138 0.05950599 1.054896
#> 6 0.1787815 0.1198816 0.000000e+00 0.1459642 0.5503708 0.05972850 1.054727
#> additive nonadd maternal
#> 1 2.046602e-08 0.7153237 0.1461743
#> 2 0.000000e+00 0.7154439 0.1463643
#> 3 0.000000e+00 0.7152954 0.1455017
#> 4 0.000000e+00 0.7156952 0.1465669
#> 5 6.476507e-10 0.7154199 0.1462468
#> 6 0.000000e+00 0.7151259 0.1459642
Because the delete-one observation jackknife resampling may be
computationally intensive for large data sets, the
JackLmer3
function has the option of delete-d observation
jackknife resampling, for which d > 1. The rows of the observed data
frame are shuffled and a block of observations of size d is deleted
sequentially. For example, delete-5 observation jackknife resampling is
specified as size = 5
, which deletes a block of 5
observations.
Full analysis uses all observations. Example has the first 10 observations.
#>length_jack3D<- JackLmer3(observ=chinook_length,dam="dam",sire="sire",response="length",
#>remain="egg_size + (1|tray)",size=5) #full
length_jack3D<- JackLmer3(observ=chinook_length,dam="dam",sire="sire",response="length",
remain="egg_size + (1|tray)",size=5,first=10) #first 10
#> [1] "2024-02-04 15:49:40 PST"
#> [1] "Removing block: 1 of 242"
#> [1] "Removing block: 2 of 242"
#> [1] "Removing block: 3 of 242"
#> [1] "Removing block: 4 of 242"
#> [1] "Removing block: 5 of 242"
#> [1] "Removing block: 6 of 242"
#> [1] "Removing block: 7 of 242"
#> [1] "Removing block: 8 of 242"
#> [1] "Removing block: 9 of 242"
#> [1] "Removing block: 10 of 242"
#> Time difference of 1.550363 secs
head(length_jack3D)
#> dam:sire tray sire dam Residual Fixed Total
#> 1 0.1787138 0.1199980 2.882837e-07 0.1462634 0.5520360 0.05917780 1.056189
#> 2 0.1769191 0.1211236 1.455383e-09 0.1466239 0.5511235 0.05957517 1.055365
#> 3 0.1789981 0.1194060 0.000000e+00 0.1466029 0.5521988 0.05950103 1.056707
#> 4 0.1798751 0.1200773 1.136096e-09 0.1454655 0.5492563 0.05990084 1.054575
#> 5 0.1783032 0.1196376 0.000000e+00 0.1482163 0.5508447 0.06049623 1.057498
#> 6 0.1782257 0.1201585 0.000000e+00 0.1463628 0.5504652 0.06062335 1.055836
#> additive nonadd maternal
#> 1 1.153135e-06 0.7148552 0.1462631
#> 2 5.821532e-09 0.7076763 0.1466239
#> 3 0.000000e+00 0.7159922 0.1466029
#> 4 4.544386e-09 0.7195006 0.1454655
#> 5 0.000000e+00 0.7132130 0.1482163
#> 6 0.000000e+00 0.7129028 0.1463628
Extract the jackknife confidence intervals (CI) and median for the
additive genetic, non-additive genetic, and maternal values from the
data frame of models produced using JackLmer3
. Also
extracts any additional fixed effect and/or random effect variance
components.
The mean and the standard error of pseudo-values for each variance component are calculated (Efron and Tibshirani 1993, p.184‒188). The standard error is then used with the Student’s t distribution to provide the lower and upper limits for the confidence interval. For delete-d jackknife resampling, M degrees of freedom were used for producing the confidence interval (Martin et al. 2004): M = N / d, where N is the total number of observations and d is the number of deleted observations. Large values of M, such as 1,000, can translate to the delete-d jackknife resampling method approaching bootstrap resampling expectations (Efron and Tibshirani 1993, p. 149).
Martin, H., Westad, F. & Martens, H. (2004). Improved Jackknife Variance Estimates of Bilinear Model Parameters. COMPSTAT 2004 – Proceedings in Computational Statistics 16th Symposium Held in Prague, Czech Republic, 2004 (ed J. Antoch), pp. 261-275. Physica-Verlag HD, Heidelberg.
Default confidence interval is 95% as level = 95
.
remain
is a vector of column names for additional
effects.
full
is a vector of additive, non-additive, maternal,
and total variance components. Followed by any other components in the
order of the vector remain
,
i.e. c(additive,non-additive,maternal,total,component1,component2,etc.),
from the raw observed variance components of
observLmer3
.
The chinook_jackL example data for the jackknife CI were produced
using another model including tray but not egg size (see the vignette
for Advanced Normal Data Example). The raw variance
components of this model were additive= 0, non-additive= 0.7192,
maternal= 0.2030, total= 1.0404, tray= 0.1077, and residual= 0.5499.
Typically the variance components would be from observLmer3
above for the analysis pipeline.
data(chinook_jackL) #similar to length_jack3 all observations
ciJack3(comp=chinook_jackL,remain=c("tray","Residual"),full=c(0,0.7192,0.2030,1.0404,0.1077,
0.5499))
#> $raw
#> component lower mean upper
#> 1 additive 0 0 0
#> 2 nonadd 0.481 0.691 0.9
#> 3 maternal 0.197 0.244 0.291
#> 4 tray 0.024 0.056 0.088
#> 5 Residual 0.519 0.519 0.519
#>
#> $percentage
#> component lower mean upper
#> 1 additive 0 0 0
#> 2 nonadd 52.2 69.7 87.2
#> 3 maternal 20.6 24.4 28.2
#> 4 tray 3.2 5.9 8.6
#> 5 Residual 48.4 52.3 56.2
#full, all observations, with egg size as Fixed, observLmer3 components
#>ciJack3(comp=length_jack3,remain=c("tray","Fixed"),full=c(0,0.7152,0.1461,1.0543,0.1198,
#>0.0597))
The raw values are presented and are converted to a percentage of the
total variance for each model. Defaults are the number of decimal places
to round CI raw values as rnd_r = 3
and to round the CI
percent values as rnd_p = 1
.
The barMANA
and boxMANA
functions are
simple plotting functions for the confidence intervals or all values,
respectively, from the bootstrap and jackknife approaches. Default is to
display the percentage values as type = perc
. Raw values
can be displayed as type = raw
.
Within the functions, there are simple plot modifications available.
For the y-axis, min and max values can be species as ymin
and ymax
, as well as the increment as yunit
.
Also, magnification of the axis unit as cex_yaxis
and label
as cex_ylab
. The position of the legend can be specified as
leg
. Default is “topright”.
The barMANA
function produces bar graphs with the
bootstrap-t median (ciMANA3
) or jackknife pseudo-value mean
(ciJack3
) as the top of the shaded bar and error bars
covering the range of the confidence interval for each of the additive
genetic, non-additive genetic, and maternal values of a phenotypic
trait.
The length of the error bar can be specified in inches as
bar_len
.
length_ci<- ciJack3(comp=chinook_jackL,remain=c("tray","Residual"),full=c(0,0.7192,0.2030,1.0404,
0.1077,0.5499))
oldpar<- par(mfrow=c(2,1))
barMANA(ci_dat=length_ci) #basic, top
barMANA(ci_dat=length_ci,bar_len=0.3,yunit=20,ymax=100,cex_ylab=1.3) #modified, bottom
Different traits can also be combined on the same bar plot using
trait
specified in ciMANA3
or
ciJack3
. The information is combined into a list object.
For the example, the jackknife CI is duplicated to simulate ‘different
traits’.
length_ci1<- ciJack3(comp=chinook_jackL,remain=c("tray","Residual"),full=c(0,0.7192,0.2030,1.0404,
0.1077,0.5499),trait="length_1")
length_ci2<- ciJack3(comp=chinook_jackL,remain=c("tray","Residual"),full=c(0,0.7192,0.2030,1.0404,
0.1077,0.5499),trait="length_2")
comb_bar<- list(raw=rbind(length_ci1$raw,length_ci2$raw),
percentage=rbind(length_ci1$percentage,length_ci2$percentage))
barMANA(ci_dat=comb_bar,bar_len=0.3,yunit=20,ymax=100,cex_ylab=1.3)
The legend is slightly off in the presented html version but is fine with the R plotting device.
The boxMANA
function produces box plots using all values
for the bootstrap-t resampling data set (resampLmer3
) or
jackknife resampling data set (JackLmer3
).
oldpar<- par(mfrow=c(2,1))
boxMANA(comp=chinook_bootL) #from resampLmer3, basic, top
boxMANA(comp=chinook_bootL,yunit=20,ymax=100,cex_ylab=1.3,leg="topleft") #modified, bottom
Different traits can also be combined on the same box plot by adding a “trait” column to the resampling data set. For the example, the bootstrap-t data frame is duplicated to simulate ‘different traits’.
chinook_bootL1<- chinook_bootL; chinook_bootL2<- chinook_bootL #from resampLmer3
chinook_bootL1$trait<- "length_1"; chinook_bootL2$trait<- "length_2"
comb_boot<- rbind(chinook_bootL1,chinook_bootL2)
comb_boot$trait<- as.factor(comb_boot$trait)
boxMANA(comb_boot,yunit=20,ymax=100,cex_ylab=1.3,leg="topleft")
The recommended follow-up vignette is the Simple Non-Normal Data Example, covering the standard model only for non-normal error structures (e.g. binary, proportion, and/or count data types).