Abstract
As of bakR version 1.0.0, you can provide a table of fraction new estimates as input. Tools like GRAND-SLAM provide such tables, though you can also use your own pipeline for estimating the fraction of reads that are new in your NR-seq experiments. In this vignette, I provide a brief introduction to using this type of alternative input in bakR. An Appendix at the end of this vignette goes deeper into some specifics regarding input data structure and analysis strategies.
The first step in using bakR with fraction new estimates as input is to
create a bakRFnData object. A bakRFnData object consists of two
components: a fns data frame and a metadf data frame. fns stands for
“fraction new estimates” and contains information regarding the
estimates for the fraction of sequencing reads from a given feature that
were new in a given sample. metadf stands for metadata data frame and
contains important information about the experimental details of each
sample (i.e., how long the metabolic label feed was, which samples are
reference samples, and which are experimental samples). Examples of what
these data structures look like are available via calls to data("fns") and
data("metadf") for the fns and metadf data frames, respectively.
A fns data frame consists of rows corresponding to fraction new estimate information for a given feature. In particular, there are 4 required (and one optional) columns in the fns data frame:
The metadf data frame is described in both the “bakR for people in a hurry” and “Differential kinetic analysis with bakR” vignettes.
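To make the two inputs concrete, here is a minimal sketch of a toy fns and metadf pair built in base R. The column names used below (sample, XF, fn, nreads, with se as the optional fifth column, and tl and Exp_ID in metadf) are my assumptions about the expected layout, and all sample names, feature names, and values are made up for illustration; check ?bakRFnData and the packaged data("fns") and data("metadf") examples for the authoritative structure.

```r
# Toy fns data frame: one row per feature per sample.
# ASSUMPTION: the column names sample, XF, fn, and nreads (plus an optional se)
# are illustrative; confirm them against ?bakRFnData.
fns <- data.frame(
  sample = rep(c("WT_1", "WT_2", "KO_1", "KO_2"), each = 2),  # sample names
  XF     = rep(c("geneA", "geneB"), times = 4),               # feature names
  fn     = c(0.30, 0.55, 0.32, 0.52, 0.45, 0.54, 0.47, 0.56), # fraction new
  nreads = c(120, 340, 150, 310, 135, 365, 140, 330),         # read counts
  stringsAsFactors = FALSE
)

# Toy metadf: one row per sample, with row names matching fns$sample.
# ASSUMPTION: tl is the label time and Exp_ID is a numeric condition ID
# (1 = reference, 2 = experimental).
metadf <- data.frame(
  tl     = c(2, 2, 2, 2),
  Exp_ID = c(1, 1, 2, 2),
  row.names = c("WT_1", "WT_2", "KO_1", "KO_2")
)

# Sanity checks worth running on your own tables before handing them to bakR
stopifnot(all(fns$fn >= 0 & fns$fn <= 1))                 # valid fractions
stopifnot(all(unique(fns$sample) %in% rownames(metadf)))  # samples described
```

The two checks at the end catch common input problems: fraction new estimates outside [0, 1] and samples in fns that have no matching row in metadf.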
One way to obtain the fns data frame is via GRAND-SLAM, an efficient and user-friendly tool developed by the Erhard lab that implements a binomial mixture model originally described by our lab. GRAND-SLAM outputs a table named run_name.tsv (where run_name is whatever you specified when running GRAND-SLAM), which can be quickly converted to a fns data frame. You can then create a bakRFnData object as follows:
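As a minimal sketch of the conversion step, the example below reshapes a small wide table (one row per gene, several columns per sample) into the long fns layout using base R only. The wide table is a hypothetical stand-in: its column naming ("<sample> Readcount" and "<sample> MAP") is an assumption about what a GRAND-SLAM-style output looks like, so inspect the header of your own run_name.tsv and adjust the column suffixes accordingly.

```r
# HYPOTHETICAL wide table standing in for a GRAND-SLAM-style output.
# In practice you would read your own file, e.g. with read.delim().
wide <- data.frame(
  Gene             = c("geneA", "geneB"),
  `WT_1 Readcount` = c(120, 340),
  `WT_1 MAP`       = c(0.30, 0.55),
  `KO_1 Readcount` = c(135, 365),
  `KO_1 MAP`       = c(0.45, 0.54),
  check.names = FALSE  # keep the spaces in the column names
)

# Sample names as they appear in the column prefixes (assumed known)
samples <- c("WT_1", "KO_1")

# Build one long block per sample, then stack the blocks row-wise
fns <- do.call(rbind, lapply(samples, function(s) {
  data.frame(
    sample = s,                                # sample name
    XF     = wide$Gene,                        # feature name
    fn     = wide[[paste(s, "MAP")]],          # fraction new estimate
    nreads = wide[[paste(s, "Readcount")]],    # read count
    stringsAsFactors = FALSE
  )
}))

print(fns)
```

With a matching metadf in hand, the resulting fns would then be passed to the bakRFnData constructor (assuming it takes the fns and metadf data frames as its two inputs) to produce the bfndo object used below.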
After creating the bakRFnData object, you must first run bakR’s most efficient implementation (the MLE implementation from the bakR manuscript):
# Run the efficient model
Fit <- bakRFit(bfndo)
#> Mapping sample name to sample characteristics
#> Filtering out low coverage features
#> Processing data...
#> Estimating read count-variance relationship
#> Averaging replicate data and regularizing estimates
#> Assessing statistical significance
#> All done! Run QC_checks() on your bakRFit object to assess the
#> quality of your data and get recommendations for next steps.
bakRFit() is used here as a wrapper for two functions in bakR:
fn_process() and fast_analysis(). For more details on what these
functions do, run ?fn_process or ?fast_analysis.
When using fraction new estimates as input, only one of the two more highly powered implementations may be used (the Hybrid implementation from the bakR manuscript). You can run this as follows:
The Fit objects contain lists pertaining to the fits of each of the models. The possible contents include:
bakR provides a variety of easy-to-use functions for beginning to investigate your data. The visualizations are particularly aimed at revealing trends in RNA stabilization or destabilization. These include MA plots:
Volcano plots:
## Volcano Plot with Fast Fit; significance assessed relative to an FDR control of 0.05
plotVolcano(Fit$Fast_Fit)
and PCA plots:
This vignette provides the minimal amount of information needed to get up and running with bakR using fraction new estimates as input. If you would like a more thorough discussion of each step of this process, check out the long-form version of the intro to bakR vignette (“Differential kinetic analysis with bakR”). In addition, there are a number of other vignettes that cover various topics not discussed in these intro vignettes: