Christopher Gandrud
Please report any bugs or suggestions at: https://github.com/christophergandrud/DataCombine/issues.
DataCombine is a set of miscellaneous tools intended to make combining data sets, especially time-series cross-sectional data, easier. The package is continually being developed as I turn lines of code that I frequently use into single functions. It currently includes the following functions:
CasesTable: reports cases after listwise deletion of missing values for time-series cross-sectional data.
change: calculates the absolute, percentage, and proportion change from a specified lag, including within groups.
CountSpell: returns a variable counting the spell number for an observation. Works with grouped data.
dMerge: merges two data frames and reports, drops, or keeps only duplicates.
DropNA: drops rows from a data frame when they have missing (NA) values on one or more given variables.
FillDown: fills in missing (NA) values with the previous non-missing value.
FillIn: fills in missing values of a variable in one data frame with the values of a variable from another data frame.
FindDups: finds duplicated values in a data frame and subsets it to either include or exclude them.
FindReplace: replaces multiple patterns found in a character string column of a data frame.
grepl.sub: subsets a data frame if a specified pattern is found in a character string.
InsertRow: inserts a row into a data frame. Largely implements Ari B. Friedman’s function.
MoveFront: moves one or more variables to the front of a data frame. This can be useful if you have a data frame with many variables.
NaVar: creates new variable(s) indicating if there are missing values in other variable(s).
shift: creates lag and lead variables, including for time-series cross-sectional data. The shifted variable is returned as a new vector. This function is largely based on TszKin Julian’s shift function.
slide: creates lag and lead variables, including for time-series cross-sectional data. The slid variables are added to the original data frame. This expands the capabilities of shift.
slideMA: creates a moving average for a period before or after each time point for a given variable.
SpreadDummy: spreads a dummy variable (1’s and 0’s) over a specified time period and for specified groups.
StartEnd: finds the starting and ending time points of a spell, including for time-series cross-sectional data.
rmExcept: removes all objects from a workspace except those specified by the user.
TimeExpand: expands a data set so that it includes an observation for each time point in a sequence. Works with grouped data.
TimeFill: creates a continuous Unit-Time-Dummy data frame from a data frame with Unit-Start-End times.
VarDrop: drops one or more variables from a data frame.
I will continue to add to the package as I build data sets and run across other pesky tasks I do repeatedly that would be simpler if they were completed by a single function.
DataCombine is on CRAN.
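Installing the CRAN release should work with the standard R tooling. A minimal sketch, assuming a working R installation with access to a CRAN mirror:

```r
# Install the released version from CRAN (needs network access),
# then attach the package for the current session
install.packages("DataCombine")
library(DataCombine)
```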
You can also install the most recent stable version with install_github from the devtools package:
devtools::install_github('christophergandrud/DataCombine')
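Once the package is installed, two of the functions listed above might be combined roughly as follows. This is only a sketch: the toy data frame is hypothetical, and the argument names (Var, GroupVar, NewVar, slideBy) are assumed to match your installed version, so check ?slide and ?DropNA before relying on them.

```r
library(DataCombine)

# Hypothetical toy panel: two units (ID) observed over three years,
# already ordered by unit and year
Data <- data.frame(
    ID = c("A", "A", "A", "B", "B", "B"),
    Year = rep(2001:2003, times = 2),
    Value = c(10, NA, 30, 5, 15, 25),
    stringsAsFactors = FALSE
)

# Lag Value by one time period within each ID group.
# The lagged values are added to the data frame as ValueLag;
# the first observation of each group has no earlier value, so it is NA.
DataLag <- slide(Data, Var = "Value", GroupVar = "ID",
                 NewVar = "ValueLag", slideBy = -1)

# Drop the rows that have a missing (NA) value on Value
DataComplete <- DropNA(Data, Var = "Value")
```

A positive slideBy would create a lead rather than a lag. Because slide works on row position within each group, the data should be sorted by group and time before it is called.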