Consider this example, in which we estimate the number of medications by age group:
Variables | Observations | Design |
---|---|---|
33 | 8,250 | Stratified 1 - level Cluster Sampling design (with replacement) With (398) clusters. namcs2019sv = survey::svydesign(ids = ~CPSUM, strata = ~CSTRATM, weights = ~PATWT , data = namcs2019sv_df) |
Level | % known | Mean | SEM | SD |
---|---|---|---|---|
Under 15 years | 100 | 1.58 | 0.168 | 1.75 |
15-24 years | 100 | 1.64 | 0.112 | 1.7 |
25-44 years | 100 | 2.15 | 0.225 | 2.74 |
45-64 years | 100 | 3.49 | 0.303 | 4.49 |
65-74 years | 100 | 4.44 | 0.431 | 5.03 |
75 years and over | 100 | 5.53 | 0.494 | 5.59 |
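A table like the one above can be produced with a call along the following lines. This is a minimal sketch: it assumes that `AGER` is the age-group variable in `namcs2019sv` and that `NUMMED` holds the number of medications, as the text suggests.

```r
library(surveytable)

# Select the survey to analyze; this prints the design summary.
set_survey(namcs2019sv)

# Tabulate NUMMED (number of medications) within each level of AGER
# (assumed here to be the age-group variable).
tab_subset("NUMMED", "AGER")
```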
What if we’d like to estimate the same thing, but only for the visits for which `NUMMED > 0`?
One way to do this is to create another survey object for which `NUMMED > 0`, and then analyze this new survey object.
```r
newsurvey = survey_subset(namcs2019sv, NUMMED > 0
  , label = "NAMCS 2019 PUF: NUMMED 1+")
set_survey(newsurvey)
```
Variables | Observations | Design |
---|---|---|
33 | 5,738 | Stratified 1 - level Cluster Sampling design (with replacement) With (374) clusters. survey_subset(namcs2019sv, NUMMED > 0, label = "NAMCS 2019 PUF: NUMMED 1+") |
Note that we called `set_survey()` to let R know that we now want to analyze the new object `newsurvey`, not `namcs2019sv`.
Now, let’s create the table:
Level | % known | Mean | SEM | SD |
---|---|---|---|---|
Under 15 years | 100 | 2.34 | 0.157 | 1.66 |
15-24 years | 100 | 2.34 | 0.116 | 1.58 |
25-44 years | 100 | 3.04 | 0.257 | 2.81 |
45-64 years | 100 | 4.92 | 0.358 | 4.62 |
65-74 years | 100 | 6.02 | 0.445 | 4.98 |
75 years and over | 100 | 7.29 | 0.457 | 5.32 |
Be sure to check the table title to verify that you are tabulating the new survey object.
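The whole subset workflow can be sketched in one place as follows. This is an illustration under the same assumption as before, namely that `AGER` is the age-group variable. The final call switches back to the full survey, which is easy to forget after working with a subset.

```r
library(surveytable)

# Create the subset and make it the survey being analyzed.
newsurvey = survey_subset(namcs2019sv, NUMMED > 0
  , label = "NAMCS 2019 PUF: NUMMED 1+")
set_survey(newsurvey)

# Same tabulation as before, now on the subset.
# Check that the table title names the subset.
tab_subset("NUMMED", "AGER")

# When done, switch back to the full survey.
set_survey(namcs2019sv)
```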
First, let’s review what I call “advanced variable editing”.
`surveytable` provides a number of functions to create or modify survey variables. Some examples include `var_collapse()` and `var_cut()`.
Occasionally, you might need to do advanced variable editing. Here’s how:

1. Every survey object has an element called `variables`. This is a data frame where the survey’s variables are located.
2. Create or modify a variable in the `variables` data frame (which is part of the survey object).
3. Call `set_survey()` again. Any time you modify the `variables` data frame, call `set_survey()`.

For an example of this, see `vignette("Example-Residential-Care-Community-Services-User-NSLTCP-RCC-SU-report")`.
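As a rough illustration of these steps, the sketch below adds a variable to the `variables` data frame and then tabulates it. The names `mysurvey` and `many_meds` and the cutoff of 5 are hypothetical choices made for this example only.

```r
library(surveytable)

# Work on a copy so that the packaged survey object is left untouched.
# `mysurvey` is a hypothetical name.
mysurvey = namcs2019sv

# Create a new variable directly in the `variables` data frame.
# `many_meds` and the cutoff of 5 are arbitrary, for illustration.
mysurvey$variables$many_meds = factor(
  ifelse(mysurvey$variables$NUMMED >= 5, "5 or more", "Fewer than 5"))

# Hand the modified object to surveytable, then tabulate the new variable.
set_survey(mysurvey)
tab("many_meds")
```

Without the `set_survey()` call, `surveytable` would still be working with whatever survey was set previously, and `many_meds` would not be available to `tab()`.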
The above explanation raises the question of why `set_survey()` must be called again after `variables` is modified. Here is an explanation:
The survey that you’re analyzing actually exists in three separate places:

1. In a file, such as `mysurvey.rds`.
2. In an R object in your session, such as `mysurvey`.
3. In a copy held by `surveytable`. This is what `surveytable` analyzes.

Why is there a (3) that’s different from (2), you might ask. That’s due to an arcane issue with how R packages work; both (2) and (3) are necessary.
Normally, the information only flows forwards, from (1) to (2) and from (2) to (3).
Forwards flow:

* (1) -> (2): `readRDS()`.
* (2) -> (3): `set_survey()`.

Backwards flow:

* (3) -> (2): `surveytable:::.load_survey()`.
* (2) -> (1): `saveRDS()`. You probably don’t want to do this; the survey file (`mysurvey.rds`) should normally not be changed.

The functions for modifying or creating variables that are part of the `surveytable` package (like `var_cut()` or `var_collapse()`) modify (3). Since (3) is what `surveytable` works with and tabulates, you can call `var_collapse()`, and then you can call `tab()`. You don’t need to do anything extra in between.
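The forward flow, followed by an edit made with a `surveytable` function, might look like the sketch below. The temporary file stands in for a real survey file and is created here purely for illustration. The variable name `"Age group"` and the break points are hypothetical, and the example assumes that `namcs2019sv` has a numeric `AGE` variable.

```r
library(surveytable)

# (1) A survey file. Written to a temporary location only so that
# this example has something to read.
path = tempfile(fileext = ".rds")
saveRDS(namcs2019sv, path)

mysurvey = readRDS(path)   # (1) -> (2)
set_survey(mysurvey)       # (2) -> (3)

# var_cut() edits (3) directly, so tab() can follow immediately.
var_cut("Age group", "AGE"
  , c(-Inf, 17, 64, Inf)
  , c("Under 18", "18-64", "65 and over"))
tab("Age group")

# (2) is not touched by var_cut(): the new variable is absent from it.
"Age group" %in% names(mysurvey$variables)
```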
If you are modifying the `variables` data frame directly, you are actually modifying (2). After you modify (2), you need to copy it over to (3), so that `surveytable` can use it. You do that by calling `set_survey()`.
Thus, any time you modify `variables` yourself, call `set_survey()`. You modify (2), then copy (2) -> (3) by calling `set_survey()`.
On the flip side, the changes that you make in (3) (using `surveytable` functions like `var_cut()` or `var_collapse()`) are not reflected in (2). If you make changes in (3), then call `set_survey()`, those changes are lost, because `set_survey()` copies (2) -> (3). If those changes were important, you can just rerun the code that created them. If you really need to go from (3) to (2), use `mysurvey = surveytable:::.load_survey()`.
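The backwards step can be sketched as follows. Note that `.load_survey()` is an internal function (hence the `:::`), so its behavior may change between package versions. The object name `edited`, the variable name `"Age group"`, and the break points are hypothetical, and the example assumes that `namcs2019sv` has a numeric `AGE` variable.

```r
library(surveytable)

set_survey(namcs2019sv)

# This change is made in (3) only.
var_cut("Age group", "AGE"
  , c(-Inf, 17, 64, Inf)
  , c("Under 18", "18-64", "65 and over"))

# (3) -> (2): pull surveytable's current copy back into a regular object.
edited = surveytable:::.load_survey()
"Age group" %in% names(edited$variables)

# (2) -> (3) with the original object: this overwrites (3),
# so the "Age group" variable created above is discarded.
set_survey(namcs2019sv)
```

To keep working with the edited version instead, call `set_survey(edited)`.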