LikertMakeR (Winzar, 2025)
lets you create synthetic Likert-scale, or related rating-scale,
data.
Set the mean, standard deviation, and correlations or model
coefficients, and the package generates data matching those properties.
It can also rearrange existing data columns to achieve a desired
correlation structure or generate data based on Cronbach’s
Alpha, factor correlations, regression or
ANOVA coefficients, or other summary statistics.
The package should be useful for teaching in the Social Sciences, and for scholars who wish to “replicate” or “reverse engineer” rating-scale data for further analysis and visualisation when only summary statistics have been reported.
I was prompted to write the core functions in LikertMakeR after reviewing too many journal article submissions where authors presented questionnaire results with only means and standard deviations (often only the means), with no apparent understanding of scale distributions, and their impact on scale properties.
Hopefully, this tool will help researchers, teachers & students, and other reviewers, to better think about rating-scale distributions, and the effects of covariation, scale boundaries, and number of items in a scale. Researchers can also use LikertMakeR to create dummy data to prepare analyses ahead of a formal survey.
A Likert scale is the mean, or sum, of several ordinal rating scales. Typically, they are bipolar (usually “agree-disagree”) responses to propositions that are determined to be moderately-to-highly correlated and that capture some facet of a theoretical construct.
Rating scales, such as Likert scales, are not continuous or unbounded.
For example, a 5-point Likert scale that is constructed with, say, five items (questions) will have a summed range of between 5 (all rated ‘1’) and 25 (all rated ‘5’) with all integers in between, and the mean range will be ‘1’ to ‘5’ with intervals of 1/5=0.20. A 7-point Likert scale constructed from eight items will have a summed range between 8 (all rated ‘1’) and 56 (all rated ‘7’) with all integers in between, and the mean range will be ‘1’ to ‘7’ with intervals of 1/8=0.125.
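The arithmetic in these two examples can be reproduced with a few lines of base R. This is a minimal sketch; `scale_values()` and its arguments `points`, `items`, and `lower` are hypothetical names chosen for illustration and are not part of LikertMakeR.

```r
# Possible values of a summated and an averaged rating scale.
# `points` = number of response options per item, `items` = number of items.
scale_values <- function(points, items, lower = 1) {
  upper <- lower + points - 1
  sums <- seq(items * lower, items * upper) # every integer in the summed range
  list(sums = sums, means = sums / items)   # averaged scores step by 1 / items
}

five_by_five <- scale_values(points = 5, items = 5)
range(five_by_five$sums)      # 5 25
diff(five_by_five$means)[1]   # 0.2

seven_by_eight <- scale_values(points = 7, items = 8)
range(seven_by_eight$sums)    # 8 56
diff(seven_by_eight$means)[1] # 0.125
```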
Technically, because they are bounded and not continuous, parametric statistics, such as mean, standard deviation, and correlation, should not be applied to summated rating scales. In practice, however, parametric statistics are commonly used in the social sciences because:
they are in common usage and easily understood, and
in practice, all measures are bounded by the constraints of the measurement tool (they also have upper and lower boundaries and discrete units of measurement), so results and conclusions drawn from technically-correct non-parametric statistics are (almost) always the same as those from parametric statistics for such data.
For example, D’Alessandro
et al. (2020) argue that a summated scale, made with
multiple items, “approaches” an interval scale measure, implying that
parametric statistics are quite acceptable.
Likert-scale items, such as responses to a single 1-to-5 agree-disagree question, should not be analysed by professional or responsible researchers. There is too much random error in a single item. Rensis Likert (1932) designed the scale with the logic that a random overstatement on one item is likely to be compensated by a random understatement on another item, so that, when multiple items are combined, we get a reasonably consistent, internally reliable, measure of the target construct.
Rating-scale boundaries define minima and maxima for any scale
values. If the mean is close to one boundary then data points will
gather more closely to that boundary.
If the mean is not in the
middle of a scale, then the data will always be skewed, as shown in the
following plots.
Off-centre means always give skewed distributions in bounded rating scales
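The effect of a boundary on skewness can be checked numerically. The sketch below is an assumption-laden illustration: it builds crude bounded ratings by rounding and clipping normal draws, which is not how lfast() generates data, and the helper names `skewness()` and `bounded_ratings()` are hypothetical.

```r
# Sample skewness (third standardised moment), base R only.
skewness <- function(x) {
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^1.5
}

# Crude bounded ratings: round a normal draw, then clip to the scale range.
bounded_ratings <- function(n, mean, sd, lower = 1, upper = 5) {
  pmin(pmax(round(rnorm(n, mean, sd)), lower), upper)
}

set.seed(1)
centred <- bounded_ratings(10000, mean = 3.0, sd = 1)
high    <- bounded_ratings(10000, mean = 4.3, sd = 1)
low     <- bounded_ratings(10000, mean = 1.7, sd = 1)

# Skewness is near zero for the centred mean, negative when the mean sits
# near the upper boundary, and positive when it sits near the lower boundary.
round(c(centred = skewness(centred), high = skewness(high), low = skewness(low)), 2)
```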
lfast() generates a vector of values with predefined mean and standard deviation.
lcor() takes a dataframe of rating-scale values and rearranges the values in each column so that the columns are correlated to match a predefined correlation matrix.
makeCorrAlpha() constructs a random correlation matrix of given dimensions from a predefined Cronbach’s Alpha.
makeCorrLoadings() constructs a random correlation matrix from a given factor loadings matrix, and factor-correlations matrix.
makeScales() is a wrapper function for lfast() and lcor() to generate items or summated scales with predefined first and second moments and a predefined correlation matrix. This function replaces makeItems() and now includes multi-item measures.
makeItemsScale() generates a random dataframe of scale items based on a predefined summated scale with a desired Cronbach’s Alpha.
makePaired() generates a dataframe of two correlated columns based on summary data from a paired-sample t-test.
makeRepeated() generates a dataframe of k
correlated columns based on summary data from a repeated-samples
ANOVA.
makeScalesRegression() generates a dataframe based on results of output from multiple-regression - R2, standardised betas, and IV correlations (if available).
correlateScales() creates a dataframe of correlated summated scales as one might find in completed survey questionnaire and possibly used in a Structural Equation model.
Helper Functions
alpha() calculates Cronbach’s Alpha from a given correlation matrix or a given dataframe.
eigenvalues() calculates eigenvalues of a correlation matrix, reports on positive-definite status of the matrix and, optionally, displays a scree plot to visualise the eigenvalues.
reliability() computes internal-consistency reliability estimates for a single-factor scale, including Cronbach’s alpha, McDonald’s omega (total), optional ordinal (polychoric-based) variants, and confidence intervals.
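To make the idea of a positive-definite check concrete, here is a small base R sketch of the kind of test a helper such as eigenvalues() reports on. The function name `is_positive_definite()` and the tolerance `tol` are assumptions for illustration, not the package's code.

```r
# A symmetric matrix is positive-definite when all of its eigenvalues are > 0.
is_positive_definite <- function(cormatrix, tol = 1e-8) {
  ev <- eigen(cormatrix, symmetric = TRUE, only.values = TRUE)$values
  all(ev > tol)
}

# A feasible correlation matrix
ok_matrix <- matrix(c(
  1.00, 0.85, 0.75,
  0.85, 1.00, 0.65,
  0.75, 0.65, 1.00
), nrow = 3)

# An infeasible pattern: variable 1 correlates strongly and positively with
# both 2 and 3, yet 2 and 3 are strongly negatively correlated.
bad_matrix <- matrix(c(
  1.0,  0.9,  0.9,
  0.9,  1.0, -0.9,
  0.9, -0.9,  1.0
), nrow = 3)

is_positive_definite(ok_matrix)  # TRUE
is_positive_definite(bad_matrix) # FALSE
```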
> ```
>
> install.packages("LikertMakeR")
> library(LikertMakeR)
>
> ```
> ```
>
> library(devtools)
> install_github("WinzarH/LikertMakeR")
> library(LikertMakeR)
>
> ```
To synthesise a rating scale with lfast(), the user must input the following parameters:
n: sample size
mean: desired mean
sd: desired standard deviation
lowerbound: desired lower bound
upperbound: desired upper bound
items: number of items making the scale - default = 1
An earlier version of LikertMakeR had a function, lexact(), which was slow and no more accurate than the latest version of lfast(). So, lexact() is now deprecated.
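The general idea of producing a bounded vector with a chosen mean and standard deviation can be illustrated with a naive "best of many samples" search. This is only a sketch under stated assumptions: `naive_scale()`, `target_mean`, `target_sd`, and `tries` are hypothetical names, and lfast() itself is more refined than this.

```r
# Draw many candidate vectors on the scale's grid and keep the one whose
# mean and standard deviation are closest to the targets.
naive_scale <- function(n, target_mean, target_sd, lowerbound, upperbound,
                        items = 1, tries = 500) {
  best <- NULL
  best_err <- Inf
  for (i in seq_len(tries)) {
    # draw, snap to the 1/items grid, and clip to the scale boundaries
    x <- round(rnorm(n, target_mean, target_sd) * items) / items
    x <- pmin(pmax(x, lowerbound), upperbound)
    err <- abs(mean(x) - target_mean) + abs(sd(x) - target_sd)
    if (err < best_err) {
      best <- x
      best_err <- err
    }
  }
  best
}

set.seed(42)
x <- naive_scale(
  n = 128, target_mean = 3.5, target_sd = 1.0,
  lowerbound = 1, upperbound = 5, items = 5
)
c(mean = mean(x), sd = sd(x)) |> round(3)
```

Because every candidate is snapped to multiples of 1/items and clipped to the bounds, the result always contains valid scale values; only the closeness to the target moments depends on the number of tries.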
The function, lcor(), rearranges the values in the columns of a data-set so that they are correlated at a specified level. It does not change the values - it swaps their positions within each column so that univariate statistics do not change, but their correlations with other vectors do.
lcor() systematically selects pairs of values in a column and swaps their places, and checks to see if this swap improves the correlation matrix. If the revised dataframe produces a correlation matrix closer to the target correlation matrix, then the swap is retained. Otherwise, the values are returned to their original places. This process is iterated across each column.
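The swap-and-check logic described above can be sketched in base R. This simplified version picks swaps at random instead of systematically, so it is an illustration of the principle and not lcor()'s actual search strategy; `swap_to_target()` and `iterations` are hypothetical names.

```r
# Rearrange values within columns so the correlation matrix approaches `target`.
swap_to_target <- function(data, target, iterations = 5000) {
  dat <- as.matrix(data)
  loss <- function(m) sum((cor(m) - target)^2) # distance from the target matrix
  current <- loss(dat)
  for (i in seq_len(iterations)) {
    col  <- sample(ncol(dat), 1)
    rows <- sample(nrow(dat), 2)
    candidate <- dat
    candidate[rows, col] <- dat[rev(rows), col] # swap two values in one column
    proposed <- loss(candidate)
    if (proposed < current) { # keep the swap only if it is an improvement
      dat <- candidate
      current <- proposed
    }
  }
  as.data.frame(dat)
}

set.seed(2)
start <- data.frame(
  x1 = sample(1:5, 60, replace = TRUE),
  x2 = sample(1:5, 60, replace = TRUE)
)
target <- matrix(c(1, 0.6, 0.6, 1), nrow = 2)
result <- swap_to_target(start, target)

cor(result) |> round(2)                       # off-diagonal should sit near 0.6
rbind(sapply(start, mean), sapply(result, mean)) # column means are unchanged
```

Since values are only moved within their own column, each column's mean and standard deviation are preserved exactly.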
To create the desired correlated data, the user must define the following parameters:
data: a starter data set of rating-scales. Number of columns must match the dimensions of the target correlation matrix.
target: the target correlation matrix.
Let’s generate some data: three 5-point Likert scales, each with five items.
## generate uncorrelated synthetic data
n <- 128
lowerbound <- 1
upperbound <- 5
items <- 5
mydat3 <- data.frame(
x1 = lfast(n, 2.5, 0.75, lowerbound, upperbound, items),
x2 = lfast(n, 3.0, 1.50, lowerbound, upperbound, items),
x3 = lfast(n, 3.5, 1.00, lowerbound, upperbound, items)
)
#> best solution in 540 iterations
#> best solution in 6216 iterations
#> best solution in 297 iterations
The first six observations from this dataframe are:
#> x1 x2 x3
#> 1 2.0 5.0 2.4
#> 2 2.2 3.8 4.2
#> 3 2.6 4.0 3.4
#> 4 1.2 5.0 3.2
#> 5 2.4 4.8 5.0
#> 6 3.8 4.8 2.8
And the first and second moments (to 3 decimal places) are:
#> x1 x2 x3
#> mean 2.502 2.998 3.498
#> sd 0.750 1.501 0.999
We can see that the first and second moments of the data are very close to what is expected.
As we should expect, randomly-generated synthetic data have low correlations:
#> x1 x2 x3
#> x1 1.00 -0.05 0.00
#> x2 -0.05 1.00 -0.01
#> x3 0.00 -0.01 1.00
Now, let’s define a target correlation matrix:
## describe a target correlation matrix
tgt3 <- matrix(
c(
1.00, 0.85, 0.75,
0.85, 1.00, 0.65,
0.75, 0.65, 1.00
),
nrow = 3
)
So now we have a dataframe with desired first and second moments, and a target correlation matrix.
Values in each column of the new dataframe do not change from the original; the values are rearranged.
The first ten observations from this dataframe are:
#> X1 X2 X3
#> 1 2.0 2.6 2.4
#> 2 2.6 2.2 4.2
#> 3 2.6 4.0 4.4
#> 4 3.6 5.0 4.8
#> 5 2.8 4.0 5.0
#> 6 3.8 4.8 4.8
#> 7 3.0 4.2 3.6
#> 8 3.6 5.0 4.4
#> 9 4.2 4.8 5.0
#> 10 1.8 2.0 2.6
And the new dataframe is correlated close to our desired correlation matrix; here presented to two decimal places:
#> X1 X2 X3
#> X1 1.00 0.85 0.75
#> X2 0.85 1.00 0.65
#> X3 0.75 0.65 1.00
makeCorrAlpha() constructs a random correlation matrix of given dimensions and predefined Cronbach’s alpha.
To create the desired correlation matrix, the user must define the following parameters:
items: or “k” - the number of rows and columns of the desired correlation matrix.
alpha: the target value for Cronbach’s Alpha
variance: a notional variance
coefficient to affect the spread of values in the correlation matrix.
Default = ‘0.1’. A value of ‘0’ produces a matrix where all off-diagonal
correlations are equal. Setting ‘variance = 0.25’ or more may be
infeasible, so the function should gracefully adjust the
variance parameter downwards to achieve PD status.
alpha_noise: Controls random variation in the target Cronbach’s alpha before the correlation matrix is constructed.
When alpha_noise = 0 (default), the requested alpha is
reproduced deterministically (subject to numerical tolerance).
When alpha_noise > 0, a small amount of random
variation is added to the requested alpha prior to constructing the
matrix. This can be useful in teaching or simulation settings where
slightly different reliability values are desired across repeated
runs.
Internally, noise is added on the Fisher z-transformed scale to ensure the resulting alpha remains within valid bounds (0, 1).
Typical guidance:
Larger values increase the spread of achieved alpha across runs.
diagnostics: If TRUE, returns a list containing the correlation matrix and
a diagnostics list (target/achieved alpha, average inter-item
correlation, eigenvalues, PD flag, and key arguments). If
FALSE (default), returns the correlation matrix only.
## define parameters
items <- 4
alpha <- 0.85
## apply makeCorrAlpha() function
set.seed(42)
cor_matrix_4 <- makeCorrAlpha(items, alpha)
makeCorrAlpha() produced the following correlation matrix (to three decimal places):
#> item01 item02 item03 item04
#> item01 1.000 0.565 0.667 0.697
#> item02 0.565 1.000 0.484 0.506
#> item03 0.667 0.484 1.000 0.598
#> item04 0.697 0.506 0.598 1.000
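As a cross-check on the matrix above, standardised Cronbach's alpha can be computed from the mean inter-item correlation with the textbook formula alpha = k * r_bar / (1 + (k - 1) * r_bar). The sketch below is base R only; `alpha_from_cor()` is a hypothetical helper written for this illustration, not the package's alpha() function.

```r
# Standardised Cronbach's alpha from a correlation matrix.
alpha_from_cor <- function(cormatrix) {
  k <- ncol(cormatrix)
  r_bar <- mean(cormatrix[lower.tri(cormatrix)]) # mean off-diagonal correlation
  k * r_bar / (1 + (k - 1) * r_bar)
}

# The four-item matrix shown above, re-entered by hand (rounded to 3 decimals)
r4 <- matrix(c(
  1.000, 0.565, 0.667, 0.697,
  0.565, 1.000, 0.484, 0.506,
  0.667, 0.484, 1.000, 0.598,
  0.697, 0.506, 0.598, 1.000
), nrow = 4, byrow = TRUE)

round(alpha_from_cor(r4), 2) # 0.85, the requested alpha
```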
## apply makeCorrAlpha() with diagnostics
set.seed(42)
cor_matrix_5 <- makeCorrAlpha(
items = 6,
alpha = 0.90,
diagnostics = TRUE
)
## output
cor_matrix_5$R |> round(2)
#> item01 item02 item03 item04 item05 item06
#> item01 1.00 0.59 0.72 0.76 0.73 0.66
#> item02 0.59 1.00 0.50 0.52 0.50 0.45
#> item03 0.72 0.50 1.00 0.64 0.61 0.55
#> item04 0.76 0.52 0.64 1.00 0.64 0.58
#> item05 0.73 0.50 0.61 0.64 1.00 0.55
#> item06 0.66 0.45 0.55 0.58 0.55 1.00
cor_matrix_5$diagnostics
#> $items
#> [1] 6
#>
#> $alpha_input
#> [1] 0.9
#>
#> $alpha_target_effective
#> [1] 0.9
#>
#> $alpha_achieved
#> [1] 0.8999999
#>
#> $average_r
#> [1] 0.5999997
#>
#> $min_eigenvalue
#> [1] 0.192857
#>
#> $variance_input
#> [1] 0.1
#>
#> $internal_variance_used
#> [1] 0.1
#>
#> $alpha_noise
#> [1] 0
makeCorrLoadings() generates a correlation matrix from factor loadings and factor correlations as might be seen in Exploratory Factor Analysis (EFA) or a Structural Equation Model (SEM).
makeCorrLoadings(loadings, factorCor = NULL, uniquenesses = NULL, nearPD = FALSE)
loadings: k (items) by
f (factors) matrix of standardised factor
loadings. Item names and Factor names can be taken from the row_names
(items) and the column_names (factors), if present.
factorCor: f x
f factor correlation matrix. If not present, then we assume
that the factors are uncorrelated (orthogonal), which is rare in
practice, and the function applies an identity matrix for
factorCor.
uniquenesses: length k
vector of uniquenesses. If NULL, the default, compute from the
calculated communalities.
nearPD: (logical) If TRUE, then the function calls the nearPD function from the Matrix package to transform the resulting correlation matrix onto the nearest Positive Definite matrix. Obviously, this only applies if the resulting correlation matrix is not positive definite. (It should never be needed.)
“Censored” loadings (for example, where loadings less than some small
value (often ‘0.30’), are removed for ease-of-communication) tend to
severely reduce the accuracy of the makeCorrLoadings()
function. For a detailed demonstration, see the vignette file,
makeCorrLoadings_Validate.
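For intuition about how loadings imply correlations, consider the simplest case of orthogonal factors under the standard common factor model: the implied correlation between two items is the sum of the products of their loadings, and the diagonal is one. The sketch below uses hypothetical loadings and a hypothetical helper, `implied_cor_orthogonal()`; it is not a description of how makeCorrLoadings() is implemented, and it does not cover correlated factors.

```r
# Implied item correlations for orthogonal factors:
# off-diagonal elements of L %*% t(L), with ones on the diagonal.
implied_cor_orthogonal <- function(loadings) {
  r <- loadings %*% t(loadings)
  diag(r) <- 1
  r
}

# Hypothetical loadings: three items on two factors
loads <- matrix(c(
  0.80, 0.10,
  0.70, 0.20,
  0.10, 0.90
),
nrow = 3, byrow = TRUE,
dimnames = list(c("Q1", "Q2", "Q3"), c("F1", "F2"))
)

round(implied_cor_orthogonal(loads), 2)
```

Small loadings still contribute to every product, which is one way to see why censoring them changes the implied correlations.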
## Example loadings
factorLoadings <- matrix(
c(
0.05, 0.20, 0.70,
0.10, 0.05, 0.80,
0.05, 0.15, 0.85,
0.20, 0.85, 0.15,
0.05, 0.85, 0.10,
0.10, 0.90, 0.05,
0.90, 0.15, 0.05,
0.80, 0.10, 0.10
),
nrow = 8, ncol = 3, byrow = TRUE
)
## row and column names
rownames(factorLoadings) <- c("Q1", "Q2", "Q3", "Q4", "Q5", "Q6", "Q7", "Q8")
colnames(factorLoadings) <- c("Factor1", "Factor2", "Factor3")
## Factor correlation matrix
factorCor <- matrix(
c(
1.0, 0.5, 0.4,
0.5, 1.0, 0.3,
0.4, 0.3, 1.0
),
nrow = 3, byrow = TRUE
)
## apply makeCorrLoadings() function
itemCorrelations <- makeCorrLoadings(factorLoadings, factorCor)
## derived correlation matrix to two decimal places
round(itemCorrelations, 2)
#> Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8
#> Q1 1.00 0.62 0.67 0.48 0.42 0.42 0.43 0.41
#> Q2 0.62 1.00 0.72 0.43 0.36 0.36 0.44 0.42
#> Q3 0.67 0.72 1.00 0.50 0.43 0.43 0.46 0.45
#> Q4 0.48 0.43 0.50 1.00 0.79 0.83 0.65 0.58
#> Q5 0.42 0.36 0.43 0.79 1.00 0.80 0.54 0.48
#> Q6 0.42 0.36 0.43 0.83 0.80 1.00 0.59 0.52
#> Q7 0.43 0.44 0.46 0.65 0.54 0.59 1.00 0.78
#> Q8 0.41 0.42 0.45 0.58 0.48 0.52 0.78 1.00
## correlated factors mean that eigenvalues should suggest two or three factors
eigenvalues(cormatrix = itemCorrelations, scree = TRUE)
#> itemCorrelations is positive-definite
#> [1] 4.7679427 1.2254239 0.7641967 0.3799863 0.2668158 0.2237851 0.2073574
#> [8] 0.1644922
## orthogonal factors are assumed when factor correlation matrix is not included
orthogonalItemCors <- makeCorrLoadings(factorLoadings)
## derived correlation matrix to two decimal places
round(orthogonalItemCors, 2)
#> Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8
#> Q1 1.00 0.58 0.63 0.29 0.24 0.22 0.11 0.13
#> Q2 0.58 1.00 0.69 0.18 0.13 0.10 0.14 0.17
#> Q3 0.63 0.69 1.00 0.26 0.22 0.18 0.11 0.14
#> Q4 0.29 0.18 0.26 1.00 0.75 0.79 0.32 0.26
#> Q5 0.24 0.13 0.22 0.75 1.00 0.78 0.18 0.14
#> Q6 0.22 0.10 0.18 0.79 0.78 1.00 0.23 0.18
#> Q7 0.11 0.14 0.11 0.32 0.18 0.23 1.00 0.74
#> Q8 0.13 0.17 0.14 0.26 0.14 0.18 0.74 1.00
makeScales() generates a dataframe of random discrete values so the data replicate a set of scale items or summated rating scales, and are correlated close to a predefined correlation matrix.
Generally, means, standard deviations, and correlations are correct to two decimal places.
makeScales() is a wrapper function for
lfast(), which takes repeated samples selecting a vector that best fits the desired moments, and
lcor(), which rearranges values in each column of the dataframe so they closely match the desired correlation matrix.
To create the desired dataframe, the user must define the following parameters:
n: number of observations
means: a vector of length
k of desired means of each variable
sds: a vector of length
k of desired standard deviations of each variable
lowerbound: a vector of length
k of values for the lower bound of each variable. Default =
‘1’
upperbound: a vector of length
k of values for the upper bound of each variable. Default =
‘5’
items: a vector of length
k of the number of items in each variable. Default =
‘1’.
cormatrix: a target correlation matrix
with k rows and k columns.
## define parameters
n <- 256
dfMeans <- c(3.9, 4.1, 3.6, 4.0)
dfSds <- c(0.6, 0.5, 0.8, 0.7)
lowerbound <- rep(1, 4)
upperbound <- rep(5, 4)
items <- c(4, 3, 4, 3)
corMat <- matrix(
c(
1.00, 0.75, 0.60, 0.70,
0.75, 1.00, 0.65, 0.72,
0.60, 0.65, 1.00, 0.68,
0.70, 0.72, 0.68, 1.00
),
nrow = 4, ncol = 4
)
scale_names <- c("BT", "BS", "BL", "BLY")
rownames(corMat) <- scale_names
colnames(corMat) <- scale_names
## apply makeScales() function
df <- makeScales(
n = n,
means = dfMeans,
sds = dfSds,
lowerbound = lowerbound,
upperbound = upperbound,
items = items,
cormatrix = corMat
)
#> Variable 1 : BT -
#> best solution in 754 iterations
#> Variable 2 : BS -
#> best solution in 104 iterations
#> Variable 3 : BL -
#> best solution in 972 iterations
#> Variable 4 : BLY -
#> best solution in 1372 iterations
#>
#> Arranging data to match correlations
#>
#> Successfully generated correlated variables
## test the function
head(df)
#> BT BS BL BLY
#> 1 4.00 4.000000 4.00 3.000000
#> 2 3.75 4.333333 4.50 4.666667
#> 3 3.00 3.666667 3.25 4.000000
#> 4 3.00 3.666667 1.50 3.666667
#> 5 4.50 4.666667 4.75 4.333333
#> 6 2.75 3.666667 3.25 4.000000
tail(df)
#> BT BS BL BLY
#> 251 4.5 4.666667 4.00 4.666667
#> 252 3.0 3.000000 3.75 3.666667
#> 253 2.5 3.333333 1.75 3.333333
#> 254 4.0 3.666667 4.00 4.000000
#> 255 4.5 4.666667 4.25 4.666667
#> 256 4.0 4.333333 4.75 4.666667
### means should be correct to two decimal places
dfmoments <- data.frame(
mean = apply(df, 2, mean) |> round(3),
sd = apply(df, 2, sd) |> round(3)
) |> t()
dfmoments
#> BT BS BL BLY
#> mean 3.901 4.102 3.599 4.0
#> sd 0.600 0.499 0.799 0.7
### correlations should be correct to two decimal places
cor(df) |> round(3)
#> BT BS BL BLY
#> BT 1.00 0.750 0.60 0.700
#> BS 0.75 1.000 0.65 0.719
#> BL 0.60 0.650 1.00 0.680
#> BLY 0.70 0.719 0.68 1.000
This is a two-step process:
apply makeCorrAlpha() to generate a correlation matrix from desired alpha,
apply makeScales() to generate rating-scale items from the correlation matrix and desired moments
Required parameters are:
k: number of items/columns
alpha: a target Cronbach’s Alpha.
n: number of observations
lowerbound: a vector of length
k of values for the lower bound of each variable
upperbound: a vector of length
k of values for the upper bound of each variable
means: a vector of length
k of desired means of each variable
sds: a vector of length k
of desired standard deviations of each variable
## define parameters
k <- 6
myAlpha <- 0.85
## generate correlation matrix
set.seed(42)
myCorr <- makeCorrAlpha(items = k, alpha = myAlpha)
## display correlation matrix
myCorr |> round(3)
#> item01 item02 item03 item04 item05 item06
#> item01 1.000 0.477 0.597 0.632 0.603 0.536
#> item02 0.477 1.000 0.392 0.414 0.395 0.352
#> item03 0.597 0.392 1.000 0.519 0.495 0.440
#> item04 0.632 0.414 0.519 1.000 0.523 0.466
#> item05 0.603 0.395 0.495 0.523 1.000 0.444
#> item06 0.536 0.352 0.440 0.466 0.444 1.000
### checking Cronbach's Alpha
alpha(cormatrix = myCorr)
#> [1] 0.8499932
## define parameters
n <- 256
myMeans <- c(2.75, 3.00, 3.00, 3.25, 3.50, 3.5)
mySds <- c(1.00, 0.75, 1.00, 1.00, 1.00, 1.5)
lowerbound <- rep(1, k)
upperbound <- rep(5, k)
## Generate Items
myItems <- makeScales(
n = n, means = myMeans, sds = mySds,
lowerbound = lowerbound, upperbound = upperbound,
items = 1,
cormatrix = myCorr
)
#> Variable 1 : item01 -
#> best solution in 288 iterations
#> Variable 2 : item02 -
#> best solution in 2220 iterations
#> Variable 3 : item03 -
#> best solution in 1058 iterations
#> Variable 4 : item04 -
#> best solution in 2125 iterations
#> Variable 5 : item05 -
#> best solution in 452 iterations
#> Variable 6 : item06 -
#> best solution in 12575 iterations
#>
#> Arranging data to match correlations
#>
#> Successfully generated correlated variables
## resulting dataframe
head(myItems)
#> item01 item02 item03 item04 item05 item06
#> 1 4 3 4 5 4 4
#> 2 4 3 3 4 4 3
#> 3 2 3 3 4 3 3
#> 4 3 3 3 2 4 3
#> 5 2 4 3 3 3 5
#> 6 1 2 2 2 3 4
tail(myItems)
#> item01 item02 item03 item04 item05 item06
#> 251 2 3 3 3 4 3
#> 252 3 2 3 3 3 3
#> 253 2 3 2 2 3 3
#> 254 2 2 4 2 3 5
#> 255 4 3 3 5 3 4
#> 256 2 3 3 3 4 4
## means and standard deviations
myMoments <- data.frame(
means = apply(myItems, 2, mean) |> round(3),
sds = apply(myItems, 2, sd) |> round(3)
) |> t()
myMoments
#> item01 item02 item03 item04 item05 item06
#> means 2.750 3.000 3.000 3.250 3.500 3.500
#> sds 0.998 0.751 0.998 0.998 1.002 1.498
## Cronbach's Alpha of dataframe
alpha(NULL, myItems)
#> [1] 0.8500638
Summary of dataframe from makeScales() function
To create the desired dataframe, the user must define the following parameters:
scale: a vector or dataframe of the summated rating scale. Should range from (‘lowerbound’ * ‘items’) to (‘upperbound’ * ‘items’)
lowerbound: lower bound of the scale item (example: ‘1’ in a ‘1’ to ‘5’ rating)
upperbound: upper bound of the scale item (example: ‘5’ in a ‘1’ to ‘5’ rating)
items: k, or number of columns to generate
alpha: desired Cronbach’s Alpha. Default = ‘0.8’
summated: (logical) If TRUE, the scale is treated as a summed score (e.g., 4–20 for four 5-point items). If FALSE, it is treated as an averaged score (e.g., 1–5 in 0.25 increments). Default = TRUE.
## define parameters
n <- 256
mean <- 3.00
sd <- 0.85
lowerbound <- 1
upperbound <- 5
items <- 4
## apply lfast() function
meanScale <- lfast(
n = n, mean = mean, sd = sd,
lowerbound = lowerbound, upperbound = upperbound,
items = items
)
#> best solution in 1035 iterations
## sum over all items
summatedScale <- meanScale * items
Summated scale distribution
## apply makeItemsScale() function
newItems_1 <- makeItemsScale(
scale = summatedScale,
lowerbound = lowerbound,
upperbound = upperbound,
items = items
)
#> rearrange 4 values within each of 256 rows
#> Complete!
#> desired Cronbach's alpha = 0.8 (achieved alpha = 0.7997)
### First 10 observations and summated scale
head(cbind(newItems_1, summatedScale), 10)
#> V1 V2 V3 V4 summatedScale
#> 1 2 2 4 4 12
#> 2 2 4 4 2 12
#> 3 1 2 3 1 7
#> 4 2 4 3 5 14
#> 5 1 3 2 4 10
#> 6 3 3 5 5 16
#> 7 2 4 4 2 12
#> 8 2 4 3 5 14
#> 9 1 4 3 3 11
#> 10 1 3 2 4 10
### correlation matrix
cor(newItems_1) |> round(2)
#> V1 V2 V3 V4
#> V1 1.00 0.62 0.65 0.51
#> V2 0.62 1.00 0.43 0.47
#> V3 0.65 0.43 1.00 0.31
#> V4 0.51 0.47 0.31 1.00
### default Cronbach's alpha = 0.80
alpha(data = newItems_1) |> round(4)
#> [1] 0.7997
### calculate eigenvalues and print scree plot
eigenvalues(cor(newItems_1), 1) |> round(3)
#> cor(newItems_1) is positive-definite
#> [1] 2.515 0.708 0.501 0.277
## apply makeItemsScale() function
newItems_2 <- makeItemsScale(
scale = summatedScale,
lowerbound = lowerbound,
upperbound = upperbound,
items = items,
alpha = 0.9
)
#> rearrange 4 values within each of 256 rows
#> Complete!
#> desired Cronbach's alpha = 0.9 (achieved alpha = 0.899)
### First 10 observations and summated scale
head(cbind(newItems_2, summatedScale), 10)
#> V1 V2 V3 V4 summatedScale
#> 1 4 2 3 3 12
#> 2 3 3 2 4 12
#> 3 2 1 2 2 7
#> 4 4 3 4 3 14
#> 5 3 2 2 3 10
#> 6 4 4 5 3 16
#> 7 4 3 3 2 12
#> 8 4 3 3 4 14
#> 9 3 3 3 2 11
#> 10 3 2 2 3 10
### correlation matrix
cor(newItems_2) |> round(2)
#> V1 V2 V3 V4
#> V1 1.00 0.84 0.66 0.66
#> V2 0.84 1.00 0.68 0.67
#> V3 0.66 0.68 1.00 0.63
#> V4 0.66 0.67 0.63 1.00
### requested Cronbach's alpha = 0.90
alpha(data = newItems_2) |> round(4)
#> [1] 0.899
### calculate eigenvalues and print scree plot
eigenvalues(cor(newItems_2), 1) |> round(3)
#> cor(newItems_2) is positive-definite
#> [1] 3.075 0.398 0.370 0.157
Generating data for an independent-samples t-test is trivial with LikertMakeR. But a dataframe for a paired-sample t-test is tricky because the observations are related to each other. That is, we must generate a dataframe of correlated observations.
Note that such tests don’t even require the same sample-size.
## define parameters
lower <- 1
upper <- 5
items <- 6
## generate two independent samples
x1 <- lfast(
n = 20, mean = 2.5, sd = 0.75,
lowerbound = lower, upperbound = upper, items = items
)
#> reached maximum of 1024 iterations
x2 <- lfast(
n = 30, mean = 3.0, sd = 0.85,
lowerbound = lower, upperbound = upper, items = items
)
#> reached maximum of 1024 iterations
## run independent-samples t-test
t.test(x1, x2)
#>
#> Welch Two Sample t-test
#>
#> data: x1 and x2
#> t = -2.1782, df = 44.246, p-value = 0.03476
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -0.96254994 -0.03745006
#> sample estimates:
#> mean of x mean of y
#> 2.5 3.0
makePaired() generates correlated values so the data replicate rating scales taken, for example, in a before and after experimental design. The function is effectively a wrapper function for lfast() and lcor() with the addition of a t-statistic from which the between-column correlation is inferred.
Paired t-tests apply to observations that are associated with each other. For example: the same people rating the same object before and after a treatment, the same people rating two different objects, ratings by husband & wife, etc.
makePaired() takes parameters similar to those of lfast(), with the addition of a value for the desired t-statistic.
n: sample size
means: a [1:2] vector of target means for two before/after measures
sds: a [1:2] vector of target standard deviations
t_value: desired paired t-statistic
lowerbound: lower bound (e.g. ‘1’ for a 1-5 rating scale)
upperbound: upper bound (e.g. ‘5’ for a 1-5 rating scale)
items: number of items in the rating scale.
precision: can relax the level of accuracy required, as in lfast().
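The link between a paired t-statistic and the correlation between the two columns follows from the textbook identities t = d_bar / (sd_d / sqrt(n)) and sd_d^2 = s1^2 + s2^2 - 2 * r * s1 * s2. The base R sketch below solves these for r; `implied_paired_r()` is a hypothetical helper for illustration and is not claimed to be the package's internal code. The inputs are the values used in this section's example.

```r
# Correlation implied by a paired t-statistic, given n, means and SDs.
implied_paired_r <- function(n, means, sds, t_value) {
  d_bar <- means[1] - means[2]
  var_d <- n * d_bar^2 / t_value^2 # variance of the paired differences
  (sds[1]^2 + sds[2]^2 - var_d) / (2 * sds[1] * sds[2])
}

r <- implied_paired_r(
  n = 20, means = c(2.5, 3.0), sds = c(0.75, 0.85), t_value = -2.5
)
round(r, 2) # 0.38
```

A result outside the range -1 to 1 would signal that the reported t-statistic is not attainable with the reported means, standard deviations, and sample size.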
## define parameters
n <- 20
means <- c(2.5, 3.0)
sds <- c(0.75, 0.85)
lower <- 1
upper <- 5
items <- 6
t <- -2.5
## run the function
pairedDat <- makePaired(
n = n, means = means, sds = sds,
t_value = t,
lowerbound = lower, upperbound = upper, items = items
)
#> Initial data vectors
#> reached maximum of 1024 iterations
#> best solution in 468 iterations
#> Arranging values to conform with desired t-value
#> Complete!
## test function output
str(pairedDat)
#> 'data.frame': 20 obs. of 2 variables:
#> $ X1: num 3.17 2.33 3.5 3 2 ...
#> $ X2: num 2.33 2.83 3.67 3.83 4.83 ...
cor(pairedDat) |> round(2)
#> X1 X2
#> X1 1.00 0.38
#> X2 0.38 1.00
pairedMoments <- data.frame(
mean = apply(pairedDat, MARGIN = 2, FUN = mean) |> round(3),
sd = apply(pairedDat, MARGIN = 2, FUN = sd) |> round(3)
) |> t()
pairedMoments
#> X1 X2
#> mean 2.500 3.000
#> sd 0.753 0.852
## run a paired-sample t-test
paired_t <- t.test(x = pairedDat$X1, y = pairedDat$X2, paired = TRUE)
# paired_t <- t.test(pairedDat$X1, pairedDat$X2, paired = TRUE)
paired_t
#>
#> Paired t-test
#>
#> data: pairedDat$X1 and pairedDat$X2
#> t = -2.4936, df = 19, p-value = 0.02203
#> alternative hypothesis: true mean difference is not equal to 0
#> 95 percent confidence interval:
#> -0.91967444 -0.08032556
#> sample estimates:
#> mean difference
#> -0.5
makeRepeated() reconstructs a synthetic dataset and inter-timepoint correlation matrix from a repeated-measures ANOVA result, based on reported means, standard deviations, and an F-statistic.
This function estimates the average correlation between repeated measures by matching the reported F-statistic, under one of three assumed correlation structures:
"cs" (Compound Symmetry): Compound Symmetry
assumes that all repeated measures are equally correlated with each
other. That is, the correlation between time 1 and time 2 is the same as
between time 1 and time 3, and so on. This structure is commonly used in
repeated-measures ANOVA by default. It’s mathematically simple and
reflects the idea that all timepoints are equally related. However, it
may not be realistic for data where correlations decrease as time
intervals increase (e.g., memory decay or learning effects).
"ar1" (First-Order Autoregressive): The
first-order autoregressive structure assumes that measurements closer together in
time are more highly correlated than those further apart. For example,
the correlation between time 1 and time 2 is stronger than between time
1 and time 3. This pattern is often realistic in longitudinal or
time-series studies where change is gradual. The correlation drops off
exponentially with each time step.
"toeplitz" (Linearly Decreasing): Toeplitz
structure is a more flexible option that allows the correlation between
measurements to decrease linearly as the time gap increases. Unlike
AR(1), where the decline is exponential, the Toeplitz structure assumes
a straight-line drop in correlation.
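The three structures can be written out as correlation matrices in base R. This is a sketch under assumptions: `make_structure()` and `rho` are hypothetical names, and the linear rule used for "toeplitz" (correlation falling in a straight line to zero at the largest lag) is one plausible parameterisation chosen for illustration; the package's exact rule may differ.

```r
# Build a k x k correlation matrix under an assumed structure.
make_structure <- function(k, rho, structure = c("cs", "ar1", "toeplitz")) {
  structure <- match.arg(structure)
  lag <- abs(outer(seq_len(k), seq_len(k), "-")) # |i - j| for every pair
  switch(structure,
    cs       = ifelse(lag == 0, 1, rho),                        # all pairs equal
    ar1      = rho^lag,                                         # exponential decay
    toeplitz = ifelse(lag == 0, 1, rho * (1 - lag / (k - 1)))   # linear decay (assumed rule)
  )
}

make_structure(4, 0.4, "cs")
make_structure(4, 0.4, "ar1") |> round(3)
make_structure(4, 0.9, "toeplitz") |> round(3)
```

Comparing the second and third matrices shows the difference in how quickly the correlation falls away as the gap between timepoints grows.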
makeRepeated(
n,
k,
means,
sds,
f_stat,
df_between = k - 1,
df_within = (n - 1) * (k - 1),
structure = c("cs", "ar1", "toeplitz"),
names = paste0("time_", 1:k),
items = 1,
lowerbound = 1, upperbound = 5,
return_corr_only = FALSE,
diagnostics = FALSE,
...
)
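n: sample size.
k: number of repeated measures (timepoints).
means: vector of length k. Mean values reported for each timepoint.
sds: vector of length k. Standard deviations reported for each timepoint.
f_stat: the reported repeated-measures ANOVA F-statistic.
df_between: degrees of freedom between timepoints (default: k - 1).
df_within: degrees of freedom within subjects (default: (n - 1) * (k - 1)).
structure: assumed correlation structure: "cs", "ar1", or "toeplitz" (default).
names: vector of length k. Variable names for each timepoint (default: "time_1" to "time_k").
items: number of items in the rating scale, as in lfast() (default: 1).
lowerbound: lower bound of the scale (default: 1).
upperbound: upper bound of the scale (default: 5).
return_corr_only: If TRUE, return only the estimated correlation matrix.
diagnostics: If TRUE, include diagnostic summaries such as feasible F-statistic range and effect sizes.
out1 <- makeRepeated(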
n = 128,
k = 3,
means = c(3.1, 3.5, 3.9),
sds = c(1.0, 1.1, 1.0),
items = 4,
f_stat = 4.87,
structure = "cs",
diagnostics = FALSE
)
#> Warning in makeRepeated(n = 128, k = 3, means = c(3.1, 3.5, 3.9), sds = c(1, :
#> Optimization may not have converged. Check results carefully.
#> best solution in 825 iterations
#> best solution in 2003 iterations
#> best solution in 841 iterations
head(out1$data)
#> time_1 time_2 time_3
#> 1 3.75 1.75 4.75
#> 2 3.25 2.50 4.50
#> 3 1.50 4.75 4.25
#> 4 2.75 4.00 4.00
#> 5 3.00 4.00 3.50
#> 6 3.75 4.75 1.50
out1$correlation_matrix
#> time_1 time_2 time_3
#> time_1 1.0000000 -0.4899454 -0.4899454
#> time_2 -0.4899454 1.0000000 -0.4899454
#> time_3 -0.4899454 -0.4899454 1.0000000
out2 <- makeRepeated(
n = 32, k = 4,
means = c(2.75, 3.5, 4.0, 4.4),
sds = c(0.8, 1.0, 1.2, 1.0),
f_stat = 16,
structure = "ar1",
items = 5,
lowerbound = 1, upperbound = 7,
return_corr_only = FALSE,
diagnostics = TRUE
)
#> reached maximum of 1024 iterations
#> best solution in 507 iterations
#> best solution in 324 iterations
#> reached maximum of 1024 iterations
print(out2)
#> $data
#> time_1 time_2 time_3 time_4
#> 1 3.0 5.4 5.6 6.2
#> 2 2.6 2.4 2.6 3.0
#> 3 2.6 3.6 2.8 6.0
#> 4 2.8 2.6 2.2 2.6
#> 5 3.2 5.0 3.8 5.8
#> 6 4.2 4.6 3.6 4.2
#> 7 2.4 3.4 4.8 3.8
#> 8 2.8 4.6 4.4 3.6
#> 9 2.4 2.2 5.2 5.0
#> 10 3.2 2.2 5.0 4.6
#> 11 2.6 3.8 2.8 2.8
#> 12 1.6 4.0 5.6 4.4
#> 13 2.6 3.8 3.8 5.2
#> 14 3.8 4.0 3.6 3.2
#> 15 4.0 3.4 3.0 4.4
#> 16 3.0 4.2 5.0 3.4
#> 17 2.6 3.4 6.6 5.4
#> 18 2.0 3.2 3.0 3.6
#> 19 1.2 4.0 4.2 5.0
#> 20 1.8 2.2 2.4 2.8
#> 21 1.6 2.2 2.6 3.6
#> 22 2.4 2.6 2.0 4.8
#> 23 3.0 2.2 4.2 6.0
#> 24 3.4 2.8 3.2 4.4
#> 25 4.0 5.2 6.2 4.8
#> 26 1.4 3.0 3.8 4.2
#> 27 2.4 3.2 3.8 5.2
#> 28 3.6 5.0 3.6 4.2
#> 29 3.6 2.6 4.2 5.4
#> 30 2.4 2.6 5.0 4.6
#> 31 1.8 3.6 3.8 4.8
#> 32 3.8 5.0 5.6 3.8
#>
#> $correlation_matrix
#> time_1 time_2 time_3 time_4
#> time_1 1.00000000 0.3910032 0.1528835 0.05977794
#> time_2 0.39100319 1.0000000 0.3910032 0.15288350
#> time_3 0.15288350 0.3910032 1.0000000 0.39100319
#> time_4 0.05977794 0.1528835 0.3910032 1.00000000
#>
#> $structure
#> [1] "ar1"
#>
#> $feasible_f_range
#> min max
#> 9.353034 39.481390
#>
#> $recommended_f
#> $recommended_f$conservative
#> [1] 10.21
#>
#> $recommended_f$moderate
#> [1] 11.91
#>
#> $recommended_f$strong
#> [1] 30.29
#>
#>
#> $achieved_f
#> [1] 15.99983
#>
#> $effect_size_raw
#> [1] 0.3792188
#>
#> $effect_size_standardised
#> [1] 0.3717831
out3 <- makeRepeated(
n = 32, k = 4,
means = c(2.0, 2.5, 3.0, 2.8),
sds = c(0.8, 0.9, 1.0, 0.9),
items = 4,
f_stat = 24,
structure = "toeplitz",
diagnostics = TRUE
)
#> Warning in makeRepeated(n = 32, k = 4, means = c(2, 2.5, 3, 2.8), sds = c(0.8,
#> : Optimization may not have converged. Check results carefully.
#> best solution in 114 iterations
#> reached maximum of 1024 iterations
#> best solution in 256 iterations
#> reached maximum of 1024 iterations
str(out3)
#> List of 8
#> $ data :'data.frame': 32 obs. of 4 variables:
#> ..$ time_1: num [1:32] 1 2.25 1.25 1.75 1 3 3.5 2.75 1 2.25 ...
#> ..$ time_2: num [1:32] 2 3 2.25 2.5 1.25 3.5 4 1.5 2 3.25 ...
#> ..$ time_3: num [1:32] 2.25 4.5 3 3.25 3 3 3.75 2.75 3.75 4.5 ...
#> ..$ time_4: num [1:32] 3.25 2.75 2.5 2.5 2.75 3.5 2 3.5 3.25 4.25 ...
#> $ correlation_matrix : num [1:4, 1:4] 1 0.66 0.33 0 0.66 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:4] "time_1" "time_2" "time_3" "time_4"
#> .. ..$ : chr [1:4] "time_1" "time_2" "time_3" "time_4"
#> $ structure : chr "toeplitz"
#> $ feasible_f_range : Named num [1:2] 5.57 8.64
#> ..- attr(*, "names")= chr [1:2] "min" "max"
#> $ recommended_f :List of 3
#> ..$ conservative: num 5.59
#> ..$ moderate : num 5.62
#> ..$ strong : num 7.64
#> $ achieved_f : num 9.95
#> $ effect_size_raw : num 0.142
#> $ effect_size_standardised: num 0.174
makeScalesRegression() generates synthetic rating-scale data that replicates reported
regression results: standardised betas, R^2, and the
correlation matrix of the independent variables (if available).
makeScalesRegression(
n,
beta_std,
r_squared,
iv_cormatrix = NULL,
iv_cor_mean = 0.3,
iv_cor_variance = 0.01,
iv_cor_range = c(-0.7, 0.7),
iv_means,
iv_sds,
dv_mean,
dv_sd,
lowerbound_iv,
upperbound_iv,
lowerbound_dv,
upperbound_dv,
items_iv = 1,
items_dv = 1,
var_names = NULL,
tolerance = 0.005
)
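beta_std: k (number of independent variables) standardised betas
r_squared: R^2
iv_cormatrix: default NULL
iv_cor_mean: default 0.3
iv_cor_variance: default 0.01
iv_cor_range: default c(-0.7, 0.7)
iv_means: k IV mean values
iv_sds: k IV standard deviations
lowerbound_iv: k lower bounds for the IVs
upperbound_iv: k upper bounds for the IVs
items_iv: k numbers of items in the IVs. Default = 1.
items_dv: default 1
var_names: default NULL
tolerance: default 0.005
set.seed(123)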
iv_corr <- matrix(c(1.0, 0.3, 0.3, 1.0), nrow = 2)
result1 <- makeScalesRegression(
n = 64,
beta_std = c(0.4, 0.3),
r_squared = 0.35,
iv_cormatrix = iv_corr,
iv_means = c(3.0, 3.5),
iv_sds = c(1.0, 0.9),
dv_mean = 3.8,
dv_sd = 1.1,
lowerbound_iv = 1,
upperbound_iv = 5,
lowerbound_dv = 1,
upperbound_dv = 5,
items_iv = 4,
items_dv = 4,
var_names = c("Attitude", "Intention", "Behaviour")
)
print(result1)
head(result1$data)
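The inputs in the example above can be checked for internal consistency before generating any data. In a standardised regression, the IV-DV correlations are R %*% beta and the implied R^2 is t(beta) %*% R %*% beta, where R is the IV correlation matrix. The sketch below uses base R only; the object names r_xy and r2_implied are illustrative, and how makeScalesRegression() reconciles any gap between the implied and the requested R^2 is not shown here.

```r
# Inputs copied from the example above
beta_std <- c(0.4, 0.3)
iv_corr <- matrix(c(1.0, 0.3, 0.3, 1.0), nrow = 2)
r_squared_reported <- 0.35

# Correlations between each IV and the DV implied by the betas
r_xy <- as.vector(iv_corr %*% beta_std)

# R^2 implied by the betas and the IV correlations
r2_implied <- sum(beta_std * r_xy)

r_xy        # 0.49 0.42
r2_implied  # 0.322

# Gap between reported and implied R^2
r_squared_reported - r2_implied
```

A large gap suggests that the reported betas, R^2, and IV correlations cannot all hold at once, which is worth knowing before trying to reproduce a published result.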
set.seed(456)
result2 <- makeScalesRegression(
n = 64,
beta_std = c(0.3, 0.25, 0.2),
r_squared = 0.40,
iv_cormatrix = NULL, # Will be optimised
iv_cor_mean = 0.3,
iv_cor_variance = 0.02,
iv_means = c(3.0, 3.2, 2.8),
iv_sds = c(1.0, 0.9, 1.1),
dv_mean = 3.5,
dv_sd = 1.0,
lowerbound_iv = 1,
upperbound_iv = 5,
lowerbound_dv = 1,
upperbound_dv = 5,
items_iv = 4,
items_dv = 5
)
# View optimised correlation matrix
print(result2$target_stats$iv_cormatrix)
print(result2$optimisation_info)
LikertMakeR includes three additional functions that may be of help when examining parameters and output.
alpha() calculates Cronbach’s Alpha from a given correlation matrix or a given dataframe.
eigenvalues() calculates eigenvalues of a correlation matrix, reports on whether the correlation matrix is positive definite, and produces an optional scree plot.
reliability() presents a table of internal consistency statistics.
alpha() accepts, as input, either a correlation matrix or a dataframe. If both are submitted, then the correlation matrix is used by default, with a message to that effect.
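For intuition, the standardised form of Cronbach's Alpha can be computed by hand from a correlation matrix as k * r_bar / (1 + (k - 1) * r_bar), where k is the number of items and r_bar is the mean off-diagonal correlation. The helper alpha_from_cor() below is hypothetical and uses base R only; it is an assumption that alpha() performs the same calculation internally.

```r
# Hypothetical helper: standardised Cronbach's alpha from a correlation matrix
alpha_from_cor <- function(r) {
  k <- ncol(r)
  r_bar <- mean(r[lower.tri(r)])  # mean of the off-diagonal correlations
  (k * r_bar) / (1 + (k - 1) * r_bar)
}

cor_example <- matrix(
  c(
    1.00, 0.35, 0.45, 0.75,
    0.35, 1.00, 0.65, 0.55,
    0.45, 0.65, 1.00, 0.65,
    0.75, 0.55, 0.65, 1.00
  ),
  nrow = 4, ncol = 4
)

alpha_from_cor(cor_example)  # 0.8395062
```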
## define parameters
df <- data.frame(
V1 = c(4, 2, 4, 3, 2, 2, 2, 1),
V2 = c(3, 1, 3, 4, 4, 3, 2, 3),
V3 = c(4, 1, 3, 5, 4, 1, 4, 2),
V4 = c(4, 3, 4, 5, 3, 3, 3, 3)
)
corMat <- matrix(
c(
1.00, 0.35, 0.45, 0.75,
0.35, 1.00, 0.65, 0.55,
0.45, 0.65, 1.00, 0.65,
0.75, 0.55, 0.65, 1.00
),
nrow = 4, ncol = 4
)
## apply function examples
alpha(cormatrix = corMat)
#> [1] 0.8395062
alpha(data = df)
#> [1] 0.8026658
alpha(NULL, df)
#> [1] 0.8026658
alpha(corMat, df)
#> Alert:
#> Both cormatrix and data present.
#>
#> Using cormatrix by default.
#> [1] 0.8395062
eigenvalues() calculates eigenvalues of a correlation matrix, reports on whether the matrix is positive-definite, and optionally produces a scree plot.
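A symmetric matrix is positive-definite when all of its eigenvalues are strictly positive, so the check can be sketched with base R's eigen(). The helper is_pos_def() and the two example matrices are illustrative; it is an assumption that eigenvalues() applies the same test.

```r
# Hypothetical helper: TRUE when every eigenvalue is strictly positive
is_pos_def <- function(r) {
  ev <- eigen(r, symmetric = TRUE, only.values = TRUE)$values
  all(ev > 0)
}

good <- matrix(
  c(
    1.00, 0.25, 0.35, 0.45,
    0.25, 1.00, 0.70, 0.75,
    0.35, 0.70, 1.00, 0.85,
    0.45, 0.75, 0.85, 1.00
  ),
  nrow = 4, ncol = 4
)

# An impossible pattern: items 1-2 and 2-3 correlate strongly and positively,
# yet items 1-3 correlate strongly and negatively
bad <- matrix(
  c(
     1.0, 0.9, -0.9,
     0.9, 1.0,  0.9,
    -0.9, 0.9,  1.0
  ),
  nrow = 3, ncol = 3
)

is_pos_def(good)  # TRUE
is_pos_def(bad)   # FALSE

# The eigenvalues of a correlation matrix sum to the number of variables
sum(eigen(good, symmetric = TRUE, only.values = TRUE)$values)
```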
## define parameters
correlationMatrix <- matrix(
c(
1.00, 0.25, 0.35, 0.45,
0.25, 1.00, 0.70, 0.75,
0.35, 0.70, 1.00, 0.85,
0.45, 0.75, 0.85, 1.00
),
nrow = 4, ncol = 4
)
## apply function
evals <- eigenvalues(cormatrix = correlationMatrix)
#> correlationMatrix is positive-definite
print(evals)
#> [1] 2.7484991 0.8122627 0.3048151 0.1344231
reliability() computes internal consistency reliability estimates for a single-factor scale, including Cronbach’s alpha, McDonald’s omega (total), and optional ordinal (polychoric-based) variants and confidence intervals.
## create dataset
my_cor <- LikertMakeR::makeCorrAlpha(
items = 4,
alpha = 0.80
)
my_data <- LikertMakeR::makeScales(
n = 64,
means = c(2.75, 3.00, 3.25, 3.50),
sds = c(1.25, 1.50, 1.30, 1.25),
lowerbound = rep(1, 4),
upperbound = rep(5, 4),
cormatrix = my_cor
)
#> Variable 1 : item01 -
#> Variable 2 : item02 -
#> Variable 3 : item03 -
#> Variable 4 : item04 -
#>
#> Arranging data to match correlations
#>
#> Successfully generated correlated variables
## run function
reliability(my_data)
#> coef_name estimate n_items n_obs notes
#> alpha 0.801 4 64 Pearson correlations
#> omega_total 0.871 4 64 1-factor eigen omega
reliability(
my_data,
include = c("lambda6", "polychoric")
)
#> coef_name estimate n_items n_obs
#> alpha 0.801 4 64
#> omega_total 0.871 4 64
#> lambda6 0.758 4 64
#> ordinal_alpha 0.762 4 64
#> ordinal_omega_total 0.849 4 64
#> notes
#> Pearson correlations
#> 1-factor eigen omega
#> psych::alpha()
#> Polychoric correlations
#> Polychoric correlations | Ordinal CIs not requested
## bootstrapped Confidence intervals can be slow!
reliability(
my_data,
include = "polychoric",
ci = TRUE,
n_boot = 64
)
#> coef_name estimate ci_lower ci_upper n_items n_obs
#> alpha 0.801 0.715 0.866 4 64
#> omega_total 0.871 0.825 0.909 4 64
#> ordinal_alpha 0.762 0.655 0.824 4 64
#> ordinal_omega_total 0.849 0.795 0.883 4 64
#> notes
#> Pearson correlations
#> 1-factor eigen omega
#> Polychoric correlations
#> Polychoric correlations | Ordinal CIs via bootstrap
LikertMakeR is intended for synthesising & correlating rating-scale data with means, standard deviations, and correlations as close as possible to predefined parameters. If you don’t need your data to be close to exact, then other options may be faster or more flexible.
Different approaches include:
sampling from a truncated normal distribution
sampling with a predetermined probability distribution
marginal model specification
Data are sampled from a normal distribution, and then truncated to suit the rating-scale boundaries, and rounded to set discrete values as we see in rating scales.
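A minimal base-R sketch of this idea, assuming a single 5-point item and truncation by discarding out-of-range draws (clamping them to the boundaries would be a different, censoring, approach). The target mean and standard deviation and all object names are illustrative.

```r
set.seed(42)

n <- 200
lower <- 1
upper <- 5

# Oversample so that enough draws remain after truncation
draws <- rnorm(n * 5, mean = 3.2, sd = 1.1)

# Truncate: keep only draws that will round to a value inside the scale
kept <- draws[draws > lower - 0.5 & draws < upper + 0.5]

# Round to the discrete scale points and keep the first n values
item <- round(kept[seq_len(n)])

table(item)
mean(item)
sd(item)
```

Truncation and rounding both pull the realised mean and standard deviation away from the values passed to rnorm(), so this approach suits cases where approximate moments are good enough.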
See Heinz (2021) for an excellent and short example using the following packages:
See also the rLikert() function from the excellent latent2likert package, Lalovic (2024), for an approach using optimal discretization and a skew-normal distribution. latent2likert converts continuous latent variables into ordinal categories to generate Likert scale item responses.
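The simplest version of sampling with a predetermined probability distribution needs only base R's sample(). The sketch below assumes a single 5-point item with made-up response probabilities; it does not use latent2likert, and the object names are illustrative.

```r
set.seed(2025)

scale_points <- 1:5
probs <- c(0.05, 0.15, 0.30, 0.35, 0.15)  # assumed response probabilities

item <- sample(scale_points, size = 200, replace = TRUE, prob = probs)

# Observed proportions for each scale point
prop.table(table(factor(item, levels = scale_points)))

# Moments implied by the probability distribution
expected_mean <- sum(scale_points * probs)
expected_sd <- sqrt(sum(scale_points^2 * probs) - expected_mean^2)
expected_mean  # 3.4
expected_sd
```

The sample mean and standard deviation will vary around these expected values from one draw to the next.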
Marginal model specification extends the idea of a predefined probability distribution to multivariate and correlated dataframes.
SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types on CRAN.
lsasim: Functions to Facilitate the Simulation of Large Scale Assessment Data on CRAN. See Matta et al. (2018)
SimCorMultRes: Simulates Correlated Multinomial Responses on CRAN. See Touloumis (2016)
covsim: VITA, IG and PLSIM Simulation for Given Covariance and Marginals on CRAN. See Grønneberg et al. (2022)
The latentFactoR
package is ideal for generating multi-factor items.
latentFactoR::simulate_factors() generates data based on
latent factor models, which in turn can be adjusted to continuous,
polytomous, dichotomous, or mixed. Skews, cross-loadings, wording
effects, population errors, and local dependencies can be added.
Highly recommended!
The psych
package has several excellent functions for simulating rating-scale
data based on factor loadings.
These focus on factor and item
correlations rather than item moments.
Highly
recommended.
psych::sim.item: Generate simulated data structures for circumplex, spherical, or simple structure.
psych::sim.congeneric: Simulate a congeneric data set with or without minor factors. See Revelle (in prep).
Also:
simsem has many functions for simulating and testing data for application in Structural Equation modelling. See examples at https://simsem.org/
D’Alessandro, S., H. Winzar, B. Lowe, B.J. Babin, W. Zikmund (2020). Marketing Research 5ed, Cengage Australia. https://cengage.com.au/sem121/marketing-research-5th-edition-dalessandro-babin-zikmund
Grønneberg, S., Foldnes, N., & Marcoulides, K. M. (2022). covsim: An R Package for Simulating Non-Normal Data for Structural Equation Models Using Copulas. Journal of Statistical Software, 102(3), 1–45. doi:10.18637/jss.v102.i03
Heinz, A. (2021), Simulating Correlated Likert-Scale Data In R: 3 Simple Steps (blog post) https://glaswasser.github.io/simulating-correlated-likert-scale-data/
Lalovic M (2024). latent2likert: Converting Latent Variables into Likert Scale Responses. R package version 1.2.2, https://latent2likert.lalovic.io/.
Matta, T.H., Rutkowski, L., Rutkowski, D. & Liaw, Y.L. (2018), lsasim: an R package for simulating large-scale assessment data. Large-scale Assessments in Education 6, 15. doi:10.1186/s40536-018-0068-8
Pornprasertmanit, S., Miller, P., & Schoemann, A. (2021). simsem: R package for simulated structural equation modeling https://simsem.org/
Revelle, W. (in prep) An introduction to psychometric theory with applications in R. Springer. (working draft available at https://personality-project.org/r/book/ )
Touloumis, A. (2016), Simulating Correlated Binary and Multinomial Responses under Marginal Model Specification: The SimCorMultRes Package, The R Journal 8:2, 79-91. https://doi.org/10.32614/RJ-2016-034
Winzar, H. (2025). LikertMakeR (V 1.4.0): Synthesise and correlate Likert scale and related rating-scale data with predefined first and second moments. CRAN: https://CRAN.R-project.org/package=LikertMakeR