Title: | Miscellaneous Utilities and Functions |
---|---|
Description: | Miscellaneous tools and functions, including: generate descriptive statistics tables, format output, visualize relations among variables or check distributions, and generic functions for residual and model diagnostics. |
Authors: | Joshua F. Wiley [aut, cre] |
Maintainer: | Joshua F. Wiley <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.4.2 |
Built: | 2025-01-24 04:27:29 UTC |
Source: | https://github.com/jwiley/jwileymisc |
Function to round and format a number
.fround(x, digits)
.fround(x, digits)
x |
the data to round and format |
digits |
the number of digits to used |
a character vector
Function calculates smoothing spline quantiles or linear quantiles as a fall back. Not intended for general use. Expected predicted and residual data. Exported to support related packages.
.quantilePercentiles(data, LL = 0.1, UL = 0.9, na.rm = TRUE, cut = 4L)
.quantilePercentiles(data, LL = 0.1, UL = 0.9, na.rm = TRUE, cut = 4L)
data |
A dataset of predicted and residual values. Assumed from some sort of (probably parametric) model. |
LL |
The lower limit for prediction. Defaults to
|
UL |
The upper limit for prediction. Defaults to
|
na.rm |
A logical whether to remove missing values.
Defaults to |
cut |
An integer, how many unique predicted values there have to be at least for it to use quantile regression or treat the predicted values as discrete. Defaults to 4. |
A data.table with the scores and predicted LL and UL, possibly missing if quantile regression models do not converge.
A data frame drawn from a daily diary study, conducted at Monash University in 2017 where young adults old completed measures up to three times per day (morning, afternoon, and evening) for about 12 days. Thus each participant contributed about 36 observations to the dataset. To protect participant confidentiality and anonymity, the data used here were simulated from the original data, but in such a way as to preserve the relations among variables and most features of the raw data.
aces_daily
aces_daily
A data frame containing 19 variables.
A unique identifier for each individual
The date each observation occured on
The survey coded as an integer (1 = morning, 2 = afternoon, 3 = evening)
Survey start time, centered at time since 11am
A 0 or 1 variable, where 1 = female and 0 = male
Participant age in years, top coded at 25
A 0 or 1 variable where 1 = born in Australia and 0 = born outside of Australia
Participants subjective SES, bottom coded at 4 and top coded at 8
Participants level of education (1 = university graduate or higher, 0 = less than university graduate
Self-reported sleep onset latency in minutes, morning survey only
Self-reported number of wakenings after sleep onset, top coded at 4, morning survey only
Overall stress ratings on a 0–10 scale, repeated 3x daily
Overall social support ratings on a 0–10 scale, repeated 3x daily
Positive affect ratings on a 1–5 scale, repeated 3x daily
Negative affect ratings on a 1–5 scale, repeated 3x daily
Problem focused coping on a 1–4 scale, repeated 1x daily at the evening survey
Emotional processing coping on a 1–4 scale, repeated 1x daily at the evening survey
Emotional exprsesion coping on a 1–4 scale, repeated 1x daily at the evening survey
Mental disengagement coping on a 1–4 scale, repeated 1x daily at the evening survey
A generic function for pretty printing in (semi) APA Style
APAStyler(object, ...)
APAStyler(object, ...)
object |
An object with a class matching one of the methods |
... |
Additional argiuments passed on to methods. |
This assumes that all the objects in a list have the same class
and that an APAStyler
method exists for that class.
## S3 method for class 'list' APAStyler(object, ...)
## S3 method for class 'list' APAStyler(object, ...)
object |
A list in this case, where each element is another known class. |
... |
Additional arguments. |
Styled results.
## Not run: m1 <- lm(mpg ~ qsec * hp, data = mtcars) m2 <- lm(mpg ~ qsec + hp, data = mtcars) m3 <- lm(mpg ~ am + vs, data = mtcars) mt1 <- modelTest(m1) mt2 <- modelTest(m2) mt3 <- modelTest(m3) ## styling regression models APAStyler(list(m1, m2)) ## modelTest objects get merged APAStyler(list(mt1, mt2)) ## the models can be named by passing a named list ## including "special" characters using backticks, like spaces APAStyler(list(Full = mt1, Reduced = mt2)) APAStyler(list(Full = mt1, Reduced = mt2, `Alternate Model` = mt3)) ## you can customize the way output is presented APAStyler(list(mt1, mt2), format = list( FixedEffects = "%s, %s\n(%s, %s)", EffectSizes = "Cohen's f2 = %s (%s)")) ## clean up rm(m1, m2, m3, mt1, mt2, mt3) ## End(Not run)
## Not run: m1 <- lm(mpg ~ qsec * hp, data = mtcars) m2 <- lm(mpg ~ qsec + hp, data = mtcars) m3 <- lm(mpg ~ am + vs, data = mtcars) mt1 <- modelTest(m1) mt2 <- modelTest(m2) mt3 <- modelTest(m3) ## styling regression models APAStyler(list(m1, m2)) ## modelTest objects get merged APAStyler(list(mt1, mt2)) ## the models can be named by passing a named list ## including "special" characters using backticks, like spaces APAStyler(list(Full = mt1, Reduced = mt2)) APAStyler(list(Full = mt1, Reduced = mt2, `Alternate Model` = mt3)) ## you can customize the way output is presented APAStyler(list(mt1, mt2), format = list( FixedEffects = "%s, %s\n(%s, %s)", EffectSizes = "Cohen's f2 = %s (%s)")) ## clean up rm(m1, m2, m3, mt1, mt2, mt3) ## End(Not run)
APAStyler method for linear models
## S3 method for class 'lm' APAStyler(object, digits = 2, pdigits, file, print = TRUE, ...)
## S3 method for class 'lm' APAStyler(object, digits = 2, pdigits, file, print = TRUE, ...)
object |
A |
digits |
The number of digits to round results to. Defaults to 2. |
pdigits |
The number of digits to use for p values. Defaults to digits + 1 if missing. |
file |
An optional argument indicating whether the output should be written to a file. |
print |
A logical argument, whether or not to print results to screen.
This is distinct from saving them to a file. Defaults to |
... |
Additional argiuments passed on to |
A generic function for pretty printing in (semi) APA Style
## S3 method for class 'mira' APAStyler(object, lmobject, digits = 2, pdigits, print = TRUE, file, ...)
## S3 method for class 'mira' APAStyler(object, lmobject, digits = 2, pdigits, print = TRUE, file, ...)
object |
|
lmobject |
an lm object the degrees of freedom of which can be used for conservative F tests |
digits |
The number of digits to round results to. Defaults to 2. |
pdigits |
The number of digits to use for p values. Defaults to digits + 1 if missing. |
print |
A logical argument, whether or not to print results to screen.
This is distinct from saving them to a file. Defaults to |
file |
An optional argument indicating whether the output should be written to a file. |
... |
Additional argiuments passed on to |
APAStyler method for model tests from a linear model
## S3 method for class 'modelTest.lm' APAStyler( object, format = list(FixedEffects = c("%s%s [%s, %s]"), EffectSizes = c("f2 = %s, %s")), digits = 2, pcontrol = list(digits = 3, stars = TRUE, includeP = FALSE, includeSign = FALSE, dropLeadingZero = TRUE), ... )
## S3 method for class 'modelTest.lm' APAStyler( object, format = list(FixedEffects = c("%s%s [%s, %s]"), EffectSizes = c("f2 = %s, %s")), digits = 2, pcontrol = list(digits = 3, stars = TRUE, includeP = FALSE, includeSign = FALSE, dropLeadingZero = TRUE), ... )
object |
A |
format |
A list giving the formatting style to be used for the fixed effecvts and effect sizes. |
digits |
A numeric value indicating the number of digits to print. This is still in early implementation stages and currently does not change all parts of the output (which default to 2 decimals per APA style). |
pcontrol |
A list controlling how p values are formatted. |
... |
Additional arguments. |
Styled results.
m1 <- lm(mpg ~ qsec * hp, data = mtcars) APAStyler(modelTest(m1)) APAStyler(modelTest(m1), format = list( FixedEffects = "%s, %s\n(%s, %s)", EffectSizes = "Cohen's f2 = %s (%s)"), pcontrol = list(digits = 4, stars = FALSE, includeP = TRUE, includeSign = TRUE, dropLeadingZero = TRUE)) ## clean up rm(m1)
m1 <- lm(mpg ~ qsec * hp, data = mtcars) APAStyler(modelTest(m1)) APAStyler(modelTest(m1), format = list( FixedEffects = "%s, %s\n(%s, %s)", EffectSizes = "Cohen's f2 = %s (%s)"), pcontrol = list(digits = 4, stars = FALSE, includeP = TRUE, includeSign = TRUE, dropLeadingZero = TRUE)) ## clean up rm(m1)
APAStyler method for model tests from a vglm multinomial model
## S3 method for class 'modelTest.vglm' APAStyler( object, format = list(FixedEffects = c("%s%s [%s, %s]"), EffectSizes = c("Chi-square (df=%s) = %s, %s")), digits = 2, pcontrol = list(digits = 3, stars = TRUE, includeP = FALSE, includeSign = FALSE, dropLeadingZero = TRUE), OR = TRUE, ... )
## S3 method for class 'modelTest.vglm' APAStyler( object, format = list(FixedEffects = c("%s%s [%s, %s]"), EffectSizes = c("Chi-square (df=%s) = %s, %s")), digits = 2, pcontrol = list(digits = 3, stars = TRUE, includeP = FALSE, includeSign = FALSE, dropLeadingZero = TRUE), OR = TRUE, ... )
object |
A |
format |
A list giving the formatting style to be used for the fixed effects and effect sizes. |
digits |
A numeric value indicating the number of digits to print. This is still in early implementation stages and currently does not change all parts of the output (which default to 2 decimals per APA style). |
pcontrol |
A list controlling how p values are formatted. |
OR |
a logical value whether to report odds ratios and
95 percent confidence intervals, if |
... |
Additional arguments. |
Styled results.
mtcars$cyl <- factor(mtcars$cyl) m <- VGAM::vglm(cyl ~ qsec, family = VGAM::multinomial(), data = mtcars) mt <- modelTest(m) APAStyler(mt) APAStyler(mt, OR = FALSE) ## clean up rm(m, mt, mtcars) ## Not run: mtcars$cyl <- factor(mtcars$cyl) mtcars$am <- factor(mtcars$am) m <- VGAM::vglm(cyl ~ qsec, family = VGAM::multinomial(), data = mtcars) APAStyler(modelTest(m)) m <- VGAM::vglm(cyl ~ scale(qsec), family = VGAM::multinomial(), data = mtcars) APAStyler(modelTest(m)) m2 <- VGAM::vglm(cyl ~ factor(vs) * scale(qsec), family = VGAM::multinomial(), data = mtcars) APAStyler(modelTest(m2)) m <- VGAM::vglm(Species ~ Sepal.Length, family = VGAM::multinomial(), data = iris) APAStyler(modelTest(m)) set.seed(1234) sampdata <- data.frame( Outcome = factor(sample(letters[1:3], 20 * 9, TRUE)), C1 = rnorm(20 * 9), D3 = sample(paste0("L", 1:3), 20 * 9, TRUE)) m <- VGAM::vglm(Outcome ~ factor(D3), family = VGAM::multinomial(), data = sampdata) APAStyler(modelTest(m)) m <- VGAM::vglm(Outcome ~ factor(D3) + C1, family = VGAM::multinomial(), data = sampdata) APAStyler(modelTest(m)) ## End(Not run)
mtcars$cyl <- factor(mtcars$cyl) m <- VGAM::vglm(cyl ~ qsec, family = VGAM::multinomial(), data = mtcars) mt <- modelTest(m) APAStyler(mt) APAStyler(mt, OR = FALSE) ## clean up rm(m, mt, mtcars) ## Not run: mtcars$cyl <- factor(mtcars$cyl) mtcars$am <- factor(mtcars$am) m <- VGAM::vglm(cyl ~ qsec, family = VGAM::multinomial(), data = mtcars) APAStyler(modelTest(m)) m <- VGAM::vglm(cyl ~ scale(qsec), family = VGAM::multinomial(), data = mtcars) APAStyler(modelTest(m)) m2 <- VGAM::vglm(cyl ~ factor(vs) * scale(qsec), family = VGAM::multinomial(), data = mtcars) APAStyler(modelTest(m2)) m <- VGAM::vglm(Species ~ Sepal.Length, family = VGAM::multinomial(), data = iris) APAStyler(modelTest(m)) set.seed(1234) sampdata <- data.frame( Outcome = factor(sample(letters[1:3], 20 * 9, TRUE)), C1 = rnorm(20 * 9), D3 = sample(paste0("L", 1:3), 20 * 9, TRUE)) m <- VGAM::vglm(Outcome ~ factor(D3), family = VGAM::multinomial(), data = sampdata) APAStyler(modelTest(m)) m <- VGAM::vglm(Outcome ~ factor(D3) + C1, family = VGAM::multinomial(), data = sampdata) APAStyler(modelTest(m)) ## End(Not run)
A generic function for pretty printing in (semi) APA Style
## S3 method for class 'SEMSummary' APAStyler( object, digits = 2, type = c("cov", "cor", "both"), stars = FALSE, file = ifelse(.Platform$OS.type == "windows", "clipboard", FALSE), sep = "\t", print = TRUE, ... )
## S3 method for class 'SEMSummary' APAStyler( object, digits = 2, type = c("cov", "cor", "both"), stars = FALSE, file = ifelse(.Platform$OS.type == "windows", "clipboard", FALSE), sep = "\t", print = TRUE, ... )
object |
|
digits |
The number of digits to round results to. Defaults to 2. |
type |
A character vector giving what to print. Defaults to ‘cov’, the covariances. Other options are ‘cor’ and ‘both’. |
stars |
A logical value whether to include significance values as stars (*** p < .001, ** p < .01, * p < .05). |
file |
An optional argument indicating whether the output should be written to a file. |
sep |
Character what the separator for the table should be. Defaults to tabs. |
print |
A logical argument, whether or not to print results to screen.
This is distinct from saving them to a file. Defaults to |
... |
Additional argiuments passed on to |
m <- SEMSummary(~., data = mtcars) APAStyler(m, type = "cor", stars = FALSE, file = FALSE) APAStyler(m, type = "cov", stars = FALSE, file = FALSE) APAStyler(m, type = "both", stars = FALSE, file = FALSE) APAStyler(m, type = "cor", stars = TRUE, file = FALSE) APAStyler(m, type = "cov", stars = TRUE, file = FALSE) APAStyler(m, type = "both", stars = TRUE, file = FALSE)
m <- SEMSummary(~., data = mtcars) APAStyler(m, type = "cor", stars = FALSE, file = FALSE) APAStyler(m, type = "cov", stars = FALSE, file = FALSE) APAStyler(m, type = "both", stars = FALSE, file = FALSE) APAStyler(m, type = "cor", stars = TRUE, file = FALSE) APAStyler(m, type = "cov", stars = TRUE, file = FALSE) APAStyler(m, type = "both", stars = TRUE, file = FALSE)
Given a vector, convert it to missing (NA) values, where the class of the missing matches the input class. Currently supports character, logical, integer, factor, numeric, times (from chron), Date, POSIXct, POSIXlt, and zoo (from zoo).
as.na(x)
as.na(x)
x |
A vector to convert to missing (NA) |
a vector the same length as the input with missing values of the same class
str(as.na(1L:5L)) str(as.na(rnorm(5))) str(as.na(c(TRUE, FALSE))) str(as.na(as.Date("2017-01-01")))
str(as.na(1L:5L)) str(as.na(rnorm(5))) str(as.na(c(TRUE, FALSE))) str(as.na(as.Date("2017-01-01")))
The function takes a path and changes the current working directory to the path. If the directory specified in the path does not currently exist, it will be created.
cd(base, pre, num)
cd(base, pre, num)
base |
a character string with the base path to the directory. This is required. |
pre |
an optional character string with the prefix to add to the base path. Non character strings will be coerced to character class. |
num |
an optional character string, prefixed by |
The function has been designed to be platform independent,
although it has had limited testing. Path creation is done using
file.path
, the existence of the directory is checked using
file.exists
and the directory created with dir.create
.
Only the first argument, is required. The other optional arguments
are handy when one wants to create many similar directories with a common base.
NULL, changes the current working directory
## Not run: # an example just using the base cd("~/testdir") # an example using the optional arguments base <- "~/testdir" pre <- "test_" cd(base, pre, 1) cd(base, pre, 2) ## End(Not run)
## Not run: # an example just using the base cd("~/testdir") # an example using the optional arguments base <- "~/testdir" pre <- "test_" cd(base, pre, 1) cd(base, pre, 2) ## End(Not run)
Utility to estimate the unadjusted, covariate adjusted, and multivariate adjusted unique contributions of one or more IVs on one or more DVs
compareIVs( dv, type, iv, covariates = character(), data, multivariate = FALSE, ... )
compareIVs( dv, type, iv, covariates = character(), data, multivariate = FALSE, ... )
dv |
A character string or vector of the depentent variable(s) |
type |
A character string or vector indicating the type of dependent variable(s) |
iv |
A character string or vector giving the IV(s) |
covariates |
A character string or vector giving the covariate(s) |
data |
The data to be used for analysis |
multivariate |
A logical value whether to have models with all IVs simultaneously. |
... |
Additional arguments passed on to the internal function,
|
A list with all the model results.
test1 <- compareIVs( dv = c("mpg", "disp"), type = c("normal", "normal"), iv = c("hp", "qsec"), covariates = "am", data = mtcars, multivariate = TRUE) test1$OverallSummary rm(test1)
test1 <- compareIVs( dv = c("mpg", "disp"), type = c("normal", "normal"), iv = c("hp", "qsec"), covariates = "am", data = mtcars, multivariate = TRUE) test1$OverallSummary rm(test1)
Save and read RDS functions for using multithreaded “ZSTD” or “LZ4” compression
saveRDSfst(object, filename = "", compression = 100, algorithm = "ZSTD") readRDSfst(filename)
saveRDSfst(object, filename = "", compression = 100, algorithm = "ZSTD") readRDSfst(filename)
object |
An R object to be saved. |
filename |
A character string giving the filename of the object on disk (to save to or read from) |
compression |
A numeric value between 0 and 100 indicating the amount of compression. Defaults to 100, the highest level of compression. 0 gives the lowest compression. |
algorithm |
A character string of the type of compression to use. Defaults to “ZSTD” which is better compression but slower. The only other option is “LZ4” which is faster but may provide less compression. |
By default, saveRDS()
does not have multithreaded compression
built in. These functions use “ZSTD” or “LZ4” compression
via the fst
package for multithreaded compression and decompression
with good performance.
To save them, objects are serialized, compressed, and then saved using saveRDS()
.
To read them, objects are read in using readRDS()
, decompressed, and then unserialized.
Hashing is used to verify the results. saveRDS()
is performed using version = 3
,
so it will not work on older versions of R
.
saveRDSfst()
is called for its side effect of saving a file to disk.
The original R
object if using readRDSfst()
.
saveRDSfst(mtcars, filename = file.path(tempdir(), "mtcars.RDS")) saveRDSfst(mtcars, filename = file.path(tempdir(), "mtcars.RDS")) readRDSfst(file.path(tempdir(), "mtcars.RDS"))
saveRDSfst(mtcars, filename = file.path(tempdir(), "mtcars.RDS")) saveRDSfst(mtcars, filename = file.path(tempdir(), "mtcars.RDS")) readRDSfst(file.path(tempdir(), "mtcars.RDS"))
This is a simple function designed to convert a correlation matrix
(standardized covariance matrix) back to a covariance matrix.
It is the opposite of cov2cor
.
cor2cov(V, sigma)
cor2cov(V, sigma)
V |
an n x n correlation matrix. Should be numeric, square, and symmetric. |
sigma |
an n length vector of the standard deviations. The length of the vector must match the number of columns in the correlation matrix. |
an n x n covariance matrix
# using a built in dataset cor2cov(cor(longley), sapply(longley, sd)) # should match the above covariance matarix cov(longley) all.equal(cov(longley), cor2cov(cor(longley), sapply(longley, sd)))
# using a built in dataset cor2cov(cor(longley), sapply(longley, sd)) # should match the above covariance matarix cov(longley) all.equal(cov(longley), cor2cov(cor(longley), sapply(longley, sd)))
Given a square, symmetric matrix (such as a correlation matrix) this function tries to drop the fewest possible number of variables to return a (square, symmetric) matrix with no missing cells.
corOK(x, maxiter = 100)
corOK(x, maxiter = 100)
x |
a square, symmetric matrix or object coercable to such (such as a data frame). |
maxiter |
a number indicating the maximum number of iterations, currently as a sanity check. See details. |
The assumption that x is square and symmetric comes because it is
assumed that the number of missing cells for a given column are identical
to that of the corresponding row. corOK
finds the column with the
most missing values, and drops that (and its corresponding row), and continues
on in like manner until the matrix has no missing values. Although this was
intended for a correlation matrix, it could be used on other types of matrices.
Note that because corOK
uses an iterative method, it can be slow when many
columns/rows need to be removed. For the intended use (correlation matrices) there
probably should not be many missing. As a sanity check and to prevent tediously long
computations, the maximum number of iterations can be set.
A list with two elements
x |
The complete non missing matrix. |
keep.indices |
A vector of the columns and rows from the original matrix to be kept (i.e., that are nonmissing). |
cormat <- cor(iris[, -5]) # set missing cormat[cbind(c(1,2), c(2,1))] <- NA # print cormat # return complete corOK(cormat) # using maximum iterations corOK(cormat, maxiter=0) # clean up rm(cormat)
cormat <- cor(iris[, -5]) # set missing cormat[cbind(c(1,2), c(2,1))] <- NA # print cormat # return complete corOK(cormat) # using maximum iterations corOK(cormat, maxiter=0) # clean up rm(cormat)
This function creates a heatmap of a correlation matrix using ggplot2.
corplot( x, coverage, pvalues, type = c("both", "cor", "p", "coverage"), digits = 2, order = c("cluster", "asis"), ..., control.grobs = list() )
corplot( x, coverage, pvalues, type = c("both", "cor", "p", "coverage"), digits = 2, order = c("cluster", "asis"), ..., control.grobs = list() )
x |
A correlation matrix or some other square symmetric matrix. |
coverage |
An (optional) matrix with the same dimensions as
|
pvalues |
An (optional) matrix with the same dimensions as
|
type |
A character string indicating what to show on top of the heatmap. Can be
‘coverage’, in which case bubble points show coverage;
‘p’, in which case p values are shown;
‘cor’, in which case correlations are shown; or
‘both’, in which case both correlations and p-values are shown.
Only has an effect if a coverage (or pvalue) matrix is passed
also. Defaults to |
digits |
The number of digits to round to when printing the
correlations on the heatmap. Text is suppressed when a coverage
matrix is passed and |
order |
A character string indicating how to order the resulting plot. Defaults to ‘cluster’ which uses hierarchical clustering to sensibly order the variables. The other option is ‘asis’ in which case the matrix is plotted in the order it is passed. |
... |
Additional arguments currently only passed to
|
control.grobs |
A list of additional |
The actual plot is created using ggplot2
and geom_tile
.
In addition to creating the plot, the variables are ordered based on a
hierarchical clustering of the correlation matrix. Specifically, 1 - x
is used as the distance matrix. If coverage is passed, will also add a bubble
plot with the area proportional to the proportion of data present for any
given cell. Defaults for ggplot2
are set, but it is possible to use a
named list of quote()d ggplot calls to override all defaults. This is not
expected for typical use. Particularly main, points, and text as these rely
on internal variable names; however, labels, the gradient color, and area
scaling can be adjusted more safely.
Primarily called for the side effect of creating a plot.
However, the ggplot2
plot object is returned,
so it can be saved, replotted, edited, etc.
# example plotting the correlation matrix from the # mtcars dataset corplot(cor(mtcars)) dat <- as.matrix(iris[, 1:4]) # randomly set 25% of the data to missing set.seed(10) dat[sample(length(dat), length(dat) * .25)] <- NA cor(dat, use = "pair") cor(dat, use = "complete") # create a summary of the data (including coverage matrix) sdat <- SEMSummary(~ ., data = dat, use = "pair") str(sdat) # using the plot method for SEMSummary (which basically just calls corplot) ## getting correlations above diagonal and p values below diagonal#' plot(sdat) ## get correlations only plot(sdat, type = "cor") ## showing coverage plot(sdat, type = "coverage") # use the control.grobs argument to adjust the coverage scaling # to go from 0 to 1 rather than the range of coverage corplot(x = sdat$sSigma, coverage = sdat$coverage, type = "coverage", control.grobs = list(area = quote(scale_size_area(limits = c(0, 1)))) ) # also works with plot() on a SEMSummary plot(x = sdat, type = "coverage", control.grobs = list(area = quote(scale_size_area(limits = c(0, 1)))) ) rm(dat, sdat)
# example plotting the correlation matrix from the # mtcars dataset corplot(cor(mtcars)) dat <- as.matrix(iris[, 1:4]) # randomly set 25% of the data to missing set.seed(10) dat[sample(length(dat), length(dat) * .25)] <- NA cor(dat, use = "pair") cor(dat, use = "complete") # create a summary of the data (including coverage matrix) sdat <- SEMSummary(~ ., data = dat, use = "pair") str(sdat) # using the plot method for SEMSummary (which basically just calls corplot) ## getting correlations above diagonal and p values below diagonal#' plot(sdat) ## get correlations only plot(sdat, type = "cor") ## showing coverage plot(sdat, type = "coverage") # use the control.grobs argument to adjust the coverage scaling # to go from 0 to 1 rather than the range of coverage corplot(x = sdat$sSigma, coverage = sdat$coverage, type = "coverage", control.grobs = list(area = quote(scale_size_area(limits = c(0, 1)))) ) # also works with plot() on a SEMSummary plot(x = sdat, type = "coverage", control.grobs = list(area = quote(scale_size_area(limits = c(0, 1)))) ) rm(dat, sdat)
Simple function to calculate effect sizes for frequency tables.
cramerV(x)
cramerV(x)
x |
A frequency table, such as from |
A numeric value with Phi for 2 x 2 tables or Cramer's V for tables larger than 2 x 2.
cramerV(xtabs(~ am + vs, data = mtcars)) cramerV(xtabs(~ cyl + vs, data = mtcars)) cramerV(xtabs(~ cyl + am, data = mtcars))
cramerV(xtabs(~ am + vs, data = mtcars)) cramerV(xtabs(~ cyl + vs, data = mtcars)) cramerV(xtabs(~ cyl + am, data = mtcars))
Calculate the Circular Difference
diffCircular(x, y, max)
diffCircular(x, y, max)
x |
Numeric or integer values |
y |
Numeric or integer values |
max |
the theoretical maximum (e.g., if degrees, 360; if hours, 24; etc.). |
A value with the circular difference. This will always be positive if defined.
diffCircular(330, 30, max = 360) diffCircular(22, 1, max = 24) diffCircular(c(22, 23, 21, 22), c(1, 1, 23, 14), max = 24)
diffCircular(330, 30, max = 360) diffCircular(22, 1, max = 24) diffCircular(c(22, 23, 21, 22), c(1, 1, 23, 14), max = 24)
Give a dataset and a list of variables, or just the data in the vars. For best results, convert categorical variables into factors. Provides a table of estimated descriptive statistics optionally by group levels.
egltable( vars, g, data, idvar, strict = TRUE, parametric = TRUE, paired = FALSE, simChisq = FALSE, sims = 1000000L )
egltable( vars, g, data, idvar, strict = TRUE, parametric = TRUE, paired = FALSE, simChisq = FALSE, sims = 1000000L )
vars |
Either an index (numeric or character) of
variables to access from the |
g |
A variable used tou group/separate the data prior to calculating descriptive statistics. |
data |
optional argument of the dataset containing the variables to be described. |
idvar |
A character string indicating the variable name
of the ID variable. Not currently used, but will eventually
support |
strict |
Logical, whether to strictly follow the type of each variable, or to assume categorical if the number of unique values is less than or equal to 3. |
parametric |
Logical whether to use parametric tests in the
case of multiple groups to test for differences. Only applies to
continuous variables. If |
paired |
Logical whether the data are paired or not. Defaults to
|
simChisq |
Logical whether to estimate p-values for chi-square test
for categorical data when there are multiple groups, by simulation.
Defaults to |
sims |
Integer for the number of simulations to be used to estimate
p-values for the chi-square tests for categorical variables when
there are multiple groups. Defaults to one million ( |
A data frame of the table.
egltable(iris) egltable(colnames(iris)[1:4], "Species", data = iris) egltable(iris, parametric = FALSE) egltable(colnames(iris)[1:4], "Species", iris, parametric = FALSE) egltable(colnames(iris)[1:4], "Species", iris, parametric = c(TRUE, TRUE, FALSE, FALSE)) egltable(colnames(iris)[1:4], "Species", iris, parametric = c(TRUE, TRUE, FALSE, FALSE), simChisq=TRUE) diris <- data.table::as.data.table(iris) egltable("Sepal.Length", g = "Species", data = diris) tmp <- mtcars tmp$cyl <- factor(tmp$cyl) tmp$am <- factor(tmp$am, levels = 0:1) egltable(c("mpg", "hp"), "vs", tmp) egltable(c("mpg", "hp"), "am", tmp) egltable(c("am", "cyl"), "vs", tmp) tests <- with(sleep, wilcox.test(extra[group == 1], extra[group == 2], paired = TRUE)) str(tests) ## example with paired data egltable(c("extra"), g = "group", data = sleep, idvar = "ID", paired = TRUE) ## what happens when ignoring pairing (p-value off) # egltable(c("extra"), g = "group", data = sleep, idvar = "ID") ## paired categorical data example ## using data on chick weights to create categorical data tmp <- subset(ChickWeight, Time %in% c(0, 20)) tmp$WeightTertile <- cut(tmp$weight, breaks = quantile(tmp$weight, c(0, 1/3, 2/3, 1), na.rm = TRUE), include.lowest = TRUE) egltable(c("weight", "WeightTertile"), g = "Time", data = tmp, idvar = "Chick", paired = TRUE) rm(tmp)
egltable(iris) egltable(colnames(iris)[1:4], "Species", data = iris) egltable(iris, parametric = FALSE) egltable(colnames(iris)[1:4], "Species", iris, parametric = FALSE) egltable(colnames(iris)[1:4], "Species", iris, parametric = c(TRUE, TRUE, FALSE, FALSE)) egltable(colnames(iris)[1:4], "Species", iris, parametric = c(TRUE, TRUE, FALSE, FALSE), simChisq=TRUE) diris <- data.table::as.data.table(iris) egltable("Sepal.Length", g = "Species", data = diris) tmp <- mtcars tmp$cyl <- factor(tmp$cyl) tmp$am <- factor(tmp$am, levels = 0:1) egltable(c("mpg", "hp"), "vs", tmp) egltable(c("mpg", "hp"), "am", tmp) egltable(c("am", "cyl"), "vs", tmp) tests <- with(sleep, wilcox.test(extra[group == 1], extra[group == 2], paired = TRUE)) str(tests) ## example with paired data egltable(c("extra"), g = "group", data = sleep, idvar = "ID", paired = TRUE) ## what happens when ignoring pairing (p-value off) # egltable(c("extra"), g = "group", data = sleep, idvar = "ID") ## paired categorical data example ## using data on chick weights to create categorical data tmp <- subset(ChickWeight, Time %in% c(0, 20)) tmp$WeightTertile <- cut(tmp$weight, breaks = quantile(tmp$weight, c(0, 1/3, 2/3, 1), na.rm = TRUE), include.lowest = TRUE) egltable(c("weight", "WeightTertile"), g = "Time", data = tmp, idvar = "Chick", paired = TRUE) rm(tmp)
This function takes a vector of statistics and calculates the empirical p-value, that is, how many fall on the other side of zero. It calculates a two-tailed p-value.
empirical_pvalue(x, na.rm = TRUE)
empirical_pvalue(x, na.rm = TRUE)
x |
a data vector to operate on |
na.rm |
Logical whether to remove NA values. Defaults to |
a named vector with the number of values falling at or below zero, above zero, and the empirical p-value.
Joshua F. Wiley <[email protected]>
empirical_pvalue(rnorm(100))
empirical_pvalue(rnorm(100))
This function uses the contrast
function from rms to
find the threshold for significance from interactions.
findSigRegions( object, l1, l2, name.vary, lower, upper, alpha = 0.05, starts = 50 )
findSigRegions( object, l1, l2, name.vary, lower, upper, alpha = 0.05, starts = 50 )
object |
A fitted rms object |
l1 |
the first set of values to fix for the contrast function |
l2 |
the second set of values to fix for the contrast function |
name.vary |
the name of the model parameter to vary values for
to find the threshold. Note that this should not be included in
|
lower |
The lower bound to search for values for the varying value |
upper |
The upper bound to search for values for the varying value |
alpha |
The significance threshold, defaults to |
starts |
Number of starting values to try between the lower and upper bounds. |
A data table with notes if no convergence or significance thresholds (if any).
## make me
## make me
Function to format the reuslts of a hypothesis test as text
formatHtest( x, type = c("t", "F", "chisq", "kw", "mh", "r_pearson", "r_kendall", "r_spearman"), ... )
formatHtest( x, type = c("t", "F", "chisq", "kw", "mh", "r_pearson", "r_kendall", "r_spearman"), ... )
x |
A |
type |
The type of htest. Currently one of: “t”, “F”, “chisq”, “kw”, “mh”, “r_pearson”, “r_kendall”, or “r_spearman” for t-tests, F-tests, chi-square tests, kruskal-wallis tests, Mantel-Haenszel tests, pearson correlations, kendall tau correlation, and spearman rho correlation, respectively. |
... |
Arguments passed on to p-value formatting |
A character string with results
formatHtest(t.test(extra ~ group, data = sleep), type = "t") formatHtest(anova(aov(mpg ~ factor(cyl), data = mtcars)), type = "F") formatHtest(chisq.test(c(A = 20, B = 15, C = 25)), type = "chisq") formatHtest(kruskal.test(Ozone ~ Month, data = airquality), type = "kw") formatHtest(mantelhaen.test(UCBAdmissions), type = "mh") formatHtest(cor.test(~ mpg + hp, data = mtcars, method = "pearson"), type = "r_pearson") formatHtest(cor.test(~ mpg + hp, data = mtcars, method = "kendall"), type = "r_kendall") formatHtest(cor.test(~ mpg + hp, data = mtcars, method = "spearman"), type = "r_spearman")
formatHtest(t.test(extra ~ group, data = sleep), type = "t") formatHtest(anova(aov(mpg ~ factor(cyl), data = mtcars)), type = "F") formatHtest(chisq.test(c(A = 20, B = 15, C = 25)), type = "chisq") formatHtest(kruskal.test(Ozone ~ Month, data = airquality), type = "kw") formatHtest(mantelhaen.test(UCBAdmissions), type = "mh") formatHtest(cor.test(~ mpg + hp, data = mtcars, method = "pearson"), type = "r_pearson") formatHtest(cor.test(~ mpg + hp, data = mtcars, method = "kendall"), type = "r_kendall") formatHtest(cor.test(~ mpg + hp, data = mtcars, method = "spearman"), type = "r_spearman")
Function to format the median and IQR of a variable
formatMedIQR(x, d = 2, na.rm = TRUE)
formatMedIQR(x, d = 2, na.rm = TRUE)
x |
the data to have the median and IQR calculated |
d |
How many digits to display. Defaults to 2. |
na.rm |
Logical whether to remove missing values. Defaults to |
A character string with results
formatMedIQR(mtcars$mpg)
formatMedIQR(mtcars$mpg)
Function to simplify formatting p-values for easy viewing / publication
formatPval( x, d = 3, sd, includeP = FALSE, includeSign = FALSE, dropLeadingZero = TRUE )
formatPval( x, d = 3, sd, includeP = FALSE, includeSign = FALSE, dropLeadingZero = TRUE )
x |
p values to convert |
d |
number of digits |
sd |
number of scientific digits. Defaults to |
includeP |
logical value whether to include the character “p” itself.
Defaults to |
includeSign |
logical value whether to include the character “=” or “<”.
Defaults to |
dropLeadingZero |
logical value whether to drop leading zeros for p-values.
Defaults to |
A character string with stars
formatPval(c(.00052456, .000000124, .01035, .030489, .534946)) formatPval(c(.00052456, .000000124, .01035, .030489, .534946), 3, 3, FALSE, TRUE) formatPval(c(.00052456, .000000124, .01035, .030489, .534946), 3, 3, TRUE, TRUE) formatPval(c(.00052456, .000000124, .01035, .030489, .534946), 5) formatPval(c(1, .15346, .085463, .05673, .04837, .015353462, .0089, .00164, .0006589, .0000000053326), 3, 5) formatPval(c(1, .15346, .085463, .05673, .04837, .015353462, .0089, .00164, .0006589, .0000000053326), 3, 5, dropLeadingZero = FALSE)
formatPval(c(.00052456, .000000124, .01035, .030489, .534946)) formatPval(c(.00052456, .000000124, .01035, .030489, .534946), 3, 3, FALSE, TRUE) formatPval(c(.00052456, .000000124, .01035, .030489, .534946), 3, 3, TRUE, TRUE) formatPval(c(.00052456, .000000124, .01035, .030489, .534946), 5) formatPval(c(1, .15346, .085463, .05673, .04837, .015353462, .0089, .00164, .0006589, .0000000053326), 3, 5) formatPval(c(1, .15346, .085463, .05673, .04837, .015353462, .0089, .00164, .0006589, .0000000053326), 3, 5, dropLeadingZero = FALSE)
Make axis lines informative by making them show the observed range of the data. Inspired from the excellent ggthemes package: https://github.com/jrnold/ggthemes
geom_tufterange( mapping = NULL, data = NULL, stat = "identity", position = "identity", ..., show.legend = NA, inherit.aes = TRUE )
geom_tufterange( mapping = NULL, data = NULL, stat = "identity", position = "identity", ..., show.legend = NA, inherit.aes = TRUE )
mapping |
Set of aesthetic mappings created by |
data |
The data to be displayed in this layer. There are three options: If A A |
stat |
The statistical transformation to use on the data for this layer.
When using a
|
position |
A position adjustment to use on the data for this layer. This
can be used in various ways, including to prevent overplotting and
improving the display. The
|
... |
Other arguments passed on to
|
show.legend |
logical. Should this layer be included in the legends?
|
inherit.aes |
If |
Creates a plot for likert scale
gglikert( x, y, leftLab, rightLab, colour, data, xlim, title, shape = 18, size = 7 )
gglikert( x, y, leftLab, rightLab, colour, data, xlim, title, shape = 18, size = 7 )
x |
Variable to plot on the x axis (the likert scale responses or averages) |
y |
The variable containing an index of the different items, should be integers |
leftLab |
The variable with anchors for the low end of the Likert scale |
rightLab |
The variable with anchors for the high end of the Likert scale |
colour |
A character string giving the name of a variable for colouring the data, like a grouping variable. Alternately the colour of points passed to |
data |
The data to use for plotting |
xlim |
A vector giving the lower an upper limit for the x axis. This should be the possible range of the Likert scale, not the actual range. |
title |
A character vector giving the title for the plot |
shape |
A number indicating the point shape, passed to |
size |
A number indicating the size of points, passed to |
library(JWileymisc) library(ggplot2) library(ggpubr) testdat <- data.table::data.table( Var = 1:4, Mean = c(1.5, 3, 2.2, 4.6), Low = c("Happy", "Peaceful", "Excited", "Content"), High = c("Sad", "Angry", "Hopeless", "Anxious")) gglikert("Mean", "Var", "Low", "High", data = testdat, xlim = c(1, 5), title = "Example Plot of Average Affect Ratings") testdat <- rbind( cbind(testdat, Group = "Young"), cbind(testdat, Group = "Old")) testdat$Mean[5:8] <- c(1.7, 2.6, 2.0, 4.4) gglikert("Mean", "Var", "Low", "High", colour = "Group", data = testdat, xlim = c(1, 5), title = "Example Plot of Average Affect Ratings") gglikert("Mean", "Var", "Low", "High", colour = "Group", data = testdat, xlim = c(1, 5), title = "Example Plot of Average Affect Ratings") + ggplot2::scale_colour_manual(values = c("Young" = "grey50", "Old" = "black")) ## clean up rm(testdat)
library(JWileymisc) library(ggplot2) library(ggpubr) testdat <- data.table::data.table( Var = 1:4, Mean = c(1.5, 3, 2.2, 4.6), Low = c("Happy", "Peaceful", "Excited", "Content"), High = c("Sad", "Angry", "Hopeless", "Anxious")) gglikert("Mean", "Var", "Low", "High", data = testdat, xlim = c(1, 5), title = "Example Plot of Average Affect Ratings") testdat <- rbind( cbind(testdat, Group = "Young"), cbind(testdat, Group = "Old")) testdat$Mean[5:8] <- c(1.7, 2.6, 2.0, 4.4) gglikert("Mean", "Var", "Low", "High", colour = "Group", data = testdat, xlim = c(1, 5), title = "Example Plot of Average Affect Ratings") gglikert("Mean", "Var", "Low", "High", colour = "Group", data = testdat, xlim = c(1, 5), title = "Example Plot of Average Affect Ratings") + ggplot2::scale_colour_manual(values = c("Young" = "grey50", "Old" = "black")) ## clean up rm(testdat)
Given a data.frame
or data.table
, create a character vector
MD5 hash of the overall dataset and each variable. The goal of this is to create
a secure vector / text file that can be tracked using version control
(e.g., GitHub) without requiring commiting sensitive datasets.
The tracking will make it possible to evaluate whether two datasets are the
same, such as when sending data or when datasets may change over time
to know which variable(s) changed, if any.
hashDataset(x, file)
hashDataset(x, file)
x |
A |
file |
An optional character string. If given, assumed to be the path/name of a file to write the character string hash out to, for convenience. When non missing, the character vector is returned invisibly and a file written. When missing (default), the character vector is returned directly. |
A (possibly invisible) character vector. Also (optionally) a text file written version of the character string.
hashDataset(mtcars) ## if a file is specified it will write the results to the text file ## nicely formatted, along these lines cat(hashDataset(cars), sep = "\n")
hashDataset(mtcars) ## if a file is specified it will write the results to the text file ## nicely formatted, along these lines cat(hashDataset(cars), sep = "\n")
This function uses the contrast
function from rms to
find the threshold for significance from interactions.
intSigRegGraph( object, predList, contrastList, xvar, varyvar, varyvar.levels, xlab = xvar, ylab = "Predicted Values", ratio = 1, xlim, ylim, xbreaks, xlabels = xbreaks, scale.x = c(m = 0, s = 1), scale.y = c(m = 0, s = 1), starts = 50 )
intSigRegGraph( object, predList, contrastList, xvar, varyvar, varyvar.levels, xlab = xvar, ylab = "Predicted Values", ratio = 1, xlim, ylim, xbreaks, xlabels = xbreaks, scale.x = c(m = 0, s = 1), scale.y = c(m = 0, s = 1), starts = 50 )
object |
A fitted rms object |
predList |
TODO |
contrastList |
TODO |
xvar |
TODO |
varyvar |
TODO |
varyvar.levels |
TODO |
xlab |
optional |
ylab |
TODO |
ratio |
TODO |
xlim |
TODO |
ylim |
TODO |
xbreaks |
TODO |
xlabels |
optional |
scale.x |
optional |
scale.y |
optional |
starts |
Number of starting values to try between the lower and upper bounds. |
A data table with notes if no convergence or significance thresholds (if any).
## make me
## make me
Given a vector, return TRUE
or FALSE
if each element is either missing (NA/NaN), non finite (e.g. infinite)
or a zero length character string (only for character vectors).
is.naz(x)
is.naz(x)
x |
A vector to identify missing / non finite or zero length strings from |
a logical vector
is.naz(c(1, NA, NaN)) is.naz(c(1, NA, NaN, Inf)) is.naz(c("test", "", NA_character_))
is.naz(c(1, NA, NaN)) is.naz(c(1, NA, NaN, Inf)) is.naz(c("test", "", NA_character_))
Given a variable, create a k lagged version, optionally do it by a grouping factor, such as an ID.
lagk(x, k = 1, by)
lagk(x, k = 1, by)
x |
the variable to lag |
k |
the length to lag it |
by |
a variable to lag by. Must be sorted. |
a vector of the lagged values
lagk(1:4, 1)
lagk(1:4, 1)
This function is a minor modification of the lm() function
to allow the use of a pre-specified design matrix. It is not intended for
public use but only to support modelTest.lm
.
lm2( formula, data, subset, weights, na.action, model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, contrasts = NULL, offset, designMatrix, yObserved, ... )
lm2( formula, data, subset, weights, na.action, model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, contrasts = NULL, offset, designMatrix, yObserved, ... )
formula |
An object of class "formula" although it is only minimally used |
data |
the dataset |
subset |
subset |
weights |
any weights |
na.action |
Defaults to |
model |
defaults to |
x |
defaults to |
y |
defaults to |
qr |
defaults to |
singular.ok |
defaults to |
contrasts |
defaults to |
offset |
missing by default |
designMatrix |
a model matrix / design matrix (all numeric, pre coded if applicable for discrete variables) |
yObserved |
the observed y values |
... |
additional arguments |
an lm class object
lm
mtcars$cyl <- factor(mtcars$cyl) m <- lm(mpg ~ hp * cyl, data = mtcars) x <- model.matrix(m) y <- mtcars$mpg m2 <- JWileymisc:::lm2(mpg ~ 1 + cyl + hp:cyl, data = mtcars, designMatrix = x[, -2, drop = FALSE], yObserved = y) anova(m, m2) rm(m, m2, x, y)
mtcars$cyl <- factor(mtcars$cyl) m <- lm(mpg ~ hp * cyl, data = mtcars) x <- model.matrix(m) y <- mtcars$mpg m2 <- JWileymisc:::lm2(mpg ~ 1 + cyl + hp:cyl, data = mtcars, designMatrix = x[, -2, drop = FALSE], yObserved = y) anova(m, m2) rm(m, m2, x, y)
Function to calculate circular mean
meanCircular(x, max, na.rm = TRUE)
meanCircular(x, max, na.rm = TRUE)
x |
Numeric or integer values |
max |
The theoretical maximum (e.g., if degrees, 360) |
na.rm |
A logical value indicating whether to remove missing values.
Defaults to |
A numeric value with the circular mean.
meanCircular(c(22:23, 1:2), max = 24) meanCircular(c(12, 24), max = 24) meanCircular(c(6, 7, 23), max = 24) meanCircular(c(6, 7, 21), max = 24) meanCircular(c(6, 21), max = 24) meanCircular(c(6, 23), max = 24) meanCircular(c(.91, .96, .05, .16), max = 1) meanCircular(c(6, 7, 8, 9), max = 24) meanCircular(1:3, max = 24) meanCircular(21:23, max = 24) meanCircular(c(16, 17, 18, 19), max = 24) meanCircular(c(355, 5, 15), max = 360)
meanCircular(c(22:23, 1:2), max = 24) meanCircular(c(12, 24), max = 24) meanCircular(c(6, 7, 23), max = 24) meanCircular(c(6, 7, 21), max = 24) meanCircular(c(6, 21), max = 24) meanCircular(c(6, 23), max = 24) meanCircular(c(.91, .96, .05, .16), max = 1) meanCircular(c(6, 7, 8, 9), max = 24) meanCircular(1:3, max = 24) meanCircular(21:23, max = 24) meanCircular(c(16, 17, 18, 19), max = 24) meanCircular(c(355, 5, 15), max = 360)
Generic function.
modelCompare(model1, model2, ...) as.modelCompare(x) is.modelCompare(x) ## S3 method for class 'lm' modelCompare(model1, model2, ...)
modelCompare(model1, model2, ...) as.modelCompare(x) is.modelCompare(x) ## S3 method for class 'lm' modelCompare(model1, model2, ...)
model1 |
A fitted model object. |
model2 |
A fitted model object to compare to |
... |
Additional arguments passed to specific methods. |
x |
An object (e.g., list or a modelCompare object) to test or attempt coercing to a modelCompare object. |
Depends on the method dispatch.
m1 <- lm(mpg ~ qsec * hp, data = mtcars) m2 <- lm(mpg ~ am, data = mtcars) modelCompare(m1, m2) ## cleanup rm(m1, m2) ## Not run: m3 <- lm(mpg ~ 1, data = mtcars) m4 <- lm(mpg ~ 0, data = mtcars) modelCompare(m3, m4) ## cleanup rm(m3, m4) ## End(Not run)
m1 <- lm(mpg ~ qsec * hp, data = mtcars) m2 <- lm(mpg ~ am, data = mtcars) modelCompare(m1, m2) ## cleanup rm(m1, m2) ## Not run: m3 <- lm(mpg ~ 1, data = mtcars) m4 <- lm(mpg ~ 0, data = mtcars) modelCompare(m3, m4) ## cleanup rm(m3, m4) ## End(Not run)
A set of functions to calculate
model diagnostics on models, including constructors,
a generic function, a test of whether an object is of the
modelDiagnostics
class, and methods.
modelDiagnostics(object, ...) as.modelDiagnostics(x) is.modelDiagnostics(x) ## S3 method for class 'lm' modelDiagnostics( object, ev.perc = 0.001, robust = FALSE, distr = "normal", standardized = TRUE, ... )
modelDiagnostics(object, ...) as.modelDiagnostics(x) is.modelDiagnostics(x) ## S3 method for class 'lm' modelDiagnostics( object, ev.perc = 0.001, robust = FALSE, distr = "normal", standardized = TRUE, ... )
object |
A fitted model object, with methods for
|
... |
Additional arguments, passed to |
x |
An object to test or a list to coerce to a
|
ev.perc |
A real number between 0 and 1 indicating the proportion of the theoretical distribution beyond which values are considered extreme values (possible outliers). Defaults to .001. |
robust |
Whether to use robust mean and standard deviation estimates for normal distribution |
distr |
A character string given the assumed distribution.
Passed on to |
standardized |
A logical whether to use standardized residuals.
Defaults to |
A logical (is.modelDiagnostics
) or
a modelDiagnostics object (list) for
as.modelDiagnostics
and modelDiagnostics
.
testm <- stats::lm(mpg ~ hp * factor(cyl), data = mtcars) md <- modelDiagnostics(testm) plot(md$residualDiagnostics$testDistribution) md$extremeValues plot(md) md <- modelDiagnostics(testm, ev.perc = .1) md$extremeValues plot(md, ncol = 2) testdat <- data.frame( y = c(1, 2, 2, 3, 3, NA, 9000000, 2, 2, 1), x = c(1, 2, 3, 4, 5, 6, 5, 4, 3, 2)) modelDiagnostics( lm(y ~ x, data = testdat, na.action = "na.omit"), ev.perc = .1)$extremeValues modelDiagnostics( lm(y ~ x, data = testdat, na.action = "na.exclude"), ev.perc = .1)$extremeValues ## clean up rm(testm, md, testdat)
testm <- stats::lm(mpg ~ hp * factor(cyl), data = mtcars) md <- modelDiagnostics(testm) plot(md$residualDiagnostics$testDistribution) md$extremeValues plot(md) md <- modelDiagnostics(testm, ev.perc = .1) md$extremeValues plot(md, ncol = 2) testdat <- data.frame( y = c(1, 2, 2, 3, 3, NA, 9000000, 2, 2, 1), x = c(1, 2, 3, 4, 5, 6, 5, 4, 3, 2)) modelDiagnostics( lm(y ~ x, data = testdat, na.action = "na.omit"), ev.perc = .1)$extremeValues modelDiagnostics( lm(y ~ x, data = testdat, na.action = "na.exclude"), ev.perc = .1)$extremeValues ## clean up rm(testm, md, testdat)
Generic function. Generally returns things like fit indices, absolute error metrics, tests of overall model significance.
modelPerformance(object, ...) as.modelPerformance(x) is.modelPerformance(x) ## S3 method for class 'lm' modelPerformance(object, ...)
modelPerformance(object, ...) as.modelPerformance(x) is.modelPerformance(x) ## S3 method for class 'lm' modelPerformance(object, ...)
object |
A fitted model object. The class of the model determines which specific method is called. |
... |
Additional arguments passed to specific methods. |
x |
A object (e.g., list or a modelPerformance object) to test or attempt coercing to a modelPerformance object. |
For lm
class objects, return number of observations,
AIC, BIC, log likelihood, R2, overall model F test, and p-value.
A data.table
with results.
A list with a data.table
with the following elements:
A character string indicating the model type, here lm
The number of observations
Akaike Information Criterion
Bayesian Information Criterion
log likelihood
log likelihood degrees of freedom
Residual variability
in sample variance explained
Cohen's F2 effect size R2 / (1 - R2)
adjusted variance explained
F value for overall model significance test
numerator degrees of freedom for F test
denominator degrees of freedom for F test
p-value for overall model F test
modelPerformance(lm(mpg ~ qsec * hp, data = mtcars)) modelPerformance(lm(mpg ~ hp, data = mtcars)) ## Not run: modelPerformance(lm(mpg ~ 0 + hp, data = mtcars)) modelPerformance(lm(mpg ~ 1, data = mtcars)) modelPerformance(lm(mpg ~ 0, data = mtcars)) ## End(Not run)
modelPerformance(lm(mpg ~ qsec * hp, data = mtcars)) modelPerformance(lm(mpg ~ hp, data = mtcars)) ## Not run: modelPerformance(lm(mpg ~ 0 + hp, data = mtcars)) modelPerformance(lm(mpg ~ 1, data = mtcars)) modelPerformance(lm(mpg ~ 0, data = mtcars)) ## End(Not run)
TODO: make me!
modelTest(object, ...) is.modelTest(x) as.modelTest(x) ## S3 method for class 'vglm' modelTest(object, ...) ## S3 method for class 'lm' modelTest(object, ...)
modelTest(object, ...) is.modelTest(x) as.modelTest(x) ## S3 method for class 'vglm' modelTest(object, ...) ## S3 method for class 'lm' modelTest(object, ...)
object |
A fitted model object. |
... |
Additional arguments passed to specific methods. |
x |
A object (e.g., list or a modelTest object) to test or attempt coercing to a modelTest object. |
Depends on the method dispatch.
A list with two elements.
Results
contains a data table of the actual estimates.
Table
contains a nicely formatted character matrix.
A list with two elements.
Results
contains a data table of the actual estimates.
Table
contains a nicely formatted character matrix.
mtcars$cyl <- factor(mtcars$cyl) m <- VGAM::vglm(cyl ~ qsec, family = VGAM::multinomial(), data = mtcars) modelTest(m) ## clean up rm(m, mtcars) ## Not run: mtcars$cyl <- factor(mtcars$cyl) mtcars$am <- factor(mtcars$am) m <- VGAM::vglm(cyl ~ qsec, family = VGAM::multinomial(), data = mtcars) modelTest(m) m <- VGAM::vglm(cyl ~ scale(qsec), family = VGAM::multinomial(), data = mtcars) modelTest(m) m2 <- VGAM::vglm(cyl ~ factor(vs) * scale(qsec), family = VGAM::multinomial(), data = mtcars) modelTest(m2) m <- VGAM::vglm(Species ~ Sepal.Length, family = VGAM::multinomial(), data = iris) modelTest(m) set.seed(1234) sampdata <- data.frame( Outcome = factor(sample(letters[1:3], 20 * 9, TRUE)), C1 = rnorm(20 * 9), D3 = sample(paste0("L", 1:3), 20 * 9, TRUE)) m <- VGAM::vglm(Outcome ~ factor(D3), family = VGAM::multinomial(), data = sampdata) modelTest(m) m <- VGAM::vglm(Outcome ~ factor(D3) + C1, family = VGAM::multinomial(), data = sampdata) modelTest(m) ## End(Not run) m1 <- lm(mpg ~ qsec * hp, data = mtcars) modelTest(m1) mtcars$cyl <- factor(mtcars$cyl) m2 <- lm(mpg ~ cyl, data = mtcars) modelTest(m2) m3 <- lm(mpg ~ hp * cyl, data = mtcars) modelTest(m3) m4 <- lm(sqrt(mpg) ~ hp * cyl, data = mtcars) modelTest(m4) m5 <- lm(mpg ~ sqrt(hp) * cyl, data = mtcars) modelTest(m5) ## cleanup rm(m1, m2, m3, m4, m5, mtcars)
mtcars$cyl <- factor(mtcars$cyl) m <- VGAM::vglm(cyl ~ qsec, family = VGAM::multinomial(), data = mtcars) modelTest(m) ## clean up rm(m, mtcars) ## Not run: mtcars$cyl <- factor(mtcars$cyl) mtcars$am <- factor(mtcars$am) m <- VGAM::vglm(cyl ~ qsec, family = VGAM::multinomial(), data = mtcars) modelTest(m) m <- VGAM::vglm(cyl ~ scale(qsec), family = VGAM::multinomial(), data = mtcars) modelTest(m) m2 <- VGAM::vglm(cyl ~ factor(vs) * scale(qsec), family = VGAM::multinomial(), data = mtcars) modelTest(m2) m <- VGAM::vglm(Species ~ Sepal.Length, family = VGAM::multinomial(), data = iris) modelTest(m) set.seed(1234) sampdata <- data.frame( Outcome = factor(sample(letters[1:3], 20 * 9, TRUE)), C1 = rnorm(20 * 9), D3 = sample(paste0("L", 1:3), 20 * 9, TRUE)) m <- VGAM::vglm(Outcome ~ factor(D3), family = VGAM::multinomial(), data = sampdata) modelTest(m) m <- VGAM::vglm(Outcome ~ factor(D3) + C1, family = VGAM::multinomial(), data = sampdata) modelTest(m) ## End(Not run) m1 <- lm(mpg ~ qsec * hp, data = mtcars) modelTest(m1) mtcars$cyl <- factor(mtcars$cyl) m2 <- lm(mpg ~ cyl, data = mtcars) modelTest(m2) m3 <- lm(mpg ~ hp * cyl, data = mtcars) modelTest(m3) m4 <- lm(sqrt(mpg) ~ hp * cyl, data = mtcars) modelTest(m4) m5 <- lm(mpg ~ sqrt(hp) * cyl, data = mtcars) modelTest(m5) ## cleanup rm(m1, m2, m3, m4, m5, mtcars)
This function relies on the lavaan package to use the Expectation Maximization (EM) algorithm to estimate the first and second moments (means and [co]variances) when there is missing data.
moments(data, ...)
moments(data, ...)
data |
A data frame or an object coercable to a data frame. The means and covariances of all variables are estimated. |
... |
Additional arguments passed on to the |
A list containing the esimates from the EM algorithm.
mu |
A named vector of the means. |
sigma |
The covariance matrix. |
Suggested by Yves Rosseel author of the lavaan package on which this depends
# sample data Xmiss <- as.matrix(iris[, -5]) # make 25% missing completely at random set.seed(10) Xmiss[sample(length(Xmiss), length(Xmiss) * .25)] <- NA Xmiss <- as.data.frame(Xmiss) # true means and covariance colMeans(iris[, -5]) # covariance with n - 1 divisor cov(iris[, -5]) # means and covariance matrix using list wise deletion colMeans(na.omit(Xmiss)) cov(na.omit(Xmiss)) # means and covariance matrix using EM moments(Xmiss) # clean up rm(Xmiss)
# sample data Xmiss <- as.matrix(iris[, -5]) # make 25% missing completely at random set.seed(10) Xmiss[sample(length(Xmiss), length(Xmiss) * .25)] <- NA Xmiss <- as.data.frame(Xmiss) # true means and covariance colMeans(iris[, -5]) # covariance with n - 1 divisor cov(iris[, -5]) # means and covariance matrix using list wise deletion colMeans(na.omit(Xmiss)) cov(na.omit(Xmiss)) # means and covariance matrix using EM moments(Xmiss) # clean up rm(Xmiss)
Given a vector, exclude any missing values, not a number values, non finite values, and if a character class, any zero length strings.
naz.omit(x)
naz.omit(x)
x |
A vector to exclude missing, non finite or zero length strings from |
a vector with missing/non finite/zero length strings omitted
## stats na.omit stats::na.omit(c(1, NA, NaN)) stats::na.omit(c("test", "", NA_character_)) naz.omit(c(1, NA, NaN)) naz.omit(c(1L, NA)) naz.omit(c(1L, NA, Inf)) naz.omit(c("test", "", NA_character_))
## stats na.omit stats::na.omit(c(1, NA, NaN)) stats::na.omit(c("test", "", NA_character_)) naz.omit(c(1, NA, NaN)) naz.omit(c(1L, NA)) naz.omit(c(1L, NA, Inf)) naz.omit(c("test", "", NA_character_))
This function takes a vector of statistics and calculates several summaries: mean, median, 95 the empirical p-value, that is, how many fall on the other side of zero.
param_summary(x, trans = function(x) x, ..., na.rm = TRUE)
param_summary(x, trans = function(x) x, ..., na.rm = TRUE)
x |
a data vector to operate on |
trans |
A function to transform the data. Used for summaries, but not p-values. Defaults to the identity function. |
... |
Additional arguments passed to |
na.rm |
Logical whether to remove NA values. Defaults to |
A data frame of summary statistics
param_summary(rnorm(100))
param_summary(rnorm(100))
This functions nicely formats a data frame of parameter summary statistics and is designed to be used with the param_summary() function.
param_summary_format(d, digits = getOption("digits"), pretty = FALSE)
param_summary_format(d, digits = getOption("digits"), pretty = FALSE)
d |
A data frame of the parameter summary statistics |
digits |
Number of digits to round to for printing |
pretty |
Logical value whether prettified values should be returned.
Defaults to |
A formatted data.table of summary statistics or a formated
vector (if pretty = TRUE
).
set.seed(1234) xsum <- do.call(rbind, apply(matrix(rnorm(100*10), ncol = 10), 2, param_summary)) rownames(xsum) <- letters[1:10] param_summary_format(xsum) param_summary_format(xsum, pretty = TRUE) rm(xsum)
set.seed(1234) xsum <- do.call(rbind, apply(matrix(rnorm(100*10), ncol = 10), 2, param_summary)) rownames(xsum) <- letters[1:10] param_summary_format(xsum) param_summary_format(xsum, pretty = TRUE) rm(xsum)
This function creates a number of diagnostic plots from lm models.
## S3 method for class 'modelDiagnostics.lm' plot(x, y, plot = TRUE, ask = TRUE, ncol, ...)
## S3 method for class 'modelDiagnostics.lm' plot(x, y, plot = TRUE, ask = TRUE, ncol, ...)
x |
A |
y |
Included to match the generic. Not used. |
plot |
A logical value whether or not to plot the results or simply return the graaphical objects. |
ask |
A logical whether to ask before changing plots. Only applies to interactive environments. |
ncol |
The number of columns to use for plots. Missing by default which means individual plots are created. If specified, plots are put together in a grid. |
... |
Included to match the generic. Not used. |
a list including plots of the residuals, residuals versus fitted values
testm <- stats::lm(mpg ~ hp * factor(cyl), data = mtcars) md <- modelDiagnostics(testm, ev.perc = .1) plot(md) plot(md, ncol = 2) ## clean up rm(testm, md)
testm <- stats::lm(mpg ~ hp * factor(cyl), data = mtcars) md <- modelDiagnostics(testm, ev.perc = .1) plot(md) plot(md, ncol = 2) ## clean up rm(testm, md)
This function creates a number of diagnostic plots from residuals. It is a default method.
## S3 method for class 'residualDiagnostics' plot(x, y, plot = TRUE, ask = TRUE, ncol, ...)
## S3 method for class 'residualDiagnostics' plot(x, y, plot = TRUE, ask = TRUE, ncol, ...)
x |
A |
y |
Included to match the generic. Not used. |
plot |
A logical value whether or not to plot the results or simply return the graphical objects. |
ask |
A logical whether to ask before changing plots. Only applies to interactive environments. |
ncol |
The number of columns to use for plots. Missing by default which means individual plots are created. If specified, plots are put together in a grid. |
... |
Included to match the generic. Not used. |
a list including plots of the residuals, residuals versus fitted values
testm <- stats::lm(mpg ~ hp * factor(cyl), data = mtcars) testm <- stats::lm(mpg ~ factor(cyl), data = mtcars) md <- residualDiagnostics(testm, ev.perc = .1) plot(md, plot = FALSE)$ResFittedPlot plot(md, ncol = 2) ## clean up rm(testm, md)
testm <- stats::lm(mpg ~ hp * factor(cyl), data = mtcars) testm <- stats::lm(mpg ~ factor(cyl), data = mtcars) md <- residualDiagnostics(testm, ev.perc = .1) plot(md, plot = FALSE)$ResFittedPlot plot(md, ncol = 2) ## clean up rm(testm, md)
Plots SEMSummary object
## S3 method for class 'SEMSummary' plot(x, y, ...)
## S3 method for class 'SEMSummary' plot(x, y, ...)
x |
An object of class SEMSummary. |
y |
Ignored |
... |
Additional arguments passed on to the real workhorse, |
# default plot plot(SEMSummary(~ ., data = mtcars)) # same as default plot(SEMSummary(~ ., data = mtcars), type = "coverage") # shows p values plot(SEMSummary(~ ., data = mtcars), type = "p") # shows correlations plot(SEMSummary(~ ., data = mtcars), type = "cor")
# default plot plot(SEMSummary(~ ., data = mtcars)) # same as default plot(SEMSummary(~ ., data = mtcars), type = "coverage") # shows p values plot(SEMSummary(~ ., data = mtcars), type = "p") # shows correlations plot(SEMSummary(~ ., data = mtcars), type = "cor")
Plots SEMSummary.list object
## S3 method for class 'SEMSummary.list' plot(x, y, which, plot = TRUE, ...)
## S3 method for class 'SEMSummary.list' plot(x, y, which, plot = TRUE, ...)
x |
An object of class SEMSummary.list. |
y |
Ignored |
which |
either a numeric vector based on the positions, or a character vector giving the names of the levels of the list to plot. |
plot |
A logical, whether to actually plot the results or not.
Defaults to |
... |
Additional arguments passed on to the real workhorse, |
## correlation matrix by am level plot(SEMSummary(~ . | am, data = mtcars))
## correlation matrix by am level plot(SEMSummary(~ . | am, data = mtcars))
Make plots of testDistribution objects, including density and QQ plots.
## S3 method for class 'testDistribution' plot( x, y, xlim = NULL, varlab = "X", plot = TRUE, rugthreshold = 500, seed = 1234, factor = 1, ... )
## S3 method for class 'testDistribution' plot( x, y, xlim = NULL, varlab = "X", plot = TRUE, rugthreshold = 500, seed = 1234, factor = 1, ... )
x |
A list of class “testDistribution”. |
y |
Included to match the generic. Not used. |
xlim |
An optional vector to control the x limits for the theoretical distribution
density line, useful when densities become extreme at boundary values to help keep the
scale of the graph reasonable. Passed on to |
varlab |
A character vector the label to use for the variable |
plot |
A logical vector whether to plot the graphs. Defaults to |
rugthreshold |
Integer determining the number of observations beyond which no rug plot is added. Note that even if this threshold is exceeded, a rug plot will still be added for any extreme values (if extreme values are used and present). |
seed |
a random seed used to make the jitter added for Poisson and Negative Binomial distributions reproducible |
factor |
A scale factor fo the amount of jitter added to the QQ and Deviates plots for Poisson and Negative Binomial distributions. Defaults to 1. This results in 1 * smallest distance between points / 5 being used. |
... |
Additional arguments. |
An invisible list with the ggplot2 objects for graphs, as well as information about the distribution (parameter estimates, name, log likelihood (useful for comparing the fit of different distributions to the data), and a dataset with the sorted data and theoretical quantiles.
## evaluate mpg against a normal distribution plot(testDistribution(mtcars$mpg)) ## Not run: ## example data set.seed(1234) d <- data.table::data.table( Ynorm = rnorm(200), Ybeta = rbeta(200, 1, 4), Ychisq = rchisq(200, 8), Yf = rf(200, 5, 10), Ygamma = rgamma(200, 2, 2), Ynbinom = rnbinom(200, mu = 4, size = 9), Ypois = rpois(200, 4)) ## testing and graphing plot(testDistribution(d$Ybeta, "beta", starts = list(shape1 = 1, shape2 = 4))) plot(testDistribution(d$Ychisq, "chisq", starts = list(df = 8))) ## for chi-square distribution, extreme values only on ## the right tail plot(testDistribution(d$Ychisq, "chisq", starts = list(df = 8), extremevalues = "empirical", ev.perc = .1)) plot(testDistribution(d$Ychisq, "chisq", starts = list(df = 8), extremevalues = "theoretical", ev.perc = .1)) plot(testDistribution(d$Yf, "f", starts = list(df1 = 5, df2 = 10))) plot(testDistribution(d$Ygamma, "gamma")) plot(testDistribution(d$Ynbinom, "poisson")) plot(testDistribution(d$Ynbinom, "nbinom")) plot(testDistribution(d$Ypois, "poisson")) ## compare log likelihood of two different distributions testDistribution(d$Ygamma, "normal")$Distribution$LL testDistribution(d$Ygamma, "gamma")$Distribution$LL plot(testDistribution(d$Ynorm, "normal")) plot(testDistribution(c(d$Ynorm, 10, 1000), "normal", extremevalues = "theoretical")) plot(testDistribution(c(d$Ynorm, 10, 1000), "normal", extremevalues = "theoretical", robust = TRUE)) plot(testDistribution(mtcars, "mvnormal")) ## for multivariate normal mahalanobis distance ## which follows a chi-square distribution, extreme values only on ## the right tail plot(testDistribution(mtcars, "mvnormal", extremevalues = "empirical", ev.perc = .1)) plot(testDistribution(mtcars, "mvnormal", extremevalues = "theoretical", ev.perc = .1)) rm(d) ## cleanup ## End(Not run)
## evaluate mpg against a normal distribution plot(testDistribution(mtcars$mpg)) ## Not run: ## example data set.seed(1234) d <- data.table::data.table( Ynorm = rnorm(200), Ybeta = rbeta(200, 1, 4), Ychisq = rchisq(200, 8), Yf = rf(200, 5, 10), Ygamma = rgamma(200, 2, 2), Ynbinom = rnbinom(200, mu = 4, size = 9), Ypois = rpois(200, 4)) ## testing and graphing plot(testDistribution(d$Ybeta, "beta", starts = list(shape1 = 1, shape2 = 4))) plot(testDistribution(d$Ychisq, "chisq", starts = list(df = 8))) ## for chi-square distribution, extreme values only on ## the right tail plot(testDistribution(d$Ychisq, "chisq", starts = list(df = 8), extremevalues = "empirical", ev.perc = .1)) plot(testDistribution(d$Ychisq, "chisq", starts = list(df = 8), extremevalues = "theoretical", ev.perc = .1)) plot(testDistribution(d$Yf, "f", starts = list(df1 = 5, df2 = 10))) plot(testDistribution(d$Ygamma, "gamma")) plot(testDistribution(d$Ynbinom, "poisson")) plot(testDistribution(d$Ynbinom, "nbinom")) plot(testDistribution(d$Ypois, "poisson")) ## compare log likelihood of two different distributions testDistribution(d$Ygamma, "normal")$Distribution$LL testDistribution(d$Ygamma, "gamma")$Distribution$LL plot(testDistribution(d$Ynorm, "normal")) plot(testDistribution(c(d$Ynorm, 10, 1000), "normal", extremevalues = "theoretical")) plot(testDistribution(c(d$Ynorm, 10, 1000), "normal", extremevalues = "theoretical", robust = TRUE)) plot(testDistribution(mtcars, "mvnormal")) ## for multivariate normal mahalanobis distance ## which follows a chi-square distribution, extreme values only on ## the right tail plot(testDistribution(mtcars, "mvnormal", extremevalues = "empirical", ev.perc = .1)) plot(testDistribution(mtcars, "mvnormal", extremevalues = "theoretical", ev.perc = .1)) rm(d) ## cleanup ## End(Not run)
Generic function to return variance explained (R2) estimates from various models. In some cases these will be true R2 values, in other cases they may be pseudo-R2 values if R2 is not strictly defined for a model.
R2(object, ...) ## S3 method for class 'lm' R2(object, ...)
R2(object, ...) ## S3 method for class 'lm' R2(object, ...)
object |
A fitted model object. |
... |
Additional arguments passed to specific methods. |
Depends on the method dispatch.
The raw and adjusted r-squared value.
R2(lm(mpg ~ qsec * hp, data = mtcars))
R2(lm(mpg ~ qsec * hp, data = mtcars))
A set of functions to calculate
residual diagnostics on models, including constructors,
a generic function, a test of whether an object is of the
residualDiagnostics
class, and methods.
residualDiagnostics(object, ...) as.residualDiagnostics(x) is.residualDiagnostics(x) ## S3 method for class 'lm' residualDiagnostics( object, ev.perc = 0.001, robust = FALSE, distr = "normal", standardized = TRUE, cut = 4L, ... )
residualDiagnostics(object, ...) as.residualDiagnostics(x) is.residualDiagnostics(x) ## S3 method for class 'lm' residualDiagnostics( object, ev.perc = 0.001, robust = FALSE, distr = "normal", standardized = TRUE, cut = 4L, ... )
object |
A fitted model object, with methods for
|
... |
Additional arguments, not currently used. |
x |
A object (e.g., list or a modelDiagnostics object) to test or attempt coercing to a residualDiagnostics object. |
ev.perc |
A real number between 0 and 1 indicating the proportion of the theoretical distribution beyond which values are considered extreme values (possible outliers). Defaults to .001. |
robust |
Whether to use robust mean and standard deviation estimates for normal distribution |
distr |
A character string given the assumed distribution.
Passed on to |
standardized |
A logical whether to use standardized residuals.
Defaults to |
cut |
An integer, how many unique predicted values there have to be at least for predicted values to be treated continuously, otherwise they are treated as discrete values. Defaults to 4. |
A logical (is.residualDiagnostics
) or
a residualDiagnostics object (list) for
as.residualDiagnostics
and residualDiagnostics
.
testm <- stats::lm(mpg ~ hp * factor(cyl), data = mtcars) resm <- residualDiagnostics(testm) plot(resm$testDistribution) resm <- residualDiagnostics(testm, standardized = FALSE) plot(resm$testDistribution) ## clean up rm(testm, resm) ## Not run: testdat <- data.frame( y = c(1, 2, 2, 3, 3, NA, 9000000, 2, 2, 1), x = c(1, 2, 3, 4, 5, 6, 5, 4, 3, 2)) residualDiagnostics( lm(y ~ x, data = testdat, na.action = "na.omit"), ev.perc = .1)$Residuals residualDiagnostics( lm(y ~ x, data = testdat, na.action = "na.exclude"), ev.perc = .1)$Residuals residualDiagnostics( lm(sqrt(mpg) ~ hp, data = mtcars, na.action = "na.omit"), ev.perc = .1)$Residuals ## End(Not run)
testm <- stats::lm(mpg ~ hp * factor(cyl), data = mtcars) resm <- residualDiagnostics(testm) plot(resm$testDistribution) resm <- residualDiagnostics(testm, standardized = FALSE) plot(resm$testDistribution) ## clean up rm(testm, resm) ## Not run: testdat <- data.frame( y = c(1, 2, 2, 3, 3, NA, 9000000, 2, 2, 1), x = c(1, 2, 3, 4, 5, 6, 5, 4, 3, 2)) residualDiagnostics( lm(y ~ x, data = testdat, na.action = "na.omit"), ev.perc = .1)$Residuals residualDiagnostics( lm(y ~ x, data = testdat, na.action = "na.exclude"), ev.perc = .1)$Residuals residualDiagnostics( lm(sqrt(mpg) ~ hp, data = mtcars, na.action = "na.omit"), ev.perc = .1)$Residuals ## End(Not run)
Numbers are the minimum, 25th percentile, median, 75th percentile, and maximum, of the non missing data. Values returned are either the significant digits or rounded values, whichever ends up resulting in the fewest total digits.
roundedfivenum(x, round = 2, sig = 3)
roundedfivenum(x, round = 2, sig = 3)
x |
The data to have the summary calculated on |
round |
The number of digits to try rounding |
sig |
The number of significant digits to try |
The rounded or significant digit five number summary
JWileymisc:::roundedfivenum(rnorm(1000)) JWileymisc:::roundedfivenum(mtcars$hp)
JWileymisc:::roundedfivenum(rnorm(1000)) JWileymisc:::roundedfivenum(mtcars$hp)
This function is designed to calculate the descriptive statistics and summaries that are often reported on raw data when the main analyses use structural equation modelling.
SEMSummary( formula, data, use = c("fiml", "pairwise.complete.obs", "complete.obs") )
SEMSummary( formula, data, use = c("fiml", "pairwise.complete.obs", "complete.obs") )
formula |
A formula of the variables to be used in the analysis. See the ‘details’ section for more information. |
data |
A data frame, matrix, or list containing the variables used in the formula. This is a required argument. |
use |
A character vector of how to handle missing data. Defaults to “fiml”. |
This function calculates a variety of relevant statistics on the raw data used in a SEM analysis. Because it is meant for SEM style data, for now it expects all variables to be numeric. In the future I may try to expand it to handle factor variables somehow.
Both the formula and data arguments are required. The formula should
be the right hand side only. The most common way to use it would be with
variable names separated by ‘+s’. For convenience, a ‘.’ is
expanded to mean “all variables in the data set”. For a large number
of variables or when whole datasets are being analyzed, this can
be considerably easier to write. Also it facilitates column indexing
by simply passing a subset of the data (e.g., data[, 1:10]
) and
using the ‘.’ expansion to
analyze the first 10 columns. The examples section demonstrate this use.
Also noteworthy is that SEMSummary
is not really meant to be used
on its own. It is the computational workhorse, but it is meant to be used
with a styling or printing method to produce simple output.
APAStyler
has methods for SEMSummary
output.
There are several new ways to handle missing data now including listwise deletion, pairwise deletion, and using the EM algorithm, the default.
A list with S3 class “SEMSummary”
names |
A character vector containing the variable names. |
n |
An integer vector of the length of each variable used (this includes available and missing data). |
nmissing |
An integer vector of the number of missing values in each variable. |
mu |
A vector of the arithmetic means of each variable (on complete data). |
stdev |
A numeric vector of the standard deviations of each variable (on complete data). |
Sigma |
The numeric covariance matrix for all variables. |
sSigma |
The numeric correlation matrix for all variables. |
coverage |
A numeric matrix giving the percentage (technically decimal) of information available for each pairwise covariance/correlation. |
pvalue |
The two-sided p values for the correlation matrix. Pairwise present N used to calculate degrees of freedom. |
## Example using the built in iris dataset s <- SEMSummary(~ Sepal.Length + Sepal.Width + Petal.Length, data = iris) s # show output ... not very nice ## Prettier output from SEMSummary APAStyler(s) #### Subset the dataset and use the . expansion #### ## summary for all variables in mtcars data set ## with 11 variables, this could be a pain to write out SEMSummary(~ ., data = mtcars) ## . expansion is also useful when we know column positions ## but not necessarily names SEMSummary(~ ., data = mtcars[, c(1, 2, 3, 9, 10, 11)]) ## clean up rm(s) ## sample data Xmiss <- as.matrix(iris[, -5]) # make q0% missing completely at random set.seed(10) Xmiss[sample(length(Xmiss), length(Xmiss) * .10)] <- NA Xmiss <- as.data.frame(Xmiss) SEMSummary(~ ., data = Xmiss, use = "fiml") ## pairwise APAStyler(SEMSummary(~ ., data = Xmiss, use = "pair"), type = "cor") ## same as cor() cor(Xmiss, use = "pairwise.complete.obs") ## complete cases only SEMSummary(~ ., data = Xmiss, use = "comp") ## clean up rm(Xmiss)
## Example using the built in iris dataset s <- SEMSummary(~ Sepal.Length + Sepal.Width + Petal.Length, data = iris) s # show output ... not very nice ## Prettier output from SEMSummary APAStyler(s) #### Subset the dataset and use the . expansion #### ## summary for all variables in mtcars data set ## with 11 variables, this could be a pain to write out SEMSummary(~ ., data = mtcars) ## . expansion is also useful when we know column positions ## but not necessarily names SEMSummary(~ ., data = mtcars[, c(1, 2, 3, 9, 10, 11)]) ## clean up rm(s) ## sample data Xmiss <- as.matrix(iris[, -5]) # make q0% missing completely at random set.seed(10) Xmiss[sample(length(Xmiss), length(Xmiss) * .10)] <- NA Xmiss <- as.data.frame(Xmiss) SEMSummary(~ ., data = Xmiss, use = "fiml") ## pairwise APAStyler(SEMSummary(~ ., data = Xmiss, use = "pair"), type = "cor") ## same as cor() cor(Xmiss, use = "pairwise.complete.obs") ## complete cases only SEMSummary(~ ., data = Xmiss, use = "comp") ## clean up rm(Xmiss)
This is a low level fitting function, for SEMSummary.
SEMSummary.fit( formula, data, use = c("fiml", "pairwise.complete.obs", "complete.obs") )
SEMSummary.fit( formula, data, use = c("fiml", "pairwise.complete.obs", "complete.obs") )
formula |
A formula of the variables to be used in the analysis. See the ‘details’ section for more information. |
data |
A data frame, matrix, or list containing the variables used in the formula. This is a required argument. |
use |
A character vector of how to handle missing data. Defaults to “fiml”. |
A list with S3 class “SEMSummary”
names |
A character vector containing the variable names. |
n |
An integer vector of the length of each variable used (this includes available and missing data). |
nmissing |
An integer vector of the number of missing values in each variable. |
mu |
A vector of the arithmetic means of each variable (on complete data). |
stdev |
A numeric vector of the standard deviations of each variable (on complete data). |
Sigma |
The numeric covariance matrix for all variables. |
sSigma |
The numeric correlation matrix for all variables. |
coverage |
A numeric matrix giving the percentage (technically decimal) of information available for each pairwise covariance/correlation. |
pvalue |
The two-sided p values for the correlation matrix. Pairwise present N used to calculate degrees of freedom. |
Simple function to calculate effect sizes for mean differences.
smd(x, g, index = c("all", "1", "2"))
smd(x, g, index = c("all", "1", "2"))
x |
A continuous variable |
g |
A grouping variable, with two levels |
index |
A character string: “all” uses pooled variance, “1” uses the first factor level variance, “2” uses the second factor level variance. |
The standardized mean difference.
smd(mtcars$mpg, mtcars$am) smd(mtcars$mpg, mtcars$am, "all") smd(mtcars$mpg, mtcars$am, "1") smd(mtcars$mpg, mtcars$am, "2") smd(mtcars$hp, mtcars$vs) d <- data.table::as.data.table(mtcars) d[, smd(mpg, vs)] rm(d)
smd(mtcars$mpg, mtcars$am) smd(mtcars$mpg, mtcars$am, "all") smd(mtcars$mpg, mtcars$am, "1") smd(mtcars$mpg, mtcars$am, "2") smd(mtcars$hp, mtcars$vs) d <- data.table::as.data.table(mtcars) d[, smd(mpg, vs)] rm(d)
Function to simplify converting p-values to asterisks
star(x, includeMarginal = FALSE)
star(x, includeMarginal = FALSE)
x |
p values to convert to stars |
includeMarginal |
logical value whether to include a symbol for
marginally significant >.05 but < .10 p-values. Defaults to |
A character string with stars
star(c(.0005, .001, .005, .01, .02, .05, .08, .1, .5, 1))
star(c(.0005, .001, .005, .01, .02, .05, .08, .1, .5, 1))
Function designed to help examine distributions.
It also includes an option for assessing multivariate normality using the
(squared) Mahalanobis distance. A generic function, some methods, and
constructor (as.testDistribution
) and function to check class
(is.testDistribution
) also are provided.
Note that for the use
argument, several options are possible.
By default it is “complete.obs”, which uses only cases with complete
data on all variables.
Another option is “pairwise.complete.obs”, which uses
all available data for each variable indivdiually to estimate the means and
variances, and all pairwise complete observation pairs for each covariance. Because
the same cases are not used for all estimates, it is possible to obtain a covariance
matrix that is not positive definite (e.g., correlations > +1 or < -1).
Finally, the last option is “fiml”, which uses full information maximum likelihood estimates of the means and covariance matrix. Depending on the number of cases, missing data patterns, and variables, this may be quite slow and computationally demanding.
The robust
argument determines whether to use robust estimates or not
when calculating densities, etc. By default it is FALSE
, but if
TRUE
and a univariate or multivariate normal distribution is tested,
then robust estimates of the means and covariance matrix (a variance if univariate)
will be used based on covMcd
from the robustbase package.
testDistribution(x, ...) as.testDistribution(x) is.testDistribution(x) ## Default S3 method: testDistribution( x, distr = c("normal", "beta", "chisq", "f", "gamma", "geometric", "nbinom", "poisson", "uniform", "mvnormal"), na.rm = TRUE, starts, extremevalues = c("no", "theoretical", "empirical"), ev.perc = 0.001, use = c("complete.obs", "pairwise.complete.obs", "fiml"), robust = FALSE, ... )
testDistribution(x, ...) as.testDistribution(x) is.testDistribution(x) ## Default S3 method: testDistribution( x, distr = c("normal", "beta", "chisq", "f", "gamma", "geometric", "nbinom", "poisson", "uniform", "mvnormal"), na.rm = TRUE, starts, extremevalues = c("no", "theoretical", "empirical"), ev.perc = 0.001, use = c("complete.obs", "pairwise.complete.obs", "fiml"), robust = FALSE, ... )
x |
The data as a single variable or vector to check the distribution unless the distribution is “mvnormal” in which case it should be a data frame or data table. |
... |
Additional arguments. If these include mu and sigma and the distribution is multivariate normal, then it will use the passed values instead of calculating the mean and covariances of the data. |
distr |
A character string indicating the distribution to be tested. Currently one of: “normal”, “beta”, “chisq” (chi-squared), “f”, “gamma”, “geometric”, “nbinom” (negative binomial), “poisson”, “uniform”, or “mvnormal” for multivariate normal where Mahalanobis distances are calculated and compared against a Chi-squared distribution with degrees of freedom equal to the number of variables. |
na.rm |
A logical value whether to omit missing values. Defaults to |
starts |
A named list of the starting values. Not required for all distributions.
Passed on to |
extremevalues |
A character vector whether to indicate extreme values. Should be “no” to do nothing, “empirical” to show extreme values based on the observed data percentiles, or “theoretical” to show extreme values based on percentiles of the theoretical distribution. |
ev.perc |
Percentile to use for extreme values. For example if .01, then the lowest 1 percent and highest 1 percent will be labelled extreme values. Defaults to the lowest and highest 0.1 percent. |
use |
A character vector indicating how the moments
(means and covariance matrix) should be estimated in the presence of
missing data when |
robust |
A logical whether to use robust estimation or not.
Currently only applies to normally distributed data
(univariate or multivariate). Also, when |
A logical whether or not an object is of class
testDistribution
or an object of the same class.
A list with information about the distribution (parameter estimates, name, log likelihood (useful for comparing the fit of different distributions to the data), and a dataset with the sorted data and theoretical quantiles.
## example data set.seed(1234) d <- data.table::data.table( Ynorm = rnorm(200), Ybeta = rbeta(200, 1, 4), Ychisq = rchisq(200, 8), Yf = rf(200, 5, 10), Ygamma = rgamma(200, 2, 2), Ynbinom = rnbinom(200, mu = 4, size = 9), Ypois = rpois(200, 4)) ## testing and graphing testDistribution(d$Ybeta, "beta", starts = list(shape1 = 1, shape2 = 4)) testDistribution(d$Ychisq, "chisq", starts = list(df = 8)) ## for chi-square distribution, extreme values only on ## the right tail testDistribution(d$Ychisq, "chisq", starts = list(df = 8), extremevalues = "empirical", ev.perc = .1) testDistribution(d$Ychisq, "chisq", starts = list(df = 8), extremevalues = "theoretical", ev.perc = .1) ## Not run: testDistribution(d$Yf, "uniform") testDistribution(d$Ypois, "geometric") testDistribution(d$Yf, "f", starts = list(df1 = 5, df2 = 10)) testDistribution(d$Ygamma, "gamma") testDistribution(d$Ynbinom, "poisson") testDistribution(d$Ynbinom, "nbinom") testDistribution(d$Ypois, "poisson") ## compare log likelihood of two different distributions testDistribution(d$Ygamma, "normal")$Distribution$LL testDistribution(d$Ygamma, "gamma")$Distribution$LL testDistribution(d$Ynorm, "normal") testDistribution(c(d$Ynorm, 10, 1000), "normal", extremevalues = "theoretical") testDistribution(c(d$Ynorm, 10, 1000), "normal", extremevalues = "theoretical", robust = TRUE) testDistribution(mtcars, "mvnormal") ## for multivariate normal mahalanobis distance ## which follows a chi-square distribution, extreme values only on ## the right tail testDistribution(mtcars, "mvnormal", extremevalues = "empirical", ev.perc = .1) testDistribution(mtcars, "mvnormal", extremevalues = "theoretical", ev.perc = .1) rm(d) ## cleanup ## End(Not run)
## example data set.seed(1234) d <- data.table::data.table( Ynorm = rnorm(200), Ybeta = rbeta(200, 1, 4), Ychisq = rchisq(200, 8), Yf = rf(200, 5, 10), Ygamma = rgamma(200, 2, 2), Ynbinom = rnbinom(200, mu = 4, size = 9), Ypois = rpois(200, 4)) ## testing and graphing testDistribution(d$Ybeta, "beta", starts = list(shape1 = 1, shape2 = 4)) testDistribution(d$Ychisq, "chisq", starts = list(df = 8)) ## for chi-square distribution, extreme values only on ## the right tail testDistribution(d$Ychisq, "chisq", starts = list(df = 8), extremevalues = "empirical", ev.perc = .1) testDistribution(d$Ychisq, "chisq", starts = list(df = 8), extremevalues = "theoretical", ev.perc = .1) ## Not run: testDistribution(d$Yf, "uniform") testDistribution(d$Ypois, "geometric") testDistribution(d$Yf, "f", starts = list(df1 = 5, df2 = 10)) testDistribution(d$Ygamma, "gamma") testDistribution(d$Ynbinom, "poisson") testDistribution(d$Ynbinom, "nbinom") testDistribution(d$Ypois, "poisson") ## compare log likelihood of two different distributions testDistribution(d$Ygamma, "normal")$Distribution$LL testDistribution(d$Ygamma, "gamma")$Distribution$LL testDistribution(d$Ynorm, "normal") testDistribution(c(d$Ynorm, 10, 1000), "normal", extremevalues = "theoretical") testDistribution(c(d$Ynorm, 10, 1000), "normal", extremevalues = "theoretical", robust = TRUE) testDistribution(mtcars, "mvnormal") ## for multivariate normal mahalanobis distance ## which follows a chi-square distribution, extreme values only on ## the right tail testDistribution(mtcars, "mvnormal", extremevalues = "empirical", ev.perc = .1) testDistribution(mtcars, "mvnormal", extremevalues = "theoretical", ev.perc = .1) rm(d) ## cleanup ## End(Not run)
This is a barebones theme. It turns many aspects of plot background lines, etc. off completely. Inspired from the excellent ggthemes package: https://github.com/jrnold/ggthemes
theme_tufte(base_size = 11, base_family = "sans")
theme_tufte(base_size = 11, base_family = "sans")
base_size |
base font size, given in pts. |
base_family |
base font family |
Given a vector, shift the values to have a new center, but keeping the same minimum and maximum. Designed to work with time values where the minimum indicates the same time as the maximum (e.g., 24:00:00 is the same as 00:00:00).
timeshift(x, center = 0, min = 0, max = 1, inverse = FALSE)
timeshift(x, center = 0, min = 0, max = 1, inverse = FALSE)
x |
the time scores to shift |
center |
A value (between the minimum and maximum) to center the time scores. Defaults to 0, which has no effect. |
min |
The theoretical minimum of the time scores. Defaults to 0. |
max |
the theoretical maximum of the time scores. Defaults to 1. |
inverse |
A logical value, whether to ‘unshift’
the time scores. Defaults to |
A vector of shifted time scores, recentered as specified.
## example showing centering at 11am (i.e., 11am becomes new 0) plot((1:24)/24, timeshift((1:24)/24, 11/24)) ## example showing the inverse, note that 24/24 becomes 0 plot((1:24)/24, timeshift(timeshift((1:24)/24, 11/24), 11/24, inverse = TRUE))
## example showing centering at 11am (i.e., 11am becomes new 0) plot((1:24)/24, timeshift((1:24)/24, 11/24)) ## example showing the inverse, note that 24/24 becomes 0 plot((1:24)/24, timeshift(timeshift((1:24)/24, 11/24), 11/24, inverse = TRUE))
This calculates and displays means, confidence intervals as well as which groups are different based on Tukey's HSD. Inspired by http://stackoverflow.com/questions/18771516/is-there-a-function-to-add-aov-post-hoc-testing-results-to-ggplot2-boxplot
TukeyHSDgg(x, y, d, ci = 0.95, idvar, ...)
TukeyHSDgg(x, y, d, ci = 0.95, idvar, ...)
x |
A categorical grouping variable name. |
y |
A continuous outcome variable name. |
d |
A dataset |
ci |
A numeric value indicating the coverage of the confidence interval to use. Defaults to 0.95. |
idvar |
An optional ID variable for multilevel data |
... |
Additional arguments passed on. |
A ggplot graph object.
## examples using it with single level data ## differences based on an ANOVA and follow up contrasts mtcars$cyl <- factor(mtcars$cyl) TukeyHSDgg("cyl", "mpg", mtcars) rm(mtcars) ## Not run: TukeyHSDgg("Species", "Sepal.Length", iris) ## example based on multilevel data ## differences based on model fit with lmer and follow up contrasts TukeyHSDgg("treatment", "decrease", OrchardSprays, idvar = "colpos") ## End(Not run)
## examples using it with single level data ## differences based on an ANOVA and follow up contrasts mtcars$cyl <- factor(mtcars$cyl) TukeyHSDgg("cyl", "mpg", mtcars) rm(mtcars) ## Not run: TukeyHSDgg("Species", "Sepal.Length", iris) ## example based on multilevel data ## differences based on model fit with lmer and follow up contrasts TukeyHSDgg("treatment", "decrease", OrchardSprays, idvar = "colpos") ## End(Not run)
Converter character (string) input of Snellen fractions, Counting Fingers (CF), and Hand Motion (HM) to logMAR values for use in statistical models. Can handle linear interpolation if passed an appropriate chart or if the measures fit with the default chart.
VAConverter( OS, OD, chart.values = NULL, chart.nletters = NULL, datatype = c("snellen", "decimal", "logMAR"), zero = 3 )
VAConverter( OS, OD, chart.values = NULL, chart.nletters = NULL, datatype = c("snellen", "decimal", "logMAR"), zero = 3 )
OS |
The values to be converted for the left eye (oculus sinister). |
OD |
The values to be converted for the right eye (oculus dexter) |
chart.values |
The Snellen fractions for the chart used (if interpolation is necessary and it is different from the default). |
chart.nletters |
The number of letters on each line of the chart that was used. Necessary for proper interpolation. |
datatype |
The type of data passed to |
zero |
The “zero” logMAR value. This is used as the
zero point for visual acuity. For example, for light perception
(LP), no light perception (NLP), etc. It defaults to 3 (which
is equivalent to a Snellen value of 20/20000), but may also be
|
VAConverter
is primarily designed to take raw character
data of various forms and convert them to logMAR values.
Acceptable examples include: "20/20", "20/80 + 3", "20/20 - 4",
"10/20", "CF 10", "HM 2", "CF 4", "NLP", "LP", "", "CF", "HM",
etc. For Snellen values, both parts should be present, and
there should be a space between components; e.g., between
fraction, +/- and number or between CF and 10. Although I have
attempted to make it as flexible and general as possible, there
are still fairly rigid requirements so that it can parse a
variety of text formats to numerical values. Optionally, it can
also handle decimal values (i.e., the results of actually
dividing a Snellen value 20/20 = 1).
chart.values
and chart.nletters
must be the same
length. These are used to interpolate values such as "20/20 + 3"
which is interpreted as reading all of the letters on the "20/20"
line and "3" of the letters on the next best line (typically
"20/15" but this can be chart dependent). The functions goes 3/n
of the distance between the logMAR values for each line. This is
why it is important to know the values for the chart that
was actually used.
If datatype = "logMAR", the values passed to OS
and
OD
are directly assigned to the logMAROS
and
logMAROD
slots of a "VAObject"
and an
error is returned if that results in the creation of an invalid
object (e.g., they are not numeric or not of equal length).
The zero
argument is primarily included to facilitate
calculating averages. For example, in some cases it may be nice
to get a sense of an individual's "overall" or "average" logMAR
value. Because on the logMAR scale, 0 is "20/20", an alternate
number needs to be used. 3 was chosen as a rough default, but it
is by no means necessarily the best choice. If you are not
interested in computing an average between the left and right eyes
within individuals, it makes sense to simply use NA
rather
than a crude "zero" approximation.
An object of class VAObject
. This
includes the left and right eye logMAR values in slots
@logMAROS
and @logMAROD
as well as additional
information. More information can be found in the class
documentation.
## sampdat <- c("HM 12", "20/20 + 3", "20/50", "CF", "HM", ## "20/70 - 2", "LP", NA, "Prosthetic") ## tmp <- VAConverter(OS = sampdat, OD = rev(sampdat), datatype = "snellen")
## sampdat <- c("HM 12", "20/20 + 3", "20/50", "CF", "HM", ## "20/70 - 2", "LP", NA, "Prosthetic") ## tmp <- VAConverter(OS = sampdat, OD = rev(sampdat), datatype = "snellen")
A class to hold Visual Acuity data for the oculus sinister (OS; left eye) and oculus dexter (OD; right eye)
## S4 method for signature 'VAObject,ANY,ANY,ANY' x[i, j, ..., drop = TRUE] ## S4 method for signature 'VAObject' print(x, ...) ## S4 method for signature 'VAObject' show(object) ## S4 method for signature 'VAObject' summary(object, weightbest = TRUE, w = c(0.75, 0.25))
## S4 method for signature 'VAObject,ANY,ANY,ANY' x[i, j, ..., drop = TRUE] ## S4 method for signature 'VAObject' print(x, ...) ## S4 method for signature 'VAObject' show(object) ## S4 method for signature 'VAObject' summary(object, weightbest = TRUE, w = c(0.75, 0.25))
x |
the object to subset |
i |
the rows to subset (optional) |
j |
the columns to subset (optional) |
... |
Additional arguments passed to lower functions |
drop |
should be missing |
object |
A VAObject class object |
weightbest |
Logical whether to upweight the best seeing eye.
Defaults to |
w |
A numeric vector of the weights, first for the best seeing
then the worst seeing eye. Defaults to |
x[i
: extract method
print(VAObject)
: print method
show(VAObject)
: show method
summary(VAObject)
: summary method
originalOS
the original visual acuity data for the left (ocular sinister) eye
originalOD
the original visual acuity data for the right (ocular dexter) eye
logMAROS
Logarithm of the minimum angle of resolution data for OS
logMAROD
Logarithm of the minimum angle of resolution data for OD
chart.values
the snellen values for each line of the chart used to measure visual acuity. Used for the linear interpolation in the case of partially correct line readings.
chart.nletters
the number of letters on each line of the chart used to measure visual acuity. Used for the linear interpolation in the case of partially correct line readings (+2 is 2/4 of the way to the next line if there are four letters, but only 2/6 if there are six, etc.)
zero
the logMAR value chosen to represent "zero" visual acuity when creating the combined logMAR values for both eyes or taking the arithmetic mean.
A class designed to hold visual acuity summary data
## S4 method for signature 'VASummaryObject' show(object) ## S4 method for signature 'VASummaryObject,missing' plot(x, y, ...)
## S4 method for signature 'VASummaryObject' show(object) ## S4 method for signature 'VASummaryObject,missing' plot(x, y, ...)
object |
The object to be shown |
x |
A VASummaryObject |
y |
Should be missing |
... |
Additional, unused arguments |
show(VASummaryObject)
: show method
plot(x = VASummaryObject, y = missing)
: plot method
logMAR.combined
Numeric values of the combined logarithm of the minimum angle of resolution data for both eyes
snellen.combined
the snellen values back transformed from the combined logMAR values
mean.logMAR
average of the logarithm of the minimum angle of resolution data
mean.snellen
average of the combined Snellen data
Simple function winsorizes data at the specified percentile.
winsorizor(d, percentile, values, na.rm = TRUE)
winsorizor(d, percentile, values, na.rm = TRUE)
d |
A vector, matrix, data frame, or data table to be winsorized |
percentile |
The percentile bounded by [0, 1] to winsorize data at. If a data frame or matrix is provided for the data, this should have the same length as the number of columns, or it will be repeated for all. |
values |
If values are specified, use these instead of calculating by percentiles. Should be a data frame with columns named “low”, and “high”. If a data frame or matrix is provided for the data, there should be as many rows for values to winsorize at as there are columns in the data. |
na.rm |
A logical whether to remove NAs. |
winsorized data. Attributes are included to list the exact values (for each variable, if a data frame or matrix) used to winsorize at the lower and upper ends.
dev.new(width = 10, height = 5) par(mfrow = c(1, 2)) hist(as.vector(eurodist), main = "Eurodist") hist(winsorizor(as.vector(eurodist), .05), main = "Eurodist with lower and upper\n5% winsorized") library(data.table) dat <- data.table(x = 1:5) dat[, y := scale(1:5)] winsorizor(dat$y, .01) ## make a copy of the data table winsorizor(dat, .01) winsorizor(mtcars, .01) winsorizor(matrix(1:9, 3), .01) rm(dat) # clean up
dev.new(width = 10, height = 5) par(mfrow = c(1, 2)) hist(as.vector(eurodist), main = "Eurodist") hist(winsorizor(as.vector(eurodist), .05), main = "Eurodist with lower and upper\n5% winsorized") library(data.table) dat <- data.table(x = 1:5) dat[, y := scale(1:5)] winsorizor(dat$y, .01) ## make a copy of the data table winsorizor(dat, .01) winsorizor(mtcars, .01) winsorizor(matrix(1:9, 3), .01) rm(dat) # clean up