Title: | Compositional Data Linear Models with Composition Redistribution |
---|---|
Description: | Given data containing an outcome variable, compositional variables and (optionally) additional covariates, linearly regress the outcome variable on an isometric log ratio (ilr) transformation of the linearly dependent compositional variables. The package provides predictions (with confidence intervals) of the change (delta) in the outcome/response variable based on the multiple linear regression model and evenly spaced reallocations of the compositional values. The compositional data analysis approach implemented is outlined in Dumuid et al. (2017a) <doi:10.1177/0962280217710835> and Dumuid et al. (2017b) <doi:10.1177/0962280217737805>. |
Authors: | Ty Stanford [aut, cre], Charlotte Lund Rasmussen [aut], Dot Dumuid [aut] |
Maintainer: | Ty Stanford <[email protected]> |
License: | GPL-2 |
Version: | 0.1.0 |
Built: | 2024-10-17 03:29:30 UTC |
Source: | https://github.com/tystan/codaredistlm |
Add ILR coordinates to a data.frame containing composition variables
append_ilr_coords(dataf, comps, psi)
dataf |
data.frame containing composition variables |
comps |
character vector of composition variable names in dataf |
psi |
ilrBase passed to |
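As a rough illustration of what appending ilr coordinates involves, the base R sketch below closes each row to sum to 1 and multiplies the log composition by the four-part basis listed under create_v_mat(). The toy data, the ilr1, ilr2, ... column names, and the use of plain matrix algebra in place of an ilrBase object are assumptions made for illustration, not the package's implementation.

```r
# Toy time-use data in minutes (made-up values)
toy <- data.frame(
  sl   = c(480, 540),
  sb   = c(600, 520),
  lpa  = c(300, 310),
  mvpa = c(60, 70)
)
comps <- c("sl", "sb", "lpa", "mvpa")

# Basis for 4 parts, as listed in the create_v_mat() documentation
V <- cbind(
  c(3, -1, -1, -1) / sqrt(12),
  c(0,  2, -1, -1) / sqrt(6),
  c(0,  0,  1, -1) / sqrt(2)
)

# Close each row so the parts sum to 1, then project the logs onto the basis.
# Because every column of V sums to 0, log(x) %*% V equals clr(x) %*% V.
comp_mat <- as.matrix(toy[, comps])
comp_mat <- comp_mat / rowSums(comp_mat)
ilr_mat <- log(comp_mat) %*% V
colnames(ilr_mat) <- paste0("ilr", seq_len(ncol(ilr_mat)))  # assumed naming

toy_ilr <- cbind(toy, ilr_mat)
toy_ilr
```

Since the columns of V each sum to zero, the closure step does not change the ilr values; it is kept here only to mirror the row-wise normalisation the documentation describes.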
Sanity checks for arguments passed to predict_delta_comps()
check_input_args(dataf, y, comps, covars, deltas)
dataf |
A |
y |
Name (as string/character vector of length 1) of outcome variable in |
comps |
Character vector of names of compositions in |
covars |
Character vector of covariates names (non-comp variables) in |
deltas |
A vector of time-component changes (as proportions of compositions, i.e., values between -1 and 1). Optional. |
Throws errors for any problematic input. Returns TRUE invisibly if no issues found.
Check whether compositional variables are strictly greater than 0
check_strictly_positive_vals(dataf, comps, tol = 1e-06)
dataf |
data.frame containing composition variables |
comps |
character vector of composition variable names in dataf |
tol |
a numeric value that compositional values are expected to be greater than or equal to. Default is 1e-6. |
If any compositional values are found to be strictly less than tol, an error is thrown.
Returns TRUE invisibly otherwise.
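The check described above can be sketched in base R as follows. The function name sketch_check_positive() and the error message are hypothetical; only the behaviour (error below tol, invisible TRUE otherwise) is taken from the documentation.

```r
# Hypothetical re-implementation of the described behaviour (not package source)
sketch_check_positive <- function(dataf, comps, tol = 1e-6) {
  too_small <- as.matrix(dataf[, comps, drop = FALSE]) < tol
  if (any(too_small, na.rm = TRUE)) {
    stop("Compositional values below tol = ", tol, " found.")
  }
  invisible(TRUE)
}

toy <- data.frame(a = c(0.2, 0.5), b = c(0.8, 0.5))
ok <- sketch_check_positive(toy, c("a", "b"))  # passes silently

toy$a[1] <- 0  # a zero part cannot be log-transformed
res <- tryCatch(
  sketch_check_positive(toy, c("a", "b")),
  error = function(e) "error"
)
```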
Check whether columns exist in a data.frame
cols_exist(dataf, cols)
dataf |
a data.frame |
cols |
character vector of columns to be checked in |
Throws an error if not all cols are present in dataf.
Returns TRUE invisibly otherwise.
Statistical test of the collective significance of the ilr variables
compare_two_lm(y_str, X1, X2)
y_str |
a string representation of the column in |
X1 |
a data.frame or matrix that contains a subset of the predictor variables
in |
X2 |
a data.frame or matrix that contains the predictor variables and outcome variable |
Returns NULL invisibly. The ANOVA analysis is printed to the console, that is, the statistical test of whether the additional predictors in X2 improve the model significantly over the model with only the subset of predictors in X1.
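The nested-model comparison described above corresponds to a standard F-test between two lm fits. The sketch below uses the built-in mtcars data and base R's anova(); it assumes, for illustration only, that both X1 and X2 carry the outcome column, and it is not the package's code.

```r
# Smaller model: outcome plus a subset of predictors
X1 <- mtcars[, c("mpg", "wt")]
# Larger model: outcome plus all predictors
X2 <- mtcars[, c("mpg", "wt", "hp", "qsec")]

lm1 <- lm(mpg ~ ., data = X1)
lm2 <- lm(mpg ~ ., data = X2)

# F-test of whether hp and qsec jointly improve on the wt-only model
aov_tab <- anova(lm1, lm2)
print(aov_tab)
```

In the compositional setting the extra predictors would be the ilr variables, so the test assesses their collective significance.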
Creates row-wise perturbations of compositions from the mean composition
create_comparison_matrix(comparisons, comps, mean_comps)
comparisons |
currently two choices: |
comps |
the names (character vector) of the compositional variables |
mean_comps |
the mean composition of |
comparisons = "one-v-one" creates a matrix with length(comps) columns and length(comps) * (length(comps) - 1) rows. The rows contain all pairs of variables with 1 and -1 values.
comparisons = "prop-realloc" creates a matrix with length(comps) columns and length(comps) rows. Each row contains a 1 value for one compositional variable, and the remaining values sum to -1 in proportion to the mean_comps values for those variables.
Note that for both comparisons options the net change is 0 (each row sums to 0).
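The two layouts can be sketched in base R as below. The mean composition values are made up, and the row ordering is an assumption that may differ from what create_comparison_matrix() returns; the sketch only reproduces the dimensions and the zero row sums stated above.

```r
comps <- c("sl", "sb", "lpa", "mvpa")
mean_comps <- c(sl = 0.40, sb = 0.40, lpa = 0.15, mvpa = 0.05)  # made-up values
n <- length(comps)

# "prop-realloc": +1 for one part, the others share -1 in proportion to their means
prop_realloc <- diag(n)
for (i in seq_len(n)) {
  others <- setdiff(seq_len(n), i)
  prop_realloc[i, others] <- -mean_comps[others] / sum(mean_comps[others])
}
dimnames(prop_realloc) <- list(NULL, comps)

# "one-v-one": every ordered pair of distinct parts gets a +1 and a -1
pairs <- expand.grid(plus = seq_len(n), minus = seq_len(n))
pairs <- pairs[pairs$plus != pairs$minus, ]
one_v_one <- matrix(0, nrow = nrow(pairs), ncol = n, dimnames = list(NULL, comps))
one_v_one[cbind(seq_len(nrow(pairs)), pairs$plus)] <- 1
one_v_one[cbind(seq_len(nrow(pairs)), pairs$minus)] <- -1

round(prop_realloc, 3)
one_v_one
```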
Create ilr basis matrix (V)
create_v_mat(n_comp)
n_comp |
the number of compositional variables |
An n_comp by n_comp - 1 matrix where each column relates to one ilr variable.
The ilr basis is constructed so that the numerator (+ value) for the i-th column is in the i-th row. All values below the + value in the column are set to -1 (the denominator).
The ilr basis for 3 compositional variables is (2, -1, -1)/sqrt(6), (0, 1, -1)/sqrt(2).
The ilr basis for 4 compositional variables is (3, -1, -1, -1)/sqrt(12), (0, 2, -1, -1)/sqrt(6), (0, 0, 1, -1)/sqrt(2), and so on.
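The construction rule above can be written out as a short base R function. The name sketch_v_mat() is hypothetical and this is a sketch of the described pattern rather than the package source; it reproduces the 3-part and 4-part bases quoted in the text.

```r
# Hypothetical helper: build the ilr basis following the rule described above
sketch_v_mat <- function(n_comp) {
  stopifnot(n_comp >= 2)
  v <- matrix(0, nrow = n_comp, ncol = n_comp - 1)
  for (i in seq_len(n_comp - 1)) {
    k <- n_comp - i                       # number of parts below row i
    v[i, i] <- k                          # the "+" value in the i-th row
    v[(i + 1):n_comp, i] <- -1            # the denominator parts
    v[, i] <- v[, i] / sqrt(k * (k + 1))  # scale the column to unit length
  }
  v
}

v3 <- sketch_v_mat(3)
v4 <- sketch_v_mat(4)

# Columns are orthonormal, so t(V) %*% V is the identity matrix
round(crossprod(v4), 10)
```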
Extract critical quantities from a lm object (for confidence interval calculations)
extract_lm_quantities(lm_X, alpha = 0.05)
lm_X |
a lm object |
alpha |
level of significance. Defaults to 0.05. |
A list containing the lm model matrix (dmX), the inverse of t(dmX) x dmX (XtX_inv), the standard error (s_e), the estimated single column beta matrix (beta_hat), and the critical value of the relevant degrees of freedom t-distribution (crit_val).
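To show how these quantities combine into a confidence interval, the sketch below computes them by hand for a model on the built-in mtcars data. It assumes that s_e is the residual standard error and that crit_val is the two-sided critical value; both are readings of the description above rather than confirmed details of extract_lm_quantities().

```r
lm_X <- lm(mpg ~ wt + hp, data = mtcars)
alpha <- 0.05

dmX      <- model.matrix(lm_X)                  # design matrix
XtX_inv  <- solve(t(dmX) %*% dmX)               # (X'X)^{-1}
s_e      <- summary(lm_X)$sigma                 # residual standard error (assumed)
beta_hat <- matrix(coef(lm_X), ncol = 1)        # single-column coefficient matrix
crit_val <- qt(1 - alpha / 2, df = df.residual(lm_X))  # two-sided (assumed)

# Confidence interval for the mean response at the average predictor values
x0 <- matrix(colMeans(dmX), nrow = 1)
fit <- drop(x0 %*% beta_hat)
half_width <- crit_val * s_e * sqrt(drop(x0 %*% XtX_inv %*% t(x0)))
c(fit = fit, lwr = fit - half_width, upr = fit + half_width)
```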
A dataset containing z_bmi (outcome), time-use compositions (sleep, sed, lpa, mvpa), and covariates from the Fairclough (2017) paper. The data can be found in supplementary file 7 of the paper at https://link.springer.com/article/10.1186/s12966-017-0521-z.
data(fairclough)
A data frame with 169 rows and 21 variables
The variables in the data are as follows:
child_id
school
sex
decimal_age
imd_decile
height
mass
bmi
z_bmi
itof_grade
waist_circ
whtr
shuttles_20m
wear_time
sed
lpa
mpa
vpa
mvpa
sleep
min_in_day
Fairclough, Stuart J. and Dumuid, Dorothea and Taylor, Sarah and Curry, Whitney and McGrane, Bronagh and Stratton, Gareth and Maher, Carol and Olds, Timothy. Fitness, fatness and the reallocation of time between children’s daily movement behaviours: an analysis of compositional data. International Journal of Behavioral Nutrition and Physical Activity, 2017. 14(1): 64.
A dataset containing fat percentage (outcome), time-use compositions (sl,sb,lpa,mvpa), and covariates (sibs,parents,ed). Note sl+sb+lpa+mvpa=1440 minutes for each subject. The variables are as follows:
data(fat_data)
A data frame with 100 rows and 8 variables
fat. child fat percentage (11.29–29.99)
sl. daily sleep in minutes (283–765)
sb. sedentary behaviour in minutes (354–789)
lpa. low-intensity physical activity in minutes (157–507)
mvpa. moderate- to vigorous-intensity physical activity in minutes (35–155)
sibs. number of siblings (0,1,2,3,4)
parents. number of parents/caregivers at home (1,2)
ed. education level of parent(s) (0=high school, 1=diploma, 2=degree)
Fit linear model based on input data.frame
fit_lm(y_str, X, verbose = TRUE)
y_str |
a string representation of the column in |
X |
a data.frame or matrix that contains the predictor and outcome variables |
verbose |
if |
An lm object where the y_str column has been regressed against the remaining columns of X (with an intercept term as well).
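Regressing one named column on all remaining columns can be sketched with a formula built from the column name. The helper sketch_fit_lm() below is hypothetical and omits the verbose argument; it illustrates the idea on the built-in mtcars data.

```r
# Hypothetical helper: regress the y_str column on every other column of X
sketch_fit_lm <- function(y_str, X) {
  X <- as.data.frame(X)
  # "y ~ ." means: outcome against all remaining columns, with an intercept
  lm(as.formula(paste(y_str, "~ .")), data = X)
}

fit <- sketch_fit_lm("mpg", mtcars[, c("mpg", "wt", "hp")])
coef(fit)
```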
Is the object one returned from predict_delta_comps()?
is_deltacomp_obj(x)
x |
object to be tested |
Boolean TRUE or FALSE
Is the object one returned from lm()?
is_lm_mod(x)
x |
object to be tested |
Boolean TRUE or FALSE
Catch NULL, empty and objects containing NAs
is_null_or_na(x)
x |
object to be tested |
Boolean. If object is NULL, empty or contains NA then TRUE returned. FALSE otherwise.
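The behaviour described above can be approximated in one line of base R. The name sketch_is_null_or_na() is hypothetical and the real function may treat edge cases (for example nested lists) differently.

```r
# Hypothetical equivalent of the described check
sketch_is_null_or_na <- function(x) {
  is.null(x) || length(x) == 0 || anyNA(x)
}

sketch_is_null_or_na(NULL)          # NULL input
sketch_is_null_or_na(character(0))  # empty vector
sketch_is_null_or_na(c(1, NA))      # contains NA
sketch_is_null_or_na(1:3)           # ordinary vector
```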
Plot redistributed time-use predictions from compositional ilr multiple linear regression model fit by predict_delta_comps()
plot_delta_comp(dc_obj, comp_total = NULL, units_lab = NULL)
dc_obj |
A |
comp_total |
A numeric scalar that is the original units of the composition to make the x-axis the original scale instead of in the range |
units_lab |
Character string of the units of the compositions relating to |
Returns a plot object from the ggplot2 package (that is, class of gg and ggplot).
Ty Stanford <[email protected]>
data(fairclough)

# proportional reallocation against the remaining compositional parts
deltacomp_df <- predict_delta_comps(
  dataf = fairclough,
  y = "z_bmi",
  comps = c("sleep", "sed", "lpa", "mvpa"),
  covars = c("decimal_age", "sex"),
  deltas = seq(-20, 20, by = 5) / (24 * 60),
  comparisons = "prop-realloc",
  alpha = 0.05
)
class(deltacomp_df)

plot_delta_comp(
  dc_obj = deltacomp_df,
  comp_total = 24 * 60,
  units_lab = "min"
)

# pairwise (one-for-one) reallocation
deltacomp_df <- predict_delta_comps(
  dataf = fairclough,
  y = "z_bmi",
  comps = c("sleep", "sed", "lpa", "mvpa"),
  covars = c("decimal_age", "sex"),
  deltas = seq(-20, 20, by = 5) / (24 * 60),
  comparisons = "one-v-one",
  alpha = 0.05
)

plot_delta_comp(
  dc_obj = deltacomp_df,
  comp_total = 24 * 60,
  units_lab = "min"
)
Given the data (containing outcome, compositional components and covariates), fit an ilr multiple linear regression model and provide predictions from reallocating compositional values pairwise amongst the components.
predict_delta_comps( dataf, y, comps, covars = NULL, deltas = c(0, 10, 20)/(24 * 60), comparisons = c("prop-realloc", "one-v-one")[1], alpha = 0.05 )
dataf |
A |
y |
Name (as string/character vector of length 1) of outcome variable in |
comps |
Character vector of names of compositions in |
covars |
Optional. Character vector of covariates names (non-comp variables) in |
deltas |
A vector of time-component changes (as proportions of compositions, i.e., values between -1 and 1). Optional.
Changes in compositions to be computed pairwise. Defaults to 0, 10 and 20 minutes as a proportion of the 1440 minutes
in a day (i.e., approximately |
comparisons |
Currently two choices: |
alpha |
Optional. Level of significance. Defaults to 0.05. |
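Because deltas is expressed as proportions of the whole composition, changes measured in minutes need to be divided by the composition total first. The short sketch below shows this arithmetic for the default values and for the minute grid used in the plot_delta_comp() examples; the variable names are illustrative only.

```r
mins_in_day <- 24 * 60            # 1440 minutes

# Default deltas: 0, 10 and 20 minutes as proportions of a day
mins <- c(0, 10, 20)
deltas <- mins / mins_in_day
round(deltas, 4)                  # 0.0000 0.0069 0.0139

# A symmetric grid from -20 to +20 minutes in 5-minute steps
delta_grid <- seq(-20, 20, by = 5) / mins_in_day
range(delta_grid * mins_in_day)   # back on the minute scale: -20 20
```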
Values in the comps columns must be strictly greater than zero. These compositional values are NOT assumed to be constrained to (0, 1) values, as the function normalises the compositions row-wise to sum to 1 as part of its processing of the dataset before analysis.
Please see the deltacomp package README.md file for examples and explanation of the comparisons = "prop-realloc" and comparisons = "one-v-one" options.
Messages are printed to the console as the function tests the inputs, produces the isometric log ratios (ilrs), fits the linear model and produces the redistributed time-use predictions (with confidence intervals).
Returns a data.frame of the time-use redistribution predictions (and 100(1-alpha)% confidence intervals) with the following columns:
comp+: the compositional variable with the addition of the delta value
comp-: the compositional variable with the subtraction of the delta value
delta: the time-use redistribution value
alpha: significance level for the 100(1-alpha)% confidence interval
delta_pred: the predicted mean change in the outcome variable
ci_lo: the lower limit of the 100(1-alpha)% confidence interval corresponding to delta_pred
ci_up: the upper limit of the 100(1-alpha)% confidence interval corresponding to delta_pred
sig: "*" if delta_pred is significantly different from 0 at the alpha level (empty string otherwise)
The data.frame has a class of deltacomp_obj, which denotes that there are additional attributes of the returned object accessible using attr(*, "attribute_name").
The possible values for "attribute_name" are:
dataf: a data.frame of the predictors (covariates and ilrs)
y: a vector of the outcome variable
comps: a character vector of the time-use composition names
lm: the lm object of the multiple linear regression fit (using y and dataf from above)
deltas: the redistributed time-use values used in the predictions
comparisons: "one-v-one" or "prop-realloc" provided as the comparisons argument
alpha: significance level for the 100(1-alpha)% confidence intervals
ilr_basis: the ilr change of basis matrix V
mean_pred: a single row data.frame with the predicted mean outcome (fit column) value from the "average" set of predictors
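For readers unfamiliar with classed data.frames carrying attributes, the base R sketch below builds a mock object with the same general shape. All values are made up, and the mock is only an illustration of how class and attr() interact; it is not output from predict_delta_comps().

```r
# Mock of a classed data.frame with extra attributes (values are invented)
mock <- data.frame(
  delta      = c(-0.01, 0.01),
  delta_pred = c(-0.05, 0.04)
)
class(mock) <- c("deltacomp_obj", class(mock))
attr(mock, "alpha") <- 0.05
attr(mock, "comps") <- c("sl", "sb", "lpa", "mvpa")

inherits(mock, "deltacomp_obj")  # class check
is.data.frame(mock)              # still behaves as a data.frame
attr(mock, "comps")              # retrieve an attribute by name
```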
Ty Stanford <[email protected]>
# pairwise reallocations, adjusting for covariates
predict_delta_comps(
  dataf = fat_data,
  y = "fat",
  comps = c("sl", "sb", "lpa", "mvpa"),
  covars = c("sibs", "parents", "ed"),
  deltas = seq(-60, 60, by = 5) / (24 * 60),
  comparisons = "one-v-one",
  alpha = 0.05
)

# proportional reallocations, no covariates
delta_comp_out <- predict_delta_comps(
  dataf = fat_data,
  y = "fat",
  comps = c("sl", "sb", "lpa", "mvpa"),
  covars = NULL,
  deltas = seq(-60, 60, by = 5) / (24 * 60),
  comparisons = "prop-realloc",
  alpha = 0.05
)

# get the mean prediction from the returned object
attr(delta_comp_out, "mean_pred")
Print the ilr transformation of provided composition parts to console
print_ilr_trans(comps)
comps |
a character vector of compositional parts |
A character vector representing the ilr transformation of the comps is returned invisibly, as the function's purpose is simply to print to the R console.