| Title: | Log-Linear Modelling of Triad Genotype Data |
|---|---|
| Description: | Triad Log-Linear modelling of Imprinting Environmental interactions, and Maternal effects (TriLLIEM). This is an implementation of the log-linear model described in a series of papers, see for example Ainsworth et al. (2010) <doi:10.1002/gepi.20547>. |
| Authors: | Kevin Heda Zhao [aut, cre], Kelly Burkett [aut] |
| Maintainer: | Kevin Heda Zhao <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.1 |
| Built: | 2026-05-10 08:55:19 UTC |
| Source: | https://github.com/kevinhzhao/trilliem |
This method is modelled after the anova.glm method. It produces an analysis of deviance table for multiple nested models.
## S3 method for class 'TriLLIEM' anova(object, ...)## S3 method for class 'TriLLIEM' anova(object, ...)
object |
An object of class |
... |
Additional |
Like anova.glm, each model's residual degrees of freedom and deviances are given, alongside their respective differences between the models. Models should be nested for these results to be statistically interpretable. The last column shows the p-value from chi-squared tests comparing the difference in deviance for each model.
TriLLIEM objects modelling any sort of imprinting effect must use this
function, as the EM algorithm used in TriLLIEM causes the anova.glm
function to treat the estimated values as having been truly observed,
modifying the degrees of freedom.
An object of class "anova.TriLLIEM" inheriting from class
"anova".
model1 <- TriLLIEM(dat = example_dat4R, effects = c("C")) model2 <- TriLLIEM(dat = example_dat4R, effects = c("C", "M")) anova(model1, model2)model1 <- TriLLIEM(dat = example_dat4R, effects = c("C")) model2 <- TriLLIEM(dat = example_dat4R, effects = c("C", "M")) anova(model1, model2)
Example data formatted to be used in the TriLLIEM function.
example_dat4Rexample_dat4R
A dataframe with columns for:
index for the category corresponding to maternal (M), paternal (F), and child (C) genotypes
mating type category for maternal (M) and paternal (F) genotypes under a mating symmetry model
mating type category for maternal (M) and paternal (F) genotypes under a model that does not assume mating symmetry
mother genotype
father genotype
child genotype
binary variable for if environmental effects are present (1) or not present (0)
disease present (1)/not present (0)
counts for each category of maternal (M), paternal (F), and child (C) genotypes
TriLLIEM.sim objectsView underlying simulation details for TriLLIEM.sim objects
fullview(sim)fullview(sim)
sim |
an object of class |
A data frame with the same counts in each category as sim, but
with underlying simulation probabilities given as columns:
typeOrig"Index for each genetically distinct row,
based on "M", "F", "C", "matOrg", and "patOrg".
Ranges from 1 to 64, unlike the type column in sim which ranges from 1
to 60, as the inclusion of "matOrg" and "patOrg" lead to potentially
four additional rows.
matOrg" and "patOrg"Binary variables for when the minor allele
is maternally/paternally inherited, so that counts when "M", "F", "C"
are all 1 are not ambiguous when determining parent-of-origin effects.
prMF"Probability of the mother ("M") and father ("F") pairing
in the simulated population, based on the minor allele frequency and mating
type coefficients given during simulation.
prCGivenMFOrg"Probability of the child's genotype ("C") and the
allele parent of origin ("matOrg" and "patOrg") conditional on the
genotypes of the mother and father.
prMFCOrg"Probability of the triad based on genotypes and parent
of origin, product of "prMF" and "prCGivenMFOrg".
PrMFCOrg"Probability of the triad based on genotypes, parent of
origin, and environmental exposure ("E") conditional on disease status
("D"), obtained by scaling "prMFCOrg" by the risk of disease (equal to
"prMFCOrg" when "D" is 0).
pop"Sub-population each row is simulated from (all 1 if
simulated without population stratification)
The "count" column in "sim" is the sum of the rows of the "count"
column in the returned data frame when grouped by "M", "F", "C", "E",
"D".
## View the underlying distributions behind some of the example models given in simulateData() dat1 <- simulateData(S = c(1, 2, 4), If = 3) fullview(dat1) dat2 <- simulateData( nControl = 1000, propE = c(0.1, 0.4), propE.control = c(0.2, 0.2), nPop = 2, maf = c(0.3, 0.4), prev.byPop = c(0.2, 0.3), prop.byPop = c(0.6, 0.4) ) fullview(dat2)## View the underlying distributions behind some of the example models given in simulateData() dat1 <- simulateData(S = c(1, 2, 4), If = 3) fullview(dat1) dat2 <- simulateData( nControl = 1000, propE = c(0.1, 0.4), propE.control = c(0.2, 0.2), nPop = 2, maf = c(0.3, 0.4), prev.byPop = c(0.2, 0.3), prop.byPop = c(0.6, 0.4) ) fullview(dat2)
summary.TriLLIEM objectsPrint method for summary.TriLLIEM objects
## S3 method for class 'summary.TriLLIEM' print( x, digits = max(3L, getOption("digits") - 3L), signif.stars = getOption("show.signif.stars"), ... )## S3 method for class 'summary.TriLLIEM' print( x, digits = max(3L, getOption("digits") - 3L), signif.stars = getOption("show.signif.stars"), ... )
x |
an object of class " |
digits |
the number of significant digits to use when printing. |
signif.stars |
logical. If |
... |
additional arguments passed to printCoefmat. |
Prints summary of the TriLLIEM model, displaying the original call
to the function, the matrix of coefficients, the AIC, and the number of Fisher
Scoring and EM iterations.
res <- TriLLIEM(mtmodel = "HWE", effects = c("C", "M", "Im"), dat = example_dat4R) print(summary(res))res <- TriLLIEM(mtmodel = "HWE", effects = c("C", "M", "Im"), dat = example_dat4R) print(summary(res))
TriLLIEM objectsPrint method for TriLLIEM objects
## S3 method for class 'TriLLIEM' print(x, digits = max(3L, getOption("digits") - 3L), ...)## S3 method for class 'TriLLIEM' print(x, digits = max(3L, getOption("digits") - 3L), ...)
x |
an object of class |
digits |
the number of significant digits to use when printing. |
... |
arguments passed to or from other methods. |
Prints details of the TriLLIEM model, with nuisance parameters
(e.g., mating type parameters) omitted. To view all fitted parameters,
run coef(x).
res <- TriLLIEM(mtmodel = "HWE", effects = c("C", "M", "Im"), dat = example_dat4R) print(res)res <- TriLLIEM(mtmodel = "HWE", effects = c("C", "M", "Im"), dat = example_dat4R) print(res)
Function to simulate maternal, paternal, and child genotype counts under different genetic effect models.
simulateData( nCases = 1000, nControl = 0, R = c(1, 1, 1), S = c(1, 1, 1), V = c(1, 1, 1), mtCoef = c(1, 1, 1), Im = 1, If = 1, propE = 0, propE.control = propE, Einteraction = "M", nPop = 1, maf = 0.3, prev.byPop = NULL, prop.byPop = NULL, Fst = 0.005 )simulateData( nCases = 1000, nControl = 0, R = c(1, 1, 1), S = c(1, 1, 1), V = c(1, 1, 1), mtCoef = c(1, 1, 1), Im = 1, If = 1, propE = 0, propE.control = propE, Einteraction = "M", nPop = 1, maf = 0.3, prev.byPop = NULL, prop.byPop = NULL, Fst = 0.005 )
nCases |
number of case trios to simulate |
nControl |
number of control trios |
R |
vector of 3 elements representing child effects for 0, 1, and 2 copies of the risk allele, respectively |
S |
vector of 3 elements representing maternal effects for 0, 1, and 2 copies of the risk allele, respectively |
V |
vector of 3 elements representing gene-environment effects for 0, 1, and 2 copies of the risk allele, respectively |
mtCoef |
vector of 3 elements representing the mating type coefficients, see details for more information |
Im |
maternal imprinting effect |
If |
paternal imprinting effect |
propE |
scalar or vector of proportion of case trios in the
environmental exposure group in a single or multiple populations (when
|
propE.control |
scalar or vector of proportion of control trios in the
environmental exposure group in a single or multiple populations (when
|
Einteraction |
string indicating what variable environmental effects
interact with (one of " |
nPop |
number of populations to simulate for population stratification, see details for information on specifying other parameters when there are multiple strata |
maf |
scalar or vector of the minor allele frequency proportion
in a single or multiple populations (when |
prev.byPop |
prevalence of cases in each sub population (vector with
length equal to |
prop.byPop |
proportion of each sub population, must sum to 1 (vector
with length equal to |
Fst |
F parameter for estimation of |
To simulate the counts, first the total number of case trios
(nCases) and control trios (nControl) are partitioned into the nPop
sub-populations by randomly sampling from respective multinomial
distributions, with probabilities calculated using prop.byPop and
prev.byPop as follows:
where is the -th sub-population, is the
-th element of prev.byPop, -th element
of propE (when ) or propE.control (when ), and
is the -th element of prop.byPop.
If only a single aggregated maf value is provided when there are multiple
sub-populations, maf may be estimated using the model from
Balding and Nichols (1995). This requires an parameter to be provided,
with estimates of maf being obtained by sampling from a beta distribution
with shape parameters
where MAF is the provided maf.
If nPop = 1, this step is skipped as everyone will be in the
same sub-population. Note that this method of simulating population
stratification assumes and are conditionally independent
(i.e., V = c(1,1,1)).
Distributions of counts in each partition based on the different genotype,
exposure, and disease status categories are then independently sampled
based on the risk of disease (from the given R, S, V, Im, and If
values) and mating type probabilities (from mtCoef).
R, S, and V are vectors of 3 elements, where the first element should
be 1, acting as a baseline for when , , or are
0 respectively. The following two numbers represent the multiplicative
increase to risk when the respective genotype is 1 or 2. Im and If are
the imprinting parameters, representing the multiplicative increase to risk
when the minor allele is maternally or paternally inherited.
Mating type probabilities are simulated using the mating asymmetry
parameterization of Bourgey et al. (2011).
Here, mtCoef is a vector of 3 elements representing .
We begin by assigning each
assuming the population is in Hardy-Weinberg equilibrium,
with probabilities obtained from the given maf. Letting
we reassign these particular mating pairs by setting
To view the data frame with underlying simulation probabilities, counts for either parent of origin case when mother, father, and child are all, heterozygous, and by sub-population when simulating population stratification, use fullview.
An object of class TriLLIEM.sim, a data frame with columns for:
type"Index for the category corresponding to maternal ("M"),
paternal ("F"), and child ("C") genotypes.
mt_MS"Mating type category for maternal and paternal genotypes under a mating symmetry model.
mt_MaS"Mating type category for maternal and paternal genotypes under a model that does not assume mating symmetry.
M"Maternal genotype.
F"Paternal genotype.
C"Child genotype.
E"Binary variable for if environmental effects are present (1) or not present (0).
D"Case child (1)/Control child (0).
count"Number of triads falling under the specified category based
on "M", "F", "C", "E", "D".
Balding DJ, Nichols RA (1995).
“A method for quantifying differentiation between populations at multi-allelic loci and its implications for investigating identity and paternity.”
Genetica, 96(1–2), 3–12.
ISSN 1573-6857.
doi:10.1007/bf01441146.
http://dx.doi.org/10.1007/BF01441146.
Bourgey M, Healy J, Saint-Onge P, Massé H, Sinnett D, Roy-Gagnon M (2011).
“Genome-wide detection and characterization of mating asymmetry in human populations.”
Genetic Epidemiology, 35(6), 526-535.
doi:10.1002/gepi.20602.
https://onlinelibrary.wiley.com/doi/pdf/10.1002/gepi.20602, https://onlinelibrary.wiley.com/doi/abs/10.1002/gepi.20602.
## Maternal effect of 2, and paternal imprinting of 3. simulateData(S = c(1, 2, 4), If = 3) ## Paternal imprinting by environment interaction of 1.5. simulateData(V = c(1, 1.5, 1.5^2), propE = 0.3, Einteraction = "If") ## Maternal gene environment interaction of 1.6 with controls simulateData(nControl = 1000, V = c(1, 1.6, 1.6^2), propE = 0.3, Einteraction = "M") ## Null model with 3 different sub-populations simulateData( nPop = 3, maf = c(0.1, 0.2, 0.3), prev.byPop = c(0.2, 0.1, 0.4), prop.byPop = c(0.3, 0.3, 0.4) ) ## Null model with 2 different sub-populations, environmental exposures, and controls simulateData( nControl = 1000, propE = c(0.1, 0.4), propE.control = c(0.2, 0.2), nPop = 2, maf = c(0.3, 0.4), prev.byPop = c(0.2, 0.3), prop.byPop = c(0.6, 0.4) )## Maternal effect of 2, and paternal imprinting of 3. simulateData(S = c(1, 2, 4), If = 3) ## Paternal imprinting by environment interaction of 1.5. simulateData(V = c(1, 1.5, 1.5^2), propE = 0.3, Einteraction = "If") ## Maternal gene environment interaction of 1.6 with controls simulateData(nControl = 1000, V = c(1, 1.6, 1.6^2), propE = 0.3, Einteraction = "M") ## Null model with 3 different sub-populations simulateData( nPop = 3, maf = c(0.1, 0.2, 0.3), prev.byPop = c(0.2, 0.1, 0.4), prop.byPop = c(0.3, 0.3, 0.4) ) ## Null model with 2 different sub-populations, environmental exposures, and controls simulateData( nControl = 1000, propE = c(0.1, 0.4), propE.control = c(0.2, 0.2), nPop = 2, maf = c(0.3, 0.4), prev.byPop = c(0.2, 0.3), prop.byPop = c(0.6, 0.4) )
Summary function for TriLLIEM functions
## S3 method for class 'TriLLIEM' summary(object, ...)## S3 method for class 'TriLLIEM' summary(object, ...)
object |
an object of class " |
... |
arguments passed to or from other methods. |
Due to TriLLIEM using the EM algorithm when fitting imprinting effects,
calculation of residuals is modified compared to in summary.glm, where
the original data (object$y_initial) is used instead of object$y to
ensure the correct residuals are computed. See TriLLIEM for more details.
A list with the same components as those returned by summary.glm, but with the addition of the following:
the same component from object, always included for "TriLLIEM" objects.
the same component from object, always included for "TriLLIEM" objects
the same component from object, always included for "TriLLIEM" objects
res <- TriLLIEM(mtmodel = "HWE", effects = c("C", "M", "Im"), dat = example_dat4R) summary(res)res <- TriLLIEM(mtmodel = "HWE", effects = c("C", "M", "Im"), dat = example_dat4R) summary(res)
This function is used to fit the user-specified log-linear model to trio count data.
TriLLIEM( mtmodel = "MS", effects = c("C", "M"), dat, includeE = FALSE, Estrat = FALSE, Eanova = FALSE, includeD = FALSE, Minit = 0.5, max.iter = 30, EM.diag = FALSE )TriLLIEM( mtmodel = "MS", effects = c("C", "M"), dat, includeE = FALSE, Estrat = FALSE, Eanova = FALSE, includeD = FALSE, Minit = 0.5, max.iter = 30, EM.diag = FALSE )
mtmodel |
Mating type model to use in the analysis, can be " |
effects |
A vector listing the effects, as strings, to include in the model. Effects can include:
Default is |
dat |
A data frame with triad data, with the formatting of example_dat4R. |
includeE |
A logical value indicating whether to include environment
interaction effects. If set to " |
Estrat |
A logical value indicating whether to use a stratified approach
for environmental interactions. Default is |
Eanova |
A logical value indicating if this is for the sake of running
anova to compare a model with environmental interactions to a model that does
not include environmental interactions. Should be left as " |
includeD |
A logical value indicating whether to use the hybrid
model with controls. If set to " |
Minit |
Initial value for the proportion of triple heterozygote ( |
max.iter |
Maximum number of iterations for the EM algorithm. Default is 30. |
EM.diag |
A logical value indicating whether to show diagnostic messages
for the EM algorithm. Default is |
Fits the specified log-linear model of Weinberg et al. (1998) to
dat using R's glm framework. This includes Weinberg and Umbach (2005)'s
hybrid model allowing for control triads. When imprinting effects ("Im" or
"If") are included, an EM algorithm is run to estimate the counts of
triads which are maternally and paternally
inherited. Normally, if this was done using the glm function, the computed
likelihoods would use these estimated counts, which is incorrect. This function,
alongside its methods like summary.TriLLIEM and anova.TriLLIEM,
performs the EM algorithm while using the original observed counts to correctly
obtain likelihoods.
The model formula supplied to glm is made up of the genetic effects supplied
in effects, alongside several nuisance parameters. These include:
Mating type parameters, dependent on the specified mtmodel,
D, if includeD = TRUE,
E:D, if includeD = TRUE and includeE = TRUE,
Mating type parameter by E interactions, if includeE = TRUE; E, if
includeE = TRUE and mtmodel = "HWE" (both necessary as shown in
Shin et al. (2010)).
These nuisance parameters are omitted when printing the function output,
but may be viewed by using the coef function on the output.
Estrat = TRUE forces every every listed effect in effects to have its
gene environment interaction included in the model, regardless of whether the
user has specified them explicitly or not. This essentially stratifies the
model by E, and is useful for replicating the stratified models necessary
for analyzing gene environment interactions in Haplin
(Gjessing and Lie 2006) and EMIM (Howey and Cordell 2012).
Eanova = TRUE allows the model to be fit when rows are
present but includeE = FALSE, without modifying these exposed rows.
This is only useful for using anova.TriLLIEM
to determine if models with gene-environment interactions are statistically
different from models without gene-environment interactions. Since this option
keeps these categories without using the E parameter though,
the fitted model will use incorrect degrees of freedom in its significance
tests, and hence should not be used for any inferences besides this very
specific case of model comparison.
All TriLLIEM objects are of family poisson_em, a modified version of the
poisson family to account for additional data created by the EM algorithm when
computing residuals and AIC.
An object of class "TriLLIEM", which inherits from class "glm"
and has the same components as the output of glm, with the following
modifications:
df.residualif imprinting effects are specified, subtracted by the number of additional rows introduced by the EM algorithm.
df.nullif imprinting effects are specified, subtracted by the number of additional rows introduced by the EM algorithm.
EM_iternumber of EM algorithm iterations before convergence.
y_initialinitial supplied data frame, dat, before addition of rows
by EM.
grplist of vectors, with each vector containing indices of grouped
rows in the data after the EM algorithm is applied (y). Aggregating these grouped
rows by sum will yield y_initial.
Gjessing HK, Lie RT (2006).
“Case-parent triads: estimating single- and double-dose effects of fetal and maternal disease gene haplotypes.”
Annals of Human Genetics, 70(Pt 3), 382–396.
Howey R, Cordell HJ (2012).
“PREMIM and EMIM: tools for estimation of maternal, imprinting and interaction effects using multinomial modelling.”
BMC Bioinformatics, 13(1).
ISSN 1471-2105.
doi:10.1186/1471-2105-13-149.
http://dx.doi.org/10.1186/1471-2105-13-149.
Shin J, McNeney B, Graham J (2010).
“On the Use of Allelic Transmission Rates for Assessing Gene-by-Environment Interaction in Case-Parent Trios.”
Annals of Human Genetics, 74(5), 439-451.
doi:10.1111/j.1469-1809.2010.00599.x.
https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1469-1809.2010.00599.x, https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1469-1809.2010.00599.x.
Weinberg CR, Umbach DM (2005).
“A Hybrid Design for Studying Genetic Influences on Risk of Diseases with Onset Early in Life.”
The American Journal of Human Genetics, 77(4), 627–636.
ISSN 0002-9297.
doi:10.1086/496900.
http://dx.doi.org/10.1086/496900.
Weinberg CR, Wilcox AJ, Lie RT (1998).
“A log-linear approach to case-parent-triad data: assessing effects of disease genes that act either directly or through maternal effects and that may be subject to parental imprinting.”
American Journal of Human Genetics, 62(4), 969–978.
res1 <- TriLLIEM(mtmodel = "HWE", effects = c("C", "M", "Im"), dat = example_dat4R) summary(res1) dat <- simulateData( nControl = 1000, propE = c(0.1, 0.4), propE.control = c(0.2, 0.2), nPop = 2, maf = c(0.3, 0.4), prev.byPop = c(0.2, 0.3), prop.byPop = c(0.6, 0.4) ) ## Obtain the non-stratified and stratified models and compare them via anova res2 <- TriLLIEM( mtmodel = "HWE", effects = c("C", "M", "Im", "E:Im"), dat = dat, includeE = TRUE, includeD = TRUE ) res3 <- TriLLIEM( mtmodel = "HWE", effects = c("C", "M", "Im", "E:Im"), dat = dat, includeE = TRUE, Estrat = TRUE, includeD = TRUE ) anova(res2, res3) ## Compare non-stratified model to a model without E by setting Eanova = TRUE res4 <- TriLLIEM( mtmodel = "HWE", effects = c("C", "M", "Im"), dat = dat, Eanova = TRUE, includeD = TRUE ) anova(res2, res4)res1 <- TriLLIEM(mtmodel = "HWE", effects = c("C", "M", "Im"), dat = example_dat4R) summary(res1) dat <- simulateData( nControl = 1000, propE = c(0.1, 0.4), propE.control = c(0.2, 0.2), nPop = 2, maf = c(0.3, 0.4), prev.byPop = c(0.2, 0.3), prop.byPop = c(0.6, 0.4) ) ## Obtain the non-stratified and stratified models and compare them via anova res2 <- TriLLIEM( mtmodel = "HWE", effects = c("C", "M", "Im", "E:Im"), dat = dat, includeE = TRUE, includeD = TRUE ) res3 <- TriLLIEM( mtmodel = "HWE", effects = c("C", "M", "Im", "E:Im"), dat = dat, includeE = TRUE, Estrat = TRUE, includeD = TRUE ) anova(res2, res3) ## Compare non-stratified model to a model without E by setting Eanova = TRUE res4 <- TriLLIEM( mtmodel = "HWE", effects = c("C", "M", "Im"), dat = dat, Eanova = TRUE, includeD = TRUE ) anova(res2, res4)