BiMM tree: A decision tree method for modeling clustered and longitudinal binary outcomes


Commun Stat Simul Comput. Author manuscript; available in PMC 2021 Jan 1.

Published in final edited form as:

Commun Stat Simul Comput. 2020; 49(4): 1004–1023.

Published online 2018 Sep 12. doi:10.1080/03610918.2018.1490429

PMCID: PMC7202553

NIHMSID: NIHMS1514789

PMID: 32377032

Jaime Lynn Speiser,1 Bethany J. Wolf,2 Dongjun Chung,2 Constantine J. Karvellas,3 David G. Koch,4 and Valerie L. Durkalski2


Abstract

Clustered binary outcomes are frequently encountered in clinical research (e.g. longitudinal studies). Generalized linear mixed models (GLMMs) for clustered endpoints have challenges for some scenarios (e.g. data with multi-way interactions and nonlinear predictors unknown a priori). We develop an alternative, data-driven method called Binary Mixed Model (BiMM) tree, which combines decision tree and GLMM within a unified framework. Simulation studies show that BiMM tree achieves slightly higher or similar accuracy compared to standard methods. The method is applied to a real dataset from the Acute Liver Failure Study Group.

Keywords: classification and regression tree, longitudinal data, clustered data, mixed effects, decision tree

1. Introduction

Clustered binary outcomes are frequently encountered in clinical research. Correlation within datasets may result from variables representing subject clusters, such as medical centers or family groups. Another type of clustered outcome results from longitudinal or repeated measures studies, where each patient represents a cluster. For example, a longitudinal study may collect repeated measurements of outcomes to evaluate disease prognosis (e.g. poor versus good outcome), diagnosis or disease relapse (e.g. disease versus disease-free), or other endpoints (e.g. re-admitted or not re-admitted to the hospital). Outcomes collected on the same patient at multiple time points are almost always dependent on one another. This within-subject correlation should be considered because failing to account for correlation may result in a loss of prediction efficiency.

Generalized linear mixed models (GLMMs) are typically employed for modeling clustered and longitudinal outcomes, but suffer limitations for some datasets. Interactions between predictor variables must be selected a priori to be included in GLMM modeling. However, knowledge about interactions between predictors is often lacking in practice, especially in complex clinical settings considering many personal, familial, and environmental factors. The GLMM framework also requires users to specify whether there is a nonlinear relationship between predictors and outcome through the link function. Though specification of nonlinear relationships and interaction terms is not impossible, it often presents a challenge in the GLMM framework since there is no universal method for making these modeling decisions.

In this paper, we propose an alternative method called Binary Mixed Model (BiMM) tree, which combines decision tree methodology with mixed models and may provide greater flexibility for complex datasets. Decision tree methodology can be used to develop prediction models without the assumption of a linear relationship between predictor variables and outcome. Interactions between predictor variables are also naturally modeled within the decision tree framework without prior knowledge. In the BiMM tree method, we incorporate results from decision trees within mixed models to adjust for clustered and longitudinal outcomes. A Bayesian implementation of GLMM is used to avoid issues with convergence and quasi- or completely separated datasets with binary outcomes.

A specific motivating example dataset for the novel methodology in this paper is a longitudinal registry dataset of acute liver failure (ALF) patients (clinicaltrials.gov ID: NCT00518440). ALF is a rare and devastating condition characterized by rapid onset of severe liver damage, encephalopathy (altered mental status) and coagulopathy (impaired blood clotting), with approximately 25% of patients requiring a liver transplant and approximately 30% of patients dying during the acute phase (Lee et al. 2008). Complexities of the ALF registry data, including skewed distributions of predictors with many extreme values, nonlinear predictors of outcome, and a multitude of possible interactions among predictors, make it difficult to employ GLMMs for predicting outcomes.

The paper is structured as follows. In Section 2, we present background information about decision tree modeling in general and tree models for longitudinal and clustered continuous outcomes. In Section 3, we introduce the BiMM tree method for predicting longitudinal and clustered binary outcomes, and in Section 4 we describe the motivating ALF registry in detail. We compare the performance of the BiMM tree method to several other methods with a simulation study in Sections 5 and 6. An application to data from the ALF study is then presented as an example of the BiMM tree method in Section 7. Finally, in Section 8 we discuss implications of our study, limitations, and avenues for further research.

2. Background

A decision tree framework is utilized for the novel BiMM tree method because it offers several potential advantages compared to traditional models such as GLMMs. There are many different decision tree methods available, and we implement our BiMM tree method with the classification and regression tree (CART) framework, a commonly used methodology developed by Breiman (Breiman et al. 1984). CART does not require specification of nonlinear relationships or interaction terms, and offers simple and intuitive interpretation of predictor variables. Moreover, CART provides an alternative method for developing prediction models when traditional models are not feasible (e.g. if the number of predictor variables is greater than the number of observations). For these reasons, CART can sometimes better predict outcomes compared to other procedures such as discriminant analysis and logistic regression for data captured at a single time point (Hastie, Tibshirani, and Friedman 2001).

In spite of this flexibility, few decision tree methods exist for modeling clustered categorical endpoints. The R package party can be used to implement CART models if two predictor variables are correlated, but it does not adjust for longitudinal and clustered measurements of the same outcome variable (Hothorn, Hornik, and Zeileis 2011). There are some techniques which circumvent the issue of adjusting for longitudinal and clustered outcomes, such as summarizing variables (e.g. using averages or most frequent categorical values) or using data from only a single time point (e.g. admission values); however, these methods incur a marked loss of information since available data is summarized or only partially used.

Several methods have been proposed to modify CART models for longitudinal and clustered continuous outcomes (Abdolell et al. 2002, De'Ath 2002, Dine, Larocque, and Bellavance 2009, Hajjem, Bellavance, and Larocque 2011, Keon Lee 2005, Larocque 2010, Loh and Zheng 2013, Segal 1992, Sela and Simonoff 2012, Yu and Lambert 1999). Hajjem (2011) and Sela (2012) develop similar methods for implementing CART models for longitudinal and clustered data with continuous outcomes. These methods incorporate mixed effects within the tree framework to account for the clustered structure within the data, using an algorithm analogous to expectation-maximization as described by Wu and Zhang (2006). The main idea in the Sela RE-EM tree (2012) and Hajjem mixed effects regression tree (2011) algorithms is to dissociate the fixed and cluster-level components within the modeling framework. First, a CART with all predictors as fixed effects is fitted under the assumption that the random effects for the clusters are known. Next, a linear mixed model is fitted using the estimated fixed effects from the CART, and the random cluster effects which account for correlation induced by clustered variables are estimated under the assumption that the fixed effects are known. Finally, the continuous outcome is updated based on the linear mixed model using an additive effect, in which the estimated random cluster effect is added to the original continuous outcome. The algorithm iterates between CART, linear mixed models, and updating the outcome in a framework similar to the expectation-maximization algorithm (Wu and Zhang 2006). The algorithms continue iteratively until convergence is satisfied, which is based on the change in the likelihood from the mixed model being less than a specified value.

While the framework for clustered CART modeling has been developed for continuous outcomes, adjusting the algorithm for clustered categorical outcomes is non-trivial. For continuous endpoints, the outcomes are updated based on random effects from the linear mixed model using an additive effect. For categorical outcomes, the optimal method for adjusting outcomes is unclear because a random effect cannot simply be added.

3. BiMM Tree Method

The BiMM tree method iterates between developing CART models using all predictors and then using information from the CART model within a Bayesian GLMM to adjust for the clustered structure of the outcome. Consistent with the continuous methods for clustered decision trees, we implement an algorithm similar to the expectation-maximization algorithm, in which the fixed (decision tree) effects are dissociated from the random (cluster-level) effects. The BiMM tree method may be considered an extension of GLMMs in which the fixed covariates are not assumed to be linearly associated with the link function of the outcome and interactions do not need to be pre-specified. The traditional GLMM for binary outcomes has the form

$$\text{logit}(y_{it}) = X_{it}\beta + Z_{it}b_{it},$$

where $y_{it}$ is the binary outcome for cluster $i = 1,\dots,M$ at longitudinal measurement $t = 1,\dots,T_i$, $\text{logit}()$ is the logistic link function, $X_{it}$ is a matrix of fixed covariates for cluster $i$ at longitudinal measurement $t$, $\beta$ is a vector of fitted coefficients for the intercept and fixed covariates, $Z_{it}$ is the clustered covariate for cluster $i$ at longitudinal measurement $t$, and $b_{it}$ is the fitted random effect for cluster $i$ at longitudinal measurement $t$. Note that GLMMs may be fitted when the cluster sizes differ (e.g. if there are different numbers of longitudinal measurements for each cluster).

The GLMM portion of the BiMM method has the form

$$\text{logit}(y_{it}) = \left(1 \;\; \text{CART}(X_{it})\right)\beta + Z_{it}b_{it},$$

where $\text{CART}(X_{it})$ is a row vector represented within the GLMM as indicator variables reflecting membership of each longitudinal observation $t$ for cluster $i$ in the terminal nodes, in numerical order, from the CART model. Terminal nodes are at the bottom of CART models and provide an outcome prediction for each subject's observation. Figure 1 provides an example CART model with terminal Nodes 1, 3, 5 and 6. Thus, the terminal nodes of CART provide a method for determining similar groups of observations which may be included within the Bayesian GLMM portion of the BiMM method. In this example, $\text{CART}(X_{it})$ would contain indicator variables for membership in Node 1, Node 3, and Node 5, in that order. It is not necessary to include the indicator variable for the last terminal node, Node 6, because this would be redundant information within the GLMM. This is consistent with traditional models, where one includes one fewer indicator variable than the number of categories in the regression framework. Using Figure 1 as an example, $\text{CART}(X_{it}) = (0\ 0\ 1)$ for an observation $t$ for cluster $i$ contained within Node 5. Thus, for this example,

$$\text{logit}(y_{it}) = (1\ 0\ 0\ 1)\,\beta + Z_{it}b_{it}.$$


Figure 1:

The decision tree data generating process for the simulation study. There are three variables: INR, creatinine, and ventilator use (yes/no). The tree contains four terminal nodes: Nodes 1, 3, and 5 represent good outcomes and Node 6 represents poor outcomes.

This simplifies to

$$\text{logit}(y_{it}) = \beta_0 + \beta_3 + Z_{it}b_{it}.$$
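The indicator coding in this example can be reproduced with R's treatment contrasts; the snippet below is a hypothetical illustration with Node 6 set as the reference level so that the design row for a Node 5 observation is (1, 0, 0, 1):

```r
# Terminal-node membership for four hypothetical observations; Node 6 is the
# reference level, so indicators are created for Nodes 1, 3 and 5 in order.
nodes <- factor(c(5, 1, 6, 3), levels = c(6, 1, 3, 5))
model.matrix(~ nodes)  # first row is (1, 0, 0, 1), as in the example above
```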

Implementation of GLMMs is more challenging compared to standard linear mixed models employed for continuous outcomes. A consideration within the generalized model setting for categorical outcomes is that an iterative procedure (e.g. iteratively reweighted least squares or Newton-Raphson) must be used to compute random effects of clustered variables for GLMMs. GLMMs can have computational issues with model convergence or with inversion of large matrices, particularly for large datasets, which makes GLMM fitting challenging (Bates 2009). Also, if data are quasi-separated or completely separated, meaning that one variable or a combination of variables perfectly predicts the outcome, traditional implementations of GLMMs cannot be used (Gelman et al. 2008, Zorn 2005).

To address these challenges, we propose an algorithm that integrates CART and a Bayesian implementation of GLMM. There are several benefits to employing a Bayesian implementation of the GLMM instead of the traditional GLMM in our algorithm. First, Bayesian computation of GLMMs produces parameter estimates similar to those of frequentist GLMMs when uninformative prior distributions are used; moreover, weakly informative prior distributions can be used as a solution for separated or quasi-separated datasets (Gelman et al. 2008). Therefore, Bayesian implementation of the GLMM in the BiMM tree method offers more flexibility compared to frequentist GLMMs. Second, there are efficient methods for fitting Bayesian GLMMs (e.g. integrated nested Laplace approximation implemented in the R package INLA (Fong, Rue, and Wakefield 2010) and maximum a posteriori estimation implemented in the R package blme (Dorie 2013, Dorie 2014)), easily applied in open source software, which offer computation time similar to frequentist GLMMs. Finally, employing the Bayesian GLMM avoids convergence issues with traditional GLMMs (e.g. using the R package lme4 (Bates 2009, Bates et al. 2015)).
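As a concrete sketch, a single bglmer() call from blme fits such a model. The prior arguments shown reflect the description in the next paragraph (normal priors on fixed effects, Wishart on the random effect covariance) but are assumptions rather than the paper's exact settings, and dat is a hypothetical data frame:

```r
library(blme)
# Bayesian logistic GLMM: terminal-node indicators `node` as fixed effects and
# a random intercept per cluster `id`; `y` is the 0/1 outcome.
fit <- bglmer(y ~ node + (1 | id), data = dat, family = binomial,
              fixef.prior = "normal", cov.prior = "wishart")
q <- fitted(fit)  # predicted probabilities, used later as q_it
```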

The Bayesian GLMM within the BiMM tree method uses uninformative priors for the fixed effects and random effect covariance parameters via Normal and Wishart distributions, respectively. An unstructured covariance matrix is employed within the Bayesian GLMM. After the random effects for subjects are fitted with the Bayesian GLMM, the original outcome variable is updated using results from the Bayesian GLMM; we define the updated variable as the target outcome variable. A split function which divides the observations into two groups is used to create a binary target outcome variable at each iteration, since a simple additive effect does not result in a binary measure.

Specifically, the BiMM tree algorithm is as follows:

  1. Initialize the CART and Bayesian GLMM:

    a. Fit a CART using $y_{it}$ as the outcome for the fixed predictors $X_{it}$ and develop $J-1$ indicator variables for the $j = 1,\dots,J$ terminal nodes for clusters $i = 1,\dots,M$ at longitudinal measurements $t = 1,\dots,T_i$:

       $$I(y_{it} \in \text{node } j) = \begin{cases} 1 & \text{if } y_{it} \text{ is in terminal node } j \\ 0 & \text{otherwise.} \end{cases}$$

       Define $\text{CART}(X_{it})$ as the row vector of the $J-1$ indicator variables for cluster $i$ at longitudinal measurement $t$.

    b. Fit a Bayesian GLMM using $y_{it}$ as the outcome, including $\text{CART}(X_{it})$ and the clustered variable $Z_{it}$, to obtain fitted values for the random effect $b_{it}$:

       $$\text{logit}(y_{it}) = \left(1 \;\; \text{CART}(X_{it})\right)\beta + Z_{it}b_{it}.$$

    c. Extract the predicted probabilities from the Bayesian GLMM (denoted $\text{prBGLMM}(X_{it}, Z_{it})$) for each measurement $t$ within cluster $i$:

       $$q_{it} = \text{prBGLMM}(X_{it}, Z_{it}).$$

  2. Iterate through the following steps until convergence is satisfied:

    a. Determine the target outcome $y_{it}^{*}$ by adding the predicted probability $q_{it}$ to the original outcome $y_{it}$ and applying a split function $h()$ to make $y_{it}^{*}$ a binary value:

       $$y_{it}^{*} = h(y_{it} + q_{it}).$$

    b. Repeat steps 1a-c using $y_{it}^{*}$ as the outcome until the change in the posterior log likelihood from the Bayesian GLMM is less than a specified tolerance value.

Predictions for observations included within the model development dataset are made using both the CART (population-level) and random (observation-level) components. For observations not included within the model development dataset, predictions are made using the CART (population-level) component only.
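The loop above maps almost line-for-line onto the R packages named later in this section (rpart and blme). The following sketch is illustrative rather than the authors' released program; the data layout (a data frame dat with a 0/1 outcome y, a cluster identifier id, and predictor names in predictors) and the use of the GLMM log likelihood for the convergence check are assumptions:

```r
library(rpart)
library(blme)

bimm_tree <- function(dat, predictors, h, tol = 1e-3, max_iter = 10) {
  dat$y_target <- factor(dat$y, levels = c(0, 1))
  f_tree <- reformulate(predictors, response = "y_target")
  ll_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # Step 1a: CART on the current target outcome; terminal nodes must hold
    # at least 10% of the development data, as described later in this section
    tree <- rpart(f_tree, data = dat, method = "class",
                  control = rpart.control(minbucket = ceiling(0.1 * nrow(dat))))
    dat$node <- factor(tree$where)  # terminal-node membership -> indicators
    # Step 1b: Bayesian GLMM with node indicators and a random cluster intercept
    glmm <- bglmer(y_target ~ node + (1 | id), data = dat, family = binomial)
    # Step 1c: predicted probabilities q_it from the Bayesian GLMM
    q <- fitted(glmm)
    # Step 2: update the target outcome via the split function h and check
    # the change in the log likelihood against the tolerance
    dat$y_target <- factor(h(dat$y, q), levels = c(0, 1))
    ll_new <- as.numeric(logLik(glmm))
    if (abs(ll_new - ll_old) < tol) break
    ll_old <- ll_new
  }
  list(tree = tree, glmm = glmm, iterations = iter)
}
```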

There are several different split functions (denoted $h(y_{it} + q_{it})$) which may be used to create the new iteration of the binary target outcome $y_{it}^{*}$. We use a function of $y_{it} + q_{it}$ to update the target outcome in order to account for both the original outcome and the average predicted probability from the CART and Bayesian GLMM models for the specific observation $t$ within cluster $i$. Before introducing the split functions, it is necessary to understand the distribution of $y_{it} + q_{it}$. Since $y_{it}$ is binary, it is either 0 or 1, and $q_{it}$ is a probability between 0 and 1. Therefore, the value of $y_{it} + q_{it}$ is between 0 and 2. We present three options for the split function which may be employed based on the overall goal of the prediction model: the first split function maximizes model sensitivity, the second maximizes model specificity, and the third weights sensitivity and specificity equally when updating the target outcome vector. The split function which maximizes sensitivity uses a threshold $0 < k_1 < 1$ to update the target outcome:

$$h_1(y_{it} + q_{it}) = \begin{cases} 1 & \text{if } y_{it} + q_{it} > k_1 \\ 0 & \text{otherwise.} \end{cases}$$

Thus, using $h_1(y_{it} + q_{it})$, binary outcomes of 0 can be updated to 1, but outcomes of 1 cannot be updated to 0. This provides a mechanism for maximizing sensitivity. Similarly, a split function which maximizes specificity may be employed using a threshold $1 < k_2 < 2$ to update the target outcome:

$$h_2(y_{it} + q_{it}) = \begin{cases} 0 & \text{if } y_{it} + q_{it} < k_2 \\ 1 & \text{otherwise.} \end{cases}$$

Using $h_2(y_{it} + q_{it})$, binary outcomes of 1 can be updated to 0, but outcomes of 0 cannot be updated to 1. This provides a mechanism for maximizing specificity. A final, more general split function which does not favor sensitivity or specificity updates the target outcome as follows:

$$h_3(y_{it} + q_{it}) = \begin{cases} 0 & \text{if } y_{it} + q_{it} < 0.5 \\ 1 & \text{if } y_{it} + q_{it} > 1.5 \\ 1 \text{ with probability } q_{it},\ 0 \text{ otherwise} & \text{if } 0.5 \le y_{it} + q_{it} \le 1.5. \end{cases}$$

Using $h_3(y_{it} + q_{it})$, if the prediction from the current iteration of the BiMM method agrees with the original binary outcome (i.e. if $y_{it} + q_{it} < 0.5$ or $y_{it} + q_{it} > 1.5$), then the target outcome is the same as the original binary outcome. Otherwise, the target outcome is updated to 1 with probability $q_{it}$ and to 0 with probability $1 - q_{it}$. Therefore, original values of 0 can be updated to 1 and original values of 1 can be updated to 0.
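Written out in R, the three split functions are one-liners (a sketch following the definitions above; k1 and k2 are the user-chosen thresholds, and h3 uses a random draw to resolve disagreements):

```r
h1 <- function(y, q, k1 = 0.5) as.numeric(y + q > k1)    # can only flip 0 -> 1
h2 <- function(y, q, k2 = 1.5) as.numeric(y + q >= k2)   # can only flip 1 -> 0
h3 <- function(y, q) {
  s <- y + q
  ifelse(s < 0.5, 0,
         ifelse(s > 1.5, 1,
                rbinom(length(s), 1, q)))  # 1 with probability q, else 0
}
```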

An example of the four possible scenarios of an iteration within the BiMM method is depicted in Table 1, with $k_1 = 0.5$ and $k_2 = 1.5$ for observation $t$ within cluster $i$. Using the split function $h_1(y_{it} + q_{it})$, the original binary outcome $y_{it}$ changes from 0 to 1 in Scenario B, which will increase the sensitivity since the next iteration of the BiMM method will contain more values of 1 within the target outcome. Likewise, using the split function $h_2(y_{it} + q_{it})$, the original binary outcome $y_{it}$ changes from 1 to 0 in Scenario C, which will increase the specificity since the next iteration of the BiMM method will contain more values of 0 within the target outcome. Using $h_3(y_{it} + q_{it})$, the target outcomes are updated in Scenarios B and C based on the strength of the predicted probability from the BiMM iteration. In all split functions, if the original binary outcome agrees with the predicted probability from the BiMM iteration (i.e. in Scenarios A and D), then the target outcome is the original outcome.

Table 1:

Example scenarios for the split functions of the BiMM tree method. The ranges of $q_{it}$ and $y_{it} + q_{it}$ are presented for $y_{it}$ values of 0 and 1, along with the target outcomes resulting from the three split functions.

| Scenario | $y_{it}$ | $q_{it}$ | $y_{it} + q_{it}$ | $y_{it}^{*} = h_1(y_{it}+q_{it})$ | $y_{it}^{*} = h_2(y_{it}+q_{it})$ | $y_{it}^{*} = h_3(y_{it}+q_{it})$ |
|---|---|---|---|---|---|---|
| A | 0 | $0 < q_{it} < 0.5$ | $0 < y_{it}+q_{it} < 0.5$ | 0 | 0 | 0 |
| B | 0 | $0.5 < q_{it} < 1$ | $0.5 < y_{it}+q_{it} < 1$ | 1 | 0 | 1 with probability $q_{it}$, 0 otherwise |
| C | 1 | $0 < q_{it} < 0.5$ | $1 < y_{it}+q_{it} < 1.5$ | 1 | 0 | 1 with probability $q_{it}$, 0 otherwise |
| D | 1 | $0.5 < q_{it} < 1$ | $1.5 < y_{it}+q_{it} < 2$ | 1 | 1 | 1 |


BiMM trees for this study are computed using R software version 3.1.2 (R Development Core Team 2008). CART models are implemented using the R package rpart (Therneau and Atkinson 1997). Default settings are used within the CART models, but we require that the minimum terminal node size be at least 10% of the development dataset so that node indicators within the Bayesian GLMM contain adequate data for fitting fixed effects. Bayesian GLMMs within the BiMM method are implemented using the R package blme (Dorie 2013, Dorie 2014), again with all default settings.

4. Data Description

ALF occurs in approximately 2,000 patients in the United States each year, with about half of the cases attributed to acetaminophen overdose (Lee et al. 2008). A critical goal of the ALF Study Group is to predict the likelihood of poor outcomes of acetaminophen-induced ALF patients, both on hospital admission and post-admission (Speiser, Lee, and Karvellas 2015). The ALF Study Group registry consists of over 2,700 patients with a multitude of clinical data (e.g. laboratory values, treatments, complications, etc.) collected daily for up to seven days following enrollment unless a patient is transplanted, discharged from the hospital, or dies. To date, most prognosis prediction models for ALF patients use variables collected at a single baseline time point (e.g. King's College Criteria and Clichy Criteria (Bernuau 1990, O'Grady et al. 1989)). Many patients may remain alive for longer periods beyond the initial insult because of advances in intensive care unit management (Antoniades et al. 2008, Stravitz et al. 2007). Thus, there is a need for a prediction model which may be used to determine the prognosis of acetaminophen-induced ALF patients (poor or favorable outcome) each day, which can aid clinicians in the management of patients during the first week of hospitalization. We define a poor outcome as having a coma grade of III or IV and a favorable outcome as having a coma grade of 0, I or II.

The ALF registry dataset contains many clinical predictor variables which may be used in modeling outcome. A few fixed predictor variables included within the registry are gender, ethnicity, and age. Some examples of continuous predictor variables collected daily for the first week in the hospital include aspartate aminotransferase (AST), alanine aminotransferase (ALT), creatinine, bilirubin and international normalized ratio (INR). Categorical variables collected daily include treatments and clinical measurements such as mechanical ventilation, pressor use, and renal replacement therapy.

5. Simulation Study Design

To assess the predictive performance of the proposed BiMM tree method, we conduct a simulation study based on the real motivating dataset, the ALF Study Group registry. We simulate data from the ALF registry for several reasons. First, the complexity of the ALF dataset allows comparison of novel and traditional methodologies in realistic settings. Additionally, the ALF dataset contains multiple continuous predictors which are not normally distributed and several categorical variables, so our simulated ALF data provides various types of predictors which are consistent with data arising from many real-world scenarios. A final reason we simulate data based on the real ALF dataset is that a correlation structure between repeated measures on the same person is not imposed, so we can evaluate the performance of the proposed methodology with a real observed correlation structure within the ALF data.

We construct a dataset from which we sample simulation data by selecting all data from acetaminophen-induced ALF patients within the registry (N=1064) and imputing all missing predictor data using an imputation method for multilevel data (Mistler) to preserve the original correlation structure between predictor variables within the dataset. Thus, the simulated datasets contain 1064 patients with complete data for seven days (three fixed predictors and eight longitudinal predictors). We use two data generating processes for the fixed portion of the outcome: a tree structure and a linear structure. For both processes, the variables related to the outcome are INR, creatinine, and ventilator use, which is consistent with clinical literature (Koch et al. 2016, Speiser, Lee, and Karvellas 2015). The other five longitudinal variables and the three fixed predictors are included within the simulation datasets as noise variables. The tree data generating process is depicted within Figure 1, which is read like a CART (i.e. begin at Node 0 and follow the arrow corresponding to the predictor variable values until a terminal node is reached). Nodes 1, 3 and 5 represent favorable outcome for the subject on the specific day, whereas Node 6 represents poor outcome for the subject on the specific day. The equation for the linear data generating process is:

$$\text{logit}(\text{poor outcome}) = 2.3 + 1.4\,\ln(\text{INR}) + 0.6\,\text{Creatinine} + 2.1\,I(\text{Ventilator}),$$

where $I(\text{Ventilator})$ is 1 if the patient was on a ventilator on the specific day and 0 otherwise. Thus, high INR and creatinine and being on a ventilator are associated with a higher likelihood of poor outcomes, consistent with clinical literature (Koch et al. 2016).

Small and large random effects are added to the fixed portion of the outcome to create a within-subject correlation structure. The small random effect is generated for each subject from a normal distribution centered at zero with a standard deviation of 0.1, whereas the large random effect is generated for each subject from a normal distribution centered at zero with a standard deviation of 0.5. To derive the outcome of observations at every time point, the fixed portion (from the tree or linear data generating process) is added to the random effect, and a cut point is used to create the binary outcome.
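A brief sketch of the linear data generating process may make this concrete. The predictor draws and the cut point below are illustrative assumptions; in the study, predictor values are resampled from the imputed ALF data and the exact cut point is not stated in the text:

```r
set.seed(42)
n_subj <- 100; n_rep <- 4
sim <- expand.grid(id = seq_len(n_subj), t = seq_len(n_rep))
sim$INR   <- rlnorm(nrow(sim))              # placeholder skewed predictors
sim$creat <- rlnorm(nrow(sim))
sim$vent  <- rbinom(nrow(sim), 1, 0.3)
b <- rnorm(n_subj, mean = 0, sd = 0.5)      # large RE; sd = 0.1 for the small RE
eta <- 2.3 + 1.4 * log(sim$INR) + 0.6 * sim$creat + 2.1 * sim$vent + b[sim$id]
sim$y <- as.numeric(eta > 0)                # assumed cut point on the logit scale
```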

Using the simulated datasets described in the previous paragraphs based on the ALF registry, we compare the performance of several models. We use $h_1(y_{it} + q_{it})$ as our split function for updating the target outcome with a threshold $k_1$ of 0.5, because clinicians often prefer to develop prediction models maximized for sensitivity to identify patients at highest risk of poor outcomes. All models are fit using all predictors in the data (i.e. both those associated with outcome and those that are noise variables). We produce models for BiMM trees with one iteration (denoted BiMM Tree 1) and with multiple iterations (denoted BiMM Tree H1 and BiMM Tree H3 for the respective split functions $h_1(y_{it} + q_{it})$ and $h_3(y_{it} + q_{it})$) to assess whether iterating between fixed and random effects results in increased prediction accuracy. Models are compiled for 1000 simulation runs. Sample sizes (number of subjects) for training datasets used in model development are 100, 250 and 500. All test datasets consist of 500 new subjects not included within the training dataset. The numbers of repeated measurements of outcomes in our simulation study are 2, 4 and 7. Since the main objective in this study is to develop methodology for predicting new observations, we assess the prediction (test set) accuracy of the models, defined as the number of correct predictions divided by the total number of predictions made.

6. Simulation Study Results

Prediction accuracy is presented within Figure 2 for the sample size of 100. Overall, the BiMM trees with one iteration or more than one iteration have higher accuracy compared to CART and Bayesian GLMM when the random effect is large, regardless of whether the data are generated using a tree or linear structure. When the random effect is small, the accuracy distributions overlap, with the CART models generally having slightly higher accuracy compared to the BiMM trees. With a linear data generating process and small random effect, the CART and Bayesian GLMM have similar predictive accuracy, whereas with a tree data generating process and small random effect, the Bayesian GLMM has the lowest prediction accuracy. The Bayesian GLMM also has the lowest prediction accuracy for the tree data generating process with a large random effect. The BiMM tree models with one iteration and with multiple iterations generally have similar predictive accuracies for each of the scenarios. Similar results were obtained for the sample sizes of 250 and 500 (Supplementary Figures 1 and 2).


Figure 2:

The simulated prediction (test set) accuracy of models for N=100 patients, for small and large random effects, for linear and tree data generating processes, and for 2, 4 and 7 repeated measurements per patient. Traditional CART, Bayesian GLMM, BiMM Tree with one iteration (denoted BiMM Tree 1) and BiMM Tree with multiple iterations for the split function maximizing sensitivity and the general split function (denoted BiMM Tree H1 and BiMM Tree H3) are compared.

Most BiMM iterative trees converge in two iterations regardless of the split function, and in rare cases convergence is reached in three or four iterations. Table 2 contains the median (interquartile range) estimates of prediction accuracy on the test dataset for each simulation scenario for CART, Bayesian GLMM, BiMM tree with one iteration, and the BiMM tree algorithm with more than one iteration. Interquartile ranges of prediction accuracy for the models in the different scenarios are relatively tight around the median estimates, indicating that the distribution of prediction accuracy for models does not vary greatly over the simulation runs. Across 2, 4 and 7 repeated measurements for the models and scenarios, prediction accuracy is similar, except for the tree data generating process with a large random effect, where slight gains in accuracy are achieved with an increasing number of repeated measurements. In general, the prediction accuracy estimates are similar for sample sizes of 100, 250 and 500, with slight improvements in accuracy for BiMM models with larger sample sizes.

Table 2:

The simulated prediction (test set) median accuracy and interquartile range of models for N=100, 250 and 500 patients, for small and large random effects (RE), for linear and tree data generating processes (DGP), and for 2, 4 and 7 repeated measurements per patient. Traditional CART, Bayesian GLMM, BiMM Tree with one iteration (denoted BiMM Tree 1) and BiMM Tree with multiple iterations for the split function maximizing sensitivity and the general split function (denoted BiMM Tree H1 and BiMM Tree H3) are compared.

N = 100:

| Model | Repeated outcomes | Linear DGP, small RE | Linear DGP, large RE | Tree DGP, small RE | Tree DGP, large RE |
|---|---|---|---|---|---|
| CART | 2 | 0.913 (0.890, 0.925) | 0.678 (0.651, 0.699) | 0.963 (0.950, 0.971) | 0.769 (0.745, 0.791) |
| CART | 4 | 0.933 (0.922, 0.940) | 0.679 (0.659, 0.699) | 0.973 (0.966, 0.978) | 0.805 (0.788, 0.821) |
| CART | 7 | 0.941 (0.935, 0.947) | 0.697 (0.680, 0.714) | 0.977 (0.972, 0.981) | 0.828 (0.814, 0.842) |
| Bayesian GLMM | 2 | 0.924 (0.914, 0.933) | 0.715 (0.700, 0.730) | 0.820 (0.805, 0.835) | 0.709 (0.691, 0.723) |
| Bayesian GLMM | 4 | 0.922 (0.915, 0.928) | 0.704 (0.689, 0.716) | 0.805 (0.795, 0.814) | 0.728 (0.717, 0.738) |
| Bayesian GLMM | 7 | 0.917 (0.912, 0.921) | 0.695 (0.682, 0.707) | 0.805 (0.796, 0.812) | 0.750 (0.741, 0.760) |
| BiMM Tree 1 | 2 | 0.849 (0.833, 0.868) | 0.827 (0.776, 0.852) | 0.916 (0.897, 0.927) | 0.836 (0.782, 0.902) |
| BiMM Tree 1 | 4 | 0.845 (0.822, 0.860) | 0.815 (0.780, 0.849) | 0.917 (0.881, 0.952) | 0.901 (0.854, 0.956) |
| BiMM Tree 1 | 7 | 0.881 (0.869, 0.891) | 0.837 (0.806, 0.864) | 0.902 (0.892, 0.913) | 0.905 (0.862, 0.920) |
| BiMM Tree H1 | 2 | 0.852 (0.832, 0.876) | 0.840 (0.813, 0.854) | 0.910 (0.842, 0.924) | 0.809 (0.757, 0.870) |
| BiMM Tree H1 | 4 | 0.844 (0.823, 0.857) | 0.806 (0.790, 0.840) | 0.882 (0.868, 0.925) | 0.871 (0.817, 0.934) |
| BiMM Tree H1 | 7 | 0.879 (0.862, 0.891) | 0.835 (0.814, 0.847) | 0.899 (0.890, 0.911) | 0.891 (0.801, 0.910) |
| BiMM Tree H3 | 2 | 0.849 (0.834, 0.867) | 0.838 (0.804, 0.852) | 0.913 (0.850, 0.924) | 0.834 (0.775, 0.909) |
| BiMM Tree H3 | 4 | 0.843 (0.822, 0.857) | 0.803 (0.788, 0.833) | 0.881 (0.868, 0.938) | 0.874 (0.848, 0.943) |
| BiMM Tree H3 | 7 | 0.879 (0.852, 0.891) | 0.833 (0.805, 0.845) | 0.895 (0.887, 0.905) | 0.891 (0.805, 0.909) |

N = 250:

| Model | Repeated outcomes | Linear DGP, small RE | Linear DGP, large RE | Tree DGP, small RE | Tree DGP, large RE |
|---|---|---|---|---|---|
| CART | 2 | 0.929 (0.920, 0.937) | 0.706 (0.690, 0.722) | 0.970 (0.965, 0.975) | 0.796 (0.780, 0.812) |
| CART | 4 | 0.942 (0.937, 0.947) | 0.721 (0.707, 0.733) | 0.975 (0.970, 0.979) | 0.830 (0.818, 0.843) |
| CART | 7 | 0.946 (0.942, 0.950) | 0.732 (0.720, 0.743) | 0.979 (0.975, 0.982) | 0.853 (0.840, 0.863) |
| Bayesian GLMM | 2 | 0.939 (0.933, 0.944) | 0.731 (0.718, 0.745) | 0.836 (0.826, 0.847) | 0.722 (0.708, 0.733) |
| Bayesian GLMM | 4 | 0.930 (0.926, 0.934) | 0.717 (0.704, 0.729) | 0.812 (0.804, 0.820) | 0.735 (0.724, 0.744) |
| Bayesian GLMM | 7 | 0.921 (0.917, 0.925) | 0.707 (0.696, 0.717) | 0.808 (0.801, 0.814) | 0.755 (0.746, 0.763) |
| BiMM Tree 1 | 2 | 0.850 (0.838, 0.866) | 0.850 (0.830, 0.873) | 0.921 (0.914, 0.929) | 0.911 (0.880, 0.923) |
| BiMM Tree 1 | 4 | 0.850 (0.835, 0.862) | 0.837 (0.807, 0.862) | 0.942 (0.888, 0.959) | 0.947 (0.894, 0.964) |
| BiMM Tree 1 | 7 | 0.887 (0.878, 0.894) | 0.860 (0.837, 0.889) | 0.905 (0.895, 0.914) | 0.907 (0.864, 0.914) |
| BiMM Tree H1 | 2 | 0.856 (0.842, 0.874) | 0.847 (0.836, 0.861) | 0.918 (0.904, 0.925) | 0.858 (0.819, 0.916) |
| BiMM Tree H1 | 4 | 0.847 (0.830, 0.860) | 0.820 (0.800, 0.850) | 0.887 (0.873, 0.947) | 0.882 (0.855, 0.952) |
| BiMM Tree H1 | 7 | 0.887 (0.876, 0.894) | 0.842 (0.832, 0.852) | 0.905 (0.894, 0.913) | 0.897 (0.853, 0.909) |
| BiMM Tree H3 | 2 | 0.850 (0.839, 0.865) | 0.846 (0.835, 0.857) | 0.920 (0.912, 0.928) | 0.908 (0.830, 0.923) |
| BiMM Tree H3 | 4 | 0.848 (0.832, 0.861) | 0.807 (0.793, 0.842) | 0.881 (0.871, 0.949) | 0.879 (0.860, 0.953) |
| BiMM Tree H3 | 7 | 0.886 (0.877, 0.894) | 0.840 (0.831, 0.849) | 0.899 (0.890, 0.909) | 0.895 (0.785, 0.907) |

N = 500:

| Model | Repeated outcomes | Linear DGP, small RE | Linear DGP, large RE | Tree DGP, small RE | Tree DGP, large RE |
|---|---|---|---|---|---|
| CART | 2 | 0.942 (0.936, 0.948) | 0.732 (0.718, 0.745) | 0.971 (0.965, 0.975) | 0.821 (0.806, 0.832) |
| CART | 4 | 0.945 (0.941, 0.950) | 0.735 (0.724, 0.745) | 0.975 (0.971, 0.980) | 0.844 (0.833, 0.853) |
| CART | 7 | 0.947 (0.944, 0.951) | 0.740 (0.729, 0.750) | 0.979 (0.975, 0.982) | 0.859 (0.849, 0.867) |
| Bayesian GLMM | 2 | 0.943 (0.938, 0.948) | 0.737 (0.723, 0.748) | 0.842 (0.833, 0.851) | 0.725 (0.714, 0.737) |
| Bayesian GLMM | 4 | 0.932 (0.928, 0.936) | 0.721 (0.710, 0.732) | 0.814 (0.808, 0.821) | 0.736 (0.727, 0.746) |
| Bayesian GLMM | 7 | 0.923 (0.919, 0.926) | 0.710 (0.699, 0.721) | 0.809 (0.803, 0.814) | 0.756 (0.748, 0.765) |
| BiMM Tree 1 | 2 | 0.850 (0.840, 0.863) | 0.856 (0.840, 0.886) | 0.920 (0.915, 0.926) | 0.917 (0.907, 0.925) |
| BiMM Tree 1 | 4 | 0.849 (0.838, 0.862) | 0.841 (0.813, 0.865) | 0.952 (0.891, 0.960) | 0.953 (0.900, 0.963) |
| BiMM Tree 1 | 7 | 0.890 (0.884, 0.895) | 0.867 (0.843, 0.890) | 0.905 (0.897, 0.913) | 0.906 (0.858, 0.913) |
| BiMM Tree H1 | 2 | 0.859 (0.845, 0.874) | 0.851 (0.841, 0.866) | 0.918 (0.909, 0.924) | 0.852 (0.822, 0.917) |
| BiMM Tree H1 | 4 | 0.847 (0.832, 0.860) | 0.816 (0.796, 0.852) | 0.886 (0.874, 0.901) | 0.878 (0.851, 0.904) |
| BiMM Tree H1 | 7 | 0.890 (0.884, 0.895) | 0.843 (0.836, 0.851) | 0.905 (0.896, 0.912) | 0.896 (0.725, 0.907) |
| BiMM Tree H3 | 2 | 0.850 (0.840, 0.863) | 0.848 (0.838, 0.859) | 0.920 (0.914, 0.926) | 0.915 (0.852, 0.923) |
| BiMM Tree H3 | 4 | 0.848 (0.836, 0.861) | 0.805 (0.793, 0.839) | 0.879 (0.869, 0.891) | 0.877 (0.862, 0.894) |
| BiMM Tree H3 | 7 | 0.890 (0.884, 0.895) | 0.842 (0.835, 0.849) | 0.890 (0.891, 0.910) | 0.895 (0.776, 0.904) |


In addition to assessing the predictive accuracy of models, we present the difference between training and test accuracy for models in the simulated scenarios to measure the amount of overfitting for the sample size of 100 (Figure 3). Within this plot, large values of the difference indicate that the accuracy on the training dataset is larger than the accuracy on the test dataset. For small random effects, CARTs, Bayesian GLMMs, BiMM trees with one iteration, and BiMM trees updated with $h_3(y_{it} + q_{it})$ have minimal overfitting, since the difference between training and test set accuracy is small. However, for small random effects, BiMM trees with multiple iterations updated with $h_1(y_{it} + q_{it})$ have larger differences in accuracy, suggesting the models may have overfit the training data. When random effects are large, the CART models tend to overfit the training data the most for both data generating processes. For the tree data generating process with a large random effect, the Bayesian GLMM overfits the data more than the BiMM trees, but only slightly. Regardless of the data generating process, the BiMM trees overfit the training data the least. As the number of repeated measurements increases, model overfitting slightly decreases for datasets with large random effects, whereas model overfitting remains similar for datasets with small random effects. The performance of each model in terms of overfitting is similar for the sample sizes of 250 and 500; however, the amount of overfitting is slightly less with the larger sample sizes for datasets with large random effects (Supplementary Figures 3 and 4).


Figure 3:

The simulated difference in training and test set accuracy of models for N=100 patients, for small and large random effects, for linear and tree data generating processes, and for 2, 4 and 7 repeated measurements per patient. Traditional CART, Bayesian GLMM, BiMM Tree with one iteration (denoted BiMM Tree 1) and BiMM Tree with multiple iterations for the split function maximizing sensitivity and the general split function (denoted BiMM Tree H1 and BiMM Tree H3) are compared.

7. Data Application

To illustrate the use of the BiMM tree method, we develop prediction models for acetaminophen-induced ALF patients enrolled in the ALF registry from January 1998 to February 2016 (N=1082). The primary endpoint is poor versus favorable outcome, defined by a daily measurement of coma grade. High coma grade (III or IV) represents poor outcome on a particular day for a patient, whereas low coma grade (0, I or II) represents favorable outcome on a particular day for a patient. We define sensitivity as the proportion of correct predictions for patients within the poor outcome group and specificity as the proportion of correct predictions for patients within the favorable outcome group.

We include three fixed predictors as possible variables for the models: sex, ethnicity and age. Several continuous laboratory predictors which are repeatedly collected for up to seven days while patients are in the hospital are included as well: ALT, AST, bilirubin, creatinine, phosphate, lactate, pH, platelets, ammonia and INR. Many categorical (binary yes/no) variables collected daily about treatments and complications experienced by patients are considered as possible predictors. Some patients have missing predictor data, and we use the default CART method of surrogate splits to handle this within the BiMM framework. Surrogate splitting uses non-missing predictor variables for patients who have missing variables to run down the CART model so that predictions can be made regardless of missing values (Breiman et al. 1984). We randomly split the dataset into a training (N=541 subjects) and test (N=541 subjects) dataset so that we can assess the predictive accuracy of the BiMM tree. Patients within both the training and test datasets have on average four days of data collected.

The BiMM tree with multiple iterations using $h_1(y_{it} + q_{it})$ as the split function for predicting poor versus favorable outcomes for acetaminophen-induced ALF patients is illustrated in Figure 4. We use this split function because it is desirable to develop a tree prediction model which maximizes sensitivity. The model includes four clinical predictors of outcome: pressor use (binary, yes/no), bilirubin, creatinine, and pH. The BiMM tree with one iteration contains these variables in addition to sex. Patients requiring pressors or with high values of bilirubin, creatinine and pH have a higher likelihood of poor daily outcomes.


Figure 4:

The BiMM Tree algorithm with multiple iterations using the split function that maximizes sensitivity to predict daily prognosis of ALF patients. The decision tree uses four clinical predictors of outcome and contains five terminal nodes: Nodes 3, 5 and 7 represent good outcomes and Nodes 1 and 8 represent poor outcomes.

BiMM tree performance statistics and 95% binomial confidence intervals are presented within Table 3. Exact binomial confidence intervals are calculated using the base R function binom.test(). BiMM tree models with one iteration and with split function $h_1(y_{it} + q_{it})$ have similar training dataset accuracies of about 0.9, sensitivities of 1, and specificities of about 0.8, which indicates overall good model predictions for the training dataset. However, the accuracies on the test dataset for all BiMM trees are lower (between 0.63 and 0.65). The test dataset sensitivity of the BiMM tree with one iteration (0.67) is lower than that of the BiMM tree with multiple iterations with $h_1(y_{it} + q_{it})$ (0.81), indicating that the split function maximizes sensitivity as expected. The BiMM tree with one iteration has a slightly higher specificity compared to the BiMM tree with multiple iterations with $h_1(y_{it} + q_{it})$ on the test dataset. The posterior log likelihood of the BiMM tree model with one iteration is smaller than that of the BiMM tree models with multiple iterations. This indicates that the BiMM tree produced by multiple iterations provides a better fit compared to the tree with one iteration.
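For instance, each interval in Table 3 can be reproduced from its underlying counts with binom.test(); the counts below are hypothetical, back-calculated to match a test accuracy near 0.645 on 2208 observations:

```r
# Exact 95% binomial confidence interval for a proportion (hypothetical counts)
binom.test(x = 1424, n = 2208)$conf.int  # roughly (0.625, 0.665)
```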

Table 3:

Accuracy, sensitivity and specificity of models applied to the ALF dataset for predicting outcome, with 95% confidence intervals. Traditional CART, Bayesian GLMM, BiMM Tree with one iteration (denoted BiMM Tree 1) and BiMM Tree with multiple iterations for the split function maximizing sensitivity and the general split function (denoted BiMM Tree H1 and BiMM Tree H3) are compared.

Training data:

| Method | # Obs (M) | Run time (seconds) | Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|---|
| CART | 2253 | 0.109 | 0.702 (0.683, 0.721) | 0.683 (0.656, 0.710) | 0.723 (0.695, 0.749) |
| Bayesian GLMM | 78 | 10.057 | 0.859 (0.762, 0.927) | 0.881 (0.744, 0.960) | 0.833 (0.672, 0.936) |
| BiMM Tree 1 Iteration | 2253 | 1.338 | 0.893 (0.880, 0.905) | 1.000 (0.996, 1.000) | 0.818 (0.797, 0.839) |
| BiMM Tree H1 Algorithm | 2253 | 1.361 | 0.885 (0.871, 0.898) | 1.000 (0.996, 1.000) | 0.807 (0.785, 0.828) |
| BiMM Tree H3 Algorithm | 2253 | 2.711 | 0.666 (0.646, 0.686) | 0.684 (0.656, 0.712) | 0.650 (0.621, 0.677) |

Test data:

| Method | # Obs (M) | Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|
| CART | 2208 | 0.639 (0.619, 0.660) | 0.613 (0.584, 0.641) | 0.669 (0.640, 0.698) |
| Bayesian GLMM | 101 | 0.554 (0.452, 0.653) | 0.629 (0.449, 0.785) | 0.515 (0.389, 0.640) |
| BiMM Tree 1 Iteration | 2208 | 0.645 (0.625, 0.665) | 0.667 (0.638, 0.694) | 0.623 (0.593, 0.652) |
| BiMM Tree H1 Algorithm | 2208 | 0.649 (0.629, 0.669) | 0.812 (0.779, 0.842) | 0.585 (0.560, 0.609) |
| BiMM Tree H3 Algorithm | 2208 | 0.627 (0.607, 0.647) | 0.669 (0.640, 0.698) | 0.590 (0.561, 0.618) |


A frequentist GLMM implemented using the lme4 R package produces error messages and warnings for the ALF data application; therefore, its results are omitted. Because the Bayesian GLMM method requires complete datasets with no predictor values missing, a substantial number of observations could not be included for model development (training data) or model predictions (test data). When all variables are included in the GLMM, only 78 (3.5%) observations in the training dataset and 101 (4.6%) observations in the test dataset could be used (i.e. had no missing values). Overall, methods which included a clustering variable (i.e. Bayesian GLMM and BiMM trees) have higher training dataset accuracy compared to CART. Test dataset accuracy is similar for all methods except the Bayesian GLMM, which has substantially lower prediction accuracy. CART and BiMM tree with one iteration offer similar values of sensitivity and specificity, whereas the BiMM tree algorithm with multiple iterations with $h_1(y_{it} + q_{it})$ has higher sensitivity than specificity, as desired.

Also included in Table 3 are the computational run times for developing the models using the training dataset. In general, computation time will vary based on the size of the dataset and the number of clusters. For the ALF data application, CART has the quickest run time, followed by the BiMM tree methods (around 1–3 seconds). The Bayesian GLMM has the longest run time, around 10 seconds.

8. Discussion

Overall, the BiMM tree framework may offer advantages compared to CARTs and Bayesian GLMMs. The main benefit of BiMM compared to CART is that it can account for clustered outcomes in modeling so that the assumption of independent observations is not violated. BiMM trees do not require specification of nonlinear relationships or interaction terms and can be implemented for high dimensional datasets. A strength of BiMM tree is that nonlinear forms of predictors and interactions between predictors are learned by the method from the data. The computation time of GLMMs may be higher compared to BiMM trees, yet BiMM trees may potentially offer higher prediction accuracy in certain situations: for example, when the underlying structure of the data contains interactions between predictors and nonlinear predictors of outcome, or when there is a large clustered effect for the outcomes. A final potential strength of BiMM tree methodology is that missing values in predictor data are naturally handled using surrogate splits (using other non-missing variables) within the CART portion of the algorithm; thus, observations with missing predictor data can still be included within BiMM models. GLMMs only use complete cases within datasets, so missing values would need to be imputed (filled in) within the GLMM setting in order to use the entire dataset. The BiMM tree method does not require missing data to be imputed prior to model development.

A distinction of the BiMM tree framework compared to other decision tree methods for longitudinal and clustered outcomes within the literature (Abdolell et al. 2002, De'Ath 2002, Dine, Larocque, and Bellavance 2009, Hajjem, Bellavance, and Larocque 2011, Keon Lee 2005, Larocque 2010, Loh and Zheng 2013, Segal 1992, Sela and Simonoff 2012, Yu and Lambert 1999) is the Bayesian implementation of GLMMs. For continuous outcomes, there are fewer issues with GLMM convergence because estimates may be computed directly; however, with categorical outcomes, complete or quasi-separation may pose a challenge to GLMM fitting. The default priors specified in the BiMM tree method are uninformative, but if convergence issues arise, weakly informative priors may be used for estimating the random effects (Gelman et al. 2008).

BiMM tree provides a flexible, data-driven predictive modeling framework for longitudinal and clustered binary outcomes. Our simulation study demonstrates that BiMM tree may be advantageous compared to CART, which ignores clustered outcomes, and to Bayesian GLMM when predictors are not linearly related to the outcome through the link function and when the random effect of the clustered variable is large. Though standard CART models can have high predictive accuracy if random effects are small, failing to account for large clustering effects causes a sizeable decrease in prediction accuracy in our simulations. While Bayesian GLMM can be used to adjust for clustering within the data, model misspecification may reduce prediction accuracy (e.g. not including a significant interaction term or specifying an incorrect nonlinear relationship between predictor and outcome). This is evident in our simulation study, where BiMM tree models have higher prediction accuracy compared to Bayesian GLMMs if the data have a tree structure or if there is a large clustering effect between outcomes. One possible reason that the Bayesian GLMMs did not perform well for simulated data in this study is that some of the continuous predictor variables have skewed distributions, and extreme values may have adversely affected parameter estimates.

The BiMM trees with one iteration generally have prediction accuracy similar to BiMM trees with more than one iteration within our simulation study. While the training dataset accuracies for the BiMM trees with more than one iteration are higher than those of the BiMM trees with only one iteration, the multiple-iteration method with the split function which maximizes sensitivity produces overfitted models which do not predict well on test datasets if the effect of clustering within subjects is small. BiMM trees which iterate between fixed and random effects have slightly higher computation time and offer minimal increases in prediction accuracy, suggesting that BiMM trees with one iteration may be sufficient. It is possible, though, that real-world datasets are more complex than our simulated datasets, so multiple iterations may be necessary in some situations. One may assess this by compiling BiMM tree models with one iteration and with multiple iterations and comparing the posterior log likelihoods.

Another interesting result from the simulation study is that the prediction accuracy of models remained similar whether models were developed using 2, 4 or 7 repeated measurements. We expected to see increases in prediction accuracy with increases in the number of repeated measurements. However, the simulated dataset for our study is created based on the real ALF Study Group registry, so this result may be because the clustering effect for repeated outcomes does not change whether 2, 4 or 7 measurements are included. Though a simulated dataset could have been constructed to induce a specific correlation structure for repeated observations (e.g. an autoregressive structure), we wanted the data simulation to resemble our motivating dataset as closely as possible. Our simulated dataset based on the real ALF registry also allows us to assess how the models perform when certain aspects of the data make modeling challenging (e.g. collinear predictors, predictors with skewed distributions and extreme values, and complex interactions between predictors). A future study could assess the performance of BiMM tree methodology in more complex simulated scenarios, such as a high dimensional dataset or a dataset containing nonlinear predictors and high-order interactions.

An application of the novel BiMM tree methodology using the split function which maximizes sensitivity provides a prediction model for acetaminophen-induced ALF patients for daily use during the first week of hospitalization. The model offers predictive (test set) accuracy of 65% and provides clinicians with a simple, interpretable model which can easily be used in practice. Compared to other prediction models in the literature, the BiMM tree uses similar predictors (e.g. King's College Criteria also uses pH and creatinine, and the model for end-stage liver disease uses bilirubin and creatinine (Bernuau 1990, Wiesner et al. 2003)).

The main objective of this study is to develop a flexible framework for constructing prediction models for binary outcomes. BiMM tree methodology may offer comparable or slightly higher prediction accuracy relative to other models and may be considered an alternative to GLMM for complex datasets. Future work could investigate the use of alternative decision tree algorithms within the BiMM tree framework for modeling longitudinal and clustered binary outcomes (e.g. C4.5, GUIDE, QUEST, CRUISE, BART and bartMachine (Chipman, George, and McCulloch 2010, Kapelner and Bleich 2013, Loh 2014)).

An R package implementing BiMM tree methodology is being developed and will be made available on the Comprehensive R Archive Network. An R program implementing BiMM tree methodology is available within Supplemental File 1.

Supplementary Material

Supplemental Figure 1: The simulated prediction (test set) accuracy of models for N=250 patients, for small and large random effects, for linear and tree data generating processes, and for 2, 4 and 7 repeated measurements per patient. Traditional CART, Bayesian GLMM, BiMM Tree with one iteration (denoted BiMM Tree 1) and BiMM Tree with multiple iterations for the split function maximizing sensitivity and the general split function (denoted BiMM Tree H1 and BiMM Tree H3) are compared.

Supplemental Figure 2: The simulated prediction (test set) accuracy of models for N=500 patients, for small and large random effects, for linear and tree data generating processes, and for 2, 4 and 7 repeated measurements per patient. Traditional CART, Bayesian GLMM, BiMM Tree with one iteration (denoted BiMM Tree 1) and BiMM Tree with multiple iterations for the split function maximizing sensitivity and the general split function (denoted BiMM Tree H1 and BiMM Tree H3) are compared.

Supplemental Figure 3: The simulated difference in training and test set accuracy of models for N=250 patients, for small and large random effects, for linear and tree data generating processes, and for 2, 4 and 7 repeated measurements per patient. Traditional CART, Bayesian GLMM, BiMM Tree with one iteration (denoted BiMM Tree 1) and BiMM Tree with multiple iterations for the split function maximizing sensitivity and the general split function (denoted BiMM Tree H1 and BiMM Tree H3) are compared.

Supplemental Figure 4: The simulated difference in training and test set accuracy of models for N=500 patients, for small and large random effects, for linear and tree data generating processes, and for 2, 4 and 7 repeated measurements per patient. Traditional CART, Bayesian GLMM, BiMM Tree with one iteration (denoted BiMM Tree 1) and BiMM Tree with multiple iterations for the split function maximizing sensitivity and the general split function (denoted BiMM Tree H1 and BiMM Tree H3) are compared.

Supplemental Figure 5: The BiMM Tree with one iteration to predict daily prognosis of ALF patients. The decision tree has identical splits and nodes as the BiMM Tree with multiple iterations using the split function maximizing sensitivity, but also includes Nodes 9 and 10 with a split for gender.

9. Acknowledgements

This study was funded by the National Institute of Diabetes and Digestive and Kidney Diseases (DK U01–58369). This work was partially supported by the South Carolina Clinical and Translational Research Institute NIH/NCATS Grants (UL1TR001450 and TL1TR001451), NIH/NIAMS Grant (P60 AR062755), NIH/NIGMS Grant (R01 GM122078), and NIH/NCI Grant (R21 CA209848). The authors would like to thank Dr. Paul Nietert for his critical review of this manuscript.

Footnotes

Declaration of Conflicting Interests

The authors declare that there is no conflict of interest.

References

  • Abdolell M, LeBlanc M, Stephens D, and Harrison RV. 2002. "Binary partitioning for continuous longitudinal data: categorizing a prognostic variable." Statistics in Medicine 21(22):3395–3409.
  • Antoniades CG, Berry PA, Wendon JA, and Vergani D. 2008. "The importance of immune dysfunction in determining outcome in acute liver failure." Journal of Hepatology 49(5):845–61. doi:10.1016/j.jhep.2008.08.009.
  • Bates D. 2009. "Online response to convergence issues in CRAN R lme4 package."
  • Bates D, Maechler M, Bolker B, and Walker S. 2015. "Package 'lme4'."
  • Bernuau J. 1990. "[Fulminant and subfulminant viral hepatitis]." La Revue du Praticien 40(18):1652–5.
  • Breiman L, Friedman JH, Olshen RA, and Stone CJ. 1984. Classification and Regression Trees. Monterey, CA, USA: Wadsworth and Brooks.
  • Chipman HA, George EI, and McCulloch RE. 2010. "BART: Bayesian additive regression trees." The Annals of Applied Statistics: 266–298.
  • De'Ath G. 2002. "Multivariate regression trees: a new technique for modeling species–environment relationships." Ecology 83(4):1105–1117.
  • Dine A, Larocque D, and Bellavance F. 2009. "Multivariate trees for mixed outcomes." Computational Statistics & Data Analysis 53(11):3795–3804.
  • Dorie V. 2013. blme: Bayesian Linear Mixed-Effects Models. R package.
  • Dorie V. 2014. "Mixed Methods for Mixed Models." Columbia University.
  • Fong Y, Rue H, and Wakefield J. 2010. "Bayesian inference for generalized linear mixed models." Biostatistics 11(3):397–412. doi:10.1093/biostatistics/kxp053.
  • Gelman A, Jakulin A, Pittau MG, and Su Y-S. 2008. "A weakly informative default prior distribution for logistic and other regression models." The Annals of Applied Statistics: 1360–1383.
  • Hajjem A, Bellavance F, and Larocque D. 2011. "Mixed effects regression trees for clustered data." Statistics & Probability Letters 81(4):451–459.
  • Hastie T, Tibshirani R, and Friedman J. 2001. The Elements of Statistical Learning. 2nd ed. New York: Springer.
  • Hothorn T, Hornik K, and Zeileis A. 2011. "party: A Laboratory for Recursive Part(y)itioning. R package version 0.9–9999." URL: http://cran.r-project.org/package=party (1 December 2010, date last accessed).
  • Kapelner A, and Bleich J. 2013. "bartMachine: Machine Learning with Bayesian Additive Regression Trees." arXiv preprint arXiv:1312.2171.
  • Koch DG, Tillman H, Durkalski V, Lee WM, and Reuben A. 2016. "Development of a model to predict transplant-free survival of patients with acute liver failure." Clinical Gastroenterology and Hepatology.
  • Larocque D. 2010. "Mixed effects random forest for clustered data" (with A. Hajjem and F. Bellavance).
  • Lee, Keon Seong. 2005. "On generalized multivariate decision tree by using GEE." Computational Statistics & Data Analysis 49(4):1105–1119.
  • Lee WM, Squires RH, Nyberg SL, Doo E, and Hoofnagle JH. 2008. "Acute liver failure: summary of a workshop." Hepatology 47(4):1401–1415.
  • Loh W-Y, and Zheng W. 2013. "Regression trees for longitudinal and multiresponse data." The Annals of Applied Statistics 7(1):495–522.
  • Loh W-Y. 2014. "Fifty years of classification and regression trees." International Statistical Review.
  • Mistler SA. "A SAS macro for applying multiple imputation to multilevel data."
  • O'Grady JG, Alexander GJ, Hayllar KM, and Williams R. 1989. "Early indicators of prognosis in fulminant hepatic failure." Gastroenterology 97(2):439–45.
  • Segal MR. 1992. "Tree-structured methods for longitudinal data." Journal of the American Statistical Association 87(418):407–418.
  • Sela RJ, and Simonoff JS. 2012. "RE-EM trees: a data mining approach for longitudinal and clustered data." Machine Learning 86(2):169–207.
  • Speiser JL, Lee WM, and Karvellas CJ. 2015. PLoS ONE.
  • Stravitz RT, Kramer AH, Davern T, Shaikh AO, Caldwell SH, Mehta RL, Blei AT, Fontana RJ, McGuire BM, Rossaro L, Smith AD, and Lee WM. 2007. "Intensive care of patients with acute liver failure: recommendations of the U.S. Acute Liver Failure Study Group." Critical Care Medicine 35(11):2498–508. doi:10.1097/01.CCM.0000287592.94554.5F.
  • R Development Core Team. 2008. "R: a language and environment for statistical computing." Vienna, Austria.
  • Therneau TM, and Atkinson EJ. 1997. An Introduction to Recursive Partitioning Using the RPART Routines. Mayo Foundation.
  • Wiesner R, Edwards E, Freeman R, Harper A, Kim R, Kamath P, Kremers W, Lake J, Howard T, and Merion RM. 2003. "Model for end-stage liver disease (MELD) and allocation of donor livers." Gastroenterology 124(1):91–96.
  • Wu H, and Zhang J-T. 2006. Nonparametric Regression Methods for Longitudinal Data Analysis: Mixed-Effects Modeling Approaches. Vol. 515. John Wiley & Sons.
  • Yu Y, and Lambert D. 1999. "Fitting trees to functional data, with an application to time-of-day patterns." Journal of Computational and Graphical Statistics 8(4):749–762.
  • Zorn C. 2005. "A solution to separation in binary response models." Political Analysis 13(2):157–170.