On the estimation of misclassification probabilities of chronic kidney disease using continuous time hidden Markov models

Gurprit Grover1, Alka Sabharwal2, Shrawan Kumar2*, Arpan Kumar Thakur1


Introduction
Chronic kidney disease (CKD) is defined as the presence of kidney damage along with a reduction in the level of kidney function. According to National Kidney Foundation (NKF) recommendations, and based on estimated glomerular filtration rate (eGFR) classification intervals, CKD is classified into five stages. The worldwide prevalence of CKD varies between 10.5% and 13.1% (1). In Western countries, two-thirds of CKD cases are due to diabetes and hypertension (2). The prevalence of CKD in India is approximately 800 per million population (3).
The progression of CKD can be represented by a Markov process that permits only forward transitions from one state to another over time (4). The progression of the disease is continuous in time, and the times of transition are random. A homogeneous continuous-time multistate model based on Markov processes is therefore a suitable model for describing the course of progression of CKD (5). In a hidden Markov model (HMM), it is assumed that the true stages of disease are hidden (unobservable or latent). The actual stage of disease can only be determined indirectly with the help of a disease marker. It is assumed that the generated observations are conditionally independent given the true stages. HMMs have been extensively applied in speech and signal processing (6). Chen et al proposed an HMM for breast cancer screening (7). Kirby et al described an HMM for cervical smear tests (8). Satten and Longini (9) and Guihenneuc-Jouyaux et al (10) used HMMs to describe the progression of HIV infection through different stages on the basis of CD4 cell counts. Jackson et al used an HMM to represent the decline in lung function after lung transplantation (11).
In the current study, a homogeneous continuous-time HMM has been used to study the progression of CKD. We have estimated the transition rates between the various states and the probabilities of misclassification between the true and observed stages of the disease. Using these parameters, sojourn times of states and conditional probabilities are also computed. The mean sojourn time of a state is the average length of time a CKD patient spends in that stage (characterized by a state of the Markov chain) before moving to the next stage.
We have also investigated the impact of prognostic factors such as diabetes, hypertension and age on the transition rates and misclassification probabilities. The odds ratios of misclassification probabilities have been computed in the presence of covariates (prognostic factors). To the best of our knowledge, an HMM has not previously been applied to study the course of progression of CKD. Section 1 includes the introduction and the objectives of the study. Section 2 deals with the materials and methods, results are presented in section 3, and discussion and conclusions follow in sections 4 and 5 respectively.

Objectives
This study aimed to ascertain whether the observed stages based on eGFR match the true stages of CKD. In addition, we aimed to determine the most likely misclassified stage using an appropriate model and its impact on the progression of the disease.

Study population
The present study uses retrospective data on 117 patients with CKD, covering the period March 2006 to October 2016. Registered laboratories and hospitals were approached for the records of CKD patients. We contacted 550 patients; however, more than 50% were reluctant to share their data, and only 248 patients responded positively. Informed consent was taken from all of them. After scrutiny, the data of 117 patients were found to be suitable for the present study. Information on gender, age, diabetes, hypertension, body mass index, hemoglobin level, urea, serum creatinine and albumin was recorded for each patient. The data are the observed stages of CKD based on the value of eGFR, with individual-specific values of the covariates.

The five stages of CKD based on the value of GFR are: stage 1 (GFR ≥ 90 mL/min/1.73 m²); stage 2 (60 ≤ GFR ≤ 89 mL/min/1.73 m²); stage 3 (30 ≤ GFR ≤ 59 mL/min/1.73 m²); stage 4 (15 ≤ GFR ≤ 29 mL/min/1.73 m²) and stage 5 (GFR < 15 mL/min/1.73 m²). Stages 1, 2, 3 and 4 are transient states; that is, movements from these states to other states in the forward direction are allowed. Stage 5 (end-stage renal disease, ESRD) is an absorbing state, and movement from this state to any other state is not possible. It indicates loss of kidney function and requires either dialysis or kidney transplantation.

The number of follow-up time points differed between patients, and visiting times were irregular. The latest recorded stage of each patient corresponds to the last follow-up time of that patient. The covariates considered are gender (0 = female, 1 = male), age, diabetes (0 = no, 1 = yes), hypertension (0 = no, 1 = yes), body mass index, hemoglobin level, urea, serum creatinine and albumin.
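The staging rule above can be written as a small lookup. The following is a minimal sketch rather than the authors' procedure; the function name `ckd_stage` is hypothetical, and assigning non-integer values between the published cut-offs (for example 89.5) to the higher-numbered stage is an assumption, since the text only gives integer boundaries.

```python
def ckd_stage(egfr: float) -> int:
    """Map an eGFR value (mL/min/1.73 m^2) to an observed CKD stage 1-5."""
    if egfr < 0:
        raise ValueError("eGFR must be non-negative")
    # Lower bound of each stage. Assumption: a value between two integer
    # cut-offs (e.g. 89.5) is assigned to the higher-numbered stage.
    for stage, lower_bound in ((1, 90), (2, 60), (3, 30), (4, 15)):
        if egfr >= lower_bound:
            return stage
    return 5  # eGFR < 15: end-stage renal disease (absorbing state)


# Hypothetical readings for one patient over successive visits.
readings = [95.0, 72.3, 59.0, 15.0, 9.8]
stages = [ckd_stage(value) for value in readings]
print(stages)  # [1, 2, 3, 4, 5]
```

Because eGFR is itself estimated with error, a reading close to a cut-off can place a patient in the wrong observed stage, which is what motivates the misclassification model.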

Homogeneous continuous-time multistate Markov model
A homogeneous continuous-time multistate Markov model is an appropriate model for describing the progression of CKD. In this model, the observed states are assumed to be exactly the same as the true states of disease. The states of the Markov process are determined on the basis of eGFR values. Observation times are taken as noninformative (12). A patient may make forward transitions only, moving among the transient states in continuous time.
The progression of CKD is presented in Figure 1, where the arrows show the possible transitions between stages and the intensity $\lambda_{kl}$ is the instantaneous risk of moving from state $k$ to state $l$.
Let $P(t)$ denote the transition probability matrix with elements $p_{kl}(t)$, $k, l = 1, 2, \dots, 5$, where $p_{kl}(t) = P(S(u + t) = l \mid S(u) = k)$ is the probability that a patient in state $k$ at time $u$ is in state $l$ at time $u + t$. The likelihood function of the transition intensities is the product of the probabilities of transition between observed states over all individuals and observation times. The likelihood $L(Q)$, where $Q$ is the matrix of transition intensities $\lambda_{kl}$, is maximized in terms of $\log(\lambda_{kl})$; the estimates of $\lambda_{kl}$ are then obtained from the estimates of $\log(\lambda_{kl})$ using an optimization technique.
The maximum likelihood estimates of the transition intensities in turn give the estimated transition probability matrix through $P(t) = \exp(tQ)$ (13).
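The relation $P(t) = \exp(tQ)$ can be illustrated numerically. The sketch below uses only the Python standard library and approximates the matrix exponential by scaling and squaring a truncated Taylor series; it is not the routine used by the msm package. Only the rates 0.0405 (stage 1 to 2) and 0.346 (stage 4 to 5) appear in the text; every other rate in `rates` is a hypothetical placeholder chosen for illustration.

```python
import math


def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_probabilities(q, t, squarings=10, terms=12):
    """Approximate P(t) = exp(tQ) by scaling and squaring a Taylor series."""
    n = len(q)
    scale = t / 2 ** squarings
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term holds a^k / k! after this step
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


# Forward-only intensities per year. Only (1, 2) and (4, 5) come from the
# text; all other values are hypothetical placeholders.
rates = {
    (1, 2): 0.0405, (1, 3): 0.0150, (1, 4): 0.0050, (1, 5): 0.0023,
    (2, 3): 0.1000, (2, 4): 0.0300, (2, 5): 0.0200,
    (3, 4): 0.2000, (3, 5): 0.0500,
    (4, 5): 0.3460,
}
Q = [[0.0] * 5 for _ in range(5)]
for (k, l), rate in rates.items():
    Q[k - 1][l - 1] = rate
for k in range(5):
    Q[k][k] = -sum(Q[k])  # diagonal is still zero here, so rows sum to 0

P1 = transition_probabilities(Q, t=1.0)
for row in P1:
    print([round(p, 4) for p in row])
# Probability of still being in stage 4 after one year is exp(-0.346).
print(P1[3][3], math.exp(-0.346))
```

Each row of `P1` sums to one, the entries below the diagonal stay at zero because backward transitions have zero intensity, and the last row is (0, 0, 0, 0, 1) because stage 5 is absorbing.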
The mean sojourn time in state $k$ can be computed from the estimated transition rates as $-1/\lambda_{kk}$, where $\lambda_{kk} = -\sum_{l \neq k} \lambda_{kl}$ is the $k$th diagonal entry of $Q$. The maximum likelihood estimates of the baseline intensities (14) and regression coefficients are obtained using the msm package in R. The classification of stages of CKD is based on the value of GFR. The computation of eGFR is subject to error, which may give rise to misclassification of states. In this case we apply an HMM instead of the simple Markov model.
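The mean sojourn time in a state is the reciprocal of the total rate of leaving it (equivalently, minus the reciprocal of the diagonal intensity). A minimal sketch follows, using the stage 4 to stage 5 intensity of 0.346 reported for the misclassification model; the function name is hypothetical.

```python
def mean_sojourn_time(exit_rates):
    """Mean time spent in a state, given the intensities of leaving it.

    The diagonal intensity is minus the sum of the exit rates, so the
    mean sojourn time -1/lambda_kk equals 1 / sum(exit_rates).
    """
    total = sum(exit_rates)
    if total <= 0:
        raise ValueError("absorbing state: the sojourn time is infinite")
    return 1.0 / total


# Stage 4 can only move forward to stage 5, with intensity 0.346 per year.
stage4_years = mean_sojourn_time([0.346])
print(round(stage4_years, 3))  # 2.89
```

This agrees with the mean sojourn time of 2.890 years reported for stage 4. Run in the other direction, the reported 15.923 years for stage 1 corresponds to a total exit rate of about 1/15.923 ≈ 0.0628 per year.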

Homogeneous continuous-time HMM
Under the HMM, we assume that the unobserved true states follow a Markov process with transition intensity matrix $Q$ and that the observed states are generated from the true underlying states through a misclassification matrix. The effects of covariates on the rates of transition and on the misclassification probabilities can be modeled using generalized regressions. A general model for the progression of disease with misclassification error can be explained as follows. The likelihood contribution of a patient with observed states $O_1, \dots, O_n$ is

$$L = \sum_{S} P(O_1, \dots, O_n \mid S_1, \dots, S_n)\, P(S_1, \dots, S_n),$$

where the summation is over all feasible paths of underlying states $S = (S_1, \dots, S_n)$.

The observations are assumed to be independent given the series of true states. They are governed by the misclassification probability matrix $E$, which is independent of time. It is assumed that disease stages can be misclassified into an adjacent stage only. The misclassification probability matrix $E$ for CKD has entries $e_{rs}$, where $e_{rs}$ denotes the probability that the observed state is $s$ given that the true state is $r$. Rows of the matrix indicate the true states and columns represent the observed states. The diagonal elements $e_{rr}$ give the probabilities that patients are correctly classified in their true stages of disease. Entries of $E$ are zero where misclassification is not permitted.

The presence of covariates affects the misclassification probabilities $e_{rs}$. Covariate effects on these probabilities can be estimated using a multinomial logistic regression model (15) with baseline state $S_0$:

$$\log\left(\frac{e_{rs}}{e_{rS_0}}\right) = \beta_{rs}^{T} Z,$$

where $\beta_{rs}$ are regression coefficients and $Z$ are covariates.
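The sum over all feasible hidden paths grows exponentially with the number of visits, so in practice it is evaluated recursively. The sketch below, using only the standard library, compares the forward recursion with a brute-force sum over paths on a toy example. Everything in it is an illustrative assumption: three states instead of five (indexed from 0), made-up matrices `init`, `trans` and `emis`, and a single matrix `trans` standing in for $\exp(\Delta t\, Q)$ over equally spaced visits. The misclassification matrix `emis` is tridiagonal to mirror the adjacent-stage assumption.

```python
from itertools import product


def forward_likelihood(obs, init, trans, emis):
    """Likelihood of an observed sequence via the forward recursion."""
    n = len(init)
    # alpha[r] = P(observations so far, current true state = r)
    alpha = [init[r] * emis[r][obs[0]] for r in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emis[s][o]
                 for s in range(n)]
    return sum(alpha)


def brute_force_likelihood(obs, init, trans, emis):
    """Same likelihood, summing explicitly over every hidden path."""
    n = len(init)
    total = 0.0
    for path in product(range(n), repeat=len(obs)):
        p = init[path[0]] * emis[path[0]][obs[0]]
        for prev, cur, o in zip(path, path[1:], obs[1:]):
            p *= trans[prev][cur] * emis[cur][o]
        total += p
    return total


# Hypothetical toy model (not estimates from the study).
init = [0.6, 0.3, 0.1]                 # distribution of the first true state
trans = [[0.80, 0.15, 0.05],           # forward-only transition probabilities
         [0.00, 0.70, 0.30],
         [0.00, 0.00, 1.00]]
emis = [[0.80, 0.20, 0.00],            # rows: true state, columns: observed
        [0.25, 0.70, 0.05],
        [0.00, 0.10, 0.90]]
obs = [0, 1, 0, 1, 2]                  # observed states at five visits

fwd = forward_likelihood(obs, init, trans, emis)
brute = brute_force_likelihood(obs, init, trans, emis)
print(fwd, brute)
```

The observed drop from state 1 back to state 0 is impossible for the true process, so every path with positive probability has to explain it as a misclassification. The two functions return the same value; the recursion needs on the order of n² operations per visit instead of nᵀ paths for T visits.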
The msm package in R has been used to estimate the parameters of the model by the maximum marginal likelihood method with numerical computation. The msm package provides for fitting multistate Markov models in continuous time with or without misclassification error. From the fitted model, the misclassification matrix, the odds ratios for misclassification, and the observed and expected prevalence can be obtained using the appropriate functions. The Viterbi algorithm is a widely used method for reconstructing the most likely sequence of true states through which a patient passed to reach the current stage, free of misclassification. With it, we can recreate the true stages actually experienced by a CKD patient up to the current state. It was first suggested by Viterbi et al (16) and later by Durbin (17), while Macdonald et al (18) described it for discrete-time hidden Markov chains.
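The Viterbi reconstruction can be sketched in a few lines of standard-library Python. This is a generic discrete-step version on a toy model, not the msm implementation: the three-state matrices `init`, `trans` and `emis` are hypothetical, states are indexed from 0, and one fixed matrix `trans` stands in for the transition probabilities between equally spaced visits.

```python
def viterbi(obs, init, trans, emis):
    """Most likely hidden path and its joint probability with obs."""
    n = len(init)
    # score[s] = probability of the best path ending in true state s
    score = [init[r] * emis[r][obs[0]] for r in range(n)]
    back = []
    for o in obs[1:]:
        pointers, new_score = [], []
        for s in range(n):
            best_r = max(range(n), key=lambda r: score[r] * trans[r][s])
            pointers.append(best_r)
            new_score.append(score[best_r] * trans[best_r][s] * emis[s][o])
        back.append(pointers)
        score = new_score
    # Trace the best final state back through the stored pointers.
    last = max(range(n), key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]


# Hypothetical toy model (not estimates from the study).
init = [0.6, 0.3, 0.1]
trans = [[0.80, 0.15, 0.05],
         [0.00, 0.70, 0.30],
         [0.00, 0.00, 1.00]]
emis = [[0.80, 0.20, 0.00],
        [0.25, 0.70, 0.05],
        [0.00, 0.10, 0.90]]
obs = [1, 1, 0, 1, 2]  # the apparent improvement at visit 3 cannot be real

path, prob = viterbi(obs, init, trans, emis)
print(path, prob)
```

Because backward transitions have zero probability, the reconstructed path never decreases, and any visit where the observed state differs from the reconstructed one is flagged as a likely misclassification.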

Results
The transitions between states of CKD patients at their subsequent visits are summarized in Table 1. Generally, the number of visits depends on the severity of the disease and on awareness about the disease. Patients in stage 4 of CKD visit more frequently than patients in lower stages. The table has been prepared by counting the number of transitions for each patient between consecutive visits. Patients observed in stage 1 remained in stage 1 at the subsequent visit 75 times. The numbers of transitions to stage 5 from stage 1, stage 2, stage 3 and stage 4 are 0, 1, 2 and 51 respectively. The number of transitions from higher stages to lower stages is zero, as CKD is irreversible. Estimated transition intensities of the simple homogeneous continuous-time multistate model based on the Markov process are shown in Table 2. The entries below the diagonal are zero, as reverse transitions are not allowed in CKD. The mean sojourn time of each state, with its 95% confidence interval and standard error, is summarized in Table 3.
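A table of transition counts such as Table 1 is built by pairing each visit with the next one for the same patient. The following sketch shows the counting step on made-up data; the patient identifiers and stage sequences are hypothetical and are not taken from the study.

```python
from collections import Counter


def count_transitions(visits_by_patient):
    """Count (stage at visit j, stage at visit j+1) pairs over all patients."""
    counts = Counter()
    for stages in visits_by_patient.values():
        for current, following in zip(stages, stages[1:]):
            counts[(current, following)] += 1
    return counts


# Hypothetical observed stages in visit order for three patients.
visits = {
    "P01": [1, 1, 2, 2, 3],
    "P02": [3, 4, 4, 5],
    "P03": [2, 2],
}
table = count_transitions(visits)

# Print a 5 x 5 table: rows are the "from" stage, columns the "to" stage.
for origin in range(1, 6):
    print([table[(origin, target)] for target in range(1, 6)])
```

Pairs that never occur, such as any move from a higher to a lower stage in this example, simply count as zero.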
Estimated survival probability curves for stage 1, stage 2, stage 3 and stage 4 are shown in Figure 2. The figure shows a sharp decline in the survival probability of stage 4 patients at around 1 year.
The estimated transition intensities for the HMM are summarized in Table 4. The transition intensity from stage 1 to stage 2 is 0.0405, whereas the transition intensity from stage 4 to stage 5 is 0.346. The mean sojourn time of each state under the HMM is summarized in Table 5, which also shows the standard error and 95% confidence interval of the mean sojourn time for each stage of CKD. The mean sojourn time for stage 1 is approximately 16 years, whereas it is about 2.9 years for a CKD patient in stage 4.
Misclassification probabilities for each state of CKD under the HMM are shown in Table 6. A CKD patient in stage 1 will be misclassified as a stage 2 patient with probability 0.211. A patient in stage 2 will be misclassified as a stage 1 patient with probability 0.280. Table 7 shows the odds ratios for misclassification probabilities in the presence of categorical covariates.
The odds of misclassification into observed stage 2 given true stage 1 are approximately 3.8 times higher for females than for males. Similarly, the odds of misclassification into observed stage 2 given true stage 1 are 2.28 times higher for CKD patients with hypertension than for those without hypertension, and 3.56 times higher for CKD patients with diabetes.
The odds ratios of misclassification probabilities in the presence of continuous covariates are summarized in Table 8. For a one-year increase in the age of a CKD patient, the odds of misclassification into observed state 2 given true state 1 are 3.52 times higher. Similarly, for a CKD patient with one more unit of serum creatinine, the odds of misclassification into observed state 2 given true state 1 are 2.34 times higher.
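An odds ratio acts on the odds scale rather than directly on the probability, so a factor of 2.28 does not multiply the misclassification probability by 2.28. The sketch below converts an odds ratio into a probability. It rests on two assumptions made purely for illustration: that true stage 1 can only be observed as stage 1 or stage 2, so the multinomial model reduces to a binary one, and that the reported probability 0.211 serves as the reference-group value. The function name is hypothetical.

```python
def apply_odds_ratio(baseline_prob, odds_ratio):
    """Probability obtained by multiplying the baseline odds by odds_ratio."""
    if not 0.0 < baseline_prob < 1.0:
        raise ValueError("baseline_prob must lie strictly between 0 and 1")
    if odds_ratio <= 0:
        raise ValueError("odds_ratio must be positive")
    odds = baseline_prob / (1.0 - baseline_prob) * odds_ratio
    return odds / (1.0 + odds)


# Illustrative only: 0.211 is treated as the probability for patients
# without hypertension, and 2.28 is the reported odds ratio.
p_hypertension = apply_odds_ratio(0.211, 2.28)
print(round(p_hypertension, 3))
```

Under these assumptions the misclassification probability rises from 0.211 to roughly 0.38, which is less than 2.28 times the baseline probability.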
The most likely states of randomly chosen CKD patients observed in stage 2 and stage 3, based on the Viterbi algorithm, are summarized in Table 9. The table shows that the state of CKD patient number 27 was misclassified at time 3.91: the actual stage is 3, but stage 2 was observed.

The impact of explanatory variables (prognostic factors) on a particular transition intensity can be explained by modeling the transition intensity as a function of these covariates, which results in the transition probability matrix $P(t, x(t))$. In such a case, the new transition matrix $Q$ is used in the likelihood function for estimating the transition intensities. Marshall and Jones used the proportional hazards model for studying the effect of a vector of explanatory variables $x_{ij}$ on the transition intensity for individual $i$ at time $j$ by replacing the transition intensity element $\lambda_{kl}$ by $\lambda_{kl}^{(0)} \exp(\beta_{kl}^{T} x_{ij})$, where $\lambda_{kl}^{(0)}$ is the baseline intensity and $\beta_{kl}$ is the vector of regression coefficients.

Table 3. Mean sojourn times at different stages

Table 4. Estimated transition intensities for misclassification model

Table 5. Mean sojourn times for misclassification model

Table 7. Odds ratios for misclassification probabilities under categorical variable

Table 8. Odds ratios for misclassification probabilities under continuous variables

Table 9. Viterbi sequence

Discussion
CKD is a severe disease. The overall prevalence of CKD in India is about 17.2%, with stage 1, stage 2, stage 3, stage 4 and stage 5 accounting for 7%, 4.3%, 4.3%, 0.8% and 0.8% respectively (19). The multistate Markov model is often used to describe the path of progression of a stage-based disease. There is a possibility of misclassification of stages, as measurements and clinical observations are prone to errors. The HMM is more appropriate in such cases, as it also takes into account the probabilities of misclassification of stages. The number of transitions from state i to state i at subsequent visits is larger than the number of moves to the next higher stage. This shows the slow progression of the disease. The number of transitions to the next higher stage is largest for stage 4, indicating rapid progression of the disease in the advanced stage 4. The diagonal entries of the transition intensity matrix are negative; each equals the negative of the total instantaneous risk of moving out of that state. The mean sojourn time for stage 1 is 15.923 years. This means that a patient in stage 1 spends on average 15.923 years in stage 1 before moving to a higher stage. The mean sojourn time for stage 4 is 2.890 years. This shows that the progression of the disease is much slower in the early stages than in the higher stages. Misclassification of stages is possible for adjacent stages only. The probabilities of misclassification are higher for early stages than for advanced stages. A CKD patient in stage 2 is most likely to be misclassified as a stage 1 patient if the patient does not have diabetes or hypertension. On the other hand, there is a high probability that a patient of CKD suffering from diabetes