Author Archives: Tommi Taneli Härkänen

Example on using the Lexis prior: “Nonparametric Bayesian Intensity Model: Exploring Time-to-Event Data on Two Time Scales”

Our article (Härkänen, But and Haukka, 2017, Nonparametric Bayesian Intensity Model: Exploring Time-to-Event Data on Two Time Scales) will appear in the Scandinavian Journal of Statistics soon. The Lexis example, as well as the example on reading the output using the simulated data, can be found in the Download section.

The Lexis prior incorporates multidimensional smoothing of the hazard surface. We demonstrated in our paper that this approach provides more accurate results than some common unidimensional methods, such as Poisson regression with splines, because the Lexis prior borrows strength in more than one dimension. The method can be useful not only in analyzing multiple time scales but also when ordinal covariates define a stratification in a hazard model. In the latter case one can assume that the hazard functions of neighbouring covariate categories are similar, so that there is some continuity across the hazard functions.

The smoothing is especially useful when the number of observations per stratum is relatively small, as it reduces the risk of false positive findings. With other existing methods, the common approach is to merge strata in order to obtain a larger number of observations, but merging can hide the actual change points. Our approach avoids this problem.

Example on the Stanford heart transplant survival data added

There is now a worked example containing the data files and the model description file (h1.opt) in the Download section. There is also an R Markdown example on how to process the output. The example covers the posterior expectations and credible intervals of the hazard functions (as both figures and tables) as well as those of the scalar regression coefficients.

Flexible event history analyses using the Bite software

Introduction: As data sets with follow-up data containing event times, such as dates of diagnosis, treatment, recovery and death, are becoming more commonly available, the need for more detailed statistical analyses that accommodate these additional event history data has also been increasing. Each time scale defined by an event can be assumed to influence the hazard of future events through a function of the time elapsed since that event. For example, consider death as the outcome. Time since birth (age) is an important determinant of the hazard of death, and in the general population this hazard is monotonically increasing. After a diagnosis of cancer (without information on the lethality of the cancer) the risk of death may not be monotonic, as patients with a benign tumor have a risk close to that of the healthy population whereas the other patients have a higher risk. Therefore the estimated additional hazard may be high during the first couple of years after the diagnosis, after which the hazard returns close to that of the general population. On the calendar time scale, the introduction of a new treatment can decrease the hazard for all patients regardless of their age, so the hazard of death can decrease sharply after that point in calendar time.

Standard statistical software that incorporates different time scales and flexible nonparametric methods has been limited, although during the past decades we have seen rapid improvement in both numerical methods and computational resources, allowing a wider range of statistical methodology to be applied than before.

Method: Intensity processes are a particularly useful family of models for incorporating past event times into a model that predicts future events. Several approaches have been introduced to combine the effects of different risk factors into a hazard model. The most common approach is based on the assumption of multiplicative hazards, and the other on additivity. These assumptions can also be applied to combine the hazard rates of different time scales.
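As a rough illustration of the two assumptions, the following Python sketch combines an age effect and a time-since-diagnosis effect either multiplicatively or additively. The functional forms and all numerical constants are hypothetical and chosen only for illustration; they are not taken from Bite or from any fitted model, and the functions are meant to be evaluated at times after the diagnosis.

```python
import math


def age_hazard(age):
    """Hypothetical baseline hazard of death that increases with age."""
    return 1e-4 * math.exp(0.085 * age)


def relative_hazard(years_since_diagnosis):
    """Hypothetical multiplicative factor: elevated just after diagnosis, decaying to 1."""
    return 1.0 + 4.0 * math.exp(-years_since_diagnosis)


def excess_hazard(years_since_diagnosis):
    """Hypothetical additive excess rate: elevated just after diagnosis, decaying to 0."""
    return 0.05 * math.exp(-years_since_diagnosis)


def multiplicative_hazard(t, birth, diagnosis):
    """Hazard at calendar time t under the multiplicative assumption."""
    return age_hazard(t - birth) * relative_hazard(t - diagnosis)


def additive_hazard(t, birth, diagnosis):
    """Hazard at calendar time t under the additive assumption."""
    return age_hazard(t - birth) + excess_hazard(t - diagnosis)


birth, diagnosis = 1936.7, 1961.9
for t in (1962.9, 1971.9, 2011.9):
    print(t, multiplicative_hazard(t, birth, diagnosis), additive_hazard(t, birth, diagnosis))
```

In both variants the two time scales share the same calendar-time origin, so each effect is evaluated at the time elapsed since its own defining event.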

Motivation: When I started working on my PhD thesis (Härkänen 2001) in the mid-1990s, theoretical work and the first applications of nonparametric Bayesian methods for intensity processes had just been published. In my own applications I noticed that models based on intensity processes were intuitively easy to construct, but an efficient implementation required plenty of coding. Markov chain Monte Carlo (MCMC) methods are computationally intensive, so the C language was the only viable choice 20 years ago. When optimizing the code for updating a parameter of a hazard function, it was necessary to avoid calculating the excess Poisson likelihood terms, which would cancel out in the Hastings ratio. In a multiplicative or additive hazards model this is not straightforward, so I decided to write a program that removes the need to do this optimization by hand for each model separately. The result is the Bite software, which can be downloaded from this site.
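The cancellation mentioned above can be sketched in Python for the simplest possible case: a single piecewise constant hazard on one time scale, where one level is proposed to change. This is only an assumed toy setting, not Bite's actual algorithm or data structures; the grid, the hazard levels and the subjects below are made up, and only the likelihood part of the Hastings ratio is shown (prior and proposal terms are left out).

```python
import math
from bisect import bisect_right


def interval_index(grid, t):
    """Index of the grid interval containing t (last interval includes its right end)."""
    return min(max(bisect_right(grid, t) - 1, 0), len(grid) - 2)


def log_likelihood(levels, grid, subjects):
    """Full Poisson-process log-likelihood: sum of log hazards at events minus integrated hazard."""
    total = 0.0
    for start, end, events in subjects:
        for k, level in enumerate(levels):
            overlap = min(end, grid[k + 1]) - max(start, grid[k])
            if overlap > 0:
                total -= level * overlap
        for t in events:
            total += math.log(levels[interval_index(grid, t)])
    return total


def local_log_ratio(old, new, k, grid, subjects):
    """Log-likelihood ratio using only the exposure and events inside interval k."""
    exposure = 0.0
    n_events = 0
    for start, end, events in subjects:
        exposure += max(0.0, min(end, grid[k + 1]) - max(start, grid[k]))
        n_events += sum(1 for t in events if interval_index(grid, t) == k)
    return n_events * math.log(new / old) - (new - old) * exposure


# Hypothetical grid, hazard levels and subjects: (start, end, event times).
grid = [0.0, 1.0, 2.0, 5.0, 10.0]
levels = [0.10, 0.08, 0.05, 0.03]
subjects = [(0.0, 3.5, [3.5]), (0.5, 10.0, []), (1.2, 4.0, [2.5, 4.0])]

proposed = list(levels)
proposed[2] = 0.07  # propose a new level for the interval [2, 5)

full = log_likelihood(proposed, grid, subjects) - log_likelihood(levels, grid, subjects)
local = local_log_ratio(levels[2], proposed[2], 2, grid, subjects)
print(full, local)  # the two agree up to floating-point rounding
```

With several hazard functions combined multiplicatively or additively, the set of terms affected by one update depends on the model structure, which is the bookkeeping the paragraph above describes as tedious to do by hand.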

Example: To illustrate a multiplicative and an additive hazards model for the hazard of death after a cancer diagnosis, one can specify these in Bite using the syntax

## Choose a multiplicative hazards model:
model death = f(birth) * g(diagnosis);
## ... or ...
## an additive  hazards model:
model death = f(birth) + g(diagnosis);

As we can see, these models can be easily defined in Bite, and more complicated models can be defined in a similar fashion. Here ‘death’ is the outcome variable containing information on the start and end times of the follow-up, and the event time. The outcome data file could contain an entry such as ‘1972.25 1997.5 1997.5’, in which the length of the follow-up was 25 years and 3 months, and the death was observed at the end of the interval. A right-censored data entry would be ‘1972.25 1997.5’. A repeated-events data entry can be given by adding the event times after the first two numbers (the endpoints of the interval). ‘birth’ and ‘diagnosis’ are variables containing the corresponding event times, e.g. ‘1936.7’ and ‘1961.9’, respectively. All dates use the same origin, which in this example is based on calendar time. ‘f()’ and ‘g()’ are the hazard functions for age and time since diagnosis, respectively.
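To make the entry layout concrete, here is a small Python sketch that reads entries of the form described above and derives the two time scales at the event time. The helpers parse_outcome and time_scales are hypothetical and follow only the description in the text; they are not part of Bite, and the real data files may contain details not covered here.

```python
def parse_outcome(entry):
    """Split an outcome entry into follow-up start, follow-up end and event times.

    Assumed layout: 'start end [event ...]'; no event times means right-censoring.
    """
    numbers = [float(field) for field in entry.split()]
    if len(numbers) < 2:
        raise ValueError("an outcome entry needs at least a start and an end time")
    start, end, *events = numbers
    return {"start": start, "end": end, "events": events}


def time_scales(t, birth, diagnosis):
    """Age and time since diagnosis at calendar time t (all dates share one origin)."""
    return {"age": t - birth, "time_since_diagnosis": t - diagnosis}


observed = parse_outcome("1972.25 1997.5 1997.5")
censored = parse_outcome("1972.25 1997.5")

follow_up_years = observed["end"] - observed["start"]  # 25 years and 3 months
at_death = time_scales(observed["events"][0], birth=1936.7, diagnosis=1961.9)

print(follow_up_years, censored["events"], at_death)
```

The arguments passed to ‘f()’ and ‘g()’ in the models above correspond to the "age" and "time_since_diagnosis" values computed here.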

Foreword: Although Bite was developed about 20 years ago, its potential in event history analyses seems even greater today than before, as computing resources have increased dramatically. We hope that our software helps researchers to utilize their data sets more efficiently than before, and we look forward to hearing about users’ experiences and wishes!

Tommi Härkänen.