February | 2017

Introduction: As data sets with follow-up data containing event times such as dates of diagnosis, treatment, recovery and death become more commonly available, the need for more detailed statistical analyses that accommodate these additional event history data has also been increasing. Each time scale defined by an event can be assumed to influence the hazard of future events through a function of the time elapsed since that event. For example, consider death as the outcome. Time since birth (age) is an important determinant of the hazard of death, and in the general population this hazard is monotonically increasing. After a diagnosis of cancer (without information on the lethality of the cancer) the risk of death may not be monotonic, as patients with a benign tumor have a risk close to that of the healthy population whereas the other patients have a higher risk. Therefore the estimated hazard may be much higher than in the general population during the first couple of years after the diagnosis, but close to it thereafter. Calendar time is another relevant time scale: the introduction of a new treatment can decrease the hazard for all patients regardless of their age, so the hazard of death can decrease sharply after that point in calendar time.

Standard statistical software that incorporates different time scales and flexible nonparametric methods has been limited, although during the past decades we have seen rapid improvement in both numerical methods and computational resources, allowing the application of a wider range of statistical methodology than before.

Method: Intensity processes are a particularly useful family of models for incorporating past event times into a model that predicts future events. Several approaches have been introduced to combine the effects of different risk factors into a hazard model. The most common approach is based on the assumption of multiplicative hazards; the main alternative is based on additivity. These assumptions can also be applied to combine the hazard rates of different time scales.
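To make the two assumptions concrete, here is a minimal Python sketch of how hazards on two time scales (age and time since diagnosis) could be combined. It is not Bite code: the function names, the parametric shapes and all numeric constants are hypothetical and chosen only for illustration, and it assumes that the diagnosis time scale contributes nothing before the diagnosis (a factor of 1 in the multiplicative model, a term of 0 in the additive one).

```python
import math

def age_hazard(age):
    """Hypothetical hazard on the age scale, increasing monotonically with age."""
    return 0.0005 * math.exp(0.08 * age)

def relative_hazard_since_diagnosis(years):
    """Hypothetical relative hazard: high right after diagnosis, fading towards 1."""
    return 1.0 + 4.0 * math.exp(-0.5 * years)

def excess_hazard_since_diagnosis(years):
    """Hypothetical excess hazard: high right after diagnosis, fading towards 0."""
    return 0.05 * math.exp(-0.5 * years)

def multiplicative_hazard(t, birth, diagnosis):
    """Hazard of death at calendar time t under the multiplicative assumption."""
    rate = age_hazard(t - birth)
    if t >= diagnosis:  # assumption: the diagnosis scale is inactive before diagnosis
        rate *= relative_hazard_since_diagnosis(t - diagnosis)
    return rate

def additive_hazard(t, birth, diagnosis):
    """Hazard of death at calendar time t under the additive assumption."""
    rate = age_hazard(t - birth)
    if t >= diagnosis:  # assumption: the diagnosis scale is inactive before diagnosis
        rate += excess_hazard_since_diagnosis(t - diagnosis)
    return rate

if __name__ == "__main__":
    birth, diagnosis = 1936.7, 1961.9  # example dates on a common calendar origin
    for t in (1960.0, 1962.0, 1970.0):
        print(t, multiplicative_hazard(t, birth, diagnosis),
              additive_hazard(t, birth, diagnosis))
```

In both variants every time scale is measured from its own origin event but evaluated at the same calendar time, which is why all dates need a common origin.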

Motivation: When I started working on my PhD thesis (Härkänen 2001) in the mid-1990s, theoretical work and the first applications of nonparametric Bayesian methods for intensity processes had just been published. In my own applications I noticed that models based on intensity processes were intuitively easy to construct, but that an efficient implementation required plenty of coding. Markov chain Monte Carlo (MCMC) methods are computationally intensive, so the C language was the only viable choice 20 years ago. To optimize the code for updating a parameter of a hazard function, it was necessary to avoid calculating the excess Poisson likelihood terms, which would cancel out in the Hastings ratio. In a multiplicative or additive hazards model this is not straightforward, so I decided to write a program that removes the need to do this optimization by hand for each model separately. The result is the Bite software, which can be downloaded from this site.
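The cancellation mentioned above can be illustrated in the simplest possible setting: a single piecewise-constant hazard on one time scale. The Python sketch below is not taken from Bite; the data layout, the function names and the numbers are hypothetical. It shows that when one level of the hazard is updated, the likelihood part of the log Hastings ratio depends only on the events and exposure in that piece, so all other Poisson likelihood terms can be skipped.

```python
import math

def exposure_and_events(records, edges):
    """Sum exposure time and count events within each piece of the hazard.

    records: (entry, exit, died) tuples on a single time scale.
    edges:   increasing breakpoints; piece j covers (edges[j], edges[j + 1]].
    """
    n = len(edges) - 1
    exposure = [0.0] * n
    events = [0] * n
    for entry, exit_, died in records:
        for j in range(n):
            lo, hi = edges[j], edges[j + 1]
            exposure[j] += max(0.0, min(exit_, hi) - max(entry, lo))
            if died and lo < exit_ <= hi:
                events[j] += 1
    return exposure, events

def log_likelihood(levels, exposure, events):
    """Full Poisson-process log-likelihood for a piecewise-constant hazard."""
    return sum(d * math.log(lam) - lam * e
               for lam, e, d in zip(levels, exposure, events))

def local_log_ratio(old, new, exposure_k, events_k):
    """Log-likelihood ratio for changing one level, using only that piece."""
    return events_k * (math.log(new) - math.log(old)) - (new - old) * exposure_k

if __name__ == "__main__":
    edges = [0.0, 10.0, 20.0, 30.0]
    records = [(0.0, 25.0, True), (5.0, 30.0, False), (12.0, 18.0, True)]
    exposure, events = exposure_and_events(records, edges)

    old_levels = [0.01, 0.02, 0.05]
    new_levels = [0.01, 0.03, 0.05]  # proposal changes only piece 1

    full = (log_likelihood(new_levels, exposure, events)
            - log_likelihood(old_levels, exposure, events))
    local = local_log_ratio(0.02, 0.03, exposure[1], events[1])
    print(math.isclose(full, local, abs_tol=1e-9))
```

With several time scales combined multiplicatively or additively, deciding which terms cancel requires considerably more bookkeeping, which is the work the paragraph above describes handing over to a program.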

Example: To illustrate a multiplicative and an additive hazards model for the hazard of death after a cancer diagnosis, one can specify these in Bite using the syntax

## Choose a multiplicative hazards model:
model death = f(birth) * g(diagnosis);
## ... or ...
## an additive hazards model:
model death = f(birth) + g(diagnosis);

As we can see, these models are easy to define in Bite, and more complicated models can be defined in a similar fashion. Here ‘death’ is the outcome variable, which contains the start and end times of the follow-up and the event time. The outcome data file could contain an entry such as ‘1972.25 1997.5 1997.5’, in which the length of the follow-up was 25 years and 3 months, and the death was observed at the end of the interval. A right-censored entry would be ‘1972.25 1997.5’. Repeated events can be entered by adding the event times after the first two numbers (the endpoints of the interval). ‘birth’ and ‘diagnosis’ are variables containing the corresponding event times, e.g. ‘1936.7’ and ‘1961.9’, respectively. All dates use the same origin, which in this example is based on calendar time. ‘f()’ and ‘g()’ are the hazard functions for age and time since diagnosis, respectively.
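The entry layout described above (start of follow-up, end of follow-up, then zero or more event times) can be checked with a few lines of Python. This is only an illustrative reader written from the description in this post, not Bite's own input code; the function name, the returned dictionary keys and the validation rules are assumptions.

```python
def parse_outcome_entry(line):
    """Parse one outcome entry: 'start end [event_time ...]'.

    Returns a dict with the follow-up interval, the list of event times
    (empty for a right-censored entry) and the follow-up length.
    """
    values = [float(token) for token in line.split()]
    if len(values) < 2:
        raise ValueError("an entry needs at least a start and an end time")
    start, end = values[0], values[1]
    event_times = values[2:]
    if end < start:
        raise ValueError("follow-up must not end before it starts")
    # Assumption: event times have to fall inside the follow-up interval.
    if any(not (start <= t <= end) for t in event_times):
        raise ValueError("event time outside the follow-up interval")
    return {
        "start": start,
        "end": end,
        "events": event_times,
        "follow_up": end - start,
    }

if __name__ == "__main__":
    observed = parse_outcome_entry("1972.25 1997.5 1997.5")
    censored = parse_outcome_entry("1972.25 1997.5")
    print(observed["follow_up"], observed["events"])
    print(censored["follow_up"], censored["events"])
```

For the first entry the follow-up length comes out as 25.25 years, i.e. the 25 years and 3 months mentioned above, with one event at the end of the interval; the second entry has the same follow-up and no events.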

Foreword: Although Bite was developed about 20 years ago, its potential in event history analyses seems even greater today than before, as computing resources have increased dramatically. We hope that our software helps researchers utilize their data sets more efficiently than before, and we look forward to hearing about users’ experiences and wishes!

Tommi Härkänen.

BITE

Bayesian Intensity Estimator


Flexible event history analyses using the Bite software