The Glasgow outcome scale (GOS) is the most widely used method to describe overall outcome after head injury.1 The GOS is quick to administer, can be applied to all cases, and has clinically relevant categories. These practical advantages have led to its widespread adoption in early management studies and clinical trials. We have recently described a standard format for an interview on which to base assessment of GOS category,2 3 which removes ambiguity4 and addresses the previous lack of guidelines.5 This improves the reliability of the scale,2 but it is important to establish the relation between this format of the GOS, particularly the extended GOS (GOSE), and other measures of outcome and clinical status.
Relations have been reported between the GOS and injury severity,6 neuropsychological impairment,6 7 and measures of disability and handicap.8-11 However, in outcome assessment there is an increasing focus on measures of health outcome incorporating the person's own perspective. The extent to which assignments on the GOS may miss important aspects of quality of life has not been studied.8
The present study therefore had two aims. Firstly, to investigate aspects of the validity of the structured interviews for the GOS and GOSE through relating their results to indices of injury severity and subsequent limitations. Secondly, to gain greater understanding of the relation between disability and handicap as assessed by the GOS and other measures of outcome, particularly those reflecting subjective perception of health.
One hundred and thirty five patients with head injury (113 male patients) who had been admitted to the regional neurosurgical unit were recruited to the study. We excluded patients previously admitted for a neurosurgical or psychiatric disorder, or treatment for alcohol misuse. Informed consent was obtained for all patients. The participants were aged between 16 and 69 (mean (SD) 36.74 (13.92) years). Classified by the worst recorded Glasgow coma score (GCS),12 48 (36%) patients had severe injury (GCS⩽8), 28 (21%) a moderate injury (GCS 9–12), and 59 (44%) a mild/minor injury (GCS 13–15). Seventy two (53%) had undergone a neurosurgical operation.
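The severity grouping can be written as a small lookup. This is only a sketch: the function name is hypothetical, and it assumes the conventional bands in which a worst GCS of 12 counts as moderate and 13 to 15 as mild/minor.

```python
def gcs_severity(worst_gcs: int) -> str:
    """Classify injury severity from the worst recorded GCS total (3-15).

    Assumed bands: <=8 severe, 9-12 moderate, 13-15 mild/minor.
    """
    if not 3 <= worst_gcs <= 15:
        raise ValueError("GCS total must lie between 3 and 15")
    if worst_gcs <= 8:
        return "severe"
    if worst_gcs <= 12:
        return "moderate"
    return "mild/minor"


if __name__ == "__main__":
    for score in (3, 8, 9, 12, 13, 15):
        print(score, gcs_severity(score))
```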
Participants were assessed 5 to 10 months after injury (mean (SD) 7.39 (1.19) months) and interviewed either alone (65%) or with a relative or friend (35%). The duration of post-traumatic amnesia (PTA) was determined at interview.
Ratings on the GOS and GOSE were obtained by a research psychologist using a structured interview.2 The GOS is a five point scale: death, vegetative state, severe disability (SD), moderate disability (MD), and good recovery (GR). The GOSE is an eight point scale in which the last three categories on the GOS are divided into upper and lower bands. The interview to obtain information to apply the GOSE consists of a series of questions covering consciousness, independence inside and outside the home, major social roles (work, social and leisure activities, family and friendships), and return to normal life. The final rating is based on the lowest category of outcome indicated by the responses. For the purposes of this study we did not attempt to distinguish disability due to brain injury from disability due to extracranial injuries occurring at the same time as the head injury. Categories of GOS were obtained by collapsing the subdivisions of the GOSE. Of the 135 participants 39 (29%) were rated as severely disabled, 44 (33%) as moderately disabled, and 52 (39%) as good recovery.
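The two rules described above (the final rating is the lowest category indicated by any response, and GOS categories are obtained by collapsing the GOSE bands) can be sketched as follows. The numeric codes 1 to 8 and all identifier names are assumptions made for illustration; the structured interview itself is not reproduced here.

```python
from enum import IntEnum


class GOSE(IntEnum):
    """Eight GOSE categories; the 1-8 coding is an assumed convention."""
    DEAD = 1
    VEGETATIVE_STATE = 2
    LOWER_SD = 3
    UPPER_SD = 4
    LOWER_MD = 5
    UPPER_MD = 6
    LOWER_GR = 7
    UPPER_GR = 8


# Collapse the upper and lower bands back into the five GOS categories.
GOSE_TO_GOS = {
    GOSE.DEAD: "death",
    GOSE.VEGETATIVE_STATE: "vegetative state",
    GOSE.LOWER_SD: "severe disability",
    GOSE.UPPER_SD: "severe disability",
    GOSE.LOWER_MD: "moderate disability",
    GOSE.UPPER_MD: "moderate disability",
    GOSE.LOWER_GR: "good recovery",
    GOSE.UPPER_GR: "good recovery",
}


def overall_gose(section_ratings):
    """Return the lowest (worst) category indicated by any interview section."""
    if not section_ratings:
        raise ValueError("at least one section rating is required")
    return min(section_ratings)


def collapse_to_gos(rating):
    """Map a GOSE category to its parent GOS category."""
    return GOSE_TO_GOS[GOSE(rating)]


if __name__ == "__main__":
    # Hypothetical respondent: independent, but with reduced work capacity.
    ratings = [GOSE.UPPER_GR, GOSE.LOWER_MD, GOSE.UPPER_MD]
    final = overall_gose(ratings)
    print(final.name, "->", collapse_to_gos(final))
```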
The disability rating scale (DRS)13 and the Barthel activities of daily living (ADL) index14 were completed by the interviewer at the time of follow up.
The tests administered were selected from the portfolio described by a working party of the National Institute of Neurological Disorders and Stroke.15 The tests were the Rey figure copy and delayed recall; grooved pegboard with left and right hands; controlled oral word association test (COWAT); symbol digit modalities test; trail making form B; and Wisconsin card sort test. In addition, verbal memory ability was assessed using immediate and delayed paired associates learning and immediate and delayed logical memory.16 We also employed the national adult reading test-revised (NART) as an estimate of IQ before injury.17
The United Kingdom version of the 36-item short form health survey (SF-36)18 (n=135), the general health questionnaire19 (n=109), and the Beck depression inventory20 (n=109) were administered. The purpose of each questionnaire was explained to the patient with head injury, and all patients indicated that they comprehended the task involved.
Head injury symptoms and problems
The neurobehavioural functioning inventory (NFI)21 was completed independently by the patient with head injury (n=106) and also by a relative or close friend (n=100). The frequency of occurrence for each item on the NFI was rated on a four point scale: never (1), sometimes (2), often (3), or always (4).
The main relations are summarised first, and then their nature is described in more detail in subsequent sections. Table 1 shows Spearman correlations between the GOS and GOSE ratings and the main variables and measures studied. There are substantial correlations between the GOS and GOSE ratings and measures of both initial injury severity (particularly PTA) and of sequelae of injury assessed by disability scales (particularly the DRS). Relations with cognitive tests are generally more modest, and strongest for controlled oral word association and delayed logical memory. There are strong correlations with self report measures of emotional state and quality of life; thus, the GOS and GOSE ratings were related to the extent of depression measured by the Beck depression inventory, and to each subscale of the SF-36. The GOS and GOSE ratings were also related to all subscales of the NFI, with strongest correlations with the reports provided by relatives or friends.
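For readers unfamiliar with the statistic, a Spearman correlation is a Pearson correlation computed on ranks, with tied values sharing their average rank. The sketch below is a minimal pure Python version; the function names and the example data are hypothetical and are not taken from the study.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Made-up ratings: GOSE category and a depression score for 8 people.
    gose = [3, 4, 4, 5, 6, 7, 8, 8]
    depression = [25, 30, 18, 15, 12, 9, 4, 6]
    print(round(spearman(gose, depression), 3))
```

Because both GOS scales are ordinal with many ties, the tie handling in average_ranks matters. In practice a library routine such as scipy.stats.spearmanr would normally be used instead.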
There was no significant effect of age on quality of outcome in these survivors of head injury. Preinjury IQ estimated by the NART had modest but significant relations with the GOS and GOSE ratings.
MEASURES OF INJURY SEVERITY
There was a stronger correlation between PTA and outcome than between GCS and outcome. Cross tabulation of GOSE with duration of PTA (table 2) indicated that the relation held across the full range of outcomes. In the lower category of GR 71% of participants had a PTA of greater than 1 day compared with only 33% of participants in the upper category of GR. This is consistent with the sequelae experienced by participants in the lower GR group being at least in part a reflection of more severe brain injury than in the upper GR group. Furthermore the proportion of patients with PTA greater than 7 days was highest (77%) in the lower category of SD, consistent with this group having had the most severe injuries.
Scores on the Barthel index of daily living correlated significantly with GOS and GOSE ratings, but showed a very substantial ceiling effect (fig 1). Thus, 85% of participants were assigned a maximum score on the Barthel index. Although the categories of the Barthel index discriminate between upper and lower categories of severe disability the ceiling effect was already apparent in the upper category of SD. The Barthel index reflects competence in activities of daily living within the home, and does not assess abilities necessary for independence outside the home. There was a strong correlation between allocation on the DRS and on the GOS. The relation is illustrated in fig 2, which shows also that there is a ceiling effect on the DRS in the upper moderate disability range. The DRS apparently does not discriminate effectively between the top three categories in the GOSE, to which 56% of participants in the current sample were allocated.
Figure 1. Box plot of Barthel ADL index scores against category on the GOSE. The box represents the interquartile range which contains 50% of values on the Barthel index in each GOSE category; the median is indicated by a heavy line. The whiskers are lines that extend from the box to the highest and lowest values, excluding outliers. Outliers (o) are cases with values between 1.5 and 3 box lengths from the upper or lower edge of the box. Extremes (*) are cases with values more than three box lengths from the upper or lower edge of the box.
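The outlier convention in this caption can be illustrated with a short sketch. The function name and data are hypothetical, and the quartile method (Python's "inclusive" interpolation) is an assumption; statistical packages may compute the box edges slightly differently.

```python
from statistics import quantiles


def classify_box_plot_points(values):
    """Label each value relative to the box (interquartile range).

    More than 3 box lengths from the box edge: "extreme";
    more than 1.5 and up to 3 box lengths: "outlier";
    otherwise the value lies within the whiskers.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    box_length = q3 - q1
    labels = []
    for v in values:
        # Distance beyond the nearer box edge (0 if inside the box).
        distance = max(q1 - v, v - q3, 0)
        if distance > 3 * box_length:
            labels.append("extreme")
        elif distance > 1.5 * box_length:
            labels.append("outlier")
        else:
            labels.append("within whiskers")
    return labels


if __name__ == "__main__":
    scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 40]  # made-up values
    for score, label in zip(scores, classify_box_plot_points(scores)):
        print(score, label)
```

When most cases share one value, as with a ceiling effect, the box length can be zero and every case off the box is then flagged as an extreme.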
Figure 2. Box plot of disability rating scale scores against category on the GOSE.
There were significant correlations between the GOS scales and eight of the 12 neuropsychological tests (table 1). To determine if significant relations were due to differences in premorbid IQ, partial correlation coefficients were computed controlling for NART. There were significant (p<0.05) correlations between the GOS in original and extended forms and the results obtained from the COWAT, symbol digit modalities test, grooved pegboard dominant and non-dominant hands, logical memory immediate and delayed; and between the GOSE and paired associates delayed recall. Thus associations between outcome classified by the GOS and the results of cognitive tests do not simply reflect preinjury IQ.
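The idea of controlling for premorbid IQ can be illustrated with the standard first order partial correlation formula, which removes the part of an association that both variables share with a third. This is a sketch only: the source does not state how its partial coefficients were computed, and the function name and input values are hypothetical.

```python
from math import sqrt


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y with z held constant.

    r_xy: correlation of outcome rating with a cognitive test
    r_xz: correlation of outcome rating with the covariate (e.g. NART)
    r_yz: correlation of the cognitive test with the covariate
    """
    denominator = sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denominator == 0:
        raise ValueError("covariate is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denominator


if __name__ == "__main__":
    # Made-up coefficients, not values from table 1.
    print(round(partial_correlation(r_xy=0.35, r_xz=0.20, r_yz=0.40), 3))
```

If the coefficients supplied are rank correlations, the result is a partial rank correlation. A partial coefficient that stays clearly above zero indicates that the association is not explained by the covariate alone.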
SUBJECTIVE PERCEPTION OF HEALTH OUTCOME
The relation between assignments on the GOSE and the score obtained using the Beck depression inventory is shown in figure 3. On the GOSE scale, the median values of the Beck scores for the groups are appropriately ordered, apart from the scores reported from the most severely disabled group. This reflects a wide range of values in the lower category of SD, which contained those reporting the highest levels of depressive symptomatology, but also contained patients reporting relatively little affective disturbance.
Figure 3. Box plot of Beck depression inventory scores against category on the GOSE.
The allocation on the GOS correlated significantly with scores on the general health questionnaire, and with results for each subscale of the SF-36. Standardised scores for the SF-36 were calculated using the norms provided by Jenkinson et al.18 Mean scores of groups in each category of the GOSE for each subscale are illustrated in fig 4. The mean scores of the groups are generally appropriately ordered for each subscale; there was a particularly strong relation between the rating on the GOSE and social functioning.
Figure 4. SF-36: mean subscale scores of groups in each category of the GOSE. Error bars represent 1 SD.
HEAD INJURY SYMPTOMS AND PROBLEMS
Items in the NFI were grouped into six subscales and mean ratings were calculated. The relation between the GOSE category and the relatives' NFI score is shown in fig 5, and the patients' NFI in fig 6. The figures illustrate consistent associations between NFI subscale scores and the GOSE ratings, particularly for the NFI scores based on reports from relatives or friends. Comparison of figs 5 and 6 indicates that the overall frequency of reports of problems by patients and relatives was similar.
Figure 5. Relatives' NFI: mean subscale scores of groups in each category of the GOSE. The minimum NFI score is 1 and the maximum is 4. Error bars represent 1 SD.
Figure 6. Head injured participants' NFI: mean subscale scores of groups in each category of the GOSE. Error bars represent 1 SD.
Table 1. Spearman correlations between the GOS and GOSE and demographic, clinical, and outcome variables
Table 2. Duration of post-traumatic amnesia (PTA) and the GOSE
The results of this study show good concordance between the GOS scales and injury severity, rating of disability, the results of cognitive testing, measures of the perception of health, and symptoms reported by people with head injury and their relatives. The findings thus strongly support the validity of assigning the GOS using the structured interview,2 both for the original five category scale and for eight categories in the GOSE. The structured interview specifies criteria for subdividing upper and lower bands of the top three outcome categories, rather than simply depending on the judgement of the rater as originally proposed.22 We focused on outcome at 6 months, which is often chosen as a follow up period for assessing outcome in studies of patients with head injury.22 At this stage after injury the severity of the initial injury is still quite closely related to the extent of sequelae. However, adaptation and adjustment continue beyond 6 months, and this may change the relations between the GOS and cognitive impairment7 and emotional state.23
The DRS score also showed a strong correlation with the GOS rating, but the results confirm that DRS grades show a ceiling effect when compared with the GOS.11 The results support the view that the DRS may be of value in monitoring the progress of severely disabled patients,10 but is of less value in rating higher levels of outcome. Similarly, the results show that the Barthel index may be of use in subdividing the severely disabled group but it is of no value in discriminating the status of most survivors of head injury.
The current findings are in accord with previous studies of the relation between outcome categories on the GOS and evidence of neuropsychological impairment that suggested a significant but modest relation.6 7 24 The finding that cognitive impairment is not more strongly related to social disability after head injury is in line with several recent reports that cognitive status does not have a major influence on disability and handicap.25 26
The comparisons with the results from the Beck, SF-36, and general health questionnaires showed good general agreement between these subjective measures of health outcome and the GOS and GOSE. Previous work has shown a strong association between the GOS and affective state 6 months after injury.23 The results show particularly pronounced separation of GOSE categories on the social functioning subscale of the SF-36. This strongly supports the view that the GOS does not simply reflect physical disability but also encompasses social limitations after head injury. The GOS thus successfully captures aspects of outcome which are significant for emotional adjustment and quality of life.
Boake and High8 report examples of cases illustrating a dissociation between disability and quality of life after head injury. A severely disabled person may be supported financially and become adjusted emotionally to loss of independence; on the other hand, a moderately disabled person may find themselves in poor circumstances and be much more distressed. These cases raise the issue of whether support for severely disabled patients typically leads to better emotional adjustment in these patients than in moderately disabled patients. This is not supported by findings for our sample of patients, although there may be individual exceptions. The findings also contradict the common belief that patients with head injury often lack insight into their difficulties. If lack of insight were common, there should be little or no relation between emotional state and the extent of disability. By contrast, we found that the overwhelming trend is for greater disability and handicap to be associated with poorer subjective outcome. Only in the lower severely disabled category was it possible that a substantial proportion of our patients may have shown loss of insight or paradoxical euphoria.
Appropriate use of the GOS depends on having a clear conception of the strengths and limitations of the scale. The GOS provides an overall measure of social changes due to head injury, and does not provide a detailed assessment of impairment and disability. Simplification is achieved by using a core set of roles to describe major aspects of people's lifestyles, including ability to manage their own affairs, employment, social and leisure activities, and close relationships. These roles are readily understood by the patient and, in conscious survivors, changes in these roles are used to assess the impact of impairment and disability caused by head injury. It should be borne in mind that the GOS is primarily intended to describe outcome in groups of cases, and is not of value, for example, in the individual assessment necessary for rehabilitation or treatment of specific problems related to head injury. The current comparison between GOS and GOSE ratings and other measures supports the validity of the scales. The results also support the appropriateness of the GOS as an overall summary measure of outcome after head injury. The assessment is relevant to differentiating the sequelae of different injuries, and provides a link between the specific sequelae and effects on wellbeing and lifestyle.
The study was supported by a project grant from the Chief Scientist Office, Scottish Home and Health Department.
Methodological shortcomings of trials to date and the failure to translate
For clinical trials in TBI, validated tools with which to measure entry severity and outcome – namely the Glasgow Coma Scale (Teasdale and Jennett, 1974) and the Glasgow Outcome Scale (Jennett et al., 1976; Wilson et al., 1998) – only became available in the 1970s (Maas et al., 2010a). A systematic literature review from 1980 to 2009 demonstrated that there were 27 large phase III studies for TBI with an additional six trials unpublished (Narayan et al., 2002; Maas et al., 2007; Margulies and Hicks, 2009; Maas et al., 2010a), some of which were discussed above. No agent has as yet demonstrated sufficient efficacy for US FDA approval and therefore none has been successfully translated to application in human TBI.
Pessimism surrounding these failures has led to a decrease in TBI studies since the mid-1990s (Maas et al., 2010a). Moreover, there has been a shift away from the study of neuroprotective drugs to the study of therapeutic strategies such as hypothermia or decompressive craniectomy (Maas et al., 2010a). The neurotrauma community has responded with appropriate introspection aimed at improving the chances of future success. Indeed, a 2000 NINDS workshop was convened to try to prevent repeating mistakes of the past (Narayan et al., 2002).
It is likely that the many hundreds of therapeutic agents which have shown benefit in preclinical trials have failed to show efficacy for a variety of reasons (Reinert and Bullock, 1999; Doppenberg et al., 2004; Margulies and Hicks, 2009). A number of potential explanations are summarized here and in Table 47.2; as in stroke trials, several causes likely contribute to the problem to some degree (Table 47.3).
Limitations of animal models
It is important to consider carefully the limitations of current animal models of TBI. The discrepancy with the clinical situation is marked: investigators do not injure animals severely enough to produce coma requiring the ICU-level ventilation and critical care needed by humans with severe injuries (Narayan et al., 2002; Tolias and Bullock, 2004). TBI with associated coma has been explored in a small number of large animal models with some success but these lack the extensive characterization and validated outcome measures inherent to rodent studies (Smith et al., 2000). The rodent lateral fluid percussion injury model is the most frequently used TBI model worldwide, though rodent models of controlled cortical contusion or weight-drop have also been widely used (Povlishock et al., 1994). Ideally, an agent should demonstrate efficacy in numerous preclinical injury models rather than in a single rodent strain subjected to the same form of injury (Tolias and Bullock, 2004).
Though many agents were extensively characterized prior to initiation of human clinical trials, it is clear that this characterization was universally insufficient. Some agents did not demonstrate efficacy at a clinically relevant time point, such as Cerestat, which was never tested later than 15 minutes post-insult in the animal model (Tolias and Bullock, 2004; Maas et al., 2010a). This might account not only for the lack of efficacy but also for the unexpected toxicity encountered in some trials (Selfotel, Cerestat, hypothermia) (Tolias and Bullock, 2004). Demonstration of acceptable blood–brain barrier penetration is also important but often overlooked in preclinical studies (Tolias and Bullock, 2004; Maas et al., 2010a). It is also possible that some therapeutic agents are more important for certain TBI subsets or more effective at certain postinjury time points (Saatman et al., 2008). Identifying more precisely the patients likely to benefit from particular drugs, and targeting these patients in clinical trials, is an important goal (Doppenberg et al., 2004). Moreover, the use of combined therapies which act by blocking different components of the secondary injury cascade important at different times after injury may be prudent (Faden, 2001; Margulies and Hicks, 2009).
There has been much debate about the utility and necessity of testing experimental therapeutics in gyrencephalic large animal models of TBI. Prior to human testing it would be helpful for experimental agents to demonstrate robust efficacy in a gyrencephalic brain. Indeed, the Stroke Treatment Academic Industry Roundtable (STAIR) committee has recommended that agents being tested for efficacy in acute stroke demonstrate benefit in primate models prior to human testing (Cook and Tymianski, 2012) (Table 47.3). Such studies would, however, be extremely costly and it is uncertain if they predict success in humans to a greater degree than rodent models. Indeed, the neuroprotectant drug NXY-059 demonstrated benefit in a primate stroke model but failed to show benefit in a recent human clinical trial (Marshall et al., 2003; Shuaib et al., 2007; Cook and Tymianski, 2012). To date no gyrencephalic animal model has made any major contribution to evaluation of neuroprotectant drugs for TBI and to our knowledge none of the agents that reached phase III testing for TBI were tested in primates.
Pathomechanistic primate studies done in the 1980s allowed a dose-response concept to be proposed for magnitude and duration of deceleration force in relation to the extent of DAI lesions (Gennarelli et al., 1982; Margulies et al., 1990; Maxwell et al., 1993). We believe these to be the only primate studies that have advanced understanding of TBI, however. Literature to date does not strongly support the use of primate models for the evaluation of putative neuroprotective drugs for TBI. Primate models of TBI have been infrequently studied, while rat models offer excellent concordance between the distribution and magnitude of histologic damage – especially hippocampal – and behavior. This is especially true of Morris water maze performance after TBI, as shown in the fluid percussion injury model (Dixon et al., 1988) and the weight drop injury model of Marmarou (Foda and Marmarou, 1994). These rat models have also shown excellent sensitivity for the evaluation of neuroprotective drugs, as discussed elsewhere in this volume.
Heterogeneity of the traumatic brain injury population
Without question, one of the most important barriers to successful translation of therapeutics has been the tremendous heterogeneity of the human TBI population (Saatman et al., 2008). Multicenter trials to date have clearly demonstrated tremendous heterogeneity in TBI, as well as significant differences in the studied populations related to admission policies, population coverage, age and injury mix, and treatment protocols (“type I error” effects). Some of this heterogeneity also involves alcohol and drug use, comorbidities, polytrauma, and genetics which are additional influences on the clinical course following TBI (Maas et al., 2007). The resulting intercenter variability and population heterogeneity in human trials (Clifton et al., 2001b; Hukkelhoven et al., 2002) is in stark contrast with the fact that therapeutic agents tested first proved themselves in homogeneous populations of animals, often with a single model or species and strain employed (Maas et al., 2010a). In preclinical experimental studies, genetically similar animals, often of the same sex, are subjected to highly uniform TBI in an effort to maximize the chances of detecting a therapeutic effect between groups. There has been growing concern among preclinical scientists who evaluated therapies in rodent models of TBI that these models may be fundamentally incapable of effectively reproducing the complexity of severe human TBI, which is characterized by multiple interacting pathomechanisms within the same patient at the same or at different times.
Indeed, the pattern of injury in human TBI patients enrolled in clinical trials is extremely complex and highly variable by comparison. It must also be recognized that the brain has been described as the most complex structure in the known universe and the same injury processes can have markedly different consequences when acting in brain regions located only millimeters away from each other. Interacting with the incredible complexity of the brain are the multitude of common TBI lesions which consist of extra-axial hematomas (subdural and epidural), brain contusion and diffuse axonal injury which often coexist to varying degrees. Because of this complexity, human TBI has traditionally been classified merely by severity. It is thus understandable that a markedly larger sample size of human patients would be required in a human study, even if a “single” subset of human TBI subjects were to be studied (such as hematomas or contusions).
We must also consider that animal studies typically employ standardized time points for administration of experimental agents. Such an approach in humans is not possible given variability inherent to transport times and uncertainty related to the time of injury. Thus it is probable that many human TBI studies have been markedly underpowered in the context of their heterogeneity.
Insufficient sample size
Heterogeneity markedly increases the sample size needed to demonstrate a clinical effect. In this context it is generally felt that the great majority of trials in TBI have been grossly underpowered and many may have been subject to type II error. Indeed, Dickinson et al. (2000) demonstrated that out of 203 human TBI trials none was large enough to reliably detect a 5% absolute reduction in the risk of death or disability, and only eight were large enough to detect an absolute reduction of 10%. Novel, more sensitive statistical approaches have been proposed to analyze clinical trials, as will be discussed. An alternate approach is to accrue an extremely large sample size (Tolias and Bullock, 2004). The CRASH trials are an example of the “mega-trial” approach to dealing with heterogeneity (Ghajar and Hesdorffer, 2004); however, these are so resource intensive that they are not feasible for routine application.
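To give a sense of scale, the sketch below applies the usual normal approximation for comparing two proportions. All inputs are assumptions chosen for illustration (a 50% control risk of death or disability, two sided alpha of 0.05, 80% power); they are not the criteria used by Dickinson et al., and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def patients_per_group(p_control, absolute_reduction, alpha=0.05, power=0.80):
    """Approximate patients needed per arm to detect a drop in event risk.

    Normal approximation for two independent proportions, unpooled variance.
    """
    p_treated = p_control - absolute_reduction
    if not 0 < p_treated < p_control < 1:
        raise ValueError("risks must satisfy 0 < treated < control < 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil((z_alpha + z_beta) ** 2 * variance / absolute_reduction ** 2)


if __name__ == "__main__":
    for reduction in (0.10, 0.05):
        n = patients_per_group(p_control=0.50, absolute_reduction=reduction)
        print(f"absolute reduction {reduction:.0%}: about {n} patients per arm")
```

Because the required number scales with the inverse square of the effect size, halving the detectable reduction roughly quadruples the sample size, which is why few trials can detect a 5% absolute difference.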
Questionable experimentation/result reporting
A growing literature suggests that many published scientific findings are not readily reproducible (Laine et al., 2007). It is possible that this stems from type I error or from biased experimentation or data interpretation. Although some of the agents explored in TBI trials appear to be supported by robust preclinical data, it is important to consider that some tested agents may not be as efficacious as reported. Indeed, it has been suggested that, in some instances, source document verification, independent outcome observer blinding verification, and second laboratory verification of animal studies should be mandatory before deployment of millions of dollars in clinical trials (Maas et al., 2010a).