Aaron Tansy, MD

Duffy L, Gajree S, Langhorne P, Stott DJ, Quinn TJ. Reliability (Inter-rater Agreement) of the Barthel Index for Assessment of Stroke Survivors: Systematic Review and Meta-analysis. Stroke. 2013; 44:462-468.

The Barthel Index (BI), a ten-item measure of activities of daily living, is a frequently used outcome measure in stroke clinical practice and research. Its pervasive use gives it considerable influence within the stroke community, perhaps nowhere more so than in research, where it often determines the success or failure of potential therapies.

Surprisingly, then, whether the BI is reliable across a diverse range of observers in this patient population had not been formally examined until now. In an upcoming issue of Stroke, Laura Duffy and colleagues take the first stab at establishing whether the BI stands strong or wilts under scrutiny as a bona fide assessment of functional recovery in stroke.

The authors approached this task in three steps. First, they briefly laid their groundwork, based in classical test theory, for the facets that most define a clinical measure as robust and reliable: for a metric like the BI, which is used in clinical trials with multiple observers, low inter-rater variability is cited as the most important. Next, they conducted an exhaustive electronic database search for titles that met a strict inclusion criterion, from which they culled a short list of original stroke clinical research (10 studies; n=543 subjects) in which the BI was administered via equivalent interview methods and for which inter-rater reliability data were reported. Lastly, after performing quality and risk-of-bias assessments on the studies of interest, they conducted a meta-analysis of BI inter-observer variability.
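For readers less familiar with how inter-rater agreement on an ordinal scale is quantified, the sketch below computes Cohen's weighted kappa for two raters. It is an illustration only, not the authors' code: the function name `weighted_kappa`, the choice of quadratic weights as the default, and the example scores are all assumptions made here.

```python
def weighted_kappa(rater_a, rater_b, categories, weights="quadratic"):
    """Cohen's weighted kappa for two raters scoring the same subjects.

    `categories` lists the ordinal score levels in ascending order.
    `weights` is "quadratic" or "linear" (illustrative choice, not from the paper).
    """
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must score the same subjects")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Cross-tabulate the two raters' scores.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1

    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            # Disagreement weight grows with the distance between categories.
            distance = abs(i - j) / (k - 1)
            w = distance ** 2 if weights == "quadratic" else distance
            observed_disagreement += w * observed[i][j]
            # Count expected by chance if the raters scored independently.
            expected_disagreement += w * row_totals[i] * col_totals[j] / n

    return 1.0 - observed_disagreement / expected_disagreement


# Hypothetical scores for one item rated 0, 5, or 10 by two observers.
scores_a = [0, 5, 10, 10, 5, 0, 10, 5]
scores_b = [0, 5, 10, 5, 5, 0, 10, 10]
print(weighted_kappa(scores_a, scores_b, categories=[0, 5, 10]))
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance. The weighting means a one-category disagreement is penalized less than a two-category one.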

Results indicated that inter-rater reliability was excellent (weighted kappa [κw]: 0.95, 95% CI: 0.94-0.96 with fixed-effects modeling; κw: 0.93, 95% CI: 0.90-0.96 with random-effects modeling). Furthermore, sub-group analysis suggested that BI training may improve reliability between raters, although this trend was not statistically significant. Taken together, these findings suggest that the BI is a robust and reliable measure of functional recovery after stroke.
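To illustrate why the fixed-effects and random-effects estimates differ, here is a minimal sketch of generic inverse-variance pooling. It assumes the DerSimonian-Laird estimator for between-study variance and pools estimates on their raw scale; the commentary does not say which estimator or transformation the authors used, and the function name `pool_estimates` and the input numbers are hypothetical.

```python
import math


def pool_estimates(estimates, standard_errors):
    """Pool per-study estimates with fixed- and random-effects weights.

    Uses inverse-variance weighting and the DerSimonian-Laird estimate of
    between-study variance (tau^2); both are assumptions of this sketch.
    """
    k = len(estimates)
    w = [1.0 / se ** 2 for se in standard_errors]

    # Fixed-effects: every study is assumed to estimate one common value.
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    fixed_se = math.sqrt(1.0 / sum(w))

    # Cochran's Q measures how far the studies scatter around the fixed estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects: between-study variance is added to each study's variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in standard_errors]
    random = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    random_se = math.sqrt(1.0 / sum(w_re))

    return {
        "fixed": fixed,
        "fixed_ci": (fixed - 1.96 * fixed_se, fixed + 1.96 * fixed_se),
        "random": random,
        "random_ci": (random - 1.96 * random_se, random + 1.96 * random_se),
        "tau2": tau2,
    }


# Hypothetical per-study kappa values and standard errors (not from the paper).
result = pool_estimates([0.80, 0.95, 0.99], [0.01, 0.01, 0.01])
print(result)
```

When studies disagree more than sampling error alone would explain, tau^2 is positive and the random-effects interval widens. That is the same pattern as in the reported results, where the random-effects confidence interval is broader than the fixed-effects one.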

Despite its favorable results, this meta-analysis had some limitations, largely with respect to its data set. For one, there was significant clinical heterogeneity across studies with respect to BI observers, populations studied, and numbers examined. Moreover, only 2 of the 10 included studies (20%) were rated as high quality on the authors’ own chosen index of quality and risk of bias. So, does this study herald the BI’s imminent overtaking of the spotlight from the modified Rankin Scale (mRS) in stroke? It seems unlikely. The two indices measure different functional parameters and have different inherent limitations, which supports their combined use. And finally, both now have a little evidence to back them up.