Year: 2015 | Volume: 6 | Issue: 4 | Page: 178-180
Sources of bias in diagnostic accuracy studies
Areej Abdul Ghani Al Fattani1, Abdulla Aljoudi2
1 Department of Pediatrics, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
2 Department of Family and Community Medicine, College of Medicine, King Fahd Hospital of the University, University of Dammam, Saudi Arabia
Date of Web Publication: 16-Dec-2015
Source of Support: None, Conflict of Interest: None
Diagnostic accuracy studies are known to have different sources of bias. Often these biases are not stated clearly in papers and can distort the conclusions. Because pathologists are producers, readers, or consumers of diagnostic accuracy studies, it is essential for them to know the types of bias that can affect the outcome of a test. This article spotlights a few common sources of bias and their effects on the validity and reliability of the results.
Keywords: Accuracy, bias, diagnostic, spectrum, verification
How to cite this article: Al Fattani AA, Aljoudi A. Sources of bias in diagnostic accuracy studies. J Appl Hematol 2015;6:178-80.
Introduction
In a classical diagnostic accuracy study, the investigator includes a consecutive group of patients suspected of having a disease condition. All are tested by the index test under investigation, and then all are tested by a reference test (the gold standard). The results of the two tests are read independently, each interpreter blinded to the results of the other test. The measures of diagnostic accuracy, such as sensitivity and specificity, are then calculated. Deficiencies in this classical design could arise from either bias or variation (also called random error). Bias refers to a systematic difference between the observed measurements and the true value. Unlike variation, which may limit generalizability, bias affects validity, and it will not be reduced or balanced out by repetition or by increasing the sample size, because it is related to the study design.
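The classical design yields a 2 × 2 table of index-test results against the reference standard, from which these measures follow directly. A minimal sketch, using invented counts for illustration:

```python
# Hypothetical 2x2 table: index test vs. reference (gold standard) test
tp, fp = 90, 30    # index positive: disease present / disease absent
fn, tn = 10, 170   # index negative: disease present / disease absent

sensitivity = tp / (tp + fn)  # proportion of diseased patients detected
specificity = tn / (tn + fp)  # proportion of non-diseased patients cleared

print(f"sensitivity = {sensitivity:.2f}")  # 0.90
print(f"specificity = {specificity:.2f}")  # 0.85
```
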
In their systematic review, Whiting and colleagues summarized many variations and sources of bias from an analysis of 10 diagnostic accuracy studies. This work has been complemented by the empirical study of Rutjes and colleagues on 31 meta-analyses comprising 487 primary diagnostic accuracy studies. They reported 15 potential sources of bias; only one study had no design deficiencies, and most studies were poorly reported, which made it difficult to assess sources of bias. In both of these studies, three design features consistently appeared as common sources of bias. The first was spectrum bias, where patients with and without the target disease were poorly selected. The second was verification bias, where only patients who tested positive with the index test were tested by the gold standard test. The third was lack of blinding among interpreters of the results of the gold standard and index tests.
Spectrum bias appears when the sensitivity and specificity of a test differ across different mixes of patient populations. Populations might vary by sex, age, and severity of disease, for example. Spectrum bias can arise from two factors: a different mix of cases and a different mix of controls. Among cases, the researcher may have different stages or grades of the target disease, which may affect the overall accuracy when compared with similar studies that sampled two extreme groups, severe versus mild cases. Such selection can inflate the apparent accuracy of the diagnostic test. Similarly, among controls, the researcher may recruit incongruent controls with different underlying diseases or comorbidities that affect the target disease in some way. The point here is that advanced disease is more detectable than early-stage disease because of its evident signs and symptoms. We would expect the diagnostic accuracy of an index test in highly suspected patients to be greater than in patients with lower suspicion. That explains why some case-control studies tend to report higher test accuracies than cohort studies for a single disease. Thus, studies that examine the same index test in populations of different disease severity might not be comparable. In their report, Rutjes et al. found the highest estimates of bias in diagnostic studies that compared severe cases with healthy controls (relative diagnostic odds ratio [RDOR] 4.9; 95% confidence interval [CI]: 0.6-37.3).
Case Report
Dr. Hasan sees a 12-year-old girl complaining of a sore throat, without cough, fever, or tonsillar exudate. Group A beta-hemolytic Streptococcus (GABHS), which may cause such symptoms, comes to his mind. Should he start treatment for this patient, or should he request further investigation, such as a throat swab? He consulted the literature and found a study by Dagnelie et al., who reported a prevalence of 33% and a positive likelihood ratio (LR+) of 1.8 (posttest probability 47%) for GABHS in patients with similar features attending family medicine clinics. He also found the study by McIsaac et al., who studied a group of patients with similar features, but in a population of children (3-17 years old). In the latter study, the prevalence was 34% and the LR+ was 4.1 (posttest probability 68%) in patients from general practice.
The likelihood ratio links the pretest probability (disease prevalence) and the posttest probability, to the extent that, even in similar settings, with similar features and a similar prevalence of the disease, the prediction measures behave differently. The LR+ for those presenting to primary health care with a sore throat is 1.8, compared with 4.1 in the subgroup of children. This difference has an impact on the diagnostic decision. [Table 1] shows the differences in diagnostic accuracy measures between the two studies in patients with a sore throat.
Table 1: The diagnostic accuracy measures of the two studies in patients with a sore throat (Willis BH, 2008)
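As a sketch of the arithmetic behind these figures, the posttest probability follows from converting the pretest probability to odds, multiplying by the LR+, and converting back; the 47% and 68% values quoted above can be reproduced this way:

```python
def posttest_probability(pretest_prob, lr_positive):
    """Convert a pretest probability to a posttest probability via the LR+."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    posttest_odds = pretest_odds * lr_positive
    return posttest_odds / (1 + posttest_odds)

# Dagnelie et al.: prevalence 33%, LR+ 1.8
print(round(posttest_probability(0.33, 1.8), 2))  # 0.47
# McIsaac et al.: prevalence 34%, LR+ 4.1
print(round(posttest_probability(0.34, 4.1), 2))  # 0.68
```
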
In summary, the right population for a diagnostic accuracy study includes (1) uncertain cases, in whom the test will be used to resolve uncertainty; (2) patients with the disease who have a wide spectrum of severity; and (3) patients without the disease, but with symptoms commonly associated with it. The referral pattern is an important determinant of spectrum bias and should be examined when comparing test performance. With each referral step, some cases are removed, which narrows the spectrum of disease by filtering out the easy cases and subsequently affects the diagnostic accuracy.
Partial verification bias is another type of bias in diagnostic accuracy studies, related to the reference test and the outcome. All those who are tested by the index test should be tested by the reference test (gold standard), regardless of the index test result. Failure to do so can bias the accuracy estimates; this is called partial verification bias. It is common when the reference test is invasive or expensive, so that only cases with positive index results are examined by the reference test. For example, suppose a set of 1000 patients suspected of lower extremity thrombosis is selected to estimate diagnostic accuracy, with D-dimer as the index test and magnetic resonance venography (MRV) as the reference test. Because of the high cost of MRV, not all patients undergo verification. [Table 2] shows the difference in diagnostic performance between the set observed in practice and the actual results (i.e., those that would have been obtained had all patients been verified).
Table 2: Illustrative example of the effect of partial verification bias on diagnostic performance
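The mechanism can be sketched with invented counts (not those of Table 2): verifying every index-positive patient but only a fraction of index-negatives removes most false negatives from the analysis, inflating the observed sensitivity and depressing the observed specificity.

```python
# Hypothetical full cohort of 1,000 suspected patients (counts are invented)
tp, fp = 160, 40   # index-test positive: disease present / absent
fn, tn = 40, 760   # index-test negative: disease present / absent

full_sens = tp / (tp + fn)  # 0.80 with complete verification
full_spec = tn / (tn + fp)  # 0.95 with complete verification

# Partial verification: every index-positive is verified, but only 10% of
# index-negatives are sent to the expensive reference test
verified_fn = int(fn * 0.10)  # 4 of the 40 false negatives are observed
verified_tn = int(tn * 0.10)  # 76 of the 760 true negatives are observed

obs_sens = tp / (tp + verified_fn)           # ~0.98, falsely inflated
obs_spec = verified_tn / (verified_tn + fp)  # ~0.66, falsely decreased

print(f"true sens {full_sens:.2f} vs observed {obs_sens:.2f}")
print(f"true spec {full_spec:.2f} vs observed {obs_spec:.2f}")
```
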
As noted above, partial verification bias can be eliminated by applying the gold standard test to all cases. As this may not be practical or ethical, one alternative is to verify the remaining cases with a different reference test. The issue here is the difference between the accuracies of the two reference tests. The group of cases referred to the inferior reference test is susceptible to a bias called differential verification bias. Both types of verification bias falsely increase the estimated sensitivity and may or may not decrease the specificity, owing to the positive correlation between the index and reference tests. Using different reference tests for patients testing positive and negative on the index test, compared with using one reference test for all patients, leads to an overestimation of diagnostic performance (RDOR 2.2; 95% CI: 1.5-3.3).
Diagnostic review bias occurs when the interpretation of the reference test depends on the index test results; it particularly weakens the results of retrospective studies. For example, the pathologist interpreting the final report of a bone marrow biopsy may be aware of the complete blood count differential. If the differential shows some blasts, this might make the pathologist search more carefully for evidence of leukemia. Although clinically it is important to use all available information when making a diagnosis, this weakens diagnostic accuracy studies. That is why considering and reporting blinding in diagnostic studies is very important; in practice, however, it is reported quite poorly.
Conclusion
The evidence from well-conducted systematic reviews shows the importance of careful design in diagnostic accuracy studies. Studies of the same test can yield different results depending on design choices. Sources of bias in diagnostic studies should be considered by investigators when designing primary studies, by reviewers, and by readers. Good reporting of such studies, alongside sound design, will provide reliable, reproducible, and robust estimates. A very helpful tool for standardizing diagnostic reporting is STARD (Standards for Reporting of Diagnostic Accuracy); the checklist and further information on the STARD initiative are available at http://www.equator-network.org/reporting-guidelines/stard/.
Financial Support and Sponsorship
None.
Conflicts of Interest
There are no conflicts of interest.
References
Pepe MS. The Statistical Evaluation of Medical Tests for Classification and Prediction. United States: Oxford University Press; 2003.
Whiting PF, Rutjes AW, Westwood ME, Mallett S, et al. A systematic review classifies sources of bias and variation in diagnostic test accuracy studies. J Clin Epidemiol 2013;66:1093-104.
Rutjes AW, Reitsma JB, Di Nisio M, Smidt N, van Rijn JC, Bossuyt PM. Evidence of bias and variation in diagnostic accuracy studies. CMAJ 2006;174:469-76.
Furukawa TA, Guyatt GH. Sources of bias in diagnostic accuracy studies and the diagnostic process. CMAJ 2006;174:481-2.
Goehring C, Perrier A, Morabia A. Spectrum bias: A quantitative and graphical analysis of the variability of medical diagnostic test performance. Stat Med 2004;23:125-35.
Willis BH. Spectrum bias – Why clinicians need to be cautious when applying diagnostic test studies. Fam Pract 2008;25:390-6.
Dagnelie CF, Bartelink ML, van der Graaf Y, Goessens W, de Melker RA. Towards a better diagnosis of throat infections (with group A beta-haemolytic Streptococcus) in general practice. Br J Gen Pract 1998;48:959-62.
McIsaac WJ, Kellner JD, Aufricht P, Vanjaka A, Low DE. Empirical validation of guidelines for the management of pharyngitis in children and adults. JAMA 2004;291:1587-95.
Montori VM, Wyer P, Newman TB, Keitz S, Guyatt G; Evidence-Based Medicine Teaching Tips Working Group. Tips for learners of evidence-based medicine: 5. The effect of spectrum of disease on the performance of diagnostic tests. CMAJ 2005;173:385-90.
Schmidt RL, Factor RE. Understanding sources of bias in diagnostic accuracy studies. Arch Pathol Lab Med 2013;137:558-65.
Lijmer JG, Mol BW, Heisterkamp S, Bonsel GJ, Prins MH, van der Meulen JH, et al. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 1999;282:1061-6.
[Table 1], [Table 2]