Volume 58, Issue 6, pages 2142–2152, December 2013
Christopher Koh, Xiongce Zhao, Niharika Samala, Sasan Sakiani, T. Jake Liang, Jayant A. Talwalkar
Article first published online: 18 OCT 2013
© 2013 by the American Association for the Study of Liver Diseases
The American Association for the Study of Liver Diseases (AASLD) practice guidelines provide recommendations for diagnosing and managing patients with liver disease based on available scientific evidence combined with expert consensus opinion. The aim was to systematically review the evolution of recommendations from AASLD guidelines and identify gaps limiting the evidence-based foundations of these guidelines. Initial and current AASLD guidelines published from January 1998 to August 2012 were reviewed. The AGREE II instrument was used to evaluate rigor and transparency of guideline development. The number of recommendations, distribution of grades (strength or certainty), classes (benefit versus risk), and types of recommendations were evaluated. Whenever possible, multiple versions were evaluated for evolving scientific evidence. A total of 991 recommendations from 28 guidelines on 17 topics were evaluated. From initial to current guidelines, the total number of recommendations increased by 36% (512 to 699). The largest increases were from chronic hepatitis B virus (HBV) (+71), liver transplantation (+53), and autoimmune hepatitis (AIH) (+27). The largest share of current recommendations is grade II (44%), and fewer than 20% are grade I. The AGREE II evaluation showed global improvement in guideline quality. The HBV and chronic hepatitis C guidelines had the greatest increases in grade I recommendations (+383% and +67%, respectively). The greatest increases in treatment recommendations were from HBV (grade I, +1,150%), liver transplantation (grade II, +112%), and AIH (grade III, +105%). Conclusion: Despite significant increases in the numbers of recommendations within AASLD practice guidelines over time, only a minority are supported by grade I evidence, highlighting the need for developing well-designed investigations to provide evidence for areas of uncertainty and improving the quality of future guidelines in hepatobiliary diseases. (Hepatology 2013; 58:2142–2152)
Abbreviations
AASLD, American Association for the Study of Liver Diseases
ACC, American College of Cardiology
AHA, American Heart Association
AHRQ, Agency for Healthcare Research and Quality
ALF, acute liver failure
GRADE, Grading of Recommendation Assessment, Development, and Evaluation
HBV, hepatitis B virus
HCV, hepatitis C virus
IOM, Institute of Medicine
NAFLD, nonalcoholic fatty liver disease
PBC, primary biliary cirrhosis
PSC, primary sclerosing cholangitis
TIPS, transjugular intrahepatic portosystemic shunt
Clinical practice guidelines are systematically developed statements that attempt to synthesize large amounts of available scientific information for providing best practices to healthcare providers. These statements often represent the official opinion of single or multiple professional societies and are developed by individuals recognized for their expertise and contributions to the field. Topics often covered include conditions (diseases, signs, and symptoms) and technologies (diagnostic tests and therapeutic procedures) where recommendations about preferred approaches for patient management are provided. The creation of recommendations is often based on a formal review and analysis of the published literature along with weighing the strength of the available scientific evidence. In situations where the data are inconclusive or absent, recommendations are often based on consensus expert opinion.
Internationally, more than 3,700 clinical practice guidelines from 39 countries are identified within the Guidelines International Network database. In the U.S., there are over 2,300 guidelines registered within the National Guideline Clearinghouse, which is supported by the Agency for Healthcare Research and Quality (AHRQ). Given the variability in breadth and depth among available clinical practice guidelines, the U.S. Congress has identified the importance of establishing rigorous processes for developing trustworthy, consistent, and scientifically valid documents. In turn, the Institute of Medicine (IOM) released eight standards for the development of clinical practice guidelines in March 2011.[4] Within the framework of the IOM's recommendations, there has been little systematic review of the body of clinical practice guidelines put forth by various medical societies. Recently, clinical practice guideline catalogs from the American College of Cardiology (ACC)/American Heart Association (AHA) and all endocrinology guidelines published in North America from 2007 to 2010 have been examined.[5, 6]
The field of hepatology has experienced significant growth in the production of relevant scientific literature over the past few decades. However, the question of whether clinical practice guidelines have truly evolved with more evidence-based recommendations has not been systematically investigated. Thus, we performed a systematic review of the American Association for the Study of Liver Diseases (AASLD) clinical practice guidelines issued from January 1998 to August 2012 with the aim of evaluating the evolution of recommendations that have been issued over time. The ultimate goal was to evaluate the methodological rigor and quality of reporting of AASLD guidelines, elucidate possible gaps that limit the use of evidence-based medicine to support certain recommendations within the AASLD guidelines, and highlight potential opportunities for improvement.
Materials and Methods
All initial published versions of the AASLD practice guidelines for a given topic issued from January 1998 to August 1, 2012 were abstracted for data.[7-23] If available, the current updated version for each topic was also evaluated.[18, 24-34] Current AASLD guidelines are defined as the most recently published document on a specific topic posted on the AASLD website as of August 1, 2012 (http://www.aasld.org). For this investigation, only complete clinical practice guidelines and position papers were evaluated; focused updates were therefore not included.
Evaluation of Methodological Rigor and Transparency
To evaluate the evolutionary process of guideline development and quality of reporting, the Appraisal of Guidelines for Research and Evaluation II (AGREE II) instrument was used on all comparable guidelines and position papers. The AGREE II has been widely used in the assessment of methodological rigor and transparency of guideline development and has been cited for its validity and reliability. Briefly, this tool evaluates 23 items organized into six domains (scope and purpose, stakeholder involvement, rigor of development, clarity of presentation, applicability, and editorial independence), followed by two global rating items (overall assessment), and includes a user manual that provides guidance on rating each item. The scope and purpose domain evaluates the specific health questions covered by the guideline, target population, and the overall objective of the guideline. The stakeholder involvement domain evaluates the appropriateness of the guideline development group and its representation of the views of its intended users. The rigor of development domain evaluates the systematic methodology used to gather and synthesize evidence, methods of recommendation formulation, and the mechanisms to update them. The clarity of presentation domain evaluates the overall structure, format, and language of the guideline. The applicability domain evaluates barriers, facilitators, and ease of implementation and resource implications of guideline application. Finally, the editorial independence domain evaluates the extent to which external influences or competing interests may have affected the specific guideline.
For this study, three appraisers (C.K., S.S., N.S.) conducted the assessment after using the online training tools recommended by the AGREE collaboration. After guideline evaluation, domain scores were calculated (as per the AGREE II manual) by summing all individual scores in each domain and then scaling the total as a percentage of the maximum possible score for a given domain according to the formula:
$$\text{Scaled domain score} = \frac{\text{obtained score} - \text{minimum possible score}}{\text{maximum possible score} - \text{minimum possible score}} \times 100\%$$
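The sketch below illustrates the scaled domain score as defined in the AGREE II user manual, where each item is rated from 1 to 7 by every appraiser and the minimum possible score is subtracted before scaling. The function name, argument layout, and example ratings are hypothetical and are not data from this study.

```python
def scaled_domain_score(ratings, min_rating=1, max_rating=7):
    """Return the AGREE II scaled score (0 to 100) for one domain.

    ratings: one list per appraiser, each holding that appraiser's
    ratings for every item in the domain (assumed 1-7 scale).
    """
    if not ratings or not ratings[0]:
        raise ValueError("need at least one appraiser and one item")
    n_items = len(ratings[0])
    if any(len(r) != n_items for r in ratings):
        raise ValueError("every appraiser must rate every item")

    n_appraisers = len(ratings)
    obtained = sum(sum(r) for r in ratings)
    minimum = min_rating * n_items * n_appraisers  # all items rated lowest
    maximum = max_rating * n_items * n_appraisers  # all items rated highest
    return 100.0 * (obtained - minimum) / (maximum - minimum)


# Hypothetical ratings: three appraisers, a three-item domain.
example = [[6, 5, 7], [5, 6, 6], [7, 6, 6]]
score = scaled_domain_score(example)  # obtained 54, min 9, max 63
```

With these made-up ratings the obtained total is 54 against a possible range of 9 to 63, so the domain scores 45/54, or roughly 83%.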
Evaluation of Strength of Recommendations
All guideline recommendations published by the AASLD are classified by a “grade” or “level” of recommendation. The two designations are synonymous and provide an assessment of strength or certainty for a given recommendation. For the purposes of this study, the grade/level designation is referred to as “grade” hereafter.
Since 1998, the AASLD practice guideline development program has used three evidence classification systems to grade recommendations. These include (1) the Infectious Diseases Society of America's Quality Standards; (2) the American College of Cardiology / American Heart Association system; and (3) the Grading of Recommendation Assessment, Development, and Evaluation (GRADE) workgroup system (Table 1).[36-39] Despite the use of three systems, these schemes are based on the same criteria and comparable structure. Therefore, for the purposes of this study, a composite grade system was created to represent all of the issued recommendations:
- Grade I: Data derived from multiple randomized controlled trials or meta-analyses involving enough participants to provide sufficient statistical power, where further research is unlikely to change confidence in the estimate of clinical effect.
- Grade II: Data derived from a single randomized trial or nonrandomized studies, cohort or case-control analytic studies, and multiple time series, where further research may change confidence in the estimate of the clinical effect.
- Grade III: Evidence based on clinical experience, descriptive studies, or the opinion of respected authorities, where further research is very likely to impact confidence in the estimate of clinical effect.
|Grade of Evidence|
|I = Evidence from multiple well-designed randomized controlled trials, each involving a number of participants to be of sufficient statistical power|
|II = Evidence from at least one large well-designed clinical trial with or without randomization, from cohort or case-control analytic studies, or well-designed meta-analysis|
|III = Evidence based on clinical experience, descriptive studies or reports of expert committees|
|IV = Not rated|
|I = Randomized controlled trials|
|II-1 = Controlled trials without randomization|
|II-2 = Cohort or case-control analytic studies|
|II-3 = Multiple time series, dramatic uncontrolled experiments|
|III = Opinion of respected authorities, descriptive epidemiology|
|A = Data derived from multiple randomized clinical trials or meta-analysis|
|B = Data derived from a single randomized trial, or nonrandomized studies|
|C = Only consensus opinion of experts, case studies or standard-of-care|
|High (A) = Further research is unlikely to change confidence in the estimate of the clinical effect|
|Moderate (B) = Further research may change confidence in the estimate of the clinical effect.|
|Low (C) = Further research is very likely to impact confidence on the estimate of clinical effect.|
|Class of Recommendations|
|A = Survival benefit|
|B = Improved diagnosis|
|C = Improvement in quality of life|
|D = Relevant pathophysiologic parameters improved|
|E = Impacts cost of health care|
|I = Conditions for which there is evidence and/or general agreement that a given diagnostic evaluation, procedure or treatment is beneficial, useful and effective|
|II = Conditions for which there is conflicting evidence and/or divergence of opinion about the usefulness/efficacy of a diagnostic evaluation, procedure, or treatment|
|IIa = Weight of evidence/opinion is in favor of usefulness/efficacy|
|IIb = Usefulness/efficacy is less well established by evidence/opinion|
|III = Conditions for which there is evidence and/or general agreement that a diagnostic evaluation/procedure/treatment is not useful/effective and in some cases may be harmful|
|Strong (1) = Factors influencing the strength of the recommendation included the quality of evidence, presumed patient-important outcomes, and cost|
|Weak (2) = Variability in preferences and values, or more uncertainty. Recommendation is made with less certainty, or higher cost or resource consumption|
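One way to apply the composite grade is a lookup from each system's evidence label in Table 1 to grade I, II, or III. The mapping below is an assumption inferred from how the composite definitions combine the wording of the individual systems; it is not the authors' published crosswalk, and the names `COMPOSITE_GRADE` and `composite_grade` are hypothetical.

```python
# Assumed crosswalk from evidence labels in Table 1 to the composite grade.
# It covers evidence labels only: class labels (A-E, I/IIa/IIb/III,
# Strong/Weak) describe benefit versus risk and must not be passed in.
COMPOSITE_GRADE = {
    # Roman-numeral grade schemes
    "I": "I",
    "II": "II",
    "II-1": "II",
    "II-2": "II",
    "II-3": "II",
    "III": "III",
    # Letter-based levels of evidence
    "A": "I",
    "B": "II",
    "C": "III",
    # Quality-of-evidence wording
    "HIGH": "I",
    "MODERATE": "II",
    "LOW": "III",
}


def composite_grade(evidence_label):
    """Map one evidence label to composite grade "I", "II", or "III".

    Returns None for labels with no composite equivalent, such as
    "IV" (not rated).
    """
    return COMPOSITE_GRADE.get(evidence_label.strip().upper())


grades = [composite_grade(x) for x in ["II-2", "A", "Moderate", "IV"]]
```

Keeping evidence labels and class labels in separate fields avoids the ambiguity between, for example, evidence level "B" and class "B" (improved diagnosis).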
Evaluation of Types of Recommendations
Another aim of this study was to evaluate the evolution of the type of recommendations issued by the AASLD. Recommendations provided in AASLD practice guidelines can be classified into three types:
- (1) Recommendations based on known features of a given liver disease which should prompt further evaluation (e.g., “Wilson Disease must be excluded in any patient with unexplained liver disease along with neurological or neuropsychiatric disorder.”).
- (2) Recommendations on specific testing for a given liver disease (e.g., “Liver biopsy is recommended to stage the degree of liver disease in C282Y homozygotes or compound heterozygotes if liver enzymes (ALT, AST) are elevated or if ferritin is >1000 μg/L.”).
- (3) Recommendations on specific treatment for a given liver disease (e.g., “UDCA in a dose of 13-15 mg/kg/day orally is recommended for patients with PBC who have abnormal liver enzyme values regardless of histological stage.”).
Thus, all recommendations for this analysis were classified into one of three categories: (1) Feature of Disease Recommendation; (2) Diagnostic Recommendation; or (3) Treatment Recommendation.
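As a minimal sketch of how such a classification could be recorded and tallied, the example below stores each recommendation with one of the three types and computes the percentage share of each. The class and function names are hypothetical; the three records are the example recommendations quoted above, shortened, and the assignment of a type to any real recommendation remains a manual judgment.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class RecType(Enum):
    FEATURE = "Feature of Disease Recommendation"
    DIAGNOSTIC = "Diagnostic Recommendation"
    TREATMENT = "Treatment Recommendation"


@dataclass(frozen=True)
class Recommendation:
    text: str
    rec_type: RecType


def type_shares(recommendations):
    """Return the percentage of recommendations falling in each type."""
    counts = Counter(r.rec_type for r in recommendations)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no recommendations to tally")
    # Counter returns 0 for types that never occur.
    return {t: 100.0 * counts[t] / total for t in RecType}


recs = [
    Recommendation("Wilson Disease must be excluded in any patient with "
                   "unexplained liver disease ...", RecType.FEATURE),
    Recommendation("Liver biopsy is recommended to stage the degree of "
                   "liver disease ...", RecType.DIAGNOSTIC),
    Recommendation("UDCA in a dose of 13-15 mg/kg/day orally is "
                   "recommended ...", RecType.TREATMENT),
]
shares = type_shares(recs)
```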
Evaluation of Benefit Versus Risk of Recommendations
As previously discussed, three different guideline classification systems have been used during the evolution of AASLD practice guidelines. Depending on the system used, certain guidelines provided information regarding benefit versus risk for a given recommendation. This information is different from the “grade” of recommendation and was designated as the “class” of recommendation. In the final part of this analysis, we evaluated the evolution of “class” recommendations provided in multiple versions of guidelines for a specific liver disease topic. However, unlike the grade systems assessing strength and certainty, the “class” systems used over time differed greatly, and a composite scoring system could not be created for comparative analysis. Therefore, the “class” analysis was performed only on guidelines that used the same scoring system.
Historical Guideline Summary
From January 1998 to August 1, 2012, the AASLD issued 28 clinical practice guidelines on 17 topics, yielding a total of 991 recommendations. When examining the initial publication for each AASLD guideline topic, a total of 512 recommendations were issued. The three guidelines with the greatest number of recommendations were Vascular Disorders of the Liver (64), Hepatitis C (HCV) (49), and the Diagnosis and Management of Nonalcoholic Fatty Liver Disease (NAFLD) (45). Of these 512 recommendations, 14% were grade I recommendations, 40% were grade II, and 46% were grade III (Table 2). Regarding the types of recommendations, 14% were Feature of Disease recommendations, 28% were Diagnostic Recommendations, and 58% were Treatment Recommendations (Supporting Table 1).
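The 512 initial recommendations serve as the baseline for the changes reported in this review. As a worked check of the overall figure given in the abstract (512 initial versus 699 current recommendations), the relative change can be computed as below; the helper name `percent_change` is hypothetical.

```python
def percent_change(initial, current):
    """Relative change from initial to current, as a percentage."""
    if initial == 0:
        raise ValueError("percent change is undefined for a zero baseline")
    return 100.0 * (current - initial) / initial


# Totals taken from the abstract: 512 initial, 699 current.
overall = percent_change(512, 699)  # 187 / 512, about 36.5
```

The result, 187/512 or about 36.5%, corresponds to the 36% increase stated in the abstract.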