Category: Reflections

Reflections of the guideline methodologist

  • What the discussions at GRADE Porto suggest about where evidence-based guidance is heading

    The image above is of the Douro River in Porto. I was surprised to see so many port manufacturers clustered in the same part of the town, all on the same side of the river and within a relatively small area. Quite different from the more dispersed pattern I would normally expect from wine producers. I’m deliberately avoiding any neat metaphors about research or guideline development here. I just thought it was interesting.

    I am not a clinician but a guideline methodology expert; you can read my full disclaimer here.

    Where GRADE Porto suggests evidence-based guidance is heading

    This post reflects my interpretation of broad themes emerging from discussions at the recent GRADE Working Group meeting in Porto. It is not intended as a summary of specific sessions or ongoing projects, but rather as a reflection on wider methodological and strategic directions within evidence-based guidance.

    The recent GRADE Working Group meeting in Porto highlighted several interesting refinements across guideline methodology and persistent challenges in producing trustworthy recommendations. Across these discussions, two related but distinct themes stood out to me.

    The first is a gradual shift toward more automated and scalable evidence processes combined with an increasing emphasis on human judgement. The second is a growing recognition that guideline development is not only about interpreting evidence, but also about systematically shaping future research.

    Many of the discussions suggested that evidence-based guidance is moving into a phase where methodological refinement is no longer the sole focus. Over the past decade, evidence workflows have already undergone substantial automation, well before the emergence of current AI systems. Improved database search platforms, automated de-duplication, online risk-of-bias tools, semi-automated screening, structured data extraction, PDF retrieval systems, and support for more efficient search strategies (including improved “pearl growing” approaches) have progressively reduced the manual burden of evidence synthesis. Building on this trajectory, newer AI-enabled tools are further extending what can be automated across evidence retrieval, synthesis, and surveillance.

    Should we be worried? I don’t think so. None of this removes the need for human judgement. It just changes its role because judgement now becomes more central and more explicit. The emerging challenge is how to make expert interpretation more transparent, particularly in contexts involving uncertainty, indirect evidence, competing outcomes, or contextual constraints that cannot be resolved by algorithm.

    In this sense, AI-enabled tools and automated platforms can be regarded as extensions of the evidence process rather than replacements for expert reasoning. They increase efficiency, but they also increase the importance of direction-setting and of correcting flawed outputs, which is only possible if an expert human operator identifies the errors. This is where methodological expertise becomes essential: knowing how to frame the question, how to translate it into a usable workflow, what to ask of the system, and how to recognise whether the outputs are valid. It also requires judgement about where efficiency can be safely introduced and where rigour must be maintained, as well as the ability to check that automated processes have actually done what they were intended to do. In practice, this depends on a deep understanding of guideline methodology and what might be described as a form of epistemic control: the ability to define, shape, and critically interrogate how evidence is generated and interpreted, rather than use the technology blindly.

    A second, equally important theme is the growing recognition that guideline development should not be seen solely as the endpoint of evidence synthesis. As guidelines are updated, rapidly in some areas, it becomes increasingly clear that primary research does not always produce the evidence that guideline developers and clinicians need for decision-making. It is therefore unsurprising that many methodologists now see guideline methodology as a mechanism that should also be used for identifying what is not known, so that we can feed those gaps back into the research system.

    I may be somewhat biased here, having previously written about the extent to which research effort can become misaligned with actual needs. In some areas, evidence continues to accumulate around questions that are already reasonably well understood, including the use of proxy outcomes that are only weakly connected to clinically meaningful endpoints, despite the existence of more direct evidence. At the same time, other clinically important uncertainties remain unresolved across many areas, despite repeated acknowledgement that evidence is lacking or highly uncertain. This becomes particularly important in areas where evidence is persistently sparse or difficult to generate.

    Rare diseases are a clear example, but similar issues arise in areas such as infection prevention and control, where some interventions became embedded in practice long before rigorous evaluation was commonplace. Experimental designs in these settings are often constrained by ethical or practical realities. In addition, many funding systems have historically operated on the basis that the best application wins, meaning that research topics have often been shaped by the interests and capacity of applicants rather than by systematic assessments of clinical need. This approach has been understandable, because there were large gaps in evidence across many areas and expanding the overall research base was the priority.

    A related issue is that parts of the research system can generate evidence that is methodologically valid but of limited clinical relevance, particularly where surrogate outcomes are used in place of patient-important endpoints without a clearly established causal pathway between them. Concerns about poorly designed or poorly reported trials, recognised since the early years of evidence-based medicine, remain unresolved and continue to add to this methodological challenge.

    It is time to move from research volume to more explicit design and prioritisation, a shift that guideline processes are well placed to support. Without this, guideline updates risk repeating the same conclusions across successive editions, acknowledging uncertainty without creating a realistic pathway for resolving it.

    Concluding reflections

    Taken together, the discussions in Porto suggested to me that evidence-based guidance is entering a phase defined by two parallel shifts. The first is the integration of automation with increasingly explicit, structured, and defensible human judgement. The second is a movement toward viewing guideline development as part of a broader evidence ecosystem that not only evaluates research, but also actively shapes what research is needed next.

    Producing trustworthy recommendations depends not only on methodological sophistication in evidence synthesis, but also on the ability to connect evidence processes to real-world decision-making and research prioritisation. In that sense, guideline systems are becoming less like static products and more like coordinating infrastructures within a continuously evolving evidence landscape. This also implies a more active role in shaping and directing evidence generation, and it was encouraging to hear similar perspectives reflected across many of the discussions. As automation reduces the burden of some traditional synthesis tasks, guideline methodologists have increasing scope to focus more deliberately on these higher-order functions.

    That, to me, was one of the clearest signals emerging from Porto: that guideline systems are shifting from endpoints of evidence synthesis and recommendations toward coordinating infrastructures that both interpret evidence and actively shape what research is needed next.

  • What’s Cap Got to Do With It?

    Co-authored with Peter Hoffman

    First published 23 March 2026

    Explaining Infection Risk from Theatre Headwear: A Microbiologist and Guideline Methodologist Perspective

    I am not a clinician but a guideline methodology expert; you can read my full disclaimer here.

    I feel very fortunate to have had the opportunity to work with Peter Hoffman on guideline development over the years. His ability to cut through assumptions and focus on underlying mechanisms has always been fascinating, and I have learnt a great deal from listening to his perspective.

    This post arose from a recent discussion about a study examining contamination of different types of theatre headwear. The findings have been interpreted by some as having implications for surgical site infections (SSIs). However, the key issue is not the study itself, but how its findings are being used.

    Please note that this is not a criticism of the study. It is a question of relevance: Does measuring contamination of headwear meaningfully inform our understanding of SSI risk?

    Currently, no established link exists between theatre cap contamination and SSIs. Yet, the discussion appears to move quickly from “contamination was detected” to “this may increase infection risk.” This leap in reasoning is where the real problem lies.

    To explore this properly, it is helpful to consider two complementary perspectives. This is why I invited Peter to contribute to this post, and I am very grateful that he agreed. I hope you find it valuable to read both a microbiologist’s view of contamination and transmission, and a guideline methodologist’s perspective on how evidence should inform practice.

    Peter’s perspective: the microbiologist’s view 

    Context: Operating theatres are not sterile. They should be reasonably clean (a poorly definable concept), until you put the operating team and patient in them. This introduces billions of microbes. Infection prevention is about ensuring that as few as possible of these microbes get into the surgical wound. The core concept here is the “sterile field” – a functional unit within the operating theatre that separates out truly sterile surfaces (e.g. the scrub team’s gloves, sterile gowns, the instruments, sterile drapes). These surfaces can have contact with each other and with the surgical wound, but no contact with surfaces outside this defined area.

    There’s an additional factor that needs consideration – the air. The air supplied to an operating theatre has passed through fine filters. It’s not sterile, but has very low contamination. This is where the people in the theatre and their billions of microbes mess things up. Skin is a body organ that is continually renewing itself; the basal layer of cells produces new cells (“squames”) that are pushed upwards. As they reach the top, they die and dry (think microscopic cornflakes) and are released from the very top layer into the air. (The light grey dust that accumulates on bookshelves is mainly settled skin squames from people who have been in that room.) Some of these squames will carry microcolonies of the bacteria and yeasts that grow on the skin and the ducts, glands and follicles within. In the context of an operating theatre, the requirement is to prevent these from settling out not only into the surgical wound but also onto the instruments that will have contact with the surgical wound. This is achieved by introducing vast amounts of clean air into the theatre and encouraging it to then flow out of the theatre into adjacent less clean rooms, taking these contaminants with it. Whilst well-designed full-body exhaust suits and integral helmets worn by the surgical team may reduce dispersion into the air, normal theatre wear does not.

    So this aspect of infection prevention does not focus on elimination of microbes, but on interrupting identified routes of microbial transmission within an inevitably contaminated greater environment. Yes, it’s nice if the environment isn’t too mucky but that, of itself, isn’t going to solve your problems.

    All this is a preamble to what Aggie has asked me to comment on: theatre headwear and contamination recoverable from it. I think that hair has maybe too high a profile in the perceptions of infection prevention in theatres. Hair is contaminated, but each hair is a very self-contained unit. It is not going to release fragments equivalent to skin squames and microbes are not going to spontaneously detach themselves into the air. If a hair, along with all the microbes on it, falls onto exposed instruments or into the surgical wound, that is a very significant event. The function of headwear is to stop that happening; it is not to prevent airborne dispersion. Headwear will inevitably be contaminated by the wearer, both by contact with their contaminated hair and by trapping squames released from the scalp, ears and neck. That contamination is trapped within the fabric of the headwear. A small amount may then be released, but that would be minuscule compared to what is released from the rest of each body* in the theatre**. If a sterile gloved hand were to touch a person’s headwear, that would breach the sterile field and immediate regloving should occur. I see addressing the varying level of contamination found on different types of headwear as a distraction from factors more relevant to infection prevention.

    *The most prolific site of release is probably the perineum. There have, in the distant past, been suggestions that the surgical team wear tight-fitting rubber underwear to contain such dispersion. This did not seem to find favour with the surgical community.

    **The overall level of airborne dispersion in theatre depends on the number of people and their degree of movement. There is a longstanding tradition that the surgical team is both more numerous and more mobile than the patient. The patient contributes minimally to the contamination of theatre air.

    Aggie’s perspective: guideline methodologist’s view 

    From an evidence synthesis perspective, the central issue is not whether contamination can be detected on theatre caps, because it clearly can. The question is what that observation means for patient outcomes.

    The first concern is the implicit assumption that contamination equates to infection risk: contamination on caps → contamination of the sterile field → wound contamination → SSI.

    What we really care about is SSI; everything else is less directly relevant. This can be thought of as a hierarchy of outcomes: outcomes that are closer to the patient provide more meaningful and direct evidence, while upstream (surrogate) outcomes sit further away from what ultimately matters. As we move further from the outcome of interest, each step introduces additional assumptions about how one finding leads to another. At each step, underlying mechanisms or confounding factors that we may not fully understand can influence the observed relationships, increasing the risk of misinterpreting the true impact. A useful real-world analogy is coconut oil: it has been shown to reduce cholesterol levels, but this does not translate into a reduction in heart attacks. Do we really care if cholesterol is lower if the outcome that matters remains unchanged? Surrogate outcomes are always of uncertain relevance. Where more direct evidence exists, a study measuring a more distal outcome does not materially change the overall understanding of risk. This reflects a familiar principle: correlation does not imply causation. Detecting contamination does not, in itself, demonstrate an increased risk of infection.

    There is also a question of biological plausibility and relative contribution. As Peter explains, even if small amounts of contamination are released from headwear, this is likely to be negligible compared with other, more significant sources of microbial dispersion in the operating theatre. Focusing on a minor or uncertain pathway risks diverting attention from interventions with a clearer evidence base.

    Another key consideration is how individual studies are interpreted in the context of the wider evidence base. Studies examining more direct outcomes (SSIs and contamination of the sterile field) do exist, and the systematic review of these studies provides a more reliable summary of the available evidence. While the overall evidence on theatre headgear is generally weak, it consistently shows that head coverings, regardless of type, have little or no effect on SSIs. Where differences in contamination of the sterile field are reported, they suggest the same. For the Rituals and Behaviours guidelines, the guideline development group, which included surgeons, microbiologists, and other experts, interpreted this body of evidence as indicating that headgear has minimal direct impact on patient infection risk. Seen in this context, a single study reporting a less direct outcome does not materially change the overall understanding of risk. Interpreted in isolation, however, it risks distorting that understanding, potentially undermining carefully developed guidance and shifting practice in ways that are not supported by the totality of evidence. A more useful approach is to ask whether new evidence meaningfully changes what we already know. In this case, when considered alongside the existing body of evidence, it adds very little. This touches on principles used in formal evidence appraisal frameworks (e.g. GRADE: assessing study design, quality, and directness), but this is a topic that deserves its own post.

    Conclusion

    Together, these perspectives highlight a recurring challenge in infection prevention and other fields: overinterpreting findings that are biologically interesting but clinically unproven.

    The study of theatre cap contamination is not without value. It may help us better understand potential mechanisms of microbial dispersion, including whether and how such contamination could translate into a risk of surgical site infection. It may also prompt further questions. For example, if certain materials retain bacteria more effectively, could similar principles be applied to control microbial shedding from other, more significant sources? However, the study’s relevance to surgical site infections remains uncertain and such findings should be interpreted with caution.

    The key lesson is threefold: 

    1. Changes in distal or surrogate outcomes do not necessarily translate into meaningful clinical benefits.

    2. Isolated studies should not be used to overturn the careful, holistic judgements of experts who have evaluated the totality of the evidence.

    3. Many studies require interpretation by different types of experts to assess whether the findings meaningfully contribute to clinical practice.

  • We don’t need more evidence. We need better evidence.

    First published 2 February 2026

    I am not a clinician but a guideline methodology expert; you can read my full disclaimer here.

    When Evidence Becomes Noise

    It does not happen often that I work on reviews with an overwhelming amount of evidence. Far more frequently, I find myself starting evidence statements with the words: “No evidence was found…”

    There are, of course, exceptions. Diagnostic accuracy questions often include a large number of studies. This is understandable. There are many studies reporting the value of different signs and symptoms, biomarkers, screening tools, and competing technologies, produced by different companies and tested in different populations. Diagnostic accuracy questions also tend to be broad, asking whether we can rely on alternatives compared with a gold standard. In that context, a large and heterogeneous evidence base makes sense.

    However, it was during the development of the Faecal Microbiota Transplant (FMT) guidelines that I observed a completely different, and troubling, pattern: a proliferation of near-identical primary studies and systematic reviews addressing essentially the same questions. Once this became apparent, it was difficult not to recognise similar patterns across other topics.

    FMT is a relatively new intervention that involves transferring the microbiome from the faeces of a healthy donor into the gastrointestinal tract of a recipient. It is primarily used for people with severe Clostridioides difficile infection who have failed antibiotic treatment, although it has increasingly been trialled for other conditions.

    The first area in which an unusually large body of evidence appeared was the question of risk factors for FMT failure. Very early in the full-text sift, the problem became obvious. There was a large number of case-control and cross-sectional studies that met the inclusion criteria, many of which tested a wide range of variables with little apparent rationale.

    On the surface, these looked like exploratory studies. In practice, the choice of variables was inconsistent and often implausible, suggesting that they were selected primarily because they were available in existing datasets rather than because anyone had a clear hypothesis about their relevance (for example, whether patients were hospitalised before or during FMT).

    As a result, the systematic review had to stratify findings across numerous “risk factors,” most of which were clinically meaningless.

    “If you torture the data long enough, it will confess to anything”

    Famous words often attributed to Ronald Coase. They captured exactly what we observed during data extraction.

    Many of the identified risk factors were non-modifiable and, even where associations reached statistical significance, they offered little in the way of clinically meaningful insight. Statistically detectable but practically irrelevant.

    FMT for the resolution of C. difficile infection symptoms is highly effective. Our meta-analysis showed success rates of 85%, compared with 38% in people receiving antibiotics or placebo. This included one study using enema administration, now known to be the least effective method. When that study was removed, success rates increased to 92%, while outcomes in the control group remained essentially unchanged (36%). 
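    To make the size of this effect concrete, the figures above can be turned into standard effect measures. The following is a minimal sketch only: it treats the pooled percentages quoted in this post as simple proportions, whereas the actual meta-analysis pooled study-level data, and the function name `compare_success_rates` is a hypothetical helper introduced purely for illustration.

```python
def compare_success_rates(treated: float, control: float) -> dict:
    """Summarise two success proportions (values between 0 and 1).

    Simplification: inputs are treated as plain proportions, not as
    the output of a formal meta-analysis with weights and intervals.
    """
    if not (0 <= treated <= 1 and 0 < control <= 1):
        raise ValueError("proportions must lie in [0, 1] and control must be > 0")

    risk_difference = treated - control   # absolute difference in success
    risk_ratio = treated / control        # relative chance of success
    # Number needed to treat for one additional success; undefined if no benefit.
    nnt = 1 / risk_difference if risk_difference > 0 else float("inf")

    return {
        "risk_difference": risk_difference,
        "risk_ratio": risk_ratio,
        "nnt": nnt,
    }


# Percentages quoted in the post: all studies, then with the enema study removed.
all_studies = compare_success_rates(0.85, 0.38)
without_enema = compare_success_rates(0.92, 0.36)

for label, result in [("All studies", all_studies), ("Without enema study", without_enema)]:
    print(
        f"{label}: difference = {result['risk_difference'] * 100:.0f} percentage points, "
        f"ratio = {result['risk_ratio']:.2f}, NNT = {result['nnt']:.1f}"
    )
```

    Under these simplifying assumptions, the absolute difference is about 47 percentage points across all studies and about 56 once the enema study is removed, which corresponds to roughly one additional resolution for every two people treated with FMT rather than antibiotics or placebo.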

    The conclusion was therefore obvious from the outset: there is little value in studying non-modifiable risk factors for FMT failure. Even if such factors were statistically significant (and, ultimately, none were), patients would still be far better off receiving FMT than further antibiotic treatment. Had these studies instead focused on safety in specific populations, they might have generated valuable and actionable evidence. Instead, safety data had to be drawn from entirely separate studies, mostly case series. It is difficult not to see this as a substantial waste of effort.

    Systematic Reviews on Repeat

    A similar pattern emerged when examining FMT for conditions other than C. difficile infection.

    For ulcerative colitis alone, there were eight nearly identical systematic reviews published within four years. They included largely overlapping studies, differed only marginally in scope, and reached essentially the same conclusion: FMT may be beneficial, but better-quality studies are needed. Since publication of the guidelines, at least two additional reviews and one registered protocol have appeared on the same topic. None of these reviews appeared to have PROSPERO registrations.

    A similar situation existed for irritable bowel syndrome, with ten systematic reviews reporting near-identical conclusions. Two further reviews became available after the FMT guidelines were published.

    A Pragmatic Guideline Response

    In developing the FMT guidelines, we took a pragmatic approach. For each of these two questions, we included the most recent good-quality systematic review and supplemented it with any missing randomised controlled trials. The existence of other reviews was acknowledged, but they did not meaningfully alter conclusions.

    The broader conclusion is not new, but it remains unresolved. Large parts of the research literature are shaped not by unanswered questions, but by incentives that reward production over purpose. Systematic reviews and exploratory risk-factor studies have become particularly common because they are low cost, low risk, and low accountability. They rely on existing data, require no ethical approval or patient consent, and face few practical hurdles. They are frequently justified using formulations such as “No review has focused specifically on adults aged 43–57 in middle-income countries”. In other words, prior studies, conducted in the general population, are acknowledged but dismissed on the basis that narrowly defined subgroups have not yet been examined, whether or not such distinctions are justifiable.

    At the same time, many researchers believe they know how to conduct a systematic review despite limited methodological training. Journals frequently accept reviews that meet reporting standards but fall short on rigour. These reviews are also unlikely to be proven wrong in ways that carry consequences, making retraction rare.

    The result is a steady stream of low-quality, duplicative work that appears methodologically robust while adding little to decision-making.

    It’s a System With No Brake Pedal

    This reflects a system with no effective brake pedal. There is no mechanism to say, “this question has been answered as far as it can be”. No penalty for redundancy, and no requirement to justify whether a study is actually needed. Instead, the system rewards being an author more than solving the problem.

    For individual researchers, this creates a rational strategy: minimal investment yields maximum career return. At an institutional level, the same incentives apply. Universities are rewarded for volume and visibility, not for whether outputs resolve uncertainty or change practice. In this context, any research becomes preferable to no research because it contributes to assessment exercises such as the REF (the UK Research Excellence Framework).

    Publications accumulate. Careers progress. The literature grows. Uncertainty remains.

    Drawing a Line

    I am very proud of the FMT guideline development group for making some bold but necessary decisions.

    First, the group agreed that future guideline updates would not revisit the effectiveness of FMT for C. difficile infection. Despite imperfections in the evidence, the magnitude of effect is clear, and revisiting the question would represent a poor use of resources. Until a genuinely new technology emerges (for example, new antibiotics), further evaluation is unnecessary. A similar decision was made regarding non-modifiable risk factors for FMT failure.

    Second, the panel issued a research recommendation explicitly calling for an end to the production of duplicate systematic reviews on this topic. This was a bold and unusual step, but a necessary one.

    A New Research Gap?

    The problem is not a lack of studies, but a lack of mechanisms to ensure that research priorities identified through systematic reviews and guidelines are actually taken up.

    Evidence synthesis routinely signals where uncertainty is genuine, where further research would be futile, and where alternative approaches are required, yet these signals are rarely operationalised. Without a functional feedback loop, research continues to be generated independently of what the evidence base is asking for. In my opinion, this constitutes a distinct research gap. For now, I will call it the evidence-back-to-research gap, until someone comes up with a better name.

    Addressing this gap does not necessarily require expensive or large-scale trials. Not every question needs to start (or even end) with an RCT. What is needed instead is intentional research. Rather than defaulting to low-effort, low-value studies that serve no one, researchers need to ask how a question can be meaningfully answered with the resources available. To some extent, the research ecosystem already has mechanisms intended to reduce unnecessary duplication, such as PROSPERO for systematic reviews and PREPARE for guidelines. However, these tools focus primarily on what should not be done again. And while some publishers now require authors to state where reviews or protocols were registered, this practice is inconsistent and should be standard across journals. After all, publishers and peer reviewers have a parallel responsibility: to require authors to justify why the study is needed and to state what it contributes to the knowledge base. Simply put, to say no to redundant work.

    What is still missing is a clear, operational signal about what should happen next when genuine research gaps are identified. Although guidelines routinely list research recommendations, these are typically included at the end of the document and rarely translated into funded, coordinated research activity. I often wonder how many people are aware that they are there. I am frequently told that clinicians do not have time to read long guideline documents, which is why we also produce short summaries, algorithms, posters, and other derivative products – formats in which research recommendations are never included.

    Until this research gap is closed, the system will continue to generate volume while failing to address the questions that matter most. And at the next guideline update, we will again have no new evidence, acknowledge the same level of uncertainty, and repeat both the recommendations and the research recommendations.

  • No Evidence? No Problem.

    First published 7 March 2026

    No Evidence? No Problem. Why you still can (and should) do Evidence-Based Guidelines

    I am not a clinician but a guideline methodology expert; you can read my full disclaimer here.

    I hear this a lot when working with guideline development groups:

    “There’s no evidence for this, so this should just be a guideline based on expert opinion. No need to do the reviews.”

    Often, people assume that because they haven’t come across evidence, it simply doesn’t exist. Sometimes the evidence they did see didn’t support their hypothesis, so they conclude that there is “no evidence” to support their beliefs. Either way, the temptation is the same: skip the systematic approach, rely on expert opinion, and move on. After all, systematic reviews are time-consuming and expensive. 

    But there are good reasons why every recommendation should still be underpinned by a systematic review.

    1. Transparency

    A systematic review documents exactly what was searched for, and what was and wasn’t found. When a systematic review concludes that “there is no evidence,” what it really means is: “We looked. Systematically. And we didn’t find anything that met the inclusion criteria”. This transparency matters. By clearly describing the methods used, including search strategies and inclusion criteria, others can understand the scope of the work, the limits of current knowledge, and the basis for the recommendation.

    Without this step, “no evidence” can easily mean: “We didn’t check properly”, or “The evidence didn’t fit our narrative”.

    If, truly, no evidence exists, the recommendation may still ultimately be based on expert opinion, but the difference is that this is made explicit. Because the certainty of evidence is very low, the guideline group is more likely to issue a weaker recommendation or a good practice statement. This is an important distinction.

    2. Driving research forward

    If guideline development groups don’t clearly signal where evidence is lacking, there is little hope of ever filling those gaps. A guideline that explicitly states “evidence gap here” does something powerful: it gives researchers a direction. Evidence-based guidelines are not only about telling people what to do today. They also shape the evidence base for tomorrow.

    If guideline groups quietly default to expert opinion without documenting the uncertainty, they effectively close the door on progress. When the guideline is updated years later, nothing has changed, because no one knew these questions needed answering.

    3. You might be surprised

    Sometimes you expect to find nothing, and you still find something useful. Not necessarily a large randomised trial. Sometimes the evidence is indirect, observational, or incomplete. But it can still provide valuable insights.

    For example, in work on Rituals and Behaviours in Operating Theatres, the guideline development group did not expect to find much evidence for some recommendations. And largely, they were right. But structured searching for a question about scrubs being worn outside theatres revealed something interesting. Several studies described the difficulties of conducting research in this area, including compliance problems and logistical barriers. One study revealed an issue that was surprisingly simple: clean scrubs were not consistently available. If the hospital cannot get the logistics right, it becomes difficult to expect perfect compliance. In that context, the absence of strong evidence for or against a practice becomes much easier to understand.

    A similar issue appeared in norovirus guidelines. One research group attempted an ambitious randomised trial comparing different levels of outbreak control measures in nursing homes. In practice, once an outbreak began, institutions assigned to the lower-intervention arms abandoned their protocols and implemented additional measures. Hardly surprising. When an outbreak occurs, staff will do whatever they believe might control it. From a practical and ethical standpoint, it is unrealistic to expect strict adherence to a restrictive research protocol.

    These insights were extremely valuable for implementation planning and making recommendations. They highlighted behavioural and system-level constraints that are rarely captured. Had the guideline groups skipped the reviews because “there’s no evidence,” they would have missed information that helped align the recommendations with how work is actually done in real-world settings. Even small details can make recommendations far more realistic and implementable.

    So my point is this

    Robustness does not come from skipping steps. The evidence may genuinely not exist. Guideline groups cannot control that. What they can control is the process: systematically searching for what exists and being transparent about what does not.

    In the absence of evidence, expert opinion is still valid within evidence-based guidelines. The difference is that the opinion is framed by what is known and unknown, and expressed through a structured and transparent discussion, not by whoever speaks the loudest in the room.

    So next time you hear someone say: “There’s no evidence here”, remember: the evidence you don’t find can be just as important as the evidence you do.

    Doing the systematic work makes your guideline stronger, your recommendations more transparent, and your research agenda clearer.

    No evidence? No problem. Evidence-based guidelines are not about having perfect evidence. They are about being honest about the evidence we have and the evidence we don’t.

  • If Not Evidence, Then What?

    If Not Evidence, Then What?

    First published 17 January 2026

    I am not a clinician but a guideline methodology expert; you can read my full disclaimer here.

    Guidelines: Why Evidence Matters

    Guidelines are structured recommendations that translate the available evidence into practical guidance for clinicians and institutions. Experts contribute by interpreting the evidence, assessing its applicability, and advising on safe implementation.

    Evidence-based guidelines are not perfect. They can be expensive to produce, slow to develop, and even when completed, they may not deliver the clear answers people hope for. But what’s the alternative?

    Let’s not confuse expert opinion with evidence-based advice

    It may sound reassuring when an expert shares their view. But expert opinion is just that – the perspective of an individual. It may (or may not) be informed by evidence, but it is not evidence itself. Opinion is not transparent or reproducible. It is shaped by knowledge, experience, professional context, and incentives.

    Who decides who the expert is? In practice, expertise is often attributed on the basis of visibility, seniority, or confidence rather than the quality of reasoning. And experts disagree – often loudly. Not because one is better or worse, but because the scientific literature is vast, and no one can read it all.

    A classic example is the Cochrane logo: a forest plot representing an iconic systematic review on corticosteroids for women in premature labour. The review combined seven trials which were available at that time. Even if a clinician was lucky enough to come across and read all seven, they might conclude corticosteroids offered little benefit. Yet systematic analysis showed a clear advantage, likely saving thousands of lives. This demonstrates that we cannot rely on individuals alone to interpret the evidence.

    Rules are not a substitute for proof

    Here’s the practical tension: certain technical standards, such as Health Technical Memoranda (HTMs) and regulatory requirements, must be followed, even though they are not always evidence-based. It may feel like a double standard, but it reflects reality: healthcare operates within multiple frameworks. Evidence guides clinical decisions, while technical and regulatory standards ensure safety, consistency, and legal compliance.

    Recognizing this distinction helps us understand the scope and purpose of different guidance types. But there’s an interesting twist: by building evidence, we can highlight where regulatory standards fall short – and the best way to do this is through guideline development. Guidelines can summarize evidence and challenge the status quo, nudging regulations toward practices that are truly evidence-based.

    Embracing evidence

    Healthcare organizations rely on Evidence-Based Practice (EBP) to ensure safe, high-quality, and effective care. Guidelines based on opinion, rather than evidence, risk letting down clinicians, patients, and the systems that deliver care.

    In the context of guidelines, evidence means all available primary or secondary research, systematically synthesized, alongside an assessment of quality, relevance, and limitations. White papers, consensus statements, non-systematic literature reviews, and opinion pieces are not evidence and should not be treated as such.

    Evidence-based guidelines do not privilege one study design by default. Different types of evidence may be more appropriate for different questions. The key is transparency. Evidence does not have to be perfect to make recommendations. What matters is making uncertainty explicit, identifying gaps, and providing a shared reference point for decisions.

    Even low-quality evidence can be useful. Done appropriately, guidelines offer a defensible rationale for clinical decisions and protect against arbitrary or biased choices. Guidelines don’t tell clinicians exactly what to do — they provide decision support, rationale, and a safe way to act under uncertainty. They also reassure patients, supporting shared decision-making with clear, transparent information.

    Evidence alone is not enough. But it is the perfect place to start. Where would healthcare be without it? Organizations that provide advice fail their audience when they ignore evidence.

    The questions everyone asks next

    • What if the evidence is weak?
    • What if it doesn’t exist?
    • What if trials are impossible or unethical?
    • What if I need advice fast?

    In the next post, we’ll tackle the question that challenges every guideline developer: what do you do when the evidence is scarce — or entirely missing?

    Several of the ideas discussed in this post are explored further, in practical terms, in the accompanying Practical Guidance on Guidelines articles:

    “Evidence-Based” is a process, not a label

    When Evidence Review Surprised Experts