We don’t need more evidence. We need better evidence.

First published 2 February 2026

I am not a clinician but a guideline methodology expert; you can read my full disclaimer here.

When Evidence Becomes Noise

I rarely work on reviews with an overwhelming amount of evidence. Far more frequently, I find myself starting evidence statements with the words: “No evidence was found…”

There are, of course, exceptions. Diagnostic accuracy questions often include a large number of studies. This is understandable. Many studies report the value of different signs and symptoms, biomarkers, screening tools, and competing technologies, produced by different companies and tested in different populations. Diagnostic accuracy questions also tend to be broad, asking whether we can rely on alternatives compared with a gold standard. In that context, a large and heterogeneous evidence base makes sense.

However, it was during the development of the Faecal Microbiota Transplant (FMT) guidelines that I observed a completely different, and troubling, pattern: a proliferation of near-identical primary studies and systematic reviews addressing essentially the same questions. Once this became apparent, it was difficult not to recognise similar patterns across other topics.

FMT is a relatively new intervention that involves transferring the microbiome from the faeces of a healthy donor into the gastrointestinal tract of a recipient. It is primarily used for people with severe Clostridioides difficile infection who have failed antibiotic treatment, although it has increasingly been trialled for other conditions.

The first area in which an unusually large body of evidence appeared was the question of risk factors for FMT failure. Very early in the full-text sift, the problem became obvious. There was a large number of case-control and cross-sectional studies that met the inclusion criteria, many of which tested a wide range of variables with little apparent rationale.

On the surface, these looked like exploratory studies. In practice, the choice of variables (for example, whether patients were hospitalised before or during FMT) was inconsistent and often implausible, suggesting that they were selected primarily because they were available in existing datasets rather than because anyone had a clear hypothesis about their relevance.

As a result, the systematic review had to stratify findings across numerous “risk factors,” most of which were clinically meaningless.

“If you torture the data long enough, it will confess to anything”

Famous words often attributed to Ronald Coase. They captured exactly what we observed during data extraction.

Many of the identified risk factors were non-modifiable and, even where associations reached statistical significance, they offered little in the way of clinically meaningful insight. Statistically detectable but practically irrelevant.
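The arithmetic behind this is simple. As a rough sketch, assume each candidate variable is tested independently at the conventional 5% significance level and that none has any real effect. Both are simplifying assumptions, and the variable counts below are illustrative rather than taken from the included studies. Under those assumptions, the chance of at least one spurious "significant" risk factor grows quickly with the number of variables tested:

```python
def chance_of_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n_tests independent tests of a true
    null hypothesis comes out 'significant' at level alpha."""
    return 1 - (1 - alpha) ** n_tests


# Hypothetical numbers of candidate variables, chosen only for illustration.
for n in (1, 10, 20, 40):
    print(f"{n:>2} variables tested: {chance_of_false_positive(n):.0%}")
```

With twenty unrelated variables, a "significant" finding somewhere is more likely than not, even if nothing real is going on.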

FMT for the resolution of C. difficile infection symptoms is highly effective. Our meta-analysis showed success rates of 85%, compared with 38% in people receiving antibiotics or placebo. This included one study using enema administration, now known to be the least effective method. When that study was removed, success rates increased to 92%, while outcomes in the control group remained essentially unchanged (36%). 
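To put the size of this effect in perspective, the pooled success rates can be turned into an absolute difference and an approximate number needed to treat (NNT). This is a back-of-the-envelope sketch only: it treats the percentages quoted above as simple event rates, which is a simplification and not part of the guideline's analysis, and the function name is purely illustrative.

```python
import math


def absolute_benefit(success_treated_pct: int, success_control_pct: int):
    """Return the absolute difference in success rates (percentage points)
    and the number needed to treat, rounded up to a whole person."""
    difference = success_treated_pct - success_control_pct
    nnt = math.ceil(100 / difference)
    return difference, nnt


# Percentages as reported in the text above.
print(absolute_benefit(85, 38))  # all studies
print(absolute_benefit(92, 36))  # enema study removed
```

On these figures, roughly one additional person is cured for every two to three treated with FMT rather than antibiotics or placebo.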

The conclusion was therefore obvious from the outset: there is little value in studying non-modifiable risk factors for FMT failure. Even if such factors were statistically significant (and, ultimately, none were), patients would still be far better off receiving FMT than further antibiotic treatment. Had these studies instead focused on safety in specific populations, they might have generated valuable and actionable evidence. Instead, safety data had to be drawn from entirely separate studies, mostly case series. It is difficult not to see this as a substantial waste of effort.

Systematic Reviews on Repeat

A similar pattern emerged when examining FMT for conditions other than C. difficile infection.

For ulcerative colitis alone, there were eight nearly identical systematic reviews published within four years. They included largely overlapping studies, differed only marginally in scope, and reached essentially the same conclusion: FMT may be beneficial, but better-quality studies are needed. Since publication of the guidelines, at least two additional reviews and one registered protocol have appeared on the same topic. None of these reviews appeared to have PROSPERO registrations.

A similar situation existed for irritable bowel syndrome, with ten systematic reviews reporting near-identical conclusions. Two further reviews became available after the FMT guidelines were published.

A Pragmatic Guideline Response

In developing the FMT guidelines, we took a pragmatic approach. For each of these two questions, we included the most recent good-quality systematic review and supplemented it with any missing randomised controlled trials. The existence of other reviews was acknowledged, but they did not meaningfully alter conclusions.

The broader conclusion is not new, but it remains unresolved. Large parts of the research literature are shaped not by unanswered questions, but by incentives that reward production over purpose. Systematic reviews and exploratory risk-factor studies have become particularly common because they are low cost, low risk, and low accountability. They rely on existing data, require neither ethical approval nor patient consent, and face few practical hurdles. They are frequently justified using formulations such as “No review has focused specifically on adults aged 43–57 in middle-income countries”. In other words, prior studies conducted in the general population are acknowledged but dismissed on the basis that narrowly defined subgroups have not yet been examined, whether or not such distinctions are justifiable.

At the same time, many researchers believe they know how to conduct a systematic review despite limited methodological training. Journals frequently accept reviews that meet reporting standards but fall short on rigour. These reviews are also unlikely to be proven wrong in ways that carry consequences, making retraction rare.

The result is a steady stream of low-quality, duplicative work that appears methodologically robust while adding little to decision-making.

It’s a System With No Brake Pedal

This reflects a system with no effective brake pedal. There is no mechanism to say, “this question has been answered as far as it can be”. No penalty for redundancy, and no requirement to justify whether a study is actually needed. Instead, the system rewards being an author more than solving the problem.

For individual researchers, this creates a rational strategy: minimal investment yields maximum career return. At an institutional level, the same incentives apply. Universities are rewarded for volume and visibility, not for whether outputs resolve uncertainty or change practice. In this context, any research becomes preferable to no research because it counts towards assessment exercises such as the UK's Research Excellence Framework (REF).

Publications accumulate. Careers progress. The literature grows. Uncertainty remains.

Drawing a Line

I am very proud of the FMT guideline development group for making some bold but necessary decisions.

First, the group agreed that future guideline updates would not revisit the effectiveness of FMT for C. difficile infection. Despite imperfections in the evidence, the magnitude of effect is clear, and revisiting the question would represent a poor use of resources. Until a genuinely new technology emerges (for example, new antibiotics), further evaluation is unnecessary. A similar decision was made regarding non-modifiable risk factors for FMT failure.

Second, the panel issued a research recommendation explicitly calling for an end to the production of duplicate systematic reviews on this topic. This was a bold and unusual step, but a necessary one.

A New Research Gap?

The problem is not a lack of studies, but a lack of mechanisms to ensure that research priorities identified through systematic reviews and guidelines are actually taken up.

Evidence synthesis routinely signals where uncertainty is genuine, where further research would be futile, and where alternative approaches are required, yet these signals are rarely operationalised. Without a functional feedback loop, research continues to be generated independently of what the evidence base is asking for. In my opinion, this constitutes a distinct research gap. For now, I will call it the evidence-back-to-research gap, until someone comes up with a better name.

Addressing this gap does not necessarily require expensive or large-scale trials. Not every question needs to start (or even end) with an RCT. What is needed instead is intentional research. Rather than defaulting to low-effort, low-value studies that serve no one, researchers need to ask how a question can be meaningfully answered with the resources available. To some extent, the research ecosystem already has mechanisms intended to reduce unnecessary duplication, such as PROSPERO for systematic reviews and PREPARE for guidelines. However, these tools focus primarily on what should not be done again. And while some publishers now require authors to state where reviews or protocols were registered, this practice is inconsistent and should be standard across journals. After all, publishers and peer reviewers have a parallel responsibility: to require authors to justify why a study is needed and to show what it contributes to the knowledge base. Put simply, to say no to redundant work.

What is still missing is a clear, operational signal about what should happen next when genuine research gaps are identified. Although guidelines routinely list research recommendations, these are typically placed at the end of the document and are rarely translated into funded, coordinated research activity. I often wonder how many people are aware that they are there. I am frequently told that clinicians do not have time to read long guideline documents, which is why we also produce short summaries, algorithms, posters, and other derivative products: formats in which research recommendations are never included.

Until this research gap is closed, the system will continue to generate volume while failing to address the questions that matter most. And at the next guideline update, we will again have no new evidence, acknowledge the same level of uncertainty, and repeat both the recommendations and the research recommendations.