
Dual-FND: A Simple Agentic Framework for Fake News Detection

  • Writer: Arka Mukherjee
  • Feb 26
  • 4 min read

Updated: Feb 26

A mock-up of the dual-agent Dual-FND framework

Agentic frameworks have taken the internet by storm. Leveraging multiple AI models to ground each other's outputs has proven to improve performance across a range of tasks, a finding I tried to translate to fake news detection this afternoon. Dual-FND is a baseline agentic framework that uses two agents to guide zero-shot fake news detection.

In this blog post, I aim to outline the engineering choices and details of the framework. Do note this is a small cog in a larger body of fake news detection research I have been conducting over the past few months. So far, I have built UNITE-FND, which you can now read on arXiv. Dual-FND is the first step towards scaling the findings and limitations discussed in that paper.


The Dual-FND Agentic Framework

Workflow of the simple 2-agent fake news detection framework

Agent 1: FactBase


The first agent in our framework is known as FactBase. As the name suggests, this first step extracts the actionable facts from a news piece. For each extracted piece of information, we identify a source, a critical step in fake news detection. For the task, we utilize Google's Gemini 1.5 Pro. This engineering choice is driven by several factors:

  1. Gemini 1.5 Pro is one of the state-of-the-art large language models, promising high performance and reliability.

  2. The model has a long context window of 2M tokens, which helps analyze long news pieces effectively.

  3. It is available via API access from Google Cloud, which eases deployment and testing, and the first $300 of inference is covered by Google Cloud's free credits.


In our framework, the agent performs the following functions:

  1. Fact Extraction: FactBase leverages Gemini 1.5 Pro's advanced language understanding and extensive knowledge base to extract and verify a set of facts.

  2. Output Format: The verified facts are returned in a CSV format containing three specific headers: claim, source, and confidence.

  3. Confidence Scoring: Each claim is accompanied by a confidence score ranging from 0 to 1, indicating how verifiable the claim is based on its specificity.

  4. Data Transfer: The resulting CSV file is then passed to Agent 2 for further processing or analysis.


Given Gemini 1.5 Pro's strong instruction following, structured outputs proved reliable in practice. This lightweight fact verification step helps subsequent agents build upon the validated data.
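
For concreteness, here is a minimal sketch of what the FactBase call can look like, assuming the google-generativeai Python SDK. The prompt wording and the extract_facts helper are illustrative, not the exact production prompts, which live in the GitHub repo linked at the bottom of this post.

import csv
import io

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

# Illustrative prompt; the real FactBase prompt is in the repo.
FACTBASE_PROMPT = """Extract the verifiable claims from the news piece below.
Return ONLY a CSV with the headers: claim,source,confidence
(confidence is a 0-1 score for how verifiable the claim is).

News piece:
{article}"""

def extract_facts(article: str) -> list[dict]:
    response = model.generate_content(FACTBASE_PROMPT.format(article=article))
    # Assumes the model returns bare CSV with the three expected headers.
    reader = csv.DictReader(io.StringIO(response.text.strip()))
    return list(reader)

The rows returned by extract_facts are exactly what gets handed to Agent 2.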


Agent 2: Verifier


Recently developed RL algorithms like GRPO train smaller language models like Phi-4 and Llama 3.1 8B to develop reasoning traces. However, upon experimentation, I found that state-of-the-art large LMs can reason with zero-shot prompting alone. This provides a unique ground for our fake news detection framework: zero-shot reasoning with Google's Gemini 2.0 Flash.


Previously, DeepMind had released a fine-tuned "thinking" variant of Gemini 2.0 Flash, the smaller model in their next-gen lineup. Playing with it on AI Studio led to an exciting finding: small LM + reasoning = amazing performance on classification tasks! I leveraged this finding, with careful prompting, to develop the second agent, "Verifier."


Much like 1.5 Pro, Gemini 2.0 Flash is a capable model when it comes to instruction following. Zero-shot prompting is enough to guide it towards reliable XML outputs, and we can extract the required information with regex for effortless integration with an existing codebase, as sketched after the format below.


The model is prompted to output in this format:

<verification>
  <overall_assessment>1 for REAL or 0 for FAKE</overall_assessment>
  <confidence_score>0-1</confidence_score>
  <key_issues>Brief issues</key_issues>
  <reasoning>Brief reasoning</reasoning>
</verification>
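
To illustrate the extraction step, here is a minimal sketch of parsing the Verifier's reply with regex. The parse_verification helper name is mine, and it assumes the model followed the format above, which zero-shot prompting reliably produced.

import re

def parse_verification(xml_text: str) -> dict:
    # Pull a single tag's contents out of the Verifier's XML reply.
    def field(tag: str) -> str:
        match = re.search(rf"<{tag}>(.*?)</{tag}>", xml_text, re.DOTALL)
        return match.group(1).strip() if match else ""

    return {
        "label": int(field("overall_assessment")),       # 1 = REAL, 0 = FAKE
        "confidence": float(field("confidence_score")),  # 0-1
        "key_issues": field("key_issues"),
        "reasoning": field("reasoning"),
    }

Chaining the two agents then amounts to formatting Agent 1's CSV rows into the Verifier prompt and parsing the XML reply with this helper.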

Performance and Ablation Studies

I took the fake news detection framework for a spin and compared it to a slightly more complicated three-agent setup and to simple zero-shot prompting with Gemini 1.5 Pro and Gemini 2.0 Flash. The dataset used is the Uni-Fakeddit-55k corpus, which is available on Hugging Face.


For all of my experiments, I used 500 prompts: a sample large enough for acceptable generalization, yet small enough to finish the tests within an evening. I stuck to binary classification, since it is best suited for rapid prototyping and our small sample size doesn't suit fine-grained tasks, which is why I skipped them.


Dual-FND with claim extraction and verification achieved 72.86% accuracy, which is commendable for a baseline. However, if we look at some simpler (and free-to-use) transformer-based models, the story quickly changes.


I tested the same 500 prompts with fine-tuned TinyBERT (14.5M), DistilBERT (66M), BERT (110M), RoBERTa-Base (125M), RoBERTa-Large (355M), and DeBERTa (435M). The tiny dataset translates to poor generalization and overfitting on the larger models, so it is best suited to the mid-sized variants; the fine-tuning recipe is sketched below.
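
For reference, the transformer baselines follow the standard Hugging Face fine-tuning recipe. The sketch below shows the BERT variant, assuming a CSV subset with "text" and "label" columns; the file name and hyperparameters here are illustrative, not the exact ones I used.

from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # binary: REAL vs FAKE

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length",
                     max_length=128)

# Illustrative file name for the 500-sample subset.
dataset = load_dataset("csv", data_files="subset_500.csv")["train"]
dataset = dataset.map(tokenize, batched=True).train_test_split(test_size=0.2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-fnd", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()
print(trainer.evaluate())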


For a quick comparison, I prompted Claude 3.7 Sonnet with my existing prompt designs and asked it to improve them. Here's what the model was asked:

<existing prompts, results, and zero-shot numbers>
Let's engineer prompts that can beat Gemini 2.0 Flash's numbers.

Upon this request, Claude developed a 3-agent strategy, which it touted as one that "improves upon your current strategies." You can check the prompts in detail in the GitHub repo linked at the bottom of this post.


Here's a detailed ablation study (it took quite a few fine-tunes).

Model                                 | Model Size | VRAM Usage | Accuracy | Training Time
--------------------------------------|------------|------------|----------|--------------
Claude 3.7 Sonnet's 3-agent framework | N/A        | N/A        | 57.40%   | N/A
TinyBERT                              | 14.5M      | 0.2 GB     | 65.33%   | 4.6s
RoBERTa-Large                         | 355M       | 7.9 GB     | 67.33%   | 21.6s
Zero-shot Gemini 1.5 Pro              | N/A        | N/A        | 68.79%   | N/A
DistilBERT                            | 66M        | 1.1 GB     | 72.00%   | 10.2s
Zero-shot Gemini 2.0 Flash            | N/A        | N/A        | 72.69%   | N/A
Dual-FND (2-agent framework)          | N/A        | N/A        | 72.86%   | N/A
RoBERTa-Base                          | 125M       | 4.3 GB     | 73.33%   | 12.6s
DeBERTa V3 Large                      | 435M       | 15.4 GB    | 74.67%   | 50.1s
BERT                                  | 110M       | 2.5 GB     | 76.00%   | 18.6s

Clearly, basic BERT-based architectures are better suited for the task than relying on LLMs. In the UNITE-FND paper, we share similar findings: Vision-Language Models couldn't match the accuracy numbers hit by TinyBERT. With careful tuning of hyperparameters and model choices, we achieve 92.52% accuracy on the larger 55k corpus in the paper. I highly recommend reading the details of how we did it.


The biggest gain with Dual-FND (and other agent-based architectures) is that no GPUs are required. You can get comparable accuracy numbers with just a CPU. The code can be hosted on any cloud Python host for cheap (or free with a service like PythonAnywhere).


However, the laughably low accuracy of the 3-agent framework suggests that even SoTA "thinking" models like Claude 3.7 Sonnet struggle to innovate. Ironically, this post is being written on the same day Anthropic's "most intelligent model" dropped.

Prompts and code @ my GitHub repo: https://github.com/ArkaMukherjee0/DualFND
