
AI Research Agent: Automated Literature Search and Summary Generation

Build an AI research agent that searches academic papers via the Semantic Scholar API, summarizes key findings, manages citations, and synthesizes insights across multiple sources into a coherent literature review.

The Research Bottleneck

A typical literature review involves searching multiple databases, skimming dozens of abstracts, reading a handful of full papers, extracting key claims, and weaving them into a coherent narrative. A single researcher might spend two to three weeks on this process. An AI research agent compresses the search-and-summarize loop from days to minutes while the human focuses on critical evaluation and synthesis decisions.

The agent we build here uses the Semantic Scholar API for paper discovery, LLM-powered summarization for each paper, and a synthesis step that identifies themes and contradictions across the collected literature.

Paper Search Tool

Semantic Scholar provides a free API that returns paper metadata, abstracts, citation counts, and more. The search tool wraps this API:

import httpx
from agents import Agent, Runner, function_tool

S2_BASE = "https://api.semanticscholar.org/graph/v1"

@function_tool
async def search_papers(query: str, limit: int = 10) -> str:
    """Search Semantic Scholar for papers matching a query.
    Returns titles, authors, year, citation count, and abstracts."""
    params = {
        "query": query,
        "limit": limit,
        "fields": "title,authors,year,abstract,citationCount,externalIds",
    }
    async with httpx.AsyncClient() as client:
        resp = await client.get(f"{S2_BASE}/paper/search", params=params)
        resp.raise_for_status()

    papers = resp.json().get("data", [])
    results = []
    for p in papers:
        authors = ", ".join(a["name"] for a in (p.get("authors") or [])[:3])
        doi = (p.get("externalIds") or {}).get("DOI", "N/A")
        abstract = (p.get("abstract") or "No abstract available.")[:500]
        results.append(
            f"Title: {p['title']}\n"
            f"Authors: {authors}\n"
            f"Year: {p.get('year', 'N/A')} | Citations: {p.get('citationCount', 0)}\n"
            f"DOI: {doi}\n"
            f"Abstract: {abstract}\n"
        )
    return "\n---\n".join(results) if results else "No papers found."

Citation Management Tool

Keeping track of references is essential. This tool stores papers the agent decides are relevant and outputs formatted citations:

_citation_store: list[dict] = []

@function_tool
def save_citation(title: str, authors: str, year: str, doi: str) -> str:
    """Save a paper to the citation list for the final bibliography."""
    entry = {"title": title, "authors": authors, "year": year, "doi": doi}
    _citation_store.append(entry)
    return f"Saved. Total citations: {len(_citation_store)}"

@function_tool
def get_bibliography() -> str:
    """Return all saved citations in APA-like format."""
    if not _citation_store:
        return "No citations saved yet."
    lines = []
    for i, c in enumerate(_citation_store, 1):
        lines.append(f"[{i}] {c['authors']} ({c['year']}). {c['title']}. DOI: {c['doi']}")
    return "\n".join(lines)
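
To see what the bibliography output looks like without running the agent, the formatting logic can be exercised on its own. This is a minimal sketch: `format_bibliography` is a hypothetical helper that mirrors the loop in `get_bibliography`, and the sample entry is a placeholder, not a real paper.

```python
def format_bibliography(entries: list[dict]) -> str:
    """Hypothetical helper mirroring the formatting loop in get_bibliography."""
    if not entries:
        return "No citations saved yet."
    return "\n".join(
        f"[{i}] {c['authors']} ({c['year']}). {c['title']}. DOI: {c['doi']}"
        for i, c in enumerate(entries, 1)
    )

# Placeholder entry for illustration only; it does not refer to a real paper.
sample = [
    {
        "title": "Example Title",
        "authors": "A. Author, B. Author",
        "year": "2024",
        "doi": "10.0000/example",
    }
]

print(format_bibliography(sample))
# [1] A. Author, B. Author (2024). Example Title. DOI: 10.0000/example
```

Keeping the formatting in a plain function like this also makes it easy to unit test separately from the tool wrapper.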

Assembling the Research Agent

The agent needs instructions that define a clear research workflow (search, filter, summarize, synthesize):

research_agent = Agent(
    name="Research Agent",
    instructions="""You are an academic research agent. When given a research topic:
1. Use search_papers to find the 10 most relevant papers.
2. Evaluate each abstract for relevance. Discard papers that do not directly
   address the topic.
3. For each relevant paper, save_citation to build the bibliography.
4. Summarize each relevant paper in 2-3 sentences focusing on methodology
   and key findings.
5. After reviewing all papers, write a synthesis section that identifies
   common themes, conflicting results, and open questions.
6. End with the full bibliography from get_bibliography.""",
    tools=[search_papers, save_citation, get_bibliography],
)
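
Running a Literature Search

With the agent defined, pass it a research topic and print the final output:

import asyncio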

async def main():
    result = await Runner.run(
        research_agent,
        "Survey the recent literature on retrieval-augmented generation "
        "for question answering systems. Focus on papers from 2024-2026.",
    )
    print(result.final_output)

asyncio.run(main())

The agent searches for RAG papers, filters by relevance and recency, saves citations for the strongest matches, summarizes each one, and produces a synthesis section identifying trends like the shift from sparse to dense retrieval and the emergence of hybrid chunking strategies.

Enhancing with Full-Text Analysis

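Abstracts only tell part of the story. For deeper analysis, add a tool that locates an open-access copy of each paper. The version below asks Unpaywall for a PDF URL by DOI; downloading the PDF and extracting its text is a separate follow-up step: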

@function_tool
async def fetch_paper_text(doi: str) -> str:
    """Look up an open-access PDF URL for a paper via Unpaywall."""
    async with httpx.AsyncClient() as client:
        resp = await client.get(
            f"https://api.unpaywall.org/v2/{doi}",
            # Unpaywall expects a contact email; replace this value with your own.
            params={"email": "[email protected]"},
        )
        if resp.status_code != 200:
            return "Paper not available open-access."
        data = resp.json()
        # best_oa_location can be null when no open-access copy exists,
        # so fall back to an empty dict before reading the PDF URL.
        oa_url = (data.get("best_oa_location") or {}).get("url_for_pdf")
        if not oa_url:
            return "No open-access PDF URL found."
        return f"Full text available at: {oa_url}"

Handling Rate Limits and Errors

Academic APIs enforce rate limits. Wrap HTTP calls with exponential backoff:

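import asyncio

async def resilient_get(client, url, params, max_retries=3):
    """GET with exponential backoff on HTTP 429 (Too Many Requests)."""
    for attempt in range(max_retries):
        resp = await client.get(url, params=params)
        if resp.status_code == 429:
            # Back off for 1s, 2s, 4s, ... before the next attempt.
            wait = 2 ** attempt
            await asyncio.sleep(wait)
            continue
        # Any other error status is raised immediately rather than retried.
        resp.raise_for_status()
        return resp
    raise RuntimeError(f"Rate limited: gave up after {max_retries} attempts")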
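
The fixed `2 ** attempt` delay can be refined. The sketch below is one possible variant, not part of the original agent: `compute_wait` is a hypothetical helper that caps the delay, optionally adds random jitter, and prefers a numeric `Retry-After` header when the server sends one. Whether a given academic API actually sends that header is an assumption to verify against its documentation.

```python
import random
from typing import Optional

def compute_wait(
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 1.0,
    cap: float = 60.0,
    jitter: float = 0.0,
) -> float:
    """Seconds to sleep before retrying, for a 0-based attempt number."""
    if retry_after is not None:
        try:
            # Retry-After may be a number of seconds; honor it when it parses.
            return max(0.0, min(float(retry_after), cap))
        except ValueError:
            # It can also be an HTTP date; this sketch falls back to backoff.
            pass
    # Exponential backoff: base, 2*base, 4*base, ... limited by the cap.
    delay = min(base * (2 ** attempt), cap)
    # Optional jitter spreads out clients that were throttled at the same time.
    return delay + random.uniform(0, jitter)

print(compute_wait(0))       # 1.0
print(compute_wait(3))       # 8.0
print(compute_wait(10))      # 60.0 (capped)
print(compute_wait(0, "5"))  # 5.0 (from the header value)
```

Inside `resilient_get`, the line that sets `wait` could then become `wait = compute_wait(attempt, resp.headers.get("Retry-After"))`.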

FAQ

Can this agent access papers behind paywalls?

No. The agent uses public APIs and open-access repositories. For paywalled content, you would need institutional access or an API key from a licensed database such as IEEE Xplore.

How accurate are the LLM-generated summaries?

LLM summaries of abstracts are generally reliable for capturing high-level findings. However, they can miss nuances in methodology sections. Always have a domain expert review the synthesis before using it in a formal publication.

How do I focus the search on a specific time range?

Add a year filter to the Semantic Scholar API request by including a year range, for example "year": "2024-2026", in the params dictionary that search_papers sends. You can also instruct the agent to discard papers outside the target date range during the filtering step.
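
The year filter can be sketched as a small parameter builder. `build_search_params` is a hypothetical helper, not part of the original tool; it reuses the same fields as `search_papers` and adds `year` only when a range is given.

```python
from typing import Optional

# Same field list that search_papers requests.
S2_FIELDS = "title,authors,year,abstract,citationCount,externalIds"

def build_search_params(query: str, limit: int = 10, year: Optional[str] = None) -> dict:
    """Build query parameters for the paper search, optionally limited by year."""
    params = {"query": query, "limit": limit, "fields": S2_FIELDS}
    if year:
        # A range such as "2024-2026" restricts results to those years.
        params["year"] = year
    return params

params = build_search_params("retrieval-augmented generation", year="2024-2026")
print(params["year"])  # 2024-2026
```

In `search_papers`, the inline `params` dictionary could be replaced by a call to this helper, with `year` exposed as an extra tool argument so the agent can set it.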


Written by

CallSphere Team
