Learn Agentic AI — Build Voice & Chat Agents

Step-by-step tutorials on building voice and chat AI agents using OpenAI Agents SDK, Realtime API, function calling, multi-agent orchestration, and production deployment patterns.

9 of 1309 articles

11 min read · 9 · Mar 17, 2026

UFO's Visual Understanding: How GPT-4V Interprets Windows Application Screenshots

Explore how UFO captures, annotates, and sends Windows application screenshots to GPT-4V for UI element detection, control identification, and intelligent action mapping at each automation step.

13 min read · 2 · Mar 17, 2026

Playwright Network Interception: Capturing API Calls and Modifying Requests

Master Playwright's network interception API to capture API responses, log request/response data, mock endpoints, and extract structured data from XHR and fetch calls in your AI agents.

11 min read · 5 · Mar 17, 2026

Claude Vision for PDF Processing in the Browser: Reading Documents Without Download

Use Claude Computer Use to read PDFs rendered in browser viewers — navigating pages, extracting text and tables, detecting annotations, and converting visual PDF content to structured data without file downloads.

12 min read · 11 · Mar 17, 2026

Multi-Table Text-to-SQL: Handling JOINs, Subqueries, and Complex Relationships

Master multi-table text-to-SQL challenges including JOIN inference, ambiguous column resolution, query planning for complex questions, and techniques that help LLMs reason across table relationships.

13 min read · 3 · Mar 17, 2026

Building a Vision-Based Web Navigator: GPT-4V Sees and Acts on Web Pages

Build a complete screenshot-action loop where GPT-4V analyzes web pages, decides where to click, and navigates autonomously. Learn coordinate extraction, click targeting, and navigation decision-making.

10 min read · 3 · Mar 17, 2026

GPT Vision for CAPTCHA and Challenge Detection: Identifying Blocking Elements

Learn how to use GPT Vision to detect CAPTCHAs, cookie banners, paywalls, and other blocking elements that interrupt browser automation — and implement graceful handling strategies.

12 min read · 4 · Mar 17, 2026

UFO's Dual-Agent Architecture: How HostAgent and AppAgent Coordinate Tasks

Deep dive into Microsoft UFO's dual-agent system where HostAgent orchestrates application selection and AppAgent executes in-app UI actions, with detailed coordination flow and plan execution examples.

11 min read · 5 · Mar 17, 2026

Building a Legal Reasoning Agent: Multi-Step Argument Construction with Evidence

Build an AI agent that performs structured legal reasoning — searching precedents, constructing multi-step arguments with evidence chains, generating counter-arguments, and producing balanced legal analysis.