
OpenAI Fine-Tuning API: Training Custom Models Step by Step

A complete walkthrough of fine-tuning models through the OpenAI API, covering data preparation in JSONL format, file upload, training job creation, evaluation, and deploying your custom model.

Why Fine-Tune Through OpenAI

The OpenAI fine-tuning API lets you train a customized version of GPT-4o-mini, GPT-4o, or other supported models on your own data. The result is a model that behaves the way you want without requiring a long system prompt on every request. OpenAI manages the training infrastructure, GPU allocation, and model hosting. You provide the training data and configuration.

Fine-tuning through the API is particularly valuable when you need consistent output formatting, domain-specific terminology, or a particular reasoning style that few-shot prompting cannot reliably achieve.

Step 1: Prepare Your Training Data

Training data must be in JSONL format, where each line is a JSON object containing a messages array. Each message has a role field (system, user, or assistant) and a content field.

import json

training_examples = [
    {
        "messages": [
            {
                "role": "system",
                "content": "You are a medical coding assistant. Output ICD-10 codes in the format CODE: DESCRIPTION."
            },
            {
                "role": "user",
                "content": "Patient presents with acute bronchitis."
            },
            {
                "role": "assistant",
                "content": "J20.9: Acute bronchitis, unspecified"
            }
        ]
    },
    {
        "messages": [
            {
                "role": "system",
                "content": "You are a medical coding assistant. Output ICD-10 codes in the format CODE: DESCRIPTION."
            },
            {
                "role": "user",
                "content": "Diagnosis: Type 2 diabetes with diabetic chronic kidney disease, stage 3."
            },
            {
                "role": "assistant",
                "content": "E11.22: Type 2 diabetes mellitus with diabetic chronic kidney disease\nN18.3: Chronic kidney disease, stage 3 (moderate)"
            }
        ]
    },
]

# Write to JSONL
with open("training_data.jsonl", "w") as f:
    for example in training_examples:
        f.write(json.dumps(example) + "\n")

Step 2: Validate Your Data

Before uploading, validate that every line parses correctly and follows the expected schema. OpenAI provides a data preparation utility, but you can also validate manually.

import json

def validate_training_file(filepath: str) -> dict:
    errors = []
    valid_count = 0
    line_num = 0  # remains 0 for an empty file

    with open(filepath, "r") as f:
        for line_num, line in enumerate(f, 1):
            try:
                data = json.loads(line)
            except json.JSONDecodeError:
                errors.append(f"Line {line_num}: Invalid JSON")
                continue

            if "messages" not in data:
                errors.append(f"Line {line_num}: Missing 'messages' key")
                continue

            messages = data["messages"]
            roles = [m.get("role") for m in messages]

            if "assistant" not in roles:
                errors.append(f"Line {line_num}: No assistant message")
                continue

            # Skip the example entirely if any message has missing or empty content
            empty = [m for m in messages if not str(m.get("content", "")).strip()]
            if empty:
                errors.append(f"Line {line_num}: Empty content in {empty[0].get('role')}")
                continue

            valid_count += 1

    return {
        "total_lines": line_num,
        "valid": valid_count,
        "errors": errors[:20],
    }

result = validate_training_file("training_data.jsonl")
print(f"Valid examples: {result['valid']}/{result['total_lines']}")

Step 3: Upload the Training File

from openai import OpenAI

client = OpenAI()

# Upload training file
training_file = client.files.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune",
)
print(f"File ID: {training_file.id}")
# Output: File ID: file-abc123...

# Optionally upload a validation file
validation_file = client.files.create(
    file=open("validation_data.jsonl", "rb"),
    purpose="fine-tune",
)

Step 4: Create the Fine-Tuning Job

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    validation_file=validation_file.id,
    model="gpt-4o-mini-2024-07-18",
    hyperparameters={
        "n_epochs": 3,
        "batch_size": "auto",
        "learning_rate_multiplier": "auto",
    },
    suffix="medical-coder",  # Custom name suffix
)
print(f"Job ID: {job.id}")
print(f"Status: {job.status}")

The suffix parameter adds a custom label to your model name, making it easy to identify: ft:gpt-4o-mini-2024-07-18:your-org:medical-coder:abc123.
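As a throwaway illustration (this helper is not part of the OpenAI SDK, and the layout can differ if no suffix is set), the name can be split into its parts:

```python
def parse_ft_model_name(name: str) -> dict:
    """Split a fine-tuned model ID of the form
    ft:<base-model>:<org>:<suffix>:<job-fragment> into its parts."""
    parts = name.split(":")
    if parts[0] != "ft" or len(parts) != 5:
        raise ValueError(f"Not a recognized fine-tuned model name: {name}")
    return {
        "base_model": parts[1],
        "org": parts[2],
        "suffix": parts[3],
        "job": parts[4],
    }

info = parse_ft_model_name("ft:gpt-4o-mini-2024-07-18:your-org:medical-coder:abc123")
print(info["base_model"])  # gpt-4o-mini-2024-07-18
```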



Step 5: Monitor Training Progress

import time

def monitor_job(client, job_id: str, poll_interval: int = 30):
    while True:
        job = client.fine_tuning.jobs.retrieve(job_id)
        print(f"Status: {job.status}")

        if job.status == "succeeded":
            print(f"Fine-tuned model: {job.fine_tuned_model}")
            return job.fine_tuned_model

        if job.status == "failed":
            print(f"Error: {job.error}")
            return None

        # List recent events
        events = client.fine_tuning.jobs.list_events(
            fine_tuning_job_id=job_id, limit=5
        )
        for event in events.data:
            print(f"  [{event.created_at}] {event.message}")

        time.sleep(poll_interval)

model_name = monitor_job(client, job.id)

Step 6: Use Your Fine-Tuned Model

Once training succeeds, use the fine-tuned model exactly like any other OpenAI model.

response = client.chat.completions.create(
    model=model_name,  # ft:gpt-4o-mini-2024-07-18:your-org:medical-coder:abc123
    messages=[
        {
            "role": "system",
            "content": "You are a medical coding assistant. Output ICD-10 codes in the format CODE: DESCRIPTION."
        },
        {
            "role": "user",
            "content": "Patient diagnosed with essential hypertension and hyperlipidemia."
        },
    ],
    temperature=0.0,
)
print(response.choices[0].message.content)
# I10: Essential (primary) hypertension
# E78.5: Hyperlipidemia, unspecified

Step 7: Evaluate Against the Base Model

Always compare your fine-tuned model against the base model on a held-out test set.

import json

def evaluate_model(client, model: str, test_file: str) -> dict:
    correct = 0
    total = 0

    with open(test_file, "r") as f:
        for line in f:
            example = json.loads(line)
            messages = example["messages"]
            expected = messages[-1]["content"]
            prompt = messages[:-1]

            response = client.chat.completions.create(
                model=model,
                messages=prompt,
                temperature=0.0,
            )
            predicted = response.choices[0].message.content.strip()
            if predicted == expected:
                correct += 1
            total += 1

    return {"model": model, "accuracy": correct / total, "total": total}

# Compare against the same base snapshot the fine-tune started from
base_results = evaluate_model(client, "gpt-4o-mini-2024-07-18", "test_data.jsonl")
ft_results = evaluate_model(client, model_name, "test_data.jsonl")

print(f"Base model accuracy: {base_results['accuracy']:.1%}")
print(f"Fine-tuned accuracy: {ft_results['accuracy']:.1%}")
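Exact string match is strict for multi-line outputs where code order or description wording can vary; one option is to compare just the set of codes. This normalization is an assumption tailored to the CODE: DESCRIPTION format used in this guide:

```python
def extract_codes(output: str) -> set[str]:
    """Pull just the ICD-10 codes from 'CODE: DESCRIPTION' lines."""
    codes = set()
    for line in output.splitlines():
        if ":" in line:
            codes.add(line.split(":", 1)[0].strip())
    return codes

a = "E11.22: Type 2 diabetes mellitus with diabetic CKD\nN18.3: CKD, stage 3"
b = "N18.3: Chronic kidney disease, stage 3 (moderate)\nE11.22: Type 2 diabetes"
print(extract_codes(a) == extract_codes(b))  # True
```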

FAQ

How much does fine-tuning cost on the OpenAI API?

Training costs depend on the model and the number of tokens in your training data. For GPT-4o-mini, training costs approximately $3.00 per million training tokens. A dataset of 500 examples at 500 tokens each totals about 250K tokens per epoch, or roughly $0.75 per epoch; with 3 epochs, that is about $2.25 for the whole run. Note that inference on fine-tuned models is typically billed at higher per-token rates than the base model, so check current pricing when budgeting ongoing usage.
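The arithmetic above can be wrapped in a quick estimator. The $3.00-per-million figure is the assumption used in this answer; substitute the current price for your model:

```python
def estimate_training_cost(n_examples: int, avg_tokens: int, n_epochs: int,
                           price_per_million: float = 3.00) -> float:
    """Rough training cost: every training token is seen once per epoch."""
    total_tokens = n_examples * avg_tokens * n_epochs
    return total_tokens / 1_000_000 * price_per_million

# 500 examples x 500 tokens x 3 epochs = 750K tokens
print(f"${estimate_training_cost(500, 500, 3):.2f}")  # $2.25
```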

How long does a fine-tuning job take?

Most fine-tuning jobs complete in 15 minutes to 2 hours, depending on dataset size and the number of epochs. Smaller datasets with 3 epochs typically finish in under 30 minutes. The OpenAI platform queues jobs, so there may be additional wait time during peak demand.

Can I fine-tune a fine-tuned model further with new data?

Yes. You can use a previously fine-tuned model as the base for a new fine-tuning job. This is useful for iterative improvement — train on your initial dataset, evaluate, then fine-tune again on a curated set of examples where the model performed poorly. Just reference the fine-tuned model ID as the model parameter.
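That curation step can be sketched as follows. The helper is hypothetical (not part of the OpenAI SDK) and assumes evaluation rows shaped like {"prompt": ..., "expected": ..., "predicted": ...}:

```python
def collect_failures(results: list[dict]) -> list[dict]:
    """Turn mispredicted evaluation rows back into training examples
    in the same {"messages": [...]} shape used for the first run."""
    failures = []
    for r in results:
        if r["predicted"].strip() != r["expected"].strip():
            failures.append({
                "messages": r["prompt"] + [
                    {"role": "assistant", "content": r["expected"]}
                ]
            })
    return failures

results = [
    {"prompt": [{"role": "user", "content": "Acute bronchitis."}],
     "expected": "J20.9: Acute bronchitis, unspecified",
     "predicted": "J20.9: Acute bronchitis, unspecified"},
    {"prompt": [{"role": "user", "content": "Essential hypertension."}],
     "expected": "I10: Essential (primary) hypertension",
     "predicted": "I10 Essential hypertension"},
]
print(len(collect_failures(results)))  # 1
```

The resulting list can be written to JSONL and submitted as a new job with the previous fine-tuned model ID as the model parameter.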


Written by

CallSphere Team
