The following modification of the example script from the Inference API documentation reproduces the error:
"""Reproduction script: repeatedly send the same multi-turn chat completion
request to the Lambda Inference API (OpenAI-compatible endpoint)."""
from openai import OpenAI
import os
import dotenv
from tqdm import tqdm

# Load environment variables from a .env file, if present.
dotenv.load_dotenv()

# Set API credentials and endpoint.
openai_api_key = os.getenv("LAMBDA_API_KEY")
if not openai_api_key:
    # Fail fast with a clear message instead of an opaque auth error later.
    raise RuntimeError("LAMBDA_API_KEY is not set; add it to the environment or a .env file.")
openai_api_base = "https://api.lambda.ai/v1"

# Initialize the OpenAI client against the Lambda endpoint.
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

# Choose the model.
model = "llama-4-scout-17b-16e-instruct"

# Issue the identical request 100 times; tqdm shows progress while we
# wait for the error to reproduce.
for _ in tqdm(range(100)):
    # Create a multi-turn chat completion request.
    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "system",
                "content": "You are an expert conversationalist who responds to the best of your ability.",
            },
            {
                "role": "user",
                "content": "Who won the world series in 2020?",
            },
            {
                "role": "assistant",
                "content": "The Los Angeles Dodgers won the World Series in 2020.",
            },
            {
                "role": "user",
                "content": "Where was it played?",
            },
        ],
        model=model,
    )
    # Print the full chat completion response