Meta Llama 3.1 8B inference overview

Price per 1M tokens: $0.22 (input), $0.22 (output)

Parameters: 8B

Context window: 128K

Release date: Jul 2024
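As a rough illustration of what the listed per-token price means in practice, the sketch below estimates cost from token counts. The prices come from the overview above; the function name `estimate_cost` and the workload figures are hypothetical, and actual billing may differ from this simple arithmetic.

```python
# Listed prices in USD per 1M tokens (taken from the overview above).
INPUT_PRICE_PER_M = 0.22
OUTPUT_PRICE_PER_M = 0.22


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return an estimated cost in USD for the given token counts."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical workload: 10,000 requests, each with roughly
# 500 input tokens and 200 output tokens.
cost = estimate_cost(10_000 * 500, 10_000 * 200)
print(f"Estimated cost: ${cost:.2f}")
```

Under these assumptions the workload comes to about $1.54, since 5M input tokens and 2M output tokens are each billed at $0.22 per million.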

Meta Llama 3.1 8B inference details

Llama 3.1 8B provides efficient multilingual conversational support, making it well suited to applications where responsiveness and computational efficiency are critical. It is effective for building chatbots, automating customer interactions, and powering other applications that need fast yet reliable language understanding.

Created by: Meta

License: llama3.1

Model card: 

import openai
import weave

# Weave autopatches OpenAI to log LLM calls to W&B
# Replace the placeholder with your own "team/project" path
weave.init("<your-team>/<your-project>")

client = openai.OpenAI(
    # The custom base URL points to W&B Inference
    base_url='https://api.inference.wandb.ai/v1',

    # Get your API key from https://wandb.ai/authorize
    # Consider setting it in the environment as OPENAI_API_KEY instead for safety
    api_key="<your-api-key>",

    # Team and project are required for usage tracking
    project="<your-team>/<your-project>",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke."}
    ],
)

print(response.choices[0].message.content)
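The snippet above suggests keeping the API key in the environment rather than in source code. A minimal sketch of that idea follows, assuming the key is stored in `OPENAI_API_KEY` (as the comment above suggests) and the team/project path in a hypothetical variable named `WANDB_INFERENCE_PROJECT`. The helper `client_kwargs` is illustrative only and not part of any W&B or OpenAI library.

```python
import os

# Base URL for W&B Inference, as used in the snippet above.
BASE_URL = "https://api.inference.wandb.ai/v1"


def client_kwargs(env=None) -> dict:
    """Build keyword arguments for openai.OpenAI from environment variables."""
    env = os.environ if env is None else env

    api_key = env.get("OPENAI_API_KEY")
    # Hypothetical variable name, chosen for this example only.
    project = env.get("WANDB_INFERENCE_PROJECT")

    if not api_key:
        raise RuntimeError("Set OPENAI_API_KEY to your W&B API key.")
    if not project or "/" not in project:
        raise RuntimeError("Set WANDB_INFERENCE_PROJECT to 'team/project'.")

    return {"base_url": BASE_URL, "api_key": api_key, "project": project}
```

With both variables set, the client could then be created as `openai.OpenAI(**client_kwargs())`, which keeps credentials out of the script itself.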

Meta Llama 3.1 8B resources

Guide
PII redaction with Llama 3.1 tutorial
Course
AI engineering course: Agents
Guide
W&B Inference powered by CoreWeave