Announcing: genai v0.1.0

The opinionated high performance professional-grade AI package for Go

2025-08-17 go

Go AI SDKs

I believe that the future in AI will be bi-modal. There’s going to be a few large model providers, a ton of small specialized ones, along with a commoditization of AI workloads. I want the ability to move to a different provider or model easily to both optimize latency (e.g. cerebras and groq), cost and functionality (vision, image generation interleaved with text, video, etc). It makes sense to run a model locally for many use cases. Every year, running models locally becomes more accessible. I wanted optionality.

All I wanted is something simple: improve my personal AI projects, keep costs low, and keep up with the state of the art (SOTA). I was doing raw HTTP requests. I wondered, should I use the provider’s SDK? It should be easy, right?

First, I looked at OpenAI’s Go SDK.

Then, I looked at Anthropic’s Go SDK.

I was not impressed. That’s not a SDK, that’s a code dump!

Both of these SDKs are generated from their schema definition as-is. Because many fields in the API are polymorphic, the generated SDKs end up with contrived exported symbols. In practice, these SDKs are more of a hindrance than a help. It’s hard to work with a low level SDK when the types have a high degree of polymorphism. Is “content” a null value, an empty list, a string, an object or a slice of object? It depends!

In the past 2 years, the concept of “OpenAI compatibility” has been popular amongst third party providers. As Jo Kristian Bergum noted, it’s a lie.

twitter.com/jobergum/status/1956374230976676123 and no, LiteLLM is not the answer

As an example, Gemma 3 on llama.cpp requires the use of alternating roles between each message, but many providers only support a bare string for “content” so having multiple content from the same role (e.g. the “user”) requires multiple consecutive messages with the same “user” role. Both are in direct conflict. Then there’s error handling. Some providers just return their pydantic exceptions as-is. Some return HTML pages as errors. How to handle errors during a server-sent events (SSE) stream? Each provider handles SSE slightly differently.

I came to the conclusion that designing a future-proof orthogonal and resilient API is not in their core skillset. Google tried to keep the form of the OpenAI API and tweak it to work in protobuf and reduce the structures' polymorphism. People were annoyed that it is different enough to break usage but not different enough to make the change worthwhile.

The current AI Go SDKs have two major flaws. Most SDKs use an imperative form when a declarative form works better. Most of them implemented the bare minimum with no real smoke testing, unit testing or any care for future proofing, let alone clean support for multi-modality. So a few months ago, I decided to start refactoring my own client code that kept on growing into a standalone package.

The result: genai

This led me to create genai. I designed the API to reduce the number of symbols a user (or coding agent!) must learn. I designed it to reduce the likelihood of creating an invalid message that will be refused by the provider. I made multi-modal and multiple content blocks (e.g. emitting code as an artifact plus explanation) a first class concept. Here’s a few key tenets:

Safe and strict API implementation. All you love from a statically typed language. The library’s smoke tests immediately fail on unknown RPC fields so the package is continuously improving. Error code paths are properly implemented.
Stateless and concurrency-safe. No global state, it is safe to use clients concurrently.
Professional-grade. Smoke tested on live services with recorded traces located in testdata/ directories, e.g. providers/anthropic/testdata/TestClient_Scoreboard/.
Trust, But Verify. It generates a scoreboard based on actual behavior from each provider.
Optimized for speed. Minimize memory allocations, compress data at the transport layer when possible. Groq, Mistral and OpenAI use brotli for HTTP compression instead of gzip, and POST’s body to Google are gzip compressed.
Lean, with few dependencies. No unnecessary abstraction layer.
Composable and future-proof. Enforce separation of concerns to future-proof the library.

I aimed for strong separation of concerns, cutting until there’s nothing left to remove. For example, unlike all other AI SDKs, the core struct Message doesn’t have a Role field. It only has 4 fields: Requests, User, Replies, ToolCallResults. The first two are sent by the “user”, where User is the user identifier. The second is sent by the LLM, the “assistant”, which includes tool call requests. The third is sent by the “computer” or “tool”. Having the struct effectively be an union makes things simpler to understand and to verify for validity. The struct Request only contains fields that are valid for a user to send. Same for Reply and ToolCallResult. Documents are in a Doc struct, which enables extensibility. Each struct has one job. Each struct has a Validate function to catch mistakes early on. The API normalizes and smooths out a lot of the differences of each provider, like token usage counts and restriction in the way messages must be encoded. It exposes common functionality everyone loves: temperature, top-k, max number of generated tokens, tools, JSON schema, etc. This requires a strong opinionated structure.

The Provider family of interfaces enable exposing each backend’s unique capabilities. Given the vast differences amongst the providers, I had to have more flexibility in how to expose their functionality. For example Black Forest Labs (bfl.ai, package providers/bfl) only supports asynchronous image generation. A synchronous adapter (ProviderGenDoc) is exposed for ease of use but the user can use the native asynchronous interface (ProviderGenAsync).

Exposing each provider’s strong and unique value proposition is critical in genai. Each provider has a xxxRaw method that accepts the provider-specific structs. This exposes all the provider’s functionality directly to bypass the abstraction layer whenever needed! Take a look at ChatResponse for the response from each GenSync provider implementation function and ChatStreamChunkResponse for GenStream. You want to generate images with OpenAI’s gptimage model with a transparent background? It’s available right there!

The global variable providers.All is a registry of all the providers implemented. Flipping between providers is simply changing the name (as a string). The function providers.Available returns the providers that are accessible, when the corresponding FOO_API_KEY environment variable is defined. There is no need to depend on environment variables though, specify the genai.ProviderOptions to pass authentication data, model and remote URL.

Try setting OPENAI_API_KEY and running the following:

func main() {
    msgs := genai.Messages{genai.NewTextMessage("Tell me a joke")}
    for name, factory := range providers.Available() {
        p, _ := factory(&genai.ProviderOptions{Model: genai.ModelCheap}, nil)
        if c, ok := p.(genai.ProviderGen); ok {
            if resp, err := c.GenSync(context.Background(), msgs, nil); err == nil {
                fmt.Printf("- %s says %s\n\n", name, resp.String())
            }
        }
    }
}

This 11 line snippet will request a joke from a cheap model for each provider you have configured with the relevant environment variables.

Trust? Verify!

Given that the quality of each provider varies a lot, each provider implementation is run through a scoreboard (github.com/maruel/genai/scoreboard) to assert the functionality they support in practice. For example Gemini supports Logprobs when doing synchronous requests yet will fail when doing a streaming request! The scoreboard enables genai to create a complete table of which functionality is supported by which provider. This table is 100% generated by requests to the backend and the requests are all recorded as yaml files. Using HTTP session recording limits the cost of creating the scoreboards. All the recordings are in the testdata/ directory under each provider’s implementation. When the scoreboard is created, the test fails if any unseen field is in the provider’s response. This provides clear visibility into what each provider sends back.

Provider	🌐	➛In	Out➛	JSON	Schema	Chat	Stream	Tool	Batch	Seed	File	Cite	Think	Probs	Limits
anthropic	🇺🇸	💬📄📸	💬	❌	❌	✅🤪	✅🤪	✅🧐	✅	❌	❌	✅	✅	❌	✅
bfl	🇩🇪	💬	📸	❌	❌	❌	❌	❌	✅	✅	❌	❌	❌	❌	✅
cerebras	🇺🇸	💬	💬	🤪	🤪	✅	✅	💨🧐	❌	✅	❌	❌	✅	✅	✅
cloudflare	🇺🇸	💬	💬	🤪	✅	✅🚩🤪	✅🚩🤪	💨	❌	✅	❌	❌	❌	❌	❌
cohere	🇨🇦	💬📸	💬	✅	✅	✅	✅	✅💥	❌	✅	❌	✅	✅	✅	❌
deepseek	🇨🇳	💬	💬	✅	❌	✅	✅	✅💥	❌	❌	❌	❌	✅	✅	❌
gemini	🇺🇸	🎤💬📄📸	💬📸	✅	✅	✅	✅	✅🧐	✅	✅	✅	❌	✅	✅	❌
groq	🇺🇸	💬📸	💬	✅	❌	✅	✅	💨🧐	❌	✅	❌	❌	✅	❌	✅
huggingface	🇺🇸	💬	💬	✅	❌	✅	✅	💨	❌	✅	❌	❌	✅	✅	✅
llamacpp	🏠	💬📸	💬	✅	✅	✅	✅	✅🧐	❌	✅	❌	❌	❌	✅	❌
mistral	🇫🇷	🎤💬📄📸	💬	✅	✅	✅	✅	✅🧐	❌	✅	❌	❌	❌	❌	✅
ollama	🏠	💬📸	💬	✅	✅	✅	✅	✅	❌	✅	❌	❌	✅	❌	❌
openai	🇺🇸	🎤💬📄📸	💬📸	✅	✅	✅🤪	✅🤪	✅💥🧐	✅	✅	✅	❌	✅	✅	✅
openairesponses	🇺🇸	💬📄📸	💬📸	✅	✅	✅💸🤪	✅💸🤪	✅🧐	❌	✅	❌	❌	✅	❌	✅
perplexity	🇺🇸	💬📸	💬	❌	✅	✅🤪	✅🤪	❌	❌	❌	❌	✅	✅	❌	❌
pollinations	🇩🇪	💬📸	💬📸	🤪	❌	✅🤪	✅💸🤪	✅🧐	❌	✅	❌	❌	❌	❌	❌
togetherai	🇺🇸	💬📸	💬📸	✅	✅	✅🚩🤪	✅🚩🤪	💨🧐	❌	✅	❌	❌	❌	✅	✅
openaicompatible	N/A	💬	💬	❌	❌	✅	✅	❌	❌	❌	❌	❌	❌	❌	❌

Table generated with go run github.com/maruel/genai/cmd/scoreboard@latest -table

This table is a very condensed view of what each provider supports. Thinking? Logprobs? Returning the remaining quota? Batching support for cost savings? Tool calling that actually works? JSON schema? Citations? Audio, image, video and PDF modalities? It’s all tested. See README.md for the legend!

Composability

The package is composable. For example, genai doesn’t provide logging because everyone wants to log differently. Here’s how to add your own logging:

// ProviderGenLog wraps a ProviderGen to add logging messages when generating responses.
type ProviderGenLog struct {
    genai.ProviderGen
}

func (l *ProviderGenLog) GenSync(ctx context.Context, msgs genai.Messages, opts genai.Options) (genai.Result, error) {
    start := time.Now()
    resp, err := l.ProviderGen.GenSync(ctx, msgs, opts)
    slog.DebugContext(ctx, "GenSync", "msgs", len(msgs), "dur", time.Since(start).Round(time.Millisecond), "err", err, "usage", resp.Usage)
    return resp, err
}

func (l *ProviderGenLog) GenStream(ctx context.Context, msgs genai.Messages, replies chan<- genai.ReplyFragment, opts genai.Options) (genai.Result, error) {
    start := time.Now()
    resp, err := l.ProviderGen.GenStream(ctx, msgs, replies, opts)
    slog.DebugContext(ctx, "GenStream", "msgs", len(msgs), "dur", time.Since(start).Round(time.Millisecond), "err", err)
    return resp, err
}

func main() {
    c, _ := anthropic.New(&genai.ProviderOptions{}, nil)
    p := &ProviderGenLog{c}
    p.GenSync(...)
}

That’s it!

Do you need to throttle all HTTP requests to stay inside your rate limit? Use roundtrippers.Throttle or your favorite http.RoundTripper:

import "github.com/maruel/roundtrippers"

func main() {
    requestsPerSecond := 0.5
    c, _ := anthropic.New(&genai.ProviderOptions{}, func(h http.RoundTripper) http.RoundTripper {
        return &roundtrippers.Throttle{QPS: requestsPerSecond, Transport: h}
    })
    c.GenSync(...)
}

Use a similar flow for HTTP request logging or HTTP recording, so that you can do reproducible smoke tests. While genai uses github.com/dnaeon/go-vcr for its internal recordings, you are free to use your preferred library.

Living on the edge and need to use Anthropic beta headers? Want to specify the OpenAI-Organization header? Either way use a roundtrippers.Header to inject custom HTTP headers.

Static typing is great

Go struct tags are a great way to tell the encoding/json package how to encode and decode structs. genai leverages github.com/invopop/jsonschema to augment the generated JSON schema to add description, enum support and specify which fields are required. This is available both for JSON schema and for tool calling. This means that tool calling ends up being as simple as defining one function, one struct for the inputs and one struct to describe the tool.

import "github.com/maruel/roundtrippers"

func main() {
    c, _ := cerebras.New(&genai.ProviderOptions{}, nil)
    msgs := genai.Messages{genai.NewTextMessage("What season is Montréal currently in?")}
    opts := genai.OptionsText{
      Tools: []genai.ToolDef{locationClockTime},
      // Force the LLM to do a tool call first.
      ToolCallRequest: genai.ToolCallRequired,
    }
    newMsgs, _, _ := adapters.GenSyncWithToolCallLoop(context.Background(), c, msgs, &opts)
    fmt.Printf("%s\n", newMsgs[len(newMsgs)-1].String())
}

var locationClockTime = genai.ToolDef{
    Name:        "get_today_date_current_clock_time",
    Description: "Get the current clock time and today's date.",
    Callback: func(ctx context.Context, e *location) (string, error) {
        if e.Location != "Montréal" {
            return "ask again with Montréal", nil
        }
        return time.Now().Format("Monday 2006-01-02 15:04:05"), nil
    },
}

type location struct {
    Location string `json:"location" json_description:"Location to ask the current time in"`
}

This makes tools really simple to implement and there are no surprises. Types are enforced. The context can be canceled for a clean cancellation.

Soon, tools will be allowed to return genai.Doc as more providers (currently Anthropic and Mistral) support it.

go build

genai is a great way to get into AI. It is composable, well tested, performant and handles error code paths.

If you program in Go and are interested in machine learning, try genai! It’s easy to get started and it is very extensible. Check out github.com/maruel/genai to get started. Join the discussion on Discord.

TODO: Discord URL.

I’d love your feedback! The project is still evolving, so contributions and suggestions (including critiques) are appreciated! The project is in a phase where I am open to making breaking changes to improve the ergonomics, and better future proof it. The field is moving fast and we don’t know where the state of the art will be in 5 years. Send feedback (or PRs!) my way so v0.2.0 gets better!

Thanks!

Marc-Antoine