Google is rolling out its Interactions API, a new unified interface for its Gemini models. Now in beta, the API aims to simplify how developers work with Gemini, positioning itself as a more robust alternative to the existing generateContent API.
The Interactions API is designed to handle state management, tool orchestration, and long-running tasks more efficiently. It supports both general use cases and specialized functions like tool calling and agent interactions.
Unified Interface for Gemini Models
This new API consolidates various interaction patterns into a single, cohesive interface. It simplifies how developers engage with Gemini models and agents, promising a smoother workflow for complex AI applications.
The API is available via existing SDKs for Python and JavaScript, with REST endpoints also supported. Developers can start with basic text prompts or build multi-turn conversations.
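Since the API is also exposed over REST, a single-turn request reduces to posting a small JSON body. The sketch below builds such a body; the field names (`model`, `input`) are assumptions based on the announcement, not a definitive wire format, and the model name is purely illustrative.

```python
import json

def build_interaction_request(model, prompt):
    """Build a hypothetical JSON body for a single-turn interaction.

    Field names are assumptions drawn from the beta announcement and
    may change before the API stabilizes.
    """
    return {"model": model, "input": prompt}

body = build_interaction_request("gemini-2.5-flash", "Summarize this article in one sentence.")
print(json.dumps(body))
```

The same shape would then extend naturally to multi-turn conversations, as described below.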
Stateful and Stateless Conversations
For conversational AI, the Interactions API offers both stateful and stateless modes. Stateful conversations leverage previous_interaction_id to maintain context server-side, reducing the need to resend full chat histories. Stateless conversations require manual management of conversation history on the client side.
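The difference between the two modes can be sketched as follows. The `previous_interaction_id` field comes from the description above; the surrounding request structure and role/content layout are illustrative assumptions.

```python
def stateful_turn(model, prompt, previous_interaction_id=None):
    """Stateful mode: pass the id of the prior interaction and let the
    server resolve the conversation context; no history is resent."""
    body = {"model": model, "input": prompt}
    if previous_interaction_id is not None:
        body["previous_interaction_id"] = previous_interaction_id
    return body

def stateless_turn(model, history, prompt):
    """Stateless mode: the client resends the full transcript each turn
    and is responsible for appending new messages to it."""
    turns = history + [{"role": "user", "content": prompt}]
    return {"model": model, "input": turns}, turns

# First turn, then a follow-up that reuses a (hypothetical) returned id.
first = stateful_turn("gemini-2.5-flash", "Name a prime number.")
follow_up = stateful_turn("gemini-2.5-flash", "Name a bigger one.",
                          previous_interaction_id="interaction-123")
```

In stateful mode the client keeps only the latest interaction id, which is what reduces payload size on long conversations.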
Multimodal Capabilities
The API extends Gemini's multimodal understanding and generation capabilities. Developers can provide inputs like images, audio, and video, and generate multimodal outputs, including images and speech.
- Image Understanding: Analyze images by providing URLs or uploaded files.
- Audio and Video Processing: Understand spoken content or analyze video content.
- Document Analysis: Process PDF documents directly.
- Image Generation: Create images with configurable aspect ratios and resolutions.
- Speech Generation: Convert text to natural-sounding speech with voice and language customization.
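A mixed text-and-image request following the capabilities listed above might be assembled like this. The per-part structure (`type`, `url`, `text`) is an illustrative assumption about how multimodal parts could be encoded, not the confirmed schema.

```python
def build_multimodal_request(model, text, image_url):
    """Build a hypothetical request mixing an image reference with a
    text instruction; part field names are assumptions."""
    return {
        "model": model,
        "input": [
            {"type": "image", "url": image_url},
            {"type": "text", "text": text},
        ],
    }

req = build_multimodal_request(
    "gemini-2.5-flash",
    "Describe what is shown in this image.",
    "https://example.com/cat.png",
)
```

Uploaded files, audio, video, and PDFs would slot in as additional part types in the same `input` list.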
Agentic and Tool Integration
A significant focus is on agentic capabilities. The Interactions API lets developers create and interact with specialized agents, such as the Deep Research agent. It also provides robust support for function calling, allowing custom tools to be integrated seamlessly.
Built-in tools like Google Search, code execution, and URL context are also accessible. This integration allows Gemini models to perform real-world actions and retrieve up-to-date information.
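Combining a custom function declaration with a built-in tool might look like the sketch below. The JSON Schema style for `parameters` is a common convention for function calling, but the exact tool wire format here is an assumption, as is the `google_search` tool identifier.

```python
# Hypothetical custom tool declaration: parameters are described with
# JSON Schema, a common convention for function calling.
weather_tool = {
    "type": "function",
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# Request mixing the custom tool with a built-in search tool
# (tool names and request fields are illustrative assumptions).
body = {
    "model": "gemini-2.5-flash",
    "input": "What's the weather in Zurich right now?",
    "tools": [weather_tool, {"type": "google_search"}],
}
```

When the model decides to call `get_weather`, the client would execute the function locally and send the result back in a follow-up turn.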
Advanced Features and Configuration
Advanced features include streaming responses for incremental updates and detailed control over model behavior via generation_config. Parameters like temperature, max_output_tokens, and thinking_level allow fine-tuning of the AI's output.
The thinking_level parameter specifically controls the depth of the model's internal reasoning process, balancing latency and output quality. Developers can also choose to receive summaries of the model's thought process.
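Putting those parameters together, a tuned request might carry a config block like the following. The parameter names come from the description above; the nesting under `generation_config` and the `"low"` value for `thinking_level` are assumptions for illustration.

```python
# Hypothetical generation settings: low temperature for deterministic
# output, a token cap, and reduced internal reasoning to cut latency.
generation_config = {
    "temperature": 0.3,
    "max_output_tokens": 1024,
    "thinking_level": "low",  # accepted values are an assumption
}

request_body = {
    "model": "gemini-2.5-flash",
    "input": "List three uses of the Interactions API.",
    "generation_config": generation_config,
}
```

Raising `thinking_level` would trade latency for deeper reasoning, which is the balance the parameter is designed to expose.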
Data Handling and Beta Status
By default, interactions are stored for 55 days on the paid tier and 1 day on the free tier to enable state management features. Users can opt out of storage by setting store=false. However, the API is currently in beta, and users should expect potential breaking changes. Google recommends using the stable generateContent API for production workloads.
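Opting out of server-side storage is a single flag on the request, per the description above; the placement of `store` at the top level of the body is an assumption.

```python
# Hypothetical request that disables server-side storage; note that
# with store=False, stateful features like previous_interaction_id
# would no longer have saved context to draw on.
body = {
    "model": "gemini-2.5-flash",
    "input": "Handle this without retaining the interaction.",
    "store": False,
}
```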
