#Multimodal AI

50 articles with this tag

Artificial Intelligence

Mistral AI's Vox-Trainer and Fine-Tuning

Mistral AI announces Vox-Trainer, a new multimodal AI model for voice cloning and speech generation, alongside new benchmarks for speech understanding.

2 days ago
AI Research

3D Grounding for Vision-Language Models

Loc3R-VLM enhances 2D VLMs with 3D spatial reasoning from monocular video, achieving SOTA in language-based localization and 3D QA.

13 days ago
AI Research

PRIMO R1: Active Critics for Robotic Manipulation

PRIMO R1 transforms video MLLMs into active critics for robotic manipulation via outcome-based RL, achieving SOTA on RoboFail and outperforming larger models.

15 days ago
Artificial Intelligence

Mistral Small 4 Unifies AI Capabilities

Mistral AI unveils Mistral Small 4, a unified model combining text, image, reasoning, and coding capabilities under an open-source license.

15 days ago
AI Research

CoCo: Code Drives Precise Image Generation

CoCo leverages executable code for precise, structured text-to-image generation, outperforming existing methods on complex benchmarks.

22 days ago
AI Research

Code-Driven Reasoning for Precise Image Generation

CoCo (Code-as-CoT) introduces executable code as a reasoning framework for text-to-image generation, achieving superior precision and control.

22 days ago
Artificial Intelligence

AI Learns Beyond Text

AI is moving beyond text, with multimodal pretraining enabling models to learn from images, audio, and video for richer comprehension.

25 days ago
AI Research

Microsoft's Compact AI Learns to Reason

Microsoft's new Phi-4-reasoning-vision-15B model offers strong multimodal reasoning capabilities in a compact, efficient package.

26 days ago
AI Research

Crab+ Unifies AV-LLMs, Reverses Negative Transfer

Crab+ introduces a novel approach to Audio-Visual Large Language Models, overcoming negative transfer via explicit cooperation in data and model design.

27 days ago
AI Research

Microsoft's Phi-4-reasoning-vision-15B compact AI model

Microsoft Research's Phi-4-reasoning-vision-15B offers efficient multimodal AI, excelling in reasoning and vision tasks with less data and compute.

28 days ago
Artificial Intelligence

Google's Interactions API Evolves Gemini

Google's new Interactions API for Gemini models offers a unified interface for complex AI tasks, supporting multimodal inputs, agents, and tool integration.

29 days ago
AI Research

Multimodal LLMs: What's Lost in Translation?

New research reveals multimodal LLMs struggle to utilize non-textual data due to a 'mismatched decoder problem,' impacting their true understanding.

about 1 month ago
AI Research

Less Data, More Alignment: SOTAlign

Researchers introduce SOTAlign, a framework that achieves robust cross-modal alignment using significantly less paired data by leveraging unpaired samples.

about 1 month ago
AI Research

Agentic Vision Gemini 3 Flash: Code Execution Solves Visual Hallucination

Agentic Vision Gemini 3 Flash shifts multimodal AI from static image processing to an active, code-driven investigation, dramatically improving accuracy and verifiability.

2 months ago
Funding Round

Sparkli AI raises $5M to kill the EdTech chatbot for kids

Sparkli AI, founded by Google alums, raised a $5 million pre-seed round to develop a multimodal, simulation-based learning engine for children aged 5 to 12.

2 months ago
AI Research

Argos Framework Delivers Grounded AI Reasoning

Argos is an agentic verification framework for reinforcement learning that rewards models only for reasoning grounded in verifiable evidence.

2 months ago
AI Research

Gemini API Data Ingestion Gets Production Ready

Google has upgraded Gemini API data ingestion to support persistent storage via GCS registration and external signed URLs, boosting the inline limit to 100MB.

3 months ago
Funding Round

The AI Pet Startup That Claims to Translate Your Dog's Thoughts

3 months ago
AI Research

Google Gemini 3 Redefines AI Reasoning and Efficiency

3 months ago
AI Research

Google AI Tips: A Year of Ubiquitous Intelligence

3 months ago
AI Research

T5Gemma 2 Multimodal Ushers In Efficient AI Future

3 months ago
AI Research

Tinker launches OpenAI API compatibility, challenging vendor lock-in.

4 months ago
AI Research

Gemini Google Translate Elevates Nuance

4 months ago
AI Research

Gemma 3n Powers Real-World Impact at the Edge

4 months ago
AI Research

FACTS Benchmark Suite Elevates LLM Factuality Scrutiny

4 months ago
AI Research

AI Precision Oncology Gets Scalable Boost from Microsoft AI

4 months ago
AI Research

Google's Gemini 3 Ushers In The Latest AI Era

4 months ago
AI Video

VoiceVision RAG: Beyond Text, Towards True Multimodal Document Intelligence

4 months ago
AI Research

Google TAU AI Partnership Expands Foundational AI Research

4 months ago
AI Video

Google Cloud's Nano Banana Transforms Text-to-Vision Capabilities

4 months ago
AI Video

Gemini 3 Unleashes a New Era of AI-Powered Creation

4 months ago
AI Research

Meta’s Segment Anything Model 3 masters text and video

4 months ago
AI Video

Gemini 3: Google's Ambitious Leap Towards Universal AI Integration

4 months ago
AI Research

Google Gemini 3 Elevates AI with Agentic Interfaces

4 months ago
AI Research

NotebookLM Deep Research Redefines AI Analysis

5 months ago
Artificial Intelligence

Marble World Model Goes Public, Redefining 3D Generation

5 months ago
AI Research

MMCTAgent: Microsoft's Multimodal Reasoning Agent Tackles Long-Form Video

5 months ago
AI Video

Google's Nano Banana: The Human-Centric Evolution of Visual AI

5 months ago
AI Research

Emotive AI Redefines Customer Experience Dynamics

5 months ago
AI Research

Signify Elevates Support with Advanced Retrieval Augmented Generation

5 months ago
AI Research

OlmoEarth Redefines Earth Observation Foundation Models

5 months ago
Startup News

OpenAI's Patent Strategy: Why the AI Leader Has Far Fewer Patents Than You'd Expect

5 months ago
AI Research

Automotive AI: Redefining Vehicle Design Quietly

Artificial intelligence is fundamentally reshaping vehicle design, moving beyond the long-promised fully autonomous car to deliver immediate, tangible improvements in today's vehicles. This evolution, often subtle, is driven by a sophisticated blend of on-device intelligence...

5 months ago
Funding Round

Fal.ai raises funding to advance multimodal AI platform

Multimodal AI startup Fal.ai has raised new funding, estimated at $250 million. This investment values the company at more than $4 billion.

5 months ago
AI Research

Nano Banana AI Elevates NotebookLM Video Overviews

6 months ago
AI Video

Gemini 2.5 Pro Transforms Video Processing with Single API Calls

Ayo Adedeji, Google's Developer Relations Engineer, boldly declared, "Or, you could just not do any of that."

6 months ago
AI Research

Google AI Plus Expands to 40 New Countries, Shaking Up the AI Race

6 months ago
AI Research

Gemini App Updates: Google Sharpens Its AI Assistant Edge

6 months ago