Google’s latest iteration of its image generation and editing model, Gemini 2.5 Flash Image—affectionately dubbed "Nano Banana"—represents a significant advancement in AI's ability to understand and manipulate visual content. Matthew Berman, in his recent demonstration, showcased a suite of capabilities that push the boundaries of what was previously thought possible, extending far beyond simple pixel-level adjustments.
Berman highlighted the model's core strengths through a series of impressive feats. Its semantic understanding of objects and environments is profound. For instance, when presented with an image of two smartphones and instructed to "flip the phones over," Nano Banana not only flipped the devices accurately but also intelligently rendered their opposite sides, complete with operating system interfaces. "It knew what the other side of the iPhone looks like. It knew what all of the icons of the iPhone looks like, the entire operating system," Berman noted, underscoring the model's deep contextual awareness.
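For readers who want to try this kind of instruction-driven edit themselves, the sketch below shows one plausible way to send an image plus a natural-language edit request to the Gemini API. It assumes the google-genai Python SDK and the preview model identifier "gemini-2.5-flash-image-preview"; the exact model name, SDK version, and response layout may differ, so check the current Gemini API documentation before relying on it.

```python
# Minimal sketch of prompt-based image editing with the Gemini API.
# Assumptions: the google-genai SDK (`pip install google-genai pillow`),
# a GEMINI_API_KEY in the environment, and the preview model name
# "gemini-2.5-flash-image-preview" -- all of which may change over time.
from io import BytesIO

from google import genai
from PIL import Image

client = genai.Client()  # reads GEMINI_API_KEY from the environment

source = Image.open("two_phones.jpg")  # hypothetical input photo

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=["Flip the phones over so we can see their other side.", source],
)

# Save any image parts returned alongside the text response.
for i, part in enumerate(response.candidates[0].content.parts):
    if part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save(f"edited_{i}.png")
```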
