In the rapidly evolving media landscape, Israeli startup Cloudinary is setting the pace. Thousands of websites rely on it to handle media processing, and in recent months it has been competing head-to-head with the giants in offering Generative AI for media. From image optimization to pioneering Generative AI, the bootstrapped unicorn leads the way, now harnessing Large Language Models (LLMs) and offering category-defining media generation.
Today, I sat down with Tal Lev-Ami, Cloudinary's co-founder and CTO, a visionary on the frontlines of this revolution, to explore the exciting world of Generative AI, where creativity and technology blend into one. Cloudinary's automated, AI-driven media management empowers brands to deliver dynamic digital experiences at scale. Lev-Ami's journey with Cloudinary began with a commitment to excellence and an early conviction about AI's potential.
As our conversation unfolded, we explored Generative AI's intricacies, enterprise integration challenges, and the future of digital visual media.
How long have you been using Generative AI? What were the first things you used it for?
We've been using AI for many, many years. From early on, we had all sorts of algorithms to find the optimal quality, the optimal crop, and its placement. Generative AI is enabling a new generation of capabilities. It basically covers two areas: one is the generation of visual media and the other is large language models. Both of them sit under this envelope called Generative AI, but they're two separate things that come from separate advances.
On the visual side, the first generative feature we had was style transfer, where you take a source image that you want to update and match it against another image that contains the style that you want it to have - whether it is that of a particular artist or your brand guidelines or something like that. The AI then generates a derivative image that matches the content in the original image to the desired style. We've had that for around five years now.
Every time new capabilities appear in the industry, the state of the art moves, and we try to keep pace and do even a bit more. One important aspect is that we aim to serve organizations, in many cases at the enterprise level, and they need something reliable. So we focus on identifying and prioritizing the technology innovations that are already good enough to use at the enterprise level, the ones any brand can deploy while trusting the results.
About a year ago, GPT-4 and Stable Diffusion emerged, each a significant leap in its respective field. So we made a concerted effort across the company to determine exactly what we should do in the field of Generative AI and how we could best build on our legacy of AI expertise to provide enterprise-ready solutions for our customers. In the last year or so, we've launched multiple Generative AI features that brands are using to accelerate the value of their visual assets.
Tell us about the new features that you've added using Generative AI.
We have multiple features. On the visual side of things, one of them is called Generative Fill. The idea is that when you needed a different aspect ratio of an image in the past, you either had to lose some of the pixels or add black or white padding around it. Today, you can ask the algorithm to expand the canvas and put something outside the original frame that fully matches the context of the current image.
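For readers curious what this looks like in practice, here is a minimal sketch of invoking a feature like this through Cloudinary's URL-based transformation conventions. The cloud name ("demo"), asset name ("sample.jpg"), and the exact parameter spellings are illustrative assumptions drawn from Cloudinary's public documentation, not from the interview itself:

```typescript
// Minimal sketch: building a Generative Fill delivery URL by hand.
// "demo" and "sample.jpg" are placeholder cloud/asset names; the
// c_pad, ar_16:9, and b_gen_fill parameters follow Cloudinary's
// documented URL conventions but should be verified against the docs.
const cloudName = "demo";
const publicId = "sample.jpg";

// Pad the canvas to 16:9 and let generative AI paint the added area
// so the new pixels match the context of the original image.
const transformation = "c_pad,ar_16:9,b_gen_fill";

const url = `https://res.cloudinary.com/${cloudName}/image/upload/${transformation}/${publicId}`;
console.log(url);
```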
We also have a feature we call Generative Erase, where a user can tell the AI the name of an object they don't want in the image. The AI instantly erases it and replaces it with a background consistent with the context of the environment featured in that image. Similarly, we have a Generative Replace feature, which replaces a chosen object within an image with another item or subject, whatever the user asks for in their prompt.
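A rough sketch of how these two effects are typically expressed as URL parameters follows; the effect names (e_gen_remove, e_gen_replace) and prompts are assumptions based on Cloudinary's published transformation reference, included purely for illustration:

```typescript
// Sketch of Generative Erase and Generative Replace as URL effects.
// All names here are illustrative; check Cloudinary's docs for the
// authoritative parameter spellings.
const base = "https://res.cloudinary.com/demo/image/upload";

// Erase: name the unwanted object and the AI fills in the background.
const eraseUrl = `${base}/e_gen_remove:prompt_stop%20sign/sample.jpg`;

// Replace: swap one named object for another described in the prompt.
const replaceUrl = `${base}/e_gen_replace:from_shirt;to_leather%20jacket/sample.jpg`;

console.log(eraseUrl, replaceUrl);
```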
We also have Generative Recolor, which lets a user take an object in a given color and replace that color with one they choose, and which respects detailed requests for concepts such as shading, shadow, and light. We have Generative Restore, which takes an image that was overcompressed, a bit blurry, or otherwise lower quality and makes it pristine. We have Upscale, which takes a smaller image and enlarges it, inventing new details that were not in the original, to provide high-quality assets that can be published in any context. We also have Background Removal, which is not exactly generative, but is based on similar concepts and is something we've offered for some time.
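In the same spirit, here is a hedged sketch of how these remaining effects map onto single URL parameters. Again, the effect names mirror Cloudinary's documented conventions as I understand them and should be treated as assumptions:

```typescript
// Sketch of Recolor, Restore, Upscale, and Background Removal as
// one-parameter URL transformations. All identifiers are illustrative.
const base = "https://res.cloudinary.com/demo/image/upload";

// Recolor a named object to a chosen color, preserving shading and light.
const recolor = `${base}/e_gen_recolor:prompt_sweater;to-color_tomato/sample.jpg`;

// Restore an over-compressed or blurry image.
const restore = `${base}/e_gen_restore/sample.jpg`;

// Upscale a small image, synthesizing plausible new detail.
const upscale = `${base}/e_upscale/sample.jpg`;

// Background removal (related, but not strictly generative).
const noBackground = `${base}/e_background_removal/sample.png`;

console.log(recolor, restore, upscale, noBackground);
```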
These are the things we're doing on the visual side. There is more in the pipeline on the textual side of things, which is a different category. We're a very visual company. It's all about the images, the video, the 3D objects, and less about the text. So we wanted to see what we could do with LLMs, this incredible technology, that would be useful to our customers.
To do this, we’ve already released a tool that allows users to perform image transformations based on a complicated set of written instructions on how to manipulate the image. The user can save the request and apply it to any other image as well. They can do it all in the URL, since all of these instructions are compressed into a single URL.
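To make the "compressed into a single URL" idea concrete, here is a small sketch of how several manipulation steps can chain into one delivery URL, with each slash-separated segment applied in order. The cloud name, asset name, and overlay ID below are hypothetical:

```typescript
// Sketch of chaining multiple transformation steps into a single URL.
// Each segment is applied in sequence; names are placeholders.
const steps = [
  "c_fill,g_auto,ar_1:1,w_800",          // crop to a square, auto-focusing on the subject
  "l_logo_watermark,g_north_west,o_60",  // overlay a watermark, top-left, 60% opacity
  "f_auto,q_auto",                       // pick the best format and quality per viewer
].join("/");

const url = `https://res.cloudinary.com/demo/image/upload/${steps}/product-shot.jpg`;
console.log(url);
```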
It’s possible to do very complicated things with it, but it requires some technical knowledge. To make it accessible to everyone, we built a chat interface that connects with our documentation and allows you to ask the AI in natural language to do things like crop the image to focus on a specific object, add a watermark in the top left corner, resize it to the correct size, and so on. The key is to make the interface friendlier to people who haven't invested the time in reading the entire documentation and getting up to speed on how to do it themselves.
One important thing that we didn't release as a feature, because it's really a separate product, is what we call Final Touch, which lets you do virtual photoshoots. You upload an image of an object, and it will isolate that object and let you easily place it in various scenes that it can generate. You can tell it what type of theme and style you want. It's really cool, very visual, and very easy to use. A lot of what we do is designed to empower developers, but this one is something anybody can play with.
There is a lot more we’re working on, and this is a rapidly changing space where we’re excited about the new and increased value we can bring to our users.
