What is Raster?
Raster is a digital asset manager for modern teams, developed in-house by the Monogram team. It saves time organizing, editing, and hosting photography. Its features include AI-driven organization with smart tags, nondestructive collaborative editing, and efficient photo management, all aimed at streamlining workflows for developers, designers, and marketing teams.
What is Gemini Pro Vision?
Gemini Pro Vision is a model in Google's Gemini family, which Google describes as its most advanced and versatile AI to date. Developed collaboratively by teams across Google, including Google Research, Gemini is designed to be multimodal: it can understand and operate across different types of information such as text, code, audio, images, and video. Gemini Pro is optimized for scaling across a wide range of tasks, with capabilities intended to help developers and enterprise customers build and scale with AI. For more details, you can visit the Google blog post.
The need for AI
We've been eagerly anticipating the opportunity to test out the multimodal prompts capability of the Gemini API. And what better platform to test it with than Raster!
This would allow Raster to:
- automatically label each image with highly relevant tags, making our visual assets much easier to search and organize
- generate alt text that is both user-friendly and SEO-optimized, improving our images' accessibility and search engine visibility
How we did it
Let's focus on the alt text generation functionality. Here's how we did it:
- Image Acquisition: Access the uploaded image via a direct API call.
- Image Processing: Preprocess the image by resizing it and converting it to a format compatible with the Gemini Pro Vision API.
- Content Analysis via Clear Prompt Instructions: Use Gemini Pro Vision's content analysis capabilities to extract meaningful insights from the image. This involves identifying prominent objects, scenes, and other visual elements. This is where Gemini's multimodal capability shines.
- Alt Text Generation: Based on the extracted insights, Gemini Pro Vision generates a concise, descriptive alt text that accurately conveys the image's content.
- Integration with Raster: Store the generated alt text in the appropriate fields in Raster's database and display the results in the user interface.
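
The middle steps of this flow (packaging the image, prompting the model, and cleaning up the answer) can be sketched roughly as follows. This is a minimal illustration rather than Raster's actual code: the prompt wording, the `toImagePart` and `generateAltText` helper names, and the default MIME type are all assumptions, and the sketch relies on Node.js's `Buffer` for base64 encoding. Fetching the uploaded image, resizing it, and saving the result to Raster's database are left out.

```javascript
// Hypothetical prompt; the wording actually used in Raster is not shown here.
const ALT_TEXT_PROMPT =
  "Describe this image in one concise sentence suitable for HTML alt text.";

// Wrap raw image bytes in the inline-data part shape used for multimodal
// prompts: a base64 payload plus its MIME type.
function toImagePart(imageBytes, mimeType) {
  return {
    inlineData: {
      data: Buffer.from(imageBytes).toString("base64"),
      mimeType,
    },
  };
}

// Ask the model for alt text. `model` is any object exposing an async
// generateContent(parts) method, such as a generative model instance
// created with the Google AI JavaScript SDK.
async function generateAltText(model, imageBytes, mimeType = "image/jpeg") {
  const result = await model.generateContent([
    ALT_TEXT_PROMPT,
    toImagePart(imageBytes, mimeType),
  ]);
  // Collapse whitespace so the text fits cleanly in a single alt attribute.
  return result.response.text().trim().replace(/\s+/g, " ");
}
```

Passing the model in as an argument keeps the helper easy to test with a stub. In a real app, the model would typically come from `new GoogleGenerativeAI(apiKey).getGenerativeModel({ model: "gemini-pro-vision" })`.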

Google AI Studio
We started by using Google AI Studio at https://makersuite.google.com to test and improve prompts. We used it to compare multiple alternatives until we found the input that worked best for our use case.

We experimented with various inputs, and eventually tried asking Gemini itself to write an optimal prompt for an AI. This helped us refine our input and achieve the desired outcome.

In addition to testing different inputs, the generation config is also important for fine-tuning the responses, specifically the temperature, topK, and topP parameters. You can learn more about their meaning and values in the Gemini API documentation.
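
As a rough illustration of what such a configuration can look like in JavaScript, here is a small sketch. The numeric values are placeholders chosen for the example, not the settings used in Raster, and `buildGenerationConfig` is a hypothetical helper. The field names (`temperature`, `topK`, `topP`, `maxOutputTokens`) follow the Gemini API's generation config.

```javascript
// Illustrative defaults only; these numbers are assumptions, not Raster's settings.
const DEFAULT_GENERATION_CONFIG = {
  temperature: 0.4, // lower values make the output more deterministic
  topK: 32, // sample only from the K most likely next tokens
  topP: 1, // nucleus sampling: cumulative probability cutoff
  maxOutputTokens: 256, // upper bound on the length of the response
};

// Merge caller overrides into the defaults and reject obviously invalid values.
function buildGenerationConfig(overrides = {}) {
  const config = { ...DEFAULT_GENERATION_CONFIG, ...overrides };
  if (config.temperature < 0) {
    throw new RangeError("temperature must not be negative");
  }
  if (config.topP < 0 || config.topP > 1) {
    throw new RangeError("topP must be between 0 and 1");
  }
  if (!Number.isInteger(config.topK) || config.topK < 1) {
    throw new RangeError("topK must be a positive integer");
  }
  return config;
}
```

The resulting object can be passed as `generationConfig` when creating the model with the SDK. The valid upper bound for `temperature` depends on the model, so the sketch only checks that it is not negative.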
Once you have iterated on your prompts in Google AI Studio, the "Get code" option becomes very useful. The AI Studio web app provides all the code needed to run your prompts, which makes it easier to move ideas from the studio into your application.

As JavaScript developers, the quickest way for us to start using Gemini was to call the API directly from our web app with the Google AI JavaScript SDK. This SDK is suitable for anyone who prefers not to work with REST APIs or server-side code (such as Node.js) to access Gemini models in their web app.
Side note: If you run this code on the client side, where the API key is exposed, make sure to set restrictions on the key in the Google Cloud Console that match your specific use case.

Google AI SDK
The Google AI JavaScript SDK, available as an open source repo, allows developers to utilize Google's Generative AI models. This SDK supports various use cases including:
- Generate text from text-only input
- Generate text from multimodal prompts (text and images)
- Build multi-turn conversations (chat)
After getting the code from Google AI Studio, we refactored it to align with our style guide and enhance reusability for future Gemini API features within Raster.
The Code
First, install the JS SDK as a dependency in your project: