What Is Google Gemini Pro?
Google Gemini is a large language model (LLM) developed by Google. It is Google's answer to popular, competing LLM technologies like OpenAI GPT-4 and Anthropic Claude. Gemini performs well on LLM benchmarks and incorporates novel technologies to improve computational speed, accuracy, and the ability to process multimodal inputs.
Google Gemini Pro is a full-scale version of the model, providing high performance on LLM benchmarks while offering improved computational efficiency. Google also offers Gemini Flash, a lightweight model for constrained environments, and has announced Gemini Ultra, a more advanced version of the model. This is part of an extensive series of guides about machine learning.
8 Key Features of the Gemini Pro Model Family
Here are the key features offered by Google Gemini Pro:
- Multimodal processing: Gemini Pro supports text, images, audio, and video inputs, enabling it to handle various types of data seamlessly. This makes it suitable for tasks that require understanding and generating content across different media.
- Scalable context window: With a context window of up to 1 million tokens, extendable to 2 million tokens, Gemini Pro can process large datasets. This capability is particularly useful for tasks involving long documents, extensive codebases, or lengthy audio and video files.
- Mixture-of-Experts (MoE) architecture: The MoE architecture allows the model to activate the most relevant neural pathways for specific inputs, enhancing both efficiency and performance. This specialized processing ensures high accuracy and speed in handling complex tasks.
- Advanced text generation and understanding: The model supports various text generation methods, including content creation, summarization, translation, and intelligent Q&A. It can produce high-quality written content, translate languages, and answer questions by reasoning over multimodal inputs.
- Customizable AI: Users can create customized versions of the Gemini AI, known as Gems, tailored to specific tasks and preferences. This feature enhances flexibility and allows for specialized applications in different industries.
- Function calling and JSON mode: Gemini Pro can produce structured output from unstructured data and supports enhanced function calling capabilities. This makes it easier to integrate with other systems and extract useful information from various data formats.
- Integration with Google services: Gemini Pro integrates with Google Cloud services, including Vertex AI, and tools like Google Workspace. This allows for easy deployment and management of AI-driven applications across different platforms.
- Enhanced safety and control: The model includes automatic safety features, adjustable by developers, ensuring that outputs are safe and appropriate for various use cases.
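As an illustration of the JSON-mode feature above, the sketch below shows how an application might validate structured output returned by the model. The sample response string and the required keys are hypothetical stand-ins, not actual Gemini output:

```python
import json

# Hypothetical raw text that a JSON-mode response might contain.
raw_response = '{"name": "Gemini 1.5 Pro", "input_tokens": 1048576, "multimodal": true}'

def parse_model_json(raw: str, required_keys: set) -> dict:
    """Parse JSON-mode output and verify the keys the application depends on."""
    data = json.loads(raw)
    missing = required_keys - data.keys()
    if missing:
        raise ValueError(f"model output missing keys: {missing}")
    return data

info = parse_model_json(raw_response, {"name", "input_tokens"})
print(info["name"])  # Gemini 1.5 Pro
```

Validating structured output before handing it to downstream systems is a common pattern; the schema check here is deliberately minimal.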
Google Gemini 1.5 Pro Architecture
The platform leverages a Mixture-of-Experts (MoE) architecture, which enhances performance by activating the most relevant neural network pathways based on the input type. This design improves efficiency and allows the model to process complex tasks with greater speed and accuracy.
The architecture is rooted in Google’s leading research on Transformer and MoE models, which involve dividing a large neural network into smaller "expert" networks. These experts specialize in different types of data, such as text, images, or code, and are selectively activated to handle specific inputs. This specialization enables the model to maintain high performance while being more resource-efficient during training and operation.
Gemini 1.5 Pro, the most current model in the Gemini Pro suite, features a standard 128,000 token context window, with capabilities to extend up to 1 million tokens for specific use cases. This large context window allows the model to handle extensive datasets, including lengthy text documents, large codebases, and long video or audio files.
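To get a feel for these limits, a rough rule of thumb is that one token corresponds to about four characters of English text (an approximation, not an official Gemini figure). A quick estimate of whether a document fits the context window might look like:

```python
CHARS_PER_TOKEN = 4  # rough heuristic for English text, not an exact tokenizer

def estimated_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return len(text) // CHARS_PER_TOKEN

def fits_context(text: str, window: int = 128_000) -> bool:
    """Check against Gemini 1.5 Pro's standard 128K-token window."""
    return estimated_tokens(text) <= window

doc = "word " * 100_000          # ~500,000 characters
print(estimated_tokens(doc))     # 125000 estimated tokens
print(fits_context(doc))         # True: under the 128K standard window
```

For precise counts, the SDK exposes a token-counting call; the heuristic above is only for quick back-of-the-envelope checks.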
Google Gemini Pro Models: Technical Specifications
Here are the technical details of models offered in the Gemini Pro family, as of the time of this writing. Rate limits of the models are expressed in these terms:
- Requests per minute (RPM)
- Tokens per minute (TPM)
- Requests per day (RPD)
- Tokens per day (TPD)
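These quotas are enforced server-side, but a client can avoid rejected requests by throttling itself. Below is a minimal sliding-window RPM limiter sketch; the class is illustrative and not part of any Google SDK (the 2 RPM value matches the 1.5 Pro free tier listed below):

```python
import time
from collections import deque

class RequestThrottle:
    """Client-side sliding-window limiter for an RPM quota."""
    def __init__(self, rpm: int, clock=time.monotonic):
        self.rpm = rpm
        self.clock = clock          # injectable for testing
        self.sent = deque()         # timestamps of recent requests

    def allow(self) -> bool:
        now = self.clock()
        # Drop timestamps older than the 60-second window.
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) < self.rpm:
            self.sent.append(now)
            return True
        return False

# Simulated clock: two calls pass at 2 RPM, the third is refused.
t = [0.0]
throttle = RequestThrottle(rpm=2, clock=lambda: t[0])
print(throttle.allow())  # True
print(throttle.allow())  # True
print(throttle.allow())  # False
t[0] = 61.0
print(throttle.allow())  # True again after the window slides
```

In production you would typically sleep-and-retry rather than return `False`, and also handle TPM and daily quotas, but the windowing logic is the same.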
Gemini 1.5 Pro
Gemini 1.5 Pro is a mid-size multimodal model optimized for a range of reasoning tasks. It supports code and text generation, text editing, problem-solving, recommendations, information extraction, data generation, and the creation of AI agents. It can process large datasets, including extensive video, audio, and codebases.
Model details:
- Inputs: Audio, images, and text
- Output: Text
- Input token limit: 1,048,576
- Output token limit: 8,192
- Maximum number of images per prompt: 3,600
- Maximum video length: 1 hour
- Maximum audio length: Approximately 9.5 hours
- Maximum number of audio files per prompt: 1
- Model safety: Automatically applied, adjustable by developers
- System instructions: Supported
- JSON mode: Supported
Rate limits:
- Free:
- 2 RPM
- 32,000 TPM
- 50 RPD
- 6,080,000 TPD
- Pay-as-you-go:
- 360 RPM
- 10 million TPM
- 10,000 RPD
- 14,400,000,000 TPD
- Two million context:
- 1 RPM
- 2 million TPM
- 50 RPD
Gemini 1.0 Pro
Gemini 1.0 Pro is an LLM designed to handle tasks such as multi-turn text and code chat, as well as code generation. It supports zero-shot, one-shot, and few-shot learning, making it versatile for various applications.
Model details:
- Input: Text
- Output: Text
- System instructions: Unsupported
- JSON mode: Unsupported
Rate limits:
- Free:
- 15 requests per minute (RPM)
- 32,000 tokens per minute (TPM)
- 1,500 requests per day (RPD)
- 46,080,000 tokens per day (TPD)
- Pay-as-you-go:
- 360 RPM
- 120,000 TPM
- 30,000 RPD
- 172,800,000 TPD
Gemini 1.0 Pro Vision
Gemini 1.0 Pro Vision is a performance-optimized multimodal model capable of handling visual-related tasks. It can generate image descriptions, identify objects in images, and provide information about places or objects depicted in images. Similar to 1.0 Pro, it supports zero-shot, one-shot, and few-shot learning.
Model details:
- Inputs: Text and images
- Output: Text
- Input token limit: 12,288
- Output token limit: 4,096
- Maximum image size: No limit
- Maximum video length: 2 minutes
- Maximum number of videos per prompt: 1
- Model safety: Automatically applied, adjustable by developers
Rate limits:
- 60 RPM
- 16 images per prompt
Google Gemini Pro Consumer Pricing
For end-users and organizations, Gemini is available in two versions:
Gemini (Free)
The standard version of Gemini is free, providing access to the 1.0 Pro model. It aids in writing, learning, and planning and is integrated with Google applications.
Gemini Advanced (Paid)
This version costs $19.99 per month. It uses 1.5 Pro, with a context window of 1 million tokens. It comes with 2 TB of Google One storage and can be integrated into Gmail and Google Docs. It can also run and edit Python code.
Google Gemini Pro API Pricing
Gemini 1.0 Pro Pricing
For users of the Gemini 1.0 Pro model, there are two pricing tiers available: Free and Pay-as-you-go. Note the rate limits listed in the model technical specifications above.
Free Tier:
- Cost: Free of charge for input and output
- Notes: Prompts and responses may be used to improve Google products. Context caching is not available.
Pay-as-you-go Tier:
- Cost:
  - $0.50 per 1 million input tokens
  - $1.50 per 1 million output tokens
- Notes: Prompts and responses are not used to improve Google products. Context caching is planned for a future release.
Gemini 1.5 Pro Pricing
1.5 Pro is also available with Free and Pay-as-you-go tiers.
Free Tier:
- Cost: Free of charge for input and output
- Additional notes: Prompts and responses may be used to improve Google products. Context caching is not available.
Pay-as-you-go Tier:
- Cost:
  - $3.50 per 1 million input tokens for prompts up to 128K tokens
  - $7.00 per 1 million input tokens for prompts longer than 128K tokens
  - $10.50 per 1 million output tokens for prompts up to 128K tokens
  - $21.00 per 1 million output tokens for prompts longer than 128K tokens
- Notes: Prompts and responses are not used to improve Google products. Context caching is planned; it will cost $1.75 per 1 million tokens for prompts up to 128K tokens, $3.50 per 1 million tokens for prompts longer than 128K tokens, and $4.50 per 1 million tokens per hour for storage.
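The tiered rates above can be turned into a quick cost estimator. The sketch below hard-codes the 1.5 Pro pay-as-you-go prices listed here; actual billing may differ, so treat it as an illustration only:

```python
def gemini_15_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate pay-as-you-go cost in USD using the tiered rates above.
    The tier is chosen by prompt (input) length, per the pricing table."""
    long_prompt = input_tokens > 128_000
    input_rate = 7.00 if long_prompt else 3.50     # USD per 1M input tokens
    output_rate = 21.00 if long_prompt else 10.50  # USD per 1M output tokens
    return (input_tokens / 1e6) * input_rate + (output_tokens / 1e6) * output_rate

# 100K-token prompt with an 8,192-token response: short-prompt tier applies.
print(round(gemini_15_pro_cost(100_000, 8_192), 4))  # 0.436
```

Note how crossing the 128K-token prompt boundary doubles both rates, so splitting very long prompts (where the task allows it) can materially reduce cost.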
Quick Tutorial: Getting Started with Google Gemini Pro API
Prerequisites
To integrate Google Gemini Pro with your applications, you need to set up your development environment. You can run the setup in Google Colab, which allows you to execute the notebook directly in your browser without additional configuration.
Alternatively, you can set up your local environment to meet the following requirements:
- Python 3.9+
- Jupyter installation for running the notebook
Once you have the basic requirements, install the Python SDK for the Gemini API, which is included in the `google-generativeai` package, using the following pip command:

```bash
pip install -q -U google-generativeai
```
Importing Packages
Next, import the necessary packages for your project:
```python
import pathlib
import textwrap

import google.generativeai as genai
from IPython.display import display, Markdown
```
Setting Up Your API Key
Before using the Gemini API, obtain an API key from Google AI Studio. In Google Colab, add the key to the secrets manager and name it `GOOGLE_API_KEY`. Then pass the key to the SDK (in Colab, for example):

```python
from google.colab import userdata

GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)
```
Listing Available Models
With the API key set up, you can now list the available Gemini models using the `list_models` method:

```python
for model in genai.list_models():
    if 'generateContent' in model.supported_generation_methods:
        print(model.name)
```
Generating Text Responses
To generate text responses from text inputs, use the `gemini-pro` model. Create an instance of the `GenerativeModel` class and call its `generate_content` method with your prompt:

```python
model = genai.GenerativeModel('gemini-pro')
response = model.generate_content("What is generative AI?")
```
A simple way to handle the response is to render it as Markdown, using the `display` and `Markdown` utilities imported earlier:

```python
display(Markdown(response.text))
```
The Gemini API supports many other options, including multi-turn chat and multimodal input, depending on the model’s capabilities. The Gemini models support text and images as input, with text as the output. For more details, see the official documentation.
Building LLM Applications with Gemini Pro and Acorn
Visit https://gptscript.ai to download GPTScript and start building today. As we expand GPTScript's capabilities, we are also expanding our list of tools. With these tools, you can create any application imaginable: check out tools.gptscript.ai to get started.
See Additional Guides on Key Machine Learning Topics
Together with our content partners, we have authored in-depth guides on several other topics that can also be useful as you explore the world of machine learning.
Advanced Threat Protection
Authored by Cynet
- Advanced Threat Protection: A Real-Time Threat Killer Machine
- Advanced Threat Detection: Catch & Eliminate Sneak Attacks
- What is Network Analytics? From Detection to Active Prevention
Multi GPU
Authored by Run.AI
- Multi GPU: An In-Depth Look
- Keras Multi GPU: A Practical Guide
- How to Build Your GPU Cluster: Process and Hardware Options
LLM Application Development
Authored by Acorn