What Is Google Gemini Pro?
Google Gemini is a large language model (LLM) developed by Google. It is Google's answer to popular, competing LLM technologies like OpenAI GPT-4 and Anthropic Claude. Gemini performs well on LLM benchmarks and incorporates novel technologies to improve computational speed, accuracy, and the ability to process multimodal inputs.
Google Gemini Pro is a full-scale version of the model, providing high performance on LLM benchmarks while offering improved computational efficiency. Google also offers Gemini Flash, a lightweight model for constrained environments, and has announced Gemini Ultra, a more advanced version of the model. This is part of an extensive series of guides about machine learning.
8 Key Features of the Gemini Pro Model Family
Here are the key features offered by Google Gemini Pro:
- Multimodal processing: Gemini Pro supports text, images, audio, and video inputs, enabling it to handle various types of data seamlessly. This makes it suitable for tasks that require understanding and generating content across different media.
- Scalable context window: With a context window of up to 1 million tokens, extendable to 2 million tokens, Gemini Pro can process large datasets. This capability is particularly useful for tasks involving long documents, extensive codebases, or lengthy audio and video files.
- Mixture-of-Experts (MoE) architecture: The MoE architecture allows the model to activate the most relevant neural pathways for specific inputs, enhancing both efficiency and performance. This specialized processing ensures high accuracy and speed in handling complex tasks.
- Advanced text generation and understanding: The model supports various text generation methods, including content creation, summarization, translation, and intelligent Q&A. It can produce high-quality written content, translate languages, and answer questions by reasoning over multimodal inputs.
- Customizable AI: Users can create customized versions of the Gemini AI, known as Gems, tailored to specific tasks and preferences. This feature enhances flexibility and allows for specialized applications in different industries.
- Function calling and JSON mode: Gemini Pro can produce structured output from unstructured data and supports enhanced function calling capabilities. This makes it easier to integrate with other systems and extract useful information from various data formats.
- Integration with Google services: Gemini Pro integrates with Google Cloud services, including Vertex AI, and tools like Google Workspace. This allows for easy deployment and management of AI-driven applications across different platforms.
- Enhanced safety and control: The model includes automatic safety features, adjustable by developers, ensuring that outputs are safe and appropriate for various use cases.
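As an illustration of the JSON-mode feature above, the sketch below shows how an application might validate structured output returned by the model. The sample response string and the required keys are hypothetical stand-ins, not actual Gemini output:

```python
import json

# Hypothetical raw text that a JSON-mode response might contain.
raw_response = '{"name": "Gemini 1.5 Pro", "input_tokens": 1048576, "multimodal": true}'

def parse_model_json(raw: str, required_keys: set) -> dict:
    """Parse JSON-mode output and verify the keys the application depends on."""
    data = json.loads(raw)
    missing = required_keys - data.keys()
    if missing:
        raise ValueError(f"model output missing keys: {missing}")
    return data

info = parse_model_json(raw_response, {"name", "input_tokens"})
print(info["name"])  # Gemini 1.5 Pro
```

Validating structured output before handing it to downstream systems is a common pattern; the schema check here is deliberately minimal.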
Google Gemini 1.5 Pro Architecture
The platform leverages a Mixture-of-Experts (MoE) architecture, which enhances performance by activating the most relevant neural network pathways based on the input type. This design improves efficiency and allows the model to process complex tasks with greater speed and accuracy.
The architecture is rooted in Google’s leading research on Transformer and MoE models, which involve dividing a large neural network into smaller "expert" networks. These experts specialize in different types of data, such as text, images, or code, and are selectively activated to handle specific inputs. This specialization enables the model to maintain high performance while being more resource-efficient during training and operation.
Gemini 1.5 Pro, the most current model in the Gemini Pro suite, features a standard 128,000 token context window, with capabilities to extend up to 1 million tokens for specific use cases. This large context window allows the model to handle extensive datasets, including lengthy text documents, large codebases, and long video or audio files.
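To get a feel for these limits, a rough rule of thumb is that one token corresponds to about four characters of English text (an approximation, not an official Gemini figure). A quick estimate of whether a document fits the context window might look like:

```python
CHARS_PER_TOKEN = 4  # rough heuristic for English text, not an exact tokenizer

def estimated_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return len(text) // CHARS_PER_TOKEN

def fits_context(text: str, window: int = 128_000) -> bool:
    """Check against Gemini 1.5 Pro's standard 128K-token window."""
    return estimated_tokens(text) <= window

doc = "word " * 100_000          # ~500,000 characters
print(estimated_tokens(doc))     # 125000 estimated tokens
print(fits_context(doc))         # True: under the 128K standard window
```

For precise counts, the SDK exposes a token-counting call; the heuristic above is only for quick back-of-the-envelope checks.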
Google Gemini Pro Models: Technical Specifications
Here are the technical details of models offered in the Gemini Pro family, as of the time of this writing. Rate limits of the models are expressed in these terms:
- Requests per minute (RPM)
- Tokens per minute (TPM)
- Requests per day (RPD)
- Tokens per day (TPD)
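These quotas are enforced server-side, but a client can avoid rejected requests by throttling itself. Below is a minimal sliding-window RPM limiter sketch; the class is illustrative and not part of any Google SDK (the 2 RPM value matches the 1.5 Pro free tier listed below):

```python
import time
from collections import deque

class RequestThrottle:
    """Client-side sliding-window limiter for an RPM quota."""
    def __init__(self, rpm: int, clock=time.monotonic):
        self.rpm = rpm
        self.clock = clock          # injectable for testing
        self.sent = deque()         # timestamps of recent requests

    def allow(self) -> bool:
        now = self.clock()
        # Drop timestamps older than the 60-second window.
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) < self.rpm:
            self.sent.append(now)
            return True
        return False

# Simulated clock: two calls pass at 2 RPM, the third is refused.
t = [0.0]
throttle = RequestThrottle(rpm=2, clock=lambda: t[0])
print(throttle.allow())  # True
print(throttle.allow())  # True
print(throttle.allow())  # False
t[0] = 61.0
print(throttle.allow())  # True again after the window slides
```

In production you would typically sleep-and-retry rather than return `False`, and also handle TPM and daily quotas, but the windowing logic is the same.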
Gemini 1.5 Pro
Gemini 1.5 Pro is a mid-size multimodal model optimized for a range of reasoning tasks. It supports code and text generation, text editing, problem-solving, recommendations, information extraction, data generation, and the creation of AI agents. It can process large datasets, including extensive video, audio, and codebases.
Model details:
- Inputs: Audio, images, and text
- Output: Text
- Input token limit: 1,048,576
- Output token limit: 8,192
- Maximum number of images per prompt: 3,600
- Maximum video length: 1 hour
- Maximum audio length: Approximately 9.5 hours
- Maximum number of audio files per prompt: 1
- Model safety: Automatically applied, adjustable by developers
- System instructions: Supported
- JSON mode: Supported
Rate limits:
- Free:
- 2 RPM
- 32,000 TPM
- 50 RPD
- 6,080,000 TPD
- Pay-as-you-go:
- 360 RPM
- 10 million TPM
- 10,000 RPD
- 14,400,000,000 TPD
- Two million context:
- 1 RPM
- 2 million TPM
- 50 RPD
Gemini 1.0 Pro
Gemini 1.0 Pro is an LLM designed to handle tasks such as multi-turn text and code chat, as well as code generation. It supports zero-shot, one-shot, and few-shot learning, making it versatile for various applications.
Model details:
- Input: Text
- Output: Text
- System instructions: Unsupported
- JSON mode: Unsupported
Rate limits:
- Free:
- 15 requests per minute (RPM)
- 32,000 tokens per minute (TPM)
- 1,500 requests per day (RPD)
- 46,080,000 tokens per day (TPD)
- Pay-as-you-go:
- 360 RPM
- 120,000 TPM
- 30,000 RPD
- 172,800,000 TPD
Gemini 1.0 Pro Vision
Gemini 1.0 Pro Vision is a performance-optimized multimodal model capable of handling visual-related tasks. It can generate image descriptions, identify objects in images, and provide information about places or objects depicted in images. Similar to 1.0 Pro, it supports zero-shot, one-shot, and few-shot learning.
Model details:
- Inputs: Text and images
- Output: Text
- Input token limit: 12,288
- Output token limit: 4,096
- Maximum image size: No limit
- Maximum video length: 2 minutes
- Maximum number of videos per prompt: 1
- Model safety: Automatically applied, adjustable by developers
Rate limits:
- 60 RPM
- 16 images per prompt
Google Gemini Pro Consumer Pricing
For end-users and organizations, Gemini is available in two versions:
Gemini (Free)
The standard version of Gemini is free, providing access to the 1.0 Pro model. It aids in writing, learning, and planning and is integrated with Google applications.
Gemini Advanced (Paid)
This version costs $19.99 per month. It uses 1.5 Pro, with a context window of 1 million tokens. It comes with 2 TB of Google One storage and can be integrated into Gmail and Google Docs. It can also run and edit Python code.
Google Gemini Pro API Pricing
Gemini 1.0 Pro Pricing
For users of the Gemini 1.0 Pro model, there are two pricing tiers available: Free and Pay-as-you-go. Note the rate limits listed in the model technical specifications above.
Free Tier:
- Cost: Free of charge for input and output
- Notes: Prompts and responses may be used to improve Google products. Context caching is not available.
Pay-as-you-go Tier:
- Cost:
  - $0.50 per 1 million input tokens
  - $1.50 per 1 million output tokens
- Notes: Prompts and responses are not used to improve Google products. Context caching is planned for a future release.
Gemini 1.5 Pro Pricing
1.5 Pro is also available with Free and Pay-as-you-go tiers.
Free Tier:
- Cost: Free of charge for input and output
- Additional notes: Prompts and responses may be used to improve Google products. Context caching is not available.
Pay-as-you-go Tier:
- Cost:
  - $3.50 per 1 million input tokens for prompts up to 128K tokens
  - $7.00 per 1 million input tokens for prompts longer than 128K tokens
  - $10.50 per 1 million output tokens for prompts up to 128K tokens
  - $21.00 per 1 million output tokens for prompts longer than 128K tokens
- Notes: Prompts and responses are not used to improve Google products. Context caching is planned; it will cost $1.75 per 1 million tokens for prompts up to 128K tokens, $3.50 per 1 million tokens for prompts longer than 128K tokens, and $4.50 per 1 million tokens per hour for storage.
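The tiered rates above can be turned into a quick cost estimator. The sketch below hard-codes the 1.5 Pro pay-as-you-go prices listed here; actual billing may differ, so treat it as an illustration only:

```python
def gemini_15_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate pay-as-you-go cost in USD using the tiered rates above.
    The tier is chosen by prompt (input) length, per the pricing table."""
    long_prompt = input_tokens > 128_000
    input_rate = 7.00 if long_prompt else 3.50     # USD per 1M input tokens
    output_rate = 21.00 if long_prompt else 10.50  # USD per 1M output tokens
    return (input_tokens / 1e6) * input_rate + (output_tokens / 1e6) * output_rate

# 100K-token prompt with an 8,192-token response: short-prompt tier applies.
print(round(gemini_15_pro_cost(100_000, 8_192), 4))  # 0.436
```

Note how crossing the 128K-token prompt boundary doubles both rates, so splitting very long prompts (where the task allows it) can materially reduce cost.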
Quick Tutorial: Getting Started with Google Gemini Pro API
Prerequisites
To integrate Google Gemini Pro with your applications, you need to set up your development environment. You can run the setup in Google Colab, which allows you to execute the notebook directly in your browser without additional configuration.
Alternatively, you can set up your local environment to meet the following requirements:
- Python 3.9+
- Jupyter installation for running the notebook
Once you have the basic requirements, install the Python SDK for the Gemini API, which is included in the `google-generativeai` package, using the following pip command:

```bash
pip install -q -U google-generativeai
```
Importing Packages
Next, import the necessary packages for your project:
```python
import pathlib
import textwrap

import google.generativeai as genai
from IPython.display import display, Markdown
```
Setting Up Your API Key
Before using the Gemini API, obtain an API key from Google AI Studio. In Google Colab, add the key to the secrets manager and name it `GOOGLE_API_KEY`. Then pass the key to the SDK (in Colab, for example):

```python
from google.colab import userdata

GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)
```
Listing Available Models
With the API key set up, you can now list the available Gemini models using the `list_models` method:

```python
for model in genai.list_models():
    if 'generateContent' in model.supported_generation_methods:
        print(model.name)
```
Generating Text Responses
To generate text responses from text inputs, use the `gemini-pro` model. Create an instance of the `GenerativeModel` class and call its `generate_content` method with your prompt:

```python
model = genai.GenerativeModel('gemini-pro')
response = model.generate_content("What is generative AI?")
```
A simple way to handle the response is to render it as Markdown, using the `display` and `Markdown` utilities imported earlier:

```python
display(Markdown(response.text))
```
The Gemini API supports many other options, including multi-turn chat and multimodal input, depending on the model’s capabilities. The Gemini models support text and images as input, with text as the output. For more details, see the official documentation.
Building LLM Applications with Gemini Pro and Acorn
Visit https://gptscript.ai to download GPTScript and start building today. As we expand GPTScript's capabilities, we are also expanding our list of tools. With these tools, you can create any application imaginable: check out tools.gptscript.ai to get started.
See Additional Guides on Key Machine Learning Topics
Together with our content partners, we have authored in-depth guides on several other topics that can also be useful as you explore the world of machine learning.
Advanced Threat Protection
Authored by Cynet
- Advanced Threat Protection: A Real-Time Threat Killer Machine
- Advanced Threat Detection: Catch & Eliminate Sneak Attacks
- What is Network Analytics? From Detection to Active Prevention
Multi GPU
Authored by Run.AI
- Multi GPU: An In-Depth Look
- Keras Multi GPU: A Practical Guide
- How to Build Your GPU Cluster: Process and Hardware Options
LLM Application Development
Authored by Acorn