Mistral 7B Instruct is the instruction-tuned variant of Mistral 7B, a language model with 7.3 billion parameters. According to Mistral AI's published benchmarks, Mistral 7B surpasses Llama 2 13B across all reported metrics and achieves superior results to Llama 1 34B on many of them.
Mistral 7B Instruct is particularly well-suited to code-related tasks, closely approaching the performance of CodeLlama 7B, while still maintaining proficiency in English language tasks. The model incorporates techniques such as Grouped-Query Attention (GQA) and Sliding Window Attention (SWA), enabling faster inference times and the ability to process longer sequences.
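To make the two attention techniques more concrete, here is a minimal pure-Python sketch. It is an illustration, not Mistral's implementation: the function names are hypothetical, the window convention (a token sees itself plus the preceding window - 1 tokens) is one common choice, and the 32 query heads and 8 key/value heads are assumed from the published Mistral 7B configuration.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when position i may attend to position j.

    Causal (j <= i) and limited to the most recent `window` positions.
    """
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


def kv_head_for_query_head(q_head: int, n_heads: int, n_kv_heads: int) -> int:
    """Grouped-query attention: consecutive query heads share one KV head."""
    group_size = n_heads // n_kv_heads
    return q_head // group_size


# Visualize a small mask: "x" marks positions a token can attend to.
for row in sliding_window_mask(seq_len=6, window=3):
    print("".join("x" if allowed else "." for allowed in row))

# Assumed head counts: 32 query heads sharing 8 key/value heads.
print([kv_head_for_query_head(h, n_heads=32, n_kv_heads=8) for h in range(32)])
```

With a sliding window, the cost of attention per token is bounded by the window size rather than by the full sequence length. With grouped queries, fewer key/value heads need to be cached during generation.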
The model is available under the Apache 2.0 license, which allows unrestricted use across various applications and platforms, including local deployment and cloud-based services like AWS, Azure, and Google Cloud through the vLLM inference server and SkyPilot integration.
This is part of a series of articles about Mistral AI.
Here’s an overview of the different versions of Mistral 7B Instruct.
Mistral-7B-Instruct-v0.1 was the earliest iteration of Mistral-7B with instruction-based fine-tuning. It was fine-tuned on publicly available conversational datasets. To facilitate instruction-based interactions, it uses a prompt format in which each instruction is enclosed within [INST] and [/INST] tokens.
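The prompt format can be sketched as a small string-building helper. This is a minimal illustration, assuming the beginning-of-sequence and end-of-sequence markers are written as `<s>` and `</s>` and that completed assistant replies are closed with the end-of-sequence marker; `build_prompt` is a hypothetical helper, and the exact whitespace expected by a given tokenizer may differ.

```python
from typing import Optional

BOS, EOS = "<s>", "</s>"  # assumed textual form of the special tokens


def build_prompt(turns: list[tuple[str, Optional[str]]]) -> str:
    """Build an [INST]-style prompt from (user_message, assistant_reply) pairs.

    Use None as the reply of the final turn, which the model should answer.
    """
    prompt = BOS
    for user_message, assistant_reply in turns:
        prompt += f"[INST] {user_message} [/INST]"
        if assistant_reply is not None:
            prompt += f" {assistant_reply}{EOS}"
    return prompt


print(build_prompt([
    ("What is your favourite condiment?", "Lemon juice, probably."),
    ("Do you have mayonnaise recipes?", None),
]))
```

Many tokenizers add the beginning-of-sequence token themselves, so check the tokenizer settings before including it in the string as well.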
Mistral-7B-Instruct-v0.2 offers significant improvements in performance and utility compared to v0.1. It expands the context window from 8k to 32k tokens, allowing users to provide much more data within their prompts. Additionally, the removal of Sliding Window Attention streamlines its architecture, improving efficiency without compromising on output quality.
Mistral-7B-Instruct-v0.3 introduces important enhancements, including a vocabulary extended to 32,768 tokens, broadening the model’s linguistic range and enabling it to understand and generate a wider array of texts. This version also introduces support for the v3 tokenizer, improving its efficiency in processing inputs and generating outputs. The model now supports function calling: given descriptions of the available functions, it can respond with a structured call that the surrounding application then executes.
Here is an overview of how to use the Mistral 7B Instruct model. The code is adapted from the Mistral 7B Instruct v0.3 model card on Hugging Face.
Note: These examples require a GPU machine. The following examples were tested on a g6.xlarge AWS instance with a single GPU.
To install the necessary components for working with Mistral-7B-Instruct-v0.3, install the mistral_inference package:
pip install mistral_inference
This command installs the mistral_inference package and its dependencies.
We also need to install the Hugging Face Hub Python package using this command:
pip install huggingface_hub
To download the Mistral-7B-Instruct-v0.3 model, use the snapshot_download function from huggingface_hub:
from huggingface_hub import snapshot_download
from pathlib import Path

# Local directory where the model files will be stored
mistral_models_path = Path.home().joinpath('mistral_models', '7B-Instruct-v0.3')
mistral_models_path.mkdir(parents=True, exist_ok=True)

# Download only the files that mistral_inference needs
snapshot_download(
    repo_id="mistralai/Mistral-7B-Instruct-v0.3",
    allow_patterns=["params.json", "consolidated.safetensors", "tokenizer.model.v3"],
    local_dir=mistral_models_path,
)
Note: The model is over 15 GB in size. Ensure you have sufficient free disk space.
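A quick way to check the available space before starting the download is sketched below, using only the standard library. The 2 GB headroom is an arbitrary assumption, and `free_gb` is a hypothetical helper name; the target path mirrors the one used in the download code.

```python
import shutil
from pathlib import Path

MODEL_SIZE_GB = 15  # size quoted in the note above
HEADROOM_GB = 2     # assumption: arbitrary safety margin


def free_gb(path: Path) -> float:
    """Free space, in GiB, on the filesystem that will hold `path`."""
    probe = path.absolute()
    while not probe.exists():  # the target folder may not exist yet
        probe = probe.parent
    return shutil.disk_usage(probe).free / 1024**3


target = Path.home() / "mistral_models" / "7B-Instruct-v0.3"
available = free_gb(target)
if available < MODEL_SIZE_GB + HEADROOM_GB:
    print(f"Only {available:.1f} GiB free; the download may not fit.")
else:
    print(f"{available:.1f} GiB free; enough room for the model.")
```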
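This code first imports the necessary libraries and then creates a directory at the specified path to store the downloaded model files. The snapshot_download function then downloads only the three files listed in allow_patterns from the model repository into that directory.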
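After the download finishes, it can be worth confirming that the expected files actually arrived. The following is a minimal sketch, assuming the same directory and the same three file names as the download code; `missing_model_files` is a hypothetical helper.

```python
from pathlib import Path

# File names taken from the allow_patterns list in the download code
EXPECTED_FILES = ["params.json", "consolidated.safetensors", "tokenizer.model.v3"]


def missing_model_files(model_dir: Path) -> list[str]:
    """Return the expected files that are not present in model_dir."""
    return [name for name in EXPECTED_FILES if not (model_dir / name).is_file()]


model_dir = Path.home() / "mistral_models" / "7B-Instruct-v0.3"
missing = missing_model_files(model_dir)
print("All model files present." if not missing else f"Missing files: {missing}")
```

An incomplete download is a common cause of confusing errors when loading the model, so a check like this can save some debugging time.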
Next, you’ll need to log in and agree to the terms and conditions to use the model. The easiest way is to install the Hugging Face CLI using this command:
pip install -U "huggingface_hub[cli]"
Then log in with your Hugging Face credentials:
huggingface-cli login
To initiate a conversation with Mistral-7B-Instruct-v0.3, use the mistral-chat command:
mistral-chat $HOME/mistral_models/7B-Instruct-v0.3 --instruct --max_tokens 256
This command activates the chat interface, allowing users to interact directly with Mistral-7B-Instruct-v0.3. The --instruct flag runs the model in instruction-following mode, and --max_tokens 256 limits each response to 256 generated tokens.
To execute instructions using Mistral-7B-Instruct-v0.3, use the following Python code snippet. Change mistral_models_path to the directory where you downloaded the model.
# Note: newer mistral_inference releases may expose Transformer
# under mistral_inference.transformer instead
from mistral_inference.model import Transformer
from mistral_inference.generate import generate
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

# Path to the downloaded model files
mistral_models_path = "/root/mistral_models/7B-Instruct-v0.3"

# Load the v3 tokenizer and the model weights
tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tokenizer.model.v3")
model = Transformer.from_folder(mistral_models_path)

# Wrap the user prompt in a chat completion request and tokenize it
completion_request = ChatCompletionRequest(
    messages=[UserMessage(content="Explain how transformers work in a short paragraph.")]
)
tokens = tokenizer.encode_chat_completion(completion_request).tokens

# Generate up to 64 tokens and decode them back to text
out_tokens, _ = generate(
    [tokens],
    model,
    max_tokens=64,
    temperature=0.0,
    eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id,
)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
print(result)
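This code demonstrates how to instruct the model to generate a response based on a specific prompt:
MistralTokenizer loads the v3 tokenizer file and converts the chat request into tokens.
Transformer loads the model weights from the model folder.
ChatCompletionRequest wraps the user message in the structure the tokenizer expects.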
The output should be a short paragraph explaining how transformers work, limited to at most 64 generated tokens.
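The generation call in this example passes temperature=0.0. As a rough illustration of what that setting usually means, the sketch below picks the next token from a list of scores. It is not the mistral_inference code: `pick_next_token` is a hypothetical helper, and it assumes the common convention that a temperature of zero selects the highest-scoring token while higher temperatures sample more randomly.

```python
import math
import random


def pick_next_token(logits: list[float], temperature: float, rng: random.Random) -> int:
    """Return the index of the next token given raw scores (logits)."""
    if temperature <= 0.0:
        # Greedy decoding: always take the highest-scoring token
        return max(range(len(logits)), key=lambda i: logits[i])
    # Temperature sampling: softmax over scaled logits, then a random draw
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(score - peak) for score in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


rng = random.Random(0)
scores = [0.1, 2.0, -1.0]
print(pick_next_token(scores, temperature=0.0, rng=rng))  # greedy choice
print(pick_next_token(scores, temperature=1.0, rng=rng))  # sampled choice
```

Under this convention, greedy decoding gives repeatable output for the same prompt, which is convenient when testing.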
To integrate the capability of calling specific functions in a third-party system, use a Python script like the following example. In this example, Mistral 7B Instruct is given the description of a third-party function that provides up-to-date stock prices and responds with a call to it. Change mistral_models_path to the directory where you downloaded the model.
from mistral_common.protocol.instruct.tool_calls import Function, Tool
from mistral_inference.model import Transformer
from mistral_inference.generate import generate
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

# Path to the downloaded model files
mistral_models_path = "/root/mistral_models/7B-Instruct-v0.3"

tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tokenizer.model.v3")
model = Transformer.from_folder(mistral_models_path)

# Describe the available tool with a JSON schema for its parameters
completion_request = ChatCompletionRequest(
    tools=[
        Tool(
            function=Function(
                name="get_stockmarket_prices",
                description="Get the current stock market prices",
                parameters={
                    "type": "object",
                    "properties": {
                        "ticker": {
                            "type": "string",
                            "description": "The stock ticker symbol, e.g. AAPL for Apple Inc.",
                        },
                        "exchange": {
                            "type": "string",
                            "description": "The stock exchange, e.g. NASDAQ, NYSE",
                        },
                    },
                    "required": ["ticker", "exchange"],
                },
            )
        )
    ],
    messages=[
        UserMessage(content="What's the current price of AAPL on NASDAQ?"),
    ],
)

tokens = tokenizer.encode_chat_completion(completion_request).tokens

# Generate the model's response, which should be a tool call
out_tokens, _ = generate(
    [tokens],
    model,
    max_tokens=64,
    temperature=0.0,
    eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id,
)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
print(result)
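This code demonstrates how to use function calling with the Mistral 7B Instruct large language model (LLM) to retrieve stock market prices:
get_stockmarket_prices is described to the model as a tool, with a JSON schema that declares ticker and exchange as required string parameters.
ChatCompletionRequest bundles the tool definition with the user's question so that the model can answer with a call to the tool.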
The output should be a call to get_stockmarket_prices with the ticker and exchange taken from the question, rather than a free-text answer.
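The model only proposes a function call; the surrounding application still has to parse that call and run the function. The sketch below shows one way this could look. Everything in it is an assumption for illustration: the `[TOOL_CALLS]` prefix followed by a JSON list is an assumed output format, the sample string is hand-written rather than real model output, and `get_stockmarket_prices` is a hypothetical stub that returns no real price.

```python
import json

TOOL_CALL_PREFIX = "[TOOL_CALLS]"  # assumed marker preceding the JSON payload


def get_stockmarket_prices(ticker: str, exchange: str) -> dict:
    """Hypothetical stub; a real version would query a market data service."""
    return {"ticker": ticker, "exchange": exchange, "price": None}


# Map tool names the model may emit to local Python functions
REGISTRY = {"get_stockmarket_prices": get_stockmarket_prices}


def run_tool_calls(decoded: str) -> list:
    """Parse tool calls from decoded model output and execute each one."""
    text = decoded.strip()
    if not text.startswith(TOOL_CALL_PREFIX):
        return []  # plain-text answer, nothing to execute
    calls = json.loads(text[len(TOOL_CALL_PREFIX):])
    return [REGISTRY[call["name"]](**call["arguments"]) for call in calls]


# Hand-written sample in the assumed format, not actual model output
sample = (
    '[TOOL_CALLS] [{"name": "get_stockmarket_prices", '
    '"arguments": {"ticker": "AAPL", "exchange": "NASDAQ"}}]'
)
print(run_tool_calls(sample))
```

Looking up functions in an explicit registry means the model can only trigger functions the application has chosen to expose.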
Before proceeding, install the Python libraries Transformers and Protocol Buffers:
pip install transformers
pip install protobuf
Mistral-7B-Instruct-v0.3 also supports text generation through Hugging Face’s transformers library:
from transformers import pipeline

# Conversation so far: a system instruction followed by the user's question
messages = [
    {"role": "system", "content": "You are a helpful customer service chatbot who always responds in the style of Yoda."},
    {"role": "user", "content": "What are your opening hours?"},
]

# Create a text-generation pipeline backed by the model from the Hugging Face Hub
chatbot = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.3")
result = chatbot(messages)
print(result)
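Key points about this code:
pipeline is imported from the transformers library and provides a high-level interface for running models.
messages is a list of dictionaries, each with a role and content, that holds the conversation so far.
pipeline is called with the "text-generation" task and the model ID to create the chatbot.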
The output should contain the conversation with the model's Yoda-style reply added as an assistant message.
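To pull just the reply text out of the pipeline result, a small helper like the one below can be used. This is a sketch under an assumption about the result's shape: a list with one dictionary whose "generated_text" entry holds the full message list, ending with the assistant's reply. The example data is a hand-written placeholder, not real model output, and `last_assistant_reply` is a hypothetical helper.

```python
def last_assistant_reply(pipeline_output: list) -> str:
    """Return the content of the last assistant message in the result."""
    conversation = pipeline_output[0]["generated_text"]
    for message in reversed(conversation):
        if message["role"] == "assistant":
            return message["content"]
    raise ValueError("no assistant message found")


# Placeholder data in the assumed shape
example_output = [{
    "generated_text": [
        {"role": "user", "content": "What are your opening hours?"},
        {"role": "assistant", "content": "<model reply goes here>"},
    ]
}]
print(last_assistant_reply(example_output))
```

If the installed transformers version returns a different structure, printing the raw result first is the simplest way to see what needs to change.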