Text embeddings serve as the foundation for tasks like sentiment analysis, translation, and summarization. They also amplify the capabilities of Large Language Models (LLMs) such as GPT-4 and Llama, playing a central role in modern natural language processing.
Text embeddings are numerical representations of text that capture semantic meaning, allowing developers to work with the nuances of human language as part of their LLM app pipelines.
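To make that concrete, here's a minimal Python sketch (assuming the sentence-transformers package is installed; it isn't part of the deployment flow below) that embeds three sentences locally with the intfloat/e5-small-v2 model used later in this tutorial and compares them with cosine similarity:

import numpy as np
from sentence_transformers import SentenceTransformer

# Load the same open-source model we deploy later in this tutorial.
model = SentenceTransformer("intfloat/e5-small-v2")

# e5 models expect a "query: " prefix on search-style inputs.
sentences = [
    "query: What is deep learning?",
    "query: Explain neural networks to me",
    "query: How do I bake sourdough bread?",
]
vectors = model.encode(sentences)  # one vector per sentence

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vectors[0], vectors[1]))  # higher: both sentences are about neural networks
print(cosine(vectors[0], vectors[2]))  # lower: baking bread is semantically unrelated

The closer two vectors are, the closer the meanings of the underlying texts, which is exactly the property the applications below rely on.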
Applications of Text Embeddings in LLMs
It's good practice to separate model serving from your other ML components, such as data processing and hyperparameter tuning. There are several tools available for this purpose, including Seldon Core and Ray Serve. However, we've found that Hugging Face's Text Embeddings Inference toolkit is an ideal solution for deploying and serving open-source text embedding models.
Built for speed and convenience, the Text Embeddings Inference toolkit operates as a cloud-based solution on the Acorn SaaS platform. It enables rapid text embedding generation, allowing you to either pick from an extensive list of pre-trained models on Hugging Face or deploy your own custom model.
Kickstart your text embedding journey by selecting how you want to deploy the inference server. There are three options:
Using Acorn CLI
Run the following command (assuming you've already logged in using acorn login acorn.io):

acorn run index.docker.io/sanjay920/text-embeddings-inference:cpu --model_id MODEL_ID

Replace MODEL_ID with the Hugging Face model you want to serve, for example intfloat/e5-small-v2.
Using Acorn UI
Navigate to the Acorn SaaS Platform and sign in. Hit the "Create" button and select "From Acorn Image". Fill in the "Acorn Image" field with:

index.docker.io/sanjay920/text-embeddings-inference:cpu
Easy Deploy
This is the suggested option because it's the easiest: just follow this link and click "Deploy". Optionally, enter your desired model ID.
Making API calls to the server's endpoints is a breeze. Use the following curl commands:
Regular Embeddings
# Replace URL with your specific Acorn URL
curl YOUR_ACORN_URL/embed \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'
Output:
The response will be a vector of floating-point numbers, each representing a specific feature of the input text. While displaying the entire output might be overwhelming, here's a truncated snippet to give you an idea:
[[-0.0598, 0.0169, 0.0311, ..., 0.0181, -0.0436, 0.0324]]
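If you'd rather call the endpoint from code, here's a minimal Python sketch of the same request (assuming the requests package; substitute your own Acorn URL):

import requests

ACORN_URL = "YOUR_ACORN_URL"  # replace with your specific Acorn URL

response = requests.post(
    f"{ACORN_URL}/embed",
    json={"inputs": "What is Deep Learning?"},
    headers={"Content-Type": "application/json"},
)
response.raise_for_status()

# The /embed endpoint returns a list of vectors, one per input string.
vectors = response.json()
embedding = vectors[0]
print(len(embedding), embedding[:5])  # dimensionality and the first few values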
OpenAI Formatted Response
# Replace URL with your specific Acorn URL
curl YOUR_ACORN_URL/openai \
    -X POST \
    -d '{"input":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'
The response will be a JSON object:
{"object":"list","data":[{"object":"embedding","embedding":[-0.0598, 0.0169, 0.0311, ..., 0.0181, -0.0436, 0.0324],"index":0}],"model":"intfloat/e5-small-v2","usage":{"prompt_tokens":7,"total_tokens":7}}
These vectors are essentially the "numerical essence" of your text and can be used in various applications like semantic search, text classification, and recommendation systems.
Whether it's capturing user intent in semantic search or enabling zero-shot learning in text classification, text embeddings are a transformative tool. They also significantly contribute to product recommendations, clustering of similar items, and retrieval-augmented generation techniques.
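To illustrate the semantic search case, here's a sketch that embeds a query and a few documents through the /embed endpoint and ranks the documents by cosine similarity. The document list and helper function are illustrative, and it assumes the endpoint accepts a list of inputs in a single request (as the Hugging Face toolkit does):

import numpy as np
import requests

ACORN_URL = "YOUR_ACORN_URL"  # replace with your specific Acorn URL

def embed(texts):
    # Batch several texts into one request; the response is one vector per input.
    resp = requests.post(f"{ACORN_URL}/embed", json={"inputs": texts})
    resp.raise_for_status()
    return np.array(resp.json())

documents = [
    "Deep learning is a subset of machine learning based on neural networks.",
    "Sourdough bread needs a long, slow fermentation.",
    "Transformers are the dominant architecture for large language models.",
]
query = "What is Deep Learning?"

doc_vectors = embed(documents)
query_vector = embed([query])[0]

# Rank documents by cosine similarity to the query.
scores = doc_vectors @ query_vector / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
)
for score, doc in sorted(zip(scores, documents), reverse=True):
    print(f"{score:.3f}  {doc}")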
In summary, the Text Embeddings Inference toolkit is both user-friendly and powerful. Written by Hugging Face and powered by Acorn, it allows you to easily harness the power of text embeddings in the cloud.
Sanjay Nadhavajhala is an AI engineer at Acorn Labs.