Text embeddings serve as the foundation for tasks like sentiment analysis, translation, and summarization. They also amplify the capabilities of Large Language Models (LLMs) such as GPT-4 and Llama, playing a central role in modern natural language processing.
Text embeddings are numerical representations of text that capture semantic meaning, allowing developers to work with the nuances of human language as part of their LLM app pipelines.
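To make that concrete, here's a minimal Python sketch (assuming the sentence-transformers package is installed; it isn't part of the deployment flow below) that embeds three sentences locally with the intfloat/e5-small-v2 model used later in this tutorial and compares them with cosine similarity:

import numpy as np
from sentence_transformers import SentenceTransformer

# Load the same open-source model we deploy later in this tutorial.
model = SentenceTransformer("intfloat/e5-small-v2")

# e5 models expect a "query: " prefix on search-style inputs.
sentences = [
    "query: What is deep learning?",
    "query: Explain neural networks to me",
    "query: How do I bake sourdough bread?",
]
vectors = model.encode(sentences)  # one vector per sentence

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vectors[0], vectors[1]))  # higher: both sentences are about neural networks
print(cosine(vectors[0], vectors[2]))  # lower: baking bread is semantically unrelated

The closer two vectors are, the closer the meanings of the underlying texts, which is exactly the property the applications below rely on.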
Applications of Text Embeddings in LLMs
It's good practice to separate model serving from your other ML components, such as data processing and hyperparameter tuning. There are several tools available for this purpose, including Seldon Core and Ray Serve. However, we've found that Hugging Face's Text Embeddings Inference toolkit is an ideal solution for deploying and serving open-source text embedding models.
Built for speed and convenience, the Text Embeddings Inference toolkit operates as a cloud-based solution on the Acorn SaaS platform. It enables rapid text embedding generation, allowing you to either pick from an extensive list of pre-trained models on Hugging Face or deploy your own custom model.
Kickstart your text embedding journey by selecting how you want to deploy the inference server. There are three options:
Using Acorn CLI
Run the following command (assuming you've already logged in using acorn login acorn.io):

acorn run index.docker.io/sanjay920/text-embeddings-inference:cpu --model_id MODEL_ID

Replace MODEL_ID with the Hugging Face model you want to serve, for example intfloat/e5-small-v2.
Using Acorn UI
Navigate to the Acorn SaaS Platform and sign in. Hit the "Create" button and select "From Acorn Image". Fill in the "Acorn Image" field with:

index.docker.io/sanjay920/text-embeddings-inference:cpu
Easy Deploy
This is the suggested option because it's the easiest: just follow this link and click "Deploy". Optionally, enter your desired model ID.
Making API calls to the server's endpoints is a breeze. Use the following curl commands:
Regular Embeddings
# Replace URL with your specific Acorn URL
curl YOUR_ACORN_URL/embed \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'
Output:
The response will be a vector of floating-point numbers, each representing a specific feature of the input text. While displaying the entire output might be overwhelming, here's a truncated snippet to give you an idea:
[[-0.0598, 0.0169, 0.0311, ..., 0.0181, -0.0436, 0.0324]]
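If you'd rather call the endpoint from code, here's a minimal Python sketch of the same request (assuming the requests package; substitute your own Acorn URL):

import requests

ACORN_URL = "YOUR_ACORN_URL"  # replace with your specific Acorn URL

response = requests.post(
    f"{ACORN_URL}/embed",
    json={"inputs": "What is Deep Learning?"},
    headers={"Content-Type": "application/json"},
)
response.raise_for_status()

# The /embed endpoint returns a list of vectors, one per input string.
vectors = response.json()
embedding = vectors[0]
print(len(embedding), embedding[:5])  # dimensionality and the first few values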
OpenAI Formatted Response
# Replace URL with your specific Acorn URL
curl YOUR_ACORN_URL/openai \
    -X POST \
    -d '{"input":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'
The response will be a JSON object:
{"object":"list","data":[{"object":"embedding","embedding":[-0.0598, 0.0169, 0.0311, ..., 0.0181, -0.0436, 0.0324],"index":0}],"model":"intfloat/e5-small-v2","usage":{"prompt_tokens":7,"total_tokens":7}}
These vectors are essentially the "numerical essence" of your text and can be used in various applications like semantic search, text classification, and recommendation systems.
Whether it's capturing user intent in semantic search or enabling zero-shot learning in text classification, text embeddings are a transformative tool. They also significantly contribute to product recommendations, clustering of similar items, and retrieval-augmented generation techniques.
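To illustrate the semantic search case, here's a sketch that embeds a query and a few documents through the /embed endpoint and ranks the documents by cosine similarity. The document list and helper function are illustrative, and it assumes the endpoint accepts a list of inputs in a single request (as the Hugging Face toolkit does):

import numpy as np
import requests

ACORN_URL = "YOUR_ACORN_URL"  # replace with your specific Acorn URL

def embed(texts):
    # Batch several texts into one request; the response is one vector per input.
    resp = requests.post(f"{ACORN_URL}/embed", json={"inputs": texts})
    resp.raise_for_status()
    return np.array(resp.json())

documents = [
    "Deep learning is a subset of machine learning based on neural networks.",
    "Sourdough bread needs a long, slow fermentation.",
    "Transformers are the dominant architecture for large language models.",
]
query = "What is Deep Learning?"

doc_vectors = embed(documents)
query_vector = embed([query])[0]

# Rank documents by cosine similarity to the query.
scores = doc_vectors @ query_vector / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
)
for score, doc in sorted(zip(scores, documents), reverse=True):
    print(f"{score:.3f}  {doc}")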
In summary, the Text Embeddings Inference toolkit is both user-friendly and powerful. Written by Hugging Face and powered by Acorn, it allows you to easily harness the power of text embeddings in the cloud.
Sanjay Nadhavajhala is an AI engineer at Acorn Labs.