Building an LLM-powered cron job to monitor Craigslist using GPTScript and Render

Mar 27, 2024 by Grant Linville

If you are not familiar with it already, GPTScript is a new scripting language for automating interactions with large language models, such as OpenAI's GPT-4. With one simple script, you can let an LLM perform tasks on your computer by giving it access to tools. GPTScript is also useful on servers and in containers, and can perform advanced tasks without requiring programming. In this tutorial, I will walk you through setting up a containerized GPTScript and running it as a cron job in Render.

The job will check Craigslist for new Toyota 4Runners listed for sale in the Phoenix, Arizona area. It will send an email with links and information about any new vehicles that it has not already sent an email about. The goal is to accomplish this without having to write any code, other than the GPTScript and a Dockerfile.

Prerequisites

There are a few things you will need in order to complete this guide:

  • An OpenAI account, with your API key set to the environment variable OPENAI_API_KEY
  • A Render account
  • A SendGrid account
  • Docker
  • The ability to push container images to a container registry, such as Docker Hub

Creating the script

There are two main tasks that our GPTScript needs to accomplish: getting the vehicle information from Craigslist, and sending the email. Let's start with the Craigslist portion.

The built-in tool "sys.http.html2text" is useful in circumstances where we want the LLM to fetch a webpage and process its contents in some way. Let's start with this:

tools: sys.http.html2text

Visit https://phoenix.craigslist.org/search/cta?postedToday=1&query=4runner&sort=date to get information about the 4Runners listed today.

Save that to a file called "4runner.gpt". Now let's run it and see if it works:

gptscript 4runner.gpt

This was my output:

There are two 4Runners listed today in Phoenix, AZ on Craigslist:

1. **2017 Toyota 4Runner TRD Off Road 4x4** - Price: $39,750 - Location: I-17 & 101 - [Link to listing](https://phoenix.craigslist.org/nph/cto/d/phoenix-2017-toyota-4runner-trd-off/7729111421.html)
2. **2011 Toyota 4Runner SR5** - Price: $21,500 - Location: Mesa - [Link to listing](https://phoenix.craigslist.org/evl/ctd/d/mesa-2011-toyota-4runner-sr5/7729001129.html)

(Your output will look different depending on the day and how many 4Runners were listed.)

This looks good. But we need a way of keeping track of which vehicles we have already sent an email about, so that we do not include them in future emails. We will need to use a database for this. Render offers PostgreSQL databases that are easy to set up, so we will go with that.

Let's add some more instructions to the script:

tools: sys.http.html2text, sys.exec?

Visit https://phoenix.craigslist.org/search/cta?postedToday=1&query=4runner&sort=date to get information about the 4Runners listed today.

Check the PostgreSQL database and see if any of these vehicles have already been stored. If they have been, ignore them. The `psql` command is available. The database connection string is in the environment variable PGURL. The table is called "vehicles". If the table doesn't exist yet, create it. The only thing that needs to be stored in the table is the URL for each vehicle. Don't add any other fields.

For each vehicle that was not already in the database, store the information about it into the database.

Here, we are telling the LLM about our database. It knows that it can find the connection string in the PGURL environment variable, and that it has the psql command available to interact with the database. We have also instructed it to only store the URL for each car, so it will create a very simple schema for the database table.

We also gave it access to the sys.exec tool so that it can run commands in the container. The "?" indicates that the script should continue to run even if the sys.exec tool returns an error. This is necessary in case the LLM tries to add an entry to the vehicles table, only to discover that it does not exist and needs to create it. It might get an error back the first time it tries something, but it will be able to recover from that state.
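The exact SQL the LLM writes will vary from run to run, but with these instructions it generally boils down to psql calls along these lines (a hand-written sketch with a placeholder URL, not output captured from the model):

# A first insert may fail because the table doesn't exist yet; sys.exec? lets the run continue
psql $PGURL -c "INSERT INTO vehicles (url) VALUES ('https://phoenix.craigslist.org/example-listing.html');"

# The LLM can then recover by creating the minimal one-column table and retrying the insert
psql $PGURL -c "CREATE TABLE IF NOT EXISTS vehicles (url TEXT);"
psql $PGURL -c "INSERT INTO vehicles (url) VALUES ('https://phoenix.craigslist.org/example-listing.html');"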

We aren't done with the script yet, since we have not given it the ability to send emails, but we will handle that part later. Next, we are going to set up the database in Render.

Creating the Database

Inside of your Render dashboard, create a new project called "4Runner". Then, select "Create new service" and then "New PostgreSQL". Set the Name to "4runner", the Database to "vehicles", and the User to "user", then select whichever region you prefer. Leave PostgreSQL Version set to 16. Your configuration should look similar to this:

Render New PostgreSQL

Scroll down a bit and select the free tier:

Render PostgreSQL Free Tier

Then, click the "Create Database" button at the bottom of the page. The database may take a couple minutes to create. Once it is ready, you will see some URLs when you scroll down to the Connections section:

Render Database Connections

Click the copy button on the "External Database URL". Save this to the environment variable PGURL in your terminal. Now, we are ready to create the container and run it locally.
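On macOS or Linux, that looks something like the following (the hostname shown is a made-up placeholder; paste the actual External Database URL you copied from Render):

export PGURL='postgresql://user:password@dpg-example.oregon-postgres.render.com/vehicles'
# Optional sanity check that the database is reachable from your machine
psql $PGURL -c "SELECT 1;"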

Creating the Dockerfile

Here is the Dockerfile we are going to use:

FROM alpine:latest
COPY 4runner.gpt ./4runner.gpt
RUN apk update && apk add curl postgresql16-client --no-cache && \
    curl https://get.gptscript.ai/install.sh | sh && \
    mkdir /.cache && chmod 777 /.cache
CMD ["gptscript", "--cache=false", "4runner.gpt"]

This image is based on Alpine Linux, which is commonly used in containers. We use Alpine's package manager, apk, to install curl and the PostgreSQL 16 client (the psql command). We then use curl to install GPTScript via the install script. We also set up the .cache directory and modify its permissions so that GPTScript is able to use it without getting any permissions errors.

Next, let's build and run the image locally.

docker build -t 4runner .
docker run --rm -e PGURL=$PGURL -e OPENAI_API_KEY=$OPENAI_API_KEY 4runner

This was my output:

All the 4Runners listed today have been successfully stored in the database.

Let's run the container again, but this time create a shell in it so that we can manually verify that the URLs were added to the database:

docker run --rm -it --entrypoint sh -e PGURL=$PGURL -e OPENAI_API_KEY=$OPENAI_API_KEY 4runner

Once you are in the container, run:

psql $PGURL -c "SELECT * FROM vehicles;"

This was my output:

                                              url
----------------------------------------------------------------------------------------------
 https://phoenix.craigslist.org/nph/cto/d/phoenix-2017-toyota-4runner-trd-off/7729111421.html
 https://phoenix.craigslist.org/evl/ctd/d/mesa-2011-toyota-4runner-sr5/7729001129.html
(2 rows)

Looks good! Now, let's delete these so that we can try it out with the email later:

psql $PGURL -c "DROP TABLE vehicles;"

Now, run "exit" to leave the container. Next, we will set up SendGrid so that we can get an API key and start sending emails.

Setting up SendGrid

Create an account at sendgrid.com if you do not already have one. Once you are logged in, go to Settings -> Tracking in the sidebar.

SendGrid Tracking Settings

I disabled all of these settings on my account. Tweak them to your own preferences, but I recommend at least disabling Click Tracking so that the links in the emails that get sent will look normal.

Once you are done, go to Settings -> API Keys, also in the sidebar. Click "Create API Key" in the top right corner. Name your key "4runner" and give it Restricted Access. Leave each setting at No Access, except for Mail Send, which should get Full Access. It should look like this:

SendGrid API Key Settings

Click "Create & View" to get your API key. Save it to the environment variable SENDGRID_API_KEY.

The last thing we need to do in SendGrid is set up sender authentication. Select Settings -> Sender Authentication in the sidebar. You can set up an entire domain if you would like, but that is not necessary for this example, as you can just use your personal email. Click on "Verify Single Sender" and go through all the menus and prompts to get it set up. It will send you an email to verify your request. Once you are done, it is time to finish the GPTScript.

Finishing the script

Next, we will add email capabilities. This is the final version of the script:

tools: send_email, sys.http.html2text, sys.exec?

Visit https://phoenix.craigslist.org/search/cta?postedToday=1&query=4runner&sort=date to get information about the 4Runners listed today.

Check the PostgreSQL database and see if any of these vehicles have already been stored. If they have been, ignore them. The `psql` command is available. The database connection string is in the environment variable PGURL. The table is called "vehicles". If the table doesn't exist yet, create it. The only thing that needs to be stored in the table is the URL for each vehicle. Don't add any other fields.

For each vehicle that was not already in the database, store the information about it into the database, and send an email with information about those vehicles. If there are no new vehicles, then don't send an email.

---
name: send_email
description: sends an email with the provided contents
args: contents: the contents of the email to send
tools: sys.http.html2text, sys.exec?

IMPORTANT: when setting --header or -H on a cURL command, always use double quotes, never single quotes!
IMPORTANT: when setting --data or -d on a cURL command, always use single quotes, never double quotes! And always escape newlines! (They should look like "\\n")

The SendGrid API key is in the environment variable SENDGRID_API_KEY.

Perform the following actions in this order:

1. View the contents of https://docs.sendgrid.com/for-developers/sending-email/api-getting-started to learn about the SendGrid API.
2. Run a cURL command to send an email using the SendGrid API, with the following information:
   to: <email address> (name: <name>)
   from: <email address> (name: <name>)
   reply to: <email address> (name: <name>)
   subject: "4Runner Listings"
   content: $contents

IMPORTANT: be sure to fill in all instances of "<email address>" and "<name>" with your actual email address and name.

We added a new tool called send_email that will read the docs for the SendGrid API and then run a cURL command to send an email about the vehicles. (Due to various issues around quotes and environment variables, I had to add those "IMPORTANT" lines to tell the LLM how to avoid mangling the commands it creates.)
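For reference, the command the LLM ends up constructing is roughly the standard SendGrid v3 Mail Send request. A hand-written sketch with placeholder addresses (not captured from an actual run) looks like this:

curl --request POST \
  --url https://api.sendgrid.com/v3/mail/send \
  --header "Authorization: Bearer $SENDGRID_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{"personalizations": [{"to": [{"email": "you@example.com", "name": "Your Name"}]}],
           "from": {"email": "you@example.com", "name": "Your Name"},
           "reply_to": {"email": "you@example.com", "name": "Your Name"},
           "subject": "4Runner Listings",
           "content": [{"type": "text/plain", "value": "New 4Runner listings:\n<one line per vehicle>"}]}'

Note how it follows the script's own rules: double quotes around headers, single quotes around the JSON body, and newlines escaped inside the content string.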

We also slightly modified the prompt for the first tool to tell it to send the email, and we gave it access to the new send_email tool.

With that, the script is finished. Let's build and push the container image up to a container registry. Make sure it is built for the linux/amd64 platform. I used my personal Docker Hub for this.

docker buildx build --push --platform linux/amd64 -t grantlinville/4runner .

Now the last thing to do is set up and run the cron job in Render.

Setting up the cron job

Go back to your 4runner project's dashboard in Render. Click on the + icon, and then "Create New Service".

Render Create New Service

Click "New Cron Job", and then select "Deploy an existing image from a registry" and click "Next".

Render New CronJob

In the Image URL field, type in the URL of your container image on whichever registry you uploaded it to. For me, it is "docker.io/grantlinville/4runner". Then click "Next".

Name your cron job "4runner" and select whichever region you would like. For the schedule, I recommend "0 */12 * * *" so that it runs every 12 hours. Set the Instance Type to Starter. Then, add three environment variables. The first is SENDGRID_API_KEY, set to the key you got earlier. The second is PGURL, but this time set it to the Internal Database URL of the PostgreSQL database we created earlier, not the External Database URL (since the cron job runs inside Render, it can use the internal connection). The third is OPENAI_API_KEY, set to your OpenAI API key.

Render Internal Database URL
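For reference, the five fields of that schedule string break down as follows (Render evaluates cron schedules in UTC, as far as I'm aware, so this should fire at midnight and noon UTC):

# minute  hour   day-of-month  month  day-of-week
# 0       */12   *             *      *
# -> runs at minute 0 of hours 0 and 12, every day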

Your cron job settings should look like this:

Render Cron Job

Click "Create Cron Job". The job will not run right away, so in the top right, click the "Trigger Run" button. Once it is done running, you should see that it succeeded:

Render Cron Job Succeeded

Let's check the logs:

Render Cron Job Logs

It says the email was sent successfully. I found the email in my junk/spam folder, probably because it comes from my own email address, which looks a bit suspicious, but that is how I configured SendGrid since single-sender verification was the easiest option. As you can see, there are links to three cars in the email:

4Runner Email

It looks like a new one was posted between when I ran the script myself at the beginning of this tutorial, and now at the end.

To confirm that the job will not send an email about cars it has already reported, trigger the cron job again and wait for it to finish. The job succeeded again, but looking at the logs, it did not send an email since there were no new listings:

Render Cron Job Logs 2

That brings us to the end of this tutorial. This cron job demonstrates the power of GPT-4 with GPTScript to automate advanced tasks without requiring programming.

Grant Linville is a software engineer at Acorn Labs.