Introduction
What if you could automate any task simply by saying what you want to achieve?
You’ve probably already heard about ChatGPT - the system from OpenAI that can write and debug software, generate essays, write poetry and more. You can give it tasks in plain English (“please write a haiku on why AI is not a threat to humanity”). Some tasks are one-shot - you get back exactly what you need after one interaction. Most often in my experience, though, it takes a few attempts. The generated code is not quite right, or the answer doesn’t quite make sense to you. Sometimes you need to give ChatGPT some more background knowledge because it can’t search the internet by itself.
What if you could automate all that?
What if you could give ChatGPT a goal and it would work out what it needed to do to achieve that goal and then do it? Without you needing to do anything else?
That’s the promise of Auto-GPT systems.
What is Auto-GPT?
I’ll talk more about how it works later, but the core idea is actually pretty simple: you give the system a role and some goals and it attempts to work out how to achieve those goals for you. Now you might wonder “How is this different from ChatGPT?”. It’s true that you can tell ChatGPT what you want and it will attempt to do it for you. However, it’s limited to the knowledge it was trained with and all it can do is provide you with a text output in a web browser. If it’s not quite right or it needs more input, you need to provide that. Auto-GPT provides a way for GPT-4 to search the Internet, pull information it needs from web pages and write files to your local disk. What’s more, it attempts to work out when it’s achieved a goal and continues until that’s done (or you run out of either patience or money).
Auto-GPT, create a writing career for me
There are some great demos showing what can be achieved, such as market research for waterproof shoes, but I’ll share some of my own experience.
The first thing you need to do is give the AI agent a name and a role. You can then give it up to five goals. Here’s what I tried:
Name: WritingCoach
Role: You are an expert writing teacher with a long history of helping aspiring writers achieve their goals and reach financial independence through writing articles on medium.com
Goals:
- Create a plan for a software engineer to become a successful writer about AI
- The plan must not require that the software engineer devotes more than one hour per day to writing
- The plan must allow the software engineer to transition from a full-time job as a software engineer to a full-time job as a writer
- The plan must allow the software engineer to make at least $100,000 per year from writing
- It must be possible to complete the plan in five years or less
Hey, I figured why not be ambitious!
This is what the system responded:
THOUGHTS: I think the first step is to research the current market for AI writing and identify the most lucrative niches. This will help us create a plan that is tailored to the software engineer's strengths and interests. I can use a Google search to find this information.
That seems fairly sensible. Notice how the response is organised into several sections. This is how Auto-GPT is able to process the response from GPT.
Next it asks my approval to execute a command. You can run Auto-GPT so that it doesn’t prompt for approval for every command, but I’m not that trusting!
NEXT ACTION: COMMAND = google ARGUMENTS = {'input': 'most lucrative niches for AI writing'}
Based on the results of its Google search, Auto-GPT comes up with the next stage of its plan:
THOUGHTS: Now that we have identified some of the most lucrative niches for AI writing, we need to determine which ones align with the software engineer's interests and expertise. We can use a GPT agent to help us with this task.
Now Auto-GPT wants to spawn a new agent. Agents are a way for Auto-GPT to pursue several threads of investigation or action at once. Each agent has its own context and so does not have to remember the global state and won’t be confused by information unrelated to its task.
NEXT ACTION: COMMAND = start_agent ARGUMENTS = {'name': 'AI Writing Niches', 'task': "Determine which AI writing niches align with the software engineer's interests and expertise", 'prompt': "Which AI writing niches align with the software engineer's interests and expertise?"}
This triggers another round of thinking:
THOUGHTS: The GPT agent has provided us with some initial niches that may align with the software engineer's interests and expertise. We can now use a Google search to research these niches further and determine their profitability and demand.
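As far as I can tell, the agent Auto-GPT just spawned is essentially a separate message history that gets its own calls to the chat API - the main loop only ever sees whatever the agent replies with. Here's a minimal sketch of the idea (the helper names are mine, not Auto-GPT's actual AgentManager):

```python
import openai  # assumes the older 0.x openai package that Auto-GPT used at the time

# Each agent is just a named, isolated conversation.
agents: dict[str, list[dict]] = {}

def start_agent(name: str, task: str, prompt: str, model: str = "gpt-3.5-turbo") -> str:
    """Create a fresh message history for the agent and send it its first prompt."""
    agents[name] = [{"role": "system", "content": f"You are an agent working on: {task}"}]
    return message_agent(name, prompt, model)

def message_agent(name: str, message: str, model: str = "gpt-3.5-turbo") -> str:
    """Send a message to an existing agent, recording the reply in its own history."""
    history = agents[name]
    history.append({"role": "user", "content": message})
    reply = openai.ChatCompletion.create(model=model, messages=history)
    content = reply.choices[0].message.content
    history.append({"role": "assistant", "content": content})
    return content
```

Because each agent only ever sees its own history, it isn't distracted by everything else the main loop has accumulated.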
This goes on for a couple more rounds until it decides on the following topic areas:
- Content creation for e-commerce websites
- Social media content creation and management
- SEO content creation and optimization
- Email marketing campaigns and newsletters
- Product descriptions and reviews
- Technical writing for software and technology companies
- Academic writing and research papers
- Creative writing for fiction and non-fiction books
- Copywriting for advertising and marketing agencies
- Legal writing and contract drafting
Unfortunately, these don’t seem to have a lot to do with writing about AI. It’s clear by this point that it’s unlikely to do what I want, but I was curious how far it would go, so I let it continue for a bit.
THOUGHTS: Based on the Google search results, it seems that technical writing for software and technology companies is a promising niche for the software engineer to write about. We should now create a plan for the software engineer to establish themselves as an expert in this niche and start earning income from their writing.
So that’s quite neat. It decided that a writer needed a portfolio and it was going to go ahead and generate one for me. It also decided that it was going to use GitHub to host the portfolio and attempted to create a GitHub account for me (which failed BTW).
Note that in my original goals I said nothing about GitHub or wanting a portfolio.
After watching it flail around trying to create a GitHub account for a while I stopped it.
At the time I tried the above I didn’t have access to GPT-4 and so had to use GPT-3.5. When I got access to GPT-4 I tried again. It didn’t succeed this time either, but it came up with a better list of topics:
- Machine Learning
- Natural Language Processing
- Computer Vision
- Robotics
- Expert Systems
- Speech Recognition
- Neural Networks
- Deep Learning
- Reinforcement Learning
- AI Ethics
So, are we all obsolete yet?
I tried another couple of scenarios with varying levels of success. In one instance I asked Auto-GPT to create a cross-compilation toolchain for me using crosstool-ng. It got as far as cloning the crosstool-ng repo and attempting to configure it.
I also asked it to create a mobile app for me using Flutter. In one attempt it generated python code to run a shell process to create a flutter project, but then failed to execute it correctly.
import subprocess
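Based on that description, the generated file was presumably something along these lines (my reconstruction for illustration, not the code Auto-GPT actually wrote):

```python
import subprocess

# Shell out to the Flutter CLI to scaffold a new project.
result = subprocess.run(
    ["flutter", "create", "my_app"],
    capture_output=True,
    text=True,
)
print(result.stdout)
print(result.stderr)
```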
It then seemed to get stuck in a loop attempting to execute the file, failing, but then trying again.
I restarted Auto-GPT and this time it attempted to execute the flutter command as a python file which also (unsurprisingly) failed. Once again it wasn’t able to recover from this.
I decided to try one more time, but this time attempt to guide Auto-GPT to get past the step it was failing at before. While doing this I realised that my original prompt stated that flutter was already installed (which it was). But Auto-GPT runs python code in a docker image and so couldn’t access the local installation. This time Auto-GPT browsed the flutter website to find out how to install flutter but then crashed as it exceeded the GPT context length.
Auto-GPT is currently hard-coded to run the python:3-alpine image. I found a docker image with the flutter SDK, but it didn’t contain python so at that point I decided to call it a day.
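For reference, running a generated script inside a container with the docker Python SDK looks roughly like this (a simplified sketch - the paths and file name are made up, and it's not Auto-GPT's exact code):

```python
import docker

client = docker.from_env()

# Mount the local workspace into the container and run the generated script there.
output = client.containers.run(
    image="python:3-alpine",  # the image Auto-GPT is hard-coded to use
    command="python /workspace/create_flutter_project.py",
    volumes={"/path/to/workspace": {"bind": "/workspace", "mode": "rw"}},
    remove=True,  # clean up the container afterwards
)
print(output.decode())
```

It also makes clear why anything the generated code needs - like the Flutter SDK - has to be inside the image rather than on the host.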
It’s possible I’m terrible at crafting goals for Auto-GPT. It does get tantalisingly close to doing useful stuff and I plan to continue experimenting with it, but I think my job is safe for this year.
How does it work?
We want to get from a single command and response to a chain of thoughts and actions that can be used to achieve a goal. Chain of thought prompting has been shown to get better results in some cases by asking the LLM to show intermediate steps to get to an answer. ReAct extends this by instructing the LLM to interleave reasoning and task-specific actions and teaching it how to request that those actions are performed. If the LLM chooses an action, the software takes that action and feeds the results back to the LLM.
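To make that concrete, a ReAct-style prompt interleaves the model's reasoning with named actions whose results the calling program fills in. A toy template (my own illustration, not Auto-GPT's actual prompt):

```python
# A toy ReAct-style prompt: the model emits Thought/Action lines, the program runs
# the Action and appends an Observation, and the cycle repeats until "finish".
REACT_TEMPLATE = """You can take the following actions:
  search[query]   - run a web search
  finish[answer]  - give the final answer

Use this format, one step at a time:
Thought: what you are thinking of doing next
Action: one of the actions above
Observation: the result of the action (filled in by the program)

Question: {question}
Thought:"""

print(REACT_TEMPLATE.format(question="What are the most lucrative niches for AI writing?"))
```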
Auto-GPT uses a prompt that instructs GPT to reply using a fixed format and that it may use one of a number of built-in commands.
Response contents
self.response_format = {
Code to generate the prompt with the JSON formatted response:
def generate_prompt_string(self) -> str:
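Pieced together from my reading of the Auto-GPT source (simplified, and the exact field names and wording may differ between versions), the idea is roughly this:

```python
import json

class PromptGenerator:
    def __init__(self):
        self.constraints: list[str] = []
        self.commands: list[str] = []
        self.resources: list[str] = []
        # The JSON skeleton GPT is told to reply with: a "thoughts" section that gets
        # shown to the user, and a "command" section that Auto-GPT knows how to execute.
        self.response_format = {
            "thoughts": {
                "text": "thought",
                "reasoning": "reasoning",
                "plan": "- short bulleted\n- list that conveys\n- long-term plan",
                "criticism": "constructive self-criticism",
                "speak": "thoughts summary to say to user",
            },
            "command": {"name": "command name", "args": {"arg name": "value"}},
        }

    def generate_prompt_string(self) -> str:
        """Assemble the system prompt, ending with the required JSON response format."""
        formatted_response_format = json.dumps(self.response_format, indent=4)
        return (
            f"Constraints:\n{self._numbered(self.constraints)}\n\n"
            f"Commands:\n{self._numbered(self.commands)}\n\n"
            f"Resources:\n{self._numbered(self.resources)}\n\n"
            "You should only respond in JSON format as described below\n"
            f"Response Format:\n{formatted_response_format}\n"
            "Ensure the response can be parsed by Python json.loads"
        )

    @staticmethod
    def _numbered(items: list[str]) -> str:
        return "\n".join(f"{i + 1}. {item}" for i, item in enumerate(items))
```

Because the model is told to reply in exactly this JSON shape, the rest of the program can reliably pull out the thoughts to show the user and the command to execute.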
These commands include:
- searching using google
- creating agents
- reading from and writing to files
- opening a web browser and extracting the contents of a web page
- sending a tweet via twitter
- generating an image using Dall-e
- cloning a GitHub repo
- running python code (in a docker container)
Auto-GPT works like this:
- sends the current state to ChatGPT API
- receives the response in a JSON format it knows how to parse
- executes the command (if any) in the response
- feeds back the result of the command the next time around the loop
It keeps looping like this until it decides the goals have been achieved, at which point it stops.
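Putting the pieces together, the heart of Auto-GPT is a loop along these lines (a heavily simplified sketch with made-up helper names and only two commands, using the 0.x openai package that was current at the time):

```python
import json
import openai

def _write_file(args: dict) -> str:
    with open(args["file"], "w") as f:
        f.write(args["text"])
    return f"Wrote {args['file']}"

# A tiny command table standing in for Auto-GPT's built-in commands.
COMMANDS = {
    "google": lambda args: f"(search results for {args['input']!r} would go here)",
    "write_to_file": _write_file,
}

def run(system_prompt: str, goal: str, model: str = "gpt-4") -> None:
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": goal},
    ]
    while True:
        # 1. Send the current conversation state to the chat API.
        reply = openai.ChatCompletion.create(model=model, messages=messages)
        content = reply.choices[0].message.content
        messages.append({"role": "assistant", "content": content})

        # 2. Parse the fixed-format JSON reply and pull out the requested command.
        response = json.loads(content)
        name = response["command"]["name"]
        args = response["command"]["args"]
        if name == "task_complete":  # the model signals that the goals are done
            break

        # 3. Execute the command and feed its result back on the next loop iteration.
        result = COMMANDS[name](args)
        messages.append({"role": "user", "content": f"Command {name} returned: {result}"})
```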
There is more to it than this. For example, Auto-GPT can integrate with vector databases like Pinecone and use them as a memory.
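The idea is to embed each piece of text, store the vectors, and pull back the most similar items when building the next prompt. A minimal in-memory sketch (the helper names are mine; in practice the list would be replaced by Pinecone or another vector database):

```python
import numpy as np
import openai

_memory: list[tuple[np.ndarray, str]] = []

def _embed(text: str) -> np.ndarray:
    response = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(response["data"][0]["embedding"])

def remember(text: str) -> None:
    """Store a piece of text along with its embedding."""
    _memory.append((_embed(text), text))

def recall(query: str, k: int = 5) -> list[str]:
    """Return the k stored texts most similar to the query (cosine similarity)."""
    q = _embed(query)
    scored = sorted(
        _memory,
        key=lambda item: float(np.dot(q, item[0]) / (np.linalg.norm(q) * np.linalg.norm(item[0]))),
        reverse=True,
    )
    return [text for _, text in scored[:k]]
```

Relevant memories retrieved this way can be prepended to the prompt, which is how the agent can "remember" more than fits in the context window.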
There can never be only one
Although Auto-GPT was the first project I came across, there are others.
Jarvis is particularly interesting as it can make use of other AI models from the Hugging Face library. Jarvis selects models to use for each task, executes the task using a model and then feeds the results back into the next round. In other words it expands on Auto-GPT by using other AI models, not just simple commands, to achieve a goal.
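As a rough illustration of that pattern - ask the planning LLM which kind of model a sub-task needs, then run a Hugging Face pipeline for that task and feed the output back - something like this toy sketch (mine, not Jarvis's actual code):

```python
from transformers import pipeline

# In Jarvis the planning LLM chooses the task/model; here the choice is hard-coded.
chosen_task = "summarization"  # e.g. what the planner might pick for "condense this article"
worker = pipeline(chosen_task)  # downloads a default model for that task

result = worker(
    "Auto-GPT chains calls to GPT-4 together so that it can plan, act and observe "
    "the results in a loop until it decides a goal has been reached."
)
print(result[0]["summary_text"])  # this output would be fed back into the next planning round
```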
Concerns
As with any piece of software there are risks. Here are a few of the more obvious ones.
You are giving something control over your local computer - this is true of all software to some degree. Instead of software that performs a known task, you have a large language model that could decide to run something damaging. The fixed set of built-in commands means that it shouldn’t be possible for Auto-GPT to delete all your files, for example. However, anything that can write to your local disk could potentially scribble all over something you care about.
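One mitigation - and roughly what Auto-GPT itself does, as far as I can tell - is to confine file commands to a dedicated workspace directory and refuse any path that escapes it:

```python
from pathlib import Path

WORKSPACE = Path("auto_gpt_workspace").resolve()  # illustrative directory name

def safe_path(relative_path: str) -> Path:
    """Resolve a path and refuse anything that escapes the workspace directory."""
    target = (WORKSPACE / relative_path).resolve()
    if not target.is_relative_to(WORKSPACE):
        raise ValueError(f"{relative_path!r} is outside the workspace")
    return target

def write_to_file(relative_path: str, text: str) -> str:
    target = safe_path(relative_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text)
    return f"Wrote {target}"
```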
It could rack up large bills - the OpenAI API is not free and Auto-GPT does appear to sometimes get stuck in loops. There is the potential for it to rack up large bills if you let it make large numbers of API calls. You can protect yourself by configuring a budget on the OpenAI platform and/or when you create a new Auto-GPT agent.
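Beyond the limits you can set in the OpenAI dashboard, a simple client-side guard is to estimate the cost of each call and stop once a budget is exceeded - a rough sketch (the per-token prices are placeholders; check the current pricing page):

```python
class BudgetExceeded(RuntimeError):
    pass

class CostTracker:
    """Accumulate an estimated spend and stop the run once it passes a limit."""

    # Illustrative per-1K-token prices; not current figures.
    PRICE_PER_1K = {"prompt": 0.03, "completion": 0.06}

    def __init__(self, budget_usd: float):
        self.budget_usd = budget_usd
        self.spent_usd = 0.0

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        self.spent_usd += (
            prompt_tokens / 1000 * self.PRICE_PER_1K["prompt"]
            + completion_tokens / 1000 * self.PRICE_PER_1K["completion"]
        )
        if self.spent_usd > self.budget_usd:
            raise BudgetExceeded(f"Spent ${self.spent_usd:.2f} of a ${self.budget_usd:.2f} budget")

# Usage: after each API reply, feed in the token counts from the response's usage field.
tracker = CostTracker(budget_usd=5.00)
tracker.record(prompt_tokens=1200, completion_tokens=400)
```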
Conclusions
Has Auto-GPT made developers obsolete? Not yet, but it is an interesting and logical development of GPT technology. I have not yet achieved an AI-powered life of leisure.
Poking at Auto-GPT also made me aware of vector databases and so has already helped to expand my knowledge.
There is a lot of interesting work happening in this space. Be sceptical of the hype, but keep an open mind.