o1 coding capabilities
I review the performance of OpenAI's new model and demonstrate how it helped me develop a web application with hardly any code changes on my part
OpenAI just announced their new model, o1.
It is an advancement in large language models (LLMs), designed specifically to enhance reasoning capabilities. It addresses the challenge of enabling AI systems to perform complex reasoning tasks more effectively.
Advanced Training Techniques: o1 uses new training methodologies that focus on reasoning.
Chain-of-Thought Reasoning: The model generates intermediate reasoning steps before arriving at a final answer.
Reinforcement Learning from Human Feedback (RLHF): The model is fine-tuned using feedback from human evaluators.
Dynamic Memory Mechanism: o1 incorporates a mechanism to remember and reference previous information within a conversation.
OpenAI have stated that it performs similarly to PhD students:
Comparison with GPT-4:
It is clearly better at tasks that require reasoning, but for tasks that need less reasoning, such as writing, GPT-4 performs similarly and responds more quickly.
I wanted to test this out on a platform I plan to use for a wider project, so I asked it to produce a solution that required multiple technologies communicating with each other: JavaScript for the front end, Python for the backend, and a library to display a Neo4j graph on a web page. This would have taken me a while to do myself... so let's see.
Since my web application will be quite specific, I wanted to be clear with my prompt. I asked GPT-4 to generate a prompt template:
I amended this and passed it to o1. It took about 15 seconds to come up with comprehensive documentation:
It was able to produce the code, project structure, and a Dockerised environment for this application. I liked how it connected the front end (React) and backend (Python / Flask). It selected react-force-graph-2d as the JS library to display the returned data. It even included rate limiting and input sanitisation. The only changes I had to make were to the Python library versions in requirements.txt and to the .env variables with my Neo4j credentials.
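To give a sense of the shape of the generated backend, here is a minimal sketch of a Flask endpoint that queries Neo4j and returns data in the `{nodes, links}` format that react-force-graph-2d expects. This is my own reconstruction under those assumptions, not o1's exact code; the route name, Cypher query, and environment variable names are illustrative.

```python
# Minimal sketch (not o1's exact output) of a Flask endpoint serving Neo4j
# data to the React front end. Assumes NEO4J_URI / NEO4J_USER / NEO4J_PASSWORD
# are provided via the .env file and loaded into the environment.
import os
from flask import Flask, jsonify
from neo4j import GraphDatabase

app = Flask(__name__)
driver = GraphDatabase.driver(
    os.environ["NEO4J_URI"],
    auth=(os.environ["NEO4J_USER"], os.environ["NEO4J_PASSWORD"]),
)

@app.route("/api/graph")
def get_graph():
    # Return nodes and relationships in the {nodes, links} shape
    # that react-force-graph-2d consumes on the React side.
    query = "MATCH (a)-[r]->(b) RETURN a, r, b LIMIT 100"  # illustrative query
    nodes, links, seen = [], [], set()
    with driver.session() as session:
        for record in session.run(query):
            for node in (record["a"], record["b"]):
                if node.element_id not in seen:
                    seen.add(node.element_id)
                    nodes.append({"id": node.element_id, "labels": list(node.labels)})
            links.append({
                "source": record["a"].element_id,
                "target": record["b"].element_id,
                "type": record["r"].type,
            })
    return jsonify({"nodes": nodes, "links": links})

if __name__ == "__main__":
    app.run(debug=True)
```

The JSON returned here can be passed straight to the ForceGraph2D component's graphData prop on the React side.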
I amended the graph parameters a bit to make it more readable.
We can see how it's "thought" about the response:
I wanted to try and create a problem where o1 would have to "think" a little more.
I wanted o1 to create a script to solve it and to give me the answer.
Here is the response:
I would have expected the Python script to be good, as it just implements a path-finding algorithm that should work for any input, but the path it proposed is wrong. It's unclear how it arrived at this answer. I tried the same problem in GPT-4o, which returned a different, but also wrong, result.
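For context on why a generic script should have worked: a breadth-first search path-finder is only a few lines and will find a shortest path for any state space you feed it. The sketch below is illustrative; the actual puzzle isn't reproduced here, so the `neighbours` function is a placeholder for the puzzle's moves.

```python
# Generic breadth-first search path-finder, as a sketch of what such a
# solver needs. `neighbours` is a placeholder: it should return the states
# reachable in one move from the given state.
from collections import deque

def shortest_path(start, goal, neighbours):
    """Return the shortest list of states from start to goal, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        state = path[-1]
        if state == goal:
            return path
        for nxt in neighbours(state):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None
```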