Testing GTP3.5 vs. GPT4: Which Model Writes Better Code?

In this article, we compare outputs from GPT-3.5_turbo and GPT-4, and explore how to use GPT-4 as a code assistant, using a simple CLI termGPT to access the models.

Thomas Capelle

Created on March 16|Last edited on June 30

Comment

This article will compare model outputs from gpt3.5_turbo and gpt4 using the OpenAI API. We will refer to gpt3.5_turbo as GPT-3 from now on. I'm using my own simple CLI termGPT to access the models.
termGPT: Interacting with openAI's chatGPT on your terminal
Let's build a minimal app to interact with chatGPT without leaving the terminal
OpenAI Announces GPT-4
Today, OpenAI has officially announced GPT-4, the long-awaited fourth generation of GPT natural language processing models.
﻿
Actually. That introduction could be better. Let's ask GPT3.5 and GPT4 to write it instead. Here's our prompt:
Create a fun intro to a blog post comparing GPT4 with GPT3 on code generation tasks
💡
Introduction by GPT-4 🎉🚀 Welcome, Code Enthusiasts, to another thrilling adventure in the realm of cutting-edge AI technologies! Today, we dive into an epic showdown between two colossal contenders: GPT-3 versus GPT-4! Pull up a chair, grab your favorite beverage, and prepare for an electrifying exploration into code generation prowess. Will GPT-3 maintain its crown, or will GPT-4 rise to the occasion and claim the throne? Join us as we put their programming prowess to the ultimate test and discover who shall emerge as the true Generator of Generators! 💻🤖✨
Introduction by GPT-3.5Welcome to our latest blog post! Get ready to discover the latest buzz in the AI world as we compare two of the most powerful language models - GPT-3 and GPT-4 - on their code generation skills. Brace yourself for an exciting, head-to-head competition that will amaze you with the astonishing capabilities of these two state-of-the-art models. Sit back, relax, and let the comparison begin!
Both are good, but... I like the emojis, so we'll give the win to GPT-4 here🥇
A Day With an MLE Using ChatGPT for Code CompletionToday, I am using GPT-3 and 4 to help me to write code. Some tasks are mundane and boring; others are more complex and require a deep understanding of the language. We'll compare outputs side by side: 
﻿
﻿
GPT Knows Weights & Biases﻿
﻿
What Happens With Brand New Libraries Like PyTorch 2.0? GPT was trained on data up to 2021, so it doesn't know about PyTorch 2.0, which was actually released on the same day as GPT-4! We are only going to give the model the release page (it has a lot of information), but it will need to rely on the knowledge of PyTorch to get the actual code. 
Let's dump the launch (Ctrl+A, Ctrl+C, Ctrl+V) page from PyTorch 2.0 and see if it can integrate some of it:
﻿
﻿
You can also use it to create a new shiny training loop!
﻿
﻿
﻿
ConclusionsGPT-4 is an excellent step up compared to GPT-3.5 (old chatGPT) in code generation. It immediately produces better code, with better explanations around the decisions and a higher chance of correctness. I hope this model gets added to Copilot soon, as it's a good pair programmer 🤣.
I noticed that queries to GPT-4 are slower, sometimes taking almost a minute. For instance, the PyTorch 2.0 query was very slow, but maybe the servers were overwhelmed this week. I plan to benchmark the inference speed/latency in a follow report.
﻿

Add a comment

Tags: Articles, GPT, Experiment, GenAI, NLP, Intermediate

Iterate on AI agents and models faster. Try Weights & Biases today.