Hackathons and Model Capabilities: What Fast Experiments Reveal
Hackathons and collaborative prompt exploration reveal a model's wide range of capabilities—from diagrams and spreadsheets to SVGs, STL files, 3D scenes, and mini apps—demonstrating practical ways to surface and showcase AI skills.
One of the ways we discovered new capabilities in models was to put them into a Slack workspace and invite people to play with them. This was really fun—especially in the early days with the first DALL·E model—because you could quickly see what the capabilities actually were. You’d watch some very imaginative people come up with prompts for everything from artwork styles to complex objects. And no matter how creative I think I can be, it’s no comparison to getting a group of really creative, imaginative people all pushing in different directions at the same time.
Another way we explored capabilities was through hackathons, and this was probably one of my favorite methods. Partly it was just fun to see what we could all come up with in a short period of time. Also, coming from a showbiz background, I loved any opportunity to go present things. And in those hackathons, exploring model capabilities was sometimes the first time we ever found out whether a model could do something unique.
I think I was the first one to discover that one of the GPT-3 models could use Mermaid—a JavaScript library that renders flowcharts and other diagrams from plain-text descriptions—and that was a pretty fun experience. What was especially nice was that I could ask the model to generate a flowchart describing how an attention mechanism worked (or some other complex topic). It was one of the first times we saw these models generating something visually complex rather than just linear text.
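Because Mermaid source is just text, a model only has to emit a few lines to describe a whole diagram. A minimal sketch of the kind of flowchart source this produces—node labels here are illustrative, not a transcript of the original output:

```python
# Build Mermaid "graph TD" source for a simplified attention-mechanism
# flowchart. A Mermaid renderer turns this text into the actual diagram;
# the node labels are illustrative.

def attention_flowchart() -> str:
    """Return Mermaid flowchart source as a plain string."""
    lines = [
        "graph TD",
        "    A[Input embeddings] --> B[Query / Key / Value projections]",
        "    B --> C[Scaled dot-product scores]",
        "    C --> D[Softmax weights]",
        "    D --> E[Weighted sum of values]",
        "    E --> F[Output representation]",
    ]
    return "\n".join(lines)

print(attention_flowchart())
```

Pasting that output into any Mermaid-aware renderer (many markdown viewers support it) draws the boxes and arrows—which is exactly why it was such a natural target for a text-only model.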
For GPT-4, I decided to have it generate a bunch of ideas for apps. I put it in a loop where GPT-4 wrote the code and continuously tried to create whatever it could. I ended up with something like a hundred different silly little applications—everything from internet radio to small image-editing tools. None of them was going to set the world on fire, but it was really fun to have GPT-4 generate a huge number of apps and then show them off at the hackathon.
Some other discoveries that came out of hackathon exploration were around structured outputs. One was spreadsheet generation. The models knew how to work with comma-separated values (CSV), so you could get them to emit text that a spreadsheet app could open as a usable spreadsheet, which was pretty cool. It was another sign the models were going to be useful for things besides just linear text—similar to diagrams. The diagrams had shown the model could understand spatial relationships: that things could be above, below, beside, and around each other.
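The nice property of CSV output is that a model's reply can be treated like a real file and loaded with standard tooling. A minimal sketch, with an illustrative reply hard-coded in place of an actual model call:

```python
import csv
import io

# A model's comma-separated reply, stood in here as a literal string.
# The column names and values are illustrative, not the original prompts.
model_reply = (
    "name,quantity,price\n"
    "widget,4,2.50\n"
    "gadget,2,9.99\n"
)

# Parse the reply exactly as if it were a .csv file on disk.
rows = list(csv.DictReader(io.StringIO(model_reply)))
print(rows[0]["name"], rows[1]["price"])  # → widget 9.99
```

Saving the same string with a `.csv` extension opens directly in Excel or Google Sheets, which is what made this feel like "spreadsheet generation" rather than just more text.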
Any time I see somebody do a Pelican test where they have the model generate an SVG, I think back to those first hackathons, because SVGs were some of the first things we tried. One of the things I thought was especially cool was getting the model to generate .stl files—the triangle-mesh format used for 3D printing. They started out very, very simple, but I took the first .stl file I ever got a GPT-3 model to generate, sent it to my 3D printer, and printed it. It was just a block, but it was still pretty cool to have something that came from a text model turn into a 3D object in physical space.
And speaking of 3D, one of my favorite libraries was A-Frame, a web framework built on top of three.js for creating virtual reality experiences. It only took a few lines of code to import the library and place objects, and it made for a really good demo: having a model generate 3D scenes. At first they weren’t terribly sophisticated, but with GPT-3.5—and the model’s ability to understand arrays of arrays—it could start doing things like creating mazes and building environments like cities.
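The reason A-Frame worked so well as a target is that a whole 3D scene is just a short HTML page, so the model only has to emit tags. A minimal sketch that assembles such a page—the box positions and colors are illustrative, and the script URL pins an arbitrary A-Frame release:

```python
# Assemble a minimal A-Frame scene as an HTML string. A-Frame scenes are
# declarative HTML: each <a-box> is one object placed in 3D space.

def box(x, y, z, color) -> str:
    return f'    <a-box position="{x} {y} {z}" color="{color}"></a-box>'

def scene(boxes) -> str:
    entities = "\n".join(box(*b) for b in boxes)
    return (
        "<html><head>\n"
        '  <script src="https://aframe.io/releases/1.5.0/aframe.min.js"></script>\n'
        "</head><body>\n"
        "  <a-scene>\n"
        + entities + "\n"
        '    <a-sky color="#ECECEC"></a-sky>\n'
        "  </a-scene>\n"
        "</body></html>"
    )

html = scene([(0, 0.5, -3, "red"), (1, 0.5, -3, "blue")])
print(html)
```

Opening that output in a browser renders a walkable VR scene—and extending `boxes` from a hand-written list to a model-generated array of arrays is exactly the step that turned simple demos into mazes and cities.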
Another hackathon demo I did was having the model play tic-tac-toe. The rules are simple, but the challenge—back when these were purely text models—was getting it to plan ahead and make moves correctly. Part of it was like the Wordle problem I’ve described elsewhere: you had to figure out how to get the model to understand, represent, and display the locations of objects. We ended up finding that special formatting mattered a lot—making sure there were no empty spaces, and using dashes and pipes to structure the board so the model could “see” it reliably. It took a lot of experimentation with different encodings for a silly game that more capable models can handle pretty easily now. But that was part of the fun: trying to get these systems to do things that felt hard for them, even though the task itself was pretty easy, and getting a smaller system to finally do it was incredibly satisfying.
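The board-encoding trick can be sketched in a few lines: every square gets an explicit character (never a blank space), and dashes and pipes frame the grid. The exact formatting we settled on varied, so this is one illustrative encoding, not the original:

```python
# Render a tic-tac-toe board with no blank cells: empty squares are "."
# so every position is an explicit token, and pipes/dashes frame the grid.

def render(board) -> str:
    """board is a list of 3 rows of 'X', 'O', or '.' (never ' ')."""
    rows = ["|".join(row) for row in board]
    return "\n-+-+-\n".join(rows)

board = [
    ["X", ".", "O"],
    [".", "X", "."],
    ["O", ".", "X"],
]
print(render(board))
# → X|.|O
#   -+-+-
#   .|X|.
#   -+-+-
#   O|.|X
```

The point of the explicit filler character is that a run of spaces is easy for a text model to miscount, while `.` makes each empty square its own unambiguous position.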