So I’ve been trying GenAI out off and on, along with the rest of the world, I guess. This weekend, I thought I’d try creating a whole app from scratch and see how far I got. Spoiler: ChatGPT 3.5 did better than I expected, but I still didn’t get very far. Read on for the gory details.
So I’d built this “app” at work, which is essentially a shared spreadsheet. It tracks manual reporting that managers in my team have to do weekly. Even though I built it, I don’t like it because it has reached that dangerous point in the life of every spreadsheet-based app where the formulas are complex enough to break with simple edits, or even with regular use of the spreadsheet. And there’s only so much you can do with hidden columns and greyed-out cells. The real solution, of course, is that there shouldn’t be a tracker at all, but that’s a whole different discussion.
So anyway, I’d been thinking of rewriting the app by hand, and then thought: what if I use this GenAI stuff that everyone’s been demo-ing as being able to build whole applications from textual descriptions? So I fired up ChatGPT yesterday and spent 3 hours trying to do just that.
First, I asked ChatGPT if it can describe a logical architecture given requirements as use cases.
R1. As an employee I want to track any violations on resources I own, as visible from various websites, in a single central place, so that I can be compliant
In summary, the response was a good list of components that would be needed in an app to support this requirement: UI, Auth, Data ingestion and storage, Notifications, violation tracking, security etc.
Next I added another requirement:
R2. As a manager I want to see an aggregated dashboard of all employees reporting to me, including roll up of managers reporting to me, so that I can check compliance at my level
…and it added a component to manage the hierarchy. So far so good. So I asked it to suggest JavaScript frameworks for each component, to which it gave a pretty OK set of recommendations: React, Express, Mongo, etc.
Next, I said I’d like to start building the UI in React, and asked for directions to set the project up. The response had a pretty detailed set of instructions to install React and the required libraries, and even bash code to create the right directory structure. At this point, however, I realized that there was an issue with my local Node setup (unrelated to this session). So I asked for instructions to set up the project using Docker.
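To make the roll-up in R2 concrete, here is a minimal sketch of what the hierarchy component has to compute. It is an illustration only, not the code ChatGPT generated: the `Employee` shape, the `managerId` field and the `rollUp` function are all assumed names, and the sketch assumes the reporting chain has no cycles.

```typescript
// Hypothetical data shape: each employee has an optional manager and a count
// of open violations on the resources they own.
interface Employee {
  id: string;
  managerId: string | null; // null at the top of the hierarchy
  openViolations: number;
}

// Returns, for every employee, their own violations plus those of everyone
// who reports to them directly or indirectly.
// Assumption: the reporting chain contains no cycles.
function rollUp(employees: Employee[]): Map<string, number> {
  // Index direct reports by manager id.
  const reports = new Map<string, Employee[]>();
  for (const e of employees) {
    if (e.managerId !== null) {
      const list = reports.get(e.managerId) ?? [];
      list.push(e);
      reports.set(e.managerId, list);
    }
  }

  // Depth-first sum, cached so each employee is computed once.
  const totals = new Map<string, number>();
  const visit = (e: Employee): number => {
    const cached = totals.get(e.id);
    if (cached !== undefined) return cached;
    let total = e.openViolations;
    for (const r of reports.get(e.id) ?? []) total += visit(r);
    totals.set(e.id, total);
    return total;
  };
  employees.forEach(visit);
  return totals;
}

// Made-up sample data: a manager of managers, a manager, and two reports.
const team: Employee[] = [
  { id: "vp", managerId: null, openViolations: 0 },
  { id: "mgr", managerId: "vp", openViolations: 1 },
  { id: "dev1", managerId: "mgr", openViolations: 2 },
  { id: "dev2", managerId: "mgr", openViolations: 0 },
];
const totals = rollUp(team);
console.log(totals.get("vp"), totals.get("mgr")); // 3 3
```

The "roll up of managers reporting to me" part is the recursion: a manager's total includes the totals of any managers below them, which is why a flat group-by on the direct manager would not be enough.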
ChatGPT gave me a pretty usable Dockerfile and commands to build the container and run it, but forgot to provide the package.json referenced in the file. One reminder later, that file was there too, and I went ahead with building the container. Since it had suggested Alpine Linux 14 (strictly speaking, that number was probably the Node.js version in the image tag, since Alpine’s own releases are numbered 3.x), and I saw a ton of warnings about deprecated npm packages, I asked ChatGPT if there was a newer version. It said that to the best of its knowledge, as of Jan 2022, 16 was the latest and that I should use that. I did a Google search, found that 21 was the latest, and tried that, which failed for reasons I don’t remember now, so I switched to the suggested version 16. However, I continued to have npm failures because the generated code referenced files that didn’t exist: index.js, index.html and index.css. ChatGPT had to be reminded about each of these, upon which it produced them promptly.
The next hour or so was spent fixing issues related to Docker, npm, and React. Log files couldn’t be seen easily since the npm build would fail, the generated code used sub-packages and APIs that had changed across versions, and ChatGPT would forget that all the action happens inside a container and instead give instructions that would work outside one.
Finally, I got the build to work, and spent another 20 minutes trying to get the container set up so that I didn’t have to rebuild every time I changed the source, which is what the AI had suggested. I did this by myself, btw, since its advice on setting up volumes was plain wrong: it repeatedly told me that the npm build process would move files from /src to /build (TRUE), but that I could then continue to edit files in the host src folder and they’d be magically reflected in the container’s /build folder (FALSE).
At long last, I had a basic home page working. The rest of the generated code (it had correctly built one route each for employee and manager) showed up as blank pages.
I then directed it to display a list of violations in the employee page and to build the UI to update them. To its credit, it did produce a UI that had a form for submission of the violation, and a table below the form that showed the submitted violations. I even got it to transpose the display to show violations as the rows and weeks as the columns. But by this time I was exhausted.
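The transposed view is essentially a pivot of the submitted records. A rough sketch of that step follows, independent of React; it is not ChatGPT's output. The `Submission` shape, the `pivotByWeek` name, the week label format and the sample violation names are all made up for illustration.

```typescript
// Hypothetical record: one submission from the form.
interface Submission {
  violation: string; // name of the violation (sample values below are invented)
  week: string;      // assumed label format that sorts as text, e.g. "2024-W01"
  count: number;
}

interface PivotTable {
  weeks: string[];             // column headers, sorted
  rows: Map<string, number[]>; // violation -> one count per week column
}

// Turns a flat list of submissions into violations-as-rows, weeks-as-columns.
function pivotByWeek(submissions: Submission[]): PivotTable {
  // Distinct weeks, sorted, become the columns.
  const weeks = Array.from(new Set(submissions.map((s) => s.week))).sort();
  const column = new Map<string, number>();
  weeks.forEach((w, i) => column.set(w, i));

  const rows = new Map<string, number[]>();
  for (const s of submissions) {
    let row = rows.get(s.violation);
    if (row === undefined) {
      // Weeks with no submission for this violation stay at 0.
      row = new Array<number>(weeks.length).fill(0);
      rows.set(s.violation, row);
    }
    row[column.get(s.week)!] += s.count;
  }
  return { weeks, rows };
}

const table = pivotByWeek([
  { violation: "expired-cert", week: "2024-W02", count: 1 },
  { violation: "open-port", week: "2024-W01", count: 2 },
  { violation: "expired-cert", week: "2024-W01", count: 3 },
]);
console.log(JSON.stringify(table.weeks));                    // ["2024-W01","2024-W02"]
console.log(JSON.stringify(table.rows.get("expired-cert"))); // [3,1]
console.log(JSON.stringify(table.rows.get("open-port")));    // [2,0]
```

A table or grid component can then render `weeks` as the header row and each entry of `rows` as one line.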
My final ask of ChatGPT was to recommend a React grid component, which it did, and it added one to the UI as well, but getting that to work was beyond my patience with this particular session, which had lasted well into the night.
In summary: It is indeed amazing that an automaton can get to the point where I had a working UI at all. So there’s that. It’s also great that it built out the domain model to the extent that it did. It actually at one point told me to step back and build a backend API first and then to come back to the UI. However, as confirmed by many others before me, it’s still early days for the following reasons:
- Building real-world applications is inherently messy and iterative. While an LLM can keep up with the changes in direction, its own misdirections get to be taxing. As a human, I shouldn’t have to keep tabs on the AI and catch its misses.
- Current information is important. And while Retrieval-Augmented Generation (RAG) could be applied, I wonder if it can hold specific knowledge in multiple domains such as React, Docker, UI development, API design/Express, etc.
- I’m also a bit skeptical about agent-based solutions like LangChain working in this kind of problem space. The process is inherently messy and there are a lot of blind alleys all the time, including ones that you run into while already in another one! Unless the chain can be dynamically changed based on the last outcome, agent-based solutions might be too rigid.
- Finally, and I have no data to back this up, but I sense that the problem is at the core of how LLMs work. “If you’ve done N steps so far, what’s the next step in building the app?” is indeed the general formulation of the problem, and seems to fit the LLM’s credo, but the answer needs to be goal-oriented, not based on yesterday’s successes.
That all said, we’re certainly living in interesting times!