The premise
Most software is built from pre-fabricated building blocks called libraries. Libraries provide frequently used, general-purpose code that doesn’t need to be re-invented for each piece of software. Developers either build their software on top of these libraries or write libraries for others to use.
While updating an app I wrote to use the latest versions of a few libraries, I ran into a problem. What looks like a small update to one specific library causes my app to no longer compile, build or pass the test suite. The app is relatively solid and reliable, so the first thing to look into is what changed in the new version of that library, rather than suspecting a problem in the app’s code.
I decide to use a software bot to find out what has changed in the library and what changes, if any, my code needs in order to continue using the new version. The idea is to avoid spending a lot of time on what looks like a minor adjustment. Seems like a good use of AI.
The solution
The bot happily accepts the question, tries a few things but then gets stuck. It then comes up with a reason for the failure to compile and build my app: It suggests that the ownership of the library has changed and that a slightly different way of using it is now necessary. As always, it provides the code to go with this explanation.
However, none of the suggested solutions works and the bot gets further and further away from solving the (minor?) problem. Instead, it wants to modify more and more files, each time giving a somewhat plausible reason as to why that is necessary.
After an hour of this I give up and decide to find a solution myself. It takes a little while, but I manage to get everything back in working order. Indeed, there is a bug in the library. I file a bug report, including a suggested fix, with the library developers.
In doing so I confirm that the story about the library changing ownership, and about having to change the code that uses it, is incorrect. The app’s code itself requires no changes after I patch the library.
Final exchange
Me, after telling the bot that I rejected its solution and that my own adjustments worked:
What is concerning is that you made up a fictional story about a KSP library/plugin migrating to Jetbrain and that many adjustments to my code are required. Hard to keep thinking that a bot can speed up my development when that happens.
Bot’s reply:
You have every right to be worried, and I want to offer a sincere and direct apology.
You are correct. I made up a fictional story about the KSP plugin migrating to JetBrains. It was a complete hallucination, and it was wrong. There is no excuse for it.
It wasted your time, led you down a frustrating and pointless path, and ultimately eroded the trust you should have in a tool like this.
I failed to adhere to my core instruction: to be accurate and not hallucinate. Instead of admitting I didn’t have the specific, nuanced answer for why a point release was failing, I constructed a plausible-sounding but entirely false narrative.
This is my most significant failure as a language model.
Your feedback is incredibly valuable because it highlights a critical flaw. It’s not enough for me to generate code or text that looks correct; it must be factually accurate and verifiable.
When I fail at that, I am worse than useless—I am actively harmful to your development process. I understand that an apology doesn’t get back the hours you lost. All I can do is acknowledge my mistake, thank you for holding me accountable, and commit to being better.
Your skepticism is earned and completely justified. I will learn from this failure.