Own your LLM's Output

February 06, 2026

At one of my previous jobs, the team had a mentality that we owned our vendors' mistakes as if they were our own. After all, we chose them; we're responsible if the company doing our network routing goes offline. Publicly, of course, you say "an upstream provider's outage is currently affecting our app's uptime, we're working hard to restore operations," but behind the scenes you should be having conversations about how to mitigate the impact when that provider goes down again. I've been thinking about this a lot as the AI age of coding comes onto the scene more and more.

One of the things I've taken to heart while coding with LLMs is this mentality of owning the output. Review and understand what it produces, take responsibility for the failures, and learn from them. I've seen some in the tech scene on Twitter mention they rarely review all of the AI's output, and that feels a bit too risky for my tastes on a work-level project. You have responsibilities to the company that signs your paycheck, to its customers, and to your teammates, especially concerning the work you allow AI to handle for you.

An LLM deleted someone's production database

Note: I know that in this example they were using a more vibe-coding-style platform that didn't have the right safety rails in place.

How does this even happen? I've seen the models try to do some wack stuff on non-Ruby projects, like deleting build folders, but in general I'm not running them with permissions that allow this to happen. The flag --dangerously-skip-permissions is, well, named that way for a reason. Own the fact that you ran it with that flag, learn from it, and put hooks in place to prevent the wild things from happening again.
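As one sketch of what such a hook could look like: my understanding is that a Claude Code PreToolUse hook receives JSON describing the pending tool call on stdin, and exiting with code 2 blocks the call while surfacing stderr to the model. The pattern list and script below are purely illustrative; check the hooks documentation for the exact payload shape before relying on this.

```python
import json
import re
import sys

# Illustrative deny-list; tune these patterns to your own stack.
DANGEROUS = [
    r"\brm\s+-[a-z]*r[a-z]*f",       # rm -rf and friends
    r"git\s+push\s+.*--force",        # force pushes rewriting history
    r"\bdrop\s+(table|database)\b",   # destructive SQL
]

def is_dangerous(command: str) -> bool:
    """Return True if the shell command matches any blocked pattern."""
    return any(re.search(p, command, re.IGNORECASE) for p in DANGEROUS)

def main() -> None:
    try:
        # Hook payload (pending tool call) arrives as JSON on stdin.
        event = json.load(sys.stdin)
    except (json.JSONDecodeError, ValueError):
        return  # no payload; nothing to inspect
    command = event.get("tool_input", {}).get("command", "")
    if is_dangerous(command):
        # Exit code 2 blocks the call; stderr is shown to the model.
        print(f"Blocked dangerous command: {command}", file=sys.stderr)
        sys.exit(2)

if __name__ == "__main__":
    main()
```

You'd wire a script like this into your hooks config so it runs before every Bash tool call; even if the exact protocol differs from my sketch, the idea of a deny-list checked before execution carries over.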

Again, perhaps I'm missing some critical workflow change that makes that flag necessary, but in my year or so of adopting agentic flows, I've always wanted my hands to be able to step in and course-correct. People on Reddit claim it's because it's annoying when Claude asks permission to do something. I get that, but change your .claude/settings.json file to allow those commands, then slowly chip away as Claude asks for more and add those to the file too. After a few sessions you won't need to worry about the commands you know are safe (e.g. find, wc). Better yet, contribute those allow-listed commands to your teammates so they don't get annoyed either.
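A sketch of what that allow-list might look like, based on my understanding of Claude Code's permission rule syntax; the specific commands here are just examples for a Ruby-ish project, not a recommendation:

```json
{
  "permissions": {
    "allow": [
      "Bash(find:*)",
      "Bash(wc:*)",
      "Bash(bundle exec rspec:*)"
    ],
    "deny": [
      "Bash(git push --force:*)"
    ]
  }
}
```

Checking a file like this into the repo is how you share it with teammates: everyone's sessions stop prompting for the commands the team has already vetted.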

It feels a bit irresponsible to me to let LLMs go wild on their own, without oversight, on a machine that might have access to sensitive data. Put it in a sandbox VM and give it access to a git repo? OK, fine, I see the use case there, but make sure it can't read or write the production database, and don't let it rewrite git history (e.g. use GitHub's force push protection).

I've made a lot of dumb mistakes over my career, watched others make a few too, and learned from them. LLMs don't know those long-term, career-level mistakes yet, or they choose to ignore them; they're eager to prove you should spend money on their tokens, and goodness do they ever want your money.

I've accidentally wiped my local development environment at least twice (😉 that's all I'll admit to!). Once you get tired of setting it up just the way you like it over and over, you start to prep for the worst-case scenario more often: dump the DB somewhere you can restore it from, etc.
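One low-effort version of that prep, sketched as a crontab fragment; it assumes a local Postgres dev database, and the database name and paths are placeholders for whatever your setup uses:

```shell
# Nightly dump of the local dev database, so a wiped environment
# can be rebuilt instead of reconfigured from scratch.
0 2 * * * pg_dump --format=custom devdb > "$HOME/backups/devdb-$(date +\%F).dump"
# Restore later with: pg_restore --clean --dbname=devdb <dump file>
```

The custom format keeps the dump compressed and lets pg_restore recreate objects selectively, which beats re-seeding by hand after the next accident.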

Own your Claude, Codex, Gemini, or other system's output

Review it, learn from it, put guardrails in place. Perhaps processes need to be adjusted toward something like an incident post-mortem review flow: "We let the LLM solve this bug; neither automated reviews, CI checks, nor tests caught it, and as a result it caused 3 new bugs and degraded performance by 10%."

What could you learn from this fake scenario in my head? Well, humans are human: the reviewer saw a perhaps-increased test coverage number, saw the code matched the style guide, saw the checks were green, and merged it because it was one less thing for them to code. "Good enough, looks good to me," this fake person said in their head (it's me; I've been guilty of saying that before when approving or merging a PR without a proper review).

But you know what happens when it bites me after doing so? I make sure I don't make that mistake twice: I kick the tires on the next PR in that area a little harder, and I learn and grow from it.

Processes need to be flexible

The old processes many of us have used for years and years are slowly going away. LLMs move at the speed of the network, and it's cheaper now than ever to get code generated that might solve the problem at hand.

Learnings should certainly be shared among the team: bring those who are a few "updates" behind along with you as you figure out the optimal workflows, and pass on what you learned after finding the sharp edges for them.

But I suspect product teams are struggling, or will struggle, to determine which problems are the most important to solve. What do you burn the human-tokens on?

I'm looking for my next opportunity! Learn more about me or get in touch

Enjoyed this post? Follow along on Bluesky or GitHub.