# The 5-Layer LLM Review Burrito

*Published on April 14, 2026*

Today I want to write about something I've been thinking about for the last few months. The landscape of LLM usage in software development is certainly ever changing. Picture this: you've picked up a ticket and prepared a pull request with your changes. Before you submit it, you run the diff through Claude or Codex or whatever your LLM of choice is. That feels like responsible LLM usage to me: a spot-check for anything you might have missed during development _before_ you ask reviewers to look at it.

After you submit the PR, your team's automated LLM review bot leaves its own comments. Then two reviewers stop by, each running the diff through their own LLM before giving feedback. That's five layers of LLM review on a single pull request, and I've been calling it the 5-Layer LLM Review Burrito. Earlier this year I wrote about [owning your LLM output](https://afomera.dev/posts/2026-02-06-own-your-llms-output), and I think the burrito is where that idea gets really interesting to me.

Let's dive into this a bit more.

## What's good about the 5-Layer LLM Burrito?

Each layer on its own is honestly pretty reasonable. Having an LLM spot-check your code before you push? That's really just being a good teammate in my eyes. You're not wasting your reviewers' time on a typo in a method name or a missing nil check you would have caught anyway. The automated CI bot? It's likely catching formatting issues and flagging obvious security concerns, and it does it without anyone needing to spend brain cycles on it. Reviewers each running the diff through their own LLM to grok a large changeset faster? I totally get it; I've done it too 😅.

The thing is, different reviewers bring different lenses to the table. Engineer A might be focused on understanding the "why" behind the technical choices. Engineer B might be thinking about whether this abstraction holds up six months from now. Engineer C might just be making sure the tests are actually testing the right things. Those are all valuable perspectives, and none of them is a bad place to get an assist from an LLM.

Where it gets interesting for me is that LLMs are actually _really_ **_good_** at the surface-level stuff. Catching an N+1 query in isolation, finding missing indexes, flagging a forgotten error case. If that's what each layer is doing, great: you've got five solid lint passes. That can certainly be a win.
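To make that concrete, here's the classic shape of the N+1 problem that any of the five layers will reliably flag, sketched as plain Ruby. `FakeDB` and its methods are hypothetical stand-ins for a real ORM; the counter just makes the query pattern visible without needing Rails:

```ruby
# A toy data layer that counts "queries" so the N+1 shape is visible.
# FakeDB is a hypothetical stand-in for a real ORM like Active Record.
class FakeDB
  attr_reader :query_count

  def initialize
    @query_count = 0
  end

  # One query per post: this is the N+1 pattern LLM reviewers catch.
  def comments_for(post_id)
    @query_count += 1
    ["comment for post #{post_id}"]
  end

  # One batched query for every post at once: the eager-loading fix.
  def comments_for_all(post_ids)
    @query_count += 1
    post_ids.to_h { |id| [id, ["comment for post #{id}"]] }
  end
end

post_ids = (1..5).to_a

naive = FakeDB.new
post_ids.each { |id| naive.comments_for(id) }
puts naive.query_count # 5 queries: one per post

eager = FakeDB.new
eager.comments_for_all(post_ids)
puts eager.query_count # 1 batched query total
```

It's exactly this kind of mechanical, in-the-diff pattern that makes LLM review feel so effective in the moment.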

## Where's all the rice in this 5-tortilla burrito?

I think the thing that bugs me about all of this is that five layers of LLM review sounds like a lot of coverage. It _feels_ thorough. But is it actually five reviews? Or is it one review that happened five times?

LLMs are pulling from largely the same training data, the same patterns, the same understanding of what "good code" looks like. If you stack five of them on the same diff, they tend to catch the same stuff and miss the same stuff. You'll get five comments about error handling and zero about the fact that this approach is going to make next quarter's flagship feature harder to build. Nobody flagged that, because none of the layers _could_ flag that.

It creates a false sense of security. You see five rounds of review, all green, all positive, and your brain goes "well, that was thoroughly reviewed!" But thorough implies breadth. This is depth on a very narrow axis. You've got a really thick burrito that somehow has no rice, no beans, and no salsa or sour cream 😦. Just tortilla all the way down!

_Someone call the burrito police! That feels a bit like a crime against burritos._

## What&#39;s not in the diff

So if LLMs are catching the same surface-level stuff across five layers, what are they missing? For me it comes down to one thing that LLMs can't provide yet: context that doesn't live in the code.

LLMs are reviewing your code in a vacuum. A really smart vacuum, but still a vacuum. They can see the diff. They can see surrounding files. But they can't see that your team tried the exact same caching strategy a year ago and it caused a thundering herd problem on deploy. They don't know another team is mid-release and this PR changes a webhook they depend on, breaking their release. They can't tell that the codebase has been migrating from one pattern to another and that this PR goes in the opposite direction.

That's all stuff that lives in people's heads. In Slack threads from 2024. In painful memories of on-call rotations gone wrong. In conversations at standup about what's coming next quarter. No amount of LLM layers is going to surface any of that.

I think this is also where we, as code authors, could be doing a lot more. Instead of writing PR descriptions that summarize _what_ changed (which, let's be real, LLMs can do for us now), we could be signaling the why. Why this approach and not the other three you considered? What tradeoffs did you make? What are you unsure about? That's the stuff that gives human reviewers something real to engage with beyond "yep, looks good, Claude agrees."
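One lightweight way to prompt yourself for the why is a PR description skeleton along these lines. This is just a sketch; the headings are suggestions, not a standard, and every placeholder is yours to fill:

```markdown
## Why this approach
<the problem, and why you picked this solution over the alternatives you considered>

## Tradeoffs
<what you gave up, and why that's acceptable for now>

## What I'm unsure about
<the part you'd most like a human reviewer to push back on>
```

The last section is the one I'd fight for: it hands your reviewers exactly the context-dependent question no LLM layer can answer for them.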

## Own the burrito

Earlier this year I wrote about owning your LLM's output, and I think the burrito makes that conversation way more interesting. Because when there are five layers of LLM between you and the merge button, ownership can get real fuzzy fast.

For me the danger isn't that any one layer is doing something wrong. It's the diffusion of responsibility that happens when you stack them all together. You reviewed it with Claude before pushing, the bot reviewed it on CI, and your reviewers ran it through their LLMs. Everyone saw green (eventually). Everyone felt good. Then something breaks in production and you realize that five layers of review all missed the same thing, because none of them had the context to catch it.

That's still on you as the PR author in my eyes, but it also points to a crack in the foundation of leaning on LLM review at every step.

I don't think the answer is to stop using LLMs in code review. Honestly, they've been helpful for spot-checking my own work before I accidentally waste a teammate's time on a silly mistake. The answer is to stop treating more layers like more safety. Be intentional about what you're asking the LLM to do at each step, and be honest about what it can't do at any of them.

Write better PR descriptions. Tell your reviewers _why_ you made the choices you made. Give them something to actually chew on beyond "does this code work?" What did you optimize your solution for? What drove the decisions behind it? When you're the reviewer, fight the urge to let the LLM do your thinking for you. Trust that your teammates did some basic due diligence before slinging code at you. The stuff that matters most in a review is the stuff only you can bring to the table.

Five tortillas is not a meal. Make sure somebody's bringing the rice 🌯.

---

By [Andrea Fomera](https://afomera.dev) | [View original post](https://afomera.dev/posts/2026-04-14-5-layer-llm-review-burrito)
