Andrea Fomera: New: Checkset - a Ruby gem for repeatable verifications using Playwright.

So... I accidentally reinvented system tests.

Hear me out though! System tests, as Rails developers know them, tend to be flaky and slow. Playwright tends to fix the flakiness in how it runs the tests, but either way they can be slow and prone to breakage when you check the UI and your tests aren't hardened for it.

Then you have this test suite in your app that only runs against your test environment. That feels like a waste, because most teams or people I know still have to do 'smoke tests' against a staging or production environment after deploys to verify things are OK. All that time spent on system tests, and you don't even get to use them against your production environment.

When I first started building this, I came at it from the angle of trying to automate that smoke test or QA pass. Only after getting my prototype working did I realize it was actually just system tests in a trench coat. Oops.

So, a Checkset is one or more suites of repeatable checks you can run against a target URL. You can have 'Preps' that run before checks to do things like ensure your test user exists and is in a known good state. Perhaps you want to call your app's API to create a fresh user or seed some data; you're given the option to.

Again, sounds a lot like system tests 😰, but let's talk about some differences and why you might want to give it a try for the critical flows you're always testing.

How is it different than system tests?

1) You can target different URLs with the same checksets

2) It doesn't have to live in your app's codebase.

3) You can set up different 'suites' of checks that target subdomains or other domains for your apps. Each suite gets its own browser context for its checks.

4) LLM agents have been super quick to write my Checks for me when I point them at the live site with the Playwright MCP + the gem's readme. I suspect this will improve rapidly as more examples are written.

Now the cons

With the pros of #1 from above, you need to keep in mind that your users (or whatever you're testing with) will either be created on, or need to already exist on, every URL you target. I.e. if you're signing in with test@example.com, you're gonna end up with a user for that on your production site too.

That's where Preps come in handy, you can have a prep that hits your app's API to create a test user before the check runs, or you can just... you know, have a known test user account that exists everywhere. Most teams I've worked on have had their own production testing accounts too anyway, so it might not be as scary as it sounds. But it's worth thinking about before you bundle exec checkset --target https://yourproductionsite.example.com and wonder why you have a new user in your database 😅.
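As a rough sketch of what the body of such a prep might do, here is a plain Ruby helper that asks the app to create (or reset) a test user. Everything app-specific here is an assumption for illustration: the /api/test_users endpoint, the JSON payload, and the bearer token are hypothetical, and how you wire this into a Prep is covered by the gem's readme, not by this snippet.

```ruby
require "json"
require "net/http"
require "uri"

# Builds (but does not send) the request that asks the app for a test user.
# The endpoint path, payload shape, and token are hypothetical.
def build_test_user_request(target, email:, token:)
  uri = URI.join(target, "/api/test_users")
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/json"
  request["Authorization"] = "Bearer #{token}"
  request.body = JSON.generate({ email: email })
  [uri, request]
end

# Sends the request and raises unless the app answers with a 2xx status.
def ensure_test_user(target, email:, token:)
  uri, request = build_test_user_request(target, email: email, token: token)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
  raise "could not prepare #{email}: HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  response
end

# Building the request is safe to do anywhere; nothing is sent here.
uri, request = build_test_user_request(
  "https://staging.myapp.com",
  email: "test@example.com",
  token: "secret"
)
```

Making the endpoint idempotent (create the user if missing, reset it if present) is what keeps the check repeatable across staging and production.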

The other thing is this is a Ruby gem that depends on Playwright being installed. You'll need node installed for the browser binaries. It is what it is.

And look, your system tests probably cover more edge cases than Checkset is designed for. I'm not saying rip out your test suite and replace it with this. Checkset is really meant for the "did the deploy break the 5 most important flows" kind of checks. The most critical paths. The things you're already manually clicking through after every deploy anyway.

OK show me what it looks like!

A check is just a Ruby class. Here's what a basic one looks like:

# checks/homepage.rb
class Checks::Homepage < Checkset::Check
  description "verifies the homepage loads correctly"

  def call
    visit "/"

    verify "page has title" do
      page.title.include?("My App")
    end

    verify "has main heading" do
      page.get_by_role("heading", name: "Welcome").visible?
    end
  end
end

There are two primitives here: verify and step. A verify is an assertion that keeps going if it fails. It collects all the failures in one run so you're not playing whack-a-mole, running the suite over and over to find the next broken thing. A step is an action (like clicking buttons, filling in forms, etc.) that halts on failure, because the rest of the check probably depends on it succeeding.
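To make the difference concrete, here's a toy model of those two behaviours in plain Ruby. This is not Checkset's implementation; ToyCheck and StepFailed are made-up names, and the sketch only shows the "collect on verify, halt on step" idea.

```ruby
# Toy model of the two primitives; NOT how Checkset is implemented.
class StepFailed < StandardError; end

class ToyCheck
  attr_reader :failures, :log

  def initialize
    @failures = [] # names of verifies that failed
    @log = []      # every block that was attempted, in order
  end

  # A verify records a failure and lets the check continue.
  def verify(name)
    @log << name
    @failures << name unless yield
  rescue StandardError
    @failures << name
  end

  # A step raises on failure so nothing after it runs.
  def step(name)
    @log << name
    yield
  rescue StandardError => e
    raise StepFailed, "#{name}: #{e.message}"
  end

  # Runs the check body and reports whether it ran to completion.
  def run(&body)
    instance_eval(&body)
    :completed
  rescue StepFailed
    :halted
  end
end

check = ToyCheck.new
result = check.run do
  verify("page has title") { false }                        # fails, keeps going
  verify("has main heading") { true }                       # still runs
  step("add product to cart") { raise "button not found" }  # halts the check
  verify("cart has items") { true }                         # never attempted
end
```

After this runs, result is :halted, check.failures holds only "page has title", and "cart has items" never shows up in check.log because the failed step stopped the check first.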

Here's a more realistic example, a checkout flow:

class Checks::UserCanCheckout < Checkset::Check
  prep :sign_in_as_customer
  description "Verifies the full checkout flow"
  tags :checkout, :critical

  def call
    visit "/products"

    step "add product to cart" do
      page.get_by_role("button", name: "Add to Cart").first.click
    end

    verify "cart has items" do
      page.get_by_test_id("cart-count").text_content.to_i > 0
    end

    step "go to checkout" do
      page.get_by_role("link", name: "Cart").click
      page.get_by_role("button", name: "Checkout").click
    end

    verify "reached checkout page" do
      page.url.include?("/checkout")
    end
  end
end

See that prep :sign_in_as_customer at the top? That runs before the check, in the same browser context, so the session carries over. The prep handles the login dance so your checks stay focused on the thing they're actually checking.

Suites are where it gets fun

If you're working on an app with subdomains (admin.example.com, app.example.com, etc.) you can set up suites in a checkset.yml that maps each subdomain to its own set of checks:

base_domain: myapp.com

suites:
  app:
    target: https://app.%{domain}
  admin:
    target: https://admin.%{domain}

Then you organize your checks into folders that match the suite names, and the top-level checks run against every suite:

checks/
├── app/
│   ├── user_can_sign_in.rb
│   └── user_can_checkout.rb
├── admin/
│   └── admin_dashboard.rb
└── homepage.rb              ← runs in every suite
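
The folder convention boils down to a simple rule: a check applies to a suite if it sits in that suite's folder or at the top level. A small sketch of that rule, assuming nothing about how the gem actually discovers files (checks_for_suite is a hypothetical helper):

```ruby
# Hypothetical helper: picks the check files that apply to one suite,
# following the folder layout shown above.
def checks_for_suite(paths, suite)
  paths.select do |path|
    parts = path.delete_prefix("checks/").split("/")
    # Top-level files have a single path segment and run in every suite.
    parts.length == 1 || parts.first == suite
  end
end

paths = [
  "checks/app/user_can_sign_in.rb",
  "checks/app/user_can_checkout.rb",
  "checks/admin/admin_dashboard.rb",
  "checks/homepage.rb"
]

admin_checks = checks_for_suite(paths, "admin")
app_checks = checks_for_suite(paths, "app")
```

Here admin_checks ends up with admin_dashboard.rb plus homepage.rb, and app_checks gets the two app checks plus homepage.rb.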

Want to run your checks against staging instead of production? Just swap the domain:

bundle exec checkset --domain staging.myapp.com

Same checks, different target. That's really the whole pitch.
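
The %{domain} placeholder in the suite targets uses the same syntax as Ruby's named format references, so you can picture the swap like this. Whether the gem expands it with format internally is an assumption on my part here; resolve_targets is a hypothetical helper, and the targets are copied from the checkset.yml example above.

```ruby
# Suite targets copied from the checkset.yml example.
SUITE_TARGETS = {
  "app" => "https://app.%{domain}",
  "admin" => "https://admin.%{domain}"
}.freeze

# Expands every suite target for the given domain using Kernel#format.
def resolve_targets(domain)
  SUITE_TARGETS.transform_values { |template| format(template, domain: domain) }
end

production = resolve_targets("myapp.com")
staging = resolve_targets("staging.myapp.com")
```

With --domain staging.myapp.com, the admin suite would point at https://admin.staging.myapp.com while the checks themselves stay untouched.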

The LLM angle

I wasn't kidding about #4 in the pros list. I've been using the Playwright MCP with Claude to write checks and it's been shockingly effective. Point it at a live site, give it the gem's readme for context and ask it to write a check for a specific flow. It navigates the site, figures out the selectors, and writes a check class that mostly just works.

Sometimes it can be too clever with selectors and you have to nudge it towards get_by_role or get_by_test_id instead of fragile CSS selectors, but that's a quick prompt adjustment. This is one of those places where I'm happy to let the LLMs code this because writing browser checks by hand is one of the more tedious parts of my job that I'd happily never do again.

Running it

# Basic — point it at a url and go
bundle exec checkset --target https://staging.myapp.com

# Only the critical checks
bundle exec checkset --target https://staging.myapp.com --tag critical

# Watch it work (headed mode + slow motion)
bundle exec checkset --target http://localhost:3000 --headed --slow-mo 500

# Run checks in parallel
bundle exec checkset --target https://staging.myapp.com --parallel 4

It spits out a nice terminal summary and writes JSON results to tmp/checkset. Screenshots are captured on every step and verify (pass or fail).

It's early

I'll be honest, this is still super fresh. I've been using it on my own projects and it's been working well for me, but there are certainly rough edges I haven't found yet. If you try it and something looks weird, open an issue and I'll take a look!

gem "checkset"

Give it a shot and let me know what you think!

GitHub: https://github.com/afomera/checkset

Happy checking 👋

Andrea