Securing a web app you don't fully understand

Somewhere out there is a web app you’re responsible for, and nobody can tell you with a straight face whether the login form is safe to expose to the internet. Maybe you inherited the app. Maybe a colleague vibe-coded it over a weekend. Maybe it’s a decade old and the original team is long gone. All of these are the same problem: you need to secure a web app whose internals you do not fully understand. You cannot walk through every route, every auth check, every dependency, and every database call and vouch for it personally. Rewriting takes quarters you don’t have, and leaving it exposed isn’t acceptable. The middle ground is what this article is about.

If we treat the app as a black box, we have to defend it from the outside, layer by layer. No single tool secures the whole thing, but a handful of complementary ones, stacked, cover most of the obvious ways it will get owned. Three core layers, roughly in the order you should add them:

Layer           | What it does
Secret scanning | Finds credentials leaked into the repo and its history.
Edge filtering  | A WAF in front of the app, filtering obviously malicious traffic.
Bot protection  | CAPTCHA or score-based bot detection on the forms that have to stay public.

The three layers above all sit at the edge — the boundary between your app and the outside world. Secret scanning keeps credentials from leaking out of the repo, the WAF filters malicious traffic coming in, bot protection challenges non-human visitors on the forms that have to stay public. None of them need to understand the app’s internals to work, which is exactly why they’re so useful when those internals are uncertain.

OWASP has a name for this pattern: virtual patching — protective layers that reduce exposure while the underlying code is still uncertain. None of these replace a proper rewrite or a proper audit. All of them buy you time.

The edge is where this article spends most of its time, but we’ll also cover a little of what you can do beyond the edge — looking inward instead, at the libraries the app actually runs, at its source code, and at its runtime behavior. Dependency scanning finds known-vulnerable libraries the app already ships, and SAST and DAST at the end of the article catch code-level patterns no edge layer can see.

Most of these layers have a solid self-hosted option that works on any host — gitleaks for secrets, ModSecurity for the WAF, OSV-Scanner for dependencies, Semgrep for SAST, ZAP for DAST. These stay primary throughout. For the managed-cloud equivalents — where you trade some configuration work for a vendor’s tuned ruleset and threat intel — every major cloud ships its own. We’ll reach for Google Cloud Platform as the concrete reference when a managed example helps: Cloud Armor for the WAF, Artifact Analysis for container scanning, reCAPTCHA Enterprise and Adaptive Protection for bot defense. Picking one cloud keeps those examples specific rather than a hand-wave; AWS, Azure, and Cloudflare offer equivalents we’ll mention briefly but not deep-dive.

Three categories of unknown

Before the tools, it is worth naming what you are actually defending against, because that shapes which layer matters most. The unknowns fall into three groups.

What’s in the code and the repo. You don’t know which routes exist — legacy apps grow organically and there are probably admin endpoints nobody mentioned. You don’t know what the code actually does, either because you can’t read it fast enough (legacy) or because nobody read it carefully in the first place (AI-generated). And you don’t know what’s already been leaked into git history. DAST probes the running app from the outside to map what’s actually exposed; SAST gives you a lossy but useful map of risky code patterns; secret scanning catches credentials forgotten in old commits.

What’s running alongside the code. You don’t know which dependencies are current — a ten-year-old app has dozens of libraries with public CVEs (Common Vulnerabilities and Exposures — the public catalog of known security flaws, each with a standardized identifier like CVE-2024-12345). Dependency scanning is the single highest-leverage layer for this.

What’s happening to the app in production. You don’t know who is attacking you and how — a WAF’s audit log tells you that within a day of going live. And you don’t know how much of your traffic is bots, which on public signup, login, and password-reset forms is often a majority. Bot protection surfaces that and lets you act on it.

No tool solves all of these. Each of the three layers below addresses one or more of them; dependency scanning (covered after the layers) and SAST/DAST (at the end) push the depth further once the basics are in place.

Secret scanning

Repos leak credentials — legacy ones because they grew organically over years, vibe-coded ones because LLMs cheerfully hardcode keys for “convenience” and nobody reviewed the diff. API keys, AWS access keys, database passwords, internal service tokens — they end up in commits, in config files, in old test fixtures, in documentation, in AI-generated examples that quietly substitute a real key for the placeholder. And because git keeps history forever, even commits that “removed” a secret still contain it.

The fastest return on a black-box app is scanning the full git history of every repo involved, rotating anything found, and then keeping a scanner in CI so new leaks are caught.

Tools

  • gitleaks — fast, Go-based, excellent default rules covering most providers (AWS, GCP, Azure, Stripe, Twilio, GitHub tokens, private keys, JWTs). Run it against the full history, not just HEAD.
  • trufflehog — similar detection but adds a verification step: it tests whether a detected secret is actually live (e.g., makes an sts:GetCallerIdentity call for AWS keys). Higher-signal output, worth the extra latency.
  • git-secrets — AWS-focused, lightweight, useful as a pre-commit hook.
  • GitHub Secret Scanning — free for public repos, Advanced Security on private. Partners with providers so leaked tokens get auto-revoked.

Running a scan and reading the output

gitleaks can be installed several ways — a prebuilt binary, Homebrew, go install, or via Docker. We’ll use Docker here for convenience: nothing to install on the host, the same command works on any machine with a Docker daemon, and CI runners get the same invocation as your laptop.

I just ran this against the repo this article lives in:

docker run --rm -v "$(pwd):/repo" zricethezav/gitleaks:latest \
    detect --source=/repo --log-opts="--all" --redact --verbose

--log-opts="--all" walks every commit on every branch, not just HEAD; --redact masks the secret value in the output so the report itself isn’t a new leak; --verbose prints the full per-finding block instead of just a summary count.

gitleaks emits one finding per potential secret string it identifies — the same block of fields for every match, regardless of which rule fired. A typical finding looks like this:

Finding:     STRIPE_SECRET_KEY=REDACTED
Secret:      REDACTED
RuleID:      stripe-access-token
Entropy:     4.175736
File:        blog/src/content/blog/anatomy-of-a-developer-targeted-supply-chain-attack.mdx
Line:        225
Commit:      49e4ca23a5ee6d1ef6b1a566a580a4086fbb84aa
Fingerprint: 49e4ca23a5ee6d1ef6b1a566a580a4086fbb84aa:blog/src/content/blog/anatomy-of-a-developer-targeted-supply-chain-attack.mdx:stripe-access-token:225

The five fields that matter:

Field       | What it tells you
RuleID      | Which detection rule fired (stripe-access-token, aws-access-token, generic-api-key, private-key, etc.). The full default ruleset lives in config/gitleaks.toml in the gitleaks repo — every rule with its regex, description, and any keyword filters.
File + Line | Where the match is.
Commit      | Which commit introduced it — not necessarily the latest one. gitleaks finds it wherever it first appears in history.
Entropy     | Shannon entropy of the matched string, in bits per character. Higher = more random-looking, which usually means more likely to be a real secret. Generic-API-key rules use entropy as a primary signal; provider-specific rules (Stripe, AWS) match the prefix and don't need it.
Fingerprint | A stable identifier you'll use to suppress this finding if it turns out to be acceptable. More on that below.

What entropy actually measures

Shannon entropy measures how unpredictable each character of a string is, on average. A string where every character could equally have been any of N values has the maximum possible entropy: log₂(N) bits per character. A string where one character appears far more often than others, or where the next character is predictable from the previous one, has lower entropy.

For secret scanning, the question entropy is trying to answer is: does this string look like it was generated by a random source, or by a human? Real secrets — API keys, hashes, base64-encoded tokens — are designed to be unpredictable, so they push entropy toward the alphabet’s maximum. Names, English words, dates, version strings, file paths, and boilerplate sit much lower. Entropy is the cheap test that separates “this was rolled by a CSPRNG” from “this was typed by a human or generated by code with a pattern.”

Rough scale:

  • under ~3.0 — looks like English text, identifiers, version strings, or repetitive structure. Things like password123, [email protected], v1.2.3-beta, file paths.
  • ~4.0 — near the maximum for pure hex (log₂(16) = 4). MD5/SHA hashes, hex-encoded tokens, UUIDs.
  • ~5.0 — typical for random alphanumeric tokens. API keys with a fixed prefix and a high-entropy random tail.
  • ~5.95 — ceiling for a fully random 62-character [a-zA-Z0-9] alphabet. Long base64-ish or pure alphanumeric secrets.

Our finding above scored 4.175736 — comfortably above gitleaks’ default cutoff of ≈3.5 for generic rules, and right in the band where real provider keys land. The fixed prefix (something like sk_live_) drags the average entropy down a bit; the random tail pushes it back up. The combination is exactly what a real API key looks like.
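
If you want to sanity-check a string's score yourself, the calculation is a few lines of Python. A minimal sketch of the same formula (not gitleaks' own code):

import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Shannon entropy in bits per character, the measure gitleaks reports."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(shannon_entropy("hello world hello"))
# ~2.8: English-like text with repeated characters scores low

print(shannon_entropy("a3f91c0d7b42e8a6f50c19d3e7b2a48c"))
# ~3.95: a random 32-char hex string sits just under the hex ceiling of 4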

Triaging findings

When you actually run gitleaks against a real repo, you get three categories of finding — and the same response doesn’t fit all three:

  1. Real leak. A credential committed by accident — a .env file, a config file with a hardcoded API key, a test fixture with a real token. Rotate immediately. Assume the credential is compromised, even if the repo is private. Former contractors and old laptop backups are not theoretical. Then decide whether to rewrite history (only for high-sensitivity cases like production DB passwords or signing keys — rewriting is disruptive, do it deliberately).
  2. Intentional inclusion. The string genuinely matches a secret pattern but is meant to be in the repo: documentation that quotes attacker-controlled credentials as evidence in a security writeup, test fixtures with deliberately-fake-but-realistic-looking values, sample config files. The repo this article lives in has exactly this case — the supply-chain attack writeup quotes the malicious project’s hardcoded Stripe and API keys to show what the attacker was using. gitleaks correctly flags them, but those are not real credentials so there’s nothing to rotate.
  3. False positive. The pattern matched but the string isn’t actually a secret — random base64 in a hash, a long UUID-like value, a commit-message word with high entropy. Less common with provider-specific rules, more common with generic ones.

For categories 2 and 3, the fix is a .gitleaksignore file at the repo root listing the identifiers — the Fingerprint field from the table above — of findings you’ve reviewed and approved:

# Intentional: secrets quoted as evidence in supply-chain writeup
49e4ca23a5ee6d1ef6b1a566a580a4086fbb84aa:blog/src/content/blog/anatomy-of-a-developer-targeted-supply-chain-attack.mdx:stripe-access-token:225
49e4ca23a5ee6d1ef6b1a566a580a4086fbb84aa:blog/src/content/blog/anatomy-of-a-developer-targeted-supply-chain-attack.mdx:generic-api-key:193

gitleaks will skip these specific findings on subsequent scans. Comment liberally — future-you needs to know whether each entry was approved because it’s intentional or because it’s a false positive, and the comment is the only durable record of that decision.

Stop the next leak at commit time

Once the historical scan is clean (or fingerprinted-and-suppressed), wire gitleaks into:

  • CI — gates the build on any new findings. The same Docker invocation works in any CI runner.
  • Pre-commit hook — catches leaks before they reach your local repo at all. There’s an official gitleaks protect mode for this, or use the pre-commit framework with the gitleaks hook.

Together these mean a new leak now requires actively bypassing both layers — much harder to do by accident than to do deliberately.
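
For the pre-commit side, the gitleaks repo ships a hook for the pre-commit framework. A minimal .pre-commit-config.yaml sketch (the rev tag is illustrative; pin whatever release is current):

# .pre-commit-config.yaml at the repo root
repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.4   # illustrative; pin the current release
    hooks:
      - id: gitleaks

Run pre-commit install once per clone and the hook fires on every commit.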

Edge filtering with a WAF

A Web Application Firewall is a reverse proxy that inspects HTTP(S) requests against a ruleset. It decides to allow, log, block, or challenge each request before it reaches your app. A WAF is useful for:

  • Catching obvious payloads for classes of attacks: SQL injection, XSS, local/remote file inclusion, command injection, path traversal.
  • Blocking known bad paths (/wp-admin/, .git/config, .env).
  • Rate limiting, IP reputation blocking, geo filtering.
  • Buying time when a CVE drops in a dependency you cannot upgrade today.

It is not useful for broken business logic, broken authorization, or bad session design. It is a compensating control, not a cure.

There are two operational paths to a WAF: a managed WAF run by your cloud provider, or a self-hosted one you run yourself. The detection ruleset underneath both is essentially the same — the open-source ModSecurity v3 engine paired with the OWASP Core Rule Set (CRS), or a vendor’s rebranded version of the same. The CRS is roughly 200 community-maintained detection rules covering SQL injection, XSS, RCE, path traversal, and the rest of the OWASP Top 10. It uses anomaly scoring — rules contribute to a score, and a request is only blocked when the total crosses a threshold — which makes tuning gentler than per-rule blocking.

So the choice between managed and self-hosted is mostly operational, not about detection quality. Both paths run essentially the same engine and rules.

Managed: Cloud Armor

A managed WAF is almost always the right starting point: less infrastructure to operate, faster to deploy (an afternoon, not a week), no engine to update or tune. Pick this first unless you have a specific reason not to.

The GCP-native option is Google Cloud Armor. It attaches to a GCP HTTP(S) load balancer, and its preconfigured rule groups (sqli-v33-stable, xss-v33-stable, lfi-v33-stable, rce-v33-stable, etc.) are derived directly from the OWASP CRS — the same ruleset you’d install by hand with ModSecurity.

You can configure Cloud Armor through the GCP Console, the gcloud CLI, the REST API, or — what we’ll show below — declaratively in Terraform. The model is the same regardless of interface: a Cloud Armor security policy is a list of rules with match conditions and actions (allow, deny(403), rate_based_ban). A minimal policy enabling the preconfigured CRS rule groups and a sensible default action looks like:

resource "google_compute_security_policy" "app" {
  name = "app-waf"

  rule {
    action   = "deny(403)"
    priority = 1000
    match {
      expr { expression = "evaluatePreconfiguredWaf('sqli-v33-stable', {'sensitivity': 2})" }
    }
    description = "Block SQL injection"
  }

  rule {
    action   = "deny(403)"
    priority = 1001
    match {
      expr { expression = "evaluatePreconfiguredWaf('xss-v33-stable', {'sensitivity': 2})" }
    }
    description = "Block XSS"
  }

  rule {
    action   = "allow"
    priority = 2147483647
    match {
      versioned_expr = "SRC_IPS_V1"
      config { src_ip_ranges = ["*"] }
    }
    description = "Default allow"
  }
}

There’s one mode worth mentioning early: Cloud Armor’s preview mode. With it on, rules evaluate normally and log every match to Cloud Logging — but they don’t actually deny traffic. It’s the managed equivalent of running ModSecurity in detection-only, and it’s how you safely turn a brand-new policy on against real production traffic without the risk of accidentally blocking your payment callback at 3am. The rollout flow is the same as for self-hosted: preview for a week, tune based on what fires, then flip rules out of preview one at a time.
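
In Terraform, preview is a per-rule flag. A sketch of the SQLi rule from the policy above with preview turned on (flip it back to enforcing by removing the line):

rule {
  action   = "deny(403)"
  priority = 1000
  preview  = true   # evaluate and log matches, but don't enforce yet
  match {
    expr { expression = "evaluatePreconfiguredWaf('sqli-v33-stable', {'sensitivity': 2})" }
  }
  description = "Block SQL injection (preview)"
}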

Beyond the WAF rules, Cloud Armor bundles or pairs with several adjacent capabilities: DDoS protection (always on), an adaptive-protection ML layer that detects volumetric anomalies and auto-suggests rules, and bot management. The last two are paid add-ons rather than free baseline features.

Tradeoffs to know about. Cloud Armor’s custom rule language is a subset of CEL — less expressive than ModSecurity’s full SecRule syntax, so very intricate custom rules sometimes can’t be expressed directly. The CRS version is Google’s choice, not yours, so you can’t stay on a specific revision. And there’s a per-request cost on top of LB egress.

Other cloud providers have their own equivalents. AWS WAF attaches to ALB/CloudFront/API Gateway with AWS’s own rule syntax plus managed rule groups; Azure Front Door / Application Gateway WAF is also CRS-based; Cloudflare WAF is architecturally different — it’s a reverse-proxy CDN rather than an LB-attached filter, so you point DNS at Cloudflare and your origin hides behind their network. All three include CRS-based managed rule groups and a detection-only equivalent.

Self-hosted: ModSecurity + NGINX

The self-hosted path is the right choice when you want full control over custom rules, you are not on a cloud with a good managed offering, cost-per-request matters at your volume, or compliance constraints make an appliance you operate easier to reason about than a vendor service.

You run ModSecurity v3 directly as a dynamic NGINX module, configured with the OWASP CRS rules. The shape of the architecture is:

Internet  →  NGINX + ModSecurity  →  upstream app

Here’s how the three pieces are arranged:

  • NGINX is the reverse proxy. It terminates TLS, accepts incoming HTTP requests, and — when nothing blocks them — proxies them to the upstream app on a private port.
  • libmodsecurity is the rule-evaluation engine. It’s a C++ library, separate from NGINX itself, with no network code of its own — it just takes a request as input and returns a verdict.
  • The NGINX ModSecurity connector is a dynamic module (.so file) that bridges the two. It hooks into NGINX’s request-handling pipeline and, for every incoming request, hands the URL, headers, and body to libmodsecurity for evaluation.

All three pieces run inside the same NGINX process. The mechanism is NGINX’s dynamic-module loader — both the connector and libmodsecurity end up loaded into each NGINX worker at startup, with no separate ModSecurity daemon, no IPC, no socket between them.

When a request comes in, NGINX calls into the connector, which calls into libmodsecurity, all within shared process memory. That’s why the integration only adds a few milliseconds of latency rather than the round-trip cost of an out-of-process security gateway. It’s also why the connector module’s NGINX version has to match the NGINX version it’s loaded into exactly — the connector is compiled against NGINX’s internal ABI, which is not stable across versions.

For each request, NGINX passes the URL, headers, and body to libmodsecurity (via the connector). libmodsecurity runs them through the loaded rules — the OWASP CRS plus any custom ones — accumulates an anomaly score, and returns an action: allow, log, or deny. If the verdict is deny, NGINX returns the configured error response (usually 403) and never forwards to the upstream. Otherwise the request proceeds normally.

Once the WAF is in place, make sure these are all true:

  • The app must not be reachable directly. If it still has a public IP, attackers bypass the WAF entirely.
  • TLS terminates at the WAF. ModSecurity cannot inspect what it cannot decrypt.
  • The WAF is now a single point of failure — run at least two instances behind an LB if uptime matters.
  • Audit log volume will spike. Plan storage.

Install shape on Debian/Ubuntu (abbreviated):

# libmodsecurity
git clone --depth 1 -b v3/master https://github.com/owasp-modsecurity/ModSecurity
cd ModSecurity && git submodule init && git submodule update
./build.sh && ./configure && make -j$(nproc) && make install

# NGINX connector module (must be built against the exact NGINX version you run)
git clone --depth 1 https://github.com/owasp-modsecurity/ModSecurity-nginx
wget https://nginx.org/download/nginx-${NGINX_VERSION}.tar.gz
tar -xzf nginx-${NGINX_VERSION}.tar.gz && cd nginx-${NGINX_VERSION}
./configure --with-compat --add-dynamic-module=../ModSecurity-nginx
make modules && cp objs/ngx_http_modsecurity_module.so /etc/nginx/modules/

# OWASP Core Rule Set
git clone --depth 1 https://github.com/coreruleset/coreruleset /etc/nginx/modsec/crs
cp /etc/nginx/modsec/crs/crs-setup.conf.example /etc/nginx/modsec/crs/crs-setup.conf

Then in nginx.conf:

load_module modules/ngx_http_modsecurity_module.so;

http {
    modsecurity on;
    modsecurity_rules_file /etc/nginx/modsec/main.conf;

    server {
        listen 443 ssl;
        server_name legacy-app.example.com;
        ssl_certificate     /etc/ssl/certs/app.crt;
        ssl_certificate_key /etc/ssl/private/app.key;

        location / {
            proxy_pass http://127.0.0.1:8080;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }
}
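
One gap in the abbreviated install above: nginx.conf points at /etc/nginx/modsec/main.conf, which nothing has created yet. Conventionally it is just a list of Include directives; a minimal sketch, assuming you've copied modsecurity.conf-recommended from the ModSecurity repo to /etc/nginx/modsec/modsecurity.conf:

# /etc/nginx/modsec/main.conf
# Engine defaults: SecRuleEngine, body-handling limits, audit log location.
Include /etc/nginx/modsec/modsecurity.conf
# CRS configuration and the rules themselves, cloned in the install step above.
Include /etc/nginx/modsec/crs/crs-setup.conf
Include /etc/nginx/modsec/crs/rules/*.conf

The SecRuleEngine switch discussed below lives in modsecurity.conf.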

Roll out in detection-only mode first. Set SecRuleEngine DetectionOnly for at least a week. Read the audit log. You are looking for false positives (CRS rules firing on legitimate traffic — editors, admin dashboards, file uploads) and unexpected true positives (attacks already hitting you).

When you find a false positive, use SecRuleRemoveById scoped to the specific location first, lower the paranoia level on a path second, and disable the rule globally only as a last resort. Only after the logs are quiet enough to be meaningful, flip SecRuleEngine On.
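
With the NGINX connector, the location-scoped form is a modsecurity_rules block inside the affected location. A sketch (the rule ID and path are illustrative; use whatever your audit log actually shows):

# A CRS rule keeps firing on the WYSIWYG editor's autosave endpoint.
location /admin/editor/ {
    modsecurity_rules '
        SecRuleRemoveById 942100
    ';
    proxy_pass http://127.0.0.1:8080;
}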

For custom virtual patches — say you know /api/report?format=X is vulnerable to command injection — you can drop anything outside an allowlist at the edge:

SecRule REQUEST_URI "@beginsWith /api/report" \
    "id:1000001,phase:1,chain,deny,status:400,\
     msg:'virtual patch: /api/report format allowlist'"
    SecRule ARGS:format "!@rx ^(pdf|csv|json)$"

Bot protection and CAPTCHA

The other layers in this stack barely touch a threat class that will almost certainly hit a public legacy app: opportunistic bot abuse on endpoints that have to be unauthenticated — spam signups, credential-stuffing runs against your login form, mass scraping, automated coupon abuse, password-reset floods that trigger emails at scale.

The app’s own authentication doesn’t help here — signup, login, and password-reset forms have to be reachable by anyone. WAF rate limiting helps, but a distributed botnet making one polite-looking request per IP looks identical to a wave of legitimate signups. The layer that actually distinguishes humans from bots is a bot-detection or CAPTCHA system.

Bot protection is a spectrum rather than a single tool:

  1. Passive classifiers — ML models that score every request based on signals (TLS fingerprint, header order, timing, behavioral telemetry) with no user-visible challenge. Lowest friction, works best against volumetric abuse.
  2. Score-based CAPTCHA — a classifier plus an escalation to a visible challenge only when the score is low. The hybrid most modern systems use.
  3. Always-visible CAPTCHA — the classic “click all the traffic lights.” High friction, usually the wrong default now, still useful for specific high-value endpoints.

The three subsections below cover two of those spectrum categories — score-based CAPTCHA (Turnstile, reCAPTCHA Enterprise) and passive classifier (Cloud Armor Adaptive Protection, Cloudflare Bot Management). They’re complementary rather than alternatives:

Subsection                | Spectrum category   | Client-side?                  | Best for
Cloudflare Turnstile      | Score-based CAPTCHA | Yes — JS widget on each form  | Easy wins on signup/login forms
reCAPTCHA Enterprise      | Score-based CAPTCHA | Yes — JS widget on each form  | Deeper GCP integration, edge enforcement via Cloud Armor
Classifier-only detection | Passive classifier  | No — server-side signals only | Volumetric abuse, traffic where you can’t or won’t add a widget

The defining difference is whether you have to add a widget. Turnstile and reCAPTCHA Enterprise both require a JS snippet on each form — the widget collects browser-level signals (timing, behavioral telemetry, JS challenges) and submits a token your server verifies. Classifier-only tools evaluate every request using only what the network already sees: TLS fingerprint, HTTP header order, IP reputation, volumetric anomalies. Nothing to embed, no app code change. They stack naturally: the classifier covers all incoming traffic as a blanket layer; the CAPTCHA targets the high-value forms specifically.

We’ll start with the easiest to set up — Cloudflare Turnstile — then look at the deeper GCP-native combination of reCAPTCHA Enterprise plus Cloud Armor Adaptive Protection for teams that need more.

Cloudflare Turnstile (the easiest path)

If you only do one thing today, this is it. Cloudflare Turnstile is free, invisible by default, privacy-friendly (none of the behavioral fingerprinting reCAPTCHA relies on), and drop-in compatible with the reCAPTCHA v2 API — so if you outgrow it, you can migrate later without rewriting any form code.

The setup is genuinely an afternoon’s work, with no DNS change, no Terraform, and no WAF integration required, and you don’t need to host the rest of your app on Cloudflare; see the Turnstile getting-started docs for details.
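
The only server-side work is verifying the token the widget submits with each form post. A minimal sketch in Python with requests; the environment variable and form handling are assumptions, adapt to your framework:

import os
import requests

VERIFY_URL = "https://challenges.cloudflare.com/turnstile/v0/siteverify"

def turnstile_ok(token: str, remote_ip: str | None = None) -> bool:
    """Ask Cloudflare whether the cf-turnstile-response token from the form is valid."""
    resp = requests.post(VERIFY_URL, data={
        "secret": os.environ["TURNSTILE_SECRET_KEY"],  # assumed env var, from the Turnstile dashboard
        "response": token,
        "remoteip": remote_ip or "",
    }, timeout=5)
    return resp.json().get("success", False)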

Turnstile stops being enough once you need finer control — per-request scores you can act on (not just pass/fail), enforcement at the load balancer instead of in app code, or rules more nuanced than “challenge passed / failed.” That’s the gap the reCAPTCHA Enterprise + Cloud Armor combination below fills.

reCAPTCHA Enterprise (deeper GCP integration)

reCAPTCHA Enterprise is Google’s managed bot-detection service — a descendant of classic reCAPTCHA, rebuilt as a score-based API rather than a pass/fail challenge. Every request gets a score from 0.0 (almost certainly a bot) to 1.0 (almost certainly human), plus a reason list (AUTOMATION, UNEXPECTED_USAGE_PATTERNS, LOW_CONFIDENCE, etc.). You decide what to do with it.

Two things make it the natural pick for a GCP-based stack:

  1. Invisible by default. Unlike the old reCAPTCHA v2 “click all the traffic lights” UI, Enterprise usually runs silently in the background. Users see no friction unless the score is low, at which point you can step up to a visible challenge.
  2. It integrates with Cloud Armor. Cloud Armor can evaluate a reCAPTCHA token as part of a security-policy rule expression, so you can enforce bot scores at the load balancer instead of in the app:
rule {
  action   = "deny(403)"
  priority = 500
  match {
    expr {
      expression = "token.recaptcha_action_token.valid && token.recaptcha_action_token.score < 0.5"
    }
  }
  description = "Block low-score bot traffic to sensitive endpoints"
}

The integration means your app code embeds the reCAPTCHA JS widget on forms, submits the resulting token with each request, and Cloud Armor does the score check before the request reaches the backend. The app itself doesn’t need to know whether the request is trusted.

There’s also a WAF-proper mode in Cloud Armor called bot management actions (redirect, googleRecaptcha), which issues a reCAPTCHA challenge inline at the edge — useful for the “suspicious traffic spike, challenge everyone until it subsides” scenario.

Google folded reCAPTCHA Enterprise into a broader Cloud Fraud Defense platform that adds AI-agent classification, an agentic policy engine, and AI-resistant challenges. Existing reCAPTCHA Enterprise integrations continue to work unchanged — Fraud Defense extends rather than replaces, with no migration required and no pricing change. Worth tracking if AI-agent traffic becomes a meaningful share of what hits your forms.

Classifier-only detection (no client-side integration)

The faster layer — no widget, no form embed, nothing to change in the app — is a pure classifier that evaluates requests at the edge using signals the network already sees: TLS fingerprint (JA3/JA4), HTTP header order and casing, request timing, IP reputation, volumetric anomalies. The model is an ML bot/human classifier; the request either gets through or doesn’t.

  • Cloud Armor Adaptive Protection is the GCP-native option. It’s an ML layer inside Cloud Armor that watches traffic to each of your backend services, detects volumetric L7 attacks and credential-stuffing / scraping patterns that don’t match any single WAF rule, and auto-generates a candidate Cloud Armor rule you can deploy (or have it auto-deploy) to block the offending traffic. Continuous, no app integration required, no user-visible friction. Downside: it’s pattern-based on traffic anomalies, so it catches volumetric abuse far better than low-and-slow attacks from a single determined actor.
  • Cloudflare Bot Management is the equivalent on Cloudflare — ML classifier on every request, scores from 1 (bot) to 99 (human), configurable rules on the score. Sits at the same layer as Cloudflare WAF.
  • Commercial platforms for sophisticated abuse: DataDome, HUMAN Security (formerly PerimeterX), Kasada. These combine server-side classifiers with client-side JS probes and are the right tier when you’re being specifically targeted (ticket scalping, account takeover at scale, price scraping from a competitor).

The right structure for most apps is classifier first, CAPTCHA as backstop: let the passive model handle the majority of traffic silently, and only escalate to a visible challenge for requests the classifier is uncertain about. reCAPTCHA Enterprise does this natively; combining Adaptive Protection (volumetric layer) with reCAPTCHA Enterprise (per-request-scored, escalation-capable) gives you both the edge filter and the challenge fallback.

Other alternatives

  • hCaptcha — reCAPTCHA v2 replacement, similar UI, more privacy-focused, pays websites for solved challenges. Used by Cloudflare before they built Turnstile.
  • reCAPTCHA v2 / v3 — the free, classic version (not Enterprise). Still widely used, but increasingly avoided for accessibility and privacy reasons, and lacks the Cloud Armor integration.
  • Arkose Labs — commercial, enterprise-grade, game-like challenges that are very hard to solve via solver-as-a-service farms. Expensive. Pick this only if your app is being targeted by sophisticated, determined bot operators.

CAPTCHAs are a quick and relatively cheap mechanism to filter the bulk of opportunistic bot traffic — most automated abuse is undirected, gives up the moment it hits a challenge, and any production app running a signup form without one is getting a non-trivial share of its signups from bots. They aren’t a complete solution, though: a determined attacker can pay CAPTCHA-solving services roughly $1–3 per 1000 challenges, a price that deters opportunistic abuse but not a targeted campaign, and visible challenges add friction, hurt conversion, and are genuinely hostile to some users from an accessibility standpoint.

So they’re necessary, not sufficient. The practical approach:

  • Scope them to high-value, abuse-prone endpoints — signup, login, password reset, contact forms, coupon redemption, anywhere the app sends email or creates a resource. Not to read-only pages.
  • Score-based integration (reCAPTCHA Enterprise, Turnstile) is vastly better than classic pass/fail challenges for UX — most real users never see the challenge.
  • Pair with rate limiting, not in place of it. A CAPTCHA at the login form and aggressive per-IP rate limiting together are far stronger than either alone.
  • Don’t CAPTCHA your internal admin pages. Put them behind authentication and IP allowlists instead — bot protection is for forms that have to stay public.

Beyond the edge — dependency scanning with OSV-Scanner

The previous layers either harden the edge (WAF, bot protection) or clean up the repo (secrets). None of them touch the known-vulnerable libraries already running inside your app — which is where most legacy exploitation actually starts. This is what dependency scanning fixes.

Why OSV-Scanner

OSV-Scanner is Google’s open-source CLI built on top of the OSV (Open Source Vulnerabilities) database. OSV is the vulnerability data source Dependabot, GitHub Security Advisories, GCP Container Analysis, and several other scanners query under the hood. It is a normalized, machine-readable schema aggregating advisories from npm, PyPI, Go, Cargo, RubyGems, Maven, Packagist, NuGet, Debian/Alpine/Ubuntu packages, GitHub Actions, and more.

It reads common lock files (package-lock.json, yarn.lock, Pipfile.lock, etc.), resolves each package at its exact pinned version, and reports vulnerabilities with CVE IDs, severity, and (where OSV has it) the version range that fixes them.

Running a scan and reading the output

OSV-Scanner can be installed via Go, Homebrew, a prebuilt binary, or — same as gitleaks above — run via Docker. We’ll show Docker for consistency.

I just ran this against the repo this article lives in:

docker run --rm -v "$(pwd):/src" ghcr.io/google/osv-scanner:latest \
    scan source --recursive /src

--recursive walks every directory under /src and finds lockfiles in nested projects (the blog, several demos, the server, draft experiments). Without it, the scanner only inspects the top level — fine for a single-project repo, but most real ones have lockfiles spread across several directories.

The summary at the end of the run looks like this:

Total 35 packages affected by 59 known vulnerabilities (2 Critical, 17 High, 37 Medium, 3 Low, 0 Unknown) from 2 ecosystems.
59 vulnerabilities can be fixed.

The full output is a table with one row per CVE × package × source-of-truth. A few representative rows:

OSV URL             | CVSS | Ecosystem | Package      | Version | Fixed version | Source
GHSA-xq3m-2v4x-88gg | 9.4  | npm       | protobufjs   | 7.5.4   | 7.5.5         | demos/transformers-js-demo/package-lock.json
GHSA-p9ff-h696-f583 | 8.2  | npm       | vite         | 7.3.1   | 7.3.2         | blog/package-lock.json
PYSEC-2025-40       | 7.5  | PyPI      | transformers | 4.48.3  | 4.49.0        | blog/…/from-scratch/requirements.txt
GHSA-r5fr-rjxr-66jc | 8.1  | npm       | lodash       | 4.17.23 | 4.18.0        | server/package-lock.json

Six fields per row, all of them useful:

  • OSV URL — links straight to the advisory at osv.dev (CVE ID, attack vector, affected version range, references).
  • CVSS — severity score on the 0–10 scale. 9+ is Critical, 7–8.9 is High, 4–6.9 is Medium.
  • Ecosystem — which package registry the dependency comes from. The same package name can exist in multiple ecosystems and get different CVEs.
  • Package and Version — what’s currently pinned in the lockfile.
  • Fixed version — the lowest version that resolves the issue. Often a patch upgrade; sometimes a minor or major.
  • Source — which lockfile the finding came from. Critical for monorepos: the same package can be pinned at different versions in different sub-projects, which is exactly what we see for vite, protobufjs, and picomatch in the actual scan.

Notice that the summary breaks the count two ways: 59 known vulnerabilities total, and 59 vulnerabilities can be fixed — meaning each one has a known upgrade target. In this scan they happen to match, which isn’t always the case.

The Fixed version column tells you whether there is an upgrade path for each row. If it’s populated, the scanner has already done the research; the remaining work is purely the upgrade — bump, test, ship. If it’s empty, you’ve hit a finding without a known fix yet, and the response is different. The next section walks through how to triage both kinds.

The scanner can also target a specific lockfile (--lockfile=/path/to/lockfile), an SBOM in SPDX or CycloneDX format (--sbom=/path/to/sbom.json), or a container image (osv-scanner scan image my-app:latest) — image scanning needs the local-CLI install rather than Docker, since the scanner needs direct filesystem access to the image being inspected.

What to do with the findings

Run it once on a legacy app and you’ll almost certainly get a wall of findings — easily dozens, sometimes hundreds. For each finding, two questions decide what to do: how serious is it, and is there a fix available.

Fixable findings. The bulk of what a scan turns up usually falls here, because actively-maintained packages get patched. Triage by severity:

  1. Critical/High with a known public exploit — patch first, regardless of whether you think you use that codepath. Public exploitation doesn’t care about your mental model of the code.
  2. Critical/High without a known exploit — patch on a normal cadence (next sprint).
  3. Medium/Low — bump on the next routine upgrade cycle. Don’t let these block the build or you’ll never get a green CI run on a legacy app.

For all of these, the work is the same shape: bump the pinned version to the Fixed version, run your tests, ship. Tools like Dependabot and Renovate (covered below) automate the PR creation step.
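
If the repo lives on GitHub, Dependabot's configuration is a small YAML file. A sketch with illustrative paths; point one entry at each directory that has a lockfile:

# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: "npm"
    directory: "/server"        # illustrative; one entry per lockfile directory
    schedule:
      interval: "weekly"
  - package-ecosystem: "pip"
    directory: "/blog"
    schedule:
      interval: "weekly"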

Unfixable findings. These need different handling and there are usually three flavors:

  1. No patch released yet — the vulnerability was disclosed but the maintainer hasn’t shipped a fix. Track it, re-scan periodically, and watch the OSV advisory page for updates.
  2. Unmaintained package — the project is dead. Either replace the dependency, or live with it consciously and add a documented suppression in osv-scanner.toml so the scan doesn’t re-flag it on every CI run.
  3. Transitive dependency you can’t directly upgrade — a vulnerable package is pulled in by something else that hasn’t bumped to a newer parent version. This is exactly the case where OSV-Scanner’s output pairs with the WAF: if there’s no upgrade path and the vulnerability has a known exploit pattern, write a WAF rule blocking that pattern as a virtual patch until the upstream fix lands.

The split matters operationally: fixable findings are work for your build pipeline (Dependabot PRs, automated upgrades). Unfixable findings are work for a human: deciding to suppress, replace, or virtually patch — and they’re the ones worth watching for, because they don’t go away on their own.
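
For the "live with it consciously" case above, the suppression lives in an osv-scanner.toml next to the lockfile it applies to (or passed with --config). A sketch with an illustrative ID:

# osv-scanner.toml
[[IgnoredVulns]]
id = "GHSA-xxxx-xxxx-xxxx"  # illustrative; use the ID from the scan output
reason = "Unmaintained transitive dependency, no upgrade path; virtual-patched at the WAF. Re-evaluate next quarter."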

Alternatives and when to pick each

  • GCP Artifact Analysis — the GCP-native option, and probably the lowest-friction scanner you can add if you’re already on GCP. Artifact Registry auto-scans images on push using the OSV database (same data source as OSV-Scanner above), and continuous analysis keeps rescanning stored images as new vulnerabilities land in OSV — so you find out about newly-disclosed CVEs in images you shipped months ago, without re-pushing. Findings show up in the Artifact Registry UI and as Pub/Sub notifications. No extra tool to operate.
  • Trivy — a genuine superset if you want one tool. Scans containers, filesystems, git repos, Kubernetes configs, IaC (Terraform/CloudFormation), and secrets. Pick Trivy if you also need container and IaC scanning and want a single operational tool that runs anywhere.
  • Dependabot — if you’re on GitHub, it’s already wired up. Opens PRs to upgrade vulnerable dependencies automatically. Use it alongside OSV-Scanner: Dependabot for the auto-PRs, OSV-Scanner in CI for gating the build on severity.

A reasonable minimal stack: OSV-Scanner in CI (gates the build on high-severity new findings) + Dependabot (continuously opens upgrade PRs). If you already ship containers, either add Trivy for image scanning, or — if you push to a cloud registry — lean on the registry’s built-in scanner (Artifact Analysis on GCP, ECR + Inspector on AWS) rather than operating another tool.

Scanners aren’t supply-chain hygiene

One important gap worth naming: scanners catch known CVEs in pinned dependencies. They don’t catch supply-chain attacks — malicious packages, compromised maintainer accounts, typosquatted names (lodsh vs lodash), post-install scripts that exfiltrate credentials. These are a different class of threat and require different defenses: lockfile enforcement, disabling install-time lifecycle scripts, reviewing new dependencies before adding them, 2FA on package-registry accounts.

The OWASP NPM Security Cheat Sheet is a useful concrete checklist for this if your stack is Node.js — covering --ignore-scripts, npm audit, scoped packages, typosquat-spotting, and more. OWASP’s Vulnerable Dependency Management Cheat Sheet covers the same ground language-agnostically. Both pair well with the scanner-based layer above: scanners for known-bad, hygiene for unknown-bad.
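
The single highest-leverage item on those checklists for Node.js is disabling install-time lifecycle scripts, since post-install hooks are the usual exfiltration vehicle. One line of config, checked into the repo (note it also skips your own package's install scripts, so some teams prefer passing --ignore-scripts per install instead):

# .npmrc at the repo root
ignore-scripts=true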

What this stack covers, and what could go further

If you deploy the three core layers plus dependency scanning, what have you actually bought yourself?

Reasonably well covered:

  • Opportunistic automated scanners and mass exploitation campaigns.
  • Known CVEs in dependencies — as long as you patch them when findings come in.
  • Leaked credentials in the repo (current and historical).
  • SQL injection, XSS, path traversal, and command injection at request time — blocked by the WAF before they reach the app.
  • The obvious, volume-based layer of bot abuse on public forms.

Still missed:

  • Vulnerable patterns inside your own code. The WAF blocks payloads at request time but doesn’t help you find or fix the underlying vulnerable code. SAST and DAST below are how you start to close that gap.
  • App-logic flaws — anything that requires knowing what the app is supposed to do. IDOR (a request to /orders/1234 when the user should only see 1233), broken authentication or session management, missing or inconsistent authorization checks, business-logic bugs like skipping the payment step by replaying a cart-checkout request. None of these look malicious to a scanner.
  • Novel zero-days in the app or its dependencies, before signatures exist.
  • Determined adversaries. Insiders abusing legitimate access. Well-resourced bot operators who pay CAPTCHA-solving services and rotate IPs faster than you can blocklist them.

What to add next: SAST and DAST

Once the three core layers and dependency scanning are in place, the natural next step is analyzing the code itself rather than only the traffic flowing through it. Two complementary techniques:

  • SAST (Static Application Security Testing) reads the source statically and pattern-matches for risky constructs — SQL queries built with string concatenation, missing authorization checks, shell commands assembled from user input, unvalidated redirects, hardcoded credentials. Semgrep is the open-source default; community-maintained rulesets like p/security-audit and p/owasp-top-ten give you coverage across most languages without the setup of CodeQL or SonarQube. Run it in CI and gate the build on high-severity findings.
  • DAST (Dynamic Application Security Testing) does the inverse — it runs the app and probes it from the outside, crawling routes and trying real attack payloads. Powerful for a black-box app precisely because it needs no source code; the scanner is a synthetic attacker. OWASP ZAP is the open-source default, with a passive baseline scan that’s safe against staging or production, and a destructive full active scan that should only ever run against staging with throwaway data. Burp Suite is the commercial industry standard; Nuclei complements either with fast template-based CVE-pattern coverage.

A reasonable extension stack: Semgrep on every PR, ZAP baseline on every deploy to staging, ZAP full authenticated scan on a nightly or weekly schedule. Authenticated scanning matters disproportionately for DAST on a legacy app — an unauthenticated scan only sees the login page; the interesting endpoints are all behind it.
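
Both open-source defaults run the same way the earlier tools did, as Docker one-liners you can drop into CI. A sketch, with the staging URL as a placeholder:

# SAST: Semgrep with the community security rulesets
docker run --rm -v "$(pwd):/src" semgrep/semgrep \
    semgrep scan --config p/security-audit --config p/owasp-top-ten

# DAST: ZAP baseline scan (passive checks only, safe against a live site)
docker run --rm -t ghcr.io/zaproxy/zaproxy:stable \
    zap-baseline.py -t https://staging.example.com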