Posted 2026-05-20 · ~10 min read

CASCADE and the LLM-deobfuscator question

In July 2025 a Google research team published CASCADE: LLM-Powered JavaScript Deobfuscator at Google. The paper is operational, not theoretical — CASCADE runs in Google's environment, combining Gemini with a JavaScript Intermediate Representation (JSIR) to deobfuscate JavaScript at scale. If you sell JavaScript protection in 2026, you owe customers a straight answer about how the protected output holds up.

This post is that answer. We'll walk through what CASCADE actually does, where the "per-build polymorphism" framing common in our category (including some of JSO's own marketing) is honest and where it isn't, and what's on the JSO roadmap to address the attack class CASCADE represents.

What CASCADE does

The paper's core observation: every JavaScript obfuscator emits a small amount of scaffolding — the string-array initializer, the decoder function, the control-flow flattening dispatcher, the dead-code marker. The authors call these prelude functions. Once you identify which functions in the obfuscated output are prelude versus business logic, the remaining work is a tractable compiler problem: symbolically execute the prelude, recover the decoded strings and the original control flow, write the result back as IR.
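
To make the prelude/body split concrete, here is a hand-written toy in the style of string-array obfuscation. It is a sketch for illustration only, not output from JSO or from any real obfuscator, and the names _0x1a2b, _0xdec and checkLicense are invented.

```javascript
// --- Prelude: scaffolding a hypothetical obfuscator might emit ---
// String array; every character is stored shifted by +1.
const _0x1a2b = ["mjdfotf", "wbmje", "fyqjsfe"]; // "license", "valid", "expired"

// Decoder: looks up an entry by offset index and undoes the character shift.
function _0xdec(i) {
  const raw = _0x1a2b[i - 0x64];
  return raw
    .split("")
    .map((c) => String.fromCharCode(c.charCodeAt(0) - 1))
    .join("");
}

// --- Body: business logic, readable once the decoder calls are resolved ---
function checkLicense(record) {
  return record[_0xdec(0x64)] === _0xdec(0x65) ? _0xdec(0x65) : _0xdec(0x66);
}

// What a deobfuscator aims to recover after inverting the prelude:
function checkLicenseRecovered(record) {
  return record["license"] === "valid" ? "valid" : "expired";
}
```

Everything above the body exists only to hide three string literals. Once the array and decoder are identified as prelude, recovering checkLicenseRecovered is mechanical.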

Prior deobfuscators tried to detect preludes via "hundreds to thousands of hardcoded rules" hand-written for each known obfuscator vendor. CASCADE replaces that rule pile with an LLM. Gemini reads the obfuscated source and tags candidate prelude functions; the JSIR pass then does the deterministic part. Two complementary strengths: pattern recognition where rules don't generalize, and compiler-correctness where pattern recognition is sloppy.
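
The two-stage split can be sketched in a few lines. This is a toy under loud assumptions: tagPreludes is a crude textual heuristic standing in for the LLM, inlineDecoderCalls uses regex rewriting where the paper uses a compiler IR, and neither reflects CASCADE's actual implementation. All names are hypothetical.

```javascript
// Toy obfuscated input, held as source text.
const obfuscated = `
const _t = ["ifmmp", "xpsme"];
function _d(i) { return _t[i].replace(/./g, (c) => String.fromCharCode(c.charCodeAt(0) - 1)); }
function greet() { return _d(0) + " " + _d(1); }
`;

// Stage 1 (stand-in for the LLM): name the functions suspected of being preludes.
// A real system would ask a model; this stub flags functions that return an array lookup.
function tagPreludes(source) {
  const names = [];
  const fnPattern = /function\s+(\w+)\s*\(\w*\)\s*\{\s*return\s+\w+\[\w+\]/g;
  for (const m of source.matchAll(fnPattern)) names.push(m[1]);
  return names;
}

// Stage 2 (deterministic): evaluate the tagged decoder and inline every call
// that has a literal integer argument. Executing untrusted source like this is
// unsafe outside a sandbox; it is acceptable here only because the input is ours.
function inlineDecoderCalls(source, decoderName) {
  const decoder = new Function(`${source}; return ${decoderName};`)();
  const callPattern = new RegExp(`\\b${decoderName}\\((\\d+)\\)`, "g");
  return source.replace(callPattern, (_, arg) => JSON.stringify(decoder(Number(arg))));
}

const [decoderName] = tagPreludes(obfuscated);
const cleaned = inlineDecoderCalls(obfuscated, decoderName);
```

After the second stage, greet returns a concatenation of plain string literals and the decoder is dead code. The tagging step is the only part that needs judgment; the rewriting step is purely mechanical.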

The contribution is the architecture. The result is that the moat once provided by polymorphic prelude shapes — "every build has a different decoder, so pattern-matching deobfuscators can't keep up" — is dramatically smaller than it was in 2023. An LLM that has seen enough builds learns the shape category, not the specific build.

Where this leaves honest claims and dishonest claims

Almost every commercial JavaScript obfuscator — ours included — markets some flavour of "every build is different, so LLMs can't deobfuscate it." That claim has two interpretations, only one of which survives CASCADE.

Honest version: per-build polymorphism raises the cost of a one-shot deobfuscation. An attacker who solves Build A's decoder by hand cannot copy-paste that work onto Build B. This is still true after CASCADE. What CASCADE shows is that manual per-build work is no longer the bottleneck: once the LLM has internalized the prelude category, the attacker's cost for each new build drops enormously.

Dishonest version: "polymorphism makes the output unreadable to AI." It doesn't. CASCADE reads it, identifies the prelude class, and hands the IR pass a tractable transformation problem. Anybody whose marketing leans on this claim is selling something that the published research already contradicts.

JSO's Maximum-mode output absolutely benefits from per-build polymorphism — that's why we built it. But against a CASCADE-class attacker the right claim is "polymorphism slows them down by some constant factor", not "polymorphism stops them". The first is true and useful; the second is false and we shouldn't say it.

What actually resists the CASCADE attack class

The CASCADE attack has a structural prerequisite: the obfuscator's protection must be reducible to a small number of identifiable prelude functions plus a body of code that, once preludes are inverted, looks more or less like normal JavaScript again. Three lines of defense erode that prerequisite:

1. Make the prelude / body boundary fuzzy

A clean prelude function whose only job is to set up a string array is easy for an LLM to label. A prelude whose work is interleaved with real business logic is much harder: for example, the string-array decoder also computes a value the rest of the code depends on, or the control-flow dispatcher's state machine encodes part of the application's domain state. The LLM must either label the whole program "prelude" (the IR pass then has nothing to remove) or label nothing as prelude (the IR pass then has nothing to invert).

This is a real research direction, not just marketing. JSO's current FlatTransform + DeepObfuscate already partially interleaves the two; making the interleaving stronger is in scope.
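
A minimal sketch of the idea, assuming a single constant that serves both the decoder and the application. This is not how FlatTransform or DeepObfuscate work internally; K, table, decode and orderTotal are hypothetical names chosen for the example.

```javascript
// K plays two roles: it is the decoder's shift key AND a business parameter.
const K = 3;
const table = ["sulfh", "xqlwv"]; // "price" and "units", each character shifted by +K

// Decoder: undoes the +K shift on the requested table entry.
function decode(i) {
  return table[i].replace(/./g, (c) => String.fromCharCode(c.charCodeAt(0) - K));
}

function orderTotal(order) {
  // K * 100 is a flat handling fee, so K cannot be discarded as decoder-only data.
  return order[decode(0)] * order[decode(1)] + K * 100;
}
```

A tool that labels K as prelude data and removes it along with the decoder breaks orderTotal. A tool that keeps K has to reason about which of its uses belong to the application, and that is exactly the judgment a clean prelude never demands.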

2. Make the decoder non-extractable

CASCADE assumes the prelude can be symbolically executed once it's identified. A decoder that reads from the DOM, the current URL, a fingerprint of the runtime environment, or a server-issued nonce cannot be symbolically executed without the surrounding context. The deobfuscator gets the prelude's shape but not its output.

JSO's Runtime Defense already exposes the building blocks: RuntimeFingerprint, RuntimeChallengeSecret, RuntimeSessionToken. Wire your sensitive code paths so they decode through one of these and a CASCADE-style attacker has nothing to symbolically execute. An attacker who controls a live environment can still capture the runtime values, so this does nothing against an adversary with access to a running instance, but it does defeat the offline-deobfuscation scenario CASCADE optimizes for.
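
Here is a rough sketch of why a context-dependent decoder resists offline analysis. It does not use or describe the JSO Runtime Defense API; the nonce value and the functions deriveKey, xorBytes, protect and reveal are all assumptions made for illustration. Repeating-key XOR is also not real cryptography; it only stands in for "the output depends on a value the static bundle does not contain".

```javascript
// Derive a small repeating key from a runtime-only value (a hypothetical server nonce).
function deriveKey(nonce) {
  return Array.from(nonce, (ch) => ch.charCodeAt(0) & 0xff);
}

// XOR each byte with the repeating key. Applying it twice with the same key round-trips.
function xorBytes(bytes, key) {
  return bytes.map((b, i) => b ^ key[i % key.length]);
}

// Server side: encode the secret under a nonce that only the live session receives.
function protect(plaintext, nonce) {
  const bytes = Array.from(plaintext, (ch) => ch.charCodeAt(0));
  return xorBytes(bytes, deriveKey(nonce));
}

// Client side: decodes correctly only when given the same nonce.
function reveal(encoded, nonce) {
  return String.fromCharCode(...xorBytes(encoded, deriveKey(nonce)));
}

const shipped = protect("api/v2/license", "nonce-7f3a");
const atRuntime = reveal(shipped, "nonce-7f3a"); // recovers the original string
const offline = reveal(shipped, "wrong-nonce"); // garbage: the key is not in the bundle
```

An offline tool can see reveal and understand its shape perfectly well. What it cannot do is evaluate it, because the nonce never appears in the code it was handed.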

3. Replace the decoder with bytecode

The strongest defense is the one that doesn't have a recognizable prelude at all. A virtualized function has been compiled to a per-build bytecode and ships with a per-build interpreter. The LLM can identify the interpreter — that's just a loop with a switch — but inverting it requires understanding the opcode encoding, which is itself polymorphic per build. The compiler-IR pass that worked against a string-array decoder doesn't help here because the protected behaviour isn't a pure transformation of static data; it's a custom-instruction-set execution.

This is exactly the threat model that motivated JSO's VM bytecode protection beta. The roadmap commitment was always to ship it for Corporate-tier customers who have a small number of high-value functions (license validation, anti-tamper checks, key derivation) where the per-function overhead is acceptable. CASCADE is the published research that justifies the priority.
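
To show what "per-build bytecode with a per-build interpreter" means at its smallest, here is a toy stack machine whose opcode numbering is shuffled from a build seed. It is a sketch under stated assumptions, not the design of JSO's VM protection: the four-instruction set, the seed-driven shuffle and every function name are invented for the example.

```javascript
// Logical instruction set of a tiny stack machine.
const OPS = ["PUSH", "ADD", "MUL", "RET"];

// Per-build step: give each logical op a numeric code, shuffled by a build seed.
function makeOpcodeMap(seed) {
  const codes = OPS.map((_, i) => i);
  let s = seed >>> 0;
  // Deterministic Fisher-Yates shuffle driven by a small linear congruential generator.
  for (let i = codes.length - 1; i > 0; i--) {
    s = (s * 1664525 + 1013904223) >>> 0;
    const j = s % (i + 1);
    [codes[i], codes[j]] = [codes[j], codes[i]];
  }
  return Object.fromEntries(OPS.map((name, i) => [name, codes[i]]));
}

// Compile a symbolic program into a flat array of numbers using this build's codes.
function compile(program, opcodes) {
  return program.flatMap(([name, ...args]) => [opcodes[name], ...args]);
}

// Interpreter: a loop with a switch, specialised to this build's opcode numbering.
function run(bytecode, opcodes) {
  const stack = [];
  let pc = 0;
  while (pc < bytecode.length) {
    const op = bytecode[pc++];
    switch (op) {
      case opcodes.PUSH: stack.push(bytecode[pc++]); break;
      case opcodes.ADD: stack.push(stack.pop() + stack.pop()); break;
      case opcodes.MUL: stack.push(stack.pop() * stack.pop()); break;
      case opcodes.RET: return stack.pop();
      default: throw new Error(`unknown opcode ${op}`);
    }
  }
  throw new Error("program ended without RET");
}

// (2 + 3) * 4, expressed symbolically, then lowered for one particular build.
const program = [["PUSH", 2], ["PUSH", 3], ["ADD"], ["PUSH", 4], ["MUL"], ["RET"]];
const buildMap = makeOpcodeMap(20260520);
const result = run(compile(program, buildMap), buildMap);
```

The shipped artifact is only the array of numbers plus the interpreter. A different seed will generally produce a different numbering, so work spent decoding one build's bytecode does not carry over directly to the next. A production VM would go much further than renumbering four opcodes, but the structural point is the same: there is no string table to invert, only a program in an unfamiliar instruction set.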

Hot-path versus cold-path protection

None of the above is free. Interleaved preludes cost more bytes; non-extractable decoders cost a runtime round-trip on the first call; VM bytecode costs an order-of-magnitude slowdown. CASCADE doesn't change the cost calculus, but it does change which code paths deserve the expensive treatment.

  • Hot path (rendering loops, animation tick handlers, anything in the critical request path): Maximum-mode polymorphic transforms are still the right answer. CASCADE can probably deobfuscate them given enough effort. But hot code is usually hot because it is high-volume and low-secret-density, so the cost of an attacker successfully reading it is lower than the cost of slowing every customer's app down to protect it.
  • Cold path (license checks, anti-tamper, key derivation, watermarking, payment-flow guards): This is where VM bytecode earns its overhead. A function called twice per session can absorb a 50× slowdown invisibly, and these are the functions whose secret-density is highest.
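
The hot/cold split comes down to simple arithmetic. The sketch below uses the 50× slowdown mentioned above; the call counts and per-call timings are made-up figures chosen only to show the shape of the trade-off, not measurements of JSO.

```javascript
// Extra time per session, in microseconds, added by a protection with the given slowdown factor.
function addedMicros(callsPerSession, baseMicrosPerCall, slowdown) {
  return callsPerSession * baseMicrosPerCall * (slowdown - 1);
}

// Cold path: a license check called twice per session, 200 µs unprotected (assumed).
const coldCost = addedMicros(2, 200, 50); // 19,600 µs, about 20 ms

// Hot path: a tick handler at 60 calls/s over a 5-minute session, 50 µs unprotected (assumed).
const hotCost = addedMicros(60 * 300, 50, 50); // 44,100,000 µs, about 44 s
```

Under these assumptions the cold-path function absorbs virtualization in roughly 20 ms per session, while the same treatment on the hot path would add about 44 seconds. The call count dominates the result, which is why the expensive protection belongs on the rarely called, secret-dense functions.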

If your team is on JSO and concerned about CASCADE-class threats, the move is: keep Maximum mode on for everything, mark the small set of high-value functions for VM virtualization, and wire the runtime-defense locks (RuntimeSessionToken, RuntimeFingerprint) into the boot-time paths so an offline deobfuscator can't symbolically execute them.

What about the legitimate use-cases for CASCADE itself?

The paper's framing is squarely defensive: CASCADE supports security analysis — malware deobfuscation, supply-chain auditing, reverse-engineering for vulnerability research. JSO ships protection that any defender's CASCADE-class tool could plausibly target, and that's fine. The presence of well-resourced deobfuscators is a feature of the ecosystem we work in, not a betrayal of it.

What we owe customers is honest framing about which threats their protection actually blunts and which it doesn't. A determined attacker with Google-scale tooling will eventually read your bundle; protection's job is to make that read expensive enough that the attacker spends their time elsewhere. CASCADE moves the goalposts on what "expensive enough" means, and the response is to upgrade defenses where the asset value justifies it.

Roadmap impact

Three line items the CASCADE paper has bumped up our priority list:

  1. VM bytecode protection — already in beta, moving toward general availability for Corporate+ tier in 2026-Q3. Docs.
  2. Prelude-interleaving transform — new research item: deliberately merge string-array decoders and control-flow dispatchers with surrounding business logic so the LLM-identification step has nothing clean to label. Tracking the implementation difficulty; no commit date yet.
  3. Symbolic-execution detection at runtime — new research item: detect when the protected code is being executed in a non-browser symbolic-evaluator context and refuse to decode. Adjacent to DetectHeadlessBrowser; would extend it.

All three appear in the public roadmap under the Research section. If your security model puts you in the CASCADE-target population, please email [email protected] — the threat-model conversations directly shape this roadmap.

References: Yan, Sherzod, et al. "CASCADE: LLM-Powered JavaScript Deobfuscator at Google." arXiv preprint arXiv:2507.17691, 2025.