Reversing JavaScript control-flow flattening
Why obfuscators flatten control flow, the canonical switch-dispatch pattern, and the compiler trick that collapses it back into readable code.
Open any aggressively obfuscated JavaScript sample written in the last decade and you will probably see the same shape: a while (true) loop wrapped around a giant switch statement, with case labels driven by an integer dispatch variable that gets reassigned in almost every branch. That is control-flow flattening (CFF), and it is the single most common reason a function that “obviously does something simple” looks like an absolute mess.
This post walks through what CFF actually does to a function, why it works against analysts even though it is invisible to the runtime, and the compiler trick that collapses it back into readable code. The trick is not a clever pattern matcher. It is just constant propagation.
The switch dispatch pattern
A normal function has a control-flow graph (CFG) that mirrors its source: branches go where the syntax says they go. CFF rewrites that graph so every basic block becomes a labelled case in one giant switch, and the only branching mechanism is a single integer variable that the dispatcher reads on each iteration.
Take this trivial function:
function signOf(x) {
  if (x > 0) return 1;
  if (x < 0) return -1;
  return 0;
}
After CFF, it looks something like:
function signOf(x) {
  var state = 0;
  var result;
  while (true) {
    switch (state) {
      case 0:
        if (x > 0) { state = 1; break; }
        state = 2; break;
      case 1:
        result = 1;
        state = 5; break;
      case 2:
        if (x < 0) { state = 3; break; }
        state = 4; break;
      case 3:
        result = -1;
        state = 5; break;
      case 4:
        result = 0;
        state = 5; break;
      case 5:
        return result;
    }
  }
}
The runtime walks exactly the same paths. The execution trace is identical. But the source no longer has any local indication of which branch leads where. To follow signOf(-7) mentally you have to hop through four of the six case labels (0, 2, 3, 5) and keep state in your head.
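One way to see the path a call takes is to instrument the dispatcher. The sketch below is the flattened signOf from above with a hypothetical trace array added (it is not part of the original example) that records every value of state the loop dispatches on.

```javascript
// Flattened signOf from above, plus a hypothetical `trace` array
// (added for illustration) that records each dispatched state.
function signOfTraced(x) {
  var state = 0;
  var result;
  var trace = [];
  while (true) {
    trace.push(state);
    switch (state) {
      case 0:
        if (x > 0) { state = 1; break; }
        state = 2; break;
      case 1:
        result = 1;
        state = 5; break;
      case 2:
        if (x < 0) { state = 3; break; }
        state = 4; break;
      case 3:
        result = -1;
        state = 5; break;
      case 4:
        result = 0;
        state = 5; break;
      case 5:
        return { result: result, trace: trace };
    }
  }
}

console.log(JSON.stringify(signOfTraced(-7)));
// {"result":-1,"trace":[0,2,3,5]}
```

The recorded sequence is the list of basic blocks the unflattened function would have executed, in the same order.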
Real samples make it worse. Production CFF passes:
- Permute case labels (so state = 1 does not mean “the first thing”)
- Mix dispatch updates with unrelated assignments
- Add unreachable cases as decoys
- Use string keys or hashed integers instead of small ints
- Combine with opaque predicates that look like real branches
The unifying property is always the same: a single dispatch variable encodes the original CFG, and the rest is dressing.
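To make the list concrete, here is a hand-written variant of the flattened signOf with several of those tricks applied: permuted string labels, a decoy case, and an unrelated assignment next to a dispatch update. The labels and the junk variable are made up for illustration and are not the output of any particular obfuscator.

```javascript
// Hand-written illustration only; labels and `junk` are hypothetical.
function signOfHardened(x) {
  var state = "k7";  // entry label; labels are arbitrary strings
  var junk = 0;      // noise variable, never used for the real result
  var result;
  while (true) {
    switch (state) {
      case "q2":     // was case 3
        result = -1;
        state = "zz"; break;
      case "k7":     // was case 0 (entry)
        junk = x | 0; // unrelated assignment mixed into the block
        if (x > 0) { state = "m1"; break; }
        state = "a9"; break;
      case "zz":     // was case 5 (exit)
        return result;
      case "d4":     // decoy: no case ever writes "d4" to state
        result = junk * 31;
        state = "k7"; break;
      case "a9":     // was case 2
        if (x < 0) { state = "q2"; break; }
        state = "c3"; break;
      case "m1":     // was case 1
        result = 1;
        state = "zz"; break;
      case "c3":     // was case 4
        result = 0;
        state = "zz"; break;
    }
  }
}

console.log(signOfHardened(-7), signOfHardened(0), signOfHardened(7));
// -1 0 1
```

The dressing changes, but state still only ever receives constants, which is the property the rest of this post relies on.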
Why it works on humans, not the runtime
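The runtime barely cares. An optimizing JIT such as V8's can often inline a function like this and, when x is a constant, fold the dispatch variable until little or nothing of the switch is left in the machine code. Unoptimized code still pays for the extra loop and dispatch, so the cost is not literally zero, but it is small next to what the analyst pays.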
The cost on an analyst is everything. The flattened form defeats the two cheapest forms of program understanding:
- Linear reading. You cannot read top to bottom and follow the logic, because the next basic block depends on a variable rather than on lexical adjacency.
- AST-driven tooling. A pretty-printer or AST simplifier sees a while-switch and stops. The structure it preserves is the wrong structure. The interesting structure is the implicit graph encoded in state.
CFF is a cheap obfuscation in production cost and an expensive one in analyst cost. That asymmetry is why it appears in essentially every commercial JS obfuscator and most malware loaders.
The reversal trick
The interesting observation is that the dispatch variable contains all the information you need to recover the original CFG, and you can recover it without any pattern matching on the obfuscator.
In the flattened form, each case ends by writing some constant value to state and then breaking out of the switch. That write is a transition edge in the original CFG. If you can statically determine, for every case, what values state can take when control leaves that case, you have reconstructed the original control flow.
That determination is exactly what a constant-propagation pass on an SSA representation does:
- Lift the function into SSA. Each definition of state becomes a unique versioned variable (state_1, state_2, and so on).
- For every case, the dispatch variable at the end of the case is either a constant or a phi node selecting between constants. Both are trivial to fold.
- Use the folded values as edges. The graph you have just drawn is the original CFG, with one edge per case-to-case transition.
- Drop the switch wrapper, restructure the basic blocks back into nested control flow (if, while, return), and emit JavaScript.
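The edge-recovery part of those steps can be sketched on a toy representation. The mini-IR below is hypothetical (an illustrative data layout, not any tool's real IR), and instead of building full SSA it propagates the last constant written to state along each path through a case body. On this example that gives the same answer a phi node over constants would.

```javascript
// Hypothetical mini-IR for the flattened signOf. Helper names are made up.
const setState = (n) => ({ op: "set", name: "state", value: { op: "const", value: n } });
const setOther = (name, src) => ({ op: "set", name, value: { op: "opaque", src } });
const brk = { op: "break" };
const ret = { op: "return" };
const iff = (test, then, otherwise = []) => ({ op: "if", test, then, otherwise });

const cases = {
  0: [iff("x > 0", [setState(1), brk]), setState(2), brk],
  1: [setOther("result", "1"), setState(5), brk],
  2: [iff("x < 0", [setState(3), brk]), setState(4), brk],
  3: [setOther("result", "-1"), setState(5), brk],
  4: [setOther("result", "0"), setState(5), brk],
  5: [ret],
};

// Walk every path through one case body, carrying the last constant written
// to `state`. Each `break` that is reached contributes one outgoing edge.
function exitStates(body, current, out) {
  for (let i = 0; i < body.length; i++) {
    const stmt = body[i];
    if (stmt.op === "set" && stmt.name === "state") {
      // Non-constant writes are recorded as unknown rather than guessed.
      current = stmt.value.op === "const" ? stmt.value.value : undefined;
    } else if (stmt.op === "break") {
      out.add(current === undefined ? "unknown" : current);
      return;
    } else if (stmt.op === "return") {
      return; // leaves the function: no dispatch edge
    } else if (stmt.op === "if") {
      const rest = body.slice(i + 1);
      exitStates(stmt.then.concat(rest), current, out);
      exitStates(stmt.otherwise.concat(rest), current, out);
      return;
    }
  }
  out.add("fallthrough"); // body ended without break or return
}

function dispatchEdges(allCases) {
  const edges = {};
  for (const label of Object.keys(allCases)) {
    const out = new Set();
    // On entry to case k the switch has just matched, so state === k.
    exitStates(allCases[label], Number(label), out);
    edges[label] = [...out];
  }
  return edges;
}

console.log(JSON.stringify(dispatchEdges(cases)));
// {"0":[1,2],"1":[5],"2":[3,4],"3":[5],"4":[5],"5":[]}
```

Nothing in the sketch knows which obfuscator produced the input. It only tracks what values one variable can hold at each break.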
The output of that pipeline on the signOf example, with no obfuscator-specific code anywhere in the analyzer:
function signOf(x) {
  if (x > 0) return 1;
  if (x < 0) return -1;
  return 0;
}
Identical to the original. The CFF pass was undone not by a “reverse CFF” routine but by a generic optimization pass that any compiler textbook would describe.
A small walkthrough
To make the mechanism concrete, here is the dispatch graph that falls out of the example above. Treat each case as a node. For each transition, write down which value state is set to before the break:
- case 0 writes 1 if x > 0, otherwise 2. Outgoing edges: {1, 2}.
- case 1 writes 5. Outgoing edges: {5}.
- case 2 writes 3 if x < 0, otherwise 4. Outgoing edges: {3, 4}.
- case 3 writes 5. Outgoing edges: {5}.
- case 4 writes 5. Outgoing edges: {5}.
- case 5 returns.
That graph is the original control flow. The case bodies are the original basic blocks with their dispatch updates stripped. Reassembling into structured code is a textbook problem (the inverse of the structured-programming theorem) and gives you back the if-if-return shape.
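For a graph this small, reassembly can be sketched as a recursive walk from the entry block. The blocks table below is a hypothetical encoding of the recovered graph. The emitter assumes the graph is acyclic and that every terminal block returns, so it can place the fall-through code after an if without an else. Loops need a real structuring algorithm, which this sketch does not attempt.

```javascript
// Hypothetical recovered graph for signOf: bodies are the case bodies with
// dispatch writes stripped; `next` is the folded transition.
const blocks = {
  0: { body: [], next: { test: "x > 0", then: 1, otherwise: 2 } },
  1: { body: ["result = 1;"], next: { goto: 5 } },
  2: { body: [], next: { test: "x < 0", then: 3, otherwise: 4 } },
  3: { body: ["result = -1;"], next: { goto: 5 } },
  4: { body: ["result = 0;"], next: { goto: 5 } },
  5: { body: ["return result;"], next: null },
};

// Emit nested code by following edges from a block. Shared successors
// (block 5 here) are duplicated into each path that reaches them.
function emit(id, indent, onPath = new Set()) {
  if (onPath.has(id)) throw new Error("cycle at block " + id);
  const path = new Set(onPath).add(id);
  const { body, next } = blocks[id];
  const lines = body.map((stmt) => indent + stmt);
  if (next === null) return lines;
  if ("goto" in next) return lines.concat(emit(next.goto, indent, path));
  return lines.concat(
    [indent + "if (" + next.test + ") {"],
    emit(next.then, indent + "  ", path),
    [indent + "}"],
    emit(next.otherwise, indent, path)
  );
}

const source = ["function signOf(x) {", "  var result;"]
  .concat(emit(0, "  "), ["}"])
  .join("\n");
console.log(source);
```

The emitted function still routes its value through result. A later copy-propagation pass would be the natural place to turn that into direct return statements.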
The work here lives entirely in tracking state. Everything else is bookkeeping.
Where it breaks down
The toy walkthrough makes it sound trivial. Real samples are harder for specific reasons, and these are where most naive deobfuscators give up:
Computed dispatch. If the obfuscator computes state with an arithmetic expression (state = (a ^ b) + 1), constant propagation can only resolve it when a and b are themselves constants on the relevant path. You need actual interprocedural value analysis, not just local folding.
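A minimal sketch of the local half of that problem: folding the dispatch expression once the operands are known. The expression-tree shape and the env map (variable name to constant known on the current path) are assumptions made for illustration. Working out what belongs in env is the hard part and is not shown.

```javascript
// Fold a small expression tree to a constant when every leaf is known.
// Returns undefined when any operand is not a known constant.
function fold(expr, env) {
  switch (expr.op) {
    case "const":
      return expr.value;
    case "var":
      return Object.prototype.hasOwnProperty.call(env, expr.name)
        ? env[expr.name]
        : undefined;
    case "^":
    case "+": {
      const left = fold(expr.left, env);
      const right = fold(expr.right, env);
      if (left === undefined || right === undefined) return undefined;
      return expr.op === "^" ? (left ^ right) : (left + right);
    }
    default:
      return undefined; // operator not modelled in this sketch
  }
}

// state = (a ^ b) + 1
const dispatchUpdate = {
  op: "+",
  left: {
    op: "^",
    left: { op: "var", name: "a" },
    right: { op: "var", name: "b" },
  },
  right: { op: "const", value: 1 },
};

console.log(fold(dispatchUpdate, { a: 6, b: 3 })); // 6
console.log(fold(dispatchUpdate, { a: 6 }));       // undefined
```

When the result is undefined, the edge has to stay unresolved until a stronger analysis supplies the missing operand.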
Opaque predicates as dispatch updates. state = some_tautology ? 3 : 99 where 99 is never reachable. A constant-folding pass without an opaque-predicate solver will keep the 99 edge and produce a CFG with unreachable garbage.
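As an illustration of what a very small solver might look like, the sketch below uses a known-bits check on a made-up predicate, (n | 1) !== 0, which always holds because JavaScript's bitwise OR yields a 32-bit integer with its lowest bit set. Real opaque predicates are usually harder than this. The expression shape and function names are assumptions for the example.

```javascript
// Known-bits sketch: for an int32 expression, compute a mask of bits that are
// definitely 1. If any bit is definitely 1, the value cannot be 0.
function knownOnes(expr) {
  switch (expr.op) {
    case "const": return expr.value | 0;
    case "var":   return 0; // nothing is known about an input
    case "|":     return knownOnes(expr.left) | knownOnes(expr.right);
    case "&":     return knownOnes(expr.left) & knownOnes(expr.right);
    default:      return 0; // unmodelled operator: claim nothing
  }
}

// Decide `expr !== 0`: true or false when provable, undefined otherwise.
function decideNonZero(expr) {
  if (expr.op === "const") return (expr.value | 0) !== 0;
  return knownOnes(expr) !== 0 ? true : undefined;
}

// Possible values of `state` after: state = (test !== 0) ? then : otherwise
function targets(update) {
  const verdict = decideNonZero(update.test);
  if (verdict === true) return [update.then];
  if (verdict === false) return [update.otherwise];
  return [update.then, update.otherwise];
}

// state = ((n | 1) !== 0) ? 3 : 99
const opaqueUpdate = {
  test: { op: "|", left: { op: "var", name: "n" }, right: { op: "const", value: 1 } },
  then: 3,
  otherwise: 99,
};

console.log(JSON.stringify(targets(opaqueUpdate))); // [3]
```

With the predicate decided, the 99 edge never enters the graph. Without it, both targets are kept and the decoy block survives.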
Exception flow. If the function is wrapped in a try-catch and any case throws, control leaves the switch via the exception edge, which the dispatch variable does not encode. Your CFG reconstruction has to model exception edges as first-class to handle this.
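A small hand-written example of the problem, assuming a flattened function whose dispatcher sits inside a try-catch. The function name and state numbering are made up. Case 0 only ever writes 1 to state, yet control can also reach case 2.

```javascript
// Hypothetical flattened function with an exception edge.
function parseOrNull(text) {
  var state = 0;
  var result;
  while (true) {
    try {
      switch (state) {
        case 0:
          result = JSON.parse(text); // may throw a SyntaxError
          state = 1; break;
        case 1:
          return result;
        case 2:
          return null;
      }
    } catch (e) {
      // The only write of 2 lives here, outside every case body.
      state = 2;
    }
  }
}

console.log(parseOrNull('{"a": 1}').a); // 1
console.log(parseOrNull("not json"));   // null
```

An analysis that only collects the state writes inside each case finds the edge 0 to 1 and treats case 2 as unreachable, which is wrong for any input that makes JSON.parse throw.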
Recursive flattening. Some obfuscators flatten twice: an inner flattened switch nested inside a case of the outer one, with the inner dispatch variable also encoded in the outer state. Reversal becomes a fixpoint: fold the outer, find the inner, fold the inner, refold the outer.
Inlined virtual machines. At the extreme, the dispatcher is no longer a switch but a function pointer table or a hand-rolled interpreter, with the “bytecode” stored as an array. At that point you are not unflattening any more, you are devirtualizing a VM, which deserves its own post.
Pattern-matching deobfuscators that only handle the canonical shape get stuck on any of these. A real compiler pipeline gets through the first three by default, because constant propagation, exception modelling, and CFG reconstruction are standard.
Why a general optimizer beats a CFF-specific tool
The narrow version of this post is “CFF is reversible because it does not actually hide anything from a value analysis.” The broader version is the same statement applied to almost every JavaScript obfuscation primitive:
- String array encoding hides nothing from constant propagation on the lookup function.
- Dead-code injection hides nothing from reachability analysis.
- Identifier mangling hides nothing from scope-aware renaming after SSA renumbering.
- VM-based protection hides nothing from symbolic execution of the interpreter loop.
Each of these can be defeated by a dedicated pass, and most public deobfuscators ship dedicated passes. The downside of dedicated passes is that they break the moment the obfuscator changes its surface form. A general-purpose IR with general-purpose optimization passes does not have that brittleness, because the passes do not look at the source. They look at the IR.
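Taking the dead-code case as an example, reachability over a recovered graph is a few lines of worklist code. The edge map below reuses the signOf graph from earlier and adds a made-up decoy block 7 that jumps to the entry but is never jumped to.

```javascript
// Collect every node reachable from `entry` by following edges.
function reachable(edges, entry) {
  const seen = new Set([entry]);
  const work = [entry];
  while (work.length > 0) {
    const node = work.pop();
    for (const next of edges[node] || []) {
      if (!seen.has(next)) {
        seen.add(next);
        work.push(next);
      }
    }
  }
  return seen;
}

// signOf dispatch graph plus a hypothetical decoy block 7.
const edges = { 0: [1, 2], 1: [5], 2: [3, 4], 3: [5], 4: [5], 5: [], 7: [0] };

const live = reachable(edges, 0);
const dead = Object.keys(edges).map(Number).filter((n) => !live.has(n));
console.log(JSON.stringify(dead)); // [7]
```

The pass never inspects what block 7 contains, so it does not matter how convincing the injected code looks.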
That is the architectural argument for treating deobfuscation as a compiler problem rather than a pattern-matching problem. CFF is the example most analysts know, but it is one instance of a more general pattern.
Where devirt.dev fits
devirt.dev is a JavaScript deobfuscator built on this idea. It lifts the sample into a custom SSA-based IR, runs the optimization passes described above to fixpoint, and emits readable JavaScript. CFF reversal is not a feature. It is what falls out when constant propagation runs on the flattened form.
If you analyze obfuscated JavaScript regularly and want early access, the waitlist is open at devirt.dev.
Further reading
- The Static Single Assignment book (Rastello, Tichadou). The reference for SSA construction and the passes that operate on it.
- obfuscator.io. The web front end for the open-source javascript-obfuscator package, which is where most “javascript-obfuscator” samples come from. Useful for generating CFF samples to test against.
- Synchrony. A pattern-matching deobfuscator targeted at obfuscator.io output. Good baseline for what the dedicated-pass approach can and cannot do.