MBA Simplifier β Pipeline v5
Simplifies Mixed Boolean-Arithmetic (MBA) expressions used in code obfuscation. Combines BFS lookup over a pre-computed expression table, symbolic rewriting (40+ rules), and optional neural-guided search. Every output is oracle-verified correct.
Quick Start
git clone https://huggingface.co/<your-username>/mba-simplifier
cd mba-simplifier
pip install -r requirements.txt
python inference.py "(x & y) + (x | y)"
Expected output:
BFS: 18,189 entries (182ms)
Input: (x & y) + (x | y)
Simplified: x + y
Reduction: 57% (bfs)
Latency: 186 ms (cold), ~2 ms (warm)
Python API
from inference import simplify
result = simplify("(x & y) + (x | y)")
print(result["simplified"]) # "x + y"
print(result["reduction"]) # 0.571
print(result["strategy"]) # "bfs"
print(result["latency_ms"]) # ~2 ms warm
# 32-bit mode (symbolic only β BFS table is 8-bit)
result = simplify("(x & y) + (x | y)", bits=32)
# β "x + y" (strategy: symbolic)
# With neural guidance
result = simplify("complex_expr", model_path="mba_classifier_v2.pt")
Return fields
| Field | Type | Description |
|---|---|---|
simplified |
str | Simplified infix expression |
original |
str | Input expression |
reduction |
float | Tree-size reduction (0.0β1.0) |
strategy |
str | bfs / symbolic / neural / none |
latency_ms |
float | Wall-clock time in milliseconds |
Input / Output Examples
| Input | Output | Strategy |
|---|---|---|
(x & y) + (x | y) |
x + y |
bfs |
(x | y) - (x & y) |
x ^ y |
bfs |
(x + y) - 2 * (x & y) |
x ^ y |
bfs |
~(~x) |
x |
bfs |
-(~x) - 1 |
x |
symbolic |
~(~x & ~y) |
x | y |
symbolic |
x & (x | y) |
x |
symbolic |
x ^ x |
0 |
bfs |
x * 1 + y * 0 |
x |
bfs |
(x & ~y) | (x & y) |
x |
bfs |
Architecture
Three-layer pipeline (first hit wins):
BFS lookup β pre-computed table of 18,189 minimal 8-bit expressions (loaded from
bfs_table_6.json, ~180ms first call, ~2ms warm). Coverage: all 2-variable expressions up to AST size 6.Symbolic rewriting β 40+ algebraic rules covering MBA identities, De Morgan, absorption, two's complement, XOR-NOT, cancellation, constant folding. Runs to fixpoint (up to 30 passes). Works for 8-bit and 32-bit.
Neural-guided beam search (optional, requires
mba_classifier_v2.pt) β Transformer classifier predicts rewrite rules to try. Symbolic executor applies them; oracle verifies each candidate. Requirestorch.
Oracle verification: every candidate result is verified by random sampling (512 points). The pipeline never returns a wrong answer β it falls back to the original if verification fails.
Benchmark Results
| Method | Correct% | Reduced% | Size reduction | ms/expr |
|---|---|---|---|---|
| Pipeline v5 (ours) | 100% | 100% | 77% | 6.5ms |
| MBA-Blast (USENIX '21) | 100% | 100% | 77% | 15.8ms |
| SSPAM (Quarkslab) | 100% | 0% | 0% | 298.6ms |
| LLaMA-3.3-70B | 93% | 12% | 16% | 34.4ms |
Dataset: MBA-Blast paper dataset, 62 expressions, 8-bit 2-var.
Files
| File | Description |
|---|---|
inference.py |
Standalone inference β the only file you need |
bfs_table_6.json |
Pre-computed BFS table, 18K entries (1.7MB) |
mba_classifier_v2.pt |
Neural model, 2.2M params (8.6MB, optional) |
Limitations
- BFS table: 8-bit only, variables
xandyonly. Expressions with 3+ variables or 32-bit semantics fall through to symbolic rewriting. - Variables: up to 2 variables for BFS; symbolic handles more.
- Bit width: 8 and 32 only.
- Cold start: BFS table loads in
180ms on first call; subsequent calls are fast (2ms). - Neural model trained on 8-bit β may not generalize perfectly to 32-bit.
Requirements
torch>=1.13.0 # required for neural layer (optional otherwise)
numpy>=1.21.0 # required for oracle exhaustive check (8-bit)
Python 3.10+ required (structural pattern matching).
- Downloads last month
- 67