MBA Simplifier β€” Pipeline v5

Simplifies Mixed Boolean-Arithmetic (MBA) expressions used in code obfuscation. Combines BFS lookup over a pre-computed expression table, symbolic rewriting (40+ rules), and optional neural-guided search. Every output is oracle-verified correct.

Quick Start

git clone https://huggingface.co/<your-username>/mba-simplifier
cd mba-simplifier
pip install -r requirements.txt
python inference.py "(x & y) + (x | y)"

Expected output:

BFS: 18,189 entries (182ms)
Input:      (x & y) + (x | y)
Simplified: x + y
Reduction:  57% (bfs)
Latency:    186 ms (cold), ~2 ms (warm)

Python API

from inference import simplify

result = simplify("(x & y) + (x | y)")
print(result["simplified"])   # "x + y"
print(result["reduction"])    # 0.571
print(result["strategy"])     # "bfs"
print(result["latency_ms"])   # ~2 ms warm

# 32-bit mode (symbolic only β€” BFS table is 8-bit)
result = simplify("(x & y) + (x | y)", bits=32)
# β†’ "x + y"  (strategy: symbolic)

# With neural guidance
result = simplify("complex_expr", model_path="mba_classifier_v2.pt")

Return fields

Field Type Description
simplified str Simplified infix expression
original str Input expression
reduction float Tree-size reduction (0.0–1.0)
strategy str bfs / symbolic / neural / none
latency_ms float Wall-clock time in milliseconds

Input / Output Examples

Input Output Strategy
(x & y) + (x | y) x + y bfs
(x | y) - (x & y) x ^ y bfs
(x + y) - 2 * (x & y) x ^ y bfs
~(~x) x bfs
-(~x) - 1 x symbolic
~(~x & ~y) x | y symbolic
x & (x | y) x symbolic
x ^ x 0 bfs
x * 1 + y * 0 x bfs
(x & ~y) | (x & y) x bfs

Architecture

Three-layer pipeline (first hit wins):

  1. BFS lookup β€” pre-computed table of 18,189 minimal 8-bit expressions (loaded from bfs_table_6.json, ~180ms first call, ~2ms warm). Coverage: all 2-variable expressions up to AST size 6.

  2. Symbolic rewriting β€” 40+ algebraic rules covering MBA identities, De Morgan, absorption, two's complement, XOR-NOT, cancellation, constant folding. Runs to fixpoint (up to 30 passes). Works for 8-bit and 32-bit.

  3. Neural-guided beam search (optional, requires mba_classifier_v2.pt) β€” Transformer classifier predicts rewrite rules to try. Symbolic executor applies them; oracle verifies each candidate. Requires torch.

Oracle verification: every candidate result is verified by random sampling (512 points). The pipeline never returns a wrong answer β€” it falls back to the original if verification fails.

Benchmark Results

Method Correct% Reduced% Size reduction ms/expr
Pipeline v5 (ours) 100% 100% 77% 6.5ms
MBA-Blast (USENIX '21) 100% 100% 77% 15.8ms
SSPAM (Quarkslab) 100% 0% 0% 298.6ms
LLaMA-3.3-70B 93% 12% 16% 34.4ms

Dataset: MBA-Blast paper dataset, 62 expressions, 8-bit 2-var.

Files

File Description
inference.py Standalone inference β€” the only file you need
bfs_table_6.json Pre-computed BFS table, 18K entries (1.7MB)
mba_classifier_v2.pt Neural model, 2.2M params (8.6MB, optional)

Limitations

  • BFS table: 8-bit only, variables x and y only. Expressions with 3+ variables or 32-bit semantics fall through to symbolic rewriting.
  • Variables: up to 2 variables for BFS; symbolic handles more.
  • Bit width: 8 and 32 only.
  • Cold start: BFS table loads in 180ms on first call; subsequent calls are fast (2ms).
  • Neural model trained on 8-bit β€” may not generalize perfectly to 32-bit.

Requirements

torch>=1.13.0     # required for neural layer (optional otherwise)
numpy>=1.21.0     # required for oracle exhaustive check (8-bit)

Python 3.10+ required (structural pattern matching).

Downloads last month
67
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support