Building a deterministic-ish DFIR pipeline for the SANS AI hackathon

SANS recently announced its first hackathon for autonomous incident response. It's open to the community, and the brief is to build something that uses AI to figure out what the bad guys did. Good timing, because I've been deep in exactly this problem for months.

The core issue with LLMs in DFIR is non-determinism. Ask the same model the same question a hundred times and you can get a hundred different reasoning paths, and sometimes different conclusions. That's a serious problem when your output might end up in front of a court, a regulator, or a board. "The AI thought so" is not a finding.

So how do we get useful work out of a non-deterministic tool in a field that demands reproducibility? My approach has two parts.

Part one: an automated pipeline that does the boring bit. A set of scripts collects and parses artifacts, exactly as you would in a normal investigation, then hands the parsed output to the LLM, which decides which areas warrant a closer look. I've spent a lot of time refining both the detection scripts and the stage-specific prompts. The aim is for the automation alone to surface 80–90% of the meaningful findings, which for a lot of cases is enough to understand what happened.
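
To make the split between "scripts do the parsing" and "the LLM decides where to look" concrete, here is a minimal sketch of one detection stage. Everything in it is hypothetical: the rule, the `Finding` type, the helper names, and the Sysmon-style column names (`ParentImage`, `Image`, `CommandLine`) are assumptions about how a parsed artifact might be laid out, not the actual pipeline. The point it illustrates is that the deterministic part (rule matching, stable ordering) happens before the model ever sees the data.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical rule set: Office apps that rarely have a legitimate reason to
# spawn a shell or script host. Column names are an assumed, Sysmon-style layout.
SUSPICIOUS_PARENTS = {"winword.exe", "excel.exe", "outlook.exe"}
SUSPICIOUS_CHILDREN = {"powershell.exe", "cmd.exe", "wscript.exe", "mshta.exe"}


@dataclass(frozen=True)
class Finding:
    artifact: str
    indicator: str
    detail: str


def basename(path):
    # Windows paths: keep only the file name, lower-cased for comparison.
    return path.lower().rsplit("\\", 1)[-1]


def detect_office_spawns(rows):
    """Deterministic rule: the same input rows always yield the same findings."""
    for row in rows:
        parent, child = basename(row["ParentImage"]), basename(row["Image"])
        if parent in SUSPICIOUS_PARENTS and child in SUSPICIOUS_CHILDREN:
            yield Finding("process_creation", f"{parent}->{child}", row["CommandLine"])


def build_triage_prompt(stage, findings):
    """Render findings in a stable order so the prompt itself is reproducible."""
    ordered = sorted(findings, key=lambda f: (f.artifact, f.indicator, f.detail))
    lines = [f"Stage: {stage}",
             "Decide which of these findings warrant a closer look:"]
    lines += [f"{i}. [{f.artifact}] {f.indicator}: {f.detail}"
              for i, f in enumerate(ordered, 1)]
    return "\n".join(lines)


# Usage with a tiny inline CSV standing in for a parsed artifact.
sample = io.StringIO(
    "ParentImage,Image,CommandLine\n"
    "C:\\Program Files\\Microsoft Office\\WINWORD.EXE,"
    "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe,"
    "powershell -enc AAAA\n"
    "C:\\Windows\\explorer.exe,C:\\Windows\\System32\\notepad.exe,notepad.exe\n"
)
findings = list(detect_office_spawns(csv.DictReader(sample)))
prompt = build_triage_prompt("execution", findings)
print(prompt)
```

Only the Word-spawns-PowerShell row survives the rule, so the prompt the model receives is short and identical on every run; the model's judgement is confined to ranking what the scripts already found.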

Part two: turn the LLM loose with tools. Once the investigative prompts are dialled in, I point opencode at a set of MCP servers I've built and let the LLM run its own analysis. When it decides it's done, it diffs its findings against the automated pipeline report. That diff is where the interesting stuff lives. Either the LLM has found something the scripts missed and can pivot to chase it down, or the union of the two gets us as close to "everything" as this kind of work ever does.
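
The diff step boils down to set arithmetic over findings, provided both sides describe them the same way. The sketch below is a hypothetical illustration, not the real report format: it assumes each finding can be reduced to an (artifact, indicator) pair, and the `normalise` and `diff_findings` helpers are made up for this example. The agent-only bucket is what the LLM would pivot on, and the union is the combined picture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    artifact: str   # artifact family, e.g. "prefetch" or "run_key"
    indicator: str  # the thing found, e.g. a path or registry key


def normalise(artifact, indicator):
    # Fold case and path separators so cosmetic differences between the
    # pipeline's output and the agent's wording don't show up as diffs.
    return Finding(artifact.strip().lower(),
                   indicator.strip().lower().replace("/", "\\"))


def diff_findings(pipeline, agent):
    p, a = set(pipeline), set(agent)
    return {
        "agent_only": a - p,     # leads the agent should pivot on
        "pipeline_only": p - a,  # leads the agent skipped; worth re-checking
        "agreed": p & a,
        "union": p | a,          # closest thing to "everything"
    }


pipeline = [
    normalise("scheduled_task", "\\Microsoft\\Windows\\Updater"),
    normalise("prefetch", "C:\\Users\\Public\\x.exe"),
]
agent = [
    normalise("Prefetch", "c:/users/public/x.exe"),
    normalise("run_key", "HKCU\\Software\\Microsoft\\Windows\\CurrentVersion\\Run\\svc"),
]
result = diff_findings(pipeline, agent)
for bucket in ("agent_only", "pipeline_only", "agreed"):
    print(bucket, sorted((f.artifact, f.indicator) for f in result[bucket]))
```

Exact matching on normalised keys is the simplest possible choice; real findings phrased in free text would likely need fuzzier matching before a diff like this is trustworthy.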

The setup. I'm running this entirely locally. The RTX 5090 handles fast orchestration with Qwen3.6 at around 150–200 tok/s, which keeps the analysis loop snappy. Reporting runs on the Spark cluster against Qwen3.5-397B. End to end, processing a Velociraptor triage image takes about 15–20 minutes.
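
One way to express this two-tier split is a small routing table that maps each pipeline stage to a model endpoint. This is a hypothetical sketch: the stage names, URLs, and model ids are placeholders, and it assumes each model sits behind its own HTTP inference server. Temperature is pinned to 0.0 as a reasonable default for a reproducibility-minded pipeline; that narrows run-to-run variation but does not guarantee identical outputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    base_url: str
    model: str
    temperature: float


# Placeholder endpoints and model ids; substitute whatever your servers expose.
FAST = ModelTier("http://localhost:8000/v1", "qwen-fast-local", 0.0)
LARGE = ModelTier("http://spark-head:8000/v1", "qwen-large-report", 0.0)

# Hypothetical stage names: the fast local model drives the tight loops,
# the large model only writes the final report.
ROUTES = {
    "triage": FAST,
    "tool_loop": FAST,
    "report": LARGE,
}


def route(stage):
    try:
        return ROUTES[stage]
    except KeyError:
        raise ValueError(f"no model tier configured for stage {stage!r}") from None


print(route("triage").model, route("report").model)
```

Keeping the mapping in one place means the expensive model is only touched once per case, while the many short orchestration calls stay on the fast local GPU.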

More to come as I push this further toward the hackathon.


Jeff Davies
