
Threat modeling, faster — with an LLM in the loop

A practical pattern for using an LLM to bootstrap STRIDE without giving up the parts that need a human.

I've been threat-modeling features for over a decade. The thing I keep noticing is that most of the time spent in a threat-modeling session is on the parts an LLM is good at: enumerating threat categories, surfacing the obvious-but-easy-to-miss patterns, and producing the first draft of the writeup.

The part that actually requires a human — judgment about what matters in this system, with this trust boundary, with this deadline — is maybe 20% of the session.

So why am I still doing the other 80% by hand?

The pattern

Here's the loop that's been working for me (a minimal sketch of the scaffolding follows the list):

  1. Feed the LLM the feature spec, the data flow, and the trust boundaries. Force it to restate them back. If it can't, the spec isn't good enough yet — and that's a finding.
  2. Ask for STRIDE-by-element, with concrete threats specific to this feature. Reject generic ones.
  3. Demand a likelihood/impact rating with reasoning, not just a label. The reasoning is where the LLM either earns trust or exposes a gap.
  4. Have it propose the smallest mitigation that meaningfully reduces risk for each. The word "smallest" matters — it forces specificity.
  5. Then I read it. Critically. The output is a draft, not a decision.
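
To make the loop concrete, here's a minimal Python sketch of the scaffolding. Everything in it is an assumption for illustration: `ask` stands in for whatever client your vetted endpoint exposes, and the prompt wording is a starting point, not the version in my library.

    from typing import Callable

    Ask = Callable[[str], str]  # assumption: any text-in/text-out LLM client

    def threat_model_draft(ask: Ask, spec: str, data_flow: str, boundaries: str) -> str:
        """Drive steps 1-4 of the loop. The return value is a draft, not a decision."""
        context = f"SPEC:\n{spec}\n\nDATA FLOW:\n{data_flow}\n\nTRUST BOUNDARIES:\n{boundaries}"

        # Step 1: force a restatement. A thin or wrong restatement is itself a finding.
        restated = ask(context + "\n\nRestate the feature, data flow, and trust "
                       "boundaries in your own words. Flag anything ambiguous.")

        # Step 2: STRIDE-by-element, concrete threats only.
        threats = ask(context + "\n\nYour restatement:\n" + restated +
                      "\n\nFor each element, list STRIDE threats specific to THIS "
                      "feature. Reject anything that would apply to any web app.")

        # Step 3: a rating label is worthless without the reasoning behind it.
        rated = ask("Threats:\n" + threats +
                    "\n\nRate each threat's likelihood and impact, and justify "
                    "each rating in a sentence or two. No label without reasoning.")

        # Step 4: "smallest" forces specificity.
        return ask("Rated threats:\n" + rated +
                   "\n\nFor each threat, propose the SMALLEST mitigation that "
                   "meaningfully reduces the risk.")

    # Step 5 is not code: a human reads the draft critically before anything ships.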

The first pass usually takes me ~10 minutes instead of 90. The second pass — the human review — is where the real work happens, and now I get to do it on a dense, structured starting point instead of a blank page.

The prompt

The version I use lives in the prompts library under "Threat-model a feature". It enforces a specific output shape, forbids generic threats, and ends with "open questions for product" — because half the value of threat modeling is exposing the questions nobody asked yet.
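
For reference, here's a hypothetical skeleton that enforces the same constraints. The actual library prompt is longer and differs in wording; this just shows the shape:

    Threat-model the feature described below using STRIDE-by-element.

    Output, in exactly this shape:
    1. Restatement of the feature, data flow, and trust boundaries.
    2. Per element: threat, STRIDE category, likelihood, impact, and the
       reasoning behind each rating.
    3. Per threat: the smallest mitigation that meaningfully reduces the risk.
    4. Open questions for product.

    Rules: every threat must be specific to this feature. Reject anything
    that would apply to any web application. No rating without reasoning.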

What still requires a human

Three things, in order of importance:

  1. Knowing what not to model. The LLM will happily generate threats for everything. Half of them aren't worth the cost of mitigation. Triage is human work.
  2. Trust boundaries. Where the boundary actually is — not where the diagram says it is — needs someone who's seen the system in production.
  3. Politics. "This is a P0 risk, ship date is Friday" is a conversation, not a prompt.

The LLM accelerates the parts that benefit from acceleration. It doesn't — and shouldn't — replace the judgment.

A note on guardrails

If you're sending real specs to an external model, treat them like the sensitive documents they are. Your threat-modeling prompt should never be the place a competitor learns about your unreleased feature. For most teams, this means a vetted endpoint, not a public chat UI.
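
One cheap backstop on top of that — a sketch, assuming you keep a deny-list of unreleased codenames; the names and the function here are made up for illustration — is to refuse to send any spec that mentions one:

    # Hypothetical deny-list; a backstop, not a substitute for a vetted endpoint.
    SENSITIVE_TERMS = {"project-nimbus", "q3-pricing"}  # made-up examples

    def check_before_send(spec: str) -> str:
        """Raise instead of sending if the spec mentions a known-sensitive term."""
        lowered = spec.lower()
        leaks = sorted(term for term in SENSITIVE_TERMS if term in lowered)
        if leaks:
            raise ValueError(f"Spec mentions sensitive terms: {', '.join(leaks)}")
        return spec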

Tags: threat-modeling · ai · appsec · prompts