Back to thoughts
3 min read

siren: automated phishing forensics at scale

leveraging azure openai and serverless orchestration to transform manual phishing triage into automated semantic forensics.

phishing remains the most persistent vector for initial access in the modern enterprise. despite years of "awareness training," the technical reality is that attackers are getting better at bypasses, while soc teams are still stuck in the manual toil of checking headers, running manual reputation lookups, and squinting at obfuscated javascript.

i built siren to move phishing investigation from a manual chore to a high-performance automated workflow.

the analysis bottleneck

for most security operations centres, triage is a race against the clock. an employee reports a suspicious email, and the clock starts. the longer it takes to verify the threat, the higher the chance of a successful credential harvest or payload execution.

traditional tools focus on "indicator extraction"—pulling out the urls and the sender ip. but indicators change in seconds. what we actually need is semantic understanding. we need to know what the email is trying to do, not just where it came from.

introducing siren

siren is a high-performance forensic analyzer designed to automate the heavy lifting of phishing investigation. it is not just a parser; it is an orchestration layer that combines traditional technical analysis with modern conversational ai.

the goal was simple: take a raw .eml or .msg file and produce a comprehensive forensic report in seconds, without a human ever having to open a browser.

ai-powered semantic forensics

the core differentiator for siren is its integration with azure openai. while traditional tools rely on static regex patterns to identify "urgent" language or "bank" keywords, siren uses semantic analysis to understand intent.

by feeding the cleaned body content into a hardened llm prompt, siren can:

  • categorise the lure: identify if it is a credential harvest, a financial scam, or a delivery bait.
  • extract hidden context: find subtle social engineering cues that automated systems usually miss.
  • summarise for humans: provide a one-paragraph executive summary that tells a tier-1 analyst exactly why an email is dangerous.

safe detonation and previews

one of the highest-risk tasks in phishing analysis is dealing with attachments. siren mitigates this by using cloudconvert as a safe detonation layer.

instead of an analyst downloading a potentially malicious .pdf or .docx, siren triggers an automated conversion process that generates a safe, static image preview. you can see the content of the lure without ever touching the underlying binary. it is "look, don't touch" forensics.

the serverless engine

siren is built on a modern, event-driven architecture using node.js and azure functions. this choice was deliberate:

  1. scalability: phishing reports come in waves. a serverless backend scales to zero when quiet and scales out instantly during a massive campaign.
  2. cost-efficiency: you only pay for the seconds of compute used during the forensic analysis.
  3. security: each analysis runs in a stateless, ephemeral environment, reducing the risk of cross-contamination between different malicious samples.

cutting through the noise

as security professionals, our most valuable asset is time. every minute spent manually checking a virustotal report for a known-bad sender is a minute not spent hunting for the next 0-day.

siren is about reclaiming that time. it turns the "drudge work" of phishing triage into a high-fidelity, automated pipeline that lets engineers focus on the response, not the research.

you can find the project and the technical documentation here: adamfebery/siren