febery/blog

in modern security operations, phishing triage remains one of the most resource-intensive bottlenecks. traditional security gateways and triage tools rely heavily on static indicators: domain age, sender authentication results (spf/dkim/dmarc), ip address blacklists, and known file hashes.

while these controls are effective at stopping mass spam campaigns, they fail catastrophically against sophisticated, targeted attacks. attackers bypass them easily using compromised legitimate accounts, lookalike domains with perfect authentication headers, and highly customised social engineering tactics (like business email compromise and financial fraud).

to solve this human bottleneck, i built siren, an automated phishing forensics engine that moves beyond static rules to analyse the intent of the email using large language models, combined with automated technical enrichment.

here is a look at the architecture, the core engine logic, and the engineering journey of bringing siren to life as a corporate outlook add-in, including our recent architectural pivot to a zero-token client-side eml assembly model.

the core philosophy: intent over authentication

the primary differentiator of siren is its "no-mercy" ai forensics prompt design.

in a traditional triage workflow, if a security analyst sees that an email passed spf, dkim, and dmarc checks, they are far more likely to mark the message as safe. attackers know this and routinely compromise legitimate microsoft 365 or google workspace accounts to send phishing emails internally.

siren's ai forensic engine is designed to disregard authentication passes if social engineering or financial urgency triggers are detected.

graph TD
    A[suspicious email received] --> B[run osint & threat intel enrichment]
    B --> C[extract spf / dkim / dmarc status]
    B --> D[analyse body & context via azure openai]
    C --> E{authentication passed?}
    D --> F{social engineering detected?}
    E -- yes --> G{high risk intent?}
    F -- yes --> G
    G -- yes --> H["verdict: phishing (compromised legitimate account)"]
    G -- no --> I[verdict: clean / suspicious]
    E -- no --> H

the semantic analyser focuses on finding:

- pretexting & identity mismatches: the display name claims to be an executive, but the sender domain is external, or the conversational style deviates from standard corporate patterns.
- financial urgency: urgent requests for bank details, invoice updates, or gift card purchases.
- coerced actions: demands to click a link to "prevent account suspension" or view an "urgent shared document."

by shifting the verdict weighting from "who sent it" to "what do they want us to do," siren catches compromised account attacks that standard gateways miss.
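
the weighting in the flow above can be sketched as a small decision function. this is a minimal illustration only: the function name, the input flags, and the returned shape are hypothetical and are not siren's real schema.

```javascript
// hypothetical verdict combiner mirroring the flowchart above.
// authPassed: spf/dkim/dmarc all passed
// socialEngineering / highRiskIntent: flags assumed to come from the llm analysis
function combineVerdict({ authPassed, socialEngineering, highRiskIntent }) {
  // a failed authentication check goes straight to a phishing verdict
  if (!authPassed) {
    return { verdict: "phishing", reason: "authentication failed" };
  }
  // an authentication pass is never allowed to override detected intent
  if (highRiskIntent) {
    return { verdict: "phishing", reason: "compromised legitimate account" };
  }
  // social engineering cues without high-risk intent stay flagged for review
  return socialEngineering
    ? { verdict: "suspicious", reason: "social engineering cues" }
    : { verdict: "clean", reason: "no triggers detected" };
}
```

the key property is the second branch: a message that passes every authentication check is still marked as phishing when the intent analysis says so.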

architectural evolution: the zero-token pivot

the legacy token crisis

in its original design, the siren outlook add-in obtained a temporary microsoft exchange callback token (via ews or rest) and sent it to the backend. the backend then used that token to query exchange online directly and download the raw eml email file.

however, microsoft has since deprecated and disabled legacy ews/rest callback tokens across exchange online, and requests for them now fail with errorforbiddenclientaccesstokenrequest. to keep fetching emails on the backend, developers must adopt nested app authentication (naa), create a multi-tenant azure app registration, manage client ids, and walk microsoft 365 administrators through global admin consent.

this administrative overhead represents a major friction point. in many enterprise environments, getting security teams to approve and register a new azure app registration with broad read permissions can take weeks or months.

the solution: zero-token browser-based assembly

to bypass the oauth consent and token deprecation hurdles, i re-engineered siren's client-side taskpane to assemble the eml file directly inside the user's browser.

instead of asking the server to download the email, the browser-resident office js apis capture the headers, body, and attachment contents locally. because the user is already authenticated to their mail client, the local office js runtime has native, read-only permissions to retrieve these components without needing external network tokens or credentials.
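
a minimal sketch of that local capture, assuming the three office js calls named in this post. the helper names (toPromise, extractMessageParts) are hypothetical, and the mailbox item is passed in as a parameter so the function can be exercised with a stub outside outlook.

```javascript
// wrap one callback-style office js call in a promise.
// `call` receives the callback that office js expects as its last argument.
function toPromise(call) {
  return new Promise((resolve, reject) => {
    call((result) => {
      // office js reports the outcome in result.status ("succeeded" or "failed")
      if (result.status === "succeeded") resolve(result.value);
      else reject(result.error);
    });
  });
}

// `item` is Office.context.mailbox.item when running inside the taskpane.
async function extractMessageParts(item) {
  const [headers, html, attachments] = await Promise.all([
    toPromise((cb) => item.getAllInternetHeadersAsync(cb)),
    toPromise((cb) => item.body.getAsync("html", cb)),
    Promise.all(
      item.attachments.map(async (att) => ({
        name: att.name,
        contentType: att.contentType,
        // resolves to { content, format }; file attachments arrive base64-encoded
        ...(await toPromise((cb) => item.getAttachmentContentAsync(att.id, cb))),
      }))
    ),
  ]);
  return { headers, html, attachments };
}
```

inside the add-in this would be called as extractMessageParts(Office.context.mailbox.item); all three reads run concurrently and the promise rejects if any one of them fails.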

┌─────────────────────────────────────────────────────────────────────────────────────────┐
│                                   outlook taskpane (browser)                            │
│                                                                                         │
│   ┌─────────────────────────────┐   ┌───────────────────────────┐   ┌─────────────────┐ │
│   │  getallinternetheadersasync │   │   body.getasync (html)    │   │ getattachment-  │ │
│   │                             │   │                           │   │  contentasync   │ │
│   └──────────────┬──────────────┘   └─────────────┬─────────────┘   └────────┬────────┘ │
│                  │                                │                          │          │
│                  └───────────────────────┐        │       ┌──────────────────┘          │
│                                          ▼        ▼       ▼                             │
│                                 ┌───────────────────────────┐                           │
│                                 │   assemblemime() compiler │                           │
│                                 └─────────────┬─────────────┘                           │
│                                               │                                         │
│                                               ▼                                         │
│                                     base64encodedemlpayload                             │
└───────────────────────────────────────────────┼─────────────────────────────────────────┘
                                                │ secure https post
                                                ▼
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│                                       siren backend                                     │
│  ┌──────────────────────────┐     ┌──────────────────────────┐                          │
│  │   domain whitelist check │     │   res.destroy() socket   │                          │
│  │    (licensed_domains)    ├────►│       termination        │                          │
│  └─────────────┬────────────┘     └──────────────────────────┘                          │
│                │ approved                                                               │
│                ▼                                                                        │
│  ┌──────────────────────────┐     ┌──────────────────────────┐                          │
│  │   virustotal / hibp      │     │    azure openai (llm)    │                          │
│  │      enrichment          │     │    semantic forensics    │                          │
│  └──────────────────────────┘     └──────────────────────────┘                          │
└─────────────────────────────────────────────────────────────────────────────────────────┘

how client-side eml assembly works

1. parallel extraction: the taskpane queries the local office runtime concurrently to gather:
   - all smtp transport headers (Office.context.mailbox.item.getAllInternetHeadersAsync).
   - the html message body (Office.context.mailbox.item.body.getAsync).
   - binary data for all attachments (Office.context.mailbox.item.getAttachmentContentAsync).
2. mime compilation: a custom client-side compiler (assemblemime) takes these parts and constructs a fully compliant mime message (multipart/mixed or text/html) including custom boundaries, base64-encoded attachment streams, and standardised transport headers.
3. transmission: the resulting eml string is encoded to base64 and posted directly to /api/analyze along with the reporter's metadata.

this removes the app registration and admin consent steps entirely. since no enterprise registration is required, installing the add-in only requires uploading a manifest.xml file.
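
the mime compilation step could look roughly like the sketch below. siren's real compiler is not shown in this post, so everything here is an assumption: the input shape ({ headers, html, attachments }), the helper names (stripContentHeaders, wrap76, utf8ToBase64), and the choice to base64-encode the html part. filenames are inserted as-is, which a production compiler would need to encode properly.

```javascript
const CRLF = "\r\n";

// drop the original content headers (and their folded continuation lines),
// because the rebuilt message declares its own structure.
function stripContentHeaders(rawHeaders) {
  const kept = [];
  let skipping = false;
  for (const line of rawHeaders.split(/\r?\n/)) {
    if (line === "") continue;
    if (!/^[ \t]/.test(line)) {
      skipping = /^(content-type|content-transfer-encoding|mime-version):/i.test(line);
    }
    if (!skipping) kept.push(line);
  }
  return kept;
}

// base64 bodies are wrapped at 76 characters per line
const wrap76 = (b64) => (b64.replace(/\s+/g, "").match(/.{1,76}/g) || []).join(CRLF);

// utf-8 safe base64 using only globals available in browsers and recent node
function utf8ToBase64(text) {
  let binary = "";
  for (const byte of new TextEncoder().encode(text)) binary += String.fromCharCode(byte);
  return btoa(binary);
}

function assembleMime({ headers, html, attachments = [] }, boundary = `siren-${Date.now().toString(16)}`) {
  const lines = [...stripContentHeaders(headers), "MIME-Version: 1.0"];
  const htmlPart = [
    "Content-Type: text/html; charset=utf-8",
    "Content-Transfer-Encoding: base64",
    "",
    wrap76(utf8ToBase64(html)),
  ];
  // no attachments: a single text/html entity is enough
  if (attachments.length === 0) return [...lines, ...htmlPart].join(CRLF) + CRLF;

  lines.push(`Content-Type: multipart/mixed; boundary="${boundary}"`, "", `--${boundary}`, ...htmlPart);
  for (const att of attachments) {
    lines.push(
      `--${boundary}`,
      `Content-Type: ${att.contentType || "application/octet-stream"}; name="${att.name}"`,
      "Content-Transfer-Encoding: base64",
      `Content-Disposition: attachment; filename="${att.name}"`,
      "",
      wrap76(att.content) // already base64 when read from office js
    );
  }
  lines.push(`--${boundary}--`, "");
  return lines.join(CRLF);
}
```

the transmission step would then post utf8ToBase64(assembleMime(parts)) to /api/analyze. passing the boundary as a parameter keeps the output deterministic for testing.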

defensive engineering: the silent connection drop (`res.destroy()`)

a primary security challenge in a tokenless api architecture is endpoint authorisation. if the backend api accepts raw eml files without requiring an entra id oauth token, how do we prevent unauthorised third parties from scanning files using our api quota?

why http error codes fail

the conventional approach is to inspect the reporter's email address and return a 401 unauthorized or 403 forbidden response. however, from a defensive security perspective, this approach is flawed:

- information leakage: an http status code and response body (like { "error": "domain not licensed" }) tell an attacker exactly how our protection mechanics work and confirm that their request reached a live backend.
- resource consumption: generating response headers, formatting json payloads, and keeping tcp connections open under standard http flows consume cpu cycles and memory, leaving the service more exposed to denial-of-service attacks.

the silent drop protocol

to solve this, siren implements a silent drop protocol.

when a request hits /api/analyze, the server immediately extracts the identity of the reporter. it compares the reporter's domain against a secured, server-side whitelist.

if the domain is not authorised, the server does not return a graceful error. instead, it calls res.destroy() directly on the express response object. this immediately destroys the underlying tcp socket connection.

- no http response headers are sent.
- no response body is returned.
- the connection is cut off mid-stream, sending zero bytes to the requestor.

to the attacker, the endpoint behaves like a dead socket or a misconfigured firewall, revealing nothing about the underlying api structure or validation logic. a legitimate client from an unlicensed domain, meanwhile, simply sees a generic, nondescript network error, so the taskpane can fall back gracefully without leaking any architectural data.
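
a minimal sketch of the silent drop as express-style middleware. res.destroy() is the call named in this post; the middleware name (silentDrop), the request field (reporterEmail), and the way the allow-list is passed in are assumptions made for illustration.

```javascript
// hypothetical middleware factory: licensedDomains is an iterable of allowed domains.
function silentDrop(licensedDomains) {
  const allowed = new Set([...licensedDomains].map((d) => d.toLowerCase()));

  return function (req, res, next) {
    // reporterEmail is an assumed field name on the parsed json body
    const email = String((req.body && req.body.reporterEmail) || "");
    const domain = email.slice(email.lastIndexOf("@") + 1).toLowerCase();

    if (!email.includes("@") || !allowed.has(domain)) {
      // tear down the socket without writing a status line, headers, or body
      res.destroy();
      return;
    }
    next();
  };
}
```

in an express app this would sit in front of the handler, for example app.post("/api/analyze", express.json(), silentDrop(["example.com"]), analyzeHandler), where analyzeHandler is a placeholder name. the body parser has to run first so the reporter identity is available to check.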

hardening the gateway: cryptographic verification & anti-spoofing

to prevent attackers from simply spoofing the reporter's identity to bypass the domain check, the production gateway does not rely on plain-text identifiers alone.

instead, siren overlays the tokenless architecture with a zero-trust verification layer:

- ephemeral payload signatures: the taskpane calculates a cryptographic signature of the outbound payload using a secure, client-side rotated handshake mechanism.
- access control tokens: the request headers include a hardened, scoped client identifier (x-siren-client-signature) which is verified by the backend api gateway before processing the request.
- behavioural rate limiting: the gateway enforces strict ip-based and domain-based throttling to detect and block abnormal request volumes, preventing bulk automation attempts even if a valid domain is claimed.

this combination ensures that while the deployment remains administration-free (requiring no heavy entra id enterprise app approvals), the api cannot be abused as an open proxy by external scanners.
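
the post does not describe the signature scheme itself, so the sketch below swaps in a common stand-in: an hmac-sha256 over a timestamp and the request body, checked on the backend with a constant-time comparison. the function names, the timestamp field, and the five-minute window are all assumptions, and how the key reaches the taskpane or gets rotated is out of scope here.

```javascript
const crypto = require("crypto");

// sign "<timestamp>.<body>" so a captured signature cannot be replayed later
function signPayload(secret, timestamp, body) {
  return crypto.createHmac("sha256", secret).update(`${timestamp}.${body}`).digest("hex");
}

// presented: the hex value the client sent, e.g. in x-siren-client-signature
function verifySignature(secret, timestamp, body, presented, nowMs = Date.now(), maxSkewMs = 5 * 60 * 1000) {
  const ts = Number(timestamp);
  // reject missing, malformed, or stale timestamps
  if (!Number.isFinite(ts) || Math.abs(nowMs - ts) > maxSkewMs) return false;

  const expected = Buffer.from(signPayload(secret, timestamp, body), "hex");
  const given = Buffer.from(String(presented), "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first
  return given.length === expected.length && crypto.timingSafeEqual(given, expected);
}
```

a request that fails verifySignature would take the same silent drop path as an unlicensed domain, so a forged reporter address alone is not enough to reach the analysis pipeline.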

core analysis pipeline

for authorised domains, the backend orchestrates a concurrent analysis flow.

1. reputation & threat intel enrichment

the backend extracts and analyses iocs (indicators of compromise) in parallel:

- virustotal api: performs metadata reputation lookups on urls found in the email body and computes cryptographic file hashes of attachments to check against known malware definitions. importantly, siren does not "detonate" or follow links, preventing accidental trigger-based actions (like password resets or account deletions).
- have i been pwned (hibp): conducts checks using the k-anonymity model. the system hashes the sender's email address and queries hibp using only the first five characters of the sha-1 hash. this ensures that the sender's actual email address is never sent to external servers.
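
the lookup keys for both services can be derived locally before any network call. the sketch below shows only that derivation, with hypothetical helper names. two assumptions to flag: the unpadded base64url identifier follows virustotal's v3 url-identifier convention, and the range-response format ("SUFFIX:COUNT" per line) is the one hibp documents for pwned passwords; whether an equivalent range endpoint for email hashes is available to siren is not stated in this post.

```javascript
const crypto = require("crypto");

// sha-256 of the raw attachment bytes, usable as a file lookup key
function attachmentSha256(base64Content) {
  return crypto.createHash("sha256").update(Buffer.from(base64Content, "base64")).digest("hex");
}

// url identifier: base64url of the url string with padding removed.
// the url is only encoded here, never fetched.
function virusTotalUrlId(url) {
  return Buffer.from(url, "utf8")
    .toString("base64")
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}

// k-anonymity: only the 5-character prefix would leave the server
function kAnonymityParts(value) {
  const hash = crypto
    .createHash("sha1")
    .update(value.trim().toLowerCase(), "utf8")
    .digest("hex")
    .toUpperCase();
  return { prefix: hash.slice(0, 5), suffix: hash.slice(5) };
}

// match the locally held suffix against the lines returned for the prefix
function suffixInRange(suffix, rangeResponse) {
  return rangeResponse
    .split(/\r?\n/)
    .some((line) => line.split(":")[0].trim().toUpperCase() === suffix);
}
```

the comparison in suffixInRange happens on the backend, so the remote service only ever learns the prefix shared by many other hashes.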

2. sandbox attachment rendering

when attachments are present, siren protects security analysts by preventing them from interacting directly with potentially malicious files. the backend utilises the cloudconvert api to open document types (like word, pdf, or excel files) inside an isolated sandbox, returning high-resolution png images of the documents back to the taskpane. this allows analysts to visually review the contents of an attachment safely without running local code.

3. llm intent & forensic summarisation

finally, the raw eml content and gathered reputation indicators are passed to a private, dedicated instance of the azure openai service (gpt-4). using a specialised system prompt, the model reviews the entire context and outputs:

- phishing verdict: clean, suspicious, phishing, or malicious.
- phishing score: a numeric rating of risk severity.
- intent analysis: explanations of any social engineering techniques detected.
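
because model output is free text until proven otherwise, it helps to validate it before it reaches the taskpane. the sketch below assumes the model is prompted to reply with json; the field names (verdict, score, intentAnalysis) and the 0 to 100 score range are hypothetical, since the post only says the score is numeric.

```javascript
// the four verdict labels listed above
const VERDICTS = ["clean", "suspicious", "phishing", "malicious"];

// parse and validate the model's reply; throws on anything unexpected
function parseForensicResult(modelText) {
  let data;
  try {
    data = JSON.parse(modelText);
  } catch {
    throw new Error("model output is not valid json");
  }
  if (data === null || typeof data !== "object") {
    throw new Error("model output is not a json object");
  }

  const verdict = String(data.verdict || "").toLowerCase();
  if (!VERDICTS.includes(verdict)) {
    throw new Error(`unexpected verdict: ${data.verdict}`);
  }
  // assumed range: 0 (benign) to 100 (certain phishing)
  if (typeof data.score !== "number" || data.score < 0 || data.score > 100) {
    throw new Error("score must be a number between 0 and 100");
  }
  if (typeof data.intentAnalysis !== "string" || data.intentAnalysis.trim() === "") {
    throw new Error("intentAnalysis must be a non-empty string");
  }
  return { verdict, score: data.score, intentAnalysis: data.intentAnalysis.trim() };
}
```

rejecting malformed replies outright, rather than guessing at a verdict, keeps an unexpected model response from being shown to an analyst as a clean result.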

operational impact

by transitioning to a tokenless client-side eml compiler and implementing silent socket drops for perimeter protection, siren achieves two critical enterprise design goals:

- frictionless onboarding: administrators can deploy siren organisation-wide in minutes by uploading a single, generic manifest.xml file, avoiding complex enterprise app consent flows.
- invisible security boundaries: attackers scanning our forensic gateways are cut off at the tcp level, preserving system resources and preventing information leakage.

building siren: re-engineering phishing triage with ai-powered semantic forensics