
argus part 2: why a custom azure bot?

deciding on the right architecture for an enterprise orchestrator and why off-the-shelf automation wasn't enough.

when building an enterprise-grade orchestrator like argus, the first question is always: why not just use power automate or a logic app?

for simple notifications, those tools are excellent. but when you need to bridge high-fidelity security telemetry with complex itsm workflows and granular rbac, you quickly hit the ceiling of low-code environments.

the limits of low-code

i evaluated several existing paths before deciding on a custom azure bot:

  1. power automate: easy to start with, but managing complex json schemas and bidirectional callbacks for 50+ different alert types becomes a maintenance nightmare.
  2. logic apps: scales better, but costs for high-frequency security polling can become unpredictable, because consumption-based billing grows with every trigger and action run.
  3. third-party apps: most off-the-shelf bots are too rigid. they do not understand the specific relationship between a sentinel alert and a custom itsm ticket.

the case for a custom bot

i chose to build argus as a custom node.js bot running on azure app service for three core reasons:

  • unlimited flexibility: by writing the logic in code, i can handle any inbound payload format and transform it exactly as needed.
  • state management: a custom bot allows for sophisticated in-memory caching and persistent state tracking (via azure table storage), which is essential for "in-place" card updates.
  • native integration: building on the microsoft teams sdk ensures that argus feels like a first-class citizen of the workspace, not just a webhook.
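
to make the first two points concrete, here is a minimal sketch, not the actual argus code. it normalizes a loosely structured inbound payload into one internal shape, then uses a small state store to decide whether a card should be sent fresh or updated in place. every name here (`NormalizedAlert`, `normalizeAlert`, `CardStateStore`, `planCardAction`) and every payload field is an assumption made for illustration, and the in-memory map stands in for whatever persistent store (such as azure table storage) sits behind it.

```typescript
// Hypothetical internal alert shape; field names are illustrative only.
type Severity = "low" | "medium" | "high";
type Source = "sentinel" | "xdr" | "entra" | "unknown";

interface NormalizedAlert {
  id: string;
  source: Source;
  severity: Severity;
  title: string;
}

type RawPayload = Record<string, unknown>;

const SEVERITIES: readonly string[] = ["low", "medium", "high"];
const SOURCES: readonly string[] = ["sentinel", "xdr", "entra"];

// Return the first non-empty string found under any of the candidate keys.
function pickString(raw: RawPayload, keys: string[], fallback: string): string {
  for (const key of keys) {
    const value = raw[key];
    if (typeof value === "string" && value.length > 0) return value;
  }
  return fallback;
}

// Map an arbitrary inbound payload onto the internal shape.
function normalizeAlert(raw: RawPayload): NormalizedAlert {
  const id = pickString(raw, ["id", "alertId"], "");
  if (id === "") throw new Error("alert payload has no id");
  const severity = pickString(raw, ["severity", "Severity"], "low").toLowerCase();
  const source = pickString(raw, ["source", "Source"], "unknown").toLowerCase();
  return {
    id,
    source: SOURCES.indexOf(source) !== -1 ? (source as Source) : "unknown",
    severity: SEVERITIES.indexOf(severity) !== -1 ? (severity as Severity) : "low",
    title: pickString(raw, ["title", "displayName"], "untitled alert"),
  };
}

// Tracks which chat message (activity id) currently shows a given alert.
interface CardStateStore {
  get(alertId: string): string | undefined;
  set(alertId: string, activityId: string): void;
}

// In-memory stand-in; a persistent implementation would share this interface.
class InMemoryCardStore implements CardStateStore {
  private readonly cards = new Map<string, string>();
  get(alertId: string): string | undefined {
    return this.cards.get(alertId);
  }
  set(alertId: string, activityId: string): void {
    this.cards.set(alertId, activityId);
  }
}

type CardAction =
  | { kind: "send"; alert: NormalizedAlert }
  | { kind: "update"; activityId: string; alert: NormalizedAlert };

// Unknown alert: send a new card. Known alert: update the existing card in place.
function planCardAction(store: CardStateStore, alert: NormalizedAlert): CardAction {
  const activityId = store.get(alert.id);
  return activityId === undefined
    ? { kind: "send", alert }
    : { kind: "update", activityId, alert };
}
```

the sketch keeps the decision (`planCardAction`) separate from the side effect of actually posting or updating a message, so the same logic can be exercised without a live teams connection. after a "send" succeeds, the caller would record the returned activity id with `store.set(alert.id, activityId)` so the next event for that alert becomes an "update".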

why azure?

since the majority of the security telemetry originates from the microsoft stack (sentinel, xdr, entra), keeping the orchestrator in azure was a pragmatic necessity. it allows for:

  • seamless identity: using managed identities for secure access to graph api and azure storage.
  • hardened secrets: integrating directly with azure key vault to manage api keys without hard-coding credentials.
  • global reach: azure app service provides the reliability and low-latency response times required for global security operations.
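
as a rough illustration of the "hardened secrets" point, the sketch below resolves api keys by name at runtime instead of embedding them in code, and caches them briefly so a secret store is not hit on every request. it is an assumption-laden example rather than the argus implementation: `SecretSource`, `CachedSecrets`, the ttl value, and the secret name are all hypothetical. the interface is deliberately shaped so that a `SecretClient` from `@azure/keyvault-secrets` (constructed with a vault url and a `DefaultAzureCredential` from `@azure/identity`) should be able to satisfy it.

```typescript
// Minimal shape of a secret source. Assumption: a Key Vault SecretClient
// can be passed here, since its getSecret(name) resolves to an object
// with an optional string `value`.
interface SecretSource {
  getSecret(name: string): Promise<{ value?: string }>;
}

interface CachedEntry {
  value: string;
  expiresAt: number; // epoch milliseconds
}

// Resolves secrets by name and keeps them in memory for a limited time.
class CachedSecrets {
  private readonly cache = new Map<string, CachedEntry>();

  constructor(
    private readonly source: SecretSource,
    private readonly ttlMs: number,
    private readonly now: () => number = () => Date.now()
  ) {}

  async get(name: string): Promise<string> {
    const hit = this.cache.get(name);
    if (hit !== undefined && hit.expiresAt > this.now()) {
      return hit.value; // still fresh, no round trip to the secret store
    }
    const secret = await this.source.getSecret(name);
    if (secret.value === undefined) {
      throw new Error(`secret "${name}" has no value`);
    }
    this.cache.set(name, {
      value: secret.value,
      expiresAt: this.now() + this.ttlMs,
    });
    return secret.value;
  }
}
```

with this in place, calling code asks for something like `await secrets.get("itsm-api-key")` (a made-up secret name) and never sees where the value came from. a short ttl is a trade-off: it limits calls to the secret store while still picking up rotated keys within a bounded delay.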

in part 3, we will look under the hood at the event-driven engine that makes this all possible.