argus part 2: why a custom azure bot?
deciding on the right architecture for an enterprise orchestrator and why off-the-shelf automation wasn't enough.
when building an enterprise-grade orchestrator like argus, the first question is always: why not just use power automate or a logic app?
for simple notifications, those tools are excellent. but when you need to bridge high-fidelity security telemetry with complex itsm workflows and granular rbac, you quickly hit the ceiling of low-code environments.
the limits of low-code
i evaluated several existing paths before deciding on a custom azure bot:
- power automate: easy to start with, but managing complex json schemas and bidirectional callbacks for 50+ different alert types becomes a maintenance nightmare.
- logic apps: scales better, but the consumption plan bills per trigger and action execution, so the cost of high-frequency security polling can become unpredictable.
- third-party apps: most off-the-shelf bots are too rigid. they do not understand the specific relationship between a sentinel alert and a custom itsm ticket.
the case for a custom bot
i chose to build argus as a custom node.js bot running on azure app service for three core reasons:
- unlimited flexibility: by writing the logic in code, i can handle any inbound payload format and transform it exactly as needed.
- state management: a custom bot allows for sophisticated in-memory caching and persistent state tracking (via azure table storage), which is essential for "in-place" card updates.
- native integration: building on the microsoft teams sdk ensures that argus feels like a first-class citizen of the workspace, not just a webhook.
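to make the first two points concrete, here is a minimal typescript sketch, not the actual argus code. it normalizes two made-up payload shapes into one internal alert, then uses an in-memory map to decide whether a card should be sent fresh or updated in place. every name in it (`InboundAlert`, `normalize`, `CardTracker`, the field names, the severity mapping) is an assumption for illustration, and the persistent layer (azure table storage) is left out.

```typescript
// hypothetical payload shapes; real sentinel and itsm payloads will differ.
type InboundAlert =
  | { source: "sentinel"; incidentNumber: number; title: string; severity: string }
  | { source: "itsm"; ticketId: string; summary: string; priority: number };

interface NormalizedAlert {
  key: string; // stable id used to find the same card again later
  title: string;
  severity: "low" | "medium" | "high";
}

// one case per source: supporting a new alert type means adding a case here.
function normalize(alert: InboundAlert): NormalizedAlert {
  switch (alert.source) {
    case "sentinel": {
      const sev = alert.severity.toLowerCase();
      return {
        key: `sentinel:${alert.incidentNumber}`,
        title: alert.title,
        severity: sev === "high" ? "high" : sev === "medium" ? "medium" : "low",
      };
    }
    case "itsm":
      return {
        key: `itsm:${alert.ticketId}`,
        title: alert.summary,
        // assumed mapping: priority 1 is the most urgent
        severity: alert.priority <= 1 ? "high" : alert.priority === 2 ? "medium" : "low",
      };
  }
}

type CardAction =
  | { kind: "send"; alert: NormalizedAlert }
  | { kind: "update"; activityId: string; alert: NormalizedAlert };

class CardTracker {
  // in-memory only; a persistent store would sit behind this in a real deployment
  private readonly activityIds = new Map<string, string>();

  // decide whether this alert needs a new card or an in-place update
  plan(alert: NormalizedAlert): CardAction {
    const activityId = this.activityIds.get(alert.key);
    return activityId !== undefined
      ? { kind: "update", activityId, alert }
      : { kind: "send", alert };
  }

  // record the message id once the card has been posted
  remember(alert: NormalizedAlert, activityId: string): void {
    this.activityIds.set(alert.key, activityId);
  }
}

// usage: the first sighting plans a send, the second plans an update
const tracker = new CardTracker();
const first = normalize({
  source: "sentinel",
  incidentNumber: 42,
  title: "suspicious sign-in",
  severity: "High",
});
const firstAction = tracker.plan(first);
tracker.remember(first, "activity-1"); // placeholder message id
const secondAction = tracker.plan(first);
```

the id passed to `remember` would be whatever message id comes back after the card is posted to teams; `activity-1` is only a placeholder.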
why azure?
since the majority of the security telemetry originates from the microsoft stack (sentinel, xdr, entra), keeping the orchestrator in azure was the pragmatic choice. it allows for:
- seamless identity: using managed identities for secure access to graph api and azure storage.
- hardened secrets: integrating directly with azure key vault to manage api keys without hard-coding credentials.
- global reach: azure app service provides the reliability and low-latency response times required for global security operations.
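as a rough illustration of the "no hard-coded credentials" point, the sketch below keeps only secret names in code and resolves their values at startup through an injected fetcher. this is an assumption about how such a bot could be wired, not a description of argus itself: `SecretFetcher`, `loadSecrets`, and the secret names are hypothetical. in azure the fetcher would typically wrap `SecretClient.getSecret` from `@azure/keyvault-secrets`, authenticated with `DefaultAzureCredential` from `@azure/identity`; a stand-in is used here so the example runs without any azure resources.

```typescript
// hypothetical abstraction over "something that can look up a secret by name"
type SecretFetcher = (name: string) => Promise<string>;

// resolve every named secret once at startup; only names live in the code
async function loadSecrets(
  names: readonly string[],
  fetchSecret: SecretFetcher,
): Promise<Map<string, string>> {
  const entries = await Promise.all(
    names.map(async (name) => [name, await fetchSecret(name)] as const),
  );
  return new Map(entries);
}

// stand-in for a key vault lookup, for local runs and tests only
const fakeVault: SecretFetcher = async (name) => `value-of-${name}`;

// made-up secret names
const secretsReady = loadSecrets(["itsm-api-key", "webhook-signing-key"], fakeVault);
```

injecting the fetcher keeps the rest of the bot unaware of where secrets come from, which also makes the lookup easy to stub in tests.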
in part 3, we will look under the hood at the event-driven engine that makes this all possible.