ai vs ai: decoding the deepfake

cloning a director to demonstrate the reality of social engineering at scale using modern conversational ai.

a few months ago, i hosted a webinar titled "ai vs ai: secops decoded." the goal was to move beyond the theoretical "scary ai" headlines and demonstrate exactly how easy it has become to weaponise commodity ai services for social engineering.

to make the point hit home, i did not use a generic robotic voice. i cloned my director, anna webb, and built a conversational ai that could impersonate her in three distinct, high-risk scenarios.

the technical stack

the scary part about this project is not the complexity; it is the simplicity. i built it from four off-the-shelf components, tied together with a small python wrapper:

  • transcription: azure cognitive services for real-time speech-to-text.
  • intelligence: google gemini api to handle the conversational logic and stay "on script."
  • voice cloning: elevenlabs to create a near-perfect clone of anna's voice from just a few minutes of audio.
  • interface: a custom tkinter gui to swap personas on the fly.

three forms of deepfake

using this stack, i created three "personas" that demonstrated different vectors of attack:

  1. the helpdesk hack: impersonating anna as a frustrated employee who has lost her phone and needs urgent access to internal systems.
  2. the manager push: playing the role of a busy director who is bypassing standard procedures to "get things done" on a new, confidential project.
  3. the financial fraud: the most dangerous scenario, in which a supplier representative apologetically explains a change in bank details for an urgent, five-figure invoice.

social engineering at scale

the "conversational" part of this project is what changes the game. this is not a pre-recorded audio file. the ai listens to the target, understands the context, and responds in real time with the appropriate emotion, tone, and technical detail.

by combining llms with high-fidelity voice cloning, we have reached a point where the "human" element of security (the phone call, the quick chat, the voice note) can no longer be trusted blindly.

secops decoded

as security professionals, we have to fight ai with ai. manual detection of these deepfakes is becoming impossible, so we need to pivot our defence strategies to focus on:

  • process over persona: never bypassing financial or access controls based on a voice, no matter how "senior" it sounds.
  • technical verification: implementing cryptographic verification and robust multi-factor authentication that does not rely on human interaction.
  • continuous monitoring: using ai-driven telemetry to identify patterns of irregular communication before the damage is done.
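
to make the first two points concrete, here is a minimal sketch of a challenge-response check for sensitive requests. it is not code from the webinar demo, and every name in it (`Request`, `new_challenge`, `sign_challenge`, `approve`, the `enrolled_keys` store) is a hypothetical illustration. it assumes each employee has a secret key enrolled on a device they control, and that the challenge is delivered over a channel separate from the call itself. the point it shows: approval depends only on proof of possession of that key, never on how the voice sounds or how senior the caller claims to be.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Request:
    """A sensitive request received over a voice channel (hypothetical shape)."""
    claimed_identity: str   # who the caller says they are
    claimed_seniority: str  # e.g. "director"; deliberately never consulted
    action: str             # e.g. "change_bank_details"


def new_challenge() -> str:
    """Create a one-time random challenge to send over a separate channel."""
    return secrets.token_hex(16)


def sign_challenge(shared_key: bytes, challenge: str) -> str:
    """What the genuine requester's enrolled device would compute."""
    return hmac.new(shared_key, challenge.encode(), hashlib.sha256).hexdigest()


def approve(
    request: Request,
    enrolled_keys: Dict[str, bytes],
    challenge: str,
    response: Optional[str],
) -> bool:
    """Approve only if the response proves possession of the enrolled key.

    Process over persona: claimed_seniority and the voice itself play no part.
    """
    key = enrolled_keys.get(request.claimed_identity)
    if key is None or response is None:
        return False
    expected = sign_challenge(key, challenge)
    # Constant-time comparison avoids leaking how much of the response matched.
    return hmac.compare_digest(expected, response)
```

in use, the helpdesk would call `new_challenge()`, push the value to the requester's enrolled device, and pass whatever comes back to `approve()`. a cloned voice on the phone cannot produce a valid response without the key, so the request fails no matter how convincing the caller sounds. a real deployment would also need challenge expiry, replay protection, and secure key storage, all of which this sketch leaves out.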

this project was a wake-up call for many who attended the webinar. it proved that the barrier to entry for high-tier social engineering is practically non-existent.

you can find the code for the demonstration here: adamfebery/conversational-ai