Why Video Calls Need a Walkie-Talkie Mode • Stephen Van Tran

We have spent the better part of a decade trying to make digital communication feel “natural.” We’ve chased low-latency audio, high-definition video, and full-duplex sound engines that allow everyone to laugh, sigh, and interrupt simultaneously. The goal was to replicate the conference room. The result is a cacophony of “No, you go,” “Sorry, I think there’s a delay,” and the agonizing spectacle of three people starting a sentence at the exact same millisecond, followed by five seconds of apologetic silence.

We optimized for fidelity when we should have optimized for discipline.

The current state of video calling—whether you’re on Zoom, Teams, or Google Meet—is fundamentally broken not because of the technology, but because of the psychology it encourages. It encourages rambling. It encourages interruption. It encourages the lazy habit of thinking while speaking, rather than thinking before speaking.

My proposal is radical but necessary: video calling platforms need to implement a mandatory, hard-enforced “Walkie-Talkie Mode.” A mode where only one person can speak at a time, triggered by a physical button hold, and where the line is dead for everyone else until they release it. We need to bring the discipline of aviation radio and military comms to the quarterly marketing sync. Because right now, nobody is hearing anything.

The Half-Duplex Discipline

To understand why we need this, we have to look at environments where miscommunication isn’t just annoying—it’s fatal.

In aviation and military operations, communication lines are often “half-duplex.” This means data flows in both directions, but only one way at a time. You push the button to talk (PTT), and while you are transmitting, you cannot receive. This technical constraint birthed a culture of extreme precision. When you know you have the floor, and you know you are blocking the channel for everyone else, the psychological weight of your words increases. You don’t ramble about the weather or preface your statement with three minutes of “I just kind of feel like maybe…” You state your call sign, your position, your intent, and you get off the frequency.

“Lima Charlie. Over.” (Loud and Clear. I’m done.)

Contrast this with your Tuesday morning standup. The “open mic” culture of modern software implies that every thought is valid and every interruption is part of the “jam session.” But most meetings aren’t jazz; they’re traffic jams.

By introducing a Walkie-Talkie feature, we artificially reintroduce the cost of speaking. It’s a friction feature. In product design, we usually talk about removing friction, but sometimes friction is the only thing keeping a system from devolving into entropy. If you have to physically hold a button to speak, you are constantly reminded that you are consuming a shared resource: the team’s attention. The moment you release that button, you are signaling, explicitly, “I have finished my thought. The floor is yours.”

It kills the “interrupter” dead. You physically cannot interrupt. You have to wait for the “Over.” This forces the listener to actually listen to the entire argument rather than preparing their rebuttal halfway through the second sentence.

Communication vs. Self-Expression

This brings me to a core philosophy that I believe is missing from modern discourse, both in the boardroom and in society at large. We have conflated communication with self-expression. They are not the same thing.

Self-expression is vomiting your internal state onto the world. It is the act of saying what you feel, what you think, and what you want, purely for the catharsis of having said it. It is ego-centric. “I feel this, therefore I say this.” Modern social media and unmuted Zoom calls are engines of self-expression. They encourage us to broadcast our identity and our stream of consciousness.

Communication is entirely different. Communication is the act of conveying an idea from your brain to another person’s brain with the specific motivation that they understand the concept. It is not about you. It is about them.

True communication is an act of empathy. To communicate effectively, you have to step out of your own mind and inhabit the mind of your listener. You have to ask: “What context do they lack? What biases do they hold? What language resonates with them? If I say it this way, will they hear that?”

You have to listen to yourself speak, in real-time, as if you were the audience. This is incredibly hard work. It requires a base knowledge of the recipient’s current understanding. It requires you to identify the gaps in their context and fill them precisely, without being condescending.

Most people don’t do this. They reach for elevated vocabulary because it makes them feel smart. They use jargon to signal tribal belonging. They ramble to fill the silence because they are insecure. That is self-expression masquerading as communication.

A Walkie-Talkie mode forces the shift from expression to communication. When you are holding that button, the clock is ticking. The channel is blocked. You are forced to condense your complex idea into a packet of information that can be received and acknowledged. It strips away the performative “umms” and “ahhs” and forces you to transmit the signal, not the noise.

The Passion of the Speaker

At the end of the day, communication is hard because it requires passion. Not the passion of “I’m loud and excited,” but the passion of care. You have to care enough about the listener to do the work for them.

If I am speaking to you, and I am rambling, I am being lazy. I am asking you to do the work of sifting through my verbal garbage to find the gold nugget of meaning. I am outsourcing the editing process to the audience. That is disrespectful.

A passionate communicator goes more than halfway. They pre-chew the information. They structure the narrative. They kill their darlings. They make it easy to be understood.

Imagine a meeting where everyone operated this way. Where no one spoke unless they had a fully formed packet of information to transmit. Where “Over” meant “I have verified that my message is complete.” It would be shorter. It would be intense. It would be productive.

The Research: Why We Need Constraints

A close look at the current state of digital collaboration tools reveals a landscape that is terrified of silence and friction.

The “Spacebar” Fallacy: Zoom and other platforms shipped a “Push-to-Talk” feature years ago in the form of the spacebar unmute. But this is a UI convenience, not a protocol. It is “mute by default,” which is different from “single-channel architecture.” The spacebar doesn’t prevent others from talking over you. It doesn’t queue inputs. It just temporarily turns on your mic. It solves the background noise problem (the dog barking, the leaf blower), but it doesn’t solve the structural problem of conversational overlap.

Discord’s “Priority Speaker”: Discord comes closer. Born from gaming, a high-stakes, real-time coordination environment similar to the military, Discord understands that sometimes the Raid Leader needs to be the only voice of God. They have a “Priority Speaker” toggle that dampens all other audio when a specific person talks. This is useful, but it’s hierarchical. It creates a caste system of “Talkers” and “Listeners.” What I am proposing is a democratic Walkie-Talkie mode where anyone can take the floor, but only one can hold it.

Apple’s Walkie-Talkie: Apple introduced the Walkie-Talkie app on the Apple Watch. It uses a specialized variant of FaceTime Audio. It’s fantastic for quick, bursty communication. “Grab milk.” “On my way.” But it’s 1:1. It doesn’t scale to the conference room. It proved, however, that people like the modality. There is a satisfaction in the “beep,” the message, and the silence. It feels definitive.

The Cognitive Load of Full-Duplex: Research into “Zoom Fatigue” (a phenomenon studied in depth by Stanford researchers) points to several causes: excessive eye contact, the self-view mirror effect, and, crucially, the cognitive load of processing out-of-sync audio cues. In face-to-face conversation, our brains process delay (or lack thereof) to manage turn-taking. In video calls, network latency and jitter can add anywhere from 50 ms to 500 ms of delay. This destroys our biological turn-taking hardware. We constantly misread the “end” of a sentence.

A hard PTT system removes the ambiguity. The visual indicator that “User A is Transmitting” removes the guesswork. You don’t have to listen for the breath intake that signals someone is about to interrupt. You just look at the UI. Is the channel open? No? Then I wait.

Implementing the Feature: A Product Spec

If I were a Product Manager at Zoom, Microsoft, or Slack, here is how I would build “Radio Mode”:

The Queue: If two people push the button at once, we don’t get chaos. We get a queue. User A gets the floor. User B gets a visual indicator: “You are next in line.”
The Timer: A subtle visual countdown. You don’t get the floor forever. Maybe the default is 60 seconds. If you can’t make your point in 60 seconds, you need to release and re-request. This kills the filibuster.
The “Over” Sound: A distinct, satisfying audio cue when the channel opens and closes. This acts as a Pavlovian trigger for attention.
The “Break” Function: A dedicated emergency button (perhaps a double-tap) that allows for an interruption only in critical scenarios (“The server is literally on fire”). Using this button should perhaps carry a social cost—it logs “INTERRUPTED” in the chat.
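
To make the spec concrete, here is a minimal sketch of the floor logic in Python. Everything in it is hypothetical: `RadioFloor`, `press`, `release`, `tick`, and `break_in` are names invented for illustration, not part of any Zoom, Teams, or Slack API, and the sketch assumes a single server process that receives button events with timestamps. It covers the queue, the 60-second timer, and the logged “Break”; the “Over” sound is left to the client.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class RadioFloor:
    """Hypothetical floor controller for a 'Radio Mode' call (illustrative only)."""

    max_hold_seconds: float = 60.0  # the default suggested in the spec above
    holder: Optional[str] = None    # who is transmitting right now
    held_since: float = 0.0         # timestamp at which the holder got the floor
    queue: Deque[str] = field(default_factory=deque)
    chat_log: List[str] = field(default_factory=list)

    def press(self, user: str, now: float) -> str:
        """Button down: take the floor if it is free, otherwise join the queue."""
        self.tick(now)
        if self.holder is None:
            self._grant(user, now)
        if user == self.holder:
            return "transmitting"
        if user not in self.queue:
            self.queue.append(user)
        return f"queued:{self.queue.index(user) + 1}"

    def release(self, user: str, now: float) -> None:
        """Button up ('Over'): pass the floor on, or leave the queue."""
        if user == self.holder:
            self.holder = None
            self._grant_next(now)
        elif user in self.queue:
            self.queue.remove(user)

    def tick(self, now: float) -> None:
        """The Timer: expire the floor once the hold limit is reached."""
        if self.holder is not None and now - self.held_since >= self.max_hold_seconds:
            self.holder = None
            self._grant_next(now)

    def break_in(self, user: str, now: float) -> None:
        """The Break: seize the floor immediately and log the social cost."""
        if self.holder is not None and self.holder != user:
            self.chat_log.append(f"INTERRUPTED: {user} broke in on {self.holder}")
        if user in self.queue:
            self.queue.remove(user)
        self._grant(user, now)

    def _grant(self, user: str, now: float) -> None:
        self.holder = user
        self.held_since = now

    def _grant_next(self, now: float) -> None:
        if self.queue:
            self._grant(self.queue.popleft(), now)


if __name__ == "__main__":
    floor = RadioFloor()
    print(floor.press("alice", now=0.0))  # alice takes the free floor
    print(floor.press("bob", now=1.0))    # bob is told he is next in line
    floor.release("alice", now=10.0)      # "Over": the floor passes to bob
    print(floor.holder)
```

One design choice worth noting: the timer hands the floor to the next person in the queue rather than simply cutting the mic, so a filibuster ends with someone else speaking instead of with dead air.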

The Cultural Shift

The technology is the easy part. The code to implement a mutex on an audio stream is trivial. The hard part is the culture.
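
For the skeptical, here is roughly what that mutex could look like, as a sketch under stated assumptions: one server process relays audio frames, and `AudioChannel`, `try_acquire`, `release`, and `forward` are hypothetical names made up for this example. A real deployment would also have to arbitrate button presses that arrive over a laggy network, which this sketch does not attempt.

```python
import threading
from typing import Optional


class AudioChannel:
    """Hypothetical single-speaker channel: only the floor holder's frames pass."""

    def __init__(self) -> None:
        self._lock = threading.Lock()         # guards _speaker across threads
        self._speaker: Optional[str] = None   # current floor holder, if any

    def try_acquire(self, user: str) -> bool:
        """Button down: claim the channel if it is free. Never blocks."""
        with self._lock:
            if self._speaker is None:
                self._speaker = user
            return self._speaker == user

    def release(self, user: str) -> None:
        """Button up: only the current speaker can free the channel."""
        with self._lock:
            if self._speaker == user:
                self._speaker = None

    def forward(self, user: str, frame: bytes) -> Optional[bytes]:
        """Return the frame for broadcast if `user` holds the floor, else drop it."""
        with self._lock:
            return frame if self._speaker == user else None
```

Frames from anyone who is not holding the button are dropped at the relay, which is what makes interruption physically impossible rather than merely impolite.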

We are addicted to the illusion of collaboration that comes from everyone talking at once. We think that silence means disengagement. In reality, silence is where thinking happens.

In a Walkie-Talkie meeting, there would be silence. User A finishes. “Over.” User B thinks for five seconds before pressing the button. In a standard meeting, those five seconds are terrifying. In a Radio meeting, they are structural. Those five seconds are where User B formulates a response that is actually useful, rather than just a reaction.

We need to become comfortable with the gaps. We need to train our teams that “holding the button” is a responsibility.

Conclusion

We have let our tools dictate our behavior for too long. We let the limitless bandwidth of broadband convince us that we should fill every frequency with noise. It is time to artificially constrain our bandwidth to increase our signal.

Video calling should implement a walkie-talkie feature not because it’s retro, or cool, or “gamified,” but because it forces us to be better humans. It forces us to respect the listener. It forces us to clarify our own thoughts. It forces us to stop expressing ourselves and start communicating.

The technology exists. The precedent exists in the most high-stakes environments on Earth. All we lack is the courage to press the button, say what we mean, and then—most importantly—shut up.

Over.