Audio Problems on Video Conferencing Platforms

“Sorry, I think you cut out”, “What did you just say?”, and other stories.

BOLT Canada
4 min read · Jan 20, 2021

By Rianna Melnik, Director of Sponsorship (McGill)

Zoom and other video chat platforms like it have provided a much-needed sense of social normalcy throughout this past year. However, they fail to effectively replicate the in-person interactions that we’ve all been lacking as a result of the remote circumstances we’re enduring. While the many live online lectures, virtual meetings and interviews, and makeshift digital birthday parties and get-togethers that I’ve attended since the beginning of the pandemic have helped to ease some of the isolation I’ve experienced, they fall short as substitutes for their real-life counterparts.

The problem with sound

Aside from the obvious gaps that distinguish virtual and physical conversations, one of the most prominent issues that tech companies are currently racing to resolve is that of sound quality. From awkward lags, to distorted speech, to inconsistent volumes, the audio experiences provided by platforms such as Zoom leave much to be desired. As tech writer Mark Sparrow puts it, the “artificial soundscape” reflected by these mediums “creates a muddied, unclear and unnatural listening experience”. Every voice that participates in a given video chat emerges from the same source, which is the listener’s speaker, at equal distances and volumes. This inhibits the sound quality and makes it difficult to distinguish one voice from another. Conversely, the spatial configuration of in-person conversations, where “individual voices are separated by distance and appear at specific points” within a given room, facilitates the process of recognizing who is speaking at particular moments in time, and to whom. Essentially, video chat platforms fail to capture the spatiality, volume control, and other significant auditory cues of real conversations, which help to facilitate the quality and flow of sound. As a result, the social interactions that take place on these virtual conferences are more uncomfortable and challenging than they would be in real life.

What’s more, these sound quality problems can lead to what is often called “Zoom Fatigue”, as we devote heightened levels of awareness and concentration to understanding the speakers in question and the conversations that are occurring. As such, we tend to leave those aforementioned virtual lectures, meetings, and makeshift social gatherings feeling much more exhausted and frustrated than we would have if they were in person. From what I’ve gathered from my own and my friends’ experiences with ‘Zoom University’, participating in class discussions has become much more stressful, which leads to a reduction in student-professor and peer-to-peer interactions; meetings have become much more draining and difficult to contribute to; and overall, we finish our days and weeks with less energy and motivation than ever before. I believe that all of these issues can be directly attributed to the problems in sound quality that exist on these platforms. As such, resolving them can make for a much more enjoyable and authentic academic experience, and has the potential to transform virtual social interactions in general.

3D Audio: The solution

The good news is that a number of digital audio companies are building solutions to confront these shortcomings. Dirac, for example, is currently working to incorporate 3D audio, also known as spatial audio, into video meetings. The Swedish audio technology company aims to “revolutionize the way the world hears” through its specialization in digital audio optimization, which is already being applied to automotive, VR/AR, and gaming listening environments. According to Dirac’s CEO, Mathias Johansson, 3D sound works by decoupling audio from its original source and “[positioning] it around the listener to create a more natural listening experience”. By attributing specific auditory locations to different speakers on a video call, 3D sound can lead to more authentic conversations. More specifically, when one person is talking, the sound will seem to come from a distinct direction that aligns with that person’s location on the listener’s screen. Johansson continues, “By using digital sound optimization technology, the timing, volume, resonance, and echo characteristics of each voice can be controlled as it reaches the ear and enables the listener to accurately ‘place’ the voice exactly where it should be”. This can also be achieved through additional auditory cues, such as detecting the speakers’ motion and position relative to the screen to determine how soft or loud their voice should be. Ultimately, these modifications would relieve the visual and logical concentration that is currently required in virtual conversations in order to understand who is speaking, what they’re saying, and who they’re talking to.
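For readers curious about what “placing” a voice might look like in practice, here is a minimal Python sketch of two of the cues Johansson mentions, timing and volume. To be clear, this is not Dirac’s method, which the sources quoted here don’t describe in detail. It is a textbook simplification that combines a constant-power volume pan with a small delay to the far ear (using the Woodworth approximation for the interaural time difference), and it assumes headphones or a stereo pair. The function name `place_voice`, the constants, and the sample rate are all hypothetical choices made for illustration.

```python
import math

SAMPLE_RATE = 48_000     # samples per second (assumed for this sketch)
HEAD_RADIUS_M = 0.0875   # rough average head radius in metres (assumed)
SPEED_OF_SOUND = 343.0   # metres per second in air at room temperature


def place_voice(mono, azimuth_deg):
    """Return (left, right) channels for a mono voice placed at azimuth_deg.

    -90 is fully left, 0 is straight ahead, +90 is fully right.
    Illustrative only: real spatial audio also shapes resonance and echo.
    """
    azimuth_deg = max(-90.0, min(90.0, azimuth_deg))
    theta = math.radians(azimuth_deg)

    # Volume cue: constant-power pan, so overall loudness stays steady.
    pan = (theta + math.pi / 2) / 2          # maps -90..+90 degrees to 0..pi/2
    left_gain, right_gain = math.cos(pan), math.sin(pan)

    # Timing cue: the ear farther from the voice hears it slightly later.
    itd_seconds = (HEAD_RADIUS_M / SPEED_OF_SOUND) * (abs(theta) + math.sin(abs(theta)))
    delay = round(itd_seconds * SAMPLE_RATE)
    left_delay = delay if azimuth_deg > 0 else 0
    right_delay = delay if azimuth_deg < 0 else 0

    left = [0.0] * (len(mono) + delay)
    right = [0.0] * (len(mono) + delay)
    for i, sample in enumerate(mono):
        left[i + left_delay] = sample * left_gain
        right[i + right_delay] = sample * right_gain
    return left, right


# Usage: one speaker's tile is on the right of the screen, another's on the left.
voice = [math.sin(2 * math.pi * 440 * i / SAMPLE_RATE) for i in range(480)]
right_side_speaker = place_voice(voice, 60)
left_side_speaker = place_voice(voice, -60)
```

Mixing the two results channel by channel would give a call in which each voice appears to come from its own side, rather than from a single point at equal volume.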

One of the main questions at play here is how easily these 3D sound technologies can be applied to video conferencing. Currently, areas like gaming and virtual reality are already using 3D audio to achieve authentic listening experiences and to replicate the auditory cues that are present in physical space. As such, it would not be a significant challenge to transfer these capabilities to video chat platforms, and to combine them with other means of audio optimization such as “impulse and frequency response correction”, which work to improve the clarity and “intelligibility” of virtual conversations, according to Johansson.

Overall, although it may be some time before we can sit in a classroom, book a private room in Bronfman, or celebrate a friend’s birthday party alongside them, at least we can look forward to less awkward, more authentic video conferences while we wait.

BOLT Canada

Business tech bootcamps encouraging students to pursue innovation in a whole new light. 💻💡