June 7, 2011

Audio on Android: A Developer’s Perspective

Some people think that the fossil record offers solid proof that humans evolved from apes, and I’ll admit that with the right diet and some electrolysis, Lucy could look halfway decent. But if you really want to clinch the case, read the comment threads on a gadget blog. Any post that compares an egoDevice to a Botroid will spark chest-thumping tribal warfare that would make Jane Goodall blanch. One suspects that our tech media overlords know exactly what they’re doing when they throw side-by-side feature comparisons to the howling commenter troops, and while I want to avert my eyes, I think I could learn something from them. If my career ever needs a booster shot of maximal controversy, I’m going to publish a cartoon of the Prophet Muhammad using a Motorola Xoom.

With that preamble out of the way (call me Jefferson), I’m going to make a few value judgments. When I first began toying with the idea of turning the algorithms I use in my music into apps, I started teaching myself iOS, because the iPhone was and remains a more profitable platform for app developers than Android, although the gap is closing. I bought books, watched YouTube tutorials, and experimented with example code that I ran on my own iThing. After a couple of months, I gave up. Part of the problem was unfamiliarity with Objective-C, the language used to code for iOS, but the other problem, which seemed less tractable and more discouraging, was a patronizing and somewhat authoritarian attitude embedded in the way the iOS development tools control the process of creating an app. These tools and the pedagogical materials that explain them all but mandate certain design patterns that structure how applications are put together. Those patterns make sense for a lot of apps, I’m sure, but they didn’t make sense for mine. I knew how I wanted to structure my app, and trying to contort it into one of Apple’s design templates felt unnatural and frustrating, so I began looking at Android as an alternative.

Getting started with Android was easy. There was some new terminology to learn, and there were rules to follow, but I felt that Android struck the right balance between preordained structure and flexibility. Equipped with a flexible development environment, I dove into teaching myself the basics of UI design and Android audio programming. In short order I had a test app that would simply take audio input from the microphone and play it back out the speaker or headphones in real time.
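For the curious, here is a minimal sketch of the kind of pass-through test I’m describing, built on the SDK’s AudioRecord and AudioTrack classes. The class name, buffer handling, and threading are illustrative rather than my exact code, and it needs the RECORD_AUDIO permission in the manifest.

import android.media.AudioFormat;
import android.media.AudioManager;
import android.media.AudioRecord;
import android.media.AudioTrack;
import android.media.MediaRecorder;

public class PassThrough implements Runnable {
    private static final int SAMPLE_RATE = 22050;
    private volatile boolean running = true;

    @Override
    public void run() {
        int recBuf = AudioRecord.getMinBufferSize(SAMPLE_RATE,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
        int playBuf = AudioTrack.getMinBufferSize(SAMPLE_RATE,
                AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT);

        AudioRecord recorder = new AudioRecord(MediaRecorder.AudioSource.MIC,
                SAMPLE_RATE, AudioFormat.CHANNEL_IN_MONO,
                AudioFormat.ENCODING_PCM_16BIT, recBuf);
        AudioTrack player = new AudioTrack(AudioManager.STREAM_MUSIC,
                SAMPLE_RATE, AudioFormat.CHANNEL_OUT_MONO,
                AudioFormat.ENCODING_PCM_16BIT, playBuf, AudioTrack.MODE_STREAM);

        short[] buffer = new short[recBuf / 2]; // 16-bit samples
        recorder.startRecording();
        player.play();

        // Read from the microphone and immediately write to the output stream.
        // Even this trivial loop incurs the platform's full round-trip latency.
        while (running) {
            int read = recorder.read(buffer, 0, buffer.length);
            if (read > 0) {
                player.write(buffer, 0, read);
            }
        }

        recorder.stop();
        recorder.release();
        player.stop();
        player.release();
    }

    public void stopLoop() {
        running = false;
    }
}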

The disappointment began when I pressed Play and started speaking. “Test one, test” went into the phone, kicked off its shoes, had a bite to eat, checked the sports page, and ambled out of the speaker about 250 milliseconds after arriving. Now, 250 milliseconds may not sound like a long time, but in the world of real-time audio for music, it’s an eternity. It’s the duration of an eighth note at a tempo of 120 beats per minute, which means that if you sang through the phone on the beat at a moderately fast tempo, the output would land squarely on the off-beat. Even ordinary speech is nearly impossible when you hear your own voice delayed by that much. What was especially galling about this audio latency was that it occurred on the powerful Nexus One, at the time Android’s flagship phone, running the latest version of the OS. By comparison, the iPhone can achieve latency as low as 5 milliseconds, and it could do that in its first generation. I was puzzled. Somehow, a company with some of the world’s smartest engineers had designed, manufactured, and marketed a “Superphone” that turned in a laughable performance at a core engineering task.

I felt confused, angry, isolated, and betrayed, until I found a support group on the internet that showed me that there were other people just like me, going through the same things that I was. OK, I stole that line from an episode of Dateline NBC, but it was illuminating to come across the Google Code page devoted to Issue 3434. Here, you can watch Android audio developers cycle through sundry stages of grief as they realize that yes, real-time audio sucks on Android, and there’s nothing they can do about it.

Google has declared its intention to lower audio latency in the future. The Compatibility Definition Document for Gingerbread stipulates that devices declaring themselves low-latency must meet thresholds of 50 ms of input latency and 45 ms of output latency. That’s better; it moves the needle from hide-your-face shameful to pretty crappy, but it’s not low latency in the sense that musicians understand the term. Round-trip input-output latency of 95 ms would easily throw off a guitar player trying to use an audio effects app; it would delay the output by about a sixteenth note at a fast tempo. To make digital audio effects and synthesis work for musicians, latency must fall below the threshold of the human perceptual present, the window within which we integrate sensory input into a single moment. It’s about 30 milliseconds, which is why 24 and 30 frames per second (with periods of 41.7 ms and 33.3 ms, respectively) function as the baseline frame rates for film and video.
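For readers who want to check my arithmetic, here is a small, illustrative snippet (not anything from the app itself) that converts tempo and note values into milliseconds and prints the frame periods mentioned above.

public class LatencyMath {
    /** Duration in ms of a note that is `fraction` of a whole note,
     *  at `bpm` beats per minute (one beat = quarter note). */
    static double noteMs(double bpm, double fraction) {
        double quarterMs = 60000.0 / bpm;
        return quarterMs * (fraction / 0.25);
    }

    public static void main(String[] args) {
        System.out.println(noteMs(120, 1.0 / 8));  // 250.0 ms: eighth note at 120 bpm
        System.out.println(noteMs(150, 1.0 / 16)); // 100.0 ms: sixteenth note at 150 bpm
        // Gingerbread CDD "low latency" budget: 50 ms in + 45 ms out = 95 ms round trip,
        // i.e. roughly a sixteenth note at a brisk tempo.
        System.out.println(1000.0 / 24);           // ~41.7 ms per film frame
        System.out.println(1000.0 / 30);           // ~33.3 ms per video frame
    }
}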

Latency is not the only weak spot of audio performance on Android. The quality of microphones and analog-to-digital converters (ADCs) varies widely between devices. Last week, when I was testing Voloco on my Nexus One, I noticed that singing high notes produced nasty squelching in the middle-high frequency register of the vocoder output. To test whether the N1’s microphone might be the problem, I recorded myself singing a high note into the N1 and a first-generation Motorola Droid simultaneously. I applied no processing; I just recorded the input. I then imported those recordings into an audio editing program and ran a simple frequency analysis on an identical chunk of audio from each. The top image plots the frequency content of one second of the Droid recording.

Motorola Droid frequency response.

I used a sample rate of 22 kHz, so the highest frequency present is 11 kHz. You’ll notice that the magnitude spectrum is about what you would expect from a high, sung note. The partials are clearly present and gradually decrease in magnitude with frequency. The bottom image is the analysis of the same chunk of sound recorded with the Nexus One.

Nexus One frequency response.

One thing you notice is that the Nexus One doesn’t detect any frequencies above 7,500 Hz, which is disappointing but forgivable. The horrifying problem is the massive boost in energy across all frequencies between 5,500 Hz and 7,500 Hz. On the Droid the highest peak in that region is at -54 dB; on the Nexus One it’s -36 dB. That 18 dB difference is ear-splitting when fed through a vocoder. Here are the audio files I used for the analysis: droid_high_note, n1_high_note.
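For anyone who wants to reproduce this comparison without an audio editor, here is a rough sketch of the measurement: find the peak spectral level, in dB relative to full scale, within a frequency band of a mono 16-bit recording. The class and method names are made up for illustration, and a direct DFT is used for simplicity; it is slow but easy to verify.

public class BandPeak {
    /** Peak magnitude in dBFS between loHz and hiHz, via a direct DFT. */
    static double peakDb(short[] pcm, int sampleRate, double loHz, double hiHz) {
        int n = pcm.length;
        double binHz = (double) sampleRate / n;
        double peak = 0.0;
        for (int k = (int) (loHz / binHz); k <= (int) (hiHz / binHz); k++) {
            double re = 0.0, im = 0.0;
            for (int t = 0; t < n; t++) {
                double w = 0.5 - 0.5 * Math.cos(2 * Math.PI * t / (n - 1)); // Hann window
                double x = w * pcm[t] / 32768.0;                            // normalize to [-1, 1)
                double phase = 2 * Math.PI * k * t / n;
                re += x * Math.cos(phase);
                im -= x * Math.sin(phase);
            }
            peak = Math.max(peak, Math.hypot(re, im) * 2.0 / n); // single-sided magnitude
        }
        // The Hann window lowers absolute levels a bit, but relative
        // comparisons between devices are unaffected.
        return 20.0 * Math.log10(Math.max(peak, 1e-12));
    }
}

Running something like peakDb(chunk, 22050, 5500, 7500) on one-second chunks from each recording should give numbers comparable to the peaks reported above.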

Gadget blogs, particularly ones devoted to Android, love hardware specs, but I have yet to see a mobile device review use the words “latency” or “frequency response.” This oversight is unfortunate because audio performance is not a niche feature; it’s what enables a productive connection to the broader world of musical culture, which plays a disproportionately large role in shaping consumer perceptions of cachet and status. Furthermore, it’s becoming specious to distinguish between mass-market music consumption apps and expert music production apps. Technology is blurring the line between musical production and consumption, reversing a trend that began with the rise of recording in the early 20th century and returning us to a situation more like that of the 19th (see music creation site UJAM for a perfect example). This development is real, and Apple and even Microsoft understand it (although Microsoft’s Songsmith is pretty bad, and the marketing is just awful). Sadly for me and other music developers who find much to like about Android, Google doesn’t seem to get it, and until Google and the OEMs bring audio performance up to par, Android will have little to contribute to music making.
