Audio Processing on Android using TarsosDSP

TarsosDSP is a Java library for audio processing. Its aim is to provide an easy-to-use interface to practical music processing algorithms.

It comes prebuilt for Android in the form of a JAR that can simply be dropped into any Android project to get going. This post will touch on some basics of working with the framework.

Recording Audio on Android

The AudioRecord class manages the audio resources for Java applications to record audio from the audio input hardware of the platform. This is achieved by “pulling” (reading) the data from the AudioRecord object.

As with most hardware access on Android, the first thing we need to account for is the huge number of configurations that Android devices can have. Here are the candidate values for each configuration parameter:

private val RECORDER_CHANNELS = shortArrayOf(AudioFormat.CHANNEL_IN_MONO.toShort(), AudioFormat.CHANNEL_IN_STEREO.toShort())
private val RECORDER_AUDIO_FORMATS = shortArrayOf(AudioFormat.ENCODING_PCM_16BIT.toShort(), AudioFormat.ENCODING_PCM_8BIT.toShort())
private val RECORDER_SAMPLE_RATES = intArrayOf(8000, 11025, 22050, 44100)

Whew! To initialize an AudioRecord instance on all devices in the wild, we need to try several configurations until we find one that works on the target device. Here’s how we can handle this:

data class AudioRecordResult(val audioRecord: AudioRecord, val format: TarsosDSPAudioFormat, val bufferSize: Int)

private fun initAudioRecord(): AudioRecordResult? {
    for (rate in RECORDER_SAMPLE_RATES.reversed()) {
        for (audioFormat in RECORDER_AUDIO_FORMATS) {
            for (channelConfig in RECORDER_CHANNELS) {
                Timber.d("Trying recorder config: Sample rate: %d, format: %d, channel: %d", rate, audioFormat, channelConfig)
                try {
                    val bufferSize = AudioRecord.getMinBufferSize(rate, channelConfig.toInt(), audioFormat.toInt())
                    // TarsosDSPAudioFormat expects the sample size in bits, not bytes.
                    val bitsPerSample = if (audioFormat == AudioFormat.ENCODING_PCM_8BIT.toShort()) 8 else 16
                    val channels = if (channelConfig == AudioFormat.CHANNEL_IN_MONO.toShort()) 1 else 2
                    // Android delivers 8-bit PCM as unsigned samples and 16-bit PCM as signed samples.
                    val signed = bitsPerSample == 16
                    val bigEndian = false
                    // getMinBufferSize returns a negative error code (ERROR or ERROR_BAD_VALUE) on failure.
                    if (bufferSize > 0) {
                        val recorder = AudioRecord(AudioSource.DEFAULT, rate, channelConfig.toInt(), audioFormat.toInt(), bufferSize)
                        if (recorder.state == AudioRecord.STATE_INITIALIZED) {
                            Timber.d("Initialized recorder. Sample rate: %d, format: %d, channel: %d", rate, audioFormat, channelConfig)
                            return AudioRecordResult(recorder, TarsosDSPAudioFormat(rate.toFloat(), bitsPerSample, channels, signed, bigEndian), bufferSize)
                        }
                        // Free the native resources of a recorder that failed to initialize.
                        recorder.release()
                    }
                } catch (e: Exception) {
                    Timber.e(e, "Config with sample rate %d failed, keep trying.", rate)
                }

            }
        }
    }
    return null
}
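
The search order above can also be separated from the Android calls, which makes it easy to check on a plain JVM. The following is a minimal sketch, not part of TarsosDSP or the Android SDK: RecorderConfig, candidateConfigs, firstWorkingConfig and the probe parameter are hypothetical names introduced here for illustration, and plain Int values stand in for the AudioFormat constants.

```kotlin
// Hypothetical holder for one candidate configuration (illustration only).
data class RecorderConfig(val sampleRate: Int, val encoding: Int, val channelConfig: Int)

// Lazily yields every combination, highest sample rate first,
// using the same nesting order as the loops above.
fun candidateConfigs(
    sampleRates: List<Int>,
    encodings: List<Int>,
    channelConfigs: List<Int>
): Sequence<RecorderConfig> = sequence {
    for (rate in sampleRates.sortedDescending()) {
        for (encoding in encodings) {
            for (channel in channelConfigs) {
                yield(RecorderConfig(rate, encoding, channel))
            }
        }
    }
}

// Returns the first configuration the probe accepts, or null if none works.
fun firstWorkingConfig(
    configs: Sequence<RecorderConfig>,
    probe: (RecorderConfig) -> Boolean
): RecorderConfig? = configs.firstOrNull { config ->
    // A probe that throws is treated like a rejected configuration.
    runCatching { probe(config) }.getOrDefault(false)
}
```

On a device, the probe would wrap the AudioRecord creation and state check shown above; in a unit test it can be any lambda that accepts or rejects a configuration.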

Processing Audio

As we saw earlier, the application is responsible for polling the AudioRecord object in a timely manner to access the recorded data stream. Thankfully, TarsosDSP already provides the mechanism needed to pull this data and feed it to one of its audio stream processors. First, we wrap the AudioRecord in an AndroidAudioInputStream.

mInputStream = AndroidAudioInputStream(recorder, format)
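
The format passed alongside the recorder describes how the raw bytes should be interpreted: sample size, signedness and byte order. As a rough illustration of why those fields matter, here is a minimal sketch of decoding 16-bit signed little-endian PCM into floats. The function name pcm16LeToFloats is made up for this post, and this is not TarsosDSP's own converter.

```kotlin
// Decodes 16-bit signed little-endian PCM bytes into floats in [-1.0, 1.0).
// Illustration only; a trailing odd byte is ignored.
fun pcm16LeToFloats(bytes: ByteArray): FloatArray {
    val samples = FloatArray(bytes.size / 2)
    for (i in samples.indices) {
        val low = bytes[2 * i].toInt() and 0xFF   // low byte, treated as unsigned
        val high = bytes[2 * i + 1].toInt()       // high byte keeps its sign
        val value = (high shl 8) or low           // signed 16-bit sample
        samples[i] = value / 32768f
    }
    return samples
}
```

With the wrong signedness or byte order, the same bytes would decode to very different sample values, which is why the format has to match what the recorder actually delivers.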

AudioDispatcher then reads this stream and passes it on to the processing algorithms.

mDispatcher = AudioDispatcher(mInputStream, bufferSize, bufferSize / 2)
// TODO: Register processors on the dispatcher
// Start recording before the dispatcher thread begins reading from the stream.
recorder?.startRecording()
Thread(mDispatcher, "Audio dispatching").start()

This starts a new thread that polls the AudioRecord and passes on the data to any processors registered on the dispatcher.
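
The third constructor argument is the overlap between consecutive buffers, in samples. With an overlap of half the buffer size, each buffer handed to the processors repeats the second half of the previous one. The sketch below illustrates that windowing on a plain array. It is an assumption-laden illustration, not the dispatcher's implementation: overlappingFrames is a hypothetical helper, and it simply drops a trailing partial frame, which the real dispatcher may handle differently.

```kotlin
// Splits samples into frames of frameSize where consecutive frames share `overlap` samples.
// Trailing samples that do not fill a whole frame are dropped in this sketch.
fun overlappingFrames(samples: FloatArray, frameSize: Int, overlap: Int): List<FloatArray> {
    require(frameSize > 0) { "frameSize must be positive" }
    require(overlap in 0 until frameSize) { "overlap must be in [0, frameSize)" }
    val step = frameSize - overlap
    val frames = mutableListOf<FloatArray>()
    var start = 0
    while (start + frameSize <= samples.size) {
        frames += samples.copyOfRange(start, start + frameSize)
        start += step
    }
    return frames
}
```

For eight samples, a frame size of 4 and an overlap of 2, this yields three frames starting at samples 0, 2 and 4.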

Let’s see how we can create a simple processor that detects beats in the sound being recorded from the device. This can be achieved with the PercussionOnsetDetector from TarsosDSP.

val format = mFormat ?: return
val bufferSize = mBufferSize ?: return

mDetector = PercussionOnsetDetector(format.sampleRate, bufferSize, OnsetHandler { time, salience ->
    // TODO: Do something with the beat
}, 60 + (100.0 - 60) * mSensitivity / 100, 8.0)
mDispatcher?.addAudioProcessor(mDetector)
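
The sensitivity expression above maps a user-facing value linearly onto the range 60 to 100. A minimal sketch of that mapping as a standalone function follows, assuming mSensitivity is an integer between 0 and 100 (the post does not state its type). The name detectorSensitivity and the clamping are additions made here for illustration.

```kotlin
// Maps a 0..100 UI value linearly onto the 60..100 range used above.
// Clamping out-of-range input is an assumption, not something the original code does.
fun detectorSensitivity(uiValue: Int): Double {
    val clamped = uiValue.coerceIn(0, 100)
    return 60 + (100.0 - 60) * clamped / 100
}
```

A UI value of 0 gives 60.0, 50 gives 80.0 and 100 gives 100.0, so the detector never receives a sensitivity below 60.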

Note: When integrating this into an application, make sure to stop the dispatcher thread and release the recorder once they are no longer needed, to avoid leaking resources.

mDispatcher?.stop()
mRecorder?.release()