SERVAL Open Ears AI machine listening

Building artificial ears for (urban) jungle applications

Setting the stage

Back in 2016 we were in Bardia National Park, Nepal, when I first understood the impact of human-wildlife conflict. The night we arrived, a local villager and his wife were attacked in their house. Fortunately they were unharmed, but their home was ruined... What struck me most was the respect this man still had for the elephant that destroyed his house; I will never forget it.

SERVAL Open Ears

In this blog I will explain the process we went through while developing a sensor solution built on artificial intelligence that can process sounds much the way we do. It classifies what it hears, and those classifications can be used in real applications, from monitoring noise pollution in the city of Amsterdam to mitigating human-wildlife conflict in Nepal.

The objective

The objective is to develop a listening device that does not record and store sounds, which would raise all kinds of privacy issues, but instead processes and analyses the sound on the spot and only sends out labels for the sounds it has identified.

The end goal is to make the sensor solar-powered and deploy it anywhere in the world to mitigate human-wildlife conflict.

Could we use our new Artificial Intelligence techniques to make this work?

 

Choosing the right technique

When we first came back from Nepal, I spent some time on the web and found two great projects.

The first project was the Sounds of New York City (SONYC) project. What I especially liked about it was its inherent collaboration with the citizens of New York. I recommend watching their video explaining the project; it was an inspiration for me and convinced me this could be possible.

See the overview picture below explaining this great setup:

The second project was the work of Karol Piczak, then a PhD student doing his doctoral research on sound classification. I first found his work on GitHub and contacted him to explain my ambition and goals. This was around January 2017. When I told him I wanted to run his code on an edge device like a Raspberry Pi, he laughed and said that would be really challenging. By Easter that year he had bought himself a Raspberry Pi, and after that holiday he had his code running on the Pi. Impressive...

His research helped get our ambitions off the ground. His Jupyter notebooks are available online and are a great resource for anyone who wants to start analysing sounds. The picture below shows the initial deep neural network architecture he used in his research.


So this made me wonder whether it would be possible to build a device that can hear elephants coming to town and help locals all over the world react effectively to mitigate human-wildlife conflict. In India alone, elephants kill around 400 people every year.

Google Audioset

In October 2017 Google released its AudioSet and the accompanying deep learning models. This again shed new light on our options. We leveraged the work described in this blog, which explained how to apply transfer learning to these pre-trained models and retrain them on data we collected for our specific use case.

The dataset collected by Google is, as you would expect, huge: over 2 million tagged YouTube recordings. By now (2020) they have improved the dataset over a couple of versions.
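To give a feel for how this transfer-learning step can look, here is a minimal sketch: a frozen, pre-trained embedding model (VGGish-style, wrapped in a hypothetical extract_embedding helper) turns each clip into a fixed-size feature vector, and a small classifier is trained on top of those vectors. This is an illustration of the idea, not our exact pipeline.

```python
# Minimal transfer-learning sketch, not our exact pipeline.
# Assumption: extract_embedding(path) wraps a pre-trained AudioSet-style
# model (e.g. VGGish) and returns one fixed-size feature vector per clip.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def extract_embedding(wav_path: str) -> np.ndarray:
    """Placeholder: run the pre-trained embedding model on one clip."""
    raise NotImplementedError

def train_transfer_classifier(wav_paths, labels):
    # 1) Freeze the pre-trained model and use it only as a feature extractor.
    X = np.stack([extract_embedding(p) for p in wav_paths])
    y = np.array(labels)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)
    # 2) Train a lightweight classifier on top of the embeddings.
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("held-out accuracy:", clf.score(X_test, y_test))
    return clf
```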

Sensing Clues

As a partner, DIKW Intelligence invests in the development of Artificial Intelligence applications for use in the Sensing Clues Wildlife Intelligence platform. The SERVAL sensor is one of the technologies we work on.

As our ambition is high, we look forward to training this sensor to also identify sounds produced by (big) wild animals such as elephants, lions, wild boar, and other species. Recent research shows that many of these animals communicate at very low frequencies, not detectable by the human ear. While we have advanced quite a lot, we still have work to do to achieve this goal. Big challenges we face include the fact that most audio equipment filters out (or simply does not record) the very low-frequency sounds we are interested in, and that knowledge of and samples from this frequency domain are scarce. For example, researchers only learned a few years ago that giraffes, too, produce sounds that are hardly audible to the human ear. So distinguishing these low-frequency sounds is truly an enormous task lying ahead.

Meanwhile in the urban jungle ....

Amsterdam Sounds project

The city of Amsterdam has set out to fight sound pollution and noise disturbance. To this end they initiated the Amsterdam Sounds project, in which the SERVAL sound sensor plays an important role! What I especially like about this collaboration is the work we do together with the Sensemakers of Amsterdam. Together we are building a dedicated version of the SERVAL sound sensor for detecting sources of sound pollution: the Open Ears sensor.

Putting it all together

So let's put together all the bits and pieces and see where we are.

The basic idea is that sound can be transformed into an image by applying a Fourier transformation, thus creating a spectrogram of the sound.
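To make this concrete, here is a minimal sketch of that transformation using a short-time Fourier transform from SciPy. The file name is just an example, and the window parameters are typical choices rather than our production settings.

```python
# Minimal sketch: waveform -> log spectrogram via a short-time Fourier transform.
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import spectrogram

sr, audio = wavfile.read("brommer_alarm.wav")   # sample rate, samples (example file)
audio = audio.astype(float)
if audio.ndim > 1:                              # mix stereo down to mono
    audio = audio.mean(axis=1)

freqs, times, Sxx = spectrogram(audio, fs=sr, nperseg=1024, noverlap=512)
log_Sxx = 10 * np.log10(Sxx + 1e-10)            # power in dB

plt.pcolormesh(times, freqs, log_Sxx, shading="auto")
plt.xlabel("time [s]")
plt.ylabel("frequency [Hz]")
plt.title("spectrogram")
plt.show()
```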

Some examples taken from the samples collected in Amsterdam:

A "brommer alarm"  sounds like this

And looks like this




And a car horn looks like this


Now the challenge for the classifier is to see the sound structure in the spectrograms... so we are back at image classification again!

So we could talk about all the nuts and bolts of the machine learning parts of this (and I am very happy to do so, drop me a note!), but the proof of the pudding is in the eating...

One way to show how well a classifier has actually learned a certain task is to look at the confusion matrix over all the classes it is trained to recognise, comparing what the model predicts with the so-called ground truth, in this case a set of sound examples the model has never seen during training.
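For illustration, this is roughly how such a confusion matrix can be produced with scikit-learn on a held-out set. The labels and predictions below are made-up placeholders, not our real evaluation data.

```python
# Sketch: build a confusion matrix for held-out sound clips.
# y_true / y_pred are illustrative placeholders, not real sensor data.
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

classes = ["brommer_alarm", "car_horn", "gunshot", "other"]
y_true = ["brommer_alarm", "gunshot", "car_horn", "brommer_alarm"]        # ground truth
y_pred = ["brommer_alarm", "brommer_alarm", "car_horn", "brommer_alarm"]  # model output

cm = confusion_matrix(y_true, y_pred, labels=classes)
ConfusionMatrixDisplay(cm, display_labels=classes).plot()
plt.show()
```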

So, to show you some preliminary results, let's have a look at the confusion matrix given the default cutoff for all the class probabilities.


So how do we interpret this result?

Let's look at the class "Brommer Alarm". On the horizontal axis we see the predicted classes, so this is what the model says it hears when we play an example sound. In total the model raised the flag "I heard a brommer alarm" 26 times, and in 17 cases it was actually right. The ground truth is on the vertical axis, showing those 17 actual brommer alarms, but also some mistakes: twice it was actually a gunshot. Have a look at the confusion matrix yourself and see whether the mistakes it makes actually make sense.

So what would we actually be interested in when deploying such a sensor in the city?

To answer the question of whether, and how, such a sensor can be useful, we need to talk a little about precision and recall. Wikipedia has a great explanation of this topic, so please read that if you need to refresh your memory on the subject ;-).

Here I just use the great picture that comes with it:

 

When classifying in this context we are mainly interested in the precision of the model. Why? Let me put it this way: if the model signals something, it had better be right; otherwise it should keep quiet!

So how can we influence precision? We tell the model to speak up only when it thinks there is a high probability it is right.

We can do this by increasing the cut-off probability at which the model signals a class. See the example below, where we have increased the threshold from 0.45 to 0.9 (extreme, but just to make the point).


Now we see the model hardly ever dares to speak, but when it does, it is more often right.

To stick to our "Brommer Alarm" example: in the first case the model spoke up 26 times, of which 17 were correct, a precision of 17/26 ≈ 0.65. When we restrict the model with a higher cut-off, it speaks up only 15 times, of which 14 are correct, a precision of 14/15 ≈ 0.93.
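A small sketch of that calculation: precision for one class is simply the number of correct alarms divided by the number of alarms raised, and raising the decision threshold changes which predictions count as alarms. The probability array below is illustrative, not real sensor output.

```python
# Sketch: precision at two decision thresholds for one class.
# The printed fractions mirror the "Brommer Alarm" numbers above.
import numpy as np

def precision_at_threshold(probs, is_target, threshold):
    """Precision of 'fire when prob >= threshold' for one class."""
    fired = probs >= threshold
    if fired.sum() == 0:
        return float("nan")              # the model never spoke up
    return (fired & is_target).sum() / fired.sum()

# Worked numbers from the text:
print(17 / 26)   # ~0.65 at the default cut-off
print(14 / 15)   # ~0.93 at the stricter cut-off

# With raw class probabilities you would do something like:
probs = np.array([0.95, 0.55, 0.48, 0.91, 0.30])
is_target = np.array([True, False, True, True, False])
print(precision_at_threshold(probs, is_target, 0.45))
print(precision_at_threshold(probs, is_target, 0.90))
```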

Of course, this comes at the cost of missing cases it should have classified, the so-called false negatives.

Next steps

Some of the next things we are working on are:

- Improving classification of the same sound farther away from the source. To do this we have resampled our training samples and lowered the volume in steps of 6 dB, which results in another recording of the same sound but roughly 2x further from the source. We have done this recursively three times, so each sample has four loudness variants (0 dB, -6 dB, -12 dB and -18 dB); a sketch of this attenuation step follows below this list. Field tests we are currently running show improvements in classifying the same sounds further away from the sensor.

- Data augmentation in general is a hot topic in deep learning (some nice links here and here). As samples are hard to come by and expensive to collect, we need to be creative in generating as much augmented data as possible, in such a way that it increases model performance in the end. We are working on a data augmentation strategy for sound samples.
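As referenced in the first bullet, here is a minimal sketch of the loudness augmentation: every 6 dB of attenuation halves the amplitude, roughly mimicking a doubling of the distance to the source. The file names are examples, and the use of the soundfile package for reading and writing is an assumption.

```python
# Sketch of the loudness augmentation: each 6 dB of attenuation halves the
# amplitude, roughly mimicking a doubling of the distance to the source.
import soundfile as sf   # assumption: audio I/O via the soundfile package

audio, sr = sf.read("brommer_alarm.wav")   # example file name

for db in (0, -6, -12, -18):
    gain = 10 ** (db / 20)                 # -6 dB -> gain of ~0.5
    sf.write(f"brommer_alarm_{db}dB.wav", audio * gain, sr)
```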

Conclusion

In this blog post I described our quest at DIKW to develop useful applications of Artificial Intelligence. This particular journey has been a great one; it brought me into contact with some great people. I hope you enjoyed the read. We are far from done: we keep pursuing the goal of applying this technology with the Sensing Clues SERVAL sound sensor in the field, most likely starting in the urban jungle of Amsterdam, but hopefully soon thereafter somewhere in Nepal, Kenya, or any other place where we can turn wild spaces into safe havens.

Please feel free to contact me, leave a message, or share your insights on how to take this further.

