SERVAL Open Ears AI machine listening
Setting the stage
Back in 2016 we were in Bardia National Park, Nepal, when I first understood the impact of human-wildlife conflict. The night we arrived, an elephant attacked a local villager and his wife in their house. Fortunately they were unharmed, but their home was ruined. What struck me most was the respect this man had for the elephant that had ruined his house. I will never forget it.
In this blog I will explain the process we went through while developing a sensor solution built on artificial intelligence that can process sounds the way we do. It classifies what it hears, and these signals can be used in real applications, from noise pollution in the city of Amsterdam to human-wildlife conflict mitigation in Nepal.
The objective
The objective is to develop a listening device that does not record and store sounds, which would raise all kinds of privacy issues. Instead, it processes and analyses the sound on the spot and sends only the labels of the sounds it has identified.
The end-goal is to make the sensor solar powered and deploy it anywhere in the world to mitigate human-wildlife conflicts.
Could we use our new Artificial Intelligence techniques to make this work?
Choosing the right technique
When we first came back from Nepal, I spent some time on the web and found two great projects.
The first project was the Sounds of New York City project (SONYC). What I specifically liked about this project was its inherent collaboration with the citizens of New York. I recommend watching the video in which they explain their project; it convinced me that this could be possible.
The overview picture below explains this great setup:
The second project was the work of Karol Piczak, then a PhD student doing his research on sound classification. I first found his work on GitHub and contacted him, explaining my ambition and goals. This was around January 2017. When I told him I wanted to run his code on an edge device like a Raspberry Pi, he laughed and said that would be really challenging. By Easter that year he had bought himself a Raspberry Pi, and after that holiday he had his code running on the Pi. Impressive!
His research helped me get our ambitions off the ground. His Jupyter notebooks are available online and are a great resource for anyone who wants to start analysing sounds. The picture below shows the initial deep neural network structure he used in his research.
This made me wonder whether it would be possible to build a device that can hear elephants coming to town and help locals all over the world react effectively to mitigate human-wildlife conflicts. Every year in India alone, 400 people get killed by elephants.
Google Audioset
In October 2017 Google released its AudioSet and the accompanying deep learning neural network models. This shed new light on our options. We built on the work of the authors of this blog, who explained how to apply transfer learning to these pre-trained models and retrain them on data we collected for our specific use case.
The audio set collected by Google is, as you would expect, huge: over 2 million tagged YouTube recordings. By now (2020) the dataset has been improved over a couple of versions.
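To make the transfer learning idea concrete, here is a minimal sketch, not the code we actually used. It assumes the pre-trained network is kept frozen and turns each sound clip into a fixed 128-dimensional embedding, and that only a small softmax classifier on top is trained for our own classes. The embeddings below are synthetic stand-ins generated with random numbers, and the names `train_head`, `centers` and `make_embeddings` are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, dim = 3, 128

# ASSUMPTION: synthetic stand-ins for embeddings produced by a frozen,
# pre-trained audio model. Real embeddings would come from that model.
centers = rng.normal(size=(n_classes, dim))

def make_embeddings(per_class):
    X = np.vstack([c + 0.5 * rng.normal(size=(per_class, dim)) for c in centers])
    y = np.repeat(np.arange(n_classes), per_class)
    return X, y

def train_head(X, y, n_classes, lr=0.01, epochs=200):
    """Train only a softmax layer on top of fixed embeddings (gradient descent)."""
    W = np.zeros((X.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / len(X)             # cross-entropy gradient
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b

X_train, y_train = make_embeddings(per_class=40)
X_test, y_test = make_embeddings(per_class=20)       # never seen during training

W, b = train_head(X_train, y_train, n_classes)
accuracy = float(np.mean((X_test @ W + b).argmax(axis=1) == y_test))
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

The appeal of this setup is that the expensive part, learning what sound looks like in general, has already been done on the huge dataset, so a relatively small set of our own labelled samples is enough to train the last layer.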
Sensing Clues
As a partner, DIKW Intelligence invests in the development of Artificial Intelligence applications for use in the Sensing Clues Wildlife Intelligence platform. The SERVAL sensor is one of the technologies we work on.
Our ambition is high: we look forward to training this sensor to also identify sounds produced by (big) wild animals like elephant, lion, wild boar and other species. Recent research shows that many of these animals communicate at very low frequencies that the human ear cannot detect. While we have advanced quite a lot, we still have work to do to achieve this goal. We face two big challenges: most audio equipment filters out (or simply does not record) the sub-frequency sounds we are interested in, and knowledge and samples of sounds in this frequency domain are scarce. For example, researchers learned only a few years ago that giraffes, too, produce sounds that are hardly audible to the human ear. So distinguishing sub-frequency sounds is truly an enormous task that still lies ahead.
Meanwhile in the urban jungle ....
Amsterdam Sounds project
The city of Amsterdam has set out to fight sound pollution and noise disturbance. To this end it initiated the Amsterdam Sounds project, in which the SERVAL sound sensor plays an important role! What I especially like in this collaboration is the work we do together with the Sensemakers of Amsterdam. Together we are building a dedicated version of the SERVAL sound sensor for detecting sources of sound pollution: the Open Ears sensor.
Putting it all together
So let's put together all the bits and pieces and see where we are.
The basic idea is that sound can be transformed into an image by applying a Fourier transformation, thus creating a spectrogram of the sound.
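As an illustration of this step, here is a minimal sketch that builds a spectrogram with NumPy only. It is not the preprocessing used in the sensor; the frame length, hop size, sample rate and the synthetic 1 kHz test tone are all assumptions chosen for the example.

```python
import numpy as np

def spectrogram(signal, frame_len=1024, hop=512):
    """Magnitude spectrogram in dB via a short-time Fourier transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Cut the signal into overlapping, windowed frames.
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # One Fourier transform per frame: shape (n_frames, frame_len // 2 + 1).
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Transpose so rows are frequency bins and columns are time frames.
    return 20 * np.log10(magnitude + 1e-10).T

# ASSUMPTION: a synthetic stand-in for a recording,
# one second of a 1 kHz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = spectrogram(tone)
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * sample_rate / 1024
print(spec.shape, peak_hz)
```

Each column of `spec` is the frequency content of one short slice of time, so plotting the array as an image gives the kind of picture an image classifier can work with.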
Some examples taken from the samples collected in Amsterdam:
A "brommer alarm" (a moped alarm) sounds like this
And looks like this
And a car horn looks like this
Now the challenge for the classifier is to see the sound structure in the spectrograms... so we are back at image classification again!
We could talk about all the nuts and bolts of the machine learning parts of this (and I am very happy to do so, drop me a note!) but the proof of the pudding is in the eating....
One way to show how well a classifier has learned a certain task is to look at the confusion matrix of all the classes it is trained to recognise, and to compare what the model classifies with the so-called ground truth: in this case a set of sound examples the model has never seen during training.
So to show you some preliminary results, let's have a look at the confusion matrix given the default cutoff for all the class probabilities.
So how do we interpret this result?
Let's look at the class "Brommer Alarm". The horizontal axis shows the predicted classes, so this is what the model says it hears when we play an example sound. In total the model raised the flag "I heard a brommer alarm" 26 times, and in 17 cases it was right. The ground truth is on the vertical axis: 17 times it was indeed the Brommer Alarm, but there were also some mistakes; twice it was actually a gunshot. Have a look at the confusion matrix yourself and see whether the mistakes it makes actually make sense.
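For readers who want to see how such a matrix is put together, here is a small sketch with the same layout: ground truth on the rows (vertical axis), predictions on the columns (horizontal axis). The eight labels are made up for illustration and are not our results.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for truth, pred in zip(y_true, y_pred):
        matrix[truth, pred] += 1
    return matrix

classes = ["brommer alarm", "car horn", "gunshot"]

# ASSUMPTION: made-up example labels (indices into `classes`).
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 0, 2, 2]

cm = confusion_matrix(y_true, y_pred, len(classes))
print(cm)

# Reading the "brommer alarm" column: how often did the model raise
# that flag, and how often was it actually right?
fired = int(cm[:, 0].sum())
correct = int(cm[0, 0])
print(f"flag raised {fired} times, {correct} of them correct")
```

Reading down a column tells you what the model really heard when it raised a given flag; reading along a row tells you what the model made of all the true examples of one class.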
So what would we actually be interested in when deploying such a sensor in the city?
To answer the question of whether and how such a sensor can be useful, we need to talk a little bit about precision and recall. Wikipedia has a great explanation of this topic, so please read that if you need to refresh your memory on the subject ;-) .
Here I just use the great picture that comes with it:
So when classifying in this context we are interested in the precision of the model. Why? Let me put it this way: if the model signals something, it had better be right; otherwise it had better keep quiet!
So how can we influence precision? We tell the model only to shout if she thinks there is a high probability she is right.
We can do this by increasing the cutoff probability at which the model signals a class. See the example below, where we have increased the threshold to speak from 0.45 to 0.9 (extreme, but just to make the point).
Now we see the model hardly ever dares to speak, but when she does she is more often right.
To stick to our example of the "Brommer Alarms": in the first case the model shouted 26 times, of which 17 times correctly, a precision of 17/26, around 0.65. When we restrict the model with a higher cutoff, the model only speaks 15 times, of which 14 times correctly, a precision of 0.93.
Of course this comes at the cost of missing cases it should have classified, the so-called false negatives.
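The trade-off can be sketched in a few lines. The first two numbers are simply the counts quoted above; the probabilities and labels after that are invented to show how raising the cutoff moves precision up and recall down, and `precision_recall` is an illustrative helper, not part of our code base.

```python
import numpy as np

# Precision from the counts quoted in the text.
print(round(17 / 26, 2), round(14 / 15, 2))

def precision_recall(probs, is_target, cutoff):
    """Precision and recall for one class at a given probability cutoff."""
    fired = probs >= cutoff
    tp = np.sum(fired & is_target)    # model spoke and was right
    fp = np.sum(fired & ~is_target)   # model spoke and was wrong
    fn = np.sum(~fired & is_target)   # model kept quiet but should have spoken
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return float(precision), float(recall)

# ASSUMPTION: invented class probabilities and ground truth for eight clips.
probs = np.array([0.97, 0.95, 0.92, 0.80, 0.70, 0.60, 0.50, 0.30])
is_target = np.array([True, True, True, False, True, False, False, True])

p_low, r_low = precision_recall(probs, is_target, cutoff=0.45)
p_high, r_high = precision_recall(probs, is_target, cutoff=0.90)
print(f"cutoff 0.45: precision {p_low:.2f}, recall {r_low:.2f}")
print(f"cutoff 0.90: precision {p_high:.2f}, recall {r_high:.2f}")
```

In this toy data the higher cutoff removes every false alarm, but two real events are no longer reported. Which side of that trade-off matters most depends on where the sensor is deployed.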
Next steps
Some of the next things we are working on are:
- Improving classification of the same sound farther away from the source. To do this we have resampled our training samples and lowered the volume by 6 dB, which results in another recording of the same sound as if it were roughly 2 x farther from the source. We have done this recursively 3 x, so each sample has 4 loudness variants (0 dB, -6 dB, -12 dB and -18 dB). Field tests we are currently doing show improvements in the classification of the same sounds farther away from the sensor.
- Data augmentation in general is a hot topic in deep learning (some nice links here and here). As samples are hard to come by and expensive to collect, we need to be creative in generating as much augmented data as possible, in such a way that it increases the model performance in the end. We are working on a data augmentation strategy for sound samples.
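The loudness augmentation from the first bullet can be sketched as follows. This is a simplified illustration, not our pipeline: it assumes the sample is already loaded as a NumPy array of amplitudes, uses a synthetic 440 Hz tone as a stand-in, and the function names are hypothetical. A change of -6 dB roughly halves the amplitude, which is what doubling the distance to a source does in open space.

```python
import numpy as np

def attenuate(signal, db):
    """Scale the amplitude by a gain in dB (negative values make it quieter)."""
    return signal * 10 ** (db / 20)

def loudness_variants(signal, step_db=-6.0, n_steps=3):
    """Return the original plus n_steps progressively quieter copies."""
    return [attenuate(signal, step_db * k) for k in range(n_steps + 1)]

# ASSUMPTION: a synthetic stand-in for a training sample,
# one second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
sample = 0.5 * np.sin(2 * np.pi * 440 * t)

variants = loudness_variants(sample)   # 0 dB, -6 dB, -12 dB, -18 dB
peaks = [float(np.max(np.abs(v))) for v in variants]
for k, peak in enumerate(peaks):
    print(f"{-6 * k:>4} dB: peak amplitude {peak:.3f}")
```

A plain gain change ignores effects such as echoes and the stronger damping of high frequencies over distance, so it only approximates a real recording made farther away.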
Conclusion
In this blog post I described our quest at DIKW to develop useful applications of Artificial Intelligence. This particular journey has been a great one, and it brought me into contact with some great people. I hope you enjoyed the read. We are far from done: we keep pursuing the goal of applying this technology on the Sensing Clues SERVAL sound sensor in the field, most likely starting in the urban jungle of Amsterdam, but hopefully soon thereafter somewhere in Nepal, Kenya or any other place where we can turn wild spaces into safe havens.
Please feel free to contact me, leave a message or share your insights on how to take this further.
Want to know more about what data can mean for your organisation?
Contact Hugo Koopmans
Phone: +31 6 4310 6780
E-mail: hugo.koopmans@dikw.com
SERVAL Open Ears AI machine listening, by marco, last modified 15-11-2021
Building artificial ears for (urban) jungle applications