\jyear

2021

[1]\fnmBiswajit \surMaity

1]\orgdivComputer Application and Science, \orgnameInstitute of Engineering and Management, \orgaddress\postcode700091, \stateWest Bengal, \countryIndia

2]\orgdivCSE, \orgnameNational Institute and Technology, Durgapur, \orgaddress\postcode713209, \stateWest Bengal, \countryIndia

$\mathcal{A}$ $\mathcal{C}$ lassi $\mathcal{H}$ onk: A System Framework to Annotate and Classify Vehicular Honk from Road Traffic

biswajit.maity1@gmail.com \fnmAbdul \surAlim aalimv187@gmail.com \fnmPopuri Sree Rama Charan \sur psr.19U10738@btech.nitdgp.ac.in \fnmSubrata \surNandi subrata.nandi@gmail.com \fnmSanghita \surBhattacharjee sbhattacharjee.cse@nitdgp.ac.in [ [

Abstract

Some recent studies highlight that vehicular traffic and honking contribute to more than 50% of noise pollution in urban or sub-urban cities in developing regions, including Indian cities. Frequent honking has an adverse effect on health and hampers road safety, the environment, etc. Therefore, recognizing the various vehicle honks and classifying the honk of different vehicles can provide good insights into environmental noise pollution. Moreover, by classifying honks based on vehicle types, we can infer the contextual information of a location, area, or traffic. So far, the researchers have done outdoor sound classification and honk detection, where vehicular honks are collected in a controlled environment or in the absence of ambient noise. Such classification models fail to classify honk based on vehicle types. Therefore, it becomes imperative to design a system that can detect and classify honks of different types of vehicles from which we can infer some contextual information. In this paper, we have developed a novel framework $\mathcal{A}$ $\mathcal{C}$ lassi $\mathcal{H}$ onk that performs raw vehicular honk sensing, data labeling and classifies the honk into three major groups, i.e., light-weight vehicles, medium-weight vehicles, and heavy-weight vehicles. We collected the raw audio samples of different vehicular honking based on spatio-temporal characteristics and converted them into spectrogram images. We have proposed a deep learning-based Multi-label Autoencoder model (MAE) for automated labeling of the unlabeled data samples, which provides 97.64% accuracy in contrast to existing deep learning-based data labeling methods. Further, we have used various pre-trained models, namely Inception V3, ResNet50, MobileNet, ShuffleNet, and proposed an Ensembled Transfer Learning model (EnTL) for vehicle honks classification and performed comparative analysis. Results reveal that EnTL exhibits the best performance compared to pre-trained models and achieves 96.72% accuracy in our dataset. In addition, we have identified a context of a location based on these classified honk signatures in a city.

keywords:

Autoencoder, Data Labeling, Honk Classification, Noise Pollution, Spectrogram, Transfer Learning

1 Introduction

The fast growth in urbanization and industrialization has brought about modernization in people’s life, but it has several negative effects on urban population. In recent years, noise pollution has emerged as a significant concern, especially in cities of developing economies. Several studies have shown the adverse effects of noise pollution on people’s health and life. For example, prolonged noise exposure can cause hearing problem, sleeping disturbance, cardiovascular diseases Gupta \BOthers. (\APACyear2018); Jariwala \BOthers. (\APACyear2017). Study Firdaus \BBA Ahmad (\APACyear2010) reports that exposure to a high level of noise can lead to the complex health problems, significantly affecting the health. Besides this, factors contributing to urban noise pollution were extensively studied in literature Hammer \BOthers. (\APACyear2014); Michali \BOthers. (\APACyear2021). The effects of traffic, construction noise on urban noise rising were explored on the overall urban noise landscape Kalawapudi \BOthers. (\APACyear2020) and noise maps has been created to assess noise level for different times throughout a day. Several studies and report have shown that traffic noise is the prime contributor to urban noise. To monitor noise level in urban road traffic, some studies proposed noise maps Chouksey \BOthers. (\APACyear2023); Andrade \BOthers. (\APACyear2024). Recently, researchers have examined the pollution level changes in roads before and after COVID-19 lockdown using regression analyses Marwah \BBA Agrawala (\APACyear2022). Despite these advancements, current noise maps and monitoring strategies lack to provide contextual information and sources of noise, which could further assist city planning bodies in devising plans to mitigate noise pollution levels. Notably, vehicular honking is a major source to traffic related noise. Characterizing vehicular honking can provide valuable insights into traffic state characterization, noise estimation, prediction. Hence, it becomes imperative to assess noise pollution, particularly honking, to devise urban traffic planning and enhancing urban sustainability.

However, precise detection methods and classification models of honks corresponding to vehicle types that generates such are missing especially in presence of other outdoor noise sources and heterogeneous mixed model traffic conditions. Existing outdoor noise classification models Sen \BOthers. (\APACyear2010); Dim \BOthers. (\APACyear2020); Piczak (\APACyear2015); Salamon \BBA Bello (\APACyear2017) fail to identify sub-classification of honk due to lack of annotated data required for model training. In this work, we aim to address the problem of sub-classifying honk source based on vehicle type that emits the honk. Such sub-classification of honk model can complement the existing noise monitoring Mann \BBA Singh (\APACyear2024), mapping Andrade \BOthers. (\APACyear2024) and modelling research effort Medina-Salgado \BOthers. (\APACyear2022); Hu \BOthers. (\APACyear2022). Furthermore, we can develop various micro-services to diminish the exposure to noise pollution due to road traffic for healthy living. Hence, in this research work, we have focused on annotating and classifying vehicular honks of different types of vehicles. Here, we define three types of vehicles based on their sizes: light-weight vehicles (LWV), medium-weight vehicles (MWV), and heavy-weight vehicles (HWV) Shekhar \BOthers. (\APACyear2022). The details of each type of vehicle are described in Table 1. Furthermore, we have estimated the context of a location that might help in identifying personal noise exposure and apparently, avoid the highly congested road traffic area based on its spatio-temporal nature. This work has two-fold objectives: (a) the classification of different vehicular honks in the presence of various ambient noises, (b) characterize the vehicular honks in such a way so that we can infer some meaningful insights from it, which can further be used to mitigate the harmful effects of honking in our daily lives.

Table 1: Description of vehicle types

Features

LWV

MWV

HWV

Size

Small

Medium

Large

Weight

Light

Medium

Heavy

Type

Motorbike, Scooty,

Auto Rickshaw

Cars

Bus, Truck

Motivation: In the last few years, researchers tried to identify vehicular honks by modeling the raw audio signals using Fast Fourier Transformation (FFT), Mel Frequency Cepstral Coefficients (MFCCs), Spectrogram, etc. Additionally, some researchers have developed environmental sound classification techniques (where car honk is considered as one of the classes) to detect and classify environmental sounds Piczak (\APACyear2015); Salamon \BBA Bello (\APACyear2017); Zhou \BOthers. (\APACyear2017). They have classified the sound of air conditioners, gunfire, street music, automobile horns, children playing, dogs barking, drilling, idling engines, sirens, etc. Different deep-learning models are predominantly used in their works. In this instance, the authors proposed SB-CNN Salamon \BBA Bello (\APACyear2017), ConvNet Zhou \BOthers. (\APACyear2017), TFCNN Mu \BOthers. (\APACyear2021), and other CNN model variants. For model training, the aforementioned cited works utilized datasets such as ESC-50, ESC-10, and UrbanSound8K. It is worth noting that ESC-10 does not include any honk sample. However, both ESC-50 and UrbanSound8K contain car honk samples, with duration ranging from 2 seconds to 9 minutes where ESC-50 contains samples with an average duration of 3 minutes, while UrbanSound8K includes samples with an average duration of 134 minutes. Furthermore, it is noted that the majority of these honk samples were taken in a quiet, and controlled environment. Apart from this sound classification, FFT and band-pass filtering-based honk identification are also discussed in Sen \BOthers. (\APACyear2010); Dim \BOthers. (\APACyear2020), and MFCC-based techniques are addressed in Banerjee \BBA Sinha (\APACyear2012). However, they did not deploy any model which can be able to detect honk automatically. In Maity, Alim\BCBL \BOthers. (\APACyear2022), the author modeled the honk signals as a spectrogram image and deployed a proper model to detect honk. Furthermore, several applications, like helping hearing-impaired Takeuchi \BOthers. (\APACyear2014), healthier route recommendation system Maity, Alim\BCBL \BOthers. (\APACyear2022), are developed based on vehicular honk. As a preliminary experiment, we considered some of the baseline models (SB-CNN Salamon \BBA Bello (\APACyear2017), Dilated CNN Chen \BOthers. (\APACyear2019), CNN Demir \BOthers. (\APACyear2020)) for honk classification. We used $\sim$ 30 mins of manually labeled samples to perform the same. The accuracy achieved by each model is shown in Fig. 1. It is seen that the accuracy of all the models lies between 57% to 63%, which is comparatively much low. For that reason, we need further investigation in terms of model training as well as label data generation.

Refer to caption — Figure 1: Accuracy of different baseline models with our 30 mins of label data

Issues & Challenges: In this work, we faced several significant challenges. The three main system-level challenges that we overcome to increase honk classification accuracy in presence of background noise are as follows: First, the collection of adequate spatio-temporal data: The currently available honk sample datasets (i.e., ESC-50, UrbanSound8k) are insufficient, and were primarily recorded sounds with no background noise. Moreover, these data were not collected with consideration of the spatial and temporal characteristics of a specific location. Secondly, Label data creation: When dealing with large volumes of data, manually tagging honk samples from raw traffic audio data can be both laborious and susceptible to errors. Additionally, existing automated data annotation techniques are often not suitable for our case due to the similarity in honk signatures. Thirdly, selection of appropriate models for honk classification: Different types of honks exhibit a range of characteristics, including intensity, pitch, duration, volume, etc. It is, therefore, difficult to choose appropriate threshold values that cover all of these groups. The honk characteristics present in recorded signals may be partially masked by ambient noise, such as extraneous speech, loud music, bioacoustic noise, electrical noise, and more. This complexity makes it challenging to accurately identify or filter honks from raw audio samples. Additionally, there may be instances where similarities exist between the patterns of MWV and HWV, further complicating the classification process. Therefore, it is crucial to filtering-based data annotation techniques. Hence, we need to fine-tune deep learning models to develop an automated honk classification system.

Contribution: In order to effectively classify vehicular honks while there is oncoming traffic, this paper addresses the aforementioned difficulties and introduces a novel framework $\mathcal{A}$ $\mathcal{C}$ lassi $\mathcal{H}$ onk. Our contributions encompass the preparation of a substantial volume of datasets, the proposal of a novel model for dataset labeling, and the fine-tuning of a suitable transfer learning model that surpasses the performance of alternative models. The primary contributions of this endeavor are outlined as follows.

•

Considering significant spatio-temporal variation, we have recorded raw audio samples by two volunteers using our developed Android application AudREC over a 463 km spanning $\sim$ 13 km road segment in a sub-urban city of Durgapur, India. The details specification is mentioned in Section 4.
•

Since these raw audio samples can not be directly applied to the deep learning models, proper data modeling and labeling are required beforehand. As manual labeling is cumbersome and error-prone, in this research work, we have designed a Multi-label Autoencoder model (MAE) and Multi-label Autoencoder Generative Adversarial Network (MAEGAN) for labeling the unlabeled samples. Experimental results show that the MAE model performs better than MAEGAN, at around 97.64% accuracy at the time of data annotation. In total, 54705 samples are labeled, i.e., equivalent to $\sim$ 15 hours. Furthermore, we have increased the sample set from 54705 to 134286 by using data augmentation techniques. In contrast to the previous UrbanSound8k’s 8000 tagged samples, the resulting richly curated dataset of large volume is employed in our experiments, which can serve as an important resource for other researchers (refer Section 5.2 and 5.3).
•

As per the literature, the spectrogram is a better choice than FFT/MFCC in honk identification/classification. We have studied the utility of various transfer learning models for the same. In this work, four pre-trained CNN models such as MobileNet Attallah (\APACyear2023), ShuffleNet Elhassan \BOthers. (\APACyear2021), ResNet 50 de la Rosa \BOthers. (\APACyear2022), and Inception V3 Wang \BOthers. (\APACyear2019) are used as transfer learning and their classification performances are investigated thoroughly. Furthermore, We have designed and implemented transfer learning-based ensemble method named EnTL by combining four pre-trained models for the task of vehicular honk classification. Experimental results show that EnTL performs better than other models with 96.72 % accuracy (refer Section 6.3).
•

The proposed EnTL has been compared with the baseline models like SB-CNN Salamon \BBA Bello (\APACyear2017), Dilated CNN Chen \BOthers. (\APACyear2019), CNN Demir \BOthers. (\APACyear2020), and TFCNN Mu \BOthers. (\APACyear2021). EnTL improves the accuracy $\sim$ 9% to 21%, indicating its effectiveness in honk classification in the presence of ambient noises (refer Section 6.5).
•

We demonstrate the usefulness of classified honk and sound pressure level (SPL) by detecting outdoor contexts like residential areas, highways, marketplaces, less crowded traffic areas, etc. This inference will help to design several micro-services in the future (refer Section 7.1).

The remaining part of the paper is organized as follows: In Section 2, we discuss and study existing works. The framework of our proposed system is provided in Section 3. The nitty-gritty details of data collection and honk signature analysis are figured out in Section 4. Section 5 describes our proposed methodology where we briefly describe data labeling, augmentation, and model selection for training. Results are analyzed in Section 6. In this section, we also illustrate the procedure to identify the context of a location. Finally, in Section 8, we conclude our paper and discuss potential directions for our work.

2 Related work

Noise pollution or disturbance is the outrageous amount of noise that affects the entire ecosystem. The major source of outdoor noise is vehicular honk which is mainly generated in road traffic areas. Due to the nature of our environment, it becomes challenging to reduce noise pollution levels to sustain and maintain the ecosystem. However, we can try to avoid it through a spectacular boom in software technology. In the last few decades, researchers have addressed and adopted several techniques to characterize the noise pollution level in road traffic areas.

2.1 Noise pollution monitoring & assessment

Noise level characterization is mainly done by monitoring, forecasting, and identifying the source of noise in the environment. Several techniques (like deploying heterogeneous sensors in different places of cities/smartphone-based monitoring) are adopted to measure the pollution level, where smartphone-based sensing is most popular one. Smartphones are basically used for spatial data collection, while fixed sensor-based devices are deployed for temporal data collection.

Smartphone-base monitoring: In Zipf \BOthers. (\APACyear2020); Jezdović \BOthers. (\APACyear2021), the author collected the data using smartphones and notified the efficacious urban planning corresponding to noise pollution exposure. Spatio-temporal patterns of noise pollution are also measured using smartphones in Zamora \BOthers. (\APACyear2017). Apart from this, the researchers in Maity, Polapragada\BCBL \BOthers. (\APACyear2022); Allen \BOthers. (\APACyear2009) have addressed a correlation between noise and air pollution to recognize the primary sources of environmental noise clearly. Due to the sensitivity issue of the smartphone and the lack of proper calibration techniques, infrastructure-based sensing seems more realistic.

Fixed sensor-based monitoring: To get the temporal variation of data, different calibrated sensors can be deployed throughout the city (mainly in the road traffic area) to measure the noise pollution level. In previous studies, the researchers developed a wireless sensor network Santini \BOthers. (\APACyear2008) in order to monitor noise pollution. However, nowadays, many advanced sensors are developed that can capture data with a high level of accuracy Bello \BOthers. (\APACyear2019). Raspberry Pi-controlled cloud-based sensor is also designed in Saha \BOthers. (\APACyear2018) to do the same. Moreover, an Arduino controller with IoT technology Ezhilarasi \BOthers. (\APACyear2017) is also used as a fixed sensing strategy to get the noise data from the environment. Like Santini \BOthers. (\APACyear2008), an advanced wireless sensor-based system is designed for the same in Segura-Garcia \BOthers. (\APACyear2014). Besides the advancement of technology, some pros exist that make the sensors unreliable. The sensors may provide erroneous data or stop functioning due to environmental hazards such as storms, rainfall, etc.

Noise level forecasting: To predict the noise pollution level in road traffic, the researchers Medina-Salgado \BOthers. (\APACyear2022) have developed several models and techniques. In Garg \BOthers. (\APACyear2015); Guarnaccia \BOthers. (\APACyear2017), the authors proposed ARIMA model-based time series traffic noise prediction model. Deep learning-based models are also used in several kinds of research to forecast pollution levels. In Navarro \BOthers. (\APACyear2020), the authors developed LSTM-based models to predict sound pollution levels in urban road traffic. In our previous work Maity, Trinath\BCBL \BOthers. (\APACyear2022), we also estimated the noise pollution level in the road traffic area by predicting the vehicular honk count using the E-LSTM model.

Noise source identification: Apart from the above mentioned techniques, noise pollution measurement by identifying the source is another way to monitor pollution. In Vera-Diaz \BOthers. (\APACyear2018); Suvorov \BOthers. (\APACyear2018), the author identified the source of sound using a deep neural network. Sound source identification using a microphone array configuration is another effective technique to measure pollution exposure. In Grondin \BBA Michaud (\APACyear2019), the author used an open and closed microphone array for sound localization. Wireless Acoustic Sensor Networks-based sound source identification is addressed in Cobos \BOthers. (\APACyear2017) by the authors.

2.2 Vehicular honk identification & applications

The primary source of noise pollution is traffic noise, which is mainly comprised of different vehicular honks. Therefore, vehicular honk identification and classification of the different types of vehicle honk are essential to characterize the noise pollution level. Several researchers are already identified vehicular honks by adopting different strategies. These strategies are mainly classified into three categories: a) classifying environmental sound, where vehicular honk is one of the classes, b) determining the honks directly from the audio samples by using FFT, Spectrogram, or MFCC, and c) applications based on the honk signature. Details of the honk identification and sound classification methods are summarized in Table 2.

Environmental sound classification: There are numerous works that identify vehicular honking by classifying environmental sounds, where car honking is one of the class Piczak (\APACyear2015); Salamon \BBA Bello (\APACyear2017); Zhou \BOthers. (\APACyear2017); Khamparia \BOthers. (\APACyear2019); Abdoli \BOthers. (\APACyear2019); Mesaros \BOthers. (\APACyear2019); Chen \BOthers. (\APACyear2019); Demir \BOthers. (\APACyear2020); Ahmed \BOthers. (\APACyear2020); Mushtaq \BBA Su (\APACyear2020); Guzhov \BOthers. (\APACyear2021); Mu \BOthers. (\APACyear2021). Authors mainly used ESC-10, ESC-50, or UrbanSound8K dataset for sound classification. All the data are collected in a controlled environment where ambient sound is not present. Most of the cases, they modified the CNN models to increase the accuracy. The highest level of accuracy that they have achieved till yet is 97% for the ESC-10 dataset in Guzhov \BOthers. (\APACyear2021). However, the ESC-10 data set does not contain the honk class. Apart from ESC-10, the maximum accuracy achieved by the DCASE-2017 ASC dataset Demir \BOthers. (\APACyear2020) is 96.23%, which contains a car honk as a class.

Honk from raw samples and its applications: In Sen \BOthers. (\APACyear2010), the author used band-pass filtering to remove the noises from the raw audio samples and then applied FFT to detect the honks. They also calculated the duration of each honk in this work. Nevertheless, their selected threshold values for band-pass filters don’t guarantee a definite honk signature all the time. Furthermore, any framework/system was not designed that can identify honk automatically while moving. A similar kind of FFT-based honk detection technique is found in Dim \BOthers. (\APACyear2020). As an application of honk, the authors developed a system that assists the hearing-impaired person in driving. In another work Takeuchi \BOthers. (\APACyear2014), a smartphone-based system is implemented to detect honking and then generate alarm sounds for hearing-impaired people. Apart from these, an embedded system is designed to identify emergency honk Palecek \BBA Cerny (\APACyear2016), and MFCC-based honk detection in Banerjee \BBA Sinha (\APACyear2012). In our previous work Maity, Alim\BCBL \BOthers. (\APACyear2022), we have identified vehicular honk from raw audio samples in the presence of several ambient noises. Apart from this, we have also determined different features of the honk, like duration of the honk, the inter-honk gap between two successive honks, and honk count, etc., and based on these features, we have tried to identify the context of a location and recommend a honk-aware route, which is a healthier route in comparison to the google recommended route.

As per the works mentioned above, it is apparent that the researchers are working on honk identification and environmental sound classification techniques. However, fine-grained vehicular honk classification in terms of vehicle types is our primary objective of the work presented in this paper, and it is not yet done.

Table 2: Survey for vehicular honk identification and environmental sound classification, where vehicular honk is considered as a class

Author, Year

Dataset

Technique

Accuracy Obtained

Remarks

Honk Detection from raw audio samples

R. Sen et al.

(2010) Sen \BOthers. (\APACyear2010)

Self collected

FFT

Only identify the honks,

They have not trained any model

No ML/DL

is used

R. Banerjee et al.

(2012) Banerjee \BBA Sinha (\APACyear2012)

Self collected

Modified MFCC

Only identify the honks,

They have not trained any model

No ML/DL

is used

Takeuchi et al.

(2014) Takeuchi \BOthers. (\APACyear2014)

—

IIR Comb Filter

—

Develop an application for

hearing impaired people

Josef et al.

(2016) Palecek \BBA Cerny (\APACyear2016)

—

Embedded system

—

Own develop hardware

system to detect honk

Dim et al.

(2020 Dim \BOthers. (\APACyear2020)

—

FFT and self design honk

detection algorithm

—

Develop an application for

hearing impaired people

Biswajit et al.

(2022) Maity, Alim\BCBL \BOthers. (\APACyear2022)

Self Collected

Spectrogram to train Different Transfer

Learning models

97.69%

Detected honk in presence

of ambient noise

Sound Classification

Piczak et al.

(2015) Piczak (\APACyear2015)

ESC-50,

ESC-10, and

UrbanSound8K

Log-scaled

mel-spectrograms and develop

a CNN model

Best accuracy 87%

for Esc-10 dataset and remaining

are less than that

10 different class,

where car honk is one class

Salamon et al.

(2017) Salamon \BBA Bello (\APACyear2017)

UrbanSound8K

Log-scaled

mel-spectrograms and

develop a SB-CNN model

85%

10 different class,

where car honk is one class

Zhou et al.

(2017) Zhou \BOthers. (\APACyear2017)

UrbanSound8K

2D mel-spectrogram and

develop a ConvNet model

Best accuracy 86%

10 different class,

where car honk is one class

Khamparia et al.

(2019) Khamparia \BOthers. (\APACyear2019)

ESC-10 and

ESC-50

Spectrogram and develop a CNN,

tensor deep stacking network (TDSN)

Proposed CNN 77% and

proposed TDSN 56%

10 different class,

where car honk is included

Sajjad et al.

(2019) Abdoli \BOthers. (\APACyear2019)

UrbanSound8K

Audio signals of any length as it

splits the signal into overlapped

frames using a sliding window,

and used1D CNN

89%

10 different class,

where car honk not included

Mesaros et al.

(2019) Mesaros \BOthers. (\APACyear2019)

freesound.org,

TUT Sound, and

AudioSet

log-mel energies,

MFCC, pitch,

spectrogram and used CNN, DNN,

RNN, CRNN, LSTM, GRU

Among all the techniques

95% is the highest accuracy

Classified 4 categories

of sound in

three different phase, where

car honk is one class

Yan et al.

(2019) Chen \BOthers. (\APACyear2019)

UrbanSound8K

Log scale mel-spectrogram

and proposed Dilated Convolution

78%

10 different class,

where car honk is one class

Demir et al.

(2020) Demir \BOthers. (\APACyear2020)

DCASE-2017 ASC

and UrbanSound8K

Spectrogram, and used

Transfer learning models

For DCASE-2017 ASC dataset

accuracy is 96.23% and

for UrbanSound8K is 86.70%

For 1st dataset 15 category

and 2nd dataset 10 category

classification is done,

by considering honk is one class

Ahmed et al.

(2020) Ahmed \BOthers. (\APACyear2020)

ESC-50, ESC-10,

and UrbanSound8K

Log-scaled

mel-spectrograms,

and develop a CNN model

Highest accuracy achieved 92.9%

among all the dataset

10 different class,

where car honk is one class

Zohaib et al.

(2020) Mushtaq \BBA Su (\APACyear2020)

ESC-10, ESC-50,

and UrbanSound8K

Mel spectrogram (Mel), Mel Frequency

Cepstral Coefficient (MFCC) and Log-Mel

and proposed DCNN

94.94%, 89.28%, and 95.37% for

the respective dataset mentioned here

10 different class,

where car honk is one class

Andrey et al.

(2021) Guzhov \BOthers. (\APACyear2021)

ESC-50, ESC-10,

and UrbanSound8K

log-power Short Time Fourier

Transform (STFT) and used

ESRes-Net Attention with and

without ImageNet pretrained model

97.0% (ESC-10), 91.5% (ESC-50

and 84.2% / 85.4%

(US8K mono / stereo).

10 different class,

where car honk is one class

Wenjie et al.

(2021) Mu \BOthers. (\APACyear2021)

ESC-50, and

UrbanSound8K

Harmonic spectrogram,

Percussive spectrogram,

Develop a TFCNN model

For ESC-50 dataset

accuracy is 84.4% and

for UrbanSound8K is 91.3%

10 different class,

where car honk is one class

3 Framework

Our proposed framework $\mathcal{A}$ $\mathcal{C}$ lassi $\mathcal{H}$ onk has four distinct modules as displayed in Fig. 2:

[(1)]
1.

Data Acquisition: Data collection is one the most crucial to our work since it allows us to accomplish our tasks. For collecting honk data of different vehicles, we have developed a customized android based application. Further, we process these data for honk classification.
2.

Modeling Honk Signals: Raw audio samples are converted to spectrogram images. These spectrogram images are labeled by our proposed Multi-label autoencoder model (MAE). In addition, augmentation techniques are used to generate the synthetic data, which is used to increase the data volume.
3.

Model Deployment: Different convolutional neural network (CNN) are used to classify the honk signature. The results of each model are analyzed, and the best-suited models are selected for classification.
4.

Applications: Depending upon the nature of honking and the number of particular types of vehicles in a specific area, a unique pattern can be identified, which is further used to determine the context of a location. Once the context is identified, then various applications can be developed.

4 Data acquisition

Our motivation is to identify the context of a location based on different vehicle honking, and sound pressure levels (SPL) by considering spatio-temporal characteristics. To do that, we need a wide range of data collection throughout the city in different places like residential areas, less/high traffic areas, marketplaces, schools/colleges, etc. In this research, a customized android application is developed to collect different vehicular honking data. The application is GPS enabled and it can store the data in the form of raw audio samples as well as it can generate a text file that contains the timestamp, SPL level, and intensity value along with the location. At the time of the data collection, we used 8kHz as a sample rate, a bit depth of 16bits, mono as an audio channel, and an audio format as Wave(.wav). We have mainly obscured two different routes of Durgapur, a sub-urban city in India. Intuitively, we choose each route so that we can cover different demographic areas, which will help us to determine the context of a location. To observe the temporal variation of vehicular honking, we divided each day into three segments: morning, afternoon, and evening, and collected data for each route during these time periods. We traversed 3.5 km to 23 km in each time slot, totaling approximately 463 km distance covered. As transportation, modes, buses were utilized to cover longer distances, while cycles were used to cover shorter distances. We collected around 15 hours of data as a whole. Details of data collection information are represented in Table 3.

Table 3: Details representation of data collection procedure

Route #

Distance covered

in each trail

Total distance

covered

Audio data

duration

Mode of

Transport

Demographic areas

Route 1

23km

423km

712 mins

Bus

Hospital area, Marketplace, Bus terminals,

less/high traffic area, educational institute

Route 2

3.5km

40.5km

202 mins

Cycle

Less/high traffic area, Marketplace,

Residential area

4.1 Spatio-temporal honk signature analysis

In this section, different vehicle honking features are analyzed by considering spatio-temporal characteristics. We observed 5 minutes of data for three different places (residential area, marketplace, high traffic) and verified the count of honks, SPL level, and correlation between them for each type of vehicle. Moreover, we performed manual data labeling for this experiment.

4.1.1 Honk count of different vehicles

The honk frequency produced by different types of vehicles in a specific location can be a crucial parameter to determine the context of a location as well as the pollution level in the environment. To illustrate this, a sample case study was conducted where three different places are considered, such as a residential area, high way, and marketplace, and results are shown in Fig. 3. Fig. 3 shows that a higher number of honk counts are generated by LWV in the residential area, whereas in the case of the marketplace and highway, MWV and HWV generates a higher number of honk count, respectively, which is depicted in Fig. 3 & 3 respectively. A very sensible rhythm is observed in the honk count, which helps us to reach our aim.

4.1.2 Distribution of sound pressure level

Sound pressure level (SPL), measured in decibel(dB) value, is another important parameter in this study. Different vehicles produce different level dB values. Hence, SPL determination based on the vehicular type is essential and Kerner density function(KDE) is used to represent the distribution of the SPL for a specific area. Fig. 4 represents the distribution of the SPL for three different locations. We can visualize that the SPL distribution of the LWV is high, and HWV tends to be negligible in a residential area(Fig. 4). In Fig. 4, it is also clearly observed the highest SPL distribution of MWV in the marketplace, whereas in Fig. 4, HWV depicts the highest distribution in Highway. Moreover, it is apparent that the SPL value is in the higher range in Fig. 4 and 4 in comparison to Fig. 4. Therefore, we can say that there is a spatial variation in the SPL, which might help to determine a context of a location.

4.1.3 Correlation between SPL with honk count

To estimate the correlation between honk count with the SPL, we have used the well-known Pearson correlation (PC). In Fig. 5, we notice that in all the locations, PC values signify a strong positive correlation, implying a higher honk count and a higher SPL. Among all the locations, residential areas contain the highest positive correlation (PC= 0.94, see Fig. 5) because the traffic flow pattern is almost similar throughout the day in residential areas. In case of a highway, the correlation is strongly positive, but the value of the PC (PC= 0.78, see Fig. 5) is less than in residential areas because the traffic/vehicle movement is not well distributed throughout the day. The nature of vehicle movement is more or less similar to a marketplace, and that is the reason for the PC value of 0.86 (Fig. 5), indicating a strong positive correlation.

The experiment outlined above demonstrates a significant correlation between the spatio-temporal characteristics of honks emitted by different vehicles across various scenarios. Manual analysis of these characteristics is inherently challenging and laborious. Consequently, it is imperative to devise a system capable of automatically extracting these properties and accurately classifying honks from unprocessed raw noise samples.

5 Proposed Methodology

This section details the honk classification process and the selection of its features. Initially, we introduce honk signal modeling, followed by a description of data augmentation, data labeling, and preprocessing techniques. Subsequently, we present various deep learning models for classifying vehicular honks and determine the most suitable model for our proposed method.

5.1 Spectrogram generation

A spectrogram is a visual representation of a raw audio file, depicting the loudness of a signal over time at different frequencies present in a waveform. Similar to our previous work Maity, Alim\BCBL \BOthers. (\APACyear2022), we used spectrogram to determine honks from raw audio samples. To generate a spectrogram image, we perform Fourier Transformation, which decomposes the signals into frequencies and shows the amplitude of each frequency over time. Like Maity, Alim\BCBL \BOthers. (\APACyear2022), the current work generates spectrograms by dividing each audio sample into a 1-seconds segment, with the X-axis representing the time(sec), the Y-axis denoting frequency(Hz), and the amplitude of each frequency is represented by the different colors. Brighter color indicates a higher amplitude. As an example, we have considered a random 10-sec audio clip and converted it into spectrogram images, which is shown in Fig. 6. The figure shows that there are five possible honks (2 LWV, 1 MWV, and 2 HWV) within this 10-sec duration. Furthermore, Fig. 6, 6, and 6 illustrates the individual spectrogram image of audio data collected from LWV, MWV, and HWV environment for better understanding.

5.2 Data labeling

Deep learning classifiers require a significant volume of good-quality labeled data. In the absence of such a rich dataset related to the problem presented in this paper, we initially considered manual labeling. Nevertheless, manual labeling is simple, but it is labor-intensive, time-consuming, and more prone to human error. To deal with that, we employed two personnel to label the collected data. In our work, the non-honks data are labeled as ’0’, LWV as ’1”, MWV as ’2’, and HWV as ’3’. Besides audio files, corresponding video files are also used to make our annotation error-free for manual labeling. A total of 1694 discrete samples (1- 3 sec) spanning 30 minutes duration were labeled manually. The details of the manual label data are given in Table 4. We faced some challenges in several cases to recognize the honk of medium-weight vehicles (MWV), as shown in Table 4. For example, the MWV-honking is often labeled as HWV-honking by the personnel. Similarly, the honk of MWV is tagged as LWV-honk in many situations. Results show that our manual labeling process gives a total dissimilarity of around 7%. This dissimilarity might occur due to several issues: 1) raw audio data includes a variety of ambient noises, 2) honk sounds came from different sides of the lane, which is not captured in the videos and the signature of different vehicle honks are sometimes confusing due to their similar pattern, and 3) occasionally, multiple vehicular honks may overlap in time. To improve the accuracy of our training process, we remove mismatched data from our contemplation. However, these remaining samples are too few for training a model, which will result in more errors and low accuracy. Therefore, we require an automated data labeling model to annotate our data with higher accuracy.

Autoencoder(AE) and semi-supervised generative adversarial network (SGAN) based labeled data set preparation has been addressed in many works. The AE-based multi-class data labeling techniques are discussed in Bank \BOthers. (\APACyear2020); Wicker \BOthers. (\APACyear2016); Law \BBA Ghosh (\APACyear2019); Aamir \BOthers. (\APACyear2021). Authors Wicker \BOthers. (\APACyear2016) introduced a multi-class data labeling autoencoder model, where they have compressed the labels using AE by eliminating the non-linear dependencies. A stacked autoencoder network (SAE) is developed in Law \BBA Ghosh (\APACyear2019) to produce a discriminating and decreased input representation of the multi-label data. Similarly, the contractive autoencoder (CAE) model uses layered architecture followed by a feed-forward mechanism to encode unlabeled training data Aamir \BOthers. (\APACyear2021). Apart from these studies, the researchers also developed SGAN-based data labeling models Khan (\APACyear2022); Amin \BOthers. (\APACyear2020). Amin \BOthers. (\APACyear2020) used both SGAN and transfer learning models for labeling the data and model training at a time. In our previous work Maity, Alim\BCBL \BOthers. (\APACyear2022), we considered the SGAN model to label the data into two classes (honk or non-honk) to identify vehicular honks from traffic data in presence of ambient noise. Researchers also tried to combine the autoencoder and adversarial learning for fault diagnosis in Wen \BOthers. (\APACyear2023).

Inspired by this, at first, we tried to label our collected data using the existing autoencoder model, SGAN, SAE Law \BBA Ghosh (\APACyear2019), CAE Aamir \BOthers. (\APACyear2021). Although these techniques give satisfactory outcomes in earlier cited works, they failed to obtain decent accuracy in our work due to the above-mentioned issues. Fig. 7 shows the t-SNE plot of the data samples. To improve the clarity of the plot, we have chosen 30% of the actual data set and noticed that, only in a few cases, the honking signature of LWV and HWV overlapped, but in most of the cases, MWV honking pattern is confusing due to similarities between patterns, either HWV or LWV. Therefore, we need a model that can distinguish all the vehicle types efficiently. In this paper, we have extended the autoencoder (AE) model and proposed two data labeling models, i.e., Multi-label Autoencoder (MAE) and Multi-label Autoencoder Generative Adversarial Network (MAEGAN).

Table 4: Distribution of manual label data

Personnel 1, Personnel 2, Personnel 1 $\cap$ Personnel 2
	Non-honk	LWV	MWV	HWV
Non-honk	740,738,735	11,12,9	6,7,6	3,2,2
LWV	1,2,1	437,439,434	10,8,7	2,1,1
MWV	1,0,0	11,10,8	318,321,311	10,9,7
HWV	0,0,0	1,1,1	5,6,4	206,204,203

(i) Proposed Multi-label Autoencoder (MAE): Autoencoder is the combination of encoder (squeeze the input into a lower-dimensional code) and decoder (reconstructs the image from the lower-dimensional image) along with the latent space (compressed input). In this work, we have used four different autoencoders (AE0 to AE3), where AE0 is used for the images of labeled 0, AE1 is used for the images of labeled 1, and so on. For all four encoders, four latent space(Z0, Z1, Z2, Z3) is generated. Later we combine all the latent space and create a single latent space Z. Now, we pass this Z to CNN models for the vehicular honk classification. We calculated the probabilities outcome of each class, and the highest probability is considered as a selected class. Moreover, we cross-validate the selected class with the amplitude value of the respective raw audio samples. In this way, we have trained our MAE model for data labeling. Once the training is done, for data labeling, we take the unlabeled data and pass it to the trained encoder, from which latent space is generated for all the classes, and then pass it to the CNN model and select the highest probabilistic class with the cross-validation with corresponding amplitude values. The architecture of the proposed model is illustrated in Fig. 8.

For our proposed MAE, Latent space Z is represented as

Z_{i}=\sigma(W_{i}x_{i}+b_{i})

(1)

where $x_{i}$ is the original input image, $b_{i}$ is the bias, $W_{i}$ is the weight and $\sigma$ is the activation function. Similarly, decoding is represented as

\widehat{x_{i}}=\widehat{\sigma}(\widehat{W}z_{i}+\widehat{b_{i}})

(2)

Loss function is formulated as

\mathcal{L}_{i}=\frac{1}{N_{i}}\sum_{n=1}^{N_{i}}(x_{i}-\widehat{x_{i}})^{2}

(3)

In all cases, the value of $i$ ranges between 0 to 3.

(ii) Proposed Multi-label Autoencoder Generative Adversarial Network (MAEGAN): A generative adversarial network (GAN) has two components. One is a generator, which generates the fake images, and another one is a discriminator, which distinguishes the real and fake images. In MAEGAN, instead of one generator and one discriminator, we have used four generators (G0-G3) and four discriminators (D0-D3) to label four different classes, i.e., non-honk, LWV, MWV, HWV. In our work, G0 generates fake images for labeled 0, and D0 is used to distinguish the real and fake images of labeled 0. The remaining generators and discriminators are used for other classes of images. The proposed MAEGAN model uses the trained latent space of the MAE model as a generator. The intuition behind using the latent space of the MAE model is that it is already trained with all the different labels of the image classes. Hence, the generator will generate more accurate images, which may be hard to classify by the discriminator. Similar to the proposed MAE model, we have used the same classifier to classify different classes of honks. It is found that the proposed MAE performs better than MAEGAN, which is shown in the result section in detail.

5.3 Data augmentation and class balancing

By combining manual and MAE model generated samples, we obtained a total of 56399 samples, where 26%, 20%, and 12% of data belongs to LWV, MWV, and HWV classes, respectively, and the remaining 42% of data is in non-honk class. Due to the disparity in size among honk classes, we faced an imbalanced classification problem in our work. Class balancing in such a scenario may lead to losing a large amount of data, which may further result in the reduction of training samples. To overcome this problem, we do the data augmentation on the spectrogram image rather than raw audio. There exist multiple methods of data augmentation (like time warping, time masking, frequency masking, etc.) to enhance the data volume artificially.

In our work, we have used time and frequency mask techniques to augment the volume of data. In these techniques, some vertical bars are randomly placed to conceal the time masks and horizontal bars to conceal frequency masks on the spectrogram images, respectively. To visualize this, we have considered the spectrogram image shown in Fig. 6. We see the spectrogram image with vertical and horizontal bars showing time mask and frequency mask-based augmentation in Fig. 9 and Fig. 9, respectively. After performing data augmentation, the data volume is increased by around 60%. Next, we do the class balancing, which contains a total of 22 hours of data, and split the dataset for training and testing. The details of the distribution are shown in Table 5.

Table 5: Distribution of data before and after class balancing Maity, Alim\BCBL \BOthers. (\APACyear2022)

Manual Label Data

Labeled by MAE

After Augmentation

After Class

Balancing

Distribution of data for

training and testing

Class

#samples

Training set

Testing set

Total

1694

54705

134286

79608

63684

15924

Non-honk

741

22853

68559

19902

15921

3981

LWV

440

14170

42511

19902

15921

3981

MWV

320

11048

33146

19902

15921

3981

HWV

193

6634

19902

15921

3981

5.4 Choice of classification models

In several domains, such as object detection, image recognition, and image classification, Convolutional Neural Networks (CNNs) are widely utilized Piczak (\APACyear2015); Salamon \BBA Bello (\APACyear2017). Transfer learning-based CNN models Demir \BOthers. (\APACyear2020) are often employed to achieve higher accuracy. These models leverage pre-training on the ImageNet dataset, which consists of over 14 million images and is accessible through trusted public Keras libraries. By utilizing pre-trained models, training time is reduced, and there is less dependability on having a large training dataset.

(i) Pre-trained CNN models: For classification, we have chosen four pre-trained CNN models, namely MobileNet, ShuffleNet, ResNet 50, and Inception V3. These models were selected by considering two environments: (i) low-end systems and (ii) hardware accelerator systems. We have used two lightweight models for low-end systems: MobileNet and ShuffleNet. On the other hand, ResNet 50 and Inception V3 are used as Heavyweight models for hardware accelerator systems. For our problem, these models have been trained on our dataset. Additionally, pre-trained models are fine-tuned by adding a dense layer to overcome the problem of overfitting.

(ii) Proposed EnTL model: The ensemble model leverages the idea that aggregating the predictions of multiple models can often lead to more accurate and robust predictions than using a single model. To improve the classification accuracy, we propose a transfer learning-based ensemble method, i.e., Ensambled Transfer Learning (EnTL), combining four fine-tuned pre-trained models (i.e., MobileNet, ShuffleNet, ResNet 50, Inception V3). All the models are trained independently on different subsets of the training data, created through random sampling with replacement. Each model provides a probabilistic score for each class using the ’softmax’ function. For an input image $x$ , a model can generate four probability scores, one for each class. Let $C_{k}^{i}$ be the probability score of model $k$ on class $i$ , where $k$ = 0,1,2, … (M-1); $M$ represents the number of models and $i$ ranges from 0 to 3. We sum up the probability of each class $i$ generated by $M$ models, which is a 1-D vector for $x$ . From this vector, we select the class having the highest value using majority voting as a final class for the input image sample $x$ (see in Equation 4). Fig. 10 illustrates the architecture of the proposed model(EnTL).

Final_{outcome}=\underset{i}{\arg\max}\sum_{k=0}^{M-1}C_{k}^{i}(x)

(4)

6 Experimental result analysis

The performance of all the models for honk classification is evaluated using accuracy, Matthews correlation coefficient (MCC), F1 score, precision, recall, kappa statistics, and area under ROC curve (ROC AUC). In Section 6.1 and 6.3, we have compared the performance of different data labeling and classification models, respectively. To show the superiority of our proposed model, we have compared our proposed classification model with baselines in Section 6.5. In Section 7, we have inferred context considering real-life scenarios, and at the end, we have discussed the overall importance of our work in Section 6.6.

6.1 Comparison with existing data labeling models

To evaluate the effectiveness of the proposed data labeling models i.e., MAE and MAEGAN, we perform a comparative analysis against the conventional AE model, GAN, CAE model Aamir \BOthers. (\APACyear2021), and SAE model. All these models are discussed previously in Section 5.2). Table 6 summarizes the experimental results of all models, MAE and MAEGAN. It can be seen that the CAE model performs better in contrast to existing deep learning models i.e., the conventional AE, GAN, and SAE models. But the results indicate that our proposed MAE and MAEGAN models outperform the CAE model. In fact, the performance of MAE and MAEGAN models is $\sim$ 18% better than the CAE model. This is because the existing algorithms failed to distinguish the honking pattern of MWV and HWV in several cases due to similarities between the data samples, as shown in Fig. 7. As a result, their accuracy dropped. But our multi-label autoencoder models clearly distinguish the honking pattern of MWV, HWV, LWV, and even non-honks and thus give higher accuracy. Furthermore, comparing the performance of the MAE model with the MAEGAN model, we see that the MAE shows $\sim$ 2.5% better outcome than the MAEGAN. Due to this performance superiority, we have used the MAE model for labeling the unlabeled data in our work. To verify the correctness of the proposed MAE, we have performed a ground truth verification discussed in the following sub-section. A total of 1694 samples are labeled manually, and the remaining data are labeled using MAE. The entire data set is divided into five distinct groups, and we fed each group of data one by one to the MAE model for labeling. A total of 54705 samples are labeled by the proposed MAE model with more than 97 % accuracy. Epoch-wise training and validation accuracy are also depicted in Fig. 11.

Table 6: Performance analysis of different models for data labeling

Baseline Models

Proposed Models

Evaluation Metric

Bank \BOthers. (\APACyear2020)

SAE

Law \BBA Ghosh (\APACyear2019)

CAE

Aamir \BOthers. (\APACyear2021)

SGAN

Maity, Alim\BCBL \BOthers. (\APACyear2022)

MAE

MAEGAN

Accuracy

61.65%

70.08%

79.48%

66.19%

97.64%

94.99%

MCC

0.43

0.62

0.71

0.49

0.97

0.93

F1 Score

0.54

0.48

0.71

0.59

0.96

0.92

Precision

0.61

0.83

0.86

0.65

0.96

0.93

Recall

0.52

0.69

0.57

0.97

0.93

Kappa Statistics

0.41

0.56

0.69

0.49

0.97

0.93

Roc AUC

0.69

0.71

0.80

0.74

0.98

0.96

6.2 Ground truth verification of labeled data w.r.t data models

To check the correctness of our proposed MAE model, we picked up 340 samples randomly for validation and labeled them using MAE and MAEGAN models. The samples were also fed to other existing data models (AE, GAN, SAE, CAE) for labeling and then compared with MAE and MAEGAN models. For ground truth verification, these 340 samples are further labeled manually by two volunteers, where the number of LWV, MWV, and HWV honking samples are 88, 64, and 39, respectively. The results are shown in Fig. 12. We see that only one honk of MWV is misclassified as the honk of HWV, and 8 HWV samples are detected as LWV honks by the MAE model. It implies that out of 340 samples, only 9 samples are wrongly tagged by the proposed MAE model, while the misclassified samples generated by other models are 16, 115, 150, 139, and 166 for MAEGAN, AE, SAE, CAE, SGAN models respectively. The result signifies that our proposed MAE model can correctly label the unlabeled samples.

6.3 Classification performance evaluation of pre-trained models and EnTL model

Table 7 illustrates the performance comparison for all the pre-trained models and EnTL for classifying honks using the chosen performance measures. Here, we have used 79608 labeled samples, where 63684 samples were for training and 15924 samples were for testing. We have trained all pre-trained models using the Adam optimizer for 20 epochs. The input image size is 224 × 224 × 3, the learning rate is set at 0.001, and the batch size is set to 312. Results indicate that lightweight models i.e., MobileNet and ShuffleNet, have achieved test accuracy at around 88.82% and 78.59%, respectively.

On the other hand, ResNet 50 and Inception V3 have achieved accuracies of 93.77% and 95.17% , respectively. among the pre-trained models, Inception V3 gives better results in our dataset. Nevertheless, the proposed EnTL model surpasses that of Inception V3 with an accuracy of 96.72% and demonstrates classification accuracy of 95.47%, 97.32%, and 97.72% for LWV, MWV, and HWV, respectively. Additionally, the attained values of MCC, F1-score, precision, recall, Kappa statistics, and ROC-AUC demonstrate that EnTL outperforms other models. The confusion matrix, shown in Fig. 13, is calculated for each model. We have found that the misclassification (false positive) rate is significantly less in EnTL. Only a few non-honk samples (2.7%) are detected as LWV because sometimes the pattern of engine noise/several other noises becomes similar to the LWV. Similarly, 2.41% samples of LWV are classified as MWV due to the same type of honking pattern that may arise in the case of motorbikes and four-wheeler vehicles. The remaining misclassification rate is less than 1%, which is negligible. As a note, the classification performance of the proposed EnTL is better than any single transfer learning model and can be used as a model to classify vehicular honks even in the presence of ambient noises.

Table 7: Performance analysis of different models for honk classification

	Light weight model		Heavy weight model		Proposed model
Evaluation Metric	MobileNet	ShuffleNet	ResNet 50	Inception V3	EnTL
Accuracy	88.82%	78.59%	93.77%	95.17%	96.72%
MCC	0.85	0.72	0.92	0.94	0.96
F1 Score	0.89	0.79	0.94	0.95	0.97
Precision	0.89	0.79	0.94	0.95	0.97
Recall	0.89	0.79	0.94	0.95	0.97
Kappa Statistics	0.85	0.71	0.92	0.94	0.96
ROC AUC	0.93	0.86	0.96	0.97	0.98

6.4 Significance of dataset

To show the importance of our curated dataset, we have trained the best-performing model EnTL for different datasets and plotted the training accuracy in Fig. 14. At first, we trained the model with manually labeled samples. The training accuracy dropped since our manually labeled samples(1694) were few. Next, we trained the model using MAE-generated labeled samples (54705) and obtained better accuracy than training with manual label samples. Finally, the model was trained with the enriched dataset(79806) generated by data augmentation techniques, and we achieved a very good training accuracy. Hence, our proposed methodology and the dataset play a crucial role in this study as well as the annotate dataset can be useful for other research.

6.5 Performance comparison of EnTL with the recent literature of honk classification

In this section, the performance of the EnTL model is compared with baseline work using chosen performance metrics. Here, we have considered SBCNN Salamon \BBA Bello (\APACyear2017), Dilated CNN Chen \BOthers. (\APACyear2019), CNN Demir \BOthers. (\APACyear2020), and TFCNN Mu \BOthers. (\APACyear2021) as baseline works. The configuration settings of each baseline model are represented in Table 8. The detailed results are presented in Table 9, from where we notice that the performance of EnTL is increased by at least $\sim$ 9% and at most $\sim$ 21% in contrast to other models. The baseline models reported higher sound classification accuracy in publicly available datasets like ESC-10, ESC-50, or UrbanSound8k, where ambient noise was either absent or not considered. Moreover, each audio file’s duration is much less in these datasets. In contrast, our data was collected from a real environment in the presence of various ambient noises. Hence, the accuracy fell sharply when we fed our dataset to the existing models. However, the Dilated CNN provides the best output among all the baseline models. When we compare our model with Dilated CNN model, the performance of our model is increased by 10.15%, 17.07%, 11.49%, 10.22%, 12.79%, 17.07%, 7.69% for accuracy, MCC, F1-score, Precision, Recall, Kappa Statistics, ROC AUC. The EnTL model is good for classifying honk based on the vehicle types in the presence of ambient noises.

Table 8: Specifications of the baseline models Maity, Alim\BCBL \BOthers. (\APACyear2022)

Models

Conv2d

Max polling

Up Sampling

Dense Layer

Batch

Normalization

Activation

Function

SB-CNN Salamon \BBA Bello (\APACyear2017)

Softmax

Dilated CNN Chen \BOthers. (\APACyear2019)

Softmax

CNN Demir \BOthers. (\APACyear2020)

Softmax

TFCNN Mu \BOthers. (\APACyear2021)

Softmax

Table 9: Assessing the EnTL model’s performance in comparison to standard baseline methods.

Accuracy

MCC

F1 Score

Precision

Recall

Kappa

Statistics

ROC

AUC

TFCNN Mu \BOthers. (\APACyear2021)

77.82%

0.71

0.78

0.79

0.78

0.70

0.85

CNN Demir \BOthers. (\APACyear2020)

80.45%

0.76

0.81

0.88

0.80

0.74

0.87

Dilated CNN Chen \BOthers. (\APACyear2019)

86.57%

0.82

0.87

0.88

0.86

0.82

0.91

SBCNN Salamon \BBA Bello (\APACyear2017)

75.23%

0.68

0.74

0.79

0.75

0.67

0.83

Our Proposed

Model (EnTL)

96.72%

0.96

0.97

0.96

0.98

6.6 Discussion & remarks

•

Importance of our curated dataset: Due to the fundamental nature of the traffic data, our dataset was the combination of several ambient noises, but still, we achieved a decent accuracy for honk classification. When we performed the same experiment with the baseline models, the overall accuracy decreased as all the baseline models were trained either in a controlled environment or ambient noise-free data only. Moreover, we have generated 22 hours of labeled data repository, which can further be used for developing several micro-services based on honk features.
•

Significance of our proposed data labeling model: Due to the similar pattern of honk signature, the existing data labeling models failed to reach a good accuracy, whereas our proposed MAE model succeeded with 21% more accuracy than existing models. Therefore, we can extrapolate that this proposed model is more suitable and will give better outcomes when the data pattern is very closely related.
•

Role of honk classification models: There may be circumstances where real-time detection and model execution are both required on some resource-constrained devices like IoT edge devices or smartphones. In this situation, we suggest using the MobileNet model because it is lightweight and doesn’t significantly degrade performance (by about 6.53% as seen in our tests). Furthermore, our proposed model EnTL outperforms other models and provides the best outcome.
•

Inference of context: Proper honk classification facilitates us in recognizing the outdoor context of a location. It has had a big influence on researching passive sensing methods to develop several micro-services. In this research, we have presented a glimpse of spatio-temporal context of three different locations.

7 Context identification

Location information can be learned by identifying the context patterns in road traffic. The location context can be detected using a variety of techniques. GPS data has been used in many studies for detecting location context, transportation mode detection, etc. Despite providing output with high precision, GPS comes with several drawbacks. These include tracking personal data without prior permission or knowledge, fast battery drains, and so on. The work presented in this paper has mainly concentrated on identifying the context of a location based on the movement of different types of vehicles in a locality. As shown in 6.3, the proposed model EnTL classifies different vehicular honks based on the vehicle types with higher accuracy, which can be an excellent indicator for understanding the location. Along with the classified honks, we can also use the SPL or sound pressure level of a location to detect location context. As a case study, we have presented context identification based on spatio-temporal characteristics below.

7.1 Spatio-temporal context inference

To assess the context of a location, we have collected vehicular honk data from different outdoor scenes i.e. highways, market place, and residential areas for three different time slots. Continuous ten days of data have been collected for the five minutes duration of each time slot. The overall hoking signature for all the locations and time slots are depicted in Fig. 15. We have also provided a snap of all the places that were visited in different time slots. From Fig. 15, we have observed the following sound pattern (f# denotes the features number for each of the locations):

•

Highway: [f1] Throughout the day, the highest number of honks are generated by HWV. [f2] The number of honks generated by the LWV is the lowest as compared with others. [f3] All vehicle honks increase as time proceeds.
•

Market place: [f1] Honks of LWV and MWV are higher. [f3] comparatively, HWV honks are much lower. [f2] Overall honking pattern goes down in the afternoon.
•

Residential Area: [f1] Some LWV honks are found throughout the day [f2] HWV honks are almost negligible in this place. [f3] In comparison to other places, the number of honks is significantly less in residential areas.

Therefore, we can say that a unique temporal pattern exists in all the locations, which distinguishes one place from another.

7.2 Ground truth verification of context in real scenario

In order to verify how well our proposed system $\mathcal{A}$ $\mathcal{C}$ lassi $\mathcal{H}$ onk works, we moved through four different areas in a single trace and plotted the various honk data in a graph. The perceived result is presented in Fig. 16 and the street view of our travel is also shown along with the latitude and longitude value in Fig. 17. We started our journey from the residential area and then passed through a marketplace, low-traffic area, high-traffic area, and again returned back to the residential area. Fig. 16 shows all most similar kinds of honking patterns that we observed in Section 7.1. A distinguishable honking pattern among all the locations is visible, which justifies our institution to identify the context of a place.

8 Conclusion & future work

Noise pollution and vehicular honking complement each other, and their adverse impact is long-lasting for daily human life. Hence understanding the honking behavior is an important aspect based on spatio-temporal characteristics. In this paper, we have developed $\mathcal{A}$ $\mathcal{C}$ lassi $\mathcal{H}$ onk, a novel framework, which can classify different vehicle honks from the raw audio samples. As an application, we can detect and understand the context of a location based on the vehicle honking in road traffic. Due to the similar honking pattern of some vehicles, the existing deep learning-based data labeling models failed to obtain good accuracy in labeling the unlabeled honk samples. In this paper, we have proposed a Multi-label autoencoder (MAE), an extended AE model, that correctly labels the unlabeled honk samples for training with higher accuracy. Moreover, we have evaluated some pre-trained CNN models (ShuffleNet, MobileNet, ResNet 50, Inception V3) and proposed an ensemble model named EnTL combining four pre-trained models by using majority voting to classify vehicular honks. Experimental results illustrate that EnTL performs better than the pre-trained models and also other CNN-based baseline models with a higher percentage of accuracy. The classified honking pattern and SPL of an area are further utilized to derive the context of a location. Additionally, we have validated our model with ground truth data.

Finally, we would like to mention a few potential future scopes and the directions of this research. In the future, we can incorporate the following features to get a more authentic context of a location: i) Our models successfully classify honks based on their vehicle types. However, the extension of honk-level features like honk duration, inter-honk gap, etc., can further provide insights into traffic conditions and the driving culture of a particular location. ii) The correlation between noise and air quality, which was studied in some existing works, can be revisited in the context of such vehicle classes. iii) Our system failed to identify vehicle types when honks from multiple vehicle types overlapped in time. Suitable modification of the model is required for further improvement.

References

\bibcommenthead
Aamir \BOthers. (\APACyear2021) \APACinsertmetastaraamir2021deep{APACrefauthors}Aamir, M., Mohd Nawi, N., Wahid, F.\BCBL Mahdin, H. \APACrefYearMonthDay2021. \BBOQ\APACrefatitleA deep contractive autoencoder for solving multiclass classification problems A deep contractive autoencoder for solving multiclass classification problems.\BBCQ \APACjournalVolNumPagesEvolutionary Intelligence141619–1633. \PrintBackRefs\CurrentBib
Abdoli \BOthers. (\APACyear2019) \APACinsertmetastarabdoli2019end{APACrefauthors}Abdoli, S., Cardinal, P.\BCBL Koerich, A.L. \APACrefYearMonthDay2019. \BBOQ\APACrefatitleEnd-to-end environmental sound classification using a 1D convolutional neural network End-to-end environmental sound classification using a 1d convolutional neural network.\BBCQ \APACjournalVolNumPagesExpert Systems with Applications136252–263. \PrintBackRefs\CurrentBib
Ahmed \BOthers. (\APACyear2020) \APACinsertmetastarahmed2020automatic{APACrefauthors}Ahmed, M., Robin, T.I., Shafin, A.A.\BCBL \BOthersPeriod. \APACrefYearMonthDay2020. \BBOQ\APACrefatitleAutomatic environmental sound recognition (AESR) using convolutional neural network Automatic environmental sound recognition (aesr) using convolutional neural network.\BBCQ \APACjournalVolNumPagesInternational Journal of Modern Education & Computer Science125. \PrintBackRefs\CurrentBib
Allen \BOthers. (\APACyear2009) \APACinsertmetastarallen2009spatial{APACrefauthors}Allen, R.W., Davies, H., Cohen, M.A., Mallach, G., Kaufman, J.D.\BCBL Adar, S.D. \APACrefYearMonthDay2009. \BBOQ\APACrefatitleThe spatial relationship between traffic-generated air pollution and noise in 2 US cities The spatial relationship between traffic-generated air pollution and noise in 2 us cities.\BBCQ \APACjournalVolNumPagesEnvironmental research1093334–342. \PrintBackRefs\CurrentBib
Amin \BOthers. (\APACyear2020) \APACinsertmetastaramin2020semi{APACrefauthors}Amin, I., Hassan, S.\BCBL Jaafar, J. \APACrefYearMonthDay2020. \BBOQ\APACrefatitleSemi-Supervised Learning for Limited Medical Data Using Generative Adversarial Network and Transfer Learning Semi-supervised learning for limited medical data using generative adversarial network and transfer learning.\BBCQ \APACrefbtitle2020 International Conference on Computational Intelligence (ICCI) 2020 international conference on computational intelligence (icci) (\BPGS 5–10). \PrintBackRefs\CurrentBib
Andrade \BOthers. (\APACyear2024) \APACinsertmetastarandrade2024urban{APACrefauthors}Andrade, E.d.L., de Lima, E.A., Martins, A.C.G., Zannin, P.H.T.\BCBL da Cunha e Silva, D.C. \APACrefYearMonthDay2024. \BBOQ\APACrefatitleUrban noise assessment in hospitals: measurements and mapping in the context of the city of Sorocaba, Brazil Urban noise assessment in hospitals: measurements and mapping in the context of the city of sorocaba, brazil.\BBCQ \APACjournalVolNumPagesEnvironmental Monitoring and Assessment1963267. \PrintBackRefs\CurrentBib
Attallah (\APACyear2023) \APACinsertmetastarattallah2023cercan{APACrefauthors}Attallah, O. \APACrefYearMonthDay2023. \BBOQ\APACrefatitleCerCan· Net: Cervical Cancer Classification Model via Multi-layer Feature Ensembles of Lightweight CNNs and Transfer Learning Cercan· net: Cervical cancer classification model via multi-layer feature ensembles of lightweight cnns and transfer learning.\BBCQ \APACjournalVolNumPagesExpert Systems with Applications120624. \PrintBackRefs\CurrentBib
Banerjee \BBA Sinha (\APACyear2012) \APACinsertmetastarbanerjee2012two{APACrefauthors}Banerjee, R.\BCBT \BBA Sinha, A. \APACrefYearMonthDay2012. \BBOQ\APACrefatitleTwo stage feature extraction using modified MFCC for honk detection Two stage feature extraction using modified mfcc for honk detection.\BBCQ \APACrefbtitle2012 International Conference on Communications, Devices and Intelligent Systems (CODIS) 2012 international conference on communications, devices and intelligent systems (codis) (\BPGS 97–100). \PrintBackRefs\CurrentBib
Bank \BOthers. (\APACyear2020) \APACinsertmetastarbank2020autoencoders{APACrefauthors}Bank, D., Koenigstein, N.\BCBL Giryes, R. \APACrefYearMonthDay2020. \BBOQ\APACrefatitleAutoencoders Autoencoders.\BBCQ \APACjournalVolNumPagesarXiv preprint arXiv:2003.05991. \PrintBackRefs\CurrentBib
Bello \BOthers. (\APACyear2019) \APACinsertmetastarbello2019sonyc{APACrefauthors}Bello, J.P., Silva, C., Nov, O., Dubois, R.L., Arora, A., Salamon, J.\BDBLDoraiswamy, H. \APACrefYearMonthDay2019. \BBOQ\APACrefatitleSonyc: A system for monitoring, analyzing, and mitigating urban noise pollution Sonyc: A system for monitoring, analyzing, and mitigating urban noise pollution.\BBCQ \APACjournalVolNumPagesCommunications of the ACM62268–77. \PrintBackRefs\CurrentBib
Chen \BOthers. (\APACyear2019) \APACinsertmetastarchen2019environmental{APACrefauthors}Chen, Y., Guo, Q., Liang, X., Wang, J.\BCBL Qian, Y. \APACrefYearMonthDay2019. \BBOQ\APACrefatitleEnvironmental sound classification with dilated convolutions Environmental sound classification with dilated convolutions.\BBCQ \APACjournalVolNumPagesApplied Acoustics148123–132. \PrintBackRefs\CurrentBib
Chouksey \BOthers. (\APACyear2023) \APACinsertmetastarchouksey2023heterogeneous{APACrefauthors}Chouksey, A.K., Kumar, B., Parida, M., Pandey, A.D.\BCBL Verma, G. \APACrefYearMonthDay2023. \BBOQ\APACrefatitleHeterogeneous road traffic noise modeling at mid-block sections of mid-sized city in India Heterogeneous road traffic noise modeling at mid-block sections of mid-sized city in india.\BBCQ \APACjournalVolNumPagesEnvironmental Monitoring and Assessment195111349. \PrintBackRefs\CurrentBib
Cobos \BOthers. (\APACyear2017) \APACinsertmetastarcobos2017survey{APACrefauthors}Cobos, M., Antonacci, F., Alexandridis, A., Mouchtaris, A.\BCBL Lee, B. \APACrefYearMonthDay2017. \BBOQ\APACrefatitleA survey of sound source localization methods in wireless acoustic sensor networks A survey of sound source localization methods in wireless acoustic sensor networks.\BBCQ \APACjournalVolNumPagesWireless Communications and Mobile Computing2017. \PrintBackRefs\CurrentBib
de la Rosa \BOthers. (\APACyear2022) \APACinsertmetastarde2022geometric{APACrefauthors}de la Rosa, F.L., Gómez-Sirvent, J.L., Sánchez-Reolid, R., Morales, R.\BCBL Fernández-Caballero, A. \APACrefYearMonthDay2022. \BBOQ\APACrefatitleGeometric transformation-based data augmentation on defect classification of segmented images of semiconductor materials using a ResNet50 convolutional neural network Geometric transformation-based data augmentation on defect classification of segmented images of semiconductor materials using a resnet50 convolutional neural network.\BBCQ \APACjournalVolNumPagesExpert Systems with Applications206117731. \PrintBackRefs\CurrentBib
Demir \BOthers. (\APACyear2020) \APACinsertmetastardemir2020new{APACrefauthors}Demir, F., Abdullah, D.A.\BCBL Sengur, A. \APACrefYearMonthDay2020. \BBOQ\APACrefatitleA new deep CNN model for environmental sound classification A new deep cnn model for environmental sound classification.\BBCQ \APACjournalVolNumPagesIEEE Access866529–66537. \PrintBackRefs\CurrentBib
Dim \BOthers. (\APACyear2020) \APACinsertmetastardim2020smartphone{APACrefauthors}Dim, C.A., Feitosa, R.M., Mota, M.P.\BCBL Morais, J.M.d. \APACrefYearMonthDay2020. \BBOQ\APACrefatitleA Smartphone Application for Car Horn Detection to Assist Hearing-Impaired People in Driving A smartphone application for car horn detection to assist hearing-impaired people in driving.\BBCQ \APACrefbtitleInternational Conference on Computational Science and Its Applications International conference on computational science and its applications (\BPGS 104–116). \PrintBackRefs\CurrentBib
Elhassan \BOthers. (\APACyear2021) \APACinsertmetastarelhassan2021dsanet{APACrefauthors}Elhassan, M.A., Huang, C., Yang, C.\BCBL Munea, T.L. \APACrefYearMonthDay2021. \BBOQ\APACrefatitleDSANet: Dilated spatial attention for real-time semantic segmentation in urban street scenes Dsanet: Dilated spatial attention for real-time semantic segmentation in urban street scenes.\BBCQ \APACjournalVolNumPagesExpert Systems with Applications183115090. \PrintBackRefs\CurrentBib
Ezhilarasi \BOthers. (\APACyear2017) \APACinsertmetastarezhilarasi2017system{APACrefauthors}Ezhilarasi, L., Sripriya, K., Suganya, A.\BCBL Vinodhini, K. \APACrefYearMonthDay2017. \BBOQ\APACrefatitleA system for monitoring air and sound pollution using arduino controller with iot technology A system for monitoring air and sound pollution using arduino controller with iot technology.\BBCQ \APACjournalVolNumPagesInternational Research Journal in Advanced Engineering and Technology (IRJAET)321781–1785. \PrintBackRefs\CurrentBib
Firdaus \BBA Ahmad (\APACyear2010) \APACinsertmetastarfirdaus2010noise{APACrefauthors}Firdaus, G.\BCBT \BBA Ahmad, A. \APACrefYearMonthDay2010. \BBOQ\APACrefatitleNoise pollution and human health: a case study of municipal corporation of Delhi Noise pollution and human health: a case study of municipal corporation of delhi.\BBCQ \APACjournalVolNumPagesIndoor and built environment196648–656. \PrintBackRefs\CurrentBib
Garg \BOthers. (\APACyear2015) \APACinsertmetastargarg2015applications{APACrefauthors}Garg, N., Soni, K., Saxena, T.\BCBL Maji, S. \APACrefYearMonthDay2015. \BBOQ\APACrefatitleApplications of Autoregressive integrated moving average (ARIMA) approach in time-series prediction of traffic noise pollution Applications of autoregressive integrated moving average (arima) approach in time-series prediction of traffic noise pollution.\BBCQ \APACjournalVolNumPagesNoise Control Engineering Journal632182–194. \PrintBackRefs\CurrentBib
Grondin \BBA Michaud (\APACyear2019) \APACinsertmetastargrondin2019lightweight{APACrefauthors}Grondin, F.\BCBT \BBA Michaud, F. \APACrefYearMonthDay2019. \BBOQ\APACrefatitleLightweight and optimized sound source localization and tracking methods for open and closed microphone array configurations Lightweight and optimized sound source localization and tracking methods for open and closed microphone array configurations.\BBCQ \APACjournalVolNumPagesRobotics and Autonomous Systems11363–80. \PrintBackRefs\CurrentBib
Guarnaccia \BOthers. (\APACyear2017) \APACinsertmetastarguarnaccia2017development{APACrefauthors}Guarnaccia, C., Mastorakis, N.E., Quartieri, J., Tepedino, C.\BCBL Kaminaris, S.D. \APACrefYearMonthDay2017. \BBOQ\APACrefatitleDevelopment of seasonal ARIMA models for traffic noise forecasting Development of seasonal arima models for traffic noise forecasting.\BBCQ \APACrefbtitleMATEC Web of Conferences Matec web of conferences (\BVOL 125, \BPG 05013). \PrintBackRefs\CurrentBib
Gupta \BOthers. (\APACyear2018) \APACinsertmetastargupta2018noise{APACrefauthors}Gupta, A., Gupta, A., Jain, K.\BCBL Gupta, S. \APACrefYearMonthDay2018. \BBOQ\APACrefatitleNoise pollution and impact on children health Noise pollution and impact on children health.\BBCQ \APACjournalVolNumPagesThe Indian Journal of Pediatrics854300–306. \PrintBackRefs\CurrentBib
Guzhov \BOthers. (\APACyear2021) \APACinsertmetastarguzhov2021esresnet{APACrefauthors}Guzhov, A., Raue, F., Hees, J.\BCBL Dengel, A. \APACrefYearMonthDay2021. \BBOQ\APACrefatitleEsresnet: Environmental sound classification based on visual domain models Esresnet: Environmental sound classification based on visual domain models.\BBCQ \APACrefbtitle2020 25th International Conference on Pattern Recognition (ICPR) 2020 25th international conference on pattern recognition (icpr) (\BPGS 4933–4940). \PrintBackRefs\CurrentBib
Hammer \BOthers. (\APACyear2014) \APACinsertmetastarhammer2014environmental{APACrefauthors}Hammer, M.S., Swinburn, T.K.\BCBL Neitzel, R.L. \APACrefYearMonthDay2014. \BBOQ\APACrefatitleEnvironmental noise pollution in the United States: developing an effective public health response Environmental noise pollution in the united states: developing an effective public health response.\BBCQ \APACjournalVolNumPagesEnvironmental health perspectives1222115–119. \PrintBackRefs\CurrentBib
Hu \BOthers. (\APACyear2022) \APACinsertmetastarhu2022comprehensive{APACrefauthors}Hu, Q., Wu, X.\BCBL Bian, L. \APACrefYearMonthDay2022. \BBOQ\APACrefatitleComprehensive diagnosis model of environmental impact caused by expressway vehicle emission Comprehensive diagnosis model of environmental impact caused by expressway vehicle emission.\BBCQ \APACjournalVolNumPagesEnvironmental Monitoring and Assessment19411796. \PrintBackRefs\CurrentBib
Jariwala \BOthers. (\APACyear2017) \APACinsertmetastarjariwala2017noise{APACrefauthors}Jariwala, H.J., Syed, H.S., Pandya, M.J.\BCBL Gajera, Y.M. \APACrefYearMonthDay2017. \BBOQ\APACrefatitleNoise pollution & human health: a review Noise pollution & human health: a review.\BBCQ \APACjournalVolNumPagesIndoor Built Environ1–4. \PrintBackRefs\CurrentBib
Jezdović \BOthers. (\APACyear2021) \APACinsertmetastarjezdovic2021crowdsensing{APACrefauthors}Jezdović, I., Popović, S., Radenković, M., Labus, A.\BCBL Bogdanović, Z. \APACrefYearMonthDay2021. \BBOQ\APACrefatitleA crowdsensing platform for real-time monitoring and analysis of noise pollution in smart cities A crowdsensing platform for real-time monitoring and analysis of noise pollution in smart cities.\BBCQ \APACjournalVolNumPagesSustainable Computing: Informatics and Systems31100588. \PrintBackRefs\CurrentBib
Kalawapudi \BOthers. (\APACyear2020) \APACinsertmetastarkalawapudi2020noise{APACrefauthors}Kalawapudi, K., Singh, T., Dey, J., Vijay, R.\BCBL Kumar, R. \APACrefYearMonthDay2020. \BBOQ\APACrefatitleNoise pollution in Mumbai Metropolitan Region (MMR): An emerging environmental threat Noise pollution in mumbai metropolitan region (mmr): An emerging environmental threat.\BBCQ \APACjournalVolNumPagesEnvironmental monitoring and assessment1921–20. \PrintBackRefs\CurrentBib
Khamparia \BOthers. (\APACyear2019) \APACinsertmetastarkhamparia2019sound{APACrefauthors}Khamparia, A., Gupta, D., Nguyen, N.G., Khanna, A., Pandey, B.\BCBL Tiwari, P. \APACrefYearMonthDay2019. \BBOQ\APACrefatitleSound classification using convolutional neural network and tensor deep stacking network Sound classification using convolutional neural network and tensor deep stacking network.\BBCQ \APACjournalVolNumPagesIEEE Access77717–7727. \PrintBackRefs\CurrentBib
Khan (\APACyear2022) \APACinsertmetastarkhan2022semi{APACrefauthors}Khan, N. \APACrefYearMonthDay2022. \BBOQ\APACrefatitleSemi-Supervised Generative Adversarial Network for Stress Detection Using Partially Labeled Physiological Data Semi-supervised generative adversarial network for stress detection using partially labeled physiological data.\BBCQ \APACjournalVolNumPagesarXiv preprint arXiv:2206.14976. \PrintBackRefs\CurrentBib
Law \BBA Ghosh (\APACyear2019) \APACinsertmetastarlaw2019multi{APACrefauthors}Law, A.\BCBT \BBA Ghosh, A. \APACrefYearMonthDay2019. \BBOQ\APACrefatitleMulti-label classification using a cascade of stacked autoencoder and extreme learning machines Multi-label classification using a cascade of stacked autoencoder and extreme learning machines.\BBCQ \APACjournalVolNumPagesNeurocomputing358222–234. \PrintBackRefs\CurrentBib
Maity, Alim\BCBL \BOthers. (\APACyear2022) \APACinsertmetastarmaity2022dehonk{APACrefauthors}Maity, B., Alim, A., Bhattacharjee, S.\BCBL Nandi, S. \APACrefYearMonthDay2022. \BBOQ\APACrefatitleDeHonk: A deep learning based system to characterize vehicular honks in presence of ambient noise Dehonk: A deep learning based system to characterize vehicular honks in presence of ambient noise.\BBCQ \APACjournalVolNumPagesPervasive and Mobile Computing88101727. \PrintBackRefs\CurrentBib
Maity, Polapragada\BCBL \BOthers. (\APACyear2022) \APACinsertmetastarmaity2022coan{APACrefauthors}Maity, B., Polapragada, Y., Bhattacharjee, S.\BCBL Nandi, S. \APACrefYearMonthDay2022. \BBOQ\APACrefatitleCoAN: A system framework correlating the air and noise pollution sensor data Coan: A system framework correlating the air and noise pollution sensor data.\BBCQ \APACjournalVolNumPagesPervasive and Mobile Computing81101546. \PrintBackRefs\CurrentBib
Maity, Trinath\BCBL \BOthers. (\APACyear2022) \APACinsertmetastarmaity2022predhonk{APACrefauthors}Maity, B., Trinath, M.A.S.L.P., Bhattacharjee, S.\BCBL Nandi, S. \APACrefYearMonthDay2022. \BBOQ\APACrefatitlePredHonk: A Framework to Predict Vehicular Honk Count using Deep Learning Models Predhonk: A framework to predict vehicular honk count using deep learning models.\BBCQ \APACrefbtitleTENCON 2022-2022 IEEE Region 10 Conference (TENCON) Tencon 2022-2022 ieee region 10 conference (tencon) (\BPGS 1–6). \PrintBackRefs\CurrentBib
Mann \BBA Singh (\APACyear2024) \APACinsertmetastarmann2024random{APACrefauthors}Mann, S.\BCBT \BBA Singh, G. \APACrefYearMonthDay2024. \BBOQ\APACrefatitleRandom effect generalized linear model-based predictive modelling of traffic noise Random effect generalized linear model-based predictive modelling of traffic noise.\BBCQ \APACjournalVolNumPagesEnvironmental Monitoring and Assessment1962168. \PrintBackRefs\CurrentBib
Marwah \BBA Agrawala (\APACyear2022) \APACinsertmetastarmarwah2022covid{APACrefauthors}Marwah, M.\BCBT \BBA Agrawala, P.K. \APACrefYearMonthDay2022. \BBOQ\APACrefatitleCOVID-19 lockdown and environmental pollution: an Indian multi-state investigation Covid-19 lockdown and environmental pollution: an indian multi-state investigation.\BBCQ \APACjournalVolNumPagesEnvironmental Monitoring and Assessment194249. \PrintBackRefs\CurrentBib
Medina-Salgado \BOthers. (\APACyear2022) \APACinsertmetastarmedina2022urban{APACrefauthors}Medina-Salgado, B., Sanchez-DelaCruz, E., Pozos-Parra, P.\BCBL Sierra, J.E. \APACrefYearMonthDay2022. \BBOQ\APACrefatitleUrban traffic flow prediction techniques: A review Urban traffic flow prediction techniques: A review.\BBCQ \APACjournalVolNumPagesSustainable Computing: Informatics and Systems35100739. \PrintBackRefs\CurrentBib
Mesaros \BOthers. (\APACyear2019) \APACinsertmetastarmesaros2019sound{APACrefauthors}Mesaros, A., Diment, A., Elizalde, B., Heittola, T., Vincent, E., Raj, B.\BCBL Virtanen, T. \APACrefYearMonthDay2019. \BBOQ\APACrefatitleSound event detection in the DCASE 2017 challenge Sound event detection in the dcase 2017 challenge.\BBCQ \APACjournalVolNumPagesIEEE/ACM Transactions on Audio, Speech, and Language Processing276992–1006. \PrintBackRefs\CurrentBib
Michali \BOthers. (\APACyear2021) \APACinsertmetastarmichali2021noise{APACrefauthors}Michali, M., Emrouznejad, A., Dehnokhalaji, A.\BCBL Clegg, B. \APACrefYearMonthDay2021. \BBOQ\APACrefatitleNoise-pollution efficiency analysis of European railways: A network DEA model Noise-pollution efficiency analysis of european railways: A network dea model.\BBCQ \APACjournalVolNumPagesTransportation Research Part D: Transport and Environment98102980. \PrintBackRefs\CurrentBib
Mu \BOthers. (\APACyear2021) \APACinsertmetastarmu2021environmental{APACrefauthors}Mu, W., Yin, B., Huang, X., Xu, J.\BCBL Du, Z. \APACrefYearMonthDay2021. \BBOQ\APACrefatitleEnvironmental sound classification using temporal-frequency attention based convolutional neural network Environmental sound classification using temporal-frequency attention based convolutional neural network.\BBCQ \APACjournalVolNumPagesScientific Reports1111–14. \PrintBackRefs\CurrentBib
Mushtaq \BBA Su (\APACyear2020) \APACinsertmetastarmushtaq2020environmental{APACrefauthors}Mushtaq, Z.\BCBT \BBA Su, S\BHBIF. \APACrefYearMonthDay2020. \BBOQ\APACrefatitleEnvironmental sound classification using a regularized deep convolutional neural network with data augmentation Environmental sound classification using a regularized deep convolutional neural network with data augmentation.\BBCQ \APACjournalVolNumPagesApplied Acoustics167107389. \PrintBackRefs\CurrentBib
Navarro \BOthers. (\APACyear2020) \APACinsertmetastarnavarro2020sound{APACrefauthors}Navarro, J.M., Martínez-España, R., Bueno-Crespo, A., Martínez, R.\BCBL Cecilia, J.M. \APACrefYearMonthDay2020. \BBOQ\APACrefatitleSound levels forecasting in an acoustic sensor network using a deep neural network Sound levels forecasting in an acoustic sensor network using a deep neural network.\BBCQ \APACjournalVolNumPagesSensors203903. \PrintBackRefs\CurrentBib
Palecek \BBA Cerny (\APACyear2016) \APACinsertmetastarpalecek2016emergency{APACrefauthors}Palecek, J.\BCBT \BBA Cerny, M. \APACrefYearMonthDay2016. \BBOQ\APACrefatitleEmergency horn detection using embedded systems Emergency horn detection using embedded systems.\BBCQ \APACrefbtitle2016 IEEE 14th International Symposium on Applied Machine Intelligence and Informatics (SAMI) 2016 ieee 14th international symposium on applied machine intelligence and informatics (sami) (\BPGS 257–261). \PrintBackRefs\CurrentBib
Piczak (\APACyear2015) \APACinsertmetastarpiczak2015environmental{APACrefauthors}Piczak, K.J. \APACrefYearMonthDay2015. \BBOQ\APACrefatitleEnvironmental sound classification with convolutional neural networks Environmental sound classification with convolutional neural networks.\BBCQ \APACjournalVolNumPagesIEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP)1–6. \PrintBackRefs\CurrentBib
Saha \BOthers. (\APACyear2018) \APACinsertmetastarsaha2018raspberry{APACrefauthors}Saha, A.K., Sircar, S., Chatterjee, P., Dutta, S., Mitra, A., Chatterjee, A.\BDBLSaha, H.N. \APACrefYearMonthDay2018. \BBOQ\APACrefatitleA raspberry Pi controlled cloud based air and sound pollution monitoring system with temperature and humidity sensing A raspberry pi controlled cloud based air and sound pollution monitoring system with temperature and humidity sensing.\BBCQ \APACrefbtitle2018 IEEE 8th Annual Computing and Communication Workshop and Conference (CCWC) 2018 ieee 8th annual computing and communication workshop and conference (ccwc) (\BPGS 607–611). \PrintBackRefs\CurrentBib
Salamon \BBA Bello (\APACyear2017) \APACinsertmetastarsalamon2017deep{APACrefauthors}Salamon, J.\BCBT \BBA Bello, J.P. \APACrefYearMonthDay2017. \BBOQ\APACrefatitleDeep convolutional neural networks and data augmentation for environmental sound classification Deep convolutional neural networks and data augmentation for environmental sound classification.\BBCQ \APACjournalVolNumPagesIEEE Signal Processing Letters243279–283. \PrintBackRefs\CurrentBib
Santini \BOthers. (\APACyear2008) \APACinsertmetastarsantini2008first{APACrefauthors}Santini, S., Ostermaier, B.\BCBL Vitaletti, A. \APACrefYearMonthDay2008. \BBOQ\APACrefatitleFirst experiences using wireless sensor networks for noise pollution monitoring First experiences using wireless sensor networks for noise pollution monitoring.\BBCQ \APACrefbtitleProceedings of the workshop on Real-world wireless sensor networks Proceedings of the workshop on real-world wireless sensor networks (\BPGS 61–65). \PrintBackRefs\CurrentBib
Segura-Garcia \BOthers. (\APACyear2014) \APACinsertmetastarsegura2014low{APACrefauthors}Segura-Garcia, J., Felici-Castell, S., Perez-Solano, J.J., Cobos, M.\BCBL Navarro, J.M. \APACrefYearMonthDay2014. \BBOQ\APACrefatitleLow-cost alternatives for urban noise nuisance monitoring using wireless sensor networks Low-cost alternatives for urban noise nuisance monitoring using wireless sensor networks.\BBCQ \APACjournalVolNumPagesIEEE Sensors Journal152836–844. \PrintBackRefs\CurrentBib
Sen \BOthers. (\APACyear2010) \APACinsertmetastarsen2010horn{APACrefauthors}Sen, R., Raman, B.\BCBL Sharma, P. \APACrefYearMonthDay2010. \BBOQ\APACrefatitleHorn-ok-please Horn-ok-please.\BBCQ \APACrefbtitleProceedings of the 8th international conference on Mobile systems, applications, and services Proceedings of the 8th international conference on mobile systems, applications, and services (\BPGS 137–150). \PrintBackRefs\CurrentBib
Shekhar \BOthers. (\APACyear2022) \APACinsertmetastarshekhar2022liver{APACrefauthors}Shekhar, C., Debadarshini, J.\BCBL Saha, S. \APACrefYearMonthDay2022. \BBOQ\APACrefatitleLiVeR: Lightweight Vehicle Detection and Classification in Real-Time Liver: Lightweight vehicle detection and classification in real-time.\BBCQ \APACjournalVolNumPagesarXiv preprint arXiv:2206.06173. \PrintBackRefs\CurrentBib
Suvorov \BOthers. (\APACyear2018) \APACinsertmetastarsuvorov2018deep{APACrefauthors}Suvorov, D., Dong, G.\BCBL Zhukov, R. \APACrefYearMonthDay2018. \BBOQ\APACrefatitleDeep residual network for sound source localization in the time domain Deep residual network for sound source localization in the time domain.\BBCQ \APACjournalVolNumPagesarXiv preprint arXiv:1808.06429. \PrintBackRefs\CurrentBib
Takeuchi \BOthers. (\APACyear2014) \APACinsertmetastartakeuchi2014smart{APACrefauthors}Takeuchi, K., Matsumoto, T., Takeuchi, Y., Kudo, H.\BCBL Ohnishi, N. \APACrefYearMonthDay2014. \BBOQ\APACrefatitleA smart-phone based system to detect warning sound for hearing impaired people A smart-phone based system to detect warning sound for hearing impaired people.\BBCQ \APACrefbtitleInternational Conference on Computers for Handicapped Persons International conference on computers for handicapped persons (\BPGS 506–511). \PrintBackRefs\CurrentBib
Vera-Diaz \BOthers. (\APACyear2018) \APACinsertmetastarvera2018towards{APACrefauthors}Vera-Diaz, J.M., Pizarro, D.\BCBL Macias-Guarasa, J. \APACrefYearMonthDay2018. \BBOQ\APACrefatitleTowards end-to-end acoustic localization using deep learning: From audio signals to source position coordinates Towards end-to-end acoustic localization using deep learning: From audio signals to source position coordinates.\BBCQ \APACjournalVolNumPagesSensors18103418. \PrintBackRefs\CurrentBib
Wang \BOthers. (\APACyear2019) \APACinsertmetastarwang2019pulmonary{APACrefauthors}Wang, C., Chen, D., Hao, L., Liu, X., Zeng, Y., Chen, J.\BCBL Zhang, G. \APACrefYearMonthDay2019. \BBOQ\APACrefatitlePulmonary image classification based on inception-v3 transfer learning model Pulmonary image classification based on inception-v3 transfer learning model.\BBCQ \APACjournalVolNumPagesIEEE Access7146533–146541. \PrintBackRefs\CurrentBib
Wen \BOthers. (\APACyear2023) \APACinsertmetastarwen2023novel{APACrefauthors}Wen, H., Guo, W.\BCBL Li, X. \APACrefYearMonthDay2023. \BBOQ\APACrefatitleA novel deep clustering network using multi-representation autoencoder and adversarial learning for large cross-domain fault diagnosis of rolling bearings A novel deep clustering network using multi-representation autoencoder and adversarial learning for large cross-domain fault diagnosis of rolling bearings.\BBCQ \APACjournalVolNumPagesExpert Systems with Applications225120066. \PrintBackRefs\CurrentBib
Wicker \BOthers. (\APACyear2016) \APACinsertmetastarwicker2016nonlinear{APACrefauthors}Wicker, J., Tyukin, A.\BCBL Kramer, S. \APACrefYearMonthDay2016. \BBOQ\APACrefatitleA nonlinear label compression and transformation method for multi-label classification using autoencoders A nonlinear label compression and transformation method for multi-label classification using autoencoders.\BBCQ \APACrefbtitlePacific-Asia Conference on Knowledge Discovery and Data Mining Pacific-asia conference on knowledge discovery and data mining (\BPGS 328–340). \PrintBackRefs\CurrentBib
Zamora \BOthers. (\APACyear2017) \APACinsertmetastarzamora2017accurate{APACrefauthors}Zamora, W., Calafate, C.T., Cano, J\BHBIC.\BCBL Manzoni, P. \APACrefYearMonthDay2017. \BBOQ\APACrefatitleAccurate ambient noise assessment using smartphones Accurate ambient noise assessment using smartphones.\BBCQ \APACjournalVolNumPagesSensors174917. \PrintBackRefs\CurrentBib
Zhou \BOthers. (\APACyear2017) \APACinsertmetastarzhou2017using{APACrefauthors}Zhou, H., Song, Y.\BCBL Shu, H. \APACrefYearMonthDay2017. \BBOQ\APACrefatitleUsing deep convolutional neural network to classify urban sounds Using deep convolutional neural network to classify urban sounds.\BBCQ \APACjournalVolNumPagesIEEE Region 10 Conference(TENCON)3089–3092. \PrintBackRefs\CurrentBib
Zipf \BOthers. (\APACyear2020) \APACinsertmetastarzipf2020citizen{APACrefauthors}Zipf, L., Primack, R.B.\BCBL Rothendler, M. \APACrefYearMonthDay2020. \BBOQ\APACrefatitleCitizen scientists and university students monitor noise pollution in cities and protected areas with smartphones Citizen scientists and university students monitor noise pollution in cities and protected areas with smartphones.\BBCQ \APACjournalVolNumPagesPloS one159e0236785. \PrintBackRefs\CurrentBib

𝒜\mathcal{A}𝒞\mathcal{C}lassiℋ\mathcal{H}onk: A System Framework to Annotate and Classify Vehicular Honk from Road Traffic

Abstract

keywords:

1 Introduction

2 Related work

2.1 Noise pollution monitoring & assessment

2.2 Vehicular honk identification & applications

3 Framework

4 Data acquisition

4.1 Spatio-temporal honk signature analysis

4.1.1 Honk count of different vehicles

4.1.2 Distribution of sound pressure level

4.1.3 Correlation between SPL with honk count

5 Proposed Methodology

5.1 Spectrogram generation

5.2 Data labeling

5.3 Data augmentation and class balancing

5.4 Choice of classification models

6 Experimental result analysis

6.1 Comparison with existing data labeling models

6.2 Ground truth verification of labeled data w.r.t data models

6.3 Classification performance evaluation of pre-trained models and EnTL model

6.4 Significance of dataset

6.5 Performance comparison of EnTL with the recent literature of honk classification

6.6 Discussion & remarks

7 Context identification

7.1 Spatio-temporal context inference

7.2 Ground truth verification of context in real scenario

8 Conclusion & future work

References

$\mathcal{A}$ $\mathcal{C}$ lassi $\mathcal{H}$ onk: A System Framework to Annotate and Classify Vehicular Honk from Road Traffic