

RaviShankar Prasad*, Gurkan Yilmaz†, Olivier Chetelat†, Mathew Magimai.-Doss*

*Idiap Research Institute, CH–1920 Martigny, Switzerland
†CSEM, CH–2002 Neuchâtel, Switzerland


Heart auscultation is a widely used technique for diagnosing cardiac abnormalities. In that context, capturing phonocardiogram (PCG) signals and automatically monitoring the heart by identifying S1 and S2 complexes is an emerging field. One of the first steps in identifying S1–S2 complexes is detecting the locations of these events in the PCG signal. Methods proposed in the literature to detect these events have largely focused on exploiting the dominant low frequency characteristics of the S1–S2 complexes through frequency–domain processing. In this paper, we propose a purely time–domain method that employs a heavily decaying low pass filter (referred to as a zero frequency filter) to suppress extraneous factors and detect S1–S2 locations. We demonstrate the potential of the proposed approach through investigations on two publicly available data sets, namely the PASCAL heart sounds challenge 2011 (PHSC–2011) and PhysioNet CinC. The method is also evaluated on signals from wearable sensors, in the presence and absence of speech activity.

Index Terms: Phonocardiogram, S1–S2 detection, zero frequency filter, modified ZFF


Auscultation of the heart consists of listening to the internal heart sounds and drawing conclusions about the condition of the heart. Defining patterns of cardiac functionality from heart sounds requires rigorous training and expertise on the part of practitioners. For several potentially fatal heart diseases, such as natural and prosthetic heart valve dysfunction, or even heart failure, heart sound auscultation is one of the most reliable, cheap and successful measures for preliminary screening and diagnosis [1]. However, the process is subjective, and hence the observations may vary depending on several factors. Phonocardiogram (PCG) signals are recorded sound wave representations of human heart activity, which are typically characterized by S1 (lub) and S2 (dub) events. A normally functioning heart produces these two sounds in each beat [2]. The S1 sound results from the closure of the atrioventricular valves at the beginning of ventricular systole. It comprises components resulting from the closure of the mitral valve and the tricuspid valve. The S2 sound appears at the end of ventricular systole, denoting a transition to diastole. It comprises components resulting from the closure of the aortic valve and the pulmonary valve. The PCG and ECG signals are related in such a way that the S1 event follows the R–peak. Abnormality in the function of the heart may appear as low–frequency noise, like murmurs, between S1 and S2 (called systole) or between S2 and S1 (called diastole) [3]. Segmentation of these events is therefore an essential front–end step in the analysis of PCG, and towards developing automatic diagnostic tools for cardiac ailments. Detection of the onset of S1 and S2 events is one of the major challenges in heart sound analysis.
Heart sound segmentation algorithms in the literature are broadly classified into two main approaches: those aided by an ECG reference to synchronize the segmentation, and those without this information. In the ECG reference based approach, the QRS complexes and T–waves are detected first in order to locate the S1 and S2 segments, respectively [4]. For noisy ECG signals where T–waves are not clearly visible, identification of S2 is performed using an unsupervised classifier [5]. The present approach, like most previously proposed approaches, does not depend on the availability of the ECG signal. Such methods are further classified into supervised and unsupervised methods. Several segmentation methods rely on thresholds to identify events using a transformed or filtered representation of PCG signals [6]. A method based on the wavelet transform uses energy based features along with timing information and simplicity features derived from the approximation coefficients [7]. The simplicity features are obtained as entropy values of the normalized eigen components of the correlation matrix of segments within the signal. Another method utilizes the autocorrelation (AC) function over the energy envelope of PCG signals, derived from the approximation and detail coefficients obtained using wavelet analysis, to demarcate cardiac activity events [8]. Another method uses the Shannon energy of the low frequency signal derived from the approximation coefficients of the fast wavelet transform to segment and classify PCG signals [9]. Supervised methods utilize parameters such as the homomorphic envelogram and the Hilbert and wavelet envelopes, derived from a band–limited PCG signal (mostly up to 1 kHz), to train a hidden semi-Markov model (HSMM) to segment PCG signals [6]. Another method introduces duration dependent HMMs to refine the probabilistic approach to detecting events in PCG [10].
Decision statistics (DS) derived from spectro–temporal features are used along with duration constraints to identify peaks in PCG corresponding to S1 and S2 events [11]. One method uses eigen decomposition to achieve spectral clustering of the power spectral density (SCPSD) for the task of event detection [12]. The ensemble empirical mode decomposition (EEMD) based method uses additional features obtained from the kurtosis of the signal for segmentation of PCG signals [13]. Atoms characterized by the time delay, duration, frequency, phase and amplitude values of segments within a signal are hypothesized as events. A density function representing these atoms, obtained through a time–frequency decomposition of PCG signals, helps in clustering the events in PCG [14]. The characteristic waveform and characteristic moment waveforms obtained using multiscale moment analysis on the Viola integral waveform are employed for segmentation of PCG [15]. One method employs a multi–layered perceptron (MLP) architecture over parameters derived using short time energy and auto–regression analysis to detect landmarks in pediatric PCG signals [16]. Another method uses the Shannon energy of the spectrum obtained using the S-transform as a parameter for radial basis function (SRBF) neural networks to obtain the PCG envelope [17]. Most of these methods use spectral transforms, adaptive thresholding, and complex filtering, clustering and learning steps, which are difficult to implement in a real–time scenario. The current work attempts identification of events in PCG using a low–frequency signal derived using a heavily decaying resonator. The proposed method is motivated by the zero frequency filtering (ZFF) method, and obtains a modified ZFF signal which highlights the characteristics of S1 and S2 events. The proposed method is implemented in the temporal domain, and exhibits lower complexity and latency.
The method is tested for its performance on publicly available datasets, and also on data acquired using a wearable device. The paper is organized as follows: Sec. 2 discusses the ZFF method. Sec. 3 explains the motivation, and introduces the proposed method. Sec. 4 explains the experimental setup and the results obtained across different databases. Sec. 5 summarizes the paper.

Fig. 1. ZFF and modified ZFF signals obtained from speech signal. (a) Speech signal. (b) ZFF signal (2N + 1 ∼ T0). (c) Modified ZFF signal (2N + 1 ∼ T0/8).


The ZFF method filters a time domain signal with a heavily decaying digital resonator centered at 0 Hz. The method originated in speech processing, where it is used to identify the locations of significant excitation, known as glottal closure instants (GCIs), within the vibrating vocal fold source signal in speech [18]. The underlying motivation is that the spectral characteristics of a temporal discontinuity are spread evenly across all bands, including very low frequencies in the vicinity of 0 Hz, where the contribution of the vocal tract system response is very low. The signal, when filtered using the resonator, exhibits a polynomial growth trend. The GCIs are identified at the zero crossing locations in the final output, once a trend removal operation has been performed across an appropriate duration. The zero frequency filter is implemented as a cascade of resonators centered at 0 Hz. The output of the filter is given by

$$ x[n] = s[n] + 2x[n-1] - x[n-2], $$

and the equivalent transfer function is given by

$$ H(z) = \frac{1}{1 - 2z^{-1} + z^{-2}}, \tag{1} $$

where s[n] is the input to the resonator.
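The recursion in Eqn. 1 can be tried out directly. The following is a minimal sketch, assuming zero initial conditions and a plain Python list as input; the function name `zero_frequency_filter` is introduced here for illustration and does not come from the paper.

```python
def zero_frequency_filter(s):
    """Apply x[n] = s[n] + 2*x[n-1] - x[n-2] with zero initial conditions."""
    x = []
    prev1 = 0.0  # x[n-1]
    prev2 = 0.0  # x[n-2]
    for sample in s:
        current = sample + 2.0 * prev1 - prev2
        x.append(current)
        prev2, prev1 = prev1, current
    return x


# A unit impulse produces a linearly growing output, which illustrates
# the polynomial trend that the trend removal step later has to remove.
impulse_response = zero_frequency_filter([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
```

For long recordings the output grows without bound, so double precision arithmetic and processing in segments may be needed in practice.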

The zero frequency filtering is implemented as an integrator, resulting in a trend of polynomial growth with time in the filtered signal. Fig. 1 shows a segment of a speech signal, and the resultant signals obtained with the ZFF method using different trend removal durations. Fig. 1(a) shows the speech signal obtained from a voiced segment, and Fig. 1(b) shows the corresponding trend removed ZFF output y[n]. The polynomial trend in the filtered signal x[n] in this case is removed using a local mean removal operation across a duration comparable to the pitch period (∼ T0) of the signal, given by

$$ y[n] = x[n] - \frac{1}{2N+1} \sum_{k=n-N}^{n+N} x[k], \qquad N+1 \le n \le L-N, \tag{2} $$

where L is the total length of the signal x[n], and 2N + 1 is the length of the trend removal window. A detailed description of the steps involved in implementing the ZFF method is given in [18].
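The local mean removal in Eqn. 2 can be sketched as follows. This is an illustrative version under stated assumptions: indices are 0-based, and samples closer than N to either end of the signal (where Eqn. 2 is not defined) are simply set to zero. The function name `remove_trend` is hypothetical.

```python
def remove_trend(x, N):
    """Subtract the mean of the (2N+1)-sample window centred on each sample.

    Samples without a full window are set to 0.0 (an assumption made here;
    Eqn. 2 only defines the interior of the signal).
    """
    L = len(x)
    width = 2 * N + 1
    y = [0.0] * L
    for n in range(N, L - N):
        local_mean = sum(x[n - N:n + N + 1]) / width
        y[n] = x[n] - local_mean
    return y


# A purely linear trend is removed exactly, because the mean of a
# symmetric window over a straight line equals the centre value.
flat = remove_trend([float(i) for i in range(10)], N=2)
```

The window length in samples follows from the chosen duration and the sampling rate; for instance, 20 ms at 4 kHz corresponds to 80 samples, so 2N + 1 = 81 would be the nearest odd length.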


This section presents the motivation for adopting the ZFF method to detect events in PCG signals. It also discusses the changes to the ZFF method that lead to a modified output, and gives a step by step description of the proposed algorithm.

3.1. Motivation

When the trend removal window duration is T0 ≤ 2N + 1 ≤ 2T0, T0 being the fundamental period of the input s[n], the resultant y[n] exhibits a negative to positive zero crossing at the location of the discontinuity. The signal y[n] is therefore periodic with T0. A modification in the trend removal window duration results in a shift in the filter response, leading to a shift in the period of y[n], as shown in Fig. 1(c), for a duration 2N + 1 < T0/2. The filtered signal appears modulated by the instantaneous energy of the original signal. The other components and interferences in s[n] are absent in the signal obtained by modified ZFF. The log–magnitude responses (log|Y (ω)|) of the ZFF signals obtained for different trend removal durations are shown in Fig. 2. The dominant characteristics in log|Y2(ω)| in Fig. 2(b) appear shifted to relatively higher frequency bands, as compared to log|Y1(ω)| in Fig. 2(a), where they are concentrated around the f0 of the segment. Log|Y1(ω)| represents the spectral characteristics of a ZFF signal obtained using a trend removal duration comparable to the fundamental period of the signal, whereas log|Y2(ω)| corresponds to a signal obtained with a smaller trend removal duration. The shift in the dominant signal characteristics is inversely proportional to the duration of the trend removal operation with respect to the fundamental period.
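The inverse relation between window duration and spectral shift can be made plausible from the frequency response of the local mean removal in Eqn. 2. The short derivation below is added for illustration only; it assumes a single pass of the trend removal and introduces the symbols G(ω) for its frequency response and f_s for the sampling frequency, neither of which is used in the original text.

```latex
% Frequency response of y[n] = x[n] - \frac{1}{2N+1}\sum_{k=n-N}^{n+N} x[k]
G(\omega) = 1 - \frac{1}{2N+1}\sum_{k=-N}^{N} e^{j\omega k}
          = 1 - \frac{\sin\!\big((2N+1)\,\omega/2\big)}{(2N+1)\,\sin(\omega/2)}

% G(0) = 0, and G(\omega) first reaches 1 where the numerator first vanishes:
\omega_1 = \frac{2\pi}{2N+1}
\quad\Longleftrightarrow\quad
f_1 = \frac{f_s}{2N+1} = \frac{1}{\text{window duration}}
```

Under these assumptions, every landmark of G(ω) scales with the reciprocal of the window duration, so halving the trend removal window moves the band passed by the mean removal up by a factor of two.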

The modified ZFF signal serves as motivation for the identification of events in PCG signals. The ZFF and modified ZFF signals decay sharply beyond their range of spectral dominance. This behaviour gives the resultant signal an ability to mitigate the effect of interference while preserving information around a narrow band of interest. The amplitude modulations of the modified ZFF signal relate to the amplitude information of the narrowband component in the signal and its harmonics. A carefully estimated duration of the trend removal operation can thus lead to a filtered signal with the desired narrowband characteristics. The information content of the events in PCG signals has been reported to be bandlimited within a narrow range of frequencies, and can thus be captured efficiently and robustly using the modified ZFF signal. The following sections discuss the proposed method, and the results obtained on different databases.

Fig. 2. Spectrogram obtained for ZFF signals employing different trend removal durations. (a) 2N + 1 comparable to a pitch period. (b) 2N + 1 less than half a pitch period.


3.2. Detection of events in PCG signals

A significant proportion of the frequency response of the S1 and S2 events has been reported to lie below 150 Hz, although the signal itself is periodic with a much longer duration (∼ 60–100 cycles a minute). A modification of the ZFF method is made to shift the desired response to the spectral range of interest, between 100–200 Hz in the case of event detection in PCG signals. The modified ZFF signal is predominantly monocomponent, and hence it is easy to track the modulations in it. The present work uses a window duration 2N + 1 = 20 ms, chosen heuristically so that the resulting signal response appears in the desired range. The steps of the algorithm proposed to detect S1 and S2 locations in PCG signals are as follows.

1. Obtain the ZFF signal x[n] from the PCG signal s[n] using the filter given in Eqn. 1.
2. Obtain the modified ZFF signal y[n] by removing the trend in x[n] using a window duration of 20 ms.
3. Obtain the slope e[n] of y[n] at the positive to negative zero crossing (PNZ) locations. The slope values are higher at the S1 and S2 event locations because of the modulations in y[n].
4. Compute the Hilbert envelope yHE[n] from the signal y[n].
5. Determine the peaks in yHE[n], and weight them by the corresponding value in e[n] to get yw[n]. A threshold of 0.25 of the maximum strength of the peaks in yw[n] is used to determine the peak locations corresponding to S1 and S2.
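Steps 3 and 5 above can be sketched as follows, given a modified ZFF signal and its envelope as plain lists. Several choices here are assumptions rather than details stated in the text: the slope is approximated by a first difference across the crossing, the "corresponding value" of e[n] for an envelope peak is taken at the nearest PNZ location, and all function names are hypothetical.

```python
def pnz_slopes(y):
    """Map each positive-to-negative zero crossing index to its slope magnitude."""
    slopes = {}
    for n in range(1, len(y)):
        if y[n - 1] > 0.0 and y[n] <= 0.0:
            # First difference across the crossing, sign flipped to be positive.
            slopes[n] = y[n - 1] - y[n]
    return slopes


def local_peaks(env):
    """Indices of local maxima of the envelope (end points excluded)."""
    return [n for n in range(1, len(env) - 1)
            if env[n - 1] < env[n] >= env[n + 1]]


def detect_events(y, env, rel_threshold=0.25):
    """Weight envelope peaks by the nearest PNZ slope and keep the strong ones."""
    slopes = pnz_slopes(y)
    peaks = local_peaks(env)
    if not slopes or not peaks:
        return []
    crossings = sorted(slopes)
    weighted = {}
    for p in peaks:
        nearest = min(crossings, key=lambda c: abs(c - p))
        weighted[p] = env[p] * slopes[nearest]
    limit = rel_threshold * max(weighted.values())
    return [p for p in sorted(weighted) if weighted[p] >= limit]


# Toy example: two strong events around a weak spurious one.
y_toy = [0.0, 2.0, -2.0, 0.0, 0.5, -0.5, 0.0, 3.0, -3.0, 0.0]
env_toy = [0.0, 1.0, 2.0, 1.0, 0.5, 1.0, 0.5, 2.0, 3.0, 1.0]
events = detect_events(y_toy, env_toy)
```

In the toy example the middle peak has both a small envelope and a shallow zero crossing, so its weighted strength falls below 0.25 of the maximum and it is discarded.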

The value of the threshold is chosen empirically, as the peaks corresponding to cardiac events exhibit higher energy than this threshold, in contrast to other spurious peaks. Fig. 3 shows an example of the proposed analysis, illustrating the steps involved in identifying event locations in PCG. Figs. 3(a) and 3(b) show a noisy PCG signal segment s[n] obtained from the PASCAL–CHSC 2011 database [19], and the corresponding modified ZFF signal y1[n] obtained using a window duration of 20 ms. The envelope of the y1[n] signal appears modulated by the energy of the PCG signal, mostly at significant events. This behavior is highlighted by the Hilbert envelope (y1HE [n]) of y1[n], as shown in Fig. 3(c). A peak identification method is employed to identify the peaks in y1HE [n]. Less dominant peaks, present at several other locations in y1HE [n] apart from the S1 and S2 locations, can be eliminated by using the slope e[n] at the PNZ locations in y1[n], given in Fig. 3(d). Weighting the y1HE [n] signal with e[n] helps in locating the cardiac events, even in the presence of noise.
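The Hilbert envelope used above is the magnitude of the analytic signal. The sketch below shows one standard way of computing it through a discrete Fourier transform, written with the Python standard library only so that it is self-contained. The naive O(L²) transform is an assumption made for brevity and is adequate only for short segments; in practice a library routine such as `scipy.signal.hilbert` would typically be used instead. The function names are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(L^2)), for short signals only."""
    L = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(L):
        acc = 0j
        for n in range(L):
            acc += x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / L)
        out.append(acc / L if inverse else acc)
    return out


def hilbert_envelope(y):
    """Magnitude of the analytic signal of a real sequence y."""
    L = len(y)
    spectrum = dft(y)
    # Keep DC (and Nyquist for even L), double the positive frequencies,
    # and zero the negative frequencies.
    gain = [0.0] * L
    gain[0] = 1.0
    if L % 2 == 0:
        gain[L // 2] = 1.0
        upper = L // 2
    else:
        upper = (L + 1) // 2
    for k in range(1, upper):
        gain[k] = 2.0
    analytic = dft([g * c for g, c in zip(gain, spectrum)], inverse=True)
    return [abs(v) for v in analytic]


# A pure cosine with an integer number of cycles has a constant envelope of 1.
tone = [math.cos(2.0 * math.pi * 4 * n / 32) for n in range(32)]
envelope = hilbert_envelope(tone)
```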

Fig. 3. PCG signal analysis using the proposed method. (a) Noisy PCG signal. (b) Modified ZFF signal. (c) Hilbert envelope of modified ZFF signal. (d) Gradient of modified ZFF signal.


4.1. Databases and evaluation measures

The proposed method is evaluated on two datasets. The PASCAL Heart Sound Challenge (PHSC11) dataset has been compiled and labeled for a challenge on localization and classification of heart sounds [19]. It comprises two sets, with 176 and 656 auscultations acquired at sampling frequencies of 44.1 kHz and 4 kHz, respectively, covering normal heart beats, murmurs and extra systoles. The other dataset is the PhysioNet/CinC Challenge 2016, which includes PCG recordings of healthy subjects and pathological patients, with a total of 3153 heart sound recordings [20]. The recordings are divided into normal (2488) and abnormal (665) classes. The datasets contain clean as well as noisy PCG signals. Previously proposed methods were evaluated only on selected subsets of the data, and the selection criteria are not explained. As a consequence, a direct comparison among these methods is a non–trivial task. The average duration of S1 and S2 segments is about 100 ms, with a dispersion of ±35 ms [11]. Any event detected within a duration of 100 ms from the beginning of S1 or S2 is considered a correct detection [6]. The performance measures used to evaluate the proposed algorithm are sensitivity (Se) and positive predictivity (+P). These parameters are derived as follows:

$$ Se = \frac{TP}{TP + FN}, \qquad +P = \frac{TP}{TP + FP}, $$

where TP refers to true positives, FN refers to false negatives and FP refers to false positives. The labels provided with the databases are taken as ground truth.
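The two measures can be computed from lists of detected and annotated event times as in the sketch below. The matching rule is an assumption made for illustration: a detection counts as a true positive if it lies within the tolerance of an annotated onset in either direction, and each annotation is matched greedily to at most one detection. The paper does not specify these details, and the function name `score_detections` is hypothetical.

```python
def score_detections(detected, reference, tolerance=0.100):
    """Return (Se, +P) for event times given in seconds.

    Assumed matching rule: |detected - reference| <= tolerance, with each
    reference onset matched to at most one detection.
    """
    unmatched = sorted(detected)
    tp = 0
    for onset in sorted(reference):
        hit = next((d for d in unmatched if abs(d - onset) <= tolerance), None)
        if hit is not None:
            unmatched.remove(hit)
            tp += 1
    fn = len(reference) - tp   # annotated events with no detection
    fp = len(unmatched)        # detections with no annotated event
    se = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    return se, ppv


# Four annotated onsets, four detections: three match, one is spurious.
se, ppv = score_detections(
    detected=[0.12, 0.50, 0.70, 1.33],
    reference=[0.10, 0.45, 0.95, 1.30],
)
```

In this example three of the four annotated onsets are found and one detection is spurious, so both measures evaluate to 0.75.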

4.2. Results

Tab. 1 shows the results obtained by the proposed method on the PHSC11 and CinC challenge data sets, together with results reported for other methods. The methods listed use different sets of data to report their performance. The performance of the proposed method on the PHSC11 data set is reported on all the records (dataset A) for which annotations are available. The performance on the PhysioNet CinC data has been obtained on a set of 200 recordings for which the ground truth has been derived using HSMM [6]. The slightly lower sensitivity on the PhysioNet data set can be attributed to noisy records, which lead to the presence of spurious peaks. For the sake of completeness, we also report the performance of different methods evaluated on their in–house data sets collected in a lab setup. A direct comparison between those results and ours cannot be made; however, it can be observed that the proposed method performs in a range comparable to those methods.

Table 1. A comparison of PCG segmentation methods. ‘in–house DB’ denotes a database collected mostly by the authors themselves. (∼ : data not available)

4.3. Observations on wearable sensor data

Identification of S1–S2 events in PCG signals acquired using wearable sensors is a challenging task, owing to interference introduced by the contact between skin and sensors as well as the presence of external sounds (e.g., speech). ZFF based analysis could help to alleviate the interference by a significant factor. To demonstrate this aspect, the proposed method is employed to detect events in PCG data obtained using a wearable sensor. The wearable sensor SENSE [21], which has been developed to record high quality cardiac activity measurements, has been extended to acquire thoracic sounds by means of a device based on [22]. The sensor is able to acquire high SNR ECG signals, even when the subjects are in motion. The PCG signals are acquired at a sampling rate of 16 kHz and are aligned with the ECG data. Fig. 4 shows the results of the proposed method in identifying events in the PCG signal acquired using the wearable sensor, with and without the interference of speech signals. Fig. 4(a) shows the PCG signal, and the corresponding ECG signal, recorded in the absence of speech. Fig. 4(b) shows the weighted Hilbert envelope of the modified ZFF signal obtained using a trend removal window of 5 ms duration. The peaks in the Hilbert envelope highlight the locations of S1 and S2 events in the PCG signal. Figs. 4(c) and 4(d) show the PCG signal, and the corresponding Hilbert envelope signal, recorded in the presence of speech. The effect of speech interference can be noted in Fig. 4(c), where the events are masked by high-energy interference. The filtered signal obtained using modified ZFF is able to suppress the interference owing to its sharp response, and the prominent peaks in Fig. 4(d) coincide with the R–peaks in the reference ECG signal, and can be postulated as locations of cardiac events.

Fig. 4. Hilbert envelope for ZFF signal obtained from wearable sensors. (a) and (c) PCG signals along with ECG signals obtained from the sensor SENSE, in absence and presence of speech, respectively. (b) and (d) weighted Hilbert envelope of modified ZFF signals.


The proposed method to identify the S1 and S2 events in PCG signals combines the ZFF method with the Hilbert envelope operation. The ZFF method employs a heavily decaying resonator centered at 0 Hz, which helps eliminate a significant proportion of interference by suppressing other spectral components. The Hilbert envelope of a modified ZFF signal proves to be a robust way to highlight the locations of events in PCG. The proposed method yields promising results on the PhysioNet CinC and PHSC11 datasets, which include clean as well as noisy recordings. The method has also been verified on signals acquired using a wearable sensor, where it yields good detection performance even in the presence of noise and interference. Owing to its robustness and low implementation complexity [23], the method can potentially be integrated into wearable sensors.


This work was funded through CSEM–Idiap project AUDIO–REAPPS.


[1] V Nivitha Varghees and K I Ramachandran, “A novel heart sound activity detection framework for automated heart sound analysis,” Biomedical Signal Processing and Control, vol. 13, pp. 174–188, 2014.

[2] L Hamza Cherif, M Mostafi, and SM Debbal, “Digital filters in heart sound analysis,” International Journal of Clinical Medicine Research, vol. 1, no. 3, pp. 97–108, 2014.

[3] Maryam Hamidi, Hassan Ghassemian, and Maryam Imani, “Classification of heart sound signal using curve fitting and fractal dimension,” Biomedical Signal Processing and Control, vol. 39, pp. 351–359, 2018.

[4] Milad El-Segaier, O Lilja, S Lukkarinen, Leif Sörnmo, R Sepponen, and Erkki Pesonen, “Computer-based detection and analysis of heart sound and murmur,” Annals of Biomedical Engineering, vol. 33, no. 7, pp. 937–942, 2005.

[5] P Carvalho, P Gil, J Henriques, L Eugenio, and M Antunes, “Low complexity algorithm for heart sound segmentation using the variance fractal dimension,” in Proc. IEEE International Workshop on Intelligent Signal Processing. IEEE, 2005, pp. 194–199.

[6] David B Springer, Lionel Tarassenko, and Gari D Clifford, “Logistic regression–HSMM–based heart sound segmentation,” IEEE Transactions on Biomedical Engineering, vol. 63, no. 4, pp. 822–832, 2015.

[7] Jithendra Vepa, Paresh Tolay, and Abhishek Jain, “Segmentation of heart sounds using simplicity features and timing information,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2008, pp. 469–472.

[8] Joao Pedrosa, Ana Castro, and Tiago TV Vinhoza, “Automatic heart sound segmentation and murmur detection in pediatric phonocardiograms,” in Proc. 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE, 2014, pp. 2294–2297.

[9] Dinesh Kumar, Paulo Carvalho, Manuel Antunes, Paulo Gil, Jorge Henriques, and Luis Eugenio, “A new algorithm for detection of S1 and S2 heart sounds,” in Proc. IEEE International Conference on Acoustics Speech and Signal Processing Proceedings. IEEE, 2006, vol. 2, pp. 1180–1183.

[10] Samuel E Schmidt, Claus Holst-Hansen, Claus Graff, Egon Toft, and Johannes J Struijk, “Segmentation of heart sound recordings by a duration–dependent hidden Markov model,” Physiological measurement, vol. 31, no. 4, pp. 513, 2010.

[11] H Naseri and MR Homaeinezhad, “Detection and boundary identification of phonocardiogram sounds using an expert frequency–energy based metric,” Annals of Biomedical Engineering, vol. 41, no. 2, pp. 279–292, 2013.

[12] Sangita Das, Saurabh Pal, and Madhuchhanda Mitra, “Automated fundamental heart sound detection using spectral clustering technique,” in Proc. IEEE Calcutta Conference (CALCON). IEEE, 2017, pp. 264–267.

[13] Chrysa D Papadaniil and Leontios J Hadjileontiadis, “Efficient heart sound segmentation and extraction using ensemble empirical mode decomposition and kurtosis features,” IEEE journal of Biomedical and Health Informatics, vol. 18, no. 4, pp. 1138–1152, 2013.

[14] Hong Tang, Ting Li, Tianshuang Qiu, and Yongwan Park, “Segmentation of heart sounds based on dynamic clustering,” Biomedical Signal Processing and Control, vol. 7, no. 5, pp. 509–516, 2012.

[15] Zhonghong Yan, Zhongwei Jiang, Ayaho Miyamoto, and Yunlong Wei, “The moment segmentation analysis of heart sound pattern,” Computer methods and programs in biomedicine, vol. 98, no. 2, pp. 140–150, 2010.

[16] Amir A Sepehri, Arash Gharehbaghi, Thierry Dutoit, Armen Kocharian, and A Kiani, “A novel method for pediatric heart sound segmentation without using the ecg,” Computer methods and programs in biomedicine, vol. 99, no. 1, pp. 43–48, 2010.

[17] Ali Moukadem, Alain Dieterlen, Nicolas Hueber, and Christian Brandt, “A robust heart sounds segmentation module based on s-transform,” Biomedical Signal Processing and Control, vol. 8, no. 3, pp. 273–281, 2013.

[18] K Sri Rama Murty and B Yegnanarayana, “Epoch extraction from speech signals,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 8, pp. 1602–1613, 2008.

[19] P. Bentley, G. Nordehn, M. Coimbra, and S. Mannor, “The PASCAL Classifying Heart Sounds Challenge 2011 (CHSC2011) Results,”

[20] Gari D Clifford, Chengyu Liu, Benjamin Moody, David Springer, Ikaro Silva, Qiao Li, and Roger G Mark, “Classification of normal/abnormal heart sound recordings: The physionet/computing in cardiology challenge 2016,” in Proc. Computing in Cardiology Conference (CinC). IEEE, 2016, pp. 609–612.


[22] Gurkan Yilmaz, Pierre Starkov, Mathilde Crettaz, Josias Wacker, and Olivier Chetelat, “A low-cost USB-compatible electronic stethoscope unit for multi-channel lung sound acquisition,” in XV Mediterranean Conference on Medical and Biological Engineering and Computing – MEDICON 2019, Jorge Henriques, Nuno Neves, and Paulo de Carvalho, Eds., Cham, 2020, pp. 1299–1303, Springer International Publishing.

[23] Nagapuri Srinivas, Gayadhar Pradhan, and Puli Kishore Kumar, “FPGA implementation of zero frequency filter,” in Conference on Information and Communication Technology (CICT). IEEE, 2018, pp. 1–5.

