ISCA Archive: abstracts of all articles on dysarthria (1996–2024)

A panoramic summary of dysarthria research (1996–2024)

All article abstracts (1996–2024)

1996

The nemours database of dysarthric speech

Xavier Menéndez-Pidal, James B. Polikoff, Shirley M. Peters, Jennie E. Leonzio, H. T. Bunnell

The Nemours database is a collection of 814 short nonsense sentences; 74 sentences spoken by each of 11 male speakers with varying degrees of dysarthria. Additionally, the database contains two connected-speech paragraphs produced by each of the 11 speakers. The database was designed to test the intelligibility of dysarthric speech before and after enhancement by various signal processing methods, and is available on CD-ROM. It can also be used to investigate general characteristics of dysarthric speech such as production error patterns. The entire database has been marked at the word level and sentences for 10 of the 11 talkers have been marked at the phoneme level as well. This paper describes the database structure and techniques adopted to improve the performance of a Discrete Hidden Markov Model (DHMM) labeler used to assign initial phoneme labels to the elements of the database. These techniques may be useful in the design of automatic recognition systems for persons with speech disorders, especially when limited amounts of training data are available.

1999

Application of acoustic speech analysis in amyotrophic lateral sclerosis subjects

Barbara Tomik, Wieslaw Wszolek, Lidia Glodzik-Sobanska, Anna Lechwacka, Andrzej Szczudlik, Zbigniew Engel

Assessment of dysarthria in ALS patients has not been fully studied. The aim of the study was to assess a typical dysarthria profile for different ALS groups. 53 patients with definite (n=27) or probable (n=26) ALS (according to WFN criteria) were studied. Each patient had three acoustic, computer-analysed tests. The following consonants and vowels were chosen for analysis: "R", "L", "D", "T", "M", "W", "P", "B", "G", "K", "H", "Q", "O", "U", "T". We used the Euclidean principle for analyses of sequences of sound formants and the mean sound distances from the pattern (Δf=125 Hz, ΔT=9 ms, Δs=0.5 dB). Our study showed the occurrence of a characteristic dysarthria profile in different ALS groups, i.e., for the bulbar group "B", "O", "I", "W", "T" and for the limb group "B", "I", "T", "W", "O" were the most deformed. We also demonstrated that preclinical dysarthric disorders occur among the ALS limb group. This study indicated the possibility of detecting and monitoring dysarthria in ALS based on acoustic speech analysis of changes in certain sounds.

Index Terms: acoustic speech analysis, dysarthria, ALS
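The study's core measurement, a mean Euclidean distance between a produced sound's formant sequence and a reference pattern, can be sketched roughly as follows. This is a minimal illustration assuming each sound is represented as (F1, F2) pairs on a shared time grid; the function name and all values are hypothetical, not the authors' procedure.

```python
import math

def mean_formant_distance(track, pattern):
    """Mean Euclidean distance between two equal-length formant
    sequences, each a list of (F1, F2) pairs in Hz. A larger mean
    distance indicates a stronger deformation of the analysed sound
    relative to the reference pattern."""
    assert len(track) == len(pattern), "tracks must share a time grid"
    total = 0.0
    for (f1, f2), (p1, p2) in zip(track, pattern):
        total += math.hypot(f1 - p1, f2 - p2)  # per-frame distance
    return total / len(track)

# Hypothetical values: a patient's /o/ drifting away from a reference /o/.
reference = [(500.0, 900.0), (510.0, 920.0), (505.0, 910.0)]
patient   = [(540.0, 980.0), (560.0, 1000.0), (555.0, 990.0)]
print(round(mean_formant_distance(patient, reference), 1))  # → 92.7
```

Sounds whose tracks drift farthest from the pattern would then be reported as the "most deformed" for a given patient group.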

2002

How speakers with and without speech impairment mark the question-statement contrast

Rupal Patel

We sought to better understand how normal speakers signal the yes-no question-statement contrast, and whether and how speakers with severe speech impairment could signal the contrast. We asked a group of eight speakers with severe dysarthria to produce ten three-syllable phrases in an interrogative tone and in a declarative tone. We asked a group of gender-matched normal speakers to perform the same task. We then performed a set of acoustic analyses to identify which features speakers used and how these features differed between speaker groups. Our findings indicate that both sets of speakers use multiple prosodic cues to signal the question-statement contrast, of which F0 is only one. Other cues include loudness and syllable duration. We also found that speakers with dysarthria often exaggerated salient prosodic cues and occasionally used alternative cues over which they had more precise control.

Application of the Lee Silverman voice treatment (LSVT) to individuals with multiple sclerosis, ataxic dysarthria, and stroke

Leslie Will, Lorraine O. Ramig, Jennifer L. Spielman

Reduced speech intelligibility has been observed in Parkinson's disease (PD), ataxic dysarthria, multiple sclerosis (MS) and in individuals who have suffered a cerebral vascular accident (CVA). Data support the effectiveness of the Lee Silverman Voice Treatment (LSVT) on the improvement of vocal function and speech intelligibility in PD. Today, only a small percentage of patients with chronic neurologic disease receive treatment to improve speech and voice. This paper reports improvement in case studies of individuals with MS, ataxic dysarthria and CVA, and suggests that the effects of LSVT may not be restricted to the dysarthria associated with PD.

Realisations of nuclear pitch accents in Swabian dialect and Parkinson's dysarthria: a preliminary report

Kathrin Claßen

The phonetic realisation of pitch accents associated with little sonorant material varies between languages and dialects. For speakers of Northern Standard German it has been shown that nuclear falling accents are truncated while rising accents are compressed. In order to further investigate effects of the German Swabian dialect, native Swabian speakers were investigated with regard to truncation and compression. In addition, two Swabian speakers suffering from Parkinson's disease were examined, because basal ganglia dysfunction - a typical morphological trait of parkinsonian subjects - is frequently accompanied by dysarthrophonia. Results of the present study indicate that there is no dialect-specific effect between Northern Standard German and Swabian. In contrast, Swabian parkinsonian subjects show compression in both rising and falling nuclear pitch accents. Further investigations of the timing (alignment) of the H* peak in monosyllabic test items with nuclear falling accents (H*L) reveal different timing patterns in parkinsonian subjects as compared to healthy control subjects.

2003

A formant-trajectory model and its usage in comparing coarticulatory effects in dysarthric and normal speech

Xiaochuan Niu, Jan P. H. van Santen

Dysarthria is a diverse group of motor speech disorders that typically are associated with impaired intelligibility. As part of a project to develop augmentative communication technologies for intelligibility enhancement of dysarthric speech, a quantitative method is proposed for measuring the relative contributions to impaired intelligibility of vowels of three factors: First, target shift: Dysarthric speakers may have spectral targets that differ from those of normal speakers. Second, coarticulation: The degree of contextual influence on articulation may be greater in dysarthric speech than in normal speech. Third, random variability: Dysarthric speakers may articulate the same phoneme in the same context with more variability. The method is based on a linear model of formant trajectories of vowels in consonant contexts. The results from analysis of a dysarthric and a normal speech sample showed surprisingly similar target values, but increased coarticulation and random variability for the dysarthric sample.

Index Terms: Dysarthria, coarticulation, formant

2004

Formant re-synthesis of dysarthric speech

Alexander Kain, Xiaochuan Niu, John-Paul Hosom, Qi Miao, Jan P. H. van Santen

Dysarthria is a motor speech disorder that is often associated with irregular phonation (e.g. vocal fry) and amplitude, incoordination of articulators, and restricted movement of articulators, among other problems. The present study is part of a project on voice transformation systems for dysarthria, with the goal of producing intelligibility-enhanced speech. We report on a procedure in which formants and energies are estimated from dysarthric speech; next, these trajectories are modified to more closely approximate desired targets; finally, transformed speech is generated using formant synthesis. Results indicate that the transformation step enhances intelligibility, and that removal of vocal fry enhances perceived quality. However, the initial step of stylizing the formant trajectories results in a decrement in intelligibility, thereby reducing the net impact of the process.

F0 and formant frequency distribution of dysarthric speech - a comparative study

Hiroki Mori, Yasunori Kobayashi, Hideki Kasuya, Hajime Hirose, Noriko Kobayashi

We are investigating acoustical analysis of dysarthric speech, which appears as a symptom of neurologic disease, in order to elucidate its physiological and acoustical mechanisms and to develop aids for diagnosis and training. In this report, acoustical characteristics of various kinds of dysarthria are measured. As a result, shrinking of both the F0 range and the vowel space is observed in dysarthric speech. The comparison of F0 range and vowel formant frequencies also suggests that the speech effort needed to produce a wider F0 range can influence vowel quality as well.
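Vowel-space shrinkage of the kind reported here is commonly quantified as the area of the triangle (or quadrilateral) spanned by the corner vowels in F1-F2 space. A minimal sketch under that assumption follows; the shoelace-formula helper and all formant values are illustrative, not taken from the paper.

```python
def vowel_space_area(corners):
    """Area (shoelace formula) of the polygon spanned by corner
    vowels, each given as an (F1, F2) pair in Hz, ordered around
    the hull. Units are Hz^2."""
    n = len(corners)
    acc = 0.0
    for i in range(n):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % n]  # wrap back to the first corner
        acc += x1 * y2 - x2 * y1
    return abs(acc) / 2.0

# Hypothetical corner vowels /i/, /a/, /u/ for a healthy speaker...
healthy = [(300.0, 2300.0), (800.0, 1300.0), (350.0, 800.0)]
# ...and a centralised (shrunken) triangle for a dysarthric speaker.
dysarthric = [(400.0, 1900.0), (650.0, 1400.0), (430.0, 1100.0)]
print(vowel_space_area(healthy) > vowel_space_area(dysarthric))  # → True
```

A smaller area for the dysarthric speaker would reflect the vowel-space shrinking the abstract describes.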

Revisiting dysarthria assessment intelligibility metrics

Phil Green, James Carmichael

This study reports on the development of an automated isolated-word intelligibility metric system designed to improve the scoring consistency and reliability of the Frenchay Dysarthria Assessment Test (FDA). The proposed intelligibility measurements are based on the probabilistic likelihood scores derived from the forced alignment of the dysarthric speech to whole-word hidden Markov models (HMMs) trained on data from a variety of normal speakers. The hypothesis is that these probability scores are correlated with the decoding effort made by naive listeners when trying to comprehend dysarthric utterances. Initial results indicate that the scores returned from these composite measurements provide a more fine-grained assessment of a given dysarthric individual's oral communicative competence when compared with traditional "right-or-wrong" scoring of expert listeners' transcriptions of dysarthric speech samples.

2007

Modelling confusion matrices to improve speech recognition accuracy, with an application to dysarthric speech

Omar Caballero Morales, Stephen Cox

Dysarthria is a motor speech disorder characterized by weakness, paralysis, or poor coordination of the muscles responsible for speech. Although automatic speech recognition (ASR) systems have been developed for disordered speech, factors such as low intelligibility and limited vocabulary decrease speech recognition accuracy. In this paper, we introduce a technique that can increase recognition accuracy in speakers with low intelligibility by incorporating information from an estimate of the speaker's phoneme confusion matrix. The technique performs much better than standard speaker adaptation when the number of sentences available from a speaker for confusion matrix estimation or adaptation is low, and has similar performance for larger numbers of sentences.
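One simple way to exploit an estimated confusion matrix, sketched here under our own assumptions rather than the authors' exact formulation, is to fold the speaker-specific P(recognised | intended) into the recogniser's posterior over phonemes:

```python
def rescore_with_confusions(recognised_probs, confusion):
    """Estimate a score for each intended phoneme by combining the
    recogniser's posterior over recognised phonemes with a
    speaker-specific confusion matrix P(recognised | intended).

    recognised_probs: {phone: P(recognised = phone)}
    confusion: {intended: {recognised: P(recognised | intended)}}
    Returns a normalised {intended: score} dict."""
    scores = {}
    for intended, row in confusion.items():
        scores[intended] = sum(row.get(r, 0.0) * p
                               for r, p in recognised_probs.items())
    total = sum(scores.values()) or 1.0  # guard against all-zero scores
    return {ph: s / total for ph, s in scores.items()}

# Hypothetical speaker who often realises /t/ as /d/:
confusion = {"t": {"t": 0.4, "d": 0.6}, "d": {"d": 0.9, "t": 0.1}}
posterior = {"d": 0.8, "t": 0.2}  # recogniser output leans towards /d/
print(rescore_with_confusions(posterior, confusion))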

Spectral transition features in dysarthric speech

María E. Hernández-Díaz Huici, Werner Verhelst

Some spectral transition features are introduced and tested in samples from dysarthric patients. The goal is to explore their potential as descriptors of articulatory deviations. This preliminary analysis includes only stop consonants extracted from the diadochokinetic task. Results and discussion are detailed for each one of the dysarthric groups included in the experiment.

Index Terms: Articulation, dysarthria, spectral transitions

2008

Combining neural network and rule-based systems for dysarthria diagnosis

James Carmichael, Vincent Wan, Phil Green

This study reports on the development of a diagnostic expert system - incorporating a multilayer perceptron (MLP) - designed to identify any sub-type of dysarthria (loss of neuro-muscular control over the articulators) manifested by a patient undergoing a Frenchay Dysarthria Assessment (FDA) evaluation. If sufficient information is provided describing pathological features of the patient's speech, the rule-based classifier (RBC) can out-perform the MLP in terms of rendering a more accurate and consistent diagnosis. The combination MLP/RBC developed during this study realised an overall improvement in diagnostic accuracy of 9.3% (absolute) for a selection of dysarthric cases, representing a substantial improvement over the benchmark system which - unlike the MLP/RBC - cannot directly process acoustic data.

2009

Optimized feature set to assess acoustic perturbations in dysarthric speech

Sunil Nagaraja, Eduardo Castillo-Guerra

This paper is focused on the optimization of features derived to characterize the acoustic perturbations encountered in the group of neurological disorders known as dysarthria. The work derives a set of orthogonal features that enable acoustic analyses of dysarthric speech from eight different dysarthria types. The feature set is composed of combinations of objective measurements obtained with digital signal processing algorithms and perceptual judgments of the most reliably perceived acoustic perturbations. The effectiveness of the features in providing relevant information about the disorders is evaluated with different classifiers, enabling a classification rate of up to 93.7%.

Universal access: speech recognition for talkers with spastic dysarthria

Harsh Vardhan Sharma, Mark Hasegawa-Johnson

This paper describes the results of our experiments in small and medium vocabulary dysarthric speech recognition, using the database being recorded by our group under the Universal Access initiative. We develop and test speaker-dependent, word- and phone-level speech recognizers utilizing the hidden Markov Model architecture; the models are trained exclusively on dysarthric speech produced by individuals diagnosed with cerebral palsy. The experiments indicate that (a) different system configurations (being word vs. phone based, number of states per HMM, number of Gaussian components per state specific observation probability density etc.) give useful performance (in terms of recognition accuracy) for different speakers and different task-vocabularies, and (b) for very low intelligibility subjects, speech recognition outperforms human listeners on recognizing dysarthric speech.

2010

Kinematic analysis of tongue movement control in spastic dysarthria

Heejin Kim, Panying Rong, Torrey M. Loucks, Mark Hasegawa-Johnson

This study provided a quantitative analysis of the kinematic deviances in dysarthria associated with spastic cerebral palsy. Of particular interest were tongue tip movements during alveolar consonant release. Our analysis based on kinematic measures indicated that speakers with spastic dysarthria had a restricted range of articulation and disturbances in articulatory-voicing coordination. The degree of kinematic deviances was greater for lower intelligibility speakers, supporting an association between articulatory dysfunctions and intelligibility in spastic dysarthria.

Automatic speech recognition for assistive writing in speech supplemented word prediction

John-Paul Hosom, Tom Jakobs, Allen Baker, Susan Fager

This paper describes a system for assistive writing, the Speech Supplemented Word Prediction Program (SSWPP). This system uses the first letter of a word typed by the user as well as the user's (possibly low-intelligibility) speech to predict the intended word. The ASR system, which is the focus of this paper, is a speaker-dependent isolated-word recognition system. Word-level results from a non-dysarthric speaker indicate that almost all errors could be corrected by the SSWPP language model. Results from five speakers with moderate to severe dysarthria (average intelligibility 61.7%) averaged 62% for word recognition and 65% for out-of-vocabulary identification.

Acoustic cues to lexical stress in spastic dysarthria

Heejin Kim, Mark Hasegawa-Johnson, Adrienne Perlman

The current study examined the acoustic cues to lexical stress produced by speakers with spastic dysarthria and healthy control speakers. Of particular interest was the effect of stress location, which represented whether lexical stress was on the first vs. second syllable of the word. Results suggest that speakers with dysarthria convey lexical stress differently than do control speakers. The difference was greater for second-syllable stressed words compared to first-syllable stressed words. In addition, for the first-syllable stressed words, speakers with dysarthria utilized the pitch and intensity cues to a greater degree compared to control speakers.

Index Terms: lexical stress, dysarthric speech, acoustic cues

2011

Temporal performance of dysarthric patients in speech and tapping tasks

Eiji Shimura, Kazuhiko Kakehi

Dysarthria is defined as a motor disorder of the vocal speech organs due to pathological changes of the nerve and muscle systems. Several methods of speaking-rate control have been widely used for the rehabilitation of dysarthria. However, these methods are not always effective, depending on the condition of the dysarthric patient. In this study, we investigated the tempo-perception performance of dysarthric patients, which has not yet been fully studied. Several types of experiments were conducted with both dysarthric patients and normal subjects. The experiments included speech production and tapping tasks with and without reference samples of utterances or tapping.

The experimental results showed that some of the dysarthric subjects exhibited impairments both in the motor control of the vocal speech organs and in their memory of tempo and rhythm.

Spectral features for automatic blind intelligibility estimation of spastic dysarthric speech

Richard Hummel, Wai-Yip Chan, Tiago H. Falk

In this paper, we explore the use of the standard ITU-T P.563 speech quality estimation algorithm for automatic assessment of dysarthric speech intelligibility. A linear mapping consisting of three salient P.563 internal features is proposed and shown to accurately estimate spastic dysarthric speech intelligibility. Delta-energy features are further proposed in order to characterize the atypical spectral dynamics and limited vowel space observed with spastic dysarthria. Experiments using the publicly-available Universal Access database (10 speaker patients) show that when salient delta-energy and internal P.563 features are used, correlations with subjective intelligibility ratings as high as 0.98 can be attained.
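A linear mapping from a handful of salient features to an intelligibility estimate, evaluated by Pearson correlation against subjective ratings as in this line of work, can be illustrated as follows. The feature values, weights, and ratings below are invented for the sketch; the paper's actual features come from the P.563 internals.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def linear_map(features, weights, bias):
    """Map a feature vector to an intelligibility estimate (0-100)."""
    return bias + sum(w * f for w, f in zip(weights, features))

# Hypothetical salient features for 4 speakers, with weights fitted offline.
feats = [[0.2, 1.1], [0.5, 0.9], [0.7, 0.4], [0.9, 0.2]]
weights, bias = [-60.0, 20.0], 80.0
predicted = [linear_map(f, weights, bias) for f in feats]
subjective = [90.0, 66.0, 45.0, 30.0]  # invented subjective ratings
print(round(pearson(predicted, subjective), 3))  # → 0.999
```

High correlations of this kind are what the reported 0.98 figure refers to, computed between predicted and subjectively rated intelligibility across speakers.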

Acoustic analysis of speech as a promising instrument for monitoring and differential diagnosis of Parkinson's disease

Sabine Skodda

Parkinsonian speech is characterized by abnormally low voice intensity, with vocal decay, poor voice quality, reduced prosodic pitch and loudness inflection, imprecise vowels and consonants, dysrhythmia and short rushes of speech, mumbling, and reduced speech intelligibility. Recently, there have been new acoustic analysis methods to capture different aspects of these speech abnormalities. In this review, selected studies are summarized in order to illustrate the application of acoustic analysis of speech for the objective measurement and quantification of different aspects of parkinsonian dysarthria.

Index Terms: Parkinson's disease, acoustic analysis of speech, hypokinetic dysarthria, dysprosody, vowel articulation, syllable repetition, motor speech performance

Acoustic analysis of voice and speech characteristics in early untreated Parkinson's disease

Jan Rusz, R. Cmejla, H. Ruzickova, J. Klempir, V. Majerova, J. Picmausova, J. Roth, E. Ruzicka

Parkinson's disease (PD) is a neurological illness characterized by progressive loss of dopaminergic neurons, primarily in the substantia nigra pars compacta. Changes in speech associated with hypokinetic dysarthria are a common manifestation in patients with idiopathic PD. The aim of this study is to investigate the feasibility of automated acoustic measures for the identification of voice and speech disorders in PD. The speech data were collected from 46 Czech native speakers, 24 with early PD before receiving pharmacotherapy treatment. We applied several traditional and non-standard measurements in combination with a statistical decision-making strategy to assess the extent of vocal impairment of the recruited speakers. Subsequently, we applied a support vector machine to find the best combination of measurements to differentiate PD from healthy subjects. This method leads to an overall classification performance of 85%. Additionally, we found relationships between measures of phonation and articulation and bradykinesia and rigidity in PD. In conclusion, acoustic analysis can ease the clinical assessment of voice and speech disorders, and can serve as a measure of clinical progression as well as in the monitoring of treatment effects.

Index Terms: Parkinson's disease, speech disorders, hypokinetic dysarthria, acoustic analysis, biomedical application

2012

Automated dysarthria severity classification for improved objective intelligibility assessment of spastic dysarthric speech

Milton Sarria Paja, Tiago H. Falk

In this paper, automatic dysarthria severity classification is explored as a tool to advance objective intelligibility prediction of spastic dysarthric speech. A Mahalanobis distance-based discriminant analysis classifier is developed based on a set of acoustic features formerly proposed for intelligibility prediction and voice pathology assessment. Feature selection is used to select salient features for both the disorder severity classification and intelligibility prediction tasks. Experimental results show that a two-level severity classifier combined with a 9-dimensional intelligibility prediction mapping can achieve a correlation of 0.92 and a root-mean-square error of 12.52 with respect to subjective intelligibility ratings. The effects of classification errors on intelligibility accuracy are also explored and shown to be insignificant.

Index Terms: Intelligibility, dysarthria, diagnosis
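A Mahalanobis distance-based discriminant classifier of the kind named above assigns a speaker to the severity class whose feature distribution is nearest in Mahalanobis distance. The sketch below uses a diagonal-covariance simplification and invented 2-D features; the paper's classifier presumably uses full covariances estimated from training data.

```python
import math

def mahalanobis_diag(x, mean, var):
    """Mahalanobis distance under a diagonal covariance
    (each dimension scaled by its own variance)."""
    return math.sqrt(sum((xi - mi) ** 2 / vi
                         for xi, mi, vi in zip(x, mean, var)))

def classify(x, classes):
    """Assign x to the class with the smallest Mahalanobis distance.
    classes: {label: (mean_vector, variance_vector)}"""
    return min(classes, key=lambda c: mahalanobis_diag(x, *classes[c]))

# Hypothetical 2-D acoustic feature statistics for two severity levels.
classes = {
    "mild":   ([0.2, 0.3], [0.01, 0.02]),
    "severe": ([0.8, 0.9], [0.02, 0.02]),
}
print(classify([0.75, 0.85], classes))  # → severe
```

Routing a speaker to a severity class first, then applying a class-specific intelligibility mapping, is the two-level scheme the abstract evaluates.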

Combination of multiple speech dimensions for automatic assessment of dysarthric speech intelligibility

Myung Jong Kim, Hoirin Kim

This paper focuses on the problem of automatically assessing the speech intelligibility of patients with dysarthria, which is a motor speech disorder. To effectively capture the characteristics of the speech disorder, various features are extracted in three speech dimensions: phonetic quality, prosody, and voice quality. Then, we find the best feature set satisfying a new feature selection criterion requiring that the selected features produce small prediction errors as well as low mutual dependency among themselves. Finally, the selected features are combined using support vector regression. Evaluation of the proposed method on a database of 94 speakers with dysarthria yielded a root-mean-square error of 8.1 against subjectively rated scores in the range of 0 to 100, a promising result suggesting that the system can be successfully applied to help a speech therapist diagnose the degree of speech disorder.

Index Terms: Dysarthria, feature selection, speech dimension, speech intelligibility, support vector regression

2013

Dysarthric speech recognition using dysarthria-severity-dependent and speaker-adaptive models

Myung Jong Kim, Joohong Yoo, Hoirin Kim

Dysarthria is a motor speech disorder that impairs the physical production of speech. Modern automatic speech recognition for normal speech is ineffective for dysarthric speech due to the large mismatch of acoustic characteristics. In this paper, a new speaker adaptation scheme is proposed to reduce the mismatch. First, a speaker with dysarthria is classified into one of the pre-defined severity levels, and then an initial model to be adapted is selected depending on their severity level. The candidates for an initial model are generated using dysarthric speech associated with its labeled severity level in the training phase. Finally, speaker adaptation is applied to the selected initial model. Evaluation of the proposed method on a database of several hundred words for 31 speakers with moderate to mild dysarthria showed that the proposed approach provides substantial improvement over the conventional speaker-adaptive system when a small amount of adaptation data is available.

Dysarthria intelligibility assessment in a factor analysis total variability space

David Martínez, Phil D. Green, H. Christensen

Speech technologies are increasingly important for assisting people with speech disorders. They can help to increase quality of life or help clinicians to make a diagnosis. In this paper a new methodology based on a total variability subspace modelled by factor analysis is proposed to assess the intelligibility of people with dysarthria. The acoustic information of each recording is efficiently compressed, and a Pearson correlation of 0.91 between the vectors in this subspace (iVectors) and intelligibility is obtained. Only perceptual linear prediction features are used as acoustic information. The experiments are conducted on the Universal Access Speech database. A new error metric to overcome the subjectivity in the intelligibility labels is also proposed.

Consonant distortions in dysarthria due to parkinson's disease, amyotrophic lateral sclerosis and cerebellar ataxia

Tanja Kocjančič Antolík, Cécile Fougeron

This paper addresses the presence and type of consonant distortions in the speech of 79 French speakers with dysarthria due to Parkinson's disease (PD), Amyotrophic Lateral Sclerosis (ALS) and cerebellar ataxia (CA), and 26 control speakers. A total of 4990 consonants, including selected occurrences of /d/, /g/, /t/, /k/ and /s/ in CV word-initial syllables, and /t/ in CV word-medial and IP-initial position, were examined manually. Results show that the ALS group stands out with the most distorted consonants, while the PD and CA groups performed similarly. The distribution of distortion types also differs across the three dysarthric groups. While the most frequent type of distortion in ALS is incomplete closure of stops, devoicing of voiced consonants is the most frequent in the PD and CA groups. In the ALS group, distortions are also more uniformly distributed over consonant types and positions, while voiced consonants are more prone to distortion in PD and CA, as are consonants in word-medial position for PD. Finally, consonant distortions contribute strongly to perceived intelligibility and articulatory imprecision for the ALS and PD groups.

2014

Acoustic and kinematic characteristics of vowel production through a virtual vocal tract in dysarthria

Jeff Berry, Andrew Kolb, Cassandra North, Michael T. Johnson

Broadening our understanding of the components and processes of speech sensorimotor learning is crucial to furthering methods of speech neurorehabilitation. Recent research in limb sensorimotor control has used virtual environments to study learning in novel sensorimotor working spaces. Comparable experimental paradigms have yet to be undertaken to study speech learning. We present acoustic and kinematic data obtained from participants producing vowels in unfamiliar articulatory-acoustic working spaces using a virtual vocal tract. Talkers with dysarthria and healthy controls were asked to produce vowels using an electromagnetic articulograph-driven speech synthesizer for participant-controlled auditory feedback. The aim of the work was to characterize performance within and between groups to generate hypotheses regarding experimental manipulations that may bolster our understanding of speech sensorimotor learning. Results indicate that dysarthric talkers displayed relatively reduced acoustic working spaces and somewhat more variable acoustic targets compared to controls. Kinematic measures of articulatory dynamics, particularly peak speed and movement jerk-cost, were idiosyncratic and did not dissociate talker groups. These findings suggest that individuals with dysarthria and healthy talkers may use idiosyncratic movement strategies in learning to control a virtual vocal tract, but that dysarthric talkers may nonetheless exhibit acoustic limitations that parallel deficits in speech intelligibility.

2015

Development of a Cantonese dysarthric speech corpus

Ka Ho Wong, Yu Ting Yeung, Edwin H. Y. Chan, Patrick C. M. Wong, Gina-Anne Levow, Helen Meng

Dysarthria is a neurogenic communication disorder affecting speech production. Significant differences in phonemic inventories and phonological patterns across the world's languages render generalization of disordered speech patterns from one language (e.g., English) to another (e.g., Cantonese) difficult. Capitalizing on existing methods in developing English-language dysarthric speech corpora, we develop a Cantonese corpus in order to investigate articulatory and prosodic characteristics of Cantonese dysarthric speech, focusing on speaking rate and on pitch and loudness control. Currently, we have collected 7.5 and 2.5 hours of speech data from 11 dysarthric subjects and 5 control speakers respectively. Our preliminary analysis reveals that the characteristics of Cantonese dysarthric speech are consistent with general properties of motor speech disorders found in other languages.

A syllable-based analysis of speech temporal organization: a comparison between speaking styles in dysarthric and healthy populations

Brigitte Bigi, Katarzyna Klessa, Laurianne Georgeton, Christine Meunier

A comparison of how healthy and dysarthric speakers adapt their production is a way to better understand the processes and constraints that interact during speech production in general. The present study focuses on spontaneous speech obtained with varying recording scenarios from five different groups of speakers. Patients suffering from a motor speech disorder (dysarthria) affecting speech production are compared to healthy speakers. Three types of dysarthria have been explored: Parkinson's disease, Amyotrophic Lateral Sclerosis and cerebellar ataxia. This paper first presents general figures based on syllable-level annotation mining, including detailed information about healthy/pathological speaker variability. Then, we report on the results of automatic timing parsing of interval sequences in speech syllable annotations performed using the TGA (Time Group Analysis) methodology. We observed that mean syllable-based speaking rates in time groups for the healthy speakers were higher than those measured in the recordings of dysarthric speakers. The variability in timing patterns (duration regression slopes, intercepts, and nPVI) also depended on the speaking styles in the particular populations.
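Among the timing measures listed, the nPVI (normalised Pairwise Variability Index) has a standard definition: the mean of the absolute duration differences between successive intervals, each normalised by the pair's mean, scaled by 100. A minimal implementation (the example durations are invented):

```python
def npvi(durations):
    """Normalised Pairwise Variability Index over a sequence of
    syllable (or vocalic-interval) durations, in any unit.
    Higher values mean more alternation between long and short units."""
    pairs = list(zip(durations, durations[1:]))
    terms = [abs(a - b) / ((a + b) / 2.0) for a, b in pairs]
    return 100.0 * sum(terms) / len(terms)

# Hypothetical durations (ms): near-regular vs. strongly alternating timing.
print(round(npvi([200, 210, 195, 205]), 1))  # → 5.8
print(round(npvi([120, 260, 110, 270]), 1))  # → 79.7
```

Comparing nPVI across healthy and dysarthric groups is one way the timing-pattern variability mentioned above can be quantified.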

2016

Improvement of Continuous Dysarthric Speech Quality

Anusha Prakash, M. Ramasubba Reddy, Hema A Murthy

Dysarthria refers to a group of motor speech disorders resulting from neurological injury to the speech production system. Dysarthric speech is characterised by poor articulation, resulting in degraded speech quality. Hence, it is important to correct or improve dysarthric speech so as to enable people with dysarthria to communicate better. The aim of this paper is to improve the quality of continuous speech of several people suffering from dysarthria. Experiments in the current work use two databases: the Nemours database and speech data collected from a dysarthric speaker of Indian origin. Durational analysis of dysarthric speech versus normal speech is performed. Based on the analysis, manual modifications are made directly to the speech waveforms, and an automatic technique is developed for the same. Evaluation tests indicate an average preference of 78.44% and 67.04% for the manually and automatically altered speech, respectively, over the original dysarthric speech, thus emphasising the effect of durational modifications on the perception of speech quality. The intelligibility of speech generated by three techniques, namely the proposed automatic modification technique, a formant re-synthesis technique, and an HMM-based adaptive system, is compared.

Dysarthric Speech Modification Using Parallel Utterance Based on Non-negative Temporal Decomposition

Ryo Aihara, Tetsuya Takiguchi, Yasuo Ariki

We present in this paper a speech modification method for a person with dysarthria resulting from athetoid cerebral palsy. The movements of such speakers are limited by their athetoid symptoms, and their consonants are often unstable or unclear, which makes it difficult for them to communicate. In this paper, duration and spectral modification using Non-negative Temporal Decomposition (NTD) is applied to a dysarthric voice. F0 is also modified by using linear-transformation. In order to confirm the effectiveness of our method, objective and subjective tests were conducted, and we also investigated the relationship between the intelligibility and individuality of dysarthric speech.

Recognition of Dysarthric Speech Using Voice Parameters for Speaker Adaptation and Multi-Taper Spectral Estimation

Chitralekha Bhat, Bhavik Vachhani, Sunil Kopparapu

Dysarthria is a motor speech disorder resulting from impairment of the muscles responsible for speech production, often characterized by slurred or slow speech of low intelligibility. With speech-based applications such as voice biometrics and personal assistants gaining popularity, automatic recognition of dysarthric speech becomes imperative as a step towards including people with dysarthria in the mainstream. In this paper we examine the applicability of voice parameters traditionally used for pathological voice classification, such as jitter, shimmer, F0 and Noise Harmonic Ratio (NHR) contours, in addition to Mel Frequency Cepstral Coefficients (MFCC), for dysarthric speech recognition. Additionally, we show that multi-taper spectral estimation for computing MFCC improves recognition of unseen dysarthric speech. A deep neural network (DNN)-hidden Markov model (HMM) recognition system fared better than a Gaussian mixture model (GMM)-HMM based system for dysarthric speech recognition. We propose a method to optimally use incremental dysarthric data to improve dysarthric speech recognition for an ASR with DNN-HMM. All evaluations were done on the Universal Access Speech Corpus.
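Multi-taper estimation replaces the usual single-window periodogram in the MFCC front end with an average over orthogonal Slepian (DPSS) tapers, which lowers the variance of the spectral estimate before the mel filterbank. A minimal sketch, in which the taper count, time-bandwidth product and unweighted averaging are assumptions (the paper may use a weighted variant):

```python
import numpy as np
from scipy.signal.windows import dpss

def multitaper_spectrum(frame, n_tapers=6, nw=4.0):
    """Average periodograms over Slepian (DPSS) tapers.

    Compared with a single Hamming-style window, averaging over
    orthogonal tapers reduces the variance of the power spectrum
    estimate that feeds the mel filterbank.
    """
    n = len(frame)
    tapers = dpss(n, nw, n_tapers)                       # shape (n_tapers, n)
    spectra = np.abs(np.fft.rfft(tapers * frame, axis=1)) ** 2
    return spectra.mean(axis=0)                          # averaged power spectrum
```

The returned spectrum can be passed to a standard mel filterbank and DCT to obtain multi-taper MFCCs.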

Dysarthric Speech Recognition Using Kullback-Leibler Divergence-Based Hidden Markov Model

Myungjong Kim, Jun Wang, Hoirin Kim

Dysarthria is a neuro-motor speech disorder that impedes the physical production of speech. Patients with dysarthria often have trouble in pronouncing certain sounds, resulting in undesirable phonetic variation. Current automatic speech recognition systems designed for the general public are ineffective for dysarthric sufferers due to this phonetic variation. In this paper, we investigate dysarthric speech recognition using Kullback-Leibler divergence-based hidden Markov models. In the model, the emission probability of each state is modeled by a categorical distribution using phoneme posterior probabilities from a deep neural network, and therefore, it can effectively capture the phonetic variation of dysarthric speech. Experimental evaluation on a database of several hundred words uttered by 30 speakers consisting of 12 mildly dysarthric, 8 moderately dysarthric, and 10 control speakers showed that our approach provides substantial improvement over the conventional Gaussian mixture model and deep neural network based speech recognition systems.
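In a KL-HMM, each state carries a categorical distribution over phoneme classes, and the local decoding cost for a frame is the KL divergence between that distribution and the DNN's posterior vector for the frame. A minimal sketch of the local score; the direction of the divergence varies across KL-HMM formulations, so KL(state || posterior) is an assumption here:

```python
import numpy as np

def kl_local_score(state_dist, dnn_posterior, eps=1e-10):
    """Local KL-HMM emission cost: KL(y_s || z_t).

    state_dist    -- categorical distribution attached to an HMM state
    dnn_posterior -- phoneme posterior vector from the DNN for one frame
    Lower cost means the frame's posterior matches the state better;
    during decoding this replaces the usual log-likelihood.
    """
    y = np.asarray(state_dist) + eps
    z = np.asarray(dnn_posterior) + eps
    return float(np.sum(y * np.log(y / z)))
```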

2017

Deep Autoencoder Based Speech Features for Improved Dysarthric Speech Recognition

Bhavik Vachhani, Chitralekha Bhat, Biswajit Das, Sunil Kumar Kopparapu

Dysarthria is a motor speech disorder, resulting in mumbled, slurred or slow speech that is generally difficult to understand by both humans and machines. Traditional Automatic Speech Recognizers (ASR) perform poorly on dysarthric speech recognition tasks. In this paper, we propose the use of deep autoencoders to enhance the Mel Frequency Cepstral Coefficients (MFCC) based features in order to improve dysarthric speech recognition. Speech from healthy control speakers is used to train an autoencoder, which is in turn used to obtain improved feature representations for dysarthric speech. Additionally, we analyze the use of severity-based tempo adaptation followed by autoencoder-based speech feature enhancement. All evaluations were carried out on the Universal Access dysarthric speech corpus. An overall absolute improvement of 16% was achieved using tempo adaptation followed by autoencoder-based speech front-end representation for DNN-HMM based dysarthric speech recognition.

Cross-Database Models for the Classification of Dysarthria Presence

Stephanie Gillespie, Yash-Yee Logan, Elliot Moore, Jacqueline Laures-Gore, Scott Russell, Rupal Patel

Dysarthria is a motor speech disorder that impacts verbal articulation and co-ordination, resulting in slow, slurred and imprecise speech. Automated classification of dysarthria subtypes and severities could provide a useful clinical tool in assessing the onset and progress in treatment. This study represents a pilot project to train models to detect the presence of dysarthria in continuous speech. Subsets of the Universal Access Research Dataset (UA-Speech) and the Atlanta Motor Speech Disorders Corpus (AMSDC) database were utilized in a cross-database training strategy (training on UA-Speech / testing on AMSDC) to distinguish speech with and without dysarthria. In addition to traditional spectral and prosodic features, the current study also includes features based on the Teager Energy Operator (TEO) and the glottal waveform. Baseline results on the UA-Speech dataset maximize word- and participant-level accuracies at 75.3% and 92.9% using prosodic features. However, the cross-training of UA-Speech tested on the AMSDC maximize word- and participant-level accuracies at 71.3% and 90% based on a TEO feature. The results of this pilot study reinforce consideration of dysarthria subtypes in cross-dataset training as well as highlight additional features that may be sensitive to the presence of dysarthria in continuous speech.

Acoustic Evaluation of Nasality in Cerebellar Syndromes

M. Novotný, Jan Rusz, K. Spálenka, Jiří Klempíř, D. Horáková, Evžen Růžička

Although previous studies have reported the occurrence of velopharyngeal incompetence connected with ataxic dysarthria, there is a lack of evidence related to nasality assessment in cerebellar disorders. This is partly due to the limited reliability of challenging analyses and partly due to nasality being a less pronounced manifestation of ataxic dysarthria. Therefore, we employed 1/3-octave spectra analysis as an objective measurement of nasality disturbances. We analyzed 20 subjects with multiple system atrophy (MSA), 13 subjects with cerebellar ataxia (CA), 20 subjects with multiple sclerosis (MS) and 20 healthy (HC) speakers. Although we did not detect the presence of hypernasality, our results showed increased nasality fluctuation in 65% of MSA, 43% of CA and 30% of MS subjects compared to 15% of HC speakers, suggesting inconsistent velopharyngeal motor control. Furthermore, we found a statistically significant difference between MSA and HC participants (p<0.001), and a significant correlation between the cerebellar subscore of the Natural History and Neuroprotection in Parkinson Plus Syndromes --- Parkinson Plus Scale and nasality fluctuations in MSA (r=0.51, p<0.05). In conclusion, acoustic analysis showed an increased presence of abnormal nasality fluctuations in all ataxic groups and revealed that nasality fluctuation is associated with distortion of cerebellar functions.
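The 1/3-octave analysis referred to above sums spectral power inside bands whose edges sit one sixth of an octave either side of each centre frequency. A minimal sketch; the centre-frequency grid below is illustrative, not necessarily the one used in the paper:

```python
import numpy as np

def third_octave_energies(x, fs, f_centers=None):
    """Power in 1/3-octave bands of a signal's spectrum.

    Band edges follow the usual f_c * 2**(+/-1/6) convention; the
    default centre frequencies (~100 Hz to ~2.5 kHz) are illustrative.
    """
    if f_centers is None:
        f_centers = 1000.0 * 2.0 ** (np.arange(-10, 5) / 3.0)
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    lo = f_centers * 2.0 ** (-1.0 / 6.0)
    hi = f_centers * 2.0 ** (1.0 / 6.0)
    return np.array([spec[(freqs >= l) & (freqs < h)].sum()
                     for l, h in zip(lo, hi)])
```

Tracking how these band energies fluctuate over successive frames of a sustained vowel gives a simple proxy for the nasality fluctuation measure discussed above.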

2018

Dysarthric Speech Classification Using Glottal Features Computed from Non-words, Words and Sentences

Narendra N P, Paavo Alku

Dysarthria is a neuro-motor disorder resulting from the disruption of normal activity in speech production leading to slow, slurred and imprecise (low intelligible) speech. Automatic classification of dysarthria from speech can be used as a potential clinical tool in medical treatment. This paper examines the effectiveness of glottal source parameters in dysarthric speech classification from three categories of speech signals, namely non-words, words and sentences. In addition to the glottal parameters, two sets of acoustic parameters extracted by the openSMILE toolkit are used as baseline features. A dysarthric speech classification system is proposed by training support vector machines (SVMs) using features extracted from speech utterances and their labels indicating dysarthria/healthy. Classification accuracy results indicate that the glottal parameters contain discriminating information required for the identification of dysarthria. Additionally, the complementary nature of the glottal parameters is demonstrated when these parameters, in combination with the openSMILE-based acoustic features, result in improved classification accuracy. Analysis of classification accuracies of the glottal and openSMILE features for non-words, words and sentences is carried out. Results indicate that in terms of classification accuracy the word level is best suited in identifying the presence of dysarthria.

A Multitask Learning Approach to Assess the Dysarthria Severity in Patients with Parkinson's Disease

Juan Camilo Vásquez Correa, Tomas Arias, Juan Rafael Orozco-Arroyave, Elmar Nöth

Parkinson's disease is a neurodegenerative disorder characterized by a variety of motor and non-motor symptoms. Particularly, several speech impairments appear in the initial stages of the disease, which affect aspects related to respiration and the movement of muscles and limbs in the vocal tract. Most of the studies in the literature aim to assess only one specific task from the patients, such as the classification of patients vs. healthy speakers, or the assessment of the neurological state of the patients. This study proposes a multitask learning approach based on convolutional neural networks to assess several speech deficits of the patients at the same time. A total of eleven speech aspects are considered, including difficulties of the patients to move articulators such as lips, palate, tongue and larynx. According to the results, the proposed approach improves the generalization of the convolutional network, producing more representative feature maps to assess the different speech symptoms of the patients. The multitask learning scheme improves the average accuracy by up to 4% relative to single networks trained to assess each individual speech aspect.

Dysarthric Speech Recognition Using Convolutional LSTM Neural Network

Myungjong Kim, Beiming Cao, Kwanghoon An, Jun Wang

Dysarthria is a motor speech disorder that impedes the physical production of speech. Speech in patients with dysarthria is generally characterized by poor articulation, breathy voice and monotonic intonation. Therefore, modeling the spectral and temporal characteristics of dysarthric speech is critical for better performance in dysarthric speech recognition. Convolutional long short-term memory recurrent neural networks (CLSTM-RNNs) have recently successfully been used in normal speech recognition, but have rarely been used in dysarthric speech recognition. We hypothesized CLSTM-RNNs have the potential to capture the distinct characteristics of dysarthric speech, taking advantage of convolutional neural networks (CNNs) for extracting effective local features and LSTM-RNNs for modeling temporal dependencies of the features. In this paper, we investigate the use of CLSTM-RNNs for dysarthric speech recognition. Experimental evaluation on a database collected from nine dysarthric patients showed that our approach provides substantial improvement over both standard CNN and LSTM-RNN based speech recognizers.

2019

Personalizing ASR for Dysarthric and Accented Speech with Limited Data

Joel Shor, Dotan Emanuel, Oran Lang, Omry Tuval, Michael Brenner, Julie Cattiau, Fernando Vieira, Maeve McNally, Taylor Charbonneau, Melissa Nollstadt, Avinatan Hassidim, Yossi Matias

Automatic speech recognition (ASR) systems have dramatically improved over the last few years. ASR systems are most often trained from 'typical' speech, which means that underrepresented groups don't experience the same level of improvement. In this paper, we present and evaluate finetuning techniques to improve ASR for users with non-standard speech. We focus on two types of non-standard speech: speech from people with amyotrophic lateral sclerosis (ALS) and accented speech. We train personalized models that achieve 62% and 35% relative WER improvement on these two groups, bringing the absolute WER for ALS speakers, on a test set of message bank phrases, down to 10% for mild dysarthria and 20% for more serious dysarthria. We show that 71% of the improvement comes from only 5 minutes of training data. Finetuning a particular subset of layers (with many fewer parameters) often gives better results than finetuning the entire model. This is the first step towards building state-of-the-art ASR models for dysarthric speech.

Feature Representation of Pathophysiology of Parkinsonian Dysarthria

Alice Rueda, J.C. Vásquez-Correa, Cristian David Rios-Urrego, Juan Rafael Orozco-Arroyave, Sridhar Krishnan, Elmar Nöth

This paper focuses on selecting features that can best represent the pathophysiology of Parkinson's disease (PD) dysarthria. PD dysarthria has often been the subject of feature selection and classification experiments, but the selected features have rarely been matched to the pathophysiology of PD dysarthria. PD dysarthria manifests through changes in control of a person's speech production muscles and affects respiration, articulation, resonance, and laryngeal properties, resulting in speech characteristics such as short phrases separated by pauses, reduced speed for non-repetitive syllables or supernormal speed of repetitive syllables, reduced resonance, irregular vowel generation, etc. Articulation, phonation, diadochokinesis (DDK) rhythm, and Empirical Mode Decomposition (EMD) features were extracted from the DDK and sustained /a/ recordings of the Spanish GITA Corpus. These recordings were captured from 50 healthy (HC) and 50 PD subjects. A two-stage filter-wrapper feature selection process was applied to reduce the number of features from 3,534 to 15. These 15 features mainly represent the instability of the voice and rhythm. SVM, Random Forest and Naive Bayes were used to test the discriminative power of the selected features. The results showed that these sustained /a/ and /pa-ta-ka/ stability features could successfully discriminate PD from HC with 70% accuracy.
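The wrapper stage of a filter-wrapper pipeline can be sketched as greedy forward selection: starting from the filtered candidate pool, repeatedly add the feature whose inclusion maximises a classifier-based score. A minimal sketch; the scoring function (e.g. cross-validated SVM accuracy) is left to the caller, and this is not claimed to be the paper's exact procedure:

```python
import numpy as np

def forward_select(X, y, k, score_fn):
    """Greedy wrapper stage of a filter-wrapper pipeline.

    X        -- (samples, features) matrix after the cheap filter stage
    y        -- labels
    k        -- number of features to keep
    score_fn -- callable(X_subset, y) -> quality of a candidate subset
    Grows the subset one feature at a time, keeping the addition that
    maximises score_fn.
    """
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k and remaining:
        best = max(remaining,
                   key=lambda j: score_fn(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected
```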

The CUHK Dysarthric Speech Recognition Systems for English and Cantonese

Shoukang Hu, Shansong Liu, Heng Fai Chang, Mengzhe Geng, Jiani Chen, Lau Wing Chung, To Ka Hei, Jianwei Yu, Ka Ho Wong, Xunying Liu, Helen Meng

Speech disorders affect many people around the world and introduce a negative impact on their quality of life. Dysarthria is a neuro-motor speech disorder that obstructs the normal production of speech. Current automatic speech recognition (ASR) systems are developed for normal speech. They are not suitable for accurate recognition of disordered speech. To the best of our knowledge, the majority of disordered speech recognition systems developed to date are for English. In this paper, we present two disordered speech recognition systems, for English and Cantonese. Both systems demonstrate competitive performance when compared with the Google speech recognition API and human recognition results.

Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech

Daniel Korzekwa, Roberto Barra-Chicote, Bozena Kostek, Thomas Drugman, Mateusz Lajszczak

We present a novel deep learning model for the detection and reconstruction of dysarthric speech. We train the model with a multi-task learning technique to jointly solve dysarthria detection and speech reconstruction tasks. The model key feature is a low-dimensional latent space that is meant to encode the properties of dysarthric speech. It is commonly believed that neural networks are black boxes that solve problems but do not provide interpretable outputs. On the contrary, we show that this latent space successfully encodes interpretable characteristics of dysarthria, is effective at detecting dysarthria, and that manipulation of the latent space allows the model to reconstruct healthy speech from dysarthric speech. This work can help patients and speech pathologists to improve their understanding of the condition, lead to more accurate diagnoses and aid in reconstructing healthy speech for afflicted patients.

Exploiting Visual Features Using Bayesian Gated Neural Networks for Disordered Speech Recognition

Shansong Liu, Shoukang Hu, Yi Wang, Jianwei Yu, Rongfeng Su, Xunying Liu, Helen Meng

Automatic speech recognition (ASR) for disordered speech is a challenging task. People with speech disorders such as dysarthria often have physical disabilities, leading to severe degradation of speech quality, highly variable voice characteristics and large mismatch against normal speech. It is also difficult to record large amounts of high quality audio-visual data for developing audio-visual speech recognition (AVSR) systems. To address these issues, a novel Bayesian gated neural network (BGNN) based AVSR approach is proposed. Speaker level Bayesian gated control of contributions from visual features allows a more robust fusion of audio and video modality. A posterior distribution over the gating parameters is used to model their uncertainty given limited and variable disordered speech data. Experiments conducted on the UASpeech dysarthric speech corpus suggest the proposed BGNN AVSR system consistently outperforms state-of-the-art deep neural network (DNN) baseline ASR and AVSR systems by 4.5% and 4.7% absolute (14.9% and 15.5% relative) in word error rate.

Diagnosing Dysarthria with Long Short-Term Memory Networks

Alex Mayle, Zhiwei Mou, Razvan Bunescu, Sadegh Mirshekarian, Li Xu, Chang Liu

This paper proposes the use of Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units for determining whether Mandarin-speaking individuals are afflicted with a form of Dysarthria based on samples of syllable pronunciations. Several LSTM network architectures are evaluated on this binary classification task, using accuracy and Receiver Operating Characteristic (ROC) curves as metrics. The LSTM models are shown to significantly improve upon a baseline fully connected network, reaching over 90% area under the ROC curve on the task of classifying new speakers, when a sufficient number of cepstrum coefficients are used. The results show that the LSTM's ability to leverage temporal information within its input makes for an effective step in the pursuit of accessible Dysarthria diagnoses.

2020

Dysarthria Detection and Severity Assessment Using Rhythm-Based Metrics

Abner Hernandez, Eun Jung Yeo, Sunhee Kim, Minhwa Chung

Dysarthria refers to a range of speech disorders mainly affecting articulation. However, impairments are also seen in suprasegmental elements of speech such as prosody. In this study, we examine the effect of using rhythm metrics on detecting dysarthria, and for assessing severity level. Previous studies investigating prosodic irregularities in dysarthria tend to focus on pitch or voice quality measurements. Rhythm is another aspect of prosody which refers to the rhythmic division of speech units into relatively equal time. Speakers with dysarthria tend to have irregular rhythmic patterns that could be useful for detecting dysarthria. We compare the classification accuracy between solely using standard prosodic features against using both standard prosodic features and rhythm-based features, using random forest, support vector machine, and feed-forward neural network. Our best performing classifiers achieved a relative percentage increase of 7.5% and 15% in detection and severity assessment respectively for the QoLT Korean dataset, while the TORGO English dataset had an increase of 4.1% and 3.2%. Results indicate that including rhythmic information can increase accuracy performance regardless of the classifier. Furthermore, we show that rhythm metrics are useful in both Korean and English.
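Classic interval-based rhythm metrics of the kind referred to above can be computed from phone-level durations alone. A minimal sketch of three common ones, %V, deltaC and the normalised pairwise variability index (the paper's exact metric set is not specified here):

```python
import numpy as np

def rhythm_metrics(durations, is_vowel):
    """%V, deltaC and nPVI from a sequence of segment durations (seconds).

    durations -- duration of each phone-sized interval
    is_vowel  -- True for vocalic intervals, False for consonantal ones
    These are Ramus/Grabe-style metrics; irregular rhythm shows up as
    larger deltaC and nPVI values.
    """
    d = np.asarray(durations, dtype=float)
    v = np.asarray(is_vowel, dtype=bool)
    pct_v = 100.0 * d[v].sum() / d.sum()       # %V: proportion of vocalic time
    delta_c = d[~v].std()                      # deltaC: consonantal variability
    diffs = np.abs(np.diff(d))
    means = (d[:-1] + d[1:]) / 2.0
    npvi = 100.0 * np.mean(diffs / means)      # normalised pairwise variability
    return pct_v, delta_c, npvi
```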

Enhancing Intelligibility of Dysarthric Speech Using Gated Convolutional-Based Voice Conversion System

Chen-Yu Chen, Wei-Zhong Zheng, Syu-Siang Wang, Yu Tsao, Pei-Chun Li, Ying-Hui Lai

The voice conversion (VC) system is a well-known approach to improve the communication efficiency of patients with dysarthria. In this study, we used a gated convolutional neural network (Gated CNN) with phonetic posteriorgram (PPG) features to perform VC for patients with dysarthria, with a WaveRNN vocoder used to synthesize the converted speech. In addition, two well-known deep learning-based models, the convolutional neural network (CNN) and bidirectional long short-term memory (BLSTM), were compared with the Gated CNN in the proposed VC system. Evaluations using a Google ASR-based speech intelligibility metric and listening tests showed that the proposed system outperformed the original dysarthric speech. Meanwhile, the Gated CNN model performed better than the other models and requires fewer parameters than BLSTM. The results suggest that the Gated CNN can be used as a communication assistive system to overcome the degradation of speech intelligibility caused by dysarthria.
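The gating in a Gated CNN layer is a gated linear unit: the convolution produces twice the desired output channels, half of which pass through a sigmoid and multiplicatively gate the other half. A minimal sketch of that elementwise operation (the surrounding convolution and layer shapes are omitted):

```python
import numpy as np

def glu(h):
    """Gated linear unit used in gated CNNs.

    h -- array of shape (..., 2*C) produced by a convolution layer.
    Splits the channel axis in two and gates one half with the
    sigmoid of the other: GLU(h) = A * sigmoid(B), [A; B] = split(h).
    """
    a, b = np.split(h, 2, axis=-1)
    return a / (1.0 + np.exp(-b))   # a * sigmoid(b)
```

The sigmoid half acts as a learned per-channel soft gate, which is what lets the Gated CNN match BLSTM-like selectivity with fewer parameters.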

Automatic Assessment of Dysarthric Severity Level Using Audio-Video Cross-Modal Approach in Deep Learning

Han Tong, Hamid Sharifzadeh, Ian McLoughlin

Dysarthria is a speech disorder that can significantly impact a person's daily life, and yet may be amenable to therapy. To automatically detect and classify dysarthria, researchers have proposed various computational approaches ranging from traditional speech processing methods focusing on speech rate, intelligibility, intonation, etc. to more advanced machine learning techniques. Recently developed machine learning systems rely on audio features for classification; however, research in other fields has shown that audio-video cross-modal frameworks can improve classification accuracy while simultaneously reducing the amount of training data required compared to uni-modal systems (i.e. audio- or video-only).

In this paper, we propose an audio-video cross-modal deep learning framework that takes both audio and video data as input to classify dysarthria severity levels. Our novel cross-modal framework achieves over 99% test accuracy on the UASPEECH dataset --- significantly outperforming current uni-modal systems that utilise audio data alone. More importantly, it is able to accelerate training time while improving accuracy, and to do so with reduced training data requirements.

Recognising Emotions in Dysarthric Speech Using Typical Speech Data

Lubna Alhinti, Stuart Cunningham, Heidi Christensen

Effective communication relies on the comprehension of both verbal and nonverbal information. People with dysarthria may lose their ability to produce intelligible and audible speech sounds, which in time may affect their way of conveying emotions, which are mostly expressed using nonverbal signals. Recent research shows some promise on automatically recognising the verbal part of dysarthric speech. However, this is the first study that investigates the ability to automatically recognise the nonverbal part. A parallel database of dysarthric and typical emotional speech is collected, and approaches to discriminating between emotions using models trained on either dysarthric (speaker dependent, matched) or typical (speaker independent, unmatched) speech are investigated for four speakers with dysarthria caused by cerebral palsy and Parkinson's disease. Promising results are achieved in both scenarios using SVM classifiers, opening new doors to improved, more expressive voice input communication aids.

Automatic Discrimination of Apraxia of Speech and Dysarthria Using a Minimalistic Set of Handcrafted Features

Ina Kodrasi, Michaela Pernon, Marina Laganaro, Hervé Bourlard

To assist clinicians in the differential diagnosis and treatment of motor speech disorders, it is imperative to establish objective tools which can reliably characterize different subtypes of disorders such as apraxia of speech (AoS) and dysarthria. Objective tools in the context of speech disorders typically rely on thousands of acoustic features, which raises the risk of difficulties in the interpretation of the underlying mechanisms, over-adaptation to training data, and weak generalization capabilities to test data. Seeking to use a small number of acoustic features and motivated by the clinical-perceptual signs used for the differential diagnosis of AoS and dysarthria, we propose to characterize differences between AoS and dysarthria using only six handcrafted acoustic features, with three features reflecting segmental distortions, two features reflecting loudness and hypernasality, and one feature reflecting syllabification. These three different sets of features are used to separately train three classifiers. At test time, the decisions of the three classifiers are combined through a simple majority voting scheme. Preliminary results show that the proposed approach achieves a discrimination accuracy of 90%, outperforming state-of-the-art features such as openSMILE, which yield a discrimination accuracy of 65%.
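The decision combination described above reduces to a majority vote over the three per-feature-set classifiers. A minimal sketch; the tie-breaking behaviour (first label encountered) is an assumption the paper does not specify:

```python
from collections import Counter

def majority_vote(decisions):
    """Combine per-classifier labels by simple majority.

    decisions -- e.g. ['AoS', 'dysarthria', 'AoS'] from the three
    feature-set classifiers. Ties are broken by first occurrence,
    which is an assumed convention.
    """
    return Counter(decisions).most_common(1)[0][0]
```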

2021

A preliminary approach to the acoustic-perceptual characterization of dysarthria

Eugenia San Segundo, Jonathan Delgado

We have conducted an acoustic-perceptual evaluation of 15 dysarthric and 15 neurologically healthy speakers. On the one hand, speech samples were analysed with Praat (13 acoustic parameters were extracted, related to F0, frequency and amplitude variation, as well as other source and vocal tract measures). On the other hand, two raters evaluated all the voices perceptually using the Simplified Vocal Profile Analysis (SVPA), which implements a visual analog scale for each voice quality setting in an online interface. The results show, for the perceptual analyses, that (1) intra- and interrater agreement is overall very good; and that (2) the perceptual settings 'vocal tract tension' and 'laryngeal tension' are the most useful to characterize dysarthria. In terms of statistical modelling, most linear models were significant using only 4-5 acoustic parameters, but the specific parameters in each model depend on the VPA setting under consideration. All in all, acoustic-perceptual assessment through the SVPA seems to be an important complement to traditional assessment in dysarthria, as it provides information on the functioning of the supraglottic structures commonly affected in this type of motor speech disorder, in which the muscles used to produce speech are damaged, paralyzed, or weakened.

On Modeling Glottal Source Information for Phonation Assessment in Parkinson's Disease

J.C. Vásquez-Correa, Julian Fritsch, J.R. Orozco-Arroyave, Elmar Nöth, Mathew Magimai-Doss

Parkinson's disease produces several motor symptoms, including different speech impairments that are known as hypokinetic dysarthria. Symptoms associated with dysarthria affect different dimensions of speech such as phonation, articulation, prosody, and intelligibility. Studies in the literature have mainly focused on the analysis of articulation and prosody because they seem to be the most prominent symptoms associated with dysarthria severity. However, phonation impairments also play a significant role in evaluating the global speech severity of Parkinson's patients. This paper proposes an extensive comparison of different methods to automatically evaluate the severity of specific phonation impairments in Parkinson's patients. The considered models include the computation of perturbation and glottal-based features, in addition to features extracted from zero-frequency filtered signals. We consider as well end-to-end models based on 1D CNNs, which are trained to learn features from the raw speech waveform, reconstructed glottal signals, and zero-frequency filtered signals. The results indicate that it is possible to automatically classify between speakers with low versus high phonation severity due to the presence of dysarthria and at the same time to evaluate the severity of the phonation impairments on a continuous scale, posed as a regression problem.
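Zero-frequency filtering, in the Murty and Yegnanarayana style referred to above, passes the differenced signal through a cascade of resonators with poles at 0 Hz and then removes the resulting polynomial trend with a moving average whose length should be near the mean pitch period. A minimal sketch; the window length and the number of trend-removal passes are assumed defaults, and published variants differ:

```python
import numpy as np

def zero_frequency_filter(x, fs, win_ms=10.0):
    """Sketch of zero-frequency filtering of a speech signal.

    Differences the signal, applies two cascaded 0-Hz resonators
    (each equivalent to double integration), then subtracts a local
    mean twice to remove the polynomial trend. win_ms ~ mean pitch
    period is an assumed default.
    """
    y = np.diff(np.asarray(x, dtype=float), prepend=float(x[0]))
    for _ in range(2):                       # cascaded 0-Hz resonators
        y = np.cumsum(np.cumsum(y))
    w = max(3, int(fs * win_ms / 1000.0) | 1)    # odd window length
    kernel = np.ones(w) / w
    for _ in range(2):                       # repeated trend removal
        y = y - np.convolve(y, kernel, mode='same')
    return y
```

Negative-to-positive zero crossings of the filtered signal are commonly taken as glottal closure instants, which is what makes this representation useful for phonation analysis.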

Unsupervised Domain Adaptation for Dysarthric Speech Detection via Domain Adversarial Training and Mutual Information Minimization

Disong Wang, Liqun Deng, Yu Ting Yeung, Xiao Chen, Xunying Liu, Helen Meng

Dysarthric speech detection (DSD) systems aim to detect characteristics of the neuromotor disorder from speech. Such systems are particularly susceptible to domain mismatch where the training and testing data come from the source and target domains respectively, but the two domains may differ in terms of speech stimuli, disease etiology, etc. It is hard to acquire labelled data in the target domain, due to high costs of annotating sizeable datasets. This paper makes a first attempt to formulate cross-domain DSD as an unsupervised domain adaptation (UDA) problem. We use labelled source-domain data and unlabelled target-domain data, and propose a multi-task learning strategy, including dysarthria presence classification (DPC), domain adversarial training (DAT) and mutual information minimization (MIM), which aim to learn dysarthria-discriminative and domain-invariant biomarker embeddings. Specifically, DPC helps biomarker embeddings capture critical indicators of dysarthria; DAT forces biomarker embeddings to be indistinguishable in source and target domains; and MIM further reduces the correlation between biomarker embeddings and domain-related cues. By treating the UASPEECH and TORGO corpora respectively as the source and target domains, experiments show that the incorporation of UDA attains absolute increases of 22.2% and 20.0% respectively in utterance-level weighted average recall and speaker-level accuracy.

2022

Interpretable dysarthric speaker adaptation based on optimal-transport

Rosanna Turrisi, Leonardo Badino

This work addresses the mismatch problem between the distribution of training data (source) and testing data (target), in the challenging context of dysarthric speech recognition. We focus on Speaker Adaptation (SA) in command speech recognition, where data from multiple sources (i.e., multiple speakers) are available. Specifically, we propose an unsupervised Multi-Source Domain Adaptation (MSDA) algorithm based on optimal-transport, called MSDA via Weighted Joint Optimal Transport (MSDA-WJDOT). We achieve a Command Error Rate relative reduction of 16% and 7% over the speaker-independent model and the best competitor method, respectively. The strength of the proposed approach is that, differently from any other existing SA method, it offers an interpretable model that can also be exploited, in this context, to diagnose dysarthria without any specific training. Indeed, it provides a closeness measure between the target and the source speakers, reflecting their similarity in terms of speech characteristics. Based on the similarity between the target speaker and the healthy/dysarthric source speakers, we then define the healthy/dysarthric score of the target speaker that we leverage to perform dysarthria detection. This approach does not require any additional training and achieves a 95% accuracy in the dysarthria diagnosis.
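One way to turn the closeness measure described above into a healthy/dysarthric score is to weight the source speakers by a softmax over negative distances and sum the weight that falls on dysarthric sources. This scoring rule is an illustrative assumption, not necessarily the MSDA-WJDOT formulation:

```python
import numpy as np

def dysarthria_score(target_dists, source_is_dysarthric, beta=1.0):
    """Similarity-weighted dysarthria score for a target speaker.

    target_dists         -- distance of the target to each source speaker
                            (e.g. an optimal-transport closeness measure)
    source_is_dysarthric -- health label of each source speaker
    Weights are a softmax over negative distances; the score in [0, 1]
    is the total weight on dysarthric sources. Hypothetical scoring
    rule for illustration.
    """
    d = np.asarray(target_dists, dtype=float)
    w = np.exp(-beta * d)
    w /= w.sum()
    return float(w[np.asarray(source_is_dysarthric, dtype=bool)].sum())
```

Thresholding this score at 0.5 gives a training-free dysarthria detector in the spirit of the one described above.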

Automatic Speaker Verification System for Dysarthria Patients

Shinimol Salim, Syed Shahnawazuddin, Waquar Ahmad

Dysarthria is one of the most common speech communication disorders, associated with neurological damage that weakens the muscles necessary for speech. In this paper, we present our efforts towards developing an automatic speaker verification (ASV) system based on x-vectors for dysarthric speakers with varying speech intelligibility (low, medium and high). For that purpose, a baseline ASV system was trained on speech data from healthy speakers, since there is severe scarcity of data from dysarthric speakers. To improve the performance with respect to dysarthric speakers, data augmentation based on duration modification is proposed in this study. Duration modification with several scaling factors was applied to healthy training speech. An ASV system was then trained on healthy speech augmented with its duration-modified versions. This compensates for the substantial disparities in phone duration between normal and dysarthric speakers of varying speech intelligibility. Experimental evaluations presented in this study show that the proposed duration modification-based data augmentation resulted in a relative improvement of 22% over the baseline. Further, a relative improvement of 26% was obtained in the case of speakers with a high severity level of dysarthria.
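Duration modification for augmentation stretches or compresses healthy training utterances by a set of scaling factors. A deliberately crude sketch via time-axis interpolation; plain interpolation also shifts pitch, so it only stands in for the pitch-preserving duration modification (e.g. a PSOLA/WSOLA-style method) that a real augmentation pipeline would use:

```python
import numpy as np

def stretch_duration(x, scale):
    """Crude duration modification by resampling the time axis.

    scale > 1 lengthens the utterance, scale < 1 shortens it.
    Note this also shifts pitch; it is a stand-in for proper
    pitch-preserving duration modification.
    """
    n_out = int(round(len(x) * scale))
    t_out = np.linspace(0, len(x) - 1, n_out)
    return np.interp(t_out, np.arange(len(x)), x)
```

An augmentation set would apply this with several scaling factors (e.g. both lengthening and shortening) to every healthy training utterance before x-vector training.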

Automated Detection of Wilson's Disease Based on Improved Mel-frequency Cepstral Coefficients with Signal Decomposition

Zhenglin Zhang, Li-Zhuang Yang, Xun Wang, Hai Li

Wilson's disease (WD), a rare genetic movement disorder, is characterized by early-onset dysarthria. Automated speech assessment is thus valuable for early diagnosis and intervention. Time-frequency features, such as Mel-frequency cepstral coefficients (MFCC), have been frequently used. However, human speech signals are nonlinear and nonstationary, properties that cannot be captured by traditional features based on the Fourier transform. Moreover, the dysarthria type of WD patients is complex and differs from that of other movement disorders such as Parkinson's disease. Thus, sensitive time-frequency measures tailored to WD patients are needed. The present study proposes DMFCC, an improved MFCC using signal decomposition. We validate the usefulness of DMFCC in WD detection with a sample of 60 WD patients and 60 matched healthy controls. Results show that DMFCC achieves the best classification accuracy (86.1%), improving by 13.9%-44.4% over baseline features such as MFCC and the state-of-the-art Hilbert cepstral coefficients (HCCs). The present study is a first attempt to demonstrate the validity of automated acoustic measures in WD detection, and the proposed DMFCC provides a novel tool for speech assessment.

Validation of the Neuro-Concept Detector framework for the characterization of speech disorders: A comparative study including Dysarthria and Dysphonia

Sondes Abderrazek, Corinne Fredouille, Alain Ghio, Muriel Lalain, Christine Meunier, Virginie Woisard

Recently, we proposed a general analytical framework, called Neuro-based Concept Detector (NCD), to interpret the deep representations of a DNN. Based on the activation patterns of hidden neurons, this framework highlights the ability of neurons to detect a specific concept related to the final task. Its main strength is to provide an interpretability tool for any type of DNN performing a classification task, whatever the application domain. Using NCD, we demonstrated the emergence of phonetic features in the classification layers of a CNN-based model for French phone classification. The emergence of this concept, of great interest in the field of clinical phonetics, was first studied on healthy speech. Applied to speech from patients with head and neck cancers, we showed that the framework automatically reflects the level of impairment of the phonetic features produced by a patient, supported by strong correlations with perceptual assessments performed by clinical experts. The objective of the work presented here is to validate the proposed framework by applying it to new patient populations with very different pathologies (neurodegenerative diseases/dysarthria and vocal dysfunction/dysphonia). The robustness of the approach to the phonetic content variability of read text is also studied.

Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition

Abner Hernandez, Paula Andrea Pérez-Toro, Elmar Noeth, Juan Rafael Orozco-Arroyave, Andreas Maier, Seung Hee Yang

State-of-the-art automatic speech recognition (ASR) systems perform well on healthy speech, but performance on impaired speech remains an issue. The current study explores the usefulness of Wav2Vec self-supervised speech representations as features for training an ASR system for dysarthric speech. Dysarthric speech recognition is particularly difficult because several aspects of speech, such as articulation, prosody, and phonation, can be impaired. Specifically, we train an acoustic model with features extracted from Wav2Vec, Hubert, and the cross-lingual XLSR model. Results suggest that speech representations pretrained on large unlabelled datasets can improve word error rate (WER). In particular, features from the multilingual model led to lower WERs than Fbanks or models trained on a single language. Improvements were seen for English speakers with dysarthria caused by cerebral palsy (UASpeech corpus), Spanish speakers with Parkinsonian dysarthria (PC-GITA corpus), and Italian speakers with paralysis-based dysarthria (EasyCall corpus). Compared to Fbank features, XLSR-based features reduced WERs by 6.8%, 22.0%, and 7.0% for the UASpeech, PC-GITA, and EasyCall corpora, respectively.
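The WER reductions quoted in these abstracts rest on the standard word error rate, computed from the Levenshtein edit distance over words; a short sketch makes it concrete (the example sentences are invented):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j]: edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[-1][-1] / len(ref)

wer = word_error_rate("please call the nurse", "please all the nurse")
```

Here one substitution over a four-word reference gives a WER of 0.25; relative reductions such as the 22.0% above compare two such WERs.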

Investigating the Impact of Speech Compression on the Acoustics of Dysarthric Speech

Kelvin Tran, Lingfeng Xu, Gabriela Stegmann, Julie Liss, Visar Berisha, Rene Utianski

Acoustic analysis plays an important role in the assessment of dysarthria. Out of public health necessity, telepractice has become an increasingly adopted modality for delivering clinical care. While telepractice platforms differ in software, they all use some form of speech compression to preserve bandwidth, the most common algorithm being the Opus codec. Opus has been optimized for compression of speech from the general (mostly healthy) population. For speech-language pathologists, this raises the question: is the remotely transmitted speech signal a faithful representation of dysarthric speech? Existing high-fidelity audio recordings from 20 speakers with various dysarthria types were encoded at three different bit rates defined within Opus to simulate different internet bandwidth conditions. Acoustic measures of articulation, voice, and prosody were extracted, and mixed-effects models were used to evaluate the impact of bandwidth conditions on the measures. Significant differences in cepstral peak prominence, degree of voice breaks, jitter, vowel space area, and pitch were observed after Opus processing, providing insight into the types of acoustic measures that are susceptible to speech compression algorithms.
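Among the measures listed, jitter has a particularly simple definition that can be sketched directly. The period values below are hypothetical, and clinical tools such as Praat apply additional period-detection and windowing rules on top of this basic formula.

```python
def local_jitter(periods):
    """Jitter (local), in %: mean absolute difference between consecutive
    pitch periods divided by the mean period."""
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return 100.0 * mean_diff / mean_period

j = local_jitter([10.0, 10.2, 9.9, 10.1])  # pitch periods in ms
```

Comparing such a measure before and after Opus encoding is the kind of contrast the mixed-effects models above evaluate.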

2023

Towards Speaker-Independent Voice Conversion for Improving Dysarthric Speech Intelligibility

Seraphina Fong, Marco Matassoni, Gianluca Esposito, Alessio Brutti

To improve the intelligibility of dysarthric patient speech, state-of-the-art work has focused on speaker-dependent voice conversion (VC) systems. Speaker-dependent systems are computationally expensive, as they require training an individual model for a given speaker and often need many hours of speech data to perform well. Providing hours of recorded speech can be challenging for patients with dysarthria. The present work, part of a master's thesis project, investigates speaker-independent approaches for improving dysarthric speech intelligibility. Objective evaluation of preliminary results demonstrates that speaker-independent VC has potential, with pretrained any-to-any models performing better than training a single many-to-many model from scratch.

Classification of Multi-class Vowels and Fricatives From Patients Having Amyotrophic Lateral Sclerosis with Varied Levels of Dysarthria Severity

Chowdam Venkata Thirumala Kumar, Tanuka Bhattacharjee, Yamini Belur, Atchayaram Nalini, Ravi Yadav, Prasanta Kumar Ghosh

Dysarthria due to Amyotrophic Lateral Sclerosis (ALS) progressively distorts the acoustic space, affecting the discriminability of different vowels and fricatives. However, the extent to which this happens with increasing severity has not been thoroughly investigated. In this work, we perform automatic 4-class vowel (/a/, /i/, /o/, /u/) and 3-class fricative (/s/, /sh/, /f/) classification at varied severity levels and compare the performance with manual classification (through listening tests). Experiments with speech data from 119 ALS and 40 healthy subjects suggest that both manual and automatic classification accuracies decrease with increasing dysarthria severity, reaching 59.22% and 61.67% for vowels and 41.78% and 38.00% for fricatives, respectively, in the most severe cases. While manual classification is better than automatic classification for all severity levels except the highest-severity case for vowels, the difference between the two gradually shrinks as severity increases.

Transfer Learning to Aid Dysarthria Severity Classification for Patients with Amyotrophic Lateral Sclerosis

Tanuka Bhattacharjee, Anjali Jayakumar, Yamini Belur, Atchayaram Nalini, Ravi Yadav, Prasanta Kumar Ghosh

A major challenge in automatic dysarthria severity classification for patients with Amyotrophic Lateral Sclerosis (ALS) is the difficulty of building a speech corpus large enough to train accurate and generalizable classifiers. To overcome this constraint, we employ transfer learning approaches, specifically fine-tuning from an auxiliary task and multi-task learning. Input feature reconstruction and gender classification, on the same ALS speech dataset or on other healthy speech corpora, are explored as the auxiliary tasks. We use temporal statistics of mel-frequency cepstral coefficients as features and dense neural networks for the primary and auxiliary tasks. Experiments suggest that transfer learning aids severity classification, with up to an 11.03% absolute increase in average classification accuracy compared to direct single-task learning. The improvement is attributed mainly to better classification of the mild class than of the severe/normal classes.

Relationship between LTAS-based spectral moments and acoustic parameters of hypokinetic dysarthria in Parkinson's disease

Jan Svihlik, Vojtěch Illner, Petr Kryze, Mário Sousa, Paul Krack, Elina Tripoliti, Robert Jech, Jan Rusz

Although long-term averaged spectrum (LTAS) descriptors can detect changes in the dysarthria of patients with Parkinson's disease (PD) due to subthalamic nucleus deep brain stimulation (STN-DBS), the relationship between LTAS variables and measures that relate to laryngeal physiology remains unknown. We aimed to find connections between LTAS-based moments and the main acoustic characteristics of hypokinetic dysarthria in PD as the response to STN-DBS stimulation changes. We analyzed reading passages of 23 PD patients in ON and OFF STN-DBS states compared to 23 healthy controls. We found a relation between the stimulation-induced change in several spectral moments and acoustic parameters representing voice quality, articulatory decay, net speech rate, and mean fundamental frequency. While the difference between PD and controls was significant across most acoustic descriptors, only the spectral mean and fundamental frequency variability could differentiate between ON and OFF conditions.

Which aspects of motor speech disorder are captured by Mel Frequency Cepstral Coefficients? Evidence from the change in STN-DBS conditions in Parkinson's disease

Vojtěch Illner, Petr Krýže, Jan Švihlík, Mário Sousa, Paul Krack, Elina Tripoliti, Robert Jech, Jan Rusz

Mel Frequency Cepstral Coefficients (MFCCs) have been one of the most popular speech parametrizations for dysarthria. Although the ability of MFCCs to capture vocal tract characteristics is well known, which aspects of dysarthria they reflect remains largely unexplored. We therefore investigated the relationship between key acoustic variables in Parkinson's disease (PD) and the MFCCs. 23 PD patients were recruited under ON and OFF conditions of Deep Brain Stimulation of the Subthalamic Nucleus (STN-DBS) and examined via a reading passage. Changes in dysarthria aspects were compared to changes in a global MFCC measure and in individual MFCCs. Changes in the 2nd and 3rd MFCCs tracked voice quality, while changes in the 4th to 9th MFCCs reflected articulation clarity. The global MFCC parameter outperformed individual MFCCs and acoustic measures in capturing changes between STN-DBS conditions. These findings may assist in interpreting outcomes from clinical trials and improve the monitoring of disease progression.

Whisper Features for Dysarthric Severity-Level Classification

Siddharth Rathod, Monil Charola, Akshat Vora, Yash Jogi, Hemant A. Patil

Dysarthria is a speech disorder caused by improper coordination between the brain and the muscles that produce intelligible speech. Accurately diagnosing the severity of dysarthria is critical for determining the appropriate treatment and for directing speech to suitable Automatic Speech Recognition systems. Recently, various methods have been employed to investigate the classification of dysarthria severity levels using advanced features, including STFT and MFCC. This study proposes using the encoder module of Web-scale Supervised Pretraining for Speech Recognition (WSPSR), also known as Whisper, for dysarthric severity-level classification via transfer learning. Whisper is an advanced speech recognition model trained on 680,000 hours of labeled audio data. The proposed approach demonstrated a high accuracy rate of 98.02%, surpassing the accuracies achieved with MFCC (95.2%) and LFCC (96.05%).

Automatic Classification of Hypokinetic and Hyperkinetic Dysarthria based on GMM-Supervectors

Cristian David Ríos-Urrego, Jan Rusz, Elmar Nöth, Juan Rafael Orozco-Arroyave

Hypokinetic and hyperkinetic dysarthria are motor speech disorders that appear in patients with Parkinson's and Huntington's disease, respectively. They are caused by progressive lesions or alterations in the basal ganglia. In particular, Huntington's disease (HD) is known to be more invasive and difficult to treat than Parkinson's disease (PD), producing more aggressive motor and cognitive alterations. Since speech production requires the movement and control of many different muscles and limbs, it constitutes a highly complex motor activity that may reflect relevant aspects of the patient's health state. This paper proposes discriminating between patients with PD, HD, and healthy controls (HC) based on different speech dimensions. Speaker models based on Gaussian mixture model supervectors are created from the features extracted for each speech dimension. The results suggest that it is possible to distinguish between PD and HD patients using the supervector-based approach.

Few-shot Dysarthric Speech Recognition with Text-to-Speech Data Augmentation

Enno Hermann, Mathew Magimai.-Doss

Speakers with dysarthria could particularly benefit from assistive speech technology, but are underserved by current automatic speech recognition (ASR) systems. The differences of dysarthric speech pose challenges, while recording large amounts of training data can be exhausting for patients. In this paper, we synthesise dysarthric speech with a FastSpeech 2-based multi-speaker text-to-speech (TTS) system for ASR data augmentation. We evaluate its few-shot capability by generating dysarthric speech with as few as 5 words from an unseen target speaker and then using it to train speaker-dependent ASR systems. The results indicated that, while the TTS output is not yet of sufficient quality, this could allow easy development of personalised acoustic models for new dysarthric speakers and domains in the future.

2024

Clustering approaches to dysarthria using spectral measures from the temporal envelope

Eugenia San Segundo Fernández, Jonathan Delgado, Lei He

Several clustering techniques were used to find subgroups of speakers sharing common characteristics within a sample of 14 dysarthric and 15 non-dysarthric speakers. Our classifying variables were five spectral measures computed from the temporal envelope of each of the four sentences read by the participants. The unsupervised k-means clustering algorithm showed that the optimal number of clusters in this dataset is two, with Cluster 1 matching almost exactly the dysarthric population and Cluster 2 the non-dysarthric population. As for the importance of each variable, a principal component analysis (PCA) revealed that centroid, spread, rolloff, and flatness contribute equally to the first component, while entropy contributes to the second component. Hierarchical agglomerative clustering further supported the separation into two main clusters (highlighting the relevance of these envelope-based measures for characterizing dysarthria), but also allowed us to detect possible subgroups within each main speaker group.
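The five classifying variables can be illustrated with textbook definitions computed on a toy magnitude spectrum. The authors' exact parameterization (rolloff threshold, entropy base, normalization) is not given in the abstract, so the choices below are assumptions.

```python
import math

def spectral_measures(freqs, mags):
    """Centroid, spread, 85%-rolloff, flatness, and entropy of a magnitude
    spectrum (all magnitudes must be > 0 for the flatness computation)."""
    total = sum(mags)
    p = [m / total for m in mags]                        # normalise to a pmf
    centroid = sum(f * w for f, w in zip(freqs, p))
    spread = math.sqrt(sum(w * (f - centroid) ** 2 for f, w in zip(freqs, p)))
    cum, rolloff = 0.0, freqs[-1]
    for f, m in zip(freqs, mags):                        # 85%-energy point
        cum += m
        if cum >= 0.85 * total:
            rolloff = f
            break
    geo_mean = math.exp(sum(math.log(m) for m in mags) / len(mags))
    flatness = geo_mean / (total / len(mags))            # geometric / arithmetic
    entropy = -sum(w * math.log2(w) for w in p if w > 0) # in bits
    return centroid, spread, rolloff, flatness, entropy

# A perfectly flat 4-bin spectrum: flatness 1.0, maximal entropy (2 bits).
c, s, r, fl, e = spectral_measures([100, 200, 300, 400], [1.0, 1.0, 1.0, 1.0])
```

Feeding vectors of such measures (one per sentence) to `sklearn.cluster.KMeans` with `n_clusters=2` would reproduce the kind of analysis described above.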

Use of Natural Anchors for Improving Rater Reliability in Dysarthria Assessment: An Exploratory Study

Thushani Munasinghe, Deepthi Crasta, Kaila L Stipancic, Mili Kuruvilla-Dugdale

The aim of this project was to determine whether the use of anchors improves interrater and intrarater reliability when nonexpert listeners rate five features salient to hypokinetic dysarthria: overall severity, reduced loudness, articulatory imprecision, short rushes of speech, and monotony. Fourteen nonexperts rated 82 sentences recorded from individuals with Parkinson's disease and healthy controls, using five separate equal-appearing interval (EAI) scales to indicate their perception of the five features. The listeners rated the samples twice, once without and once with external anchors. Interrater and intrarater reliability were calculated using intraclass correlation coefficients (ICCs). Findings revealed an overall increase in both interrater and intrarater reliability for most features in the anchor condition, except for monotony, where a decrease in single-measures ICC was noted for the anchor compared to the non-anchor condition. These preliminary findings highlight how external anchors can benefit interrater and intrarater reliability when rating perceptual dimensions of dysarthria.
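ICCs come in several forms; as an illustration, a one-way random, single-measures ICC(1,1) can be computed as follows. The study may well have used a different ICC variant, and the rating matrices below are invented.

```python
def icc_1_1(ratings):
    """One-way random, single-measures ICC(1,1).

    ratings: one row per rated sample, one column per rater.
    ICC(1,1) = (MSB - MSW) / (MSB + (k - 1) * MSW)
    where MSB/MSW are the between- and within-target mean squares.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum((x - m) ** 2
              for row, m in zip(ratings, row_means) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Perfect agreement between two raters gives an ICC of 1.0.
perfect = icc_1_1([[1, 1], [3, 3], [5, 5], [7, 7]])
noisy = icc_1_1([[1, 2], [3, 5], [5, 4], [7, 7]])
```

Comparing such coefficients between the anchor and non-anchor rating rounds is the core of the reliability analysis described above.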

Exploring Pre-trained Speech Model for Articulatory Feature Extraction in Dysarthric Speech Using ASR

Yuqin Lin, Longbiao Wang, Jianwu Dang, Nobuaki Minematsu

Most speech technologies serve typical speakers well but are less effective for speakers with dysarthria. Dysarthria is a motor speech disorder involving impairments in the process of speech production, so articulatory information is important for speech technologies targeting this group. However, articulatory features are difficult to extract because annotating articulation is challenging. Recent studies explored phonemic features in Wav2vec 2.0 pretrained speech models and found that they carry some articulation-related information. Building on this finding, this paper proposes DS-AAFE to extract more accurate articulatory features from the phonemic features of pretrained speech models. In DS-AAFE, partial articulatory features are isolated from phonemic features by being jointly optimized with ASR. Articulatory attribute detection is employed to evaluate the articulatory information in the proposed features, demonstrating a notable improvement in articulatory attribute detection accuracy. Furthermore, experiments on the UASpeech and TORGO dysarthria datasets showed that the proposed features improve ASR performance for dysarthric speech.

Enhancing Voice Wake-Up for Dysarthria: Mandarin Dysarthria Speech Corpus Release and Customized System Design

Ming Gao, Hang Chen, Jun Du, Xin Xu, Hongxiao Guo, Hui Bu, Jianxing Yang, Ming Li, Chin-Hui Lee

Smart home technology has gained widespread adoption, facilitating effortless control of devices through voice commands. However, individuals with dysarthria, a motor speech disorder, face challenges due to the variability of their speech. This paper addresses the wake-up word spotting (WWS) task for dysarthric individuals, aiming to integrate them into real-world applications. To support this, we release the open-source Mandarin Dysarthria Speech Corpus (MDSC), a dataset designed for dysarthric individuals in home environments. MDSC encompasses information on age, gender, disease types, and intelligibility evaluations. Furthermore, we perform comprehensive experimental analysis on MDSC, highlighting the challenges encountered. We also develop a customized dysarthria WWS system that showcases robustness in handling intelligibility and achieving exceptional performance. MDSC will be released on https://www.aishelltech.com/AISHELL_6B.

Exploring Syllable Discriminability during Diadochokinetic Task with Increasing Dysarthria Severity for Patients with Amyotrophic Lateral Sclerosis

Neelesh Samptur, Tanuka Bhattacharjee, Anirudh Chakravarty K, Seena Vengalil, Yamini Belur, Atchayaram Nalini, Prasanta Kumar Ghosh

We explore the discriminability among /pa/, /ta/, and /ka/ syllables, spoken during diadochokinetic (DDK) task, at varied severity levels of amyotrophic lateral sclerosis (ALS) induced dysarthria. Though DDK rate is known to decline with increasing severity, the extent to which the discriminability among the syllables gets impacted at each severity level is not well understood. We perform manual and automatic classification of these three syllables on 100 ALS and 35 healthy subjects. Manual classification is done through listening tests. Spectral and self-supervised speech cues with deep neural classifiers are used for automatic classification. Manual classification accuracies decline from 84.07% on healthy utterances to 27.41% on utterances of the most severe patients. Automatic methods are found to outperform humans achieving 15.93% and 50.37% higher accuracies (absolute), respectively. Thus, discriminative acoustic cues seem to persist among the syllables, which automatic methods capture.

DysArinVox: DYSphonia & DYSarthria mandARIN speech corpus

Haojie Zhang, Tao Zhang, Ganjun Liu, Dehui Fu, Xiaohui Hou, Ying Lv

This paper introduces DysArinVox, a new pathological speech corpus in Chinese. It includes 173 participants: 27 healthy individuals and 146 patients with voice disorders, whose types and severities of vocal impairment were diagnosed by speech pathology experts via auditory perceptual evaluations and laryngoscopic imagery. DysArinVox is designed to provide a high-quality Chinese resource for AI-driven diagnostics and prognostics. To ensure efficient corpus collection, we meticulously crafted recording scripts that represent Mandarin phonetically, ensuring comprehensive syllable coverage with minimal lexical complexity. Additionally, incorporating patients' laryngoscopic images into the dataset offers extra visual information, facilitating the development of advanced diagnostic frameworks. To our knowledge, this database represents the most comprehensive corpus of Chinese pathological speech to date.

Improving Speech-Based Dysarthria Detection using Multi-task Learning with Gradient Projection

Yan Xiong, Visar Berisha, Julie Liss, Chaitali Chakrabarti

Speech analytic models based on deep learning are popular in clinical diagnostics. However, constraints on clinical data collection and sharing place limits on available dataset sizes, which adversely impacts trained model performance. Multi-task learning (MTL) has been utilized to mitigate the effect of limited sample size by jointly training on multiple tasks that are considered to be related. However, discrepancies between clinical and non-clinical tasks can reduce MTL efficiency and can even cause it to fail, especially when there are gradient conflicts. In this paper, we enhance the performance of dysarthria detection by using MTL with an auxiliary task of learning speaker embeddings. We propose a task-specific gradient projection method to overcome gradient conflicts. Our evaluation shows that the proposed MTL paradigm outperforms both single-task learning and conventional MTL under different data availability settings.

CDSD: Chinese Dysarthria Speech Database

Yan Wan, Mengyi Sun, Xinchen Kang, Jingting Li, Pengfei Guo, Ming Gao, Su-Jing Wang

Dysarthria significantly impacts individuals' ability to communicate socially. Despite the widespread use of Automatic Speech Recognition (ASR), accurately recognizing dysarthric speech remains a formidable task, largely due to the limited availability of dysarthric speech data. To address this gap, we developed the Chinese Dysarthria Speech Database (CDSD), the most extensive collection of Chinese dysarthria data to date, featuring 133 hours of recordings from 44 speakers. Our benchmarks reveal a best Character Error Rate (CER) of 16.4%. Compared to the CER of 20.45% from our additional human experiments, Dysarthric Speech Recognition (DSR) demonstrates its potential to significantly improve communication for individuals with dysarthria. The CDSD database will be made publicly available at http://melab.psych.ac.cn/CDSD.html.

Training Data Augmentation for Dysarthric Automatic Speech Recognition by Text-to-Dysarthric-Speech Synthesis

Wing-Zin Leung, Mattias Cross, Anton Ragni, Stefan Goetze

Automatic speech recognition (ASR) research has achieved impressive performance in recent years and has significant potential for enabling access for people with dysarthria (PwD) in augmentative and alternative communication (AAC) and home environment systems. However, progress in dysarthric ASR (DASR) has been limited by the high variability of dysarthric speech and the limited public availability of dysarthric training data. This paper demonstrates that data augmentation using text-to-dysarthric-speech (TTDS) synthesis for fine-tuning large ASR models is effective for DASR. Specifically, diffusion-based text-to-speech (TTS) models can produce speech samples similar to dysarthric speech that can be used as additional training data for fine-tuning ASR foundation models, in this case Whisper. Results show improved synthesis metrics and ASR performance for the proposed multi-speaker diffusion-based TTDS data augmentation compared to current DASR baselines.

Automatic Assessment of Dysarthria using Speech and synthetically generated Electroglottograph signal

Fathima Zaheera, Supritha Shetty, Gayadhar Pradhan, Deepak K T

The formants are flat and dispersed in the short-term magnitude spectra (STMS) of dysarthric speech. This paper investigates the possibility of enhancing the performance of automated dysarthric assessment by exploiting the complementary perceptual cues present in the STMS of speech and a synthetically generated Electroglottograph (EGG) signal. To capture the complementary information in a single acoustic feature representation, the log Mel filterbank energies (LMFE) computed from both kinds of signals are averaged. The resulting LMFE is then used to compute Mel frequency cepstral coefficients (MFCCs). The analytical and experimental results presented on the UA-Speech corpus validate the efficacy of the proposed approach. For the x-vector-based automated dysarthric assessment system, the accuracy and F1 score improved from 73% and 64% to 78% and 71%, respectively, in a speaker- and text-independent mode when the MFCCs are computed from the averaged LMFE.
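The averaged-LMFE-to-MFCC step can be sketched as a direct DCT-II over averaged log mel energies. Filterbank construction and the paper's exact coefficient count are omitted, and the toy energy vectors below are invented for illustration.

```python
import math

def mfcc_from_averaged_lmfe(lmfe_speech, lmfe_egg, n_coeffs=4):
    """Average log mel filterbank energies from two aligned signals,
    then apply a DCT-II to obtain MFCC-style cepstral coefficients."""
    avg = [(a + b) / 2.0 for a, b in zip(lmfe_speech, lmfe_egg)]
    n = len(avg)
    # DCT-II: c_i = sum_j avg[j] * cos(pi * i * (j + 0.5) / n)
    return [sum(avg[j] * math.cos(math.pi * i * (j + 0.5) / n)
                for j in range(n))
            for i in range(n_coeffs)]

# Toy log-energies: the averaged vector is constant, so only c_0 is non-zero.
coeffs = mfcc_from_averaged_lmfe([1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0])
```

In the full pipeline, the LMFE vectors would come from mel filterbanks applied frame-by-frame to the speech and synthesized EGG signals before this averaging step.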
