Thanasis Tsanas

Professor of Digital Health and Data Science | Director of Knowledge Exchange and Research Impact

Background

Thanasis originally trained in Biomedical Engineering and completed a PhD in Applied Mathematics at the University of Oxford (2012). Since 2022 he has held the Chair (Full Professor, tenured) in Digital Health and Data Science at the Usher Institute, University of Edinburgh.

He received the Andrew Goudie award (top PhD student across all disciplines, St. Cross College, University of Oxford, 2011), the EPSRC Doctoral Prize award (2012) as one of only 8 Oxford PhD students across 11 departments, the young scientist award (MAVEBA, 2013), the EPSRC Statistics and Machine Learning award (2015), and the BIOSTEC/Biosignals Best paper award (2021). He is Co-founder of the NHS Digital Academy, the first national digital health informatics leadership programme, where he led one of the six core Themes ('Clinical Decision Support and Actionable Data Analytics', 2018-2022). Outputs of his work have been used by the NHS and industrial partners including Intel Corporation, LSVT Global, Roche, Abbott, GSK, Aculab, and Mirador Analytics.

He was a Turing Fellow (2021-2024) and a Bayes Innovation Fellow (2023-2024). He is a Founding Fellow of the Academy for the Mathematical Sciences, a Fellow of the Royal Society of Edinburgh, and a Fellow of the Royal Society of Medicine (exceptionally considered due to his significant contributions to Medicine).

Qualifications

BSc, BEng, MSc, PhD, FHEA

Open to PhD supervision enquiries?

Yes

Areas of interest for supervision

Please check my research group website for further information, including for openings: https://www.darth-group.com

Research summary

My research focuses on developing novel tools for data mining and extracting domain information through time series analysis, signal processing, and statistical machine learning. I have worked primarily on applications in healthcare, in particular neuroscience and mental disorders. For example, my PhD work focused on using speech to (a) differentiate healthy controls from people with Parkinson’s disease, and (b) replicate a Parkinson’s disease symptom severity metric. Other applications of my work on speech-related projects include voice forensics, and even attempting to understand mouse communication by processing their vocalisations.

I have also developed generic tools for feature selection, that is, identifying a robust subset of characteristics which is jointly maximally predictive of the outcome of interest. This can have diverse applications in cases where multiple characteristics or genes are collected and we are interested in focusing on the most likely subset of them for further processing. Furthermore, I have developed a robust information fusion scheme which aggregates multiple sources (or experts) to get the best outcome when the sources (or experts) do not agree on the best diagnosis or course of action: under certain assumptions, I have shown it can outperform the single best expert in the cohort.
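
As a toy illustration of why combining experts can help, and of why the assumptions matter, the sketch below computes the accuracy of a plain majority vote over binary experts. This is not the fusion scheme described above: it is a generic textbook calculation, and the function name, the assumption that experts err independently, and the accuracy values are all hypothetical choices made for the example.

```python
from itertools import product


def majority_vote_accuracy(accuracies):
    """Probability that a strict majority of experts is correct.

    Assumes (hypothetically) a binary decision and experts whose
    errors are statistically independent of each other.
    """
    n = len(accuracies)
    total = 0.0
    # Enumerate every pattern of correct/incorrect experts.
    for outcome in product([True, False], repeat=n):
        if 2 * sum(outcome) > n:  # strict majority is correct
            prob = 1.0
            for correct, p in zip(outcome, accuracies):
                prob *= p if correct else (1.0 - p)
            total += prob
    return total


# Three equally reliable experts: the vote beats any single expert.
equal_experts = [0.7, 0.7, 0.7]
fused_equal = majority_vote_accuracy(equal_experts)

# One strong expert and two weak ones: the vote falls below the best expert.
unequal_experts = [0.9, 0.6, 0.6]
fused_unequal = majority_vote_accuracy(unequal_experts)

print(f"equal experts:   best single = {max(equal_experts):.3f}, vote = {fused_equal:.3f}")
print(f"unequal experts: best single = {max(unequal_experts):.3f}, vote = {fused_unequal:.3f}")
```

With three independent experts who are each correct 70% of the time, the majority vote is correct about 78.4% of the time. With one expert at 90% and two at 60%, the unweighted vote drops to about 79.2%, below the best expert, which is why an outperformance claim only holds under stated assumptions.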

More recently, I have been working on developing algorithms to mine data collected from wearable sensors such as smartwatches. The aim is to provide an efficient, robust, objective way to characterise activity, sleep, and circadian rhythm patterns and assist people in daily living or monitor patient groups through treatment. Mining the data collected from these wearable sensors along with data collected from mobile phones offers an unprecedented opportunity to monitor longitudinal patterns at a large scale, and can help revolutionise contemporary healthcare.

Key results of my work have been featured by news outlets such as Reuters, ScienceDaily, and MedicalXpress, and one of my research papers has been highlighted as a “key scientific article” in Renewable Energy and global innovations.

Current research interests

For research work, projects, and interests please visit https://www.darth-group.com/projects

For an up-to-date list of publications, see:

https://scholar.google.com/citations?user=61e_0BIAAAAJ

For a list which I try to keep reasonably up to date:

https://www.darth-group.com/publications

This is a collection of datasets that I have used in my research and made freely available. Please cite the relevant papers if you use these datasets in your research. The datasets have also been deposited in the standard UCI Machine Learning Repository.

Parkinson’s telemonitoring

[MATLAB]          [Excel]          UCI

This study looked into the problem of mapping dysphonia measures (speech signal characteristics) to a standard clinical metric of Parkinson’s disease symptom severity. The dataset comprises 5875 samples and 16 features to predict a real-valued response (regression problem). It can also be used as a multi-class classification problem if the response is rounded to the nearest integer. More details can be found in my IEEE Transactions on Biomedical Engineering 2010 paper. Please include the following citation if you use it in your work:

======================================================================================================================================================================

Energy efficiency

[MATLAB]          [Excel]          UCI

This study looked into the problem of assessing heating load and cooling load (that is, energy efficiency) as a function of some building parameters. The dataset comprises 768 samples and 8 features to predict two real-valued responses (regression problem). It can also be used as a multi-class classification problem if the response is rounded to the nearest integer. More details can be found in my Energy and Buildings 2012 paper. See also this Supplementary Material with additional information. Please include the following citation if you use it in your work:

======================================================================================================================================================================

LSVT Voice rehabilitation

[MATLAB]          [Excel]          UCI

This study uses 309 speech signal processing algorithms to characterize 126 signals from 14 individuals collected during voice rehabilitation. The aim is to replicate the experts’ assessment denoting whether these voice signals are considered “acceptable” or “unacceptable” (binary classification problem). More details can be found in my IEEE Transactions on Neural Systems and Rehabilitation Engineering 2014 paper. Please include the following citation if you use it in your work:

======================================================================================================================================================================

Sustained vowels /a/ with F0 ground truth

[Zipped data]

The accurate estimation of the fundamental frequency (F0) is a well-known, challenging problem in the speech signal processing research community. Unfortunately, it is difficult to obtain objective ground truth values with contemporary approaches, which rely on electroglottography (EGG). Here, we used a sophisticated, state-of-the-art physiological model of voice production to construct sustained /a/ vowels for which the exact ground truth F0 values are known. We benchmarked 10 established F0 estimation algorithms, and proposed a novel fusion approach to further improve F0 estimates. We would like to encourage researchers to use this database when evaluating F0 estimation algorithms, so that results in this application can be benchmarked. More details can be found in my IEEE Transactions on Neural Systems and Rehabilitation Engineering 2014 paper. (Note that here I am providing 130 *.wav files, and the ground truth values are provided in an Excel spreadsheet.) Please include the following citation if you use it in your work: