Sensors (Basel). 2020 May 11;20(9):2734.
doi: 10.3390/s20092734.

Identification of Risk Factors Associated with Obesity and Overweight-A Machine Learning Overview

Ayan Chatterjee et al. Sensors (Basel).

Abstract

Social determinants such as the adverse influence of globalization, supermarket growth, rapid unplanned urbanization, sedentary lifestyle, economy, and social position gradually give rise to behavioral risk factors in humans. Behavioral risk factors such as unhealthy habits, improper diet, and physical inactivity lead to physiological risks, and "obesity/overweight" is one of the consequences. "Obesity and overweight" are among the major lifestyle diseases that lead to other health conditions, such as cardiovascular diseases (CVDs), chronic obstructive pulmonary disease (COPD), cancer, type II diabetes, hypertension, and depression. They are not restricted to any age group or socio-economic background. The World Health Organization (WHO) has anticipated that 30% of global deaths will be caused by lifestyle diseases by 2030, and this can be prevented with the appropriate identification of associated risk factors and behavioral intervention plans. Health behavior change should be given priority to avoid life-threatening damage. The primary purpose of this study is not to present a risk prediction model but to review various machine learning (ML) methods and their execution on sample health data, available in public repositories, related to lifestyle diseases such as obesity, CVDs, and type II diabetes. In this study, we targeted men and women older than 20 and younger than 60, excluding pregnancy and genetic factors. This paper qualifies as a tutorial article on how to use different ML methods to identify potential risk factors of obesity/overweight.
Although institutions such as the Centers for Disease Control and Prevention (CDC) and the National Institute for Clinical Excellence (NICE) work, through their guidelines, to understand the causes and consequences of overweight/obesity, we aimed to use data science to assess the correlated risk factors of obesity/overweight by analyzing existing datasets available in "Kaggle" and the "University of California, Irvine (UCI) database", and to check, with data-visualization techniques and regression analysis, how the potential risk factors change as the body-energy imbalance changes. Analyzing existing obesity/overweight-related data with machine learning algorithms did not produce any brand-new risk factors, but it helped us to understand: (a) how are the identified risk factors related to weight change, and how do we visualize this relationship? (b) what kind of data (potential monitorable risk factors) should be collected over time to develop our intended eCoach system for the promotion of a healthy lifestyle, targeting "obesity and overweight" as a future study case? (c) why have we used the existing "Kaggle" and "UCI" datasets for our preliminary study? (d) which classification and regression models perform better, according to the performance metrics, on datasets of correspondingly limited volume?
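Question (d) above, which classifiers perform better on a small dataset, can be illustrated with a minimal sketch. It assumes scikit-learn (listed in the keywords as Sklearn) and uses synthetic stand-in data rather than the Kaggle or UCI datasets; the helper name `compare_models`, the feature counts, and the hyperparameters are illustrative assumptions, not the authors' configuration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): NOT the Kaggle or UCI datasets.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=0
)

# Candidate classifiers; hyperparameters are illustrative only.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=0),
}


def compare_models(models, X, y, folds=5):
    """Return {name: (mean accuracy, std)} from k-fold cross validation."""
    results = {}
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=folds, scoring="accuracy")
        results[name] = (scores.mean(), scores.std())
    return results


results = compare_models(models, X, y)
for name, (mean, std) in results.items():
    print(f"{name}: accuracy {mean:.3f} +/- {std:.3f}")
```

Reporting the spread across folds alongside the mean matters on small datasets, where a single train/test split can make one model look better by chance.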

Keywords: BMI; Prisma; Sklearn; calibration; classification; data visualization; deep learning; discrimination; eCoach; gradient descent; hypothesis test; lifestyle diseases; machine learning; model performance; monitoring; normal distribution; obesity; overweight; python; regression; sensor data.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. Prisma flowchart for the article selection process [16].
Figure 2. The focused epidemiological study triangle [33].
Figure 3. Correlation heatmap and classification accuracy of ML models to classify “BMI” data.
Figure 4. Performance metric of “SVM” classification with 5-fold cross validation.
Figure 5. (a) Reliability curve to classify the “BMI” data with different ML classifiers. (b) Reliability curve to classify the “BMI” data with “Calibrated Decision Tree”.
Figure 6. Correlation heatmap and classification accuracy of ML models to classify “Insurance” data.
Figure 7. Relationship between “smoker” and “charges”.
Figure 8. Performance metric of “Decision Tree” classification with 5-fold cross validation.
Figure 9. Relationship between “age category” and “BMI”.
Figure 10. Relationship between “age category” and “charges”.
Figure 11. (a) Obese condition by smoking status; (b) distribution of obese patient group by smoking status.
Figure 12. (a) Reliability curve to classify “Insurance” data with different ML classifiers. (b) Reliability curve to classify “Insurance” data with the “Calibrated Decision Tree”.
Figure 13. Performance metric of the “Decision Tree” classification with a 5-fold cross validation.
Figure 14. (a) Reliability curve to classify the “Eating-health-module” data with different ML classifiers. (b) Reliability curve to classify the “Eating-health-module” data with the “Calibrated Decision Tree”.
Figure 15. (a) Relationship of the outcome (obesity) with blood glucose; (b) relationship of the outcome (obesity) with blood pressure; (c) relationship of the outcome (obesity) with age.
Figure 16. (a) Reliability curve to classify the “Diabetes” data with different ML classifiers. (b) Reliability curve to classify the “Diabetes” data with the “Calibrated LR”.
Figure 17. Correlation heatmap and classification accuracy of ML models to classify the “Cardiovascular-disease” data.
Figure 18. (a) Performance metric of the “SVM” classification with a 5-fold cross validation. (b) Performance metric of the “Logistic Regression” classification with a 5-fold cross validation.
Figure 19. (a) Reliability curve to classify the “Cardiovascular disease” data with different ML classifiers. (b) Reliability curve to classify the “Cardiovascular disease” data with the “Calibrated LR”.
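
Several of the captions above refer to reliability curves for uncalibrated and calibrated classifiers. The following is a minimal sketch of how such a curve could be computed with scikit-learn, assuming synthetic stand-in data and sigmoid (Platt) calibration; the calibration method, tree depth, and bin count are illustrative assumptions, since the captions do not state which settings were used.

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): NOT the datasets used in the article.
X, y = make_classification(
    n_samples=2000, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Uncalibrated decision tree.
tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_train, y_train)

# Same tree wrapped in sigmoid calibration, fitted with internal 5-fold CV.
calibrated = CalibratedClassifierCV(
    DecisionTreeClassifier(max_depth=5, random_state=0), method="sigmoid", cv=5
).fit(X_train, y_train)

curves = {}
for name, model in {"Decision Tree": tree, "Calibrated Decision Tree": calibrated}.items():
    prob = model.predict_proba(X_test)[:, 1]
    # calibration_curve returns (observed positive fraction, mean predicted
    # probability) per non-empty bin; plotting one against the other gives
    # the reliability curve.
    frac_pos, mean_pred = calibration_curve(y_test, prob, n_bins=10)
    curves[name] = (mean_pred, frac_pos, brier_score_loss(y_test, prob))

for name, (mean_pred, frac_pos, brier) in curves.items():
    print(f"{name}: {len(mean_pred)} bins, Brier score {brier:.3f}")
```

A well-calibrated model yields points close to the diagonal where the mean predicted probability equals the observed positive fraction; the Brier score condenses this into one number, with lower being better.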
