Introduction

I am a type 1 diabetic, and the biggest interruption to my day-to-day life is the onset of low blood sugar (hypoglycemia). The symptoms I experience most often when hypoglycemic include:

  • Fatigue
  • Shakiness
  • Sweating
  • Irritability
  • Fast heartbeat

If left untreated, hypoglycemia can lead to unconsciousness and death. Being unable to react to low blood sugar while asleep or away from emergency glucose supplies is a nagging worry for diabetics and their friends and family. Thankfully, continuous glucose monitoring technologies can alert diabetics to the onset of hypoglycemia, providing greater peace of mind.

I started using a continuous glucose monitoring device last year, and it has been life-changing. A sensor attached to my abdomen transmits a glucose reading to an app on my phone every 5 minutes. The app alerts me whenever my blood sugar goes too low or too high. It also plots my sugar levels over the past few hours, so I can see whether I am trending high or low and react accordingly. I receive far more data about my sugar levels than I ever did when testing with blood glucose meters, and I can easily export my data from the company’s website. This granularity and accessibility mean my doctor can better track patterns in my blood sugar and form a long-term plan for managing my health that works with my lifestyle.

Project Description

One feature my device lacks is hypoglycemia prediction. The app alerts me when my blood sugar is currently low, but it does not warn me that I am about to go low. So if I am not paying attention to the app, a low can catch me off guard. Even with immediate treatment, it can take me 15 to 60 minutes to recover from the sweaty, shaky, energy-sapping symptoms.

[Figure: low blood sugar and recovery]

For this project, I built a classifier that predicts whether my blood sugar will fall below 70 mg/dL 15 minutes into the future. That threshold is the typical level below which hypoglycemic symptoms arise, and it is the same threshold my app uses to trigger a low blood sugar alert. This classifier could be integrated with the app’s alert system to warn of incoming lows, allowing diabetics to react sooner and limit the time spent hypoglycemic.

  • Find the project on my GitHub

Exploratory Data Analysis and Feature Generation

Glucose readings are taken every five minutes. Data are missing only in the gaps between sensors: each sensor lasts ten days, and a new sensor takes two hours to calibrate, during which no data are transmitted. For a given point, I took the previous 30 minutes of data (6 points) and the point 15 minutes later (the value to be predicted). If no data were missing in this 45-minute window, I kept the example; otherwise I discarded it. I built the final dataset by sampling points that were at least 15 minutes apart from every other point, to limit the correlation between series in the set. Finally, the app reports blood sugar levels below 40 mg/dL as “LOW”, so I replaced every “LOW” with 40 to make all data numeric.

[Figure: example of a low to predict]

The final dataset contained 17,869 non-low and 502 low response values (about 2.7% lows), so the classes are heavily imbalanced. Thankfully, I don’t go low that often!
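The windowing and labeling rules above can be sketched in Python. This is a minimal illustration rather than the project’s code: it assumes the exported readings have already been placed on a regular 5-minute grid with None wherever the sensor sent nothing, and the greedy 15-minute spacing rule, the helper names, and the label (1 when the future value is below 70 mg/dL) are illustrative assumptions.

```python
from typing import List, Optional, Sequence, Tuple, Union

HISTORY_POINTS = 6     # 30 minutes of 5-minute readings
HORIZON_STEPS = 3      # 15 minutes ahead = 3 readings after the last history point
MIN_SPACING_STEPS = 3  # assumption: keep windows at least 15 minutes apart, greedily
LOW_FLOOR = 40.0       # the app reports anything below 40 mg/dL as "LOW"
LOW_THRESHOLD = 70.0   # hypoglycemia threshold used as the class boundary

Reading = Union[float, int, str, None]
Window = Tuple[List[float], float, int]


def to_numeric(value: Reading) -> Optional[float]:
    """Map the app's "LOW" string to 40 mg/dL; None marks a missing reading."""
    if value is None:
        return None
    if value == "LOW":
        return LOW_FLOOR
    return float(value)


def build_windows(readings: Sequence[Reading]) -> List[Window]:
    """Return (history, future_value, is_low) tuples from a 5-minute series.

    Assumes `readings` sits on a regular 5-minute grid with None wherever
    no data was transmitted (e.g. while a new sensor calibrates).
    """
    values = [to_numeric(v) for v in readings]
    span = HISTORY_POINTS + HORIZON_STEPS  # readings covered by one window
    windows: List[Window] = []
    last_kept = None
    for start in range(len(values) - span + 1):
        block = values[start:start + span]
        if any(v is None for v in block):
            continue  # a gap anywhere in the window: discard it
        if last_kept is not None and start - last_kept < MIN_SPACING_STEPS:
            continue  # too close to the previously kept window
        history = block[:HISTORY_POINTS]
        target = block[-1]
        windows.append((history, target, int(target < LOW_THRESHOLD)))
        last_kept = start
    return windows
```

Greedy spacing keeps the earliest valid window and skips anything starting within 15 minutes of it; random sampling under the same minimum spacing would fit the description equally well.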

I generated dozens of candidate features for predicting an incoming low. From these, I chose a final set of features that were not strongly correlated with one another (to avoid multicollinearity in the logistic and linear regression models), easy to interpret, and inexpensive to compute on real-time data.

Final feature list:

  • LAST_VALUE: the last value in the time series, the point 15 minutes before the value to be predicted.
  • SLOPE: the slope of the line fitted to the 6 points in the time series. A more negative slope means my blood sugar is dropping quickly.
  • R_SQUARED: the r-squared value of the fitted line. An r-squared close to 1 indicates that the line is a good fit to the data.
  • DIFF_SLOPE: the slope of the line fitted to the successive differences of the data. This approximates the data’s curvature. A more negative value means the rate of change is falling quickly, for example when my blood sugar rises and then starts dropping fast.
  • DIFF_R_SQUARED: the r-squared value of the fitted line to the successive differences.
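As a rough sketch, the five features can be computed from one 6-reading window with ordinary least squares. The helper names are hypothetical, and two details are assumptions the post does not specify: the x-axis is elapsed minutes (so SLOPE is in mg/dL per minute), and a perfectly flat series is given an r-squared of 1.0 rather than left undefined.

```python
from typing import Dict, Sequence, Tuple

STEP_MINUTES = 5.0  # spacing between CGM readings


def fit_line(y: Sequence[float], step: float = STEP_MINUTES) -> Tuple[float, float]:
    """Least-squares line through evenly spaced points; returns (slope, r_squared)."""
    n = len(y)
    xs = [i * step for i in range(n)]  # assumption: x is elapsed minutes
    x_mean = sum(xs) / n
    y_mean = sum(y) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (v - y_mean) for x, v in zip(xs, y))
    syy = sum((v - y_mean) ** 2 for v in y)
    slope = sxy / sxx
    # For a simple linear fit, r^2 is the squared Pearson correlation.
    # A flat series has no variance to explain; report 1.0 (assumption).
    r_squared = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, r_squared


def make_features(history: Sequence[float]) -> Dict[str, float]:
    """Compute the five features for one 30-minute window (6 readings)."""
    slope, r2 = fit_line(history)
    # Successive differences approximate the first derivative, so the slope
    # of a line through them approximates curvature.
    diffs = [b - a for a, b in zip(history, history[1:])]
    diff_slope, diff_r2 = fit_line(diffs)
    return {
        "LAST_VALUE": float(history[-1]),
        "SLOPE": slope,
        "R_SQUARED": r2,
        "DIFF_SLOPE": diff_slope,
        "DIFF_R_SQUARED": diff_r2,
    }
```

Each feature needs only a handful of sums over six numbers, which is what makes this set cheap to recompute every time a new reading arrives.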

[Figure: feature correlations]
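One simple way to screen candidate features for strong pairwise correlation is a greedy filter: walk the features in priority order and keep each one only if its Pearson correlation with every feature already kept stays below a cutoff. This is an illustrative sketch; the 0.8 cutoff, the ordering rule, and the function names are assumptions, not the settings used in the project.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either column is constant (assumption)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / sqrt(va * vb)


def select_uncorrelated(features: Dict[str, Sequence[float]],
                        threshold: float = 0.8) -> List[str]:
    """Keep features in insertion (priority) order, dropping any feature whose
    |correlation| with an already-kept feature reaches the threshold."""
    kept: List[str] = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

Because the filter is greedy, listing the most interpretable candidates first decides which member of a correlated pair survives.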

Hypoglycemia Prediction

I split the data 70/30 into training and test sets and trained logistic regression and random forest classifiers. Because of the class imbalance, I assigned class weights inversely proportional to class frequency. For the random forest classifier, I tuned hyperparameters to maximize recall (that is, to minimize false negative low predictions). Below is the resulting confusion matrix for each classifier. Recalls above 90% are achieved, but at the cost of a large number of false positives.

[Figure: classifier confusion matrices]
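Weighting classes inversely to their frequency is commonly done with the “balanced” rule, weight = n_samples / (n_classes * class_count), which gives every class the same total weight. The sketch below applies that rule in plain Python, along with the recall and precision calculations read off a confusion matrix; the function names are hypothetical, and the post does not state which exact weighting formula was used.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def balanced_class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) with 1 = low as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def recall_precision(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Recall = TP / (TP + FN); precision = TP / (TP + FP); 0.0 when undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


# Usage with the class counts reported above:
# the low class is weighted about 18.3, the non-low class about 0.51.
weights = balanced_class_weights([0] * 17869 + [1] * 502)
```

In scikit-learn, the same rule is available as class_weight="balanced" on both LogisticRegression and RandomForestClassifier, and GridSearchCV(..., scoring="recall") tunes hyperparameters for recall; the post does not say which library the project used.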

The high false positive rate is not too surprising, since the low glucose threshold is ‘fuzzy’ and somewhat arbitrary. The vast majority of false positives have true glucose values between 70 and 90 mg/dL. In addition, even when my blood sugar is not low 15 minutes after the window in question, it often goes low 20 or more minutes later. As a result, I personally find the false positive rate acceptable, since the predictions still provide useful information about where my blood sugar is heading. However, other users may be put off by the frequency of false alarms.

[Figure: example false positive]
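The observation that most false positives fall between 70 and 90 mg/dL is straightforward to check on the test set. A minimal sketch, assuming predictions are 0/1 flags aligned with the true future glucose values; the function name and the inclusive 70 to 90 mg/dL band are assumptions.

```python
from typing import Sequence


def false_positive_share_in_band(y_pred: Sequence[int],
                                 future_values: Sequence[float],
                                 lo: float = 70.0,
                                 hi: float = 90.0,
                                 threshold: float = 70.0) -> float:
    """Fraction of false positives whose true future value lies in [lo, hi].

    A false positive is a predicted low (1) whose true value is at or above
    the threshold. Returns 0.0 when there are no false positives.
    """
    fp_values = [v for p, v in zip(y_pred, future_values) if p == 1 and v >= threshold]
    if not fp_values:
        return 0.0
    in_band = sum(1 for v in fp_values if lo <= v <= hi)
    return in_band / len(fp_values)
```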

How does this result compare to other baselines? I tried two. The first fit a second-degree polynomial to each 30-minute glucose series and used the fit to predict the value 15 minutes later. The second built a linear regression model from the generated features to predict my future blood sugar, achieving an r-squared of 0.97. Both methods caught fewer true lows than the classifiers but produced fewer false positives. Linear regression outperformed polynomial fitting on every metric analyzed.

[Figure: polynomial regression fit]
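The polynomial baseline can be sketched as an ordinary least-squares quadratic fit to the six readings, extrapolated three readings (15 minutes) past the last one and then compared against 70 mg/dL. This minimal stdlib version solves the 3x3 normal equations with Cramer’s rule; using reading indices as the x-axis and the helper names are assumptions, not details from the project.

```python
from typing import List, Sequence

LOW_THRESHOLD = 70.0  # mg/dL


def _det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def quadratic_forecast(history: Sequence[float], steps_ahead: int = 3) -> float:
    """Fit y = a + b*x + c*x^2 by least squares and extrapolate steps_ahead readings."""
    xs = list(range(len(history)))  # assumption: x is the reading index (0..5)
    s = [sum(x ** p for x in xs) for p in range(5)]  # s[p] = sum of x^p
    t = [sum((x ** p) * y for x, y in zip(xs, history)) for p in range(3)]
    # Normal equations: sum_j s[r + j] * coef_j = t[r] for r = 0, 1, 2.
    normal = [[s[0], s[1], s[2]],
              [s[1], s[2], s[3]],
              [s[2], s[3], s[4]]]
    d = _det3(normal)
    coeffs = []
    for col in range(3):  # Cramer's rule: swap column `col` for the right-hand side
        m = [row[:] for row in normal]
        for r in range(3):
            m[r][col] = t[r]
        coeffs.append(_det3(m) / d)
    a, b, c = coeffs
    x_future = xs[-1] + steps_ahead
    return a + b * x_future + c * x_future ** 2


def predicts_low(history: Sequence[float], steps_ahead: int = 3,
                 threshold: float = LOW_THRESHOLD) -> bool:
    """Flag a low if the extrapolated value falls below the threshold."""
    return quadratic_forecast(history, steps_ahead) < threshold
```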

Since catching true lows was my priority, I kept the classifiers for the final implementation, as they had the best recall. However, other users may prefer a better balance between false positive and false negative low predictions. For them, tuning the random forest’s hyperparameters on a metric that reflects that balance, or using linear regression, may be more appropriate. The linear regression loss function could also be customized to penalize inaccurate low predictions more harshly. Moreover, linear regression is the better choice when the forecast glucose values themselves are worth reporting.
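One way to make a regression loss penalize inaccurate low predictions more harshly is to up-weight the squared error of missed lows, where the true value is below 70 mg/dL but the forecast is not. The sketch below is a hypothetical loss, not something from the project; miss_penalty is an assumed tuning knob.

```python
from typing import Sequence

LOW_THRESHOLD = 70.0  # mg/dL


def low_weighted_mse(y_true: Sequence[float], y_pred: Sequence[float],
                     miss_penalty: float = 10.0,
                     threshold: float = LOW_THRESHOLD) -> float:
    """Mean squared error that multiplies the error of each missed low.

    A sample is a missed low when the true value is below the threshold
    but the prediction is at or above it.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        err = (t - p) ** 2
        if t < threshold <= p:
            err *= miss_penalty  # forecasting "fine" during a real low costs extra
        total += err
    return total / len(y_true)
```

Because the penalty depends on which side of the threshold a prediction lands, this loss has no closed-form solution like ordinary least squares and would have to be minimized numerically.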

Summary

Low blood sugar prediction is important for diabetics who want to minimize the time they spend with hypoglycemic symptoms. Data from continuous glucose monitoring devices can be used to generate features that are good predictors of impending low blood sugar. These features are interpretable and cheap enough to compute in real time. Future work should investigate how far ahead an incoming low can be predicted, and whether accurate predictions at longer horizons would require more data and features.