Monday, November 30, 2015

Dexcom's temperature compensation issues?

I wrote previously about the issues we had with warm baths and the predictive part of the Libre algorithm. You can have a look at what I wrote about warm baths in "Some Libre peculiar behavior you should know about: temperature" and "I Love the Libre when it misbehaves". Temperature compensation is a bit tricky for CGMs. To some extent, the Dexcom G4 non AP algorithm's averaging behavior, and the fact that packets were not received while the transmitter was submerged, masked the effect with the G4. One had to dig into the details of the Dexcom patents to see it was also affected (as expected, since it is a physico-chemical issue).

For some reason (I speculate, but don't have rock solid data yet), the G4 AP is clearly more sensitive to temperature changes than the non AP version. Here's a dump of some of the last two weeks' traces. In all cases, after the drop of signal that indicates submersion, we observe falsely elevated reported ISIGs (confirmed by BG tests). See for yourself.






At this point, I am simply reporting the fact and considering the hypothesis that it is indeed caused by the temperature increase. There could be another cause, for example water changing the conductivity between the two transmitter contacts (the resistance between them should be infinite). We'll try to run an additional test to invalidate the temperature hypothesis, but it will not be a cold bath. I love experiments, but will not torture the kid.



Thursday, November 12, 2015

Diabetic Autonomic Neuropathy - RR Intervals analysis part 2

Basic ECG Info.


When you record an ECG, you record the electrical activity generated by the heart's contraction. Each beat should produce something like this


The P wave represents the atrial depolarization (contraction) triggered by the autonomic system. The PR segment measures the time it takes for the impulse to travel from the atria to the ventricles. The QRS complex represents the ventricular depolarization (contraction) and the T wave represents the ventricular repolarization (relaxation). A beating heart produces (hopefully) a stream of PQRSTs. The autonomic "balance" controls the heart rhythm from a bundle of nerve tissue called the sinoatrial node (SA). Ideally, it is at the SA node that the autonomic triggers occur and where they should be measured, but that would require intra-cardiac electrodes... That is why the very characteristic tip of the R wave is used as a proxy. RR interval variability analysis is simply a fine-grained analysis of your base heart rhythm variation.

Normal vs Diabetic vs poorly controlled diabetic.


Let's start with the chart showing the main point of Clarke's paper about RR interval analysis in diabetics.
Non-diabetics have an SDNN - which stands for Standard Deviation of Normal (R) to Normal (R) intervals - between 50 and 100 milliseconds. The diabetic population as a whole shows a very different distribution, from 0 to 100 milliseconds, heavily skewed toward the sub-50-millisecond variability zone (with a possible outlier). Diabetics with proliferative diabetic neuropathy show a drastic reduction in RR variability. That looks simple enough...

But why the SDNN and not simply the SD? Well, even a normal healthy heart can offer some spectacular, but harmless, non-normal beats such as ventricular extrasystoles. At some point, your ventricle decides it has waited long enough and contracts spontaneously. Here is one of those PVCs (premature ventricular contractions) on my own heart. You have probably had those and called them "palpitations". Such a spectacular beat, however, introduces significant variability in your heart rhythm and must be excluded from the analysis. SDNN is not always SD. Add other potential rhythm troubles and you realize that whole recordings used in RR analysis must be reviewed by a cardiologist in order to exclude abnormal beats. At that point, the optimal method to "fill the blanks" also starts to matter. No extra beats allowed - that is the first pitfall to avoid when you are doing an RR analysis. (And if any MD happens to read this: yes, I know about my P wave.)


Here is the recording of Max's heart that I used for the rest of our tests. I could have used one of the many semi-arbitrary rejection filters found in the literature, or a manual beat-by-beat review. Depending on the threshold, 1 to 3 beats would have been rejected; I had visual doubts about one of them and went for a single reject.


How did I get that ECG? With an ECG machine, obviously. A somewhat amateurish Prince 180D. Does it matter? Yes and no.

Yes, it does matter because the consumer ECG machines on the market (very good site about consumer ECG machines) have one fundamental limit: their sampling frequency. The ECG signal is measured 150 times per second. To put things in perspective, a decent entry-level professional machine will take 960 samples per second... but will cost ten times as much. (There are also other differences: denoising and signal cleanup algorithms, number of simultaneous channels, etc.) Today, you wouldn't find a cardiologist who would consider 150 Hz sampling adequate.

But no, it does not matter because a lot of the RR interval analysis done in the late 80s and 90s was done at what would be considered an unacceptable sampling rate today. 128 Hz Holters were common, and ECGs were sampled at low frequencies: their signal only looked smooth because it was drawn by multiple moving pens...

Why worry about the sampling frequency? Because in order to find the exact timing of the R peaks, the signal must be processed. It is very hard, in a blog post, to detail that processing, but let's simply say that you have to slice and dice the signal until you manage to transform the peaks into zero crossings that you can accurately measure. One of the best known algorithms used for that purpose is the Pan-Tompkins algorithm (details here), and that is the one I used. Here is the result.


The red dots are the data points measured by the Prince 180D. The sampling frequency problem is obvious. Sampling 150 times per second gives enough resolution for most of the ECG, except the QRS. The trigger of the ventricular contraction happens in around 80 milliseconds. That means you'll only get about a dozen data points during a QRS... not exactly an optimal resolution. That's the "yes, it does matter" part.

However, the green lines are the tips of the Rs detected by the Pan-Tompkins algorithm. And they are extremely accurate. That's the "no, it does not matter" part. A higher sampling rate might provide a slightly more accurate result, but we don't need that for our purpose.
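For readers who want to experiment, here is a minimal Python sketch of a Pan-Tompkins-style detection chain: band-pass filter, derivative, squaring, moving-window integration, then peak picking. It is a deliberate simplification - the real algorithm adapts its thresholds continuously, while the fixed 0.4 threshold, the window sizes and the scipy-based filtering below are my assumptions, not the exact processing I ran.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def pan_tompkins_peaks(ecg, fs):
    """Simplified Pan-Tompkins R-peak detector.

    ecg: 1-D array of raw samples, fs: sampling frequency in Hz.
    Returns the sample indices of the detected R tips.
    """
    # 1. Band-pass 5-15 Hz to emphasize the QRS energy
    b, a = butter(2, [5 / (fs / 2), 15 / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, ecg)
    # 2. Derivative highlights the steep QRS slopes
    deriv = np.diff(filtered, prepend=filtered[0])
    # 3. Squaring rectifies the signal and accentuates large slopes
    squared = deriv ** 2
    # 4. Moving-window integration over ~150 ms
    win = max(1, int(0.150 * fs))
    mwi = np.convolve(squared, np.ones(win) / win, mode="same")
    # 5. Fixed threshold (the real algorithm adapts it) plus a
    #    300 ms minimum distance between detections
    cands, _ = find_peaks(mwi, height=0.4 * mwi.max(),
                          distance=max(1, int(0.300 * fs)))
    # 6. Refine each candidate to the local maximum of the
    #    band-passed signal, i.e. the R tip
    peaks = [max(range(max(0, c - win), min(len(ecg), c + win)),
                 key=lambda j: filtered[j]) for c in cands]
    return np.array(peaks)
```

Feed it the raw samples and the sampling frequency; the RR intervals then follow directly from the differences between consecutive peak indices.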

Sanity check: when I started this project, I had decided that I would be satisfied with an OK result given the confidence levels I could estimate (a couple of milliseconds) and that I would, of course, contact a specialized cardiologist to investigate further any abnormal results. I believe it is great to understand as fully as you can a clinical test or procedure, but one must be very careful not to overreach when it may matter.

Sampling frequency will matter more for spectral analysis (which I will possibly cover later), as explained here. So, remember, sampling frequency is the second pitfall to avoid (for "amateurs"). A decent RR interval analysis cannot be achieved with any ECG that samples below 100 Hz: no Arduino project, no cell phone without specialized hardware. Top-of-the-line heart rate monitors might work (I haven't tried) - RR interval analysis can also be used to detect overtraining.

The third dark area is the lack of standard protocols, reference values and consistent results in the literature. Detailed method descriptions have only recently become mandatory for publications, and a lot of the medical literature is very fuzzy in that respect. Some attempts have been made to improve the situation over the years, but the RR interval test isn't the recommended first test for autonomic neuropathy exploration (it is a possible subset of the "breathing test" part of the Mayo recommended test array). A detailed look at why RR analysis is so interesting and why it hasn't seen general use can be found in this great article: "Tests for early diagnosis of cardiovascular autonomic neuropathy: critical analysis and relevance."


But there is more! As we have previously seen, the autonomic balance is in a constant state of flux. It defines the ability of our cardiovascular system to react to the thousands of events in our lives. The downside is that variability is affected by a lot of things: did you climb stairs to visit your doctor? Are you stressed by the examination protocol? Annoyed at having to partly undress? Cold? Did you have coffee before the test? Insulin? All these factors and dozens of others can significantly modify your RR variability test.

And, of course, age does matter! There are a lot of variables to take into account.

Some results

To conclude that RR analysis part 2, here are some of Max's results with a few comments.

File Size: 100240 bytes. It contains 10 pages
Hardware Version: 2.6
ECG Recorded on: 14/8/2015 at 20:38:0
Total ECG run time: 300.0 seconds
Number of samples: 45000 (150 Hz)

The Prince 180D ECG file format is totally non-standard; some minor reverse engineering was required. If there is some interest, I will detail it in another post.

1st pass analysis
---------------------
Detected Heart Beats : 402
Average FC (run length) : 80.4
Average FC(RR) : 80.48

First pass with Pan-Tompkins. His resting heart rate is a bit higher than usual, probably because I had to run after him to organize the test and the novelty of it added some excitement. A lower heart rate would have increased the SDNN somewhat.

Cleaning artifacts
-----------------------
1 beats rejected. Reject List: [187] ...

One beat was rejected on the criterion that it introduced an RR interval outside 75% to 125% of the average RR of the surrounding beats. Visually, it looks like a normal beat, but it is a bit noisy from an electrical point of view. As I said, consumer ECG denoising isn't optimal.
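For illustration, the 75%-125% rule described above can be sketched as follows. The five-interval neighborhood is an assumption on my part - the exact window used isn't detailed here.

```python
import numpy as np

def reject_ectopic(rr, window=5, lo=0.75, hi=1.25):
    """Flag RR intervals outside [lo, hi] times the local average RR.

    The local average is computed over up to `window` intervals on
    each side, excluding the interval under test. Returns the list
    of indices to reject.
    """
    rr = np.asarray(rr, dtype=float)
    rejects = []
    for i in range(len(rr)):
        a, b = max(0, i - window), min(len(rr), i + window + 1)
        # neighbors: the surrounding intervals, current one excluded
        neighbors = np.r_[rr[a:i], rr[i + 1:b]]
        local = neighbors.mean()
        if not (lo * local <= rr[i] <= hi * local):
            rejects.append(i)
    return rejects
```

On a run of ~750 ms intervals with one short ectopic coupling interval, only that interval is flagged; the normal beats around it stay within the 75%-125% band.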

Removing Extra beats
--------------------
Duration of clean run 297.272 secs

Preparing RR Data
-----------------
MRR (mean of RR intervals) : 745.18 msec
RMS Intervals : (RMSSD) 43.4575816574 msec
SDNN (standard deviation of normal to normal): 58.1472062957 msec
NN50, pNN50 (86, 21.55388471177945) n, %

SDNN is what we are after here. With a 58 +/- 2 msec SDNN, Max would be in the lowest bucket of the normal population and in the top half of the diabetic population. In other words, at this stage, no indication of autonomic diabetic neuropathy.
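For those who want to reproduce these numbers on their own recordings, here is a sketch of the time-domain indices reported above, computed from a list of clean normal-to-normal intervals in milliseconds. One assumption: I use the sample standard deviation for SDNN; the literature is not always explicit about sample versus population formulas.

```python
import numpy as np

def hrv_time_domain(rr_ms):
    """Time-domain HRV indices from clean NN intervals (ms)."""
    rr = np.asarray(rr_ms, dtype=float)
    diffs = np.diff(rr)
    nn50 = int(np.sum(np.abs(diffs) > 50))
    return {
        "MRR": rr.mean(),                       # mean RR interval
        "SDNN": rr.std(ddof=1),                 # SD of NN intervals
        "RMSSD": np.sqrt(np.mean(diffs ** 2)),  # RMS of successive differences
        "NN50": nn50,                           # successive diffs > 50 ms
        "pNN50": 100.0 * nn50 / len(diffs),     # same, as a percentage
    }
```

Run on the cleaned RR series, this returns the same MRR / SDNN / RMSSD / NN50 / pNN50 quantities shown in the output dump above.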

Final words for this part

There are tons of other interesting things you can do in RR interval analysis. Here is again a striking example of what it can achieve (Yang, 2006).


and the same visualization for Max's recording, which shows a healthy spread (but indicates he isn't totally well rested - again, another story).

For the astute reader: look at Max's Poincaré plot below, centered on 750 ms / 750 ms, then look at the plot above. Can you guess Max's age? Correct, almost 15.
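Plotting each RR interval against the next one is all a Poincaré plot is. A standard way to summarize such a plot numerically is the SD1/SD2 pair: SD1 is the spread perpendicular to the identity line (short-term variability), SD2 the spread along it (long-term variability). This is a generic technique, not necessarily what produced the plots shown here; a minimal sketch:

```python
import numpy as np

def poincare_sd1_sd2(rr_ms):
    """SD1/SD2 descriptors of the Poincaré plot RR[n+1] vs RR[n]."""
    rr = np.asarray(rr_ms, dtype=float)
    x, y = rr[:-1], rr[1:]
    # rotate the cloud 45 degrees: (y - x) is the axis perpendicular
    # to the identity line, (y + x) the axis along it
    sd1 = np.std((y - x) / np.sqrt(2), ddof=1)  # short-term variability
    sd2 = np.std((y + x) / np.sqrt(2), ddof=1)  # long-term variability
    return sd1, sd2
```

A perfectly regular rhythm collapses to a single point (SD1 = SD2 = 0), while a beat-to-beat alternation stretches the cloud perpendicular to the identity line (SD1 large, SD2 small).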


The next part (whenever it comes) will cover more advanced analysis methods, a great professional free software that I used to double check my results and possibly some info on reversing the binary file format of the Prince 180D ECG and its interpretation.




Sunday, November 8, 2015

Comparing the "nonAP" Dexcom G4 with the "505 AP" Dexcom G4

I am delaying, once again, the RR variability ECG post: while it was probably one of the most enjoyable things I did, both from a "minor hacking" and from a learning point of view, it probably doesn't interest many readers.

G4 "non AP" vs the G4 "505 AP"


As you probably know if you follow the CGM world, the Dexcom G4 is currently available with two different algorithms.

The first one, which for convenience I will call the "non AP", can be found in all G4 receivers sold outside the US and in all "pediatric" receivers currently sold in the US. It is the original algorithm the G4 used when it was first released.

The second one, which I will call the "505", can be found in the firmware-updated original receivers, the adult "share" receivers and, of course, the newly released G5.

It is widely assumed that the "non AP" algorithm relies on an average of the previous secondary raw values it has received, while the "505" algorithm is, at least in part, influenced by the collaboration Dexcom had with Padua University and tends to use secondary raw values more aggressively than the "non AP" algorithm, at least when they are marked clean...

Why only now?

 

Since I live in Belgium, the "505" algorithm has not yet officially been made available to us: it will be, when the G5 hits our shores, which could be any time soon... or not. To be honest, I could and probably should have made this comparison earlier. By the way, I want to take this opportunity to thank reader "D." who offered to send me a "505" share receiver as soon as it was released (but I was busy with the Libre back then), reader "J." who offered to send G5 sensors for my "mad scientist experiments", and the many readers who offered tips on how to bypass the restrictions and re-flash the non-US G4 firmware to the latest version. The T1D communities I have joined are full of wonderful people.

So why take the bait now? The first reason is that reader "K."'s offer to send me a G4 Share came at the right moment... exactly when Dexcom was releasing the G5. The second reason is that I don't plan to upgrade (or should I say downgrade) to the G5 any time soon, or at least until I have no choice. While the G5 runs the 505 algorithm, it would, at this point, be a step forward and three steps back from my point of view.

Test Setup and Limitations

 

  • We decided to run a "non AP" and a "505" version side by side. Max was extremely reliable during the test and both receivers were with him at all times. There's a very small difference in the number of packets received (data below).
  • Both receivers were started at the same time and have been calibrated, per manufacturer's instructions, at the exact same second using both hands to press OK simultaneously.
  • Limitation: we did not use our non standard - but for us optimal - calibration strategy. That strategy has served us well with the "non AP" but I wasn't sure it would help the "505" as much. We skipped it to keep the field as even as possible.
  • The sensor we used lasted the whole seven day period but will not be remembered as the best sensor we ever had. Post insertion, it showed quite a bit of that oscillation/secondary level noise on data marked as clean. 
  • Our ISIG profile during the period probably wasn't a typical Type 1 diabetic profile. That may have limited the benefit we derived from the "505".
  • While I try to do my best not to make unsubstantiated claims and present only data I have enough confidence to use for myself, keep in mind I have one subject and one glycemic profile. I exclude data I could not defend (see the accuracy comment below), I go through a ton of double checks, confidence factor calculations and other goodies on my data set but my goal is NOT to publish rock solid authoritative stuff. I just want to look at things less subjectively than what is seen in the average user report.

 

 Results 

 

!!! important note: when reading the charts, keep in mind that the data is interpolated every 30 seconds between actual data points !!!
The results given by both algorithms were extremely close. The Pearson correlation coefficient for the whole period was 0.97559. Most of the discrepancy came from the first day post-insertion, where the "505" would happily display "clean" but jumpy data whereas the "non AP" would average the jumps out. Here is a zoom on one such event.

The "505" may actually have spotted the rise sooner than the "non AP", but it spoiled its advantage by tracking the jumpy secondary raw data too closely because it was marked "clean".

Here is the whole period display.
The first thing to note is that, when calibrated at exactly the same time, the two algorithms will produce results that are extremely close. Except for the startup issues shown above, we could not find a single situation where one algorithm became so confused that it differed markedly from the other.

The second aspect I wanted to look at was the speed of reaction: how much faster is the "505" compared to the "non AP"? In a similar comparison, the Libre beat the G4 "non AP" by no less than 9 minutes (which correlates well with the Dexcom-reported "G4 vs YSI" data and the Abbott-reported "Libre vs YSI" data). This type of comparison, or time delay determination, is usually done by shifting one signal relative to the other and finding the best correlation.
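The shift-and-correlate idea is simple enough to sketch in a few lines of Python. This is a generic illustration, not my exact script; it assumes both traces have already been interpolated onto the same 30-second grid, as mentioned earlier.

```python
import numpy as np

def best_shift(leading, lagging, step_min=0.5, max_shift_min=10.0):
    """Estimate the lag between two CGM traces.

    Advance the lagging series sample by sample and keep the shift
    that maximizes the Pearson correlation with the leading series.
    Both series share the same grid; step_min is the grid step in
    minutes (0.5 = 30 seconds). Returns (delay_minutes, correlation).
    """
    leading = np.asarray(leading, dtype=float)
    lagging = np.asarray(lagging, dtype=float)
    best_delay, best_r = 0.0, np.corrcoef(leading, lagging)[0, 1]
    for k in range(1, int(max_shift_min / step_min) + 1):
        # drop the first k samples of the lagging trace, i.e. shift
        # it k steps earlier in time, then re-correlate
        r = np.corrcoef(leading[:-k], lagging[k:])[0, 1]
        if r > best_r:
            best_delay, best_r = k * step_min, r
    return best_delay, best_r
```

Applied to two traces where one is a pure 3-minute-delayed copy of the other, the correlation peaks exactly at a 3.0-minute shift, mirroring the "Maximum correlation ... with delay 3.0 mins" output shown at the end of this post.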

On average over this test, the "505" beat the "non AP" by only 3 minutes (6 periods of 30 seconds). Cherry-picking periods of rapid changes, I could find runs where the optimal correlation time shift was 5.5 minutes. I could also find runs where it lagged (on average) by 30 seconds. I decided not to cherry-pick, as it is a slippery slope: selective cherry-picking could be used to demonstrate anything. Here is a zoom on a tennis afternoon where we kept falling and correcting as the exercise went on (keep in mind that in this zoom, there are again 10 points for each standard Dexcom data point). This particular exercise occurred 20 hours after insertion, and the jumpiness of the signal was still somewhat present. In some circumstances, such as the last fall and rise, the 505 clearly reacted more quickly and accurately. But the first peak is clearly a draw.



Here is another 23-hour period where the "505" algorithm is mostly (on average), but not always, ahead of the "non AP" one (on average, by one minute, or two 30-second readings).
 

Again, two points are worth noting:
  • as far as use in sports is concerned, the "505" is marginally better than the "non AP". However, that improvement is not nearly as spectacular as it is with the Libre and, in particular, the Libre spot checks (which I assume to be in part predictive). While we could play a whole tennis tournament using the Libre as our only BG measuring tool, we were forced to use BG tests to check that our re-carbing was sufficient during tennis training sessions.
  • the "data reality" differs markedly from our perception. I believe this is caused by the following subjective factor: if the "505" picks up a trend faster than the "non AP", it will remain ahead for the rest of the trend, and the user who compares will constantly be reminded that the "505" is ahead. That creates a positive reinforcement and distorts our perception a bit.
While I was a bit disappointed by the numbers - I expected something like a systematic 5-minute and an optimal 7.5-minute advantage for the "505" - let's keep in mind that every occasion where the 505 is ahead is a bonus for the user: knowing 5 minutes sooner that you are falling faster than you expected is significant; knowing 5 minutes sooner that your re-carbing out of a hypo worked is reassuring.

Accuracy

 

As I said above, our sensor wasn't a stellar performer (we averaged a 14% MARD with both algorithms, which is on the poor end of what we usually see). This is why I won't provide a detailed accuracy analysis, just impressions: in a "gut feeling but not statistically significant" way, I'd say that the "505" showed better accuracy in the low range (below 80 mg/dl) but worse accuracy in the high range (above 150 mg/dl). This is somewhat visible in the global view, where you can see the "non AP" climb above the "505" on several occasions: in all the cases we tested, the "non AP" was closer to the BG meter. Both the "non AP" and the "505" underestimated the BG value.

Closing Thoughts

 

The "non AP" approach is better for the possibly unstable conditions that characterize the post-insertion period. The "505" algorithm is generally better in all other cases, especially in the low range. It will, in most fast-changing conditions, flag a rise or a fall more quickly than the "non AP".

This being said, the 5-minute sampling frequency severely limits the CGM's usefulness for sports or any other activity where ups and downs are to be expected. While Dexcom seems to have cornered the market at the moment (late 2015), I can't help thinking about how much better the Libre sensor is...
  • the fact that the Libre can be factory calibrated means that, in most cases, Abbott is able to produce and characterize sensors more accurately and consistently than Dexcom.
  • the fact that the Libre is able to deliver a decent raw value, from close-to-off-the-shelf TI components, almost every minute, while Dexcom needs 5 minutes to deliver a secondary raw value that it does not always characterize accurately (jumpy secondary raw data marked as clean)
  • the fact that the Libre, in our experience, shows almost no drift until around the 12th day of its wear period
seem to indicate that, at the core, Abbott has a much better technology.

In the world of my dreams, Abbott would have a very good customer service, would tell customers that it steals their data upfront, would have no supply chain issues and would provide more user friendly remote data transmission.

About once a month, the conspiracy theorist in me wonders what exactly has been agreed between Dexcom and Abbott when they dropped their mutual lawsuits just before the Libre hit the market...

Even if Abbott is prevented by some agreement from going full CGM by itself, I still hope that when the supply chain issues are resolved, some slight Libre hardware modification could increase the transmission range somewhat, in a way that could be practically exploited by the community to develop what Abbott doesn't want to, or can't, put on the market...

Additional Data (and minimal comments)

 

Desync is equal to:   0:02:55
(clocks were desynchronized at start, packets were realigned)
New start after resync
G4 Non AP   :   2015-10-30 19:49:41
G4 505      :   2015-10-30 19:49:41
(let's force synchronization)
New 505 end after period selection :     2015-11-06 17:43:58
New nAP end after period selection :     2015-11-06 17:44:19
(sensor was stopped a bit early for a more convenient restart/reinsertion)
WTF, out of sync by 0:00:21
(receiver internal clocks drifted by 21 seconds over seven days - resynchronizing by 3 secs per day)
(a better approach would probably be to use identical time buckets, on the to-do list as impact is minimal)
Length of period 9954 mins - should have 1990 packets
Length of nAP 1939
Length of 505 1945
(nAP lost 51 packets, mostly water submersion during bath)
(505 lost 45 packets, same reason, no statistical difference)

Start Time: 2015-10-30 19:49:41
Correlation: (0.97559478434207947, 0.0)
505 Mean 106.337201125
G4 Mean 107.807866184
(mean values for the period almost identical, so are SD and other indices, not shown here)

(best correlation data shown below - the differences were so small that I switched to a 30 seconds resolution for the virtual sensor)
Shifting G4 non AP data by 0.5 minutes
Correlation: 0.97679314605
Shifting G4 non AP data by 1.0 minutes
Correlation: 0.9775384715
Shifting G4 non AP data by 1.5 minutes
Correlation: 0.978129017586
Shifting G4 non AP data by 2.0 minutes
Correlation: 0.978566125344
Shifting G4 non AP data by 2.5 minutes
Correlation: 0.978851137811
Shifting G4 non AP data by 3.0 minutes
Correlation: 0.978985400009
Shifting G4 non AP data by 3.5 minutes
Correlation: 0.978970258929
Shifting G4 non AP data by 4.0 minutes
Correlation: 0.978807063511
Shifting G4 non AP data by 4.5 minutes
Correlation: 0.978497164627
Shifting G4 non AP data by 5.0 minutes
Correlation: 0.978041884366
Shifting G4 non AP data by 5.5 minutes
Correlation: 0.977444434606
Shifting G4 non AP data by 6.0 minutes
Correlation: 0.97670490467
Shifting G4 non AP data by 6.5 minutes
Correlation: 0.975823208612
Shifting G4 non AP data by 7.0 minutes
Correlation: 0.974799259945
Shifting G4 non AP data by 7.5 minutes
Correlation: 0.97363297164
Shifting G4 non AP data by 8.0 minutes
Correlation: 0.972324256124
Shifting G4 non AP data by 8.5 minutes
Correlation: 0.970873025277
Maximum correlation  0.978985400009 with delay  3.0 mins