Wednesday, August 2, 2017

Clean, but shorter, Dexcom G4 (505) Freestyle Libre comparison

Since my previous post has triggered a few private reactions, here’s another comparison on a fairly standard situation, with clean data: clocks are in perfect synchronisation, there are climbs (pre-game carb loading) and falls, including a severe low (delayed hypo).

On the left, the data as downloaded. On the right, the data shifted for the best correlation (which basically means that the Dexcom data is rolled back in time to erase the delay). That post-mortem analysis is both realistic and a bit unfair to the Dexcom. Realistic because the Libre raw data matches historical data quite well. A bit unfair because the Libre only provides delayed and adjusted historical data. Adjusted relative to what? The spot checks. As I have shown many times on this blog, spot checks are typically even faster than the Dexcom in practice, with the drawback that they are really inaccurate at times, especially on the high side.

lookingatsensors

In this case, the best correlation is found with a shift of 5-6 minutes (Libre ahead of the Dexcom by 5-6 minutes). This is fairly typical of what we see with the Libre vs the 505, when everything works well for both sensors. That’s the tricky part in practice, of course: adhesion issues and desynchronisation between insertions (i.e. comparing a fresh Dexcom to a Libre in its second week) all play a role.
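For readers who want to reproduce the "best correlation" shift on their own data, here is a minimal sketch. It is not the code behind the charts in this post: it assumes both traces have already been resampled onto a common one-minute grid, and the names `pearson`, `best_shift` and the synthetic `glucose` curve are made up for the illustration.

```python
import math

def pearson(a, b):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def best_shift(leading, lagging, max_shift):
    """Roll `lagging` back by 0..max_shift samples and keep the shift
    that gives the highest correlation with `leading`."""
    best_lag, best_r = 0, -2.0
    for lag in range(max_shift + 1):
        n = len(leading) - lag
        r = pearson(leading[:n], lagging[lag:lag + n])
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r

# Synthetic data (assumption): one sample per minute, second trace 6 minutes late.
def glucose(t):
    return 120 + 40 * math.sin(t / 30)

libre = [glucose(t) for t in range(300)]
dexcom = [glucose(t - 6) for t in range(300)]
lag, r = best_shift(libre, dexcom, 15)
print(lag, round(r, 4))
```

On real data the maximum is much flatter than on this toy sine wave, which is one reason to quote a range such as 5-6 minutes rather than a single number.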

Broadly speaking, the sensors see the same thing. The 505 data is a bit more bumpy: that is a consequence of the adaptive 505 algorithm and, of course, of the smoothing introduced by the Libre historical data.

One important point: as you can see in the left Bland Altman plot, two well working sensors can show very significant differences based on timing and rate of change.
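As a reminder of what a Bland Altman plot actually shows, here is a small sketch of the underlying numbers: for each pair of readings, the mean of the two devices goes on the x axis and their difference on the y axis, with the average difference (bias) and the usual ±1.96 SD limits of agreement drawn as horizontal lines. The function name and the sample values are illustrative assumptions, not data from this comparison.

```python
import statistics

def bland_altman(device_a, device_b):
    """Return per-pair means and differences, the bias, and the
    95% limits of agreement (bias +/- 1.96 * SD of the differences)."""
    means = [(a + b) / 2 for a, b in zip(device_a, device_b)]
    diffs = [a - b for a, b in zip(device_a, device_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return means, diffs, bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# Made-up paired readings in mg/dL (assumption), first device reading higher on highs.
libre = [100.0, 150.0, 200.0, 250.0]
dexcom = [98.0, 145.0, 190.0, 235.0]
means, diffs, bias, limits = bland_altman(libre, dexcom)
print(bias, limits)
```

When the two traces are offset in time, any fast rise or fall turns the delay into a large vertical spread on this plot, even if both sensors are working well.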

Regardless of the absolute magnitude of the differences, a consistent behavior emerges: the Libre overshoots highs compared to the Dexcom and undershoots lows to a lesser (absolute) extent. This type of behavior could be the consequence of the calibration slope of the BGM used to calibrate the Dexcom, but we have observed the same behaviors with different BGMs (Menarini Glucomen LX, Roche Accu-Chek Mobile, Abbott’s Libre BGM). If you are interested in that behavior, the 2014 and 2015 posts on this blog provide additional insight.

The third screen is a log/log plot privately suggested by L. and is basically a Bland Altman on steroids that amplifies the visualization of the differences in behavior in a way that is less dependent on absolute differences. (I am sure I will be corrected if I didn’t get that right).


Beautifying the data


Now, let’s look at the old Clarke plot of the Dexcom vs the Libre (yes, I know, Clarke plots are out of fashion, but I have had the function for ages, so why not…).

First the un-shifted data plot.

beforeshift


Quite decent match, you would not have killed yourself by relying on either device.

Now, the delay corrected data plot.
aftershift

Isn’t that something? We have gained almost 8% in the A zone.

Now, this doesn’t mean anything in absolute terms. For all we know, the Dexcom could have been right and the Libre could have been overshooting. Only one thing is certain: the delay.
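For context, the "A zone" percentage can be computed along these lines. This is a sketch using the commonly quoted Clarke definition of zone A (estimate within 20% of the reference, or both values below 70 mg/dL), not the plotting function used for the charts above. Treating one sensor as the "reference" is of course an assumption of convenience here, and the sample values are invented.

```python
def in_zone_a(reference, estimate):
    """Clarke zone A as commonly defined: within 20% of the reference,
    or both readings in the hypoglycemic range (< 70 mg/dL)."""
    if reference < 70 and estimate < 70:
        return True
    return abs(estimate - reference) <= 0.2 * reference

def zone_a_percent(references, estimates):
    """Percentage of paired readings that fall in zone A."""
    pairs = list(zip(references, estimates))
    hits = sum(1 for ref, est in pairs if in_zone_a(ref, est))
    return 100.0 * hits / len(pairs)

# Invented pairs in mg/dL (assumption); only the 100 vs 125 pair misses zone A.
reference = [60, 100, 150, 200, 250]
estimate = [65, 125, 160, 230, 240]
print(zone_a_percent(reference, estimate))
```

Running the same function before and after shifting one trace by the estimated delay is all it takes to move the zone A figure by several points.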

But this tells us something else: it is extremely easy to tweak test results to your liking. Something as simple as asking patients to test 2 hours after a meal vs asking them to test 1.5 hours after a meal, something seemingly as innocuous as using standard meals or standard sport sessions can have a drastic impact on the numbers. In a market where T1D fanboys love to argue about the 1% MARD advantage of their sensor (while at the same time losing 10% MARD or more through home made hacks), a couple of percent of differences can mean a huge amount of good publicity…

Tuesday, August 1, 2017

Non clean Dexcom vs Libre comparison

Real life has interfered – that would probably be a good “psychological burden of chronic disease” post if I were in the mood – and, while the blog hasn’t been updated, it isn’t dead yet.
Here’s a new comparison between the Libre and the Dexcom 505. Unlike one of the previous comparisons posted here, this one is utterly “unclean”. In short:
  • this was a tennis tournament week, with frequent games.
  • Max forgot to scan with the Libre, or simply forgot the Libre reader. The straight green lines are those no data periods.
  • both sensors were on the arms: we experienced several adhesion issues and patched as we went.
  • variability is much higher than usual because we were “pre-loading” a bit for games (not very useful, but better than starting too low anyway) and experienced severe delayed hypos on a couple of occasions, despite minimal Levemir doses (5U / 24 hours).
In other words, ultra messy real life…

ERRATUM: G4 505 vs Libre - legend copy paste error. Thanks to KS for spotting it.
image
While I would not draw too many conclusions out of such an awful data set, some comments

It is good to have backup. We lost a Dexcom sensor almost at once (not shown here) and the Libre started dangling after a few days. Interestingly, the Libre started to read a bit too low and sensing delay increased a lot. The yellow marker on the above chart marks the near sensor loss moment. When Max noticed (or paid attention), we used a bit of opsite to stabilize the sensor and normal operation resumed.

The Libre remains, in general, faster than the Dexcom 505 algorithm, and even more so if one looks at spot checks (with the drawback that those can be off when the trend changes suddenly). We now have a year or so of side by side data and experience and the result is always the same. Yes, on occasions the Dexcom will pick up a trend before the Libre does (as reflected in historical data) but I don’t remember seeing it picking up a trend before Libre spot checks. Depending on the data set, the optimal correlation between the two signals consistently gives a 6 to 10 minutes advantage to the Libre.

Note: I am not really that interested in collecting additional very clean data. In order to make a rigorous comparison, we need to sync the device clocks on a regular basis, we need precise reference points such as “timecode” BG tests, we need mechanically stable sensors, reminders to scan at least once every 8 hours, etc… All of this adds to the management burden of a teen T1D and that is something I don’t really need.

In practice, that speed advantage needs to be taken with some caution:
  • the Libre historical data is computed and corrected a posteriori (as shown here). It is not useful in real time.
  • the Libre spot checks are typically faster than historical data, but the delay compensation (combined with the so-so temperature compensation) often introduces overshoots.
Still, the Libre remains our favorite sensor for sports.
Excluding the excursions introduced by the interpolation, the Bland Altman plot is relatively flat. Still I wouldn’t draw any conclusion in terms of absolute slopes/biases because the G4 505 depends to a large extent on the calibration it receives (the nasty non linearity of the original G4 has been reduced in the current sensors/algorithm combo).

I realize quite a few issues I addressed here need a more detailed discussion, more data and detailed examples. Please treat this post as a simple keep-alive ping.

Thursday, May 11, 2017

Just a "standard" situation...

For some reason, even though we have a fairly strict rotation routine when it comes to Max's Levemir injection, we are now often confronted with situations where the slow acting insulin seems to fail to act... I do not have a clear explanation for that: Max doesn't seem to skip his injection and there's no site/situation/meal/physical activity that I can correlate the rises with.

Anyway, here's such a situation, but also an illustration of many of the practical issues we face.

green segment: flattish around 100 mg/dl with a couple of mild compressions, no big deal.

By the way, a word about compressions: I often read very specific descriptions of compressions (transient sensor attenuations) in the T1D forums and groups. The compression should be abrupt, deep, and should end with a rebound. That is partly true: a major compression may indeed unfold that way. But in practice, the compressions we detect and visually confirm can take almost any form. They can be partial and lead to fairly minor attenuations with no rebounds. They can be masked, as is almost the case here, by a simultaneous increase. Be open, observe and learn: you may encounter compression lows, but also compression steady states or even compression highs (where the compression attenuates the ongoing rise).

third compression: that one is a major PITA. While it is detected, it masks, in a plausible way, the rise that is happening at that moment.

compression exit: the trend starts to appear. But we need a few packets to make sure it is not one of those post-compression rebounds we see now and then. Unfortunately, another mild compression confuses the situation even more (and at that point, the compression detection algorithm, lacking a clear trend, has given up).

correction: the trend is now clear. Since we have seen such situations get out of hand quickly, the time has come for a quick Libre and blood check (see below): the Libre reports 230 mg/dl. The Roche Accu-Chek reports 225 mg/dl. The Dexcom still lingers at 160 mg/dl, one arrow up.

effect: as expected, around 6 packets later, the correction effect shows up.

Here's what the BG Meter and the Libre showed. Disregard time differences: both the BGM and the Libre are still running on winter time and both have drifting clocks. The actual time is 01:20 for everything.


A couple of comments on the sensors and accuracy.
  • the Dexcom is running the G4 share 505 algorithm. The sensor is 5 days old.
  • the Dexcom has been calibrated with the Roche Accu-Chek BGM used here.
  • the Dexcom is on the right arm.


  • the Libre is on day 12 of its life cycle.
  • that particular Libre sensor has been eerily accurate through the session.
  • the Libre is on the left arm.


I could be tempted to blame the Dexcom and praise the Libre and, to be honest, to some extent, I do.

However


  • this is the ideal situation for the Libre "delay compensation" algorithm. None of the fancy factors where it goes a bit crazy are present.
  • the Libre hasn't been compressed.
  • this Libre sensor has been noticeably better than average (MARD of 5% vs Accu-Chek over the whole period, but not enough data to be statistically significant).
  • that Dexcom sensor has been underperforming a bit for reasons that I can't be certain of.


And what about the correction?

I hit hard. Very hard. Based on our experience, when the Levemir injection seems to fail, EGP can spiral out of control (we did get our first ever 400 mg/dl on such an occasion). I used about 2.5 times more insulin than I would use to correct that trend in daytime.

There's always a bit of anxiety when using such a relatively high dose (8U) in the middle of the night. I do want to avoid the yo-yo situation where I have to correct a low later. And, at first, the huge drop after the plateau isn't reassuring. What if the fall accelerates? That is always a question that lingers.

As it turns out "insulin resistance", or EGP, or a mix of both is so high in those circumstances that the situation should evolve well. But that is an opinion based on our fuzzy experience and gut feeling, not a computable one, if only because the previous nights were OK and we have no definite idea about the current insulin sensitivity level.

As you can see, the trend settles quickly.

And even if I am usually very confident with my decisions, I will lose a few hours of sleep, keeping an eye on the situation just in case... and write this blog post to kill time.

Sunday, February 19, 2017

“Zero Carb” day on a non T1D person

I have already posted a few non diabetic CGM/FGM response patterns to food and exercise and even made my 2014 complete 14 days run available on this blog. In this very quick post, I will simply share the result of a full “strict zero carb” day on a non diabetic (your servant, now almost 54yo). A few BGM test strips were spent to ensure the CGM/FGM was working perfectly. The minimum of 62 mg/dL probably wasn't actually reached: it came during what definitely looks like a prolonged compression.
image
I felt a bit dizzy around 15:00. My urinary ketones were positive at 16:00.

I will stubbornly avoid discussing my opinions on that type of diet, short term or long term, in adults or kids. A comprehensive review of its issues and merits can be found here (pdf) on the paleomom blog

Tuesday, February 7, 2017

Libre Clinical Study and discussion


The blog has slowed to a crawl, I apologize. The reasons behind my relative silence are


  1. Max has reached the tender age of 16. That means that teen issues and behaviors have become more common, impacting his control and our mood. I believe every T1D or T1D caregiver can relate to that situation, which means I will leave it at that. Our latest HbA1c, a week ago, was still 5.5% but I believe this will be one of the last times we’ll see values below 6%. I will try not to despair, as there definitely are trade-offs one must accept if a kid is to have a semi-normal adolescence.
  2. We are going through an extensive remodeling of our environment and that takes time.
  3. rant alert
    Finally, as much as I hate to write this, I have lost interest in most open-source, community driven projects. I think I need to qualify that statement a bit before I get a lot of flak. As far as making data accessible everywhere and anywhere, I am still extremely grateful to the community as a whole, and especially the core members of the Nightscout project who made that data conveniently and cheaply available. The open source, or semi-open source community is great at developing features that actual T1D and T1D caregivers need or want. What really deeply annoys me, however, is how little attention is paid to the delivery of accurate results. Adding a new display device, check. Adding new minor features or screens, check. Accuracy, not so much. Assuming one wants to deliver accurate results from raw data, there is a bit more to it than jumping from a single point calibration to another, or calculating an arbitrarily constrained slope. Occasionally, two open or semi-open source solutions are compared: they show a 50 mg/dL difference, possibly absurdly amplified by the lever effect of a bad slope, devices are rebooted, restarted and the community moves on. That is not to say that I would do better, or that I privately do better, at least in a way that is applicable to a general population; it is precisely because I am aware of the potential issues that I decided not to inflict my experiments on innocent bystanders. On top of that, in the Libre world, the “semi-open source” approach, consisting of an incomplete github source dump that often misses all the computation parts, irritates me. Don’t think for a minute that those effectively closed source solutions are hiding some miraculous sauce: they aren’t. The reason for the omission is often that they simply want to hide how they turn a very nice sensor like the Libre into something that behaves and performs like a second generation Medtronic sensor…
    end rant alert


The study


Let’s now have a look at the study, recently published in the British Medical Journal, “An alternative sensor-based method for glucose monitoring in children and young people with diabetes.” which you can download here.



The work was sponsored by Abbott: they were involved in the planning, the funding and the provision of devices used in the study. Apart from the possible cherry picking of the sensors used in the study and a slight cherry picking of the competitors’ studies cited, I did not spot any obvious red flag. The population studied was a set of 4-17 yo children and teens that, according to the additional data (for example 7.6% mean HbA1c), seems to be a bit better controlled than the average population, since 75% of that normal population does not meet the 7.5% target. Such a small bias may have had some impact on the study (more on this below) but it is probably because the authors of the study deliver better than average care.

The conclusions of the study were, in short, MARD vs SMBG (capillary) 13.9% in that population (vs 11.4% in a previous adult study) and 99.4% in the AB zone of the CEG. That is in line with the reported accuracy of the Dexcom G4 505 in some studies, although Dexcom likes to focus on its best study exclusively.

The general conclusion was that the device could be trusted, was well accepted and, usual scientific caveat, could be beneficial long term. Well, there is nothing groundbreaking here, we all knew that, didn’t we? The benefit of that study is to be found elsewhere: respected researchers and clinicians, a fair number of cutaneous adverse effects (unlike in some previous studies), a protocol that does not smell of manipulation – all of that will drive acceptance and add arguments for funding and full coverage.

Some personal comments




We do consistently get better accuracy than what the study reported on average. This is probably attributable to the fact that our “bad” weeks were 80% in range, our “good“ weeks were 90% in range while the population studied only stayed 50% in range. Incidentally, as a non T1D, when I ran sensors we had purchased in France on myself, I stayed at an 8% MARD for 12 days before the sensor started to drift. Variability, and the more frequent and usually rapid change of range it implies, definitely affect the CGM accuracy numbers.

The “acceptance” part of the study is very positive for Abbott. Again, we all know that. In fact, despite the overwhelming satisfaction expressed by the participants in the study, I believe the benefits to be understated. I always come back to our tennis experience on that issue: being able to play a full tennis tournament on a single daily SMBG check (as opposed to 10 to 15 checks per match) was just amazing. This was due both to the general accuracy of the device and to its delay which was, in our carefully documented experience, 9 minutes shorter than the Dexcom G4 delay. For us, the Libre wasn’t merely a well accepted replacement, it changed our experience of T1D for the better.

On the delay side, the authors of the paper note “no delay”. This is really where I want to nitpick a bit. There definitely is a delay (quite visible in RAW data at stable temperature). It is simply partially compensated and partially obfuscated by the behavior of Abbott’s algorithm.

It is extremely visible in chart B of the paper
image

As you can see, the sensor is – on average, note this is MRD not MARD – essentially perfect in stable or near stable conditions. The most significant relative differences occur in dynamic conditions and in the same direction.
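The MRD vs MARD distinction matters for reading that chart, so here is a quick sketch of both. MRD keeps the sign of each relative difference, which lets positive and negative errors cancel out; MARD takes the absolute value first. The function names and the four sample pairs are illustrative assumptions, not values from the paper.

```python
def mrd(reference, sensor):
    """Mean relative difference in percent (signed: errors can cancel)."""
    rel = [(s - r) / r for r, s in zip(reference, sensor)]
    return 100.0 * sum(rel) / len(rel)

def mard(reference, sensor):
    """Mean absolute relative difference in percent (errors never cancel)."""
    rel = [abs(s - r) / r for r, s in zip(reference, sensor)]
    return 100.0 * sum(rel) / len(rel)

# Invented pairs in mg/dL (assumption): the sensor is off by 10% each time,
# twice too high and twice too low.
smbg = [100.0, 100.0, 200.0, 200.0]
cgm = [110.0, 90.0, 220.0, 180.0]
print(mrd(smbg, cgm), mard(smbg, cgm))
```

A near-zero MRD in stable conditions therefore says the sensor is unbiased there, not that every reading is spot on, while a clearly non-zero MRD in dynamic conditions shows errors that consistently point in one direction.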

In other words, when you are falling quickly, the Libre trails the fall and reads higher (probably missing some hypos), almost as a non delay compensated CGM would do. When you are rising quickly, the Libre leads and overshoots the rise (overestimating some hypers).
This is a behavior we noticed immediately (see here, here and here for some of our 2014 reports) and have consistently observed since.

I believe, just as I believed in 2014, that this is mostly the result of the Abbott delay compensation algorithm. It is not necessarily a failure of the algorithm (although looking at the raw data it appears it could be improved) but possibly a conscious decision by Abbott, either based on a technical issue such as a possibly lower signal to noise ratio in low ranges, or based on physiological issues they have identified in the BG to IG dynamics on falls.

I am of course quite happy and a bit proud to have identified the issue in 2014, while remaining aware our test population was n=2.

One last point on the delay issue is that the authors noted that the granularity of their time measurement was 5 mins. Timing issues are really critical as far as delay computations are concerned, which is why when we tested SMBGs vs Libre we always used immediate spot checks (because that is what matters to the patient) and I had to programmatically resynchronize the clocks on each check (both the Libre and our BG Meter had drifting internal clocks). I used the same constant resynchronization technique with the Libre vs Dexcom comparison in order to maximize accuracy. Ballpark figures give a 15 minutes delay on the Dexcom G4; with a 9 minutes advantage for the Libre, you end up with a six minutes average delay for the Libre vs SMBG (confirmed by our Libre vs SMBG tests in slow rises and slow drops), which would be hard to demonstrate with a 5 minutes granularity, especially if the comparison is not versus spot checks but versus inferred values from the 15 minutes averages.

Last comment: in absolute terms, you should keep in mind that the MARD given in that paper is most probably Libre CGM vs Libre BGM (or other Abbott BGM) and might be a bit biased as the same fundamental decisions have obviously driven the design of both devices. I do like that bias myself as the use of different BG meters would have muddied the algorithmic issue even further and would probably have required a set of Bland Altman plots to debias/detrend the data.

Apologies if I sound obsessed by speed issues, but as far as we are concerned, that was and probably remains (until a full CGM is available one way or another) the defining advantage of the Libre versus the Dexcom G4 or Dexcom G4 505.

Monday, January 9, 2017

It’s always compression, duh!

Let’s be honest, I’ve been waiting for a moment like this one, where algorithms trump my quick visual assessment.

Here’s the situation on a relatively standard scale: can you spot anything? Don't cheat and look below.

compression1

21:00 Max starts climbing slowly. Because he had a small hypo earlier, the light evening meal, mostly protein (salmon) he took starts showing up.
21:30 recalibration, the Dexcom was a tad too high, the Libre (on the other arm) was spot on at 108 mg/dL. The decision is made to take no action because we know that the 20:30 injection will start pushing BG back down a bit, roughly 3-4 hours after it was given (20:15 in this case).
Sure enough, we seem to be on a small downtrend starting around 23:15, for a 155 mg/dL high. Yes, that is not ideal, but at some point we have to consider the trade-off between undisturbed sleep and perfect BG. Today, undisturbed sleep was the intent.
At first sight, this slope still looks like a mild downtrend, with a bit of noise.
However, this is what I get in another view: my compression detection algorithm has triggered!

compression2

Interesting… Time to look at the decision parameters

Parenthesis: my compression algorithm isn’t wildly different from what has been published in the literature. I developed it independently in 2015, as a toy project. In a nutshell, the algorithm examines the last few hours available (at least an hour, although I can fine tune the parameters conveniently), assesses noise and overall trend, and builds “confidence” on those values. It’s a bit of a cookbook of hacks and rules. For example, the SD of a detrended trend gives a good indicator of the current “meta” noise level in the signal: a drop caused by a compression should, obviously, be larger than the SD of that detrended signal by some factor (one that I tuned based on experience and visual assessment). It also goes without saying that the delta must be negative. On top of that, a few rules have been added here and there for experimentally observed special cases.
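To make the detrended SD rule a bit more concrete, here is a deliberately naive sketch of that single rule. It is not the actual algorithm described above: the linear detrending, the `factor` and `min_drop` parameters and their values, and the sample window are all assumptions chosen for illustration, and none of the confidence building or special-case rules are included.

```python
import statistics

def linear_fit(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def looks_like_compression(window, new_value, factor=3.0, min_drop=3.0):
    """Flag `new_value` when it falls below the extrapolated trend by more than
    `factor` times the SD of the detrended window (hypothetical thresholds)."""
    slope, intercept = linear_fit(window)
    residuals = [y - (intercept + slope * x) for x, y in enumerate(window)]
    noise_sd = statistics.pstdev(residuals)      # "meta" noise level
    expected = intercept + slope * len(window)   # where the trend points next
    drop = expected - new_value                  # positive only if the delta is negative
    return drop > max(factor * noise_sd, min_drop)

# One hour of 5-minute packets, flat and quiet (invented values, mg/dL).
hour = [150, 149, 151, 150, 149, 150, 151, 150, 149, 150, 151, 150]
print(looks_like_compression(hour, 140), looks_like_compression(hour, 150))
```

The quieter the preceding hour, the smaller the drop needed to trip the rule, which is why a very mild dip can be flagged on a smooth trace but not on a noisy one.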

At 00:53, the new value enters the “hour buffer” which happens to have an extremely high level of confidence. Note that the algorithm did not have that level of confidence at 23:23, post peak, where the hourly trend was less clear and a bit of noise (maybe a transient compression) pushed the detrended SD a bit higher.

That being said, the case isn’t settled at this point and I decide to zoom on the chart and go up and check. Max is indeed leaning on his Dexcom, and not leaning on his Libre. The Dexcom, which had been tracking the Libre after recalibration is actually 10 points below.

The acid test is of course to move Max a bit from leaning on his Dexcom to not leaning on either device. Here is a zoom on what happened: the very mild compression recovered almost immediately.


compression3

  • Instead of being slightly down, the trend is actually either stable or very slightly trending up. Knowing this allows me to push a 1 U correction (being extremely conservative here in order to avoid any hypo risk).
  • the scale at which we are looking at our CGM signal impacts our perception and our assessment of a situation (which is one of the reasons I developed my own “in-the-cloud” visualization, which I can tweak and zoom to my liking).
At this point, I can already hear the dissenters saying “How in the world can you tell it is a compression on such a small variation ? My Dex can be off by xx points or jump around”

Good question: let’s answer this methodically
  • the custom “artificially intelligent” algorithm says so ;-)
  • the Libre says there was no drop.
  • I have confirmed visually that Max was sleeping on his Dexcom side.
  • I have confirmed that, by relieving the compression, the sensor recovers as expected and resumes cruising.
  • yes, I have no idea whether the “real” level is actually 137, 127 or 147 mg/dL but it does not matter: the relative change does matter.
  • yes, there are situations where the Dexcom is too noisy, the trend is unclear and the decision is ambiguous, if possible at all.
but when the Dexcom (or Libre) is tracking smoothly, there is very little variation in the signal, or detrended signal if in a clear trend. It is that consistency, when the signal is good, that allows Dexcom executives to claim their technology is already much better than BG Meters. That is a statement I can totally agree with… until real life interferes (micro traumas, compressions, failing sensor, encapsulation…) and, of course, except for the fact that the baseline Dexcom values depend to a large extent on the performance of your BG Meter.

Anyway, this blog post is almost live because this must be the first time an algorithm tells me something I may have not noticed. 3 years ago I had a quick look at Neural Networks and AI but, while I got them to tell me interesting things and issue decent predictions, they never told me anything I wouldn’t have noticed or predicted by myself. That one is a first!

Ah, and one more thing – let me reassure all the hydrophiles out there, no glass of water was harmed in this experiment.

Wednesday, January 4, 2017

Pre-bolus: rationale and examples

 

I am always surprised at the number of patients or caregivers who are either unfamiliar with or afraid of the “pre-bolus” technique. “Pre-bolusing” means injecting insulin 25, 20, 15, 10 minutes before a meal rich in carbohydrates. While I am sure this post will be very basic for most of the readers of this blog, I feel it could be useful for occasional readers.

What is the reason behind pre-bolusing?

It is very simple. Ideally, you want your insulin injection to match as closely as possible the insulin secretion of a non T1D person. Unfortunately, this is not possible with insulin that is injected subcutaneously. When you are not diabetic, a meal will trigger an immediate insulin secretion in the bloodstream (in some studies, the mere thought of a meal was enough to trigger an insulin secretion). As a T1D, the insulin you inject (or push through your pump) lands in the peripheral subcutaneous tissue and needs to be picked up. That takes a while. This is not speculation, not something “one needs to consider” – it is a fact and it has been extensively studied.

In the graph below, you can see the three essential differences between a physiological response and an injection. (source: recovered data from https://www.ncbi.nlm.nih.gov/pubmed/26041603) Note: these activity curves are always a bit approximate in terms of absolute activity as, even in healthy volunteers, clamp studies are a bit imprecise. Model curves don’t take some practical parameters into account. What matters is the notion of delay (due to absorption and transport) of the injected insulin peak and, later, the residual tail which is often ignored.

After an injection, even if you have matched your insulin dose and meal perfectly

  • you start by having a relative lack of insulin
  • after an hour or so and for the next two hours, you have a relative excess of insulin
  • your short acting insulin typically has a longer “tail” than a physiological response
    insulin-vs-bolus

And here is a video showing what happens when the timing of the insulin injection is adjusted

Effect of pre-bolusing on relative insulin activity.

Here are a few real life examples, all of them are high carbs breakfasts.

Injection at the start of the meal: the relative lack of insulin at the start of the digestion process leads to an excessive peak at 200 mg/dL. The relative excess of insulin after the meal has been digested slowly but surely leads to a hypo that will require a correction.

nopreinjection

 

Late bolus: in this situation, the injection came during the meal (as in “f***, I forgot my insulin”). The relative lack of insulin leads to a higher peak, that is to be expected. But the late relative excess is also more pronounced. It leads even more quickly to a potentially severe hypo that requires a couple of corrections.

delayed

Pre bolus: here, the insulin was taken some 10 minutes before the meal. The difference is drastic. The initial peak is greatly reduced and the relative excess of insulin has a minor impact. Yes, the timing possibly could have been a bit better, maybe 15 minutes. And, yes, there is still what could be considered a mild hypo. But in this case, a couple of dextrose tablets was all we needed to get back on track.

almost

20 mins prebolus:  In this case, Max woke up a bit late (holidays…) with a dawn phenomenon already significant. This gave us the opportunity for a longer wait. It ended up so well that the insulin action tail (and another small prebolus) took care of the light 2PM lunch.

prebolgood

Important note: we typically don’t prebolus if we are trending down or already below 80 mg/dL. We obviously don’t want any additional insulin activity pre-meal in those cases. Common sense, as usual, applies.