Saturday, February 28, 2015

Dexcom G4 (non AP) - calibration, simplified tech explanation and consequences

Puzzled by your Dexcom?

The Dexcom sensor can be infuriating at times, especially during its first days. In this post, I will try to explain simply why it may seem to behave in such unpredictable ways. I will (hopefully) reach that goal by using a realistic, but simplified, calibration mechanism example that is not too far from the real calibration algorithm used in the non AP G4.

Quick reminder.


The sensor wire measures the interstitial glucose through a chemical reaction (glucose dehydrogenase) that produces an electrical current. That current is measured in a complex process that results in a summary value we usually call the "raw value". It is a numerical value that is supposed to be directly proportional (ok, strictly speaking it is not, because there is an intercept/offset, but it doesn't really matter) to the interstitial tissue glucose concentration. In this example, we will assume the following

  • we have constant magical access to the real interstitial fluid glucose concentration.
  • our sensing wire is perfect and should always return 650 "ticks" per mg/dL of glucose over the whole range.
  • when exposed to a zero glucose concentration, our sensor will return a value of 30000 "ticks" - that is the initial offset/intercept.
And make the following simplifications:
  • the insertion wound has no impact.
  • the environmental conditions will not change.
  • our sensor will not age.

This will help us visualize what may happen, why it happens and what we can do to avoid it.

Meet our ideal sensor



Our ideal sensor responds perfectly and linearly to the Interstitial fluid glucose concentration. What's not to like? Unfortunately, that perfect view doesn't work in real life (at least with the Dexcom G4) and we have to calibrate it...
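To keep the arithmetic below easy to check, here is that ideal sensor as two lines of Python (the 650 and 30000 figures are this post's made-up numbers, not Dexcom's):

    # Ideal sensor of this example: raw ticks are a purely linear function
    # of the interstitial glucose (IG). Both constants are invented for the post.
    def ideal_raw(ig_mgdl):
        return 30000 + 650 * ig_mgdl

    print(ideal_raw(80))  # 82000 ticks at 80 mg/dL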

First calibrations


At the initial double calibration, our real interstitial glucose is 80 mg/dL, our perfect sensor returns 82000 ticks (30000 + 80*650) and our meter returns 65 and 75 mg/dL, which average nicely to 70 mg/dL. Our Dexcom has been told that the value it is reading is 70 mg/dL and it recalculates the number of ticks per mg/dL it should use: 743 ticks per mg/dL of glucose (instead of the correct 650). A bit later, the meter returns 155 mg/dL (for a real 150 mg/dL and a real 127500 ticks). The Dexcom computes a value with the scaling factor and displays 131 mg/dL. Not great, but still acceptable. We are just reading too low.

We enter our second calibration point but, unfortunately, traces of sugar on the finger lead to an artificially higher value. Instead of seeing 101500 raw as the equivalent of 110 mg/dL, it is now told that it is worth 140 mg/dL and duly calculates a new value for the number of ticks per mg/dL: 511.

That error starts biting at the next check, where a real IG of 215 mg/dL, seen as a reasonable 205 mg/dL by the meter, sensed as 169750 raw at the wire, is displayed as 273 mg/dL. We are now reading way too high! At that point, we are likely to hit the Internet forums with something like: "I don't understand, we were a bit low and we are now way too high!"

Note: this is a bit simplified, as the Dexcom doesn't rely on a single calibration value; we'll get to that below.
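For readers who prefer code to prose, here is the whole single-point mechanism as a minimal Python sketch. It reproduces the numbers above under this post's simplifying assumption that the 30000-tick intercept is known and fixed; the real G4 black box is more subtle:

    OFFSET = 30000  # intercept assumed known and stable in this simplified model

    def calibrate(raw, meter_mgdl):
        # Recompute the scaling factor (ticks per mg/dL) from one meter entry.
        return round((raw - OFFSET) / meter_mgdl)

    def display(raw, scale):
        # Convert raw ticks into a displayed glucose value.
        return round((raw - OFFSET) / scale)

    # Initial double calibration: real IG 80 (raw 82000), meter average 70.
    scale = calibrate(82000, 70)    # 743 ticks/mg/dL instead of the true 650
    print(display(127500, scale))   # real 150 displayed as 131: a bit low

    # "Post-sugar" calibration: real IG 110 (raw 101500), meter reads 140.
    scale = calibrate(101500, 140)  # 511 ticks/mg/dL
    print(display(169750, scale))   # real 215 displayed as 273: way too high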

Let's now quickly look at the opposite case: this time we washed our hands but we have a bit of water on the finger, which leads to a meter under-estimation.


Nothing changes except that the real 110 mg/dL is not metered at 140 mg/dL but at 80 mg/dL. We end up with a new tick value that will ultimately lead to the display of 157 mg/dL for the real 215 mg/dL value, seen as 205 by the meter. We are tempted to conclude that "this sensor is definitely running a bit low" and we make a mental note of it, no big deal, just remember. (really, do remember, you will be surprised later)

Compression

Let's now have a look at the dreaded compression event. It is a fairly typical situation: your kid sleeps on the sensor, the area is compressed, glucose doesn't reach it, the real local IG is at 40, which the sensor reports as LOW, the alarm rings, and we blood check at 90 mg/dL. We are fed up with the beeps, fed up that the Dexcom is so wrong, and we recalibrate.


That is a huge mistake, I repeat, A HUGE MISTAKE: the Dexcom is perfectly correct and consistent in its view of its limited world. By recalibrating at that point, we are feeding it utterly wrong data. Just like the Lannisters, the Dex always pays its debts, and pay we will. What we have done is force the Dexcom to recalculate its scaling factor, which now stands at a paltry 289 raw ticks per mg/dL. As soon as the compression subsides and we actually hit that perfect 90 mg/dL value, sensed perfectly at 88500, the new scaling factor leads to a display of... 202 mg/dL! If we have also treated the false low, we could be even much higher. But even without treating, we could have gone from LOW to 200 in less than an hour while staying at 90 mg/dL all the time...
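Running the compression event through the same sketch (calibrate and display as defined above) shows the trap:

    # Compression: real IG 40 (raw 56000), sensor shows LOW, meter says 90,
    # and we recalibrate anyway.
    scale = calibrate(56000, 90)  # a paltry 289 ticks/mg/dL
    # Compression subsides; real IG is genuinely 90 (raw 88500):
    print(display(88500, scale))  # displayed as 202 mg/dL!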

Lines, Lines, Lines


In each of the above cases, whenever we adjusted the scaling factor, we adjusted the slope of our calibration line. Our second example (post-water) is shown as the green line below. In that case, we will always read too low (until we recalibrate) because we have overestimated the scaling factor: we need higher raw values for each ideal value. The compression example is shown in red. Its effect is quite dramatic, as you can see: our scaling factor is so underestimated that we will always read way too high. There is another severe adverse effect in that situation: minor changes in raw values will lead to major changes in displayed values. This will, without any doubt, make an already annoying situation completely crazy (did I say we would pay for this?).



In fact, fortunately and unfortunately, the Dexcom G4 non-AP doesn't rely on a single value: it uses up to 6 of the last calibrations to make up its mind. Let's see what that means.

In the "post-sugar" situation, the two values we have entered define a new calibration slope, not horrible, but quite far from the ideal line. You'll note that the error introduced remain acceptable in the 50 to 150 mg/dL range and then slowly starts becoming unacceptable.



The "post water" situation may have looked more benign initially (after all, the Dexcom seemed to be just running a bit low) but is in fact much worse than the "post sugar" situation. Look at the chart. In that scenario, you could be at 300 mg/dL with a displayed value below 150 mg/dL. It does not take a PhD in rocket science to realize that this situation can be dangerous.


And finally, let's have a look at the compression event after two calibrations. The calibration slope does not make any sense. Fortunately, in most cases, the Dexcom will come to that conclusion by itself and will ask for a new calibration value or go "???".


But there is hope...

You can't be unlucky all the time. As you enter calibrations, you might be spot-on the real value, a bit above, a bit below. As the calibration builds itself, a better approximation of our ideal curve begins to appear, even if we have been unlucky enough to introduce a very bad value in a critical range, such as the 54 instead of the real 40 here.





Based on this simplified explanation, I hope it has become a bit clearer why
  • the first day or couple of days is typically less accurate than the rest of the week.
  • we sometimes get seemingly absurd "too high - too low - too high" situations.
  • seemingly innocuous situations can suddenly turn into dangerously inaccurate ones.
  • not recognizing compression events and calibrating at that time is a killer.
One other side effect of such a linear calibration technique is that, even if we assume a perfect sensor, the difference between 40 and 70 is 19500 raw ticks - the same difference we find between 300 and 330, with, of course, a completely different clinical significance.

Since the raw values we see are actually not pure raw values, this may have been addressed by Dexcom in its black box. If it has not, or has only been partially corrected, it is a direct explanation of the relative lack of accuracy in the low range visible in large scale statistical analyses. I will not, however, take a firm stand on this until I have confirmed there is no correction.

In practice


In practice, for my own use, one of the tips I have developed is to sail through that stormy sea as quickly as possible. At the first hint that I am not in a stable situation - or almost by default on the first day - I will enter the first 6 calibrations as quickly as possible, possibly more if my analysis indicates that the first 2 or 3 calibrations need to be purged (typically, 6 calibrations during the first 12-16 hours), and I will then smoothly sail at 1 or 2 calibrations per day.

PS: I'd like to stress once again that this is not the complete picture. The reality is more complex. Neither the sensor nor the insertion is perfect. The precise calibration algorithm used by Dexcom in the G4 non AP is a bit different but very similar in principle. The Dexcom AP algorithm and the Dexdrip algorithm are a bit different as well.



Thursday, February 26, 2015

The magic tattoo: but of course...

Note: this is a French translation of a previous post.

For the past few weeks, diabetes forums all over the world have been buzzing about the latest sensation: the temporary tattoo that would measure glucose through the epidermis. Unfortunately, it seems that diabetics only read the headlines... (universities love making headlines with glucose-measuring tattoos - see here, for example, for a 2009 announcement based on another technology)

The original article is no longer freely accessible, but its abstract can be found here



Let me give you a hint about where this post is going by showing you the part of the Abbott Libre sensor that corresponds to the tattoo shown in the press photos. See those small metal rings in the center of the image? Yes, just like in the tattoo, you have an anode and a cathode. All that is left to do is add a small subcutaneous wire and you will also have replaced the gel that sits under the tattoo.



How does the tattoo measure glucose?

The tattoo attempts to measure the glucose concentration through a well-known chemical reaction based on glucose oxidase. That reaction has been used since at least 1957 to measure blood glucose (here) and is probably the one your meter uses. Some meters use a similar reaction based on glucose dehydrogenase (see here). It is also the reaction exploited by current CGMs. The oxidation of glucose generates a very small electrical current - a flow of electrons - proportional to the amount of glucose it oxidizes. That electron flow is measured by an amperometric sensor and finally correlated with the glucose concentration. So far, nothing unusual: the technology is proven, having matured for 50-60 years. Solid, but not very new.

Where does the glucose come from?


When you measure your blood glucose with a meter, the answer is obvious: the glucose oxidase reacts with the glucose present in the drop of your blood. With a CGM, the glucose oxidase coating the sensor wire measures the amount of glucose present in the interstitial fluid, the fluid that makes humans soft and squishy. The glucose concentration in the interstitial fluid depends directly, with a delay of a few minutes, on the blood glucose concentration. The extremely complex dynamics of glucose transfers between blood and interstitial fluid are described here. That process is the subject of intense research, because a good understanding of those dynamics is essential to the development of an effective artificial pancreas.

But how does the tattoo get its glucose? Unfortunately for it, humans are not particularly renowned for their ability to exude glucose. The molecule must be extracted - by force - from the subcutaneous interstitial fluid to the surface of the epidermis. Since the skin's very role is to keep essential fluids and molecules inside our body, a fairly brutal technique called reverse iontophoresis must be used to extract the glucose. It is a more recent technique, dating back to at least 1995 (link), which already led to the development, approval and commercialization of products for non-invasive glucose measurement in the 2000s. Those products, the main representative of which was the "GlucoWatch", were slow (several minutes of extraction), unreliable (MARD > 30%) and caused significant skin side effects because of the intensity of the electrical currents required by reverse iontophoresis.

Forcing a glucose molecule through a membrane that is normally impermeable to it is not easy... The press release mentioned this problem, while explaining that lower currents were applied. That small detail was, of course, absent from the headlines.

So the tattoo isn't really revolutionary on that front either. If we add the delay induced by reverse iontophoresis to the delay between the blood compartment and the interstitial compartment, we quickly realize that, far from getting closer to a faster control loop, we are moving away from it at great speed.

What results? What technique?


It must be understood that the tattoo shown in the press photo is only the cathode and anode of a whole system. Connected to a laboratory power supply, the tattoo took about ten minutes to extract glucose to the surface of the skin. That glucose reacted with a tattoo prepared with a gel containing glucose oxidase (as expected since 1957) and did not react with a control tattoo lacking glucose oxidase (fortunately). The amperometric sensor connected to the tattoo's anode and cathode showed an electrical current correlated with glucose intake (a soda) after the expected delay (digestion, blood -> interstitial fluid -> reverse iontophoresis). That's all. As the authors note at the end of the article: "there is work left to do".


I still want one!!!


All right, the tattoo is so cute that you might want one anyway. Do you need anything else besides the tattoo and the glucose oxidase gel? Yes...


  • a laboratory power supply
  • an ammeter
  • some cable
  • a computer
  • a thermometer
  • a Mathematica license, or the skills needed to develop your own methods with numpy, for example.
  • a keen sense of observation to correlate the amperometry with your blood glucose.

Don't forget that current meters and CGMs integrate all of this in one or two cubic centimeters.

You will face the following additional problems (non-exhaustive list)


  • the skin is not a constant
  • you will systematically add the reverse iontophoresis delay to the delays already present
  • your blood glucose will vary while you are measuring it
  • the local glucose concentration will vary because you are actively modifying it
  • you will be working with glucose concentrations a factor of 100 lower than the invasive ones and will therefore, at equal signal processing technology, be intrinsically limited in terms of signal-to-noise ratio.
What will you gain?


  • you will avoid one CGM insertion every week or two and one or two finger sticks per day.

While I was not surprised by the mainstream media coverage, I was nevertheless disappointed by the articles on diabetes-focused sites. I expected better: a minimum of context and critical thinking. Perhaps their role, in the end, is not to inform, but rather to deliver, from time to time, a little piece of news that feeds hope - the hope that "in five years, everything will be better". For morale, ignorance may be the best thing.

Tuesday, February 24, 2015

Dexcom G4 (non AP) - calibration in real life (part 2)

No time to read the blog post? Want a summary? Here you go: do not calibrate your Dexcom when in low range.

After looking at the impact of blood meter consistency in the first part of this look at the Dexcom calibration issues, let's have a look at the average observer errors and their importance in different ranges. (Two data collection processes were run here: 1) comparing the calibration value entered with the last CGM value available in the 5 previous minutes; 2) comparing the calibration value entered in the CGM with an extrapolated value based on the slope of the 3/4 previous CGM values, to take the 15/20 min delay into account. Initial calibrations obviously don't have anterior CGM values; their impact was solely evaluated on the calibration chain that followed.)
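For reference, the extrapolation used in process 2 is nothing more than a linear fit projected forward; a minimal sketch, where the 5-minute sample step and the 15-minute projection are assumptions matching the description above:

    import numpy as np

    # Project the recent CGM trend forward to compensate for the
    # blood-to-interstitial delay before comparing with the meter.
    def extrapolate(last_cgm_values, step_min=5, delay_min=15):
        # last_cgm_values: the 3-4 most recent CGM readings, oldest first.
        t = np.arange(len(last_cgm_values)) * step_min
        slope, intercept = np.polyfit(t, last_cgm_values, 1)
        return slope * (t[-1] + delay_min) + intercept

    print(extrapolate([110, 118, 125, 133]))  # ~156 mg/dL expected at calibration time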


When all the calibration points were compared to the immediate value before calibration, I was surprised to discover the above result. On average, the Dexcom was higher by 20 mg/dL than the blood meter in the low to 60 mg/dL range and lower by 50 mg/dL in the 300-400 mg/dL range. In fact, the bias seems really linear, as if the meters and the Dexcom had been tuned to a different response slope.

The root cause could, of course, be somewhat behavioral: for example, users could tend to correct a low and then recalibrate after a while. Or act on a high and again recalibrate a bit later.

But let's have another look at the Bland-Altman plot of the Libre vs the Dexcom (which I didn't have at the time of the Dexcom calibration test)


You can't avoid noticing that the Libre systematic bias compared to the Dexcom (on another data set) goes in the opposite direction. In other words, the Libre is a bit more trigger happy than the Dexcom in lows and a lot more trigger happy in high ranges. And that happens to be the exact behavior we observe when we compare lots of independent calibration blood meter values and the Dexcom data at that point.

Now, let's look at the error in terms of percentage of error. This can be deduced from the first chart but it doesn't hurt to visualize it.

A 20 mg/dL difference, relative to a 50 mg/dL value, is a 40% difference, while a 60 mg/dL difference on a 300 mg/dL value is only 20%...

In low ranges, whether for intrinsic technical reasons or for behavioral reasons, the Dexcom is much less accurate than it is in physiological or high ranges. Core technical reasons could be similar to what Emil Martinec described in his Noise, Dynamic Range and Exposure paper (a very good read). There are dozens of possible behavioral reasons. But, in our daily lives, we don't care about the real primary cause, the practical result is the same.

Yet another view of the issue can be found below.

What if we look at the MARD of all individual data sets and see if it is correlated with the frequency of calibrations in a certain range? We discover that the frequency at which you calibrate in low range is quite positively correlated with the magnitude of the average error of the data set, and that the frequency of calibrations in physiological range is negatively correlated with the magnitude of the error. In other words, the more you calibrate in normal range, the more accurate your global CGM session will be.
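The correlation itself is trivial to compute once each data set is reduced to a summary row; a sketch with invented numbers (the real per-user values stay in my spreadsheets):

    from scipy.stats import pearsonr

    # Illustrative per-data-set summaries (invented numbers, not the real data):
    # overall MARD and the fraction of calibrations entered below 70 mg/dL.
    mard        = [9.5, 11.2, 14.8, 16.3, 19.0]
    low_cal_pct = [0.02, 0.05, 0.11, 0.14, 0.22]

    r, p = pearsonr(low_cal_pct, mard)
    print(f"r = {r:.2f} (p = {p:.3f})")  # positive r: more low calibrations, worse MARD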

After looking at the data, my working hypothesis became "low calibrations are worse than about anything else".

Experiments

To test that hypothesis, I decided to run a few experiments. While I could post dozens of similar examples, here are two taken from my own pre-experiment Dexcom traces.

 Here is a not so ideal double initial calibration in high range. The consequences aren't too bad and the Dexcom tracks the following calibrations decently. (I have multiple similar examples)



Here is an extremely accurate initial calibration in low range (that one was triple checked actually). The error is extremely significant on the next calibration and still significant the next evening (I'll get back to why this happens later)



 And here are a few random samples extracted from the global data set. Look at example 1 and 5 for large errors. (EDIT - CHART ONE DUPLICATE OF ABOVE - IMAGE UPLOAD ERROR - WILL FIX)


Conclusions

The main lesson for me here has been to avoid low calibrations like the plague, especially low double initial calibrations.

This is not a revelation; the typical calibration guidelines, which could be summarized as "calibrate when stable and in range", implicitly tell you not to calibrate when you are not in range. Stick with that simple rule and you will be fine. However, that rule implies that being out of range or unstable is equally bad wherever you stand. The data shows that this is not the case...

If I have to choose between calibrating stable in low or calibrating unstable in high, I definitely know what I will do...


In one of the next posts, I will look at the effect of the rate of change on the calibration accuracy. Much to my surprise, it had much less impact than what I would have thought.

Additional notes:


I am aware that I could go into more detail about the numbers analyzed and the statistics applied. I have a ton of spreadsheets and "pickles" of the data, ANOVA, MANOVA, etc... but the goal of this experiment was to tackle the issue from a practical point of view, not to publish a scientific paper.

I am also aware that there is a certain circularity in removing inaccurate low calibrations with large errors in order to obtain a more "correct" global file. I am not solving the issue, I am just avoiding it so that it does not impact subsequent accuracy. As long as we need to calibrate CGMs, there should be a standard method to evaluate the impact of calibration (in)accuracy on the data stream that follows. I suspect some artificial pancreas teams are looking into this and I really hope they do: user calibration will be, I believe, a significant obstacle on the road to an autonomous AP.


Wednesday, February 18, 2015

Another view of the Freestyle Libre vs Dexcom comparison

Here is another view of the Freestyle Libre vs Dexcom G4 non AP comparison.

The first graph shows the 8000 min comparison run previously described.

The second graph shows the ideal time shift determination by optimal correlation (see here) that corroborates the impression that the Libre is faster than the Dexcom non AP algorithm and puts a value of 9 to 10 minutes on that reaction speed difference.
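For the curious, the time shift determination boils down to sliding one trace against the other and keeping the lag with the best Pearson correlation; a minimal sketch, assuming both traces have been resampled onto a common 5-minute grid:

    import numpy as np

    def corr_at(libre, dexcom, s):
        # Positive s: the Dexcom lags the Libre by s samples.
        if s > 0:
            a, b = libre[:-s], dexcom[s:]
        elif s < 0:
            a, b = libre[-s:], dexcom[:s]
        else:
            a, b = libre, dexcom
        return np.corrcoef(a, b)[0, 1]

    def best_shift(libre, dexcom, max_shift=6):
        # 9-10 minutes is about 2 samples on a 5-minute grid.
        return max(range(-max_shift, max_shift + 1),
                   key=lambda s: corr_at(libre, dexcom, s))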


The third chart shows the Bland-Altman plot (see here and here) for that period, with the Libre as S1 and the Dexcom G4 as S2, time shifted by 9 mins for optimal correlation. I used Bland-Altman plots in my Dexcom real-life calibration analysis project before, but have not posted any of them as they are possibly a bit harder to immediately understand than glucose as a function of time charts, numbers such as MARD and SD, or Clarke plots.

The general idea behind a Bland-Altman plot is to compare the results of two unreliable measurement techniques by plotting their differences as a function of the average of the measured values. In this case, that plot confirms the subjective impressions, which were:

  • The Libre is more "trigger happy" than the Dexcom, especially when IG is rising quickly, a bit less so when falling. A least-squares regression line clearly shows the different characters of the sensors. The Dexcom G4 non AP and the Libre algorithm clearly implement a different philosophy: the Libre relies more often on its raw data and is more aggressive in extrapolating from it.
  • Most of the results fall in the +/- 1 SD band, which confirms that the differences between their long term views weren't clinically significant. The short term usefulness is another story: catching highs or lows earlier definitely has benefits.
  • Interestingly, the mean of differences for that period is about 11 mg/dL (Libre: 106.68 - Dexcom 95.49) which is typically what we observe, on average, when comparing with BG meter tests. Our latest HbA1c (hospital lab blood test) turned out to be 5.2%, a value that correlates well with the 106.68 mg/dL Libre result whereas the Dexcom 95.49 mg/dL should have resulted in a value of 4.9 to 5.0. 
This being said, let's not read too much into those values, as the differences are clinically very minor and within the range of acceptable laboratory errors.
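The plot itself hides no magic; its ingredients take a few lines to compute (a sketch, with the Libre as s1 and the time-shifted Dexcom as s2, as above):

    import numpy as np

    def bland_altman(s1, s2):
        # Paired series -> per-point averages (x axis) and differences (y axis),
        # plus the mean difference (bias) and its SD for the banding.
        s1, s2 = np.asarray(s1, float), np.asarray(s2, float)
        diff = s1 - s2
        avg = (s1 + s2) / 2.0
        return avg, diff, diff.mean(), diff.std()  # the charts above band +/- 1 SD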

One last comment: whenever I publish a post that can be interpreted as saying that the Libre is better and faster, I receive at least one comment implying that I am an Abbott shill. Of course, whenever I am negative about Abbott on the spyware issue, some think I must be a Dexcom stooge.

Let me re-state that I am totally independent and self funded, not even covered by social security. We pay for our sensors and haven't received any gifts or review units.









Monday, February 16, 2015

Dexcom G4 (non AP) - calibration in real life (part 1)

Intro


Sometimes, I am a bit of a maniac. At the beginning of the summer of 2014, the constant stream of posts about the accuracy of the Dexcom G4 system and its calibration strategies started to irritate me. Some users were reporting that their Dexcom was always "spot-on", which was of course impossible, if only because of the blood glucose to interstitial glucose delay. Other users were reporting huge errors and complaining loudly. What annoyed me was not that users seemed to have wildly different experiences, but rather the utter lack of rationality in the reporting and testing.

With the help of some of the nicest people in the "CGM in the Cloud" Facebook group who were kind enough to share their data with me, I decided to dig a bit deeper. My goal was to develop somewhat substantiated hypotheses and test them. During that test phase, Dexcom released a software upgrade that, to some extent, changed the landscape. For this reason and to avoid adding confusion in the mind of users, I decided not to release my findings. However, I have now learned that non-US users and pediatric US users will probably not be getting the 505 upgrade. This is why I am finally publishing my results.

The method


I used anonymized user submitted Dexcom exports that provided me with CGM values and calibration points. Ideally I would have liked to use BG meter values that were not entered as calibration as well (as I do on my own data) but that introduces complex time synchronization issues if the date and time of the meters do not match the Dexcom's time, are changed during the period, etc.

I received different amounts of data: some users sent a year, some users sent a month. In some cases, I used random selection to prevent one long set of results from having the same impact as twelve other sets of results. In other cases, when it was interesting to run some tests on larger amounts of data, I adjusted the impact of the larger data sets (for example, a user sending 12 months would have their stats calculated over the whole period, then their impact on the total data divided by 12 when compared, summed or averaged with other users sending only one month of data).

I mostly used established libraries (Scipy, Numpy, Pandas, Statsmodels...) and custom code. In order to detect bugs in the data handling, I compared all the basic computed characteristics of the data sets with the values calculated by Dexcom's own software.

I advanced step by step, zooming in on issues that appeared interesting. I could have missed some. And there could be a selection bias because of that. You have been warned...

Finally, I tested my hypotheses on new data that I acquired after the analysis.

What you'll see here is a "best attempt" at getting a better understanding of the Dexcom G4 calibration behavior in real life. While it has not been executed with extreme scientific rigor, it is certainly better than anecdotal evidence.

Let's begin with the BG Meter

In real life, users don't have the option of checking their results with a GlucoScout or YSI. The BG meter is the only tool available for calibration and accuracy checks. Unfortunately, we do not have the data to estimate the absolute BG meter accuracy. But, with the initial double calibrations, we do have enough data to estimate their precision (how consistent they are) and see if it has an impact on the subsequent correlation with the CGM data stream.

Here is the result of that analysis on 566 double calibrations

  • On average (whole data set - y axis), the consistency of the double calibration points was good (6.25% MARD).
  • On an individual basis, using a consistent meter leads to a better correlation between all the future calibrations of that CGM run. That is, of course, a bit of an obvious circular argument: if your meter isn't precise, it is unlikely to track the CGM consistently and the CGM is less likely to behave well.
There are two clear interesting outliers (possibly 3) here, which I have marked with a blue and a red arrow.
  • The blue arrow shows that a really inconsistent BG Meter leads to a poor CGM behavior. This was caught by my initial data consistency check when data was sent. I contacted that particular user and he/she confirmed that, indeed, their BG meter had been awful. (either because it is intrinsically bad or because the procedure isn't correct). 
  • The red arrow was much more interesting: the BG meter seemed almost perfect, yet the CGM correlation was awful. What could be the reason? It turns out that more than 90% of the double calibrations introduced by that user were identical. Given the typical consistency of BG meters, this is too good to be true and indeed was: a single finger prick value was entered twice for the initial calibration.
Let's stop here for a moment: that behavior is perfectly understandable from a real life point of view. The patient might hate finger pricks, which may be one of the reasons why he/she got interested in the CGM in the first place. Why not do one and enter it twice? While the manual advises against it, it seems at first sight harmless enough if one feels the value is ok...

The problem is that the initial double calibration serves a very specific and important purpose: it increases the precision of the inherently imprecise BG meter measurement by a factor of 1.4 by diminishing the measurement noise. This is generally called oversampling. The complete mathematical justification is beyond the scope of this blog post but, if you are interested in the issue, you can start by reading the "noise" section of the wikipedia article on oversampling. Each BG meter measurement contains a part of signal and a part of noise. It is not a magic bullet: if the noise is uncorrelated (for example, if it depends on the intrinsic imprecision of the measuring system), you get the full benefit; in other cases, it depends. If you have glucose on one finger and not on the other, the impact of the error will be reduced.
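A short simulation shows where the factor 1.4 comes from (the noise level is invented; the mechanism is what matters):

    import numpy as np

    rng = np.random.default_rng(0)
    true_bg, noise_sd = 100.0, 10.0  # assumed uncorrelated meter noise, in mg/dL

    single = true_bg + noise_sd * rng.standard_normal(100_000)
    double = (single[::2] + single[1::2]) / 2.0  # average two strips, as advised

    print(single.std())  # ~10.0 mg/dL
    print(double.std())  # ~7.1 mg/dL, i.e. 10/sqrt(2): the factor 1.4 gain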


Note 1: this is probably the reason why the new Medtronic sensors have 4 sensing wires. Medtronic is clearly behind Dexcom and others in terms of the quality of its single wire CGM. Having 4 sensors improves the consistency of the signal by a factor of 2.

Note 2:  by itself, oversampling does not improve the accuracy of a measure: if you have an inaccurate measure (for example bad sensor or glucose on both fingers) you will get a better (less noisy) bad measure. Oversampling reduces the noise in the measurement which improves consistency - if your sensing process is accurate, your measure becomes more consistently accurate.

First practical conclusions
  • use a consistent BG Meter with a decent procedure (wash hands, enough blood...)
  • do not skip the initial double calibration.


Saturday, February 7, 2015

Dexcom G4: bad initial calibration and wound "oscillations"

Post Insertion Issues

I suspect most Dexcom users have faced situations like the one shown below. Let's look at it in detail.

A fresh sensor was inserted a little after 7:00PM and calibrated around 9:30PM. Unfortunately, the initial double meter calibration turned out to be a double 54 (actually a triple, we rechecked). Based on our experience and a previous analysis of additional user submitted data, I positively hate low calibrations, especially initial ones. We hate them so much that we call them the "low calibration lottery". I'll get back to that issue in a future post.


We correct the 54 with 2 or 3 dextrose tablets, aiming for 100 mg/dL. We are never very aggressive in our corrections and, in that case, the Novorapid injection is 3 hours old and shouldn't have a drastic impact anymore. For some reason, the Dex decides to start at 45 mg/dL and seems to correct itself after 30 minutes. We appear to reach the expected 100 mg/dL but then start dropping quickly. As I am about to check on and correct the alert (no obvious compression event), the Dex shoots back up. At that point, we still have a choice: either a compression event that I did not spot, or the post-insertion chaotic oscillation we have often noticed during the night that follows a fresh insertion. (By the way, I do not have any idea about the root cause of that oscillation - there's some micro edema and healing going on, for sure, but why does it oscillate?)

Twenty minutes later, we are in insanity territory at 180 mg/dL. A blood check is obviously required, as the extremely quick rise doesn't make sense. The meter gives us a mid-60ish value (consistent with a conservative correction and the tail of the insulin activity). I decide to recalibrate the Dex and it almost immediately goes into the dreaded "???" mode. Max takes another 3 dextrose tablets as a "better safe than sorry" correction and I double check on the meter at 1:00 AM - at this point, I do calibrate the Dex again, even if it is still in "???" mode. I know BG is not wildly fluctuating, I know Max often trends up slowly over the night. About two and a half hours later, the Dex displays a single point (according to raw data, the oscillations are decreasing in amplitude) on the way down, then goes "???" again as the raw data doesn't correlate with the trend. Finally, while it keeps oscillating, the raw data stops being in profound disagreement with the trend and the sensor starts behaving as it should.

How was the following week? Quite good actually, as you can see from the chart below. Days 3 to 7 were essentially perfect until, of course, the sensor was ripped off during a basketball game at school... (yes, I felt a hint of despair that day)



As the weekly Clarke grid shows, the only really problematic value came from the initial oscillation.


Post-mortem

Errors we made

  • calibrating in low range. We know it is a lottery, we should NEVER do it.
  • playing basketball :-)

Errors we haven't made

  • over-correcting in the 60ish range.
  • acting on senseless CGM data
  • removing a sensor that seemed to be misbehaving badly

Notes

The Dexcom here is still running the "old" algorithm, whose calibration behavior I have analyzed extensively based on user submitted data. That analysis probably does not apply to the new 505 algorithm.

As far as the calibration analysis is concerned, I am aware that I haven't released the full results in a single place. The reason is as follows: I examined the data, came to some (non ground breaking) conclusions and detected a few peculiar behaviors that I wanted to test experimentally with Max. When those tests were completed, as I was about to summarize them, Dexcom released the 505 updates to its US customers. The analysis I had done was potentially obsolete and I decided not to release a final report.

I have now learned that the 505 algorithm will never be released in Europe and that Dexcom users are likely to remain stuck with the old algorithm until the new Dexcom products (probably the Dexcom+Share and the Apple integration) are released. Because of that delay, I will probably post some kind of final report in the coming weeks.


Friday, February 6, 2015

Black boxes can cut both ways

In just a few years, the widespread use of the cloud has changed a lot of things: well connected people have seen their social lives change completely. Privacy has all but disappeared. We are constantly under the eye of big data black boxes that monitor, more and more invasively, every aspect of our lives. Users are usually on the receiving end of the abuse and, the sad reality is, most of them don't care. But this isn't where the "fun" ends.

Black boxes can cut both ways...


Let's imagine that company A wants to collect usage and results data from a device they just put on the market in order to demonstrate the benefits of that device, which could be useful to get the product approved by the relevant agencies. The product would be released in limited test markets and the data collection could be as discreet as possible. A very cheap large scale test. And also a mild case of "security by obscurity"

Let's imagine that company B, a competitor of company A enjoying a near monopoly in the device category that company A is trying to enter, is extremely annoyed by that new arrival on the market. A typical delaying tactic would be to launch a bunch of lawsuits, for example for patent violation, and drag the cases out for years. But that is so visible and so 20th century... (see Sanofi and the Lantus saga, for example)

In our fictitious 21st century world, there could be more subtle ways to derail or delay an unwelcome competitor. Company B would, most certainly, have had a very close look at its future competitor's device. They might notice the discreet data transfer.

And decide to exploit it.

They could, for example, upload totally plausible but bogus data into the big black box and skew the results in a negative way. That would be a very smooth and subtle attack. It could apply not only to devices, but to drug trials. Company C could be sending patients home with a new class of antiarrhythmics and continuous ECG monitors. Blood pressure monitors could automatically upload the phase III results of a new renin inhibitor...

If they were more aggressive, they could eventually contract some unethical party just to grab the data cache and run away.

Then, if anything goes wrong, one can always blame the Chinese...

Important note: the above is absolutely fictitious. It is just one of the dozens of possible exploitation scenarios in a world where big data collides with poor or even decent IT security practices. The possibilities are endless and we don't know where we are going.

Wednesday, February 4, 2015

Thank you Jenkins!


My privacy concerns have probably been heard. I won't be unknowingly uploading our glucose history, our treatment choices, our exercise schedule to Abbott's R&D Dallas center anymore.

An update? No, a country block (but someone has forgotten to block the update server, feel free to correct that - I normally use my own software, thank you).

I'll be sure to post a notification if I travel to Germany, UK or Spain so those countries can also be blocked when I travel.

That's what I call nuking an issue...

Maybe I should open a clinic for German endocrinologists in Belgium if they don't want to upload the data and treatment of their patients?

The upload server won't be missed. If I ever become nostalgic, I can always replay my local dumps archive.

And finally, let me repeat once more that another solution, that would make a nice product even better, would be to do the natural thing: ask doctors and patients if they want to share their data instead of... hmmmm. borrowing it.

Anyway, I wanted to thank someone at A. L. and decided, not totally randomly, to thank Jenkins.



What Abbott probably should say...

There has been a lot of background activity on my posts about the Libre data uploads. On one hand, I felt it was my duty to the patient and medical community to at least inform them of what was happening. On the other hand, I never wanted to create a big scandal that could impact the availability of what I feel is an essential technology for diabetics. For this reason, I decided not to go through my IT security contact list. Neither did I seek press coverage in the way IT security experts do when, for example, they stumble on "spying smart TVs". Now things seem to be out of my hands...

Abbott has not contacted me (and, if you guys read this, there really is no need, my life is busy enough). Patients, Doctors and other interested parties have. Some reported what Abbott representatives told them or told audiences. Some shared documents they had received...

From what I have seen, read or heard, Abbott's responses have evolved somewhat, from pure denial that anything happened (objectively false), to admission that very small QA technical uploads happened now and then (false, unless you consider your glucose levels, treatment, exercise, etc... to be "technical"), to kind of vaguely admitting that some uploads could indeed occur, but neither confirming nor denying their extent.

At all times, the fact that everything that happened was in compliance with such and such data privacy law was stressed. While I am not a lawyer, I feel that this is a bit unconvincing.

Assume I walk into the houses of all the diabetics I know and covertly take a copy of all the notes related to their diabetes treatment, their exercise notes, their insulin use patterns, etc... I believe this can be described as data theft. But also assume that, once I get back home, I take the utmost care in storing their data properly and protecting it.

Does that change the nature of the initial act? I don't think so.

If I rob a bank, store the money I robbed in a "better" bank and then pay my taxes on the interest I collect, does it mean I am not guilty of theft anymore?

I don't think so. But that is only my opinion.

Coincidentally, the Nuffield Council on Bioethics published its report on the use of health data this week. What do they think? Here are some of the key points they develop.

"The UK Government should introduce robust penalties, including imprisonment, for the
deliberate misuse of data, whether or not it results in demonstrable harm to individuals."

"There should be complete audit trails of everyone who has been given access to the data,and the purposes to which they have been put. These should be made available to all individuals to whom the data relate or relevant authorities in a timely fashion on request."

Ouch, that hurts!

Do I believe anyone at Abbott should go to jail? No, I don't. I believe they all should be working hard, busy preparing the next generation of products that will make the life of diabetic patients easier.

But I do believe that Abbott should simply have said something like  

"Ooops, sorry, we were in such a rush to deliver a great product that we overlooked a few things. We'll fix that as soon as possible and release an update respecting your privacy and offering you a choice. Accept our apologies." 

 Yes, definitely.

They could even have claimed it was a "bug", just as LG did when its TVs were found to spy on their users. Techies would have smiled quietly and moved on...

Monday, February 2, 2015

Some thoughts and experiments on using the Freestyle Libre as a CGM

I continue to be impressed by the Libre, even if we now have our first "bad" sensor that reports values that are 20 mg/dL too low. It is wrong - consistently so - which means that it is still useful with a bit of mental adjustment.

Running the Libre as a CGM?

A few months ago, when we started with the Libre, I was so impressed by its accuracy and reaction speed that I immediately decided I should try to turn it into a full CGM system that would replace our Dexcom. The basic idea was to put a small NFC reader in an armband for the night and collect data every 5 minutes or so, which would then be pushed to our Nightscout setup in place of the Dexcom data. As far as my personal use is concerned, I was never interested in pulling data to a phone replacing the Abbott reader. After all, having a dedicated reader has advantages for a teen. It will not, for example, run out of battery because of Clash of Clans... With that goal in mind, I started looking at the data that is stored and updated in the tag (here) and slowly plowed my way through its interpretation.
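The glue I had in mind was nothing fancier than a polling loop along these lines; both helper functions are hypothetical placeholders, not existing APIs:

    import time

    # Hypothetical placeholders: they stand in for the NFC read/decode and
    # for the HTTP POST to the Nightscout API respectively.
    def read_glucose_from_tag():
        raise NotImplementedError("NFC read + tag decoding goes here")

    def post_to_nightscout(value):
        raise NotImplementedError("upload to the Nightscout REST API goes here")

    def run_cgm_loop(interval_s=300):
        # Poll the tag every ~5 minutes and push the decoded value to
        # Nightscout in place of the Dexcom data.
        while True:
            value = read_glucose_from_tag()
            if value is not None:  # None would mean a flagged/incomplete sample
                post_to_nightscout(value)
            time.sleep(interval_s)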

An unexpected journey

When I started, I expected to face a system roughly similar in its mechanisms to Dexcom's data collection and interpretation. While there is of course a fundamental similarity, there are also significant differences. In a typical usage scenario, the Libre beats our non-AP Dexcom algorithm by a large margin. That is what matters for T1Ds in their everyday life.

But once you dig a bit deeper, you realize that this impressive practical result is not solely achieved by exploiting a sensor that would be much better than the Dexcom sensor. Abbott fully exploits the adjustment opportunities offered by the typical usage scenario, offering the best info it can summarize at any given moment (or simply declining to offer it when it can't). Once I understood the broad outline, I was a bit disappointed to notice that the data marked as fully usable wasn't as frequent as the relatively smooth user experience seemed to indicate (this would probably have a negative impact on a Libre-controlled AP). That also meant that, even with a full understanding of the data, going CGM wasn't going to be a short or smooth ride.

The chart below (spot checks = orange circles, historical data = small red dots) clearly shows that the Libre massages/cherry picks its data quite intensively. The sustained spike reported by the spot checks isn't consistent with the historical data, and the historical data at the end has not been updated even though more than 15/16 minutes have elapsed. Interestingly, in both cases, the spot checks that weren't actually taken into account in the historical data fit a linear trend extrapolation nicely. This could be a coincidence - it happens very often, but not 100% of the time.


Texas Instruments bio-sensors

At the core of the Libre NFC tag, we find a new Texas Instruments chip (the FAL...) that has been customized to Abbott's specifications. That chip, based on an MSP 430 core, includes a set of ADCs, a thermistor, a communication stack, a ROM, some SRAM and a FRAM. FRAM is an interesting type of RAM that offers many of the advantages of Flash memory (persistence) and of RAM (directly accessible, low power needs). If my Libre investigations had only led me to the discovery of that line of TI micro controllers, I would already have been happy. These small wonders will certainly lead to a lot of interesting applications in the near future.

FRL152 sample project - unrelated to the Libre


Libre Tag
Unfortunately, while the main members of the family are now well documented, Abbott uses a custom version of that controller. I am now almost 100% certain that the AL in the processor name stands for Abbott Laboratories. There are other indications that this was a custom design. For example, a program written for the FRL line typically contains a command table delimited by a start and an end marker (0xCECE in both cases) while the controller used in the Libre has a 0xABAB pattern that, once you have seen it, becomes glaringly obvious (and so do the En and An commands and their addresses). Obviously Abbott got TI to customize that for them...
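Spotting that pattern in a dump is trivial once you know what to look for; a toy scan (marker values as described above):

    # Toy scan of a firmware/memory dump for the command-table delimiters:
    # 0xCECE in the stock FRL15x samples, 0xABAB in the Libre's custom part.
    def find_markers(dump: bytes, marker: bytes = b"\xab\xab"):
        positions, start = [], 0
        while (i := dump.find(marker, start)) != -1:
            positions.append(i)
            start = i + 1
        return positions  # expect two hits: table start and table end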

FRL152 Sample project - unrelated to the Libre
The philosophy behind the micro-controller is simple and elegant. You decide what you want to sample and set up a sampling scheduler that does its job for a while at defined intervals. Once the loop is complete, the values collected are posted to FRAM and wait for collection. (Possible issue here for a custom scanner: check whether the sampling has been completed or not. Blindly reading a partially collected sample will introduce very significant errors.) The fact that FRAM doesn't require bank rewrites like flash, and that it can sleep for a while, is a big plus for the duration of the sampling process (TI has a sample power usage analysis here). (Again, a possible issue for a custom scanner: how long will a battery last if the device is constantly or very frequently active?)
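In code, the guard I have in mind looks like this; the field layout and the flag value are pure assumptions for illustration, not the documented Libre format:

    # Hypothetical guard before trusting a block read from FRAM.
    def sample_ready(block: bytes) -> bool:
        SAMPLING_DONE = 0x01  # assumed "scheduler finished" flag, position assumed too
        return bool(block[0] & SAMPLING_DONE)

    def read_raw_value(block: bytes):
        if not sample_ready(block):
            return None  # partially written sample: discard, don't guess
        return int.from_bytes(block[1:3], "little")  # assumed 16-bit raw field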



Temperature

The TI chip also contains a thermistor, which Abbott uses to monitor the sensor temperature. A reasonable assumption here is that the glucose oxidase reaction slows as the temperature goes down and accelerates as it goes up (I am not a bio-chemist, but this is what I remember in general from the mandatory courses I took in med school). In that situation, you either have to correct for the reaction speed or decide to discard the value received as not valid. We can be sure that Abbott uses the latter, as many users have reported their meter refusing to display a value in either warm or cold conditions (a custom reader would have to detect those conditions). We can't exclude a temperature based correction either, one that would eventually give up outside its comfort zone. That could lead to interesting situations when the sensor is used in warm weather or exposed to the sun when sunbathing. Since the Libre isn't available in the Southern Hemisphere yet, we'll have to wait for the summer to find out how big an impact it has.

Shutdowns

In addition to the temperature issue, the TI chip has the ability to shut itself down when a certain threshold has been exceeded. One of our sensors seems to have shut itself down a minute after an extremely intense physical exercise. Since extreme movement and muscular activity are known to produce triboelectric effects (see for example here for an example of skin energy scavenging), I assume that some threshold was exceeded in the TI sensing configuration and that either the sensor stopped itself or the meter shut it down based on the report of the condition. At least a sensor that shuts itself down stops being an issue for a custom scanner, but it would of course be nice to detect the condition as well.

Changes and rapid changes

I was surprised to see that almost all sudden increases or changes of direction in the measured values are treated as suspect. In some of the data interpretation experiments I have conducted, almost 50% of the measures were flagged as less trustworthy or requiring special attention. In extreme cases, the Abbott meter will even decline to show historical values. This behavior is almost invisible in normal use. There is the occasional "wait ten minutes and try again" message or the unexplainable gap in historical data later. In less extreme cases, data still gets flagged as unreliable now and then. This would definitely impact a third party scanner or a CGM.

Putting it all together - the good.

When the sea is calm and clean data comes in on a regular basis, using the Libre as a real CGM is definitely possible for short periods of time. Here is such an example where data was collected every minute both on a phone and meter, interpreted through a small custom algorithm that yielded a +/- 1 mg/dL correspondence. So far so good.


The bad

Let's now consider a situation where IG is changing a bit differently. Here is a simple interpretation of the raw data compared to what Abbott makes of it. IG is slowly rising and so are the reported TI values. It looks as if the official meter first ignores the increase and then, when some threshold is exceeded, kicks in a predictive extrapolation to reach the plateau before the actual data. Then, something changes (at this point I am not sure what changes exactly) and the straightforward interpretation of the raw data starts to diverge (the BG meter reported a value exactly between the raw value and Abbott's interpreted value). This behavior could arise, for example, from the application of an approximated sigmoid calibration curve, where a change in IG range would switch to a different part of the curve. It could also be the consequence of a change in a scaling factor at certain levels. After a while, the traces re-correlate until the next "incident".


 The ugly

But what happens to a naive interpretation when a severe error goes undetected? Well, bad things. And bad things that can happen at less than ideal moments. Here is the result of a non-official interpretation when an error is induced and not detected. One can easily imagine the panic of a non technical user if he saw a worrying but stable 50ish situation crash abruptly in real time. A technical user would think "hey, this can't be - this is either a bug or an undetected error". A mother at night might be racing to the fridge for the Glucagon. This is a BAD THING™.


Additional examples

I have dozens of extremely good CGM runs.

This one is meter 137 - home CGM 138.75


This one is meter 124 - home CGM 121 (the measure as -5 was "flagged" in the tag data)











This one is meter 156 - home CGM 159
This one is real time tracking of my son going a bit low












And this one is real time tracking of the effect of two Dextro tablets. It was, by the way, a fascinating experience to observe and measure in real time, minute by minute, what we usually learn to approximate by experience.


And a copy of the "lab notes" - not sure anymore this is an exact match, the notes usually end up in a spreadsheet and I ran multiple experiments.



Wrapping up

Wearing my techie's hat, I could decide that this is wonderful, start to party thinking the problem is solved, and unleash code onto the world. That would be irresponsible.

Wearing my MD's hat, I say - wait a minute, this is not good at all. There are too many unknowns, too many poorly understood situations or possible error conditions that could lead, if displayed to real patients, to absolutely disastrous situations.

This is why I limit myself to providing hints and tips. I do hope, however, that other people who may have easier sensor access than I do and are better and more motivated than I am may eventually find useful information here that would ultimately lead to much better implementations and a better understanding of the system.

Possible future investigations

  • not feeding intrinsically tainted data to any algorithm; in other words, detecting 100% of the data that should be ignored on technical/physical grounds (sampling issues, TI error conditions, temperature...)
  • dealing with borderline data with the same caution as Abbott.
  • eventually implementing a custom system + algorithm, possibly similar to what Dexdrip does with the Dexcom raw data.
Note: I may post the listing of a typical FRL152H based application (non Abbott) as an addition to this post. Posting it in the midst of this post would have made it extremely hard to read.

Regardless of where my investigations lead, I enjoyed tremendously the time I spent looking at this system. The journey is the destination!

Addition: 

I have been asked about what happens in higher ranges - we don't see them often, but here is such a case, which we monitored today. Please be aware that the data below is presented in a slightly misleading way in terms of alignment, because I have not yet modified the display code.