Imagine you've changed insurance or health provider, or moved to a new city. You're going in for a routine check-up at your new hospital; you met your new physician ten minutes ago, and they ask for your medical history – just in case it's needed in an emergency.
Oh, surely your old hospital has it. Maybe you can call them and ask? Well, they have it in digital format. Could your old hospital print it out? And surely your new hospital doesn't have a basement full of paper records for the hundreds of thousands of people they serve… They probably keep everything in digital format as well. So how do those computer systems talk to each other?
Welcome to the problem of Health Data Interoperability – a fancy way of saying that hospital computer systems should be able to understand each other and move your data around. But this topic matters not only for you, but for the future of medicine as well! It affects how doctors and researchers gather and analyse millions of clinical records to design better treatments and better medicine.
—
“Data is the new oil” has become a common expression in the media. It is generally used to express how data will fuel the new economy, and how valuable it is.
What people don’t realise is that this analogy holds very well in other senses: oil is not usable in its raw form. It needs to be extracted, transported and refined into raw materials, and only then can it be used to produce value (plastics, fuel for transportation and heating, etc.).
Let’s look at data in the same light. Data in the real world is mostly unorganised and unstructured – a mess of different components, just like crude oil. It needs to be processed in order to yield real value.
Take this example of two laboratory test results:
> Leukocytes in blood was at 7.3 10^9/L on 10/12/2019 at 16:49
> 12/10/19 4:49 pm – Leucócitos no sangue a 7.3 10*3/µL
Both these phrases carry the same information, yet they are in different languages, the dates are in different formats (dd/mm/yyyy and mm/dd/yy), and the units are written differently (although their magnitudes are the same). The pieces of information come in a different order, and there might be typos in the text. This is information in its most common raw form – natural language – because that’s how we, people, communicate.
Now imagine we want to store millions of these records, analyse them, and produce a very simple report – say, the average leukocyte count in a certain population before, during and after the flu season. Unlike a human, software struggles to read and analyse data in natural language. Most technology requires some sort of structure to store and query data.
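To see why, here is a quick Python sketch using the two lab results above: to software, they are simply two different strings, and even pulling out the numeric value takes fragile guesswork.

```python
import re

# Two records carrying identical clinical information, as free text.
record_en = "Leukocytes in blood was at 7.3 10^9/L on 10/12/2019 at 16:49"
record_pt = "12/10/19 4:49 pm - Leucócitos no sangue a 7.3 10*3/µL"

# A direct comparison sees no relationship between them...
print(record_en == record_pt)  # False

# ...and even finding the numeric value relies on a pattern
# that happens to work for this sentence and may fail on the next:
values = re.findall(r"\d+\.\d+", record_en)
print(values)  # ['7.3']
```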
So, let’s assume that by some magic [1] we get the information in a more structured format:
date: 10/12/2019 16:49
type: Leukocytes in Blood
value: 7.3
unit: 10³/µL
This makes things much better. We have the type of information on the left side, followed by a colon, and the value on the right. With these simple rules, computers can read the information far more reliably.
However, just because the data is structured doesn’t mean it’s standardised. Dates can still be in different formats, units can appear in wildly different ways (e.g. 10*3/µL, 10³/µL, 10³/uL), and the type of the laboratory test can have many variations (what techniques or chemicals were used, the sensitivity of the test, where samples were taken from, what fluid was tested, etc.).
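Without a standard, the usual workaround is a hand-maintained lookup table of known spellings – a sketch like the one below, which never really ends, because every new data source brings new variants:

```python
# Map known unit spellings to a single canonical form.
# The canonical spelling chosen here is the UCUM-style "10*3/uL";
# the alias list is illustrative, not exhaustive.
UNIT_ALIASES = {
    "10*3/µL": "10*3/uL",
    "10³/µL": "10*3/uL",
    "10³/uL": "10*3/uL",
    "10^9/L": "10*3/uL",  # same magnitude, different notation
}

def normalise_unit(unit: str) -> str:
    # Unknown spellings pass through untouched - and silently
    # break any aggregation downstream.
    return UNIT_ALIASES.get(unit, unit)

print(normalise_unit("10³/µL"))  # 10*3/uL
```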
So let’s look at how we can standardise medical data.
If you have worked in technology, chances are you’ve come across the XKCD comic about standards – competing standards exist, with different rates of adoption, often depending on the region. Picking a standard is no easy task, and you have to make an educated guess given your constraints.
For this case, let’s pick a couple of standards that are well known for the use case of laboratory results:
- ISO 8601 – A standard for representation of dates and times
- UCUM – The Unified Code for Units of Measure – A standard to represent units of measure
- LOINC – The international standard for identifying health measurements, observations, and documents.
Now we know not only the structure, but also the format we want our data in.
date: 2019-12-10T16:49:41Z
type_code: 6690-2
type_code_system: LOINC
type_description: Leukocytes [#/volume] in Blood by Automated count
value: 7.3
unit: 10*3/uL
If we know, and communicate, that we’re using these standards, anyone can unequivocally know that this test was taken on the 10th of December, 2019, that the test was for Leukocytes count per volume of blood via an automated counting method, and that the unit is 10³/µL.
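For example, once the date is in ISO 8601, any consumer can parse it without guessing which number is the day and which is the month. A small Python sketch:

```python
from datetime import datetime, timezone

# The ISO 8601 timestamp from the standardised record above.
raw = "2019-12-10T16:49:41Z"

# Parse it; the trailing "Z" marks UTC, which we attach explicitly.
when = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
print(when.year, when.month, when.day)  # 2019 12 10
```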
Just to be clear, we’re simplifying things a lot: searching for leukocytes in the LOINC standard yields 1,019 types of tests, covering different fluids, different assays, different test areas, combinations of values through rates, and lab panels.
But we could even go further in our laboratory test case:
Imagine we would like to record where the blood was drawn. How would you state, unequivocally and without a doubt, that it was the left arm? We could use the SNOMED CT standard, with the code 368208006 for the left arm.
What if we would like to record that the reason this test was taken was a suspected ear infection? We could use ICD-11 with the code AB00&XK8G to specify otitis of the left ear. And so on…
Playing nicely together
Ok, we now have a structure and a format that computers understand. Well, a structure that our computers understand… What about other computers?
There are many different systems, from different manufacturers, in hospital environments. And there are multiple hospitals in a country, each using a different combination of systems. How do they get data from, and communicate data to, our system?
Do they know we have this specific structure for the data? That we use the field type_code to specify the code, and type_code_system to tell which standard we’re using? And how are they communicating those fields? As JSON (like this: {“field”: 123}) or XML (like this: &lt;field&gt;123&lt;/field&gt;)? Or even in some binary structure that humans can’t read at all?
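To make the question concrete, here is the same handful of fields serialised both ways in Python (the element name "observation" is just an illustrative choice, not anything a standard prescribes):

```python
import json
import xml.etree.ElementTree as ET

record = {"type_code": "6690-2", "type_code_system": "LOINC", "value": "7.3"}

# The same fields, serialised as JSON...
print(json.dumps(record))

# ...and as XML.
root = ET.Element("observation")
for field, value in record.items():
    ET.SubElement(root, field).text = value
print(ET.tostring(root, encoding="unicode"))
```

Both carry identical information, yet a system expecting one cannot read the other without being told which format, and which field names, to expect.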
Well, it’s time to introduce… you guessed it, another standard: FHIR (pronounced like “fire”), which stands for “Fast Healthcare Interoperability Resources”.
FHIR is an international effort by multiple organizations, healthcare IT providers and individuals to create a standard structure for communicating health data between hospital and health systems. It defines a set of resources (chunks of information), as well as their structure, validations and, in some cases, formats, to ensure different systems can receive and process data unequivocally.
If we get back to our lab result example and look at the FHIR standard, we see it falls under the DiagnosticReport resource, which can specify, among other things, fields like:
* identifier
* status
* code
* performer
* conclusion
* … etc
It also shows how that data can be expressed in XML, JSON and other formats.
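To give a feel for the shape of such a resource, here is a heavily stripped-down DiagnosticReport sketch, loosely following FHIR R4. A real report carries many more fields and references separate Observation resources for the individual values; the identifier and conclusion below are made up for illustration.

```python
import json

# A minimal, illustrative DiagnosticReport-shaped structure.
report = {
    "resourceType": "DiagnosticReport",
    "identifier": [{"value": "lab-2019-001"}],  # hypothetical id
    "status": "final",
    "code": {
        "coding": [{
            "system": "http://loinc.org",
            "code": "6690-2",
            "display": "Leukocytes [#/volume] in Blood by Automated count",
        }]
    },
    "conclusion": "Leukocyte count within the reference range.",
}

print(json.dumps(report, indent=2))
```

Note how the code field carries both the code and the system it comes from, so a receiving system never has to guess which standard “6690-2” belongs to.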
Is it that simple?
I think you might know the answer… Not really…
As mentioned before, standards are a complicated topic. They vary from country to country, and there are sometimes political and monetary considerations (not all standards are free to use without a license).
Another important thing to mention is that standards change, as all technology does. The predecessors of FHIR (e.g. HL7 v2, from 1995) will make a lot of today’s software engineers cringe, and ICD-11 is the 11th revision of the ICD standard. Legacy systems built on outdated standards are still in daily use. And some countries still insist on creating and maintaining their own standards, isolating themselves from the international community and from other IT systems.
So, what can we, technology companies, do to help in making health data more accessible to researchers and technologies like artificial intelligence?
How are we tackling this challenge at Kaiku?
Put simply: designing systems with a standards-first mentality.
We have decided to adopt FHIR as our starting point for modelling new data, and all our integrations use FHIR as an intermediary dialect to our core system: every integration is mapped to a FHIR-based dialect, and all systems within Kaiku consume those messages.
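The “intermediary dialect” idea can be sketched in a few lines of Python: every inbound payload is translated into a FHIR-shaped structure before any internal system sees it. The vendor payload and field names below are entirely hypothetical; this is not Kaiku’s actual mapping code.

```python
# Sketch: map a hypothetical vendor lab payload into a
# FHIR-Observation-shaped dict used as the internal dialect.
def to_fhir_observation(vendor: dict) -> dict:
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {"coding": [{"system": "http://loinc.org",
                             "code": vendor["loinc"]}]},
        "valueQuantity": {"value": vendor["result"],
                          "unit": vendor["unit"],
                          "system": "http://unitsofmeasure.org",
                          "code": vendor["unit"]},
        "effectiveDateTime": vendor["taken_at"],
    }

obs = to_fhir_observation({"loinc": "6690-2", "result": 7.3,
                           "unit": "10*3/uL",
                           "taken_at": "2019-12-10T16:49:41Z"})
print(obs["valueQuantity"]["value"])  # 7.3
```

The payoff is that only the mapping layer knows about each vendor’s quirks; everything downstream speaks one dialect.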
For software engineers, adopting a standard can feel like losing freedom: “Oh, it would be so easy to just make my own data structure, one that I could change any time I want”, some might think.
But trading the thousands of hours experts have poured into defining a standard for what feels like freedom is a deadly trap (sometimes quite literally). It’s only later, when the complexity grows beyond the engineer’s grasp, or when data needs to be shared or exported, that the standard’s benefits become overwhelmingly apparent.
But don’t get me wrong: even knowing the importance of a standards-first mentality, the temptation is often still there. Certain use cases just scream for that one extra data field that doesn’t comply with the standard. But after that one there would be another… and another…
At the end of the day, we need to remind ourselves that having the discipline to work with standards, and not around them, contributes not only to the success of our projects. It contributes to the greater goal of pushing medical data and medical research in the right direction, by allowing data to be free and mobile when its owners allow and request it.
Sérgio Isidoro, Software Engineer
[1] – This is a topic to be covered on its own. If you want to go down the rabbit hole, brace yourself: https://medium.com/curai-tech/nlp-healthcare-understanding-the-language-of-medicine-e9917bbf49e7