With the regulatory push for EHR adoption, an avalanche of healthcare data is expected in the next few years. Some believe it’s already here. But data comes in different flavors: from the frighteningly common free text, to loosely categorized documents, to well-structured messages. The less structure it has, the harder it becomes for a machine to understand the real meaning (semantics) of the content. The combined effect of increasing quantity and poor quality makes this a bigger problem than most anticipate.
Apixio is one of the few startups tackling this issue. Their analytics engine indexes the underlying data, processes queries and provides context-relevant results. The core technology is supposedly based on Apache’s Pig (a data-flow language and execution framework for parallel computation), Hadoop (a framework that allows for the distributed processing of large data sets across clusters of computers) and Cassandra (a scalable multi-master database).
There are a number of terminologies (read: ontologies) in healthcare, each trying to specify concepts and relationships from a particular perspective. LOINC, ICD, SNOMED and CPT are common examples, but see a pretty comprehensive list of all human-related ontologies at BioPortal (filter by category ‘Health’).
So a medical-grade search service would need to traverse such terminologies and surface all relevant, normalized data related to the query. For example, a search for the keyword “breathlessness” in a patient with a long, complicated medical history would bring up documents and encounters that mention related items like wheezing, PEFR, smoking and asthma management, even when the word “breathlessness” itself never appears. It’s a tall order to do all that analytical crunching.
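To make the idea concrete, here is a minimal sketch in Python of expanding a query term over a concept graph before matching documents. Everything in it is an assumption made for illustration: the `RELATED` graph is a tiny hand-written toy (not real SNOMED, ICD or LOINC content), the `expand_query` and `search` helpers and the sample notes are hypothetical, and the naive substring matching stands in for the normalization a real engine would need. It says nothing about how Apixio or any other vendor actually does this.

```python
from collections import deque

# Toy, hand-written concept graph for illustration only.
# These links are assumptions, NOT taken from any real terminology.
RELATED = {
    "breathlessness": ["wheezing", "asthma"],
    "asthma": ["pefr", "asthma management", "smoking"],
    "wheezing": ["asthma"],
}

# Hypothetical clinical notes keyed by a made-up document id.
DOCUMENTS = {
    "note-1": "Patient reports wheezing at night.",
    "note-2": "PEFR measured at 320 L/min.",
    "note-3": "Follow-up for ankle sprain.",
    "note-4": "Counselled on smoking cessation.",
}


def expand_query(term, graph, max_depth=2):
    """Breadth-first walk: collect every term within max_depth hops."""
    seen = {term}
    queue = deque([(term, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the hop limit
        for neighbor in graph.get(current, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen


def search(documents, term, graph, max_depth=2):
    """Return ids of documents mentioning the term or any related term."""
    terms = expand_query(term.lower(), graph, max_depth)
    return [
        doc_id
        for doc_id, text in documents.items()
        if any(t in text.lower() for t in terms)  # naive substring match
    ]


if __name__ == "__main__":
    print(search(DOCUMENTS, "breathlessness", RELATED))
```

None of the sample notes contains the word “breathlessness”, yet the expanded search still finds the wheezing, PEFR and smoking notes. Limiting the walk with `max_depth` is one simple way to keep the expansion from wandering too far from the original query; real terminologies are vastly larger, which is where the crunching comes in.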
Sophisticated data transformation and abstraction offerings are certainly needed to make sense of complex healthcare data. Niche efforts like Apixio and 360Fresh are signs of a growing market realization that the era of merely digitizing healthcare data is coming to an end. Now we start figuring out what the heck to do with all the incoming bytes.
PS: Advanced analytics offerings in healthcare are an interesting topic. See this wiki page for a living list of relevant companies in this space.