By: Michael Covert, CEO of Analytics Inside
Hospital readmission is an event that health care providers are attempting to reduce, and it is a primary target of new regulation from the US Affordable Care Act. A readmission is defined as ANY reentry to a hospital 30 days or less from a prior discharge. A financial impact is that US Medicare and Medicaid will either not pay or will reduce the payment made to hospitals for expenses incurred. By the end of 2015, over 2600 hospitals will incur these losses from a Medicare and Medicaid expense that exceeds $24B annually.
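The 30-day definition above can be expressed as a small check. This is a minimal sketch rather than MedPredict code: the class and method names are hypothetical, and counting whole calendar days is an assumption, since the definition does not say how partial days are treated.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

/** Sketch of the readmission rule: any reentry 30 days or less from a prior discharge. */
final class ReadmissionCheck {
    // Window taken from the definition in the text.
    static final long WINDOW_DAYS = 30;

    /**
     * Returns true when the admission falls 0 to 30 days after the prior discharge.
     * Assumption: whole calendar days are counted; the text does not specify this.
     */
    static boolean isReadmission(LocalDate priorDischarge, LocalDate admission) {
        long days = ChronoUnit.DAYS.between(priorDischarge, admission);
        return days >= 0 && days <= WINDOW_DAYS;
    }
}
```

Under these assumptions, an admission exactly 30 days after discharge counts as a readmission, while one on day 31 does not.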
Using predictive analytics and big data solutions from Hortonworks and Concurrent, Analytics Inside has created MedPredict™, a system that allows health care providers to create complex predictive models that assess who is most at risk of readmission. The technology uses LACE (Length of Stay, Acuity, Comorbidity, and Emergency admittance), along with over 120 other types of patient information, to categorize the risk. MedPredict allows all patient data to be scored, and it also allows separate models to be created for categories such as heart attack (AMI), heart failure (HF), pneumonia, hip/knee replacement, stroke, and chronic obstructive pulmonary disease (COPD).
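To make the four LACE components concrete, here is a minimal sketch of a LACE calculation. The article does not give point values, so this assumes the commonly published LACE index scale (total range 0 to 19), with a Charlson comorbidity index and a count of emergency visits in the prior six months as inputs. The class and method names are hypothetical, and this is not MedPredict's actual scoring logic.

```java
/** Sketch of a LACE score, assuming the commonly published point scale (0 to 19). */
final class LaceSketch {

    // L: length of stay in days. Assumed bands: <1 -> 0, 1..3 -> days, 4..6 -> 4, 7..13 -> 5, 14+ -> 7.
    static int lengthOfStayPoints(int days) {
        if (days < 1) return 0;
        if (days <= 3) return days;
        if (days <= 6) return 4;
        if (days <= 13) return 5;
        return 7;
    }

    // A: acuity. Assumed: an acute (emergent) admission scores 3, otherwise 0.
    static int acuityPoints(boolean acuteAdmission) {
        return acuteAdmission ? 3 : 0;
    }

    // C: comorbidity. Assumed input is a Charlson index: 0..3 map to themselves, 4+ -> 5.
    static int comorbidityPoints(int charlsonIndex) {
        if (charlsonIndex <= 0) return 0;
        return charlsonIndex >= 4 ? 5 : charlsonIndex;
    }

    // E: emergency visits in the prior six months (assumed window), capped at 4 points.
    static int emergencyPoints(int visits) {
        return Math.max(0, Math.min(visits, 4));
    }

    // Total LACE score is the sum of the four component scores.
    static int score(int losDays, boolean acuteAdmission, int charlsonIndex, int edVisits) {
        return lengthOfStayPoints(losDays)
                + acuityPoints(acuteAdmission)
                + comorbidityPoints(charlsonIndex)
                + emergencyPoints(edVisits);
    }
}
```

For example, under these assumptions a five-day acute stay for a patient with a Charlson index of 2 and one prior emergency visit gives `LaceSketch.score(5, true, 2, 1)`, which is 4 + 3 + 2 + 1 = 10.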
Predicting hospital readmission and calculating LACE scores is a big data problem. Why? A typical patient can have several gigabytes of data, and much information is hidden in unstructured chart data and clinical notes. Additionally, there are 68,000 diagnosis codes and a very large number of modifiers, and new diagnosis codes are constructed to contain a lot of information. There are also 87,000 procedure codes, which are likewise now highly encoded with information. Data variation is quite large (HCPCS, ICD-10, CPT, and more), and researchers want to add more data!
The computational aspects of computing LACE scores make the problem ideal for Cascading, where it can be expressed as a series of reusable subassemblies. We chose Cascading to help reduce the complexity of our development efforts and Driven, also from Concurrent, to monitor performance and troubleshoot run-time issues. MapReduce provided us the scalability that we desired, but we found that we were developing massive amounts of code to achieve it. Reusability was difficult, and the Java code library was becoming large. By shifting to Cascading, we found that we could better encapsulate our code and achieve significantly greater reusability. Additionally, we reduced complexity. Cascading has allowed us to produce a reusable assembly that is highly parameterized, thereby allowing hospitals to customize its usage. Cascading local mode allows for easy testing, and it also provides a scaled-down version that can be run against a small number of patients. In Hadoop mode, by contrast, Cascading can achieve massive scalability against very large patient populations and against very large ICD-9/10 code sets.
In conclusion, Cascading and Driven have been cost-effective and have preserved a large initial investment for us. The Cascading API provides simplification and understandability, which accelerated our development velocity and also reduced bugs and maintenance cycles. Driven provides performance monitoring and management so we can ensure the application is meeting production SLAs and rapidly troubleshoot problems, as needed, without wading through logs and code.
We will be doing a deep-dive discussion about the architecture of the solution in a webinar on September 1st, 2015. Please join us if you're curious about:
- Creating a deep learning architecture for sophisticated predictive analytics
- Implementation considerations for a working predictive analytics solution: MedPredict™
- Best practices for building, monitoring and managing your Big Data analytics applications