Nationwide Healthcare Price Transparency Solution

Cloud-native application

Business need

Each year, millions of Americans receive surprise medical bills. The Transparency in Coverage (TiC) Final Rule was meant to put an end to the age of opaque pricing. However, it has also created a vast, complex, and inconsistent dataset that users can’t easily interact with.

Result

The MindK team developed a cloud-native pipeline capable of ingesting over 20 terabytes of insurance pricing data. The resulting price transparency solution covers 20,000+ medical procedures and supports local search across the US and price comparison among 250,000+ healthcare providers.

Industry
Healthcare
Location
USA
Working together since
2025

HLTH Rate is a payer transparency directory that lets individuals easily compare medical costs across providers in all 50 U.S. states.

The challenge

The TiC regulation requires payers to disclose pricing information. Every month, insurers publish thousands of machine-readable files (MRFs). In practice, these files are far from easy to read: each MRF has a complex nested structure with over 100 GB of non-standardized data spread across dozens of folders.

The solution

The team built a sophisticated price transparency tool using AI and cloud-native data pipelines. The tool can process files larger than the available memory, ingest and normalize the data, deduplicate records, and combine fragmented billing components into comprehensive total rates. The pricing data is then enriched with accurate provider coordinates and other provider details.

The entire healthcare pricing tool is architected with a focus on high data quality, modularity, and scalability to handle varying data volumes:

Automated discovery of new index files from major payers
Efficient MRF parsing with JSON boundary detection and selective extraction
Distributed processing and quality gates throughout the data pipeline
Optimized S3 storage in Apache Parquet, with up to 10:1 compression
Fast search using the AWS Athena query engine and standard SQL
Comprehensive logging and alerting at each stage

Processing 20+ TB worth of MRF files

The sheer size of the files is the most obvious challenge. MRFs often exceed available system memory, and they are hosted at time-limited URLs, which makes automated downloads difficult.

The team had to design a scalable and resilient pipeline capable of discovering, downloading, and preparing these massive files for transformation.

First, an automated scanner catalogs thousands of MRF index files (~76,000 for UnitedHealthcare alone). It detects URL expirations, refreshes links, and follows HTTP redirects to download up-to-date MRFs.
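The cataloging step can be sketched as follows. This is a minimal illustration under stated assumptions, not the production scanner: it assumes index files use the CMS table-of-contents fields reporting_structure, in_network_files, and location, and that expiring links are S3-style presigned URLs carrying X-Amz-Date and X-Amz-Expires query parameters. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def in_network_locations(index: dict) -> list[str]:
    """Collect unique in-network MRF URLs from a TiC table-of-contents file, in order."""
    seen: dict[str, None] = {}  # a dict keeps insertion order and drops repeats
    for structure in index.get("reporting_structure", []):
        for entry in structure.get("in_network_files", []):
            seen.setdefault(entry["location"], None)
    return list(seen)


def is_expired(url: str, now: datetime, margin: timedelta = timedelta(minutes=5)) -> bool:
    """True if an S3-style presigned URL expires within `margin` of `now`.

    Assumes X-Amz-Date (YYYYMMDDTHHMMSSZ) and X-Amz-Expires (seconds) query
    parameters; URLs without them are treated as non-expiring.
    """
    params = parse_qs(urlparse(url).query)
    if "X-Amz-Date" not in params or "X-Amz-Expires" not in params:
        return False
    signed_at = datetime.strptime(params["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
    expires_at = signed_at + timedelta(seconds=int(params["X-Amz-Expires"][0]))
    return now + margin >= expires_at


def links_to_refresh(index: dict, now: datetime) -> list[str]:
    """Hypothetical helper: links the scanner should re-resolve by re-fetching the index."""
    return [url for url in in_network_locations(index) if is_expired(url, now)]
```

Redirects need no special handling in this sketch: urllib.request.urlopen follows HTTP redirects by default.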

The pipeline doesn't load huge MRFs into memory. Instead, we process compressed data streams incrementally using JSON boundary detection. This allows us to handle files 10 to 100 times larger than the available RAM. The pipeline also supports parallel file processing to maximize throughput and efficiency.
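JSON boundary detection can be illustrated with a character-level state machine. The sketch below is an illustration, not MindK's parser: it assumes a UTF-8, gzip-compressed MRF whose top level is a single JSON object, and the function names are hypothetical. It yields one element of a chosen top-level array (such as in_network) at a time, so memory use is roughly bounded by the chunk size plus the largest single element.

```python
import gzip
import json
from typing import BinaryIO, Iterable, Iterator, Union


def read_chunks(source: Union[str, BinaryIO], size: int = 1 << 20) -> Iterator[str]:
    """Decompress a gzipped MRF (path or binary stream) in fixed-size text chunks."""
    with gzip.open(source, "rt", encoding="utf-8") as fh:
        while chunk := fh.read(size):
            yield chunk


def iter_array_elements(chunks: Iterable[str], key: str) -> Iterator[dict]:
    """Yield each object of the top-level array `key` without holding the whole file.

    Tracks string/escape state and container nesting, so braces inside string
    values are never mistaken for object boundaries.
    """
    stack: list[str] = []                   # open containers, '{' or '['
    in_string = escaped = False
    key_chars: list[str] = []               # text of the current string at depth 1
    pending_key = last_key = None           # last depth-1 string / last confirmed key
    capture: Union[list[str], None] = None  # characters of the element being captured
    for chunk in chunks:
        for ch in chunk:
            if capture is not None:
                capture.append(ch)
            if in_string:
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
                    if len(stack) == 1:
                        pending_key = "".join(key_chars)
                    continue
                if len(stack) == 1:
                    key_chars.append(ch)
                continue
            if ch == '"':
                in_string = True
                key_chars = []
            elif ch == ":" and len(stack) == 1:
                last_key = pending_key      # the string before ':' was a key
            elif ch in "{[":
                if ch == "{" and stack == ["{", "["] and last_key == key:
                    capture = [ch]          # start of one array element
                stack.append(ch)
            elif ch in "}]":
                stack.pop()
                if capture is not None and len(stack) == 2:
                    yield json.loads("".join(capture))
                    capture = None
```

A file would be consumed as `for item in iter_array_elements(read_chunks(path), "in_network"): ...`. Parallelism is omitted; each file could be handed to a separate worker process. The pure-Python character loop is shown for clarity rather than speed.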

Quality, consistency, and duplication issues in price transparency data

The raw MRFs are full of low-quality, inconsistent records. Although regulators demand pricing transparency, they don't strictly enforce a uniform data structure. As a result, payer schemas vary slightly, and MRFs contain duplicate records, missing key information, and unrealistic prices.
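A minimal sketch of how slight schema variations and duplicate records could be absorbed during normalization. It assumes in_network items use the CMS field names (billing_code_type, billing_code, negotiated_rates, provider_groups, npi, negotiated_prices, negotiated_type, negotiated_rate, billing_class, billing_code_modifier). The RateRow schema and the specific rules (trimming, case folding, rounding to cents, accepting a scalar where an array is expected) are hypothetical, and provider_references lookups are left out.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Iterator


@dataclass(frozen=True)
class RateRow:
    """One flat, hashable row of a hypothetical standardized output schema."""
    billing_code_type: str
    billing_code: str
    npi: int
    negotiated_type: str
    billing_class: str
    modifiers: tuple
    rate: float


def _as_list(value: Any) -> list:
    """Accept a scalar where the schema expects an array (a common variation)."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def flatten(item: dict) -> Iterator[RateRow]:
    """Explode one in_network item into one row per NPI and negotiated price."""
    code_type = str(item.get("billing_code_type", "")).strip().upper()
    code = str(item.get("billing_code", "")).strip().upper()
    for negotiated in item.get("negotiated_rates", []):
        # Only inline provider_groups are read; provider_references are not resolved.
        npis = [int(n) for group in negotiated.get("provider_groups", [])
                for n in _as_list(group.get("npi"))]
        for price in negotiated.get("negotiated_prices", []):
            modifiers = tuple(sorted(str(m).strip().upper()
                                     for m in _as_list(price.get("billing_code_modifier"))))
            for npi in npis:
                yield RateRow(
                    billing_code_type=code_type,
                    billing_code=code,
                    npi=npi,
                    negotiated_type=str(price.get("negotiated_type", "")).strip().lower(),
                    billing_class=str(price.get("billing_class", "")).strip().lower(),
                    modifiers=modifiers,
                    rate=round(float(price["negotiated_rate"]), 2),
                )


def normalize_and_dedupe(items: Iterable[dict]) -> list[RateRow]:
    """Flatten items and keep the first occurrence of each identical normalized row."""
    seen: set[RateRow] = set()
    rows: list[RateRow] = []
    for item in items:
        for row in flatten(item):
            if row not in seen:
                seen.add(row)
                rows.append(row)
    return rows
```

Because RateRow is frozen, each row is hashable, so records that become identical after normalization collapse through a simple set lookup.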

Our transformation process cleanses, normalizes, and validates this inconsistent raw data. A distributed architecture ingests the raw JSON, enforces a standardized schema, and outputs the records in a consistent tabular format. Quality gates cross-reference provider NPIs against the national NPI registry and flag outliers.
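Two quality gates can be sketched under stated assumptions. NPIs carry a Luhn check digit computed with the prefix 80840, so malformed identifiers can be rejected locally before the registry lookup. For outliers, the sketch uses the modified z-score (median and median absolute deviation, threshold 3.5, after Iglewicz and Hoaglin); that rule is an illustrative choice, not necessarily the one the pipeline uses.

```python
from statistics import median


def npi_checksum_ok(npi: str) -> bool:
    """Luhn check over '80840' + NPI, the check-digit scheme NPIs use.

    Only a cheap local pre-filter: a valid checksum does not prove the NPI
    exists, so the registry lookup is still required.
    """
    if len(npi) != 10 or not (npi.isascii() and npi.isdigit()):
        return False
    total = 0
    for position, char in enumerate(reversed("80840" + npi)):
        digit = int(char)
        if position % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def flag_outliers(rates: list[float], threshold: float = 3.5) -> list[bool]:
    """Flag rates for one billing code using the modified z-score (median/MAD).

    Non-positive rates are always flagged; if the MAD is zero, only those are flagged.
    """
    med = median(rates)
    mad = median(abs(r - med) for r in rates)
    flags = []
    for r in rates:
        if r <= 0:
            flags.append(True)
        elif mad == 0:
            flags.append(False)
        else:
            flags.append(abs(0.6745 * (r - med) / mad) > threshold)
    return flags
```

Median-based statistics are used here because a handful of extreme rates would otherwise inflate a mean and standard deviation enough to hide themselves.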

Tackling complex billing logic

A single medical procedure may have multiple billing components, for example a professional fee for the physician and an institutional fee for the facility. Whenever possible, our goal was to present a total rate, which is much easier for patients to understand.

The pipeline analyzes billing code modifiers and the billing_class field to distinguish partial rates from total rates. Component rates are combined or filtered out to present the full cost of a service.
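The component logic might look like the following hypothetical rule set. It treats a professional rate without a split modifier as a global rate, combines the standard CPT/HCPCS modifiers 26 (professional component) and TC (technical component) when both are present, adds an institutional rate when one exists, and drops groups that contain only partial components. The Component type and the choice of the highest rate per component are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Component:
    billing_class: str  # TiC values: "professional" or "institutional"
    modifiers: tuple    # e.g. ("26",) professional component, ("TC",) technical component
    rate: float


SPLIT_MODIFIERS = {"26", "TC"}


def total_rate(components: list[Component]) -> Optional[float]:
    """Hypothetical rules for one procedure at one provider.

    Returns None when the professional side is incomplete, so the rate can be
    filtered out instead of shown as a misleadingly low total. When several
    rates exist for the same component, the highest is used (an arbitrary,
    conservative choice for this sketch).
    """
    prof = [c for c in components if c.billing_class == "professional"]
    inst = [c.rate for c in components if c.billing_class == "institutional"]

    global_rates = [c.rate for c in prof if not SPLIT_MODIFIERS & set(c.modifiers)]
    pc = [c.rate for c in prof if "26" in c.modifiers]
    tc = [c.rate for c in prof if "TC" in c.modifiers]

    if global_rates:       # a global rate already covers both components
        professional = max(global_rates)
    elif pc and tc:        # combine professional and technical components
        professional = max(pc) + max(tc)
    else:
        return None        # partial rate only: filter out

    # Add the facility side when the payer lists one.
    return professional + (max(inst) if inst else 0.0)
```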

Integrating geolocation data for fast search

Our price transparency solution integrates data from the official NPI registry, filtering 8M+ providers down to 252K relevant organizations. However, these records lack latitude and longitude, which would slow location-based search to a crawl.
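One way to stream the registry down to a smaller set is sketched below. It relies on the NPPES data dissemination CSV columns NPI and Entity Type Code (2 marks organizations, 1 individuals); treating an organization as relevant only when its NPI appears in the pricing data is a hypothetical criterion, and the function name is illustrative.

```python
import csv
from typing import Iterable, Iterator


def relevant_organizations(nppes_lines: Iterable[str], priced_npis: set) -> Iterator[dict]:
    """Stream NPPES rows, keeping organizations whose NPI appears in the pricing data.

    'Entity Type Code' is "1" for individual providers and "2" for organizations.
    The pricing-data membership test is a hypothetical relevance rule.
    """
    for row in csv.DictReader(nppes_lines):
        if row.get("Entity Type Code") == "2" and row.get("NPI") in priced_npis:
            yield row
```

Since csv.DictReader reads row by row, a file handle opened with newline="" can be passed in directly without loading the whole registry into memory.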

We validate provider addresses against postal service databases and convert them into precise coordinates. A dedicated health pricing API endpoint exposes this geocoded data, making radius search fast and convenient.
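Once providers have coordinates, a radius query reduces to a distance filter. The sketch below uses the haversine great-circle formula with a mean Earth radius of 3,958.8 miles and a cheap bounding-box prefilter. The Provider type and function names are hypothetical, and the in-memory loop is shown only to illustrate the geometry, not how the API endpoint is implemented.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles


@dataclass(frozen=True)
class Provider:
    npi: str
    name: str
    lat: float
    lon: float


def haversine_mi(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def within_radius(providers: list, lat: float, lon: float, radius_mi: float) -> list:
    """Return (distance, provider) pairs within radius_mi, nearest first.

    A slightly generous lat/lon bounding box (about 69 miles per degree of
    latitude) skips most providers before the exact haversine check.
    Ignores antimeridian wrap.
    """
    dlat = radius_mi / 69.0
    dlon = radius_mi / (69.0 * max(math.cos(math.radians(lat)), 1e-6))
    hits = []
    for p in providers:
        if abs(p.lat - lat) > dlat or abs(p.lon - lon) > dlon:
            continue
        distance = haversine_mi(lat, lon, p.lat, p.lon)
        if distance <= radius_mi:
            hits.append((distance, p))
    hits.sort(key=lambda pair: pair[0])
    return hits
```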

Optimizing price transparency data for cost and performance

The final dataset was too massive for traditional databases, so the team converted the JSON data to Apache Parquet, reducing storage costs by up to 90%. The team also partitioned the data into a logical S3 hierarchy based on common query filters, which lets the AWS Athena query engine scan only the necessary data subsets. As a result, search is exceptionally fast and incurs no idle database infrastructure costs.
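The partitioning idea can be illustrated with a small simulation of partition pruning. The partition keys (state, billing_code) and the table name are hypothetical; the sketch only builds Hive-style key=value prefixes and shows which of them a filtered query would need to read.

```python
from urllib.parse import quote, unquote

# Hypothetical partition keys, chosen to mirror the most common search filters.
PARTITION_KEYS = ("state", "billing_code")


def partition_path(table: str, row: dict) -> str:
    """Build a Hive-style key=value prefix, a layout Athena can map to partition columns."""
    segments = [f"{k}={quote(str(row[k]), safe='')}" for k in PARTITION_KEYS]
    return "/".join([table, *segments]) + "/"


def parse_partition(path: str) -> dict:
    """Recover {key: value} from a Hive-style prefix."""
    pairs = (seg.split("=", 1) for seg in path.strip("/").split("/") if "=" in seg)
    return {k: unquote(v) for k, v in pairs}


def prune(paths: list, **filters: str) -> list:
    """Mimic partition pruning: keep only prefixes whose values match every filter."""
    return [p for p in paths if all(parse_partition(p).get(k) == v for k, v in filters.items())]
```

In Athena, such prefixes are registered as partition columns (for example with MSCK REPAIR TABLE or partition projection), and a WHERE clause on state and billing_code restricts the scan to the matching prefixes, which is what keeps per-query cost proportional to the data actually searched.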

Our price transparency solution allows users to:

Find information on 20K+ medical procedures
Compare prices across providers on an interactive map
Forecast the total treatment cost
See common treatment plans associated with the procedure

Services provided

Discovery phase

Data engineering

Web development

Quality assurance

UI/UX design

DevOps

Tech stack

  • React
  • NestJS
  • Google Maps API
  • PostgreSQL
  • TypeORM
  • Redis
  • AWS Athena
  • OpenAI API
  • AWS CloudWatch
  • Nginx
  • Bitbucket Pipelines

Business value

We turned 20+ TB of disjointed pricing data into an easily searchable database. The healthcare pricing tool lets patients see accurate costs for selected procedures, along with common treatment plans associated with each procedure. The platform's serverless design (Parquet files on S3 queried through Athena) reduces storage costs by up to 90% and supports ultra-fast queries.
20+ TB
of healthcare price transparency data analyzed
250,000+
providers in the database
1
seconds to fetch accurate results

Work with healthcare experts

Let us know about your technology challenges or request access to our health pricing API.
