Behind every autonomous coding platform is a domain-trained natural language processing (NLP) model that reads clinical notes the way a senior coder does. Here is what NLP does, what it does not do, and how to evaluate the engine inside the platform you are buying.
01 / BASICS
What NLP does in coding
From note to code, in three jumps.
02 / STACK
Inside the modern stack
Six layers, each nontrivial.
Domain LLM
Trained on 8 to 12 billion healthcare-specific clinical tokens.
Concept graph
UMLS plus payer-specific extensions, 4.6 million concepts.
Section parser
Reads chart sections in clinical order, not text order.
Rule engine
Applies coding rules and payer overlays.
Confidence scorer
Tags every code with audit trail and confidence.
Feedback loop
Coder overrides feed retraining each cycle.
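To make the last two layers concrete, here is a minimal Python sketch of what a confidence-tagged code with an audit trail might look like, and how low-confidence codes could be routed to a human coder. Everything in it is an assumption for illustration: the `CodedLine` record, the `route` helper, and the 0.9 threshold are hypothetical and do not describe any particular platform.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CodedLine:
    """One suggested code, as a confidence scorer might emit it (hypothetical shape)."""
    code: str                     # the suggested code
    confidence: float             # model confidence, 0.0 to 1.0
    evidence: List[str] = field(default_factory=list)  # audit trail: supporting note excerpts


def route(lines: List[CodedLine], threshold: float = 0.9) -> Tuple[List[CodedLine], List[CodedLine]]:
    """Split codes into auto-accepted and human-review queues.

    The threshold is an assumed, tunable value. A code with no supporting
    evidence is always sent to review, however confident the model is.
    """
    auto, review = [], []
    for line in lines:
        if line.confidence >= threshold and line.evidence:
            auto.append(line)
        else:
            review.append(line)
    return auto, review
```

In a design like this, the review queue is also where coder overrides would be captured for the feedback loop.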
03 / HEALTHCARE
Why healthcare NLP is different
Generic LLMs miss what coders catch.
Generic LLMs trained on internet text read clinical notes poorly. They miss laterality. They confuse acute and chronic conditions. They misassign rule-out diagnoses. Healthcare NLP needs a clinical lineage and a coder feedback loop. There is no shortcut.
A general-purpose LLM can pass the bar exam but cannot code an emergency department visit. Different work, different training.
04 / TEST
How to test an engine
Twenty-five charts. Two hours. Real signal.
- Pick a stratified sample of 25 charts across your top three specialties
- Have a senior coder code them blind
- Run the AI on the same 25
- Compare disagreements at the line level
- Score on three dimensions: accuracy, audit completeness, and edge case handling
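The steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a vendor tool: the chart records, the field names `chart_id` and `specialty`, and the helpers `stratified_sample`, `line_level_disagreements`, and `agreement_rate` are all hypothetical, and each chart's codes are treated as a simple set.

```python
import random
from collections import defaultdict


def stratified_sample(charts, specialties, total=25, seed=0):
    """Draw `total` charts spread as evenly as possible across `specialties`.

    `charts` is a list of dicts with the assumed keys 'chart_id' and 'specialty'.
    """
    rng = random.Random(seed)
    by_specialty = defaultdict(list)
    for chart in charts:
        if chart["specialty"] in specialties:
            by_specialty[chart["specialty"]].append(chart)

    base, extra = divmod(total, len(specialties))
    sample = []
    for i, specialty in enumerate(specialties):
        quota = base + (1 if i < extra else 0)  # first `extra` strata get one more
        pool = by_specialty[specialty]
        if len(pool) < quota:
            raise ValueError(f"not enough charts for {specialty}")
        sample.extend(rng.sample(pool, quota))
    return sample


def line_level_disagreements(coder_codes, ai_codes):
    """Compare two {chart_id: set_of_codes} mappings, one code at a time."""
    rows = []
    for chart_id in sorted(coder_codes):
        coder = coder_codes[chart_id]
        ai = ai_codes.get(chart_id, set())
        for code in sorted(coder - ai):
            rows.append((chart_id, code, "missed by AI"))
        for code in sorted(ai - coder):
            rows.append((chart_id, code, "added by AI"))
    return rows


def agreement_rate(coder_codes, ai_codes):
    """Share of coder-assigned codes that the AI also assigned."""
    total = sum(len(codes) for codes in coder_codes.values())
    matched = sum(
        len(codes & ai_codes.get(chart_id, set()))
        for chart_id, codes in coder_codes.items()
    )
    return matched / total if total else 0.0
```

The agreement rate covers only the accuracy dimension, and it treats the senior coder as the reference. Audit completeness and edge case handling still need a person to read each disagreement row.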
05 / FUTURE
Where NLP is going next
The next 18 months.
NLP is the engine. If the engine is weak, no amount of UI polish saves the platform. Spend two hours on a real chart test before you spend two months on a deal.