RCM Automation for Telehealth: Rules vs AI — Why the Best Systems Use Both
The sophisticated approach to revenue cycle automation isn't choosing between deterministic rules and AI. It's knowing exactly when to use each—and how to make them work together safely.
The False Binary That's Costing You Revenue
Every week, we hear the same question from telehealth providers: "Should we use rules-based automation or AI for our revenue cycle?"
It's the wrong question.
Here's why: A Parkinson's disease telehealth visit generates a predictable pattern. The provider is licensed in Texas. The patient is at home in Texas. The visit is synchronous via video. These facts don't require AI to verify—they're deterministic. If patient location = Texas AND provider license state = Texas AND visit type = video, then apply POS 10 (patient's home) + Modifier 95 (synchronous telemedicine). This rule will fire correctly 100% of the time.
But that same 32-minute visit also produces a consultation note full of unstructured clinical documentation. Extracting "G20.A1 - Parkinson's disease without dyskinesia" from narrative text like "Patient presents with pill-rolling tremor at rest, bradykinesia noted during finger tapping, cogwheel rigidity in upper extremities" absolutely requires AI. No rule can map that clinical description to the correct ICD-10 code with sub-classification.
The insight that drives modern RCM automation: These aren't competing approaches. They're complementary capabilities that solve fundamentally different problems.
When Rules Win: The Power of Deterministic Logic
Let's start with where rules-based automation excels—and why you should use it aggressively before reaching for AI.
The Telehealth Modifier Problem
Telehealth billing has exploded since 2020, but payers have been slow to adapt their adjudication systems. The result? A minefield of easily preventable denials. CARC 96 ("Non-covered charge(s)"), most often triggered in virtual care by a place-of-service and modifier mismatch, is now one of the top denial codes for telehealth providers.
The fix is mechanical:
IF visit_type = "telehealth" AND patient_location = "home"
THEN set POS = 10 AND add modifier = 95
ELSE IF visit_type = "telehealth" AND patient_location = "clinic"
THEN set POS = 02 AND add modifier = 95
This is a perfect use case for rules. The inputs are structured (visit type is a categorical field in your EHR). The logic is deterministic (the relationship between visit modality, location, and billing codes never changes). The consequences of getting it wrong are severe (automatic denial, $48 average rework cost per claim). And most importantly, the rule will never make a judgment call—it either fires or it doesn't.
Time-Based E/M Coding
Evaluation and Management coding has evolved. When providers document time-based encounters, the CPT code selection is purely algorithmic:
# documented_time is the total minutes documented for the encounter
if documented_time >= 40: code = "99215"
elif documented_time >= 30: code = "99214"
elif documented_time >= 20: code = "99213"
elif documented_time >= 10: code = "99212"
else: code = "99211"
There's zero ambiguity here. If your clinical note includes "Total time: 32 minutes," the correct code is 99214. Every time. No exceptions. Using AI to "predict" this code would be engineering malpractice—you're introducing unnecessary complexity and potential errors into a solved problem.
Interstate Licensure Validation
Here's another scenario where rules dominate: A neurology practice treats ALS patients across multiple states. Before auto-submitting a claim, you need to verify:
provider_licenses = get_active_licenses(provider_npi)
patient_state = encounter.patient.address.state
if patient_state in provider_licenses:
validation_status = "PASS"
else:
validation_status = "BLOCKED"
route_to_credentialing_team()
This is pure boolean logic. The provider either holds an active license in the patient's state or they don't. There's no "confidence score" needed. The data is structured (license records in your credentialing system). The logic is unambiguous. And if you get it wrong, you've submitted a claim for a service that wasn't legally rendered—a compliance nightmare.
Why Rules Matter for Compliance
Rules-based automation has a superpower that AI can never fully replicate: perfect reproducibility and auditability.
When a payer audits your billing practices, you need to show exactly why each claim was coded the way it was. "Our AI model predicted this code with 87% confidence" is not a defensible answer. But "This claim triggered our telehealth edit rule R-102, which deterministically applies POS 10 + Modifier 95 to all home-based video visits" is bulletproof documentation.
This matters enormously for telehealth providers who are already under heightened scrutiny. CMS and commercial payers are actively auditing telehealth billing patterns. Rule-based logic provides an audit trail that AI simply cannot match.
When AI Shines: Handling Ambiguity at Scale
Now let's examine where AI becomes indispensable—the scenarios where deterministic rules hit their limits.
Clinical Documentation Extraction
Consider this snippet from a telemedicine addiction medicine consultation:
"Patient continues buprenorphine/naloxone 16mg daily. Reports stable mood, no cravings this week. Urine tox today negative for illicit substances. PDMP check shows no concerning fills. Patient describes improved sleep since starting trazodone. Denies suicidal ideation. Will continue current regimen."
What ICD-10 codes should this encounter carry?
A human coder would identify:
F11.20 - Opioid dependence, uncomplicated (primary diagnosis)
Z71.89 - Other specified counseling (documented substance use counseling)
G47.00 - Insomnia, unspecified (sleep issue mentioned)
No rule can extract this. The clinical reasoning is implicit. The provider didn't write "diagnosis: opioid use disorder." They wrote clinical narrative that implies the diagnosis based on medication choice, monitoring pattern, and treatment trajectory.
This is where modern large language models excel. They can:
Understand medical terminology and context
Recognize symptom-diagnosis relationships
Extract relevant facts even when phrased conversationally
Map clinical descriptions to standardized code sets
But—and this is critical—they do so probabilistically, not deterministically.
Handling Variability in Clinical Documentation
Real clinical notes are messy. Some providers are verbose, others are terse. Some use formal medical terminology, others write conversationally. Some include explicit diagnosis statements, others bury diagnostic information in treatment plans.
Example variations for the same condition:
Verbose: "Patient presents with bilateral lower extremity edema, orthopnea, paroxysmal nocturnal dyspnea, and fatigue. JVD noted on exam. CXR shows pulmonary congestion. BNP elevated at 892. Echocardiogram reveals reduced LVEF of 35%. Diagnosis: Heart failure with reduced ejection fraction."
Terse: "HFrEF exacerbation. EF 35%. Increase furosemide to 40mg BID."
Conversational: "Patient's heart failure is acting up again. His EF dropped to 35% on the latest echo. Legs are really swollen. Bumping up his diuretic."
All three describe the same condition (I50.21 - Acute systolic heart failure), but no static rule can reliably extract it from all three formats. You need semantic understanding, not pattern matching.
This is where AI proves invaluable: it can handle the infinite variability of human language while still maintaining reasonable accuracy across different documentation styles.
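To make this concrete, here is a minimal sketch of what that extraction call can look like. The llm.complete interface mirrors the one used in the validation examples later in this post; the Extraction structure and the parse_icd10_response helper are illustrative assumptions, not a specific vendor API.
from dataclasses import dataclass

@dataclass
class Extraction:
    code: str          # e.g. "I50.21"
    confidence: float  # model-reported or calibrated score
    source_text: str   # the phrase that justified the code

def extract_heart_failure_code(clinical_text: str) -> Extraction:
    """Map messy narrative to a single candidate ICD-10 code."""
    prompt = (
        "Extract the single most specific ICD-10-CM code for the condition "
        "described below. Return the code, a confidence between 0 and 1, and "
        "the exact phrase that supports it.\n\n"
        f"Note: {clinical_text}"
    )
    raw = llm.complete(prompt)          # same hypothetical LLM client used later in this post
    return parse_icd10_response(raw)    # illustrative parsing helper, not a real library call

# All three documentation styles funnel through the same call:
terse_note = "HFrEF exacerbation. EF 35%. Increase furosemide to 40mg BID."
print(extract_heart_failure_code(terse_note))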
Prior Authorization Clinical Justification
Electronic prior authorization has a particularly thorny extraction problem: supporting clinical evidence.
For a buprenorphine/naloxone PA, payers typically require:
Diagnosis code confirming opioid use disorder
Previous treatment attempts (documenting medical necessity)
Current assessment of stability
The first is often explicitly stated. But the second two? They're buried in free text:
"Patient initially tried methadone maintenance at outside clinic 2019-2020 but had difficulty with daily witnessed dosing due to work schedule. Transitioned to buprenorphine in 2021 with significant improvement in adherence and social functioning."
An AI can extract "previous therapy: methadone (failed due to adherence barriers)" and "current status: stable on buprenorphine with good compliance." These aren't keyword matches—they require understanding temporal relationships, causal reasoning, and clinical context.
This extraction unlocks enormous value: instead of billing specialists manually reading notes and copy-pasting information into PA forms (90+ minutes per PA), the system can auto-populate 80-90% of fields and route only ambiguous cases for human review.
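A rough sketch of that auto-population step (the field names and the extract_pa_evidence helper are assumptions, not a payer-specific schema):
def populate_pa_form(clinical_note: str, pa_form: dict) -> dict:
    """Fill PA fields from free text; leave ambiguous fields for a specialist."""
    evidence = extract_pa_evidence(clinical_note)   # hypothetical LLM extraction step

    for field in ("diagnosis_code", "previous_treatments", "current_stability"):
        finding = evidence.get(field)
        if finding and finding.confidence >= 0.85:
            pa_form[field] = finding.text                          # auto-populate the confident fields
        else:
            pa_form.setdefault("needs_review", []).append(field)   # route the rest to a human
    return pa_form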
The Hybrid Architecture: Engineering Trust Through Layered Safety
Here's where sophistication enters the picture. The best RCM automation systems don't just "use AI sometimes and rules sometimes." They architect multi-layered validation that provides defense-in-depth against errors.
Layer 1: Provenance Labeling
Every automated decision must carry metadata about how it was made:
{
"field": "icd10",
"value": "G20.A1",
"confidence": 0.88,
"provenance": "LLM",
"source_text": "Parkinson's disease without dyskinesia (line 12-14)",
"extraction_timestamp": "2025-10-06T14:23:11Z"
}
Compare that to a rule-based decision:
{
"field": "modifier",
"value": "95",
"confidence": 1.0,
"provenance": "Rule",
"rule_id": "R-TH-001",
"rule_description": "Telehealth synchronous: add Modifier 95",
"execution_timestamp": "2025-10-06T14:23:09Z"
}
Notice the critical difference: Provenance labeling makes the decision-making process auditable. A billing specialist can see immediately whether a value came from deterministic logic (confidence = 1.0, provenance = Rule) or probabilistic extraction (confidence = 0.88, provenance = LLM). This transparency builds trust and enables intelligent human review.
In our system at Foresight, we use three provenance tags:
Rule: Deterministic logic applied to structured data
LLM: AI extraction from unstructured clinical text
Denial Playbook: Automated fix derived from payer denial patterns
Each carries different implications for review requirements.
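One minimal way to encode those tags and their review implications (the enum mirrors the tags above; the review-policy mapping is an illustrative assumption):
from enum import Enum

class Provenance(Enum):
    RULE = "Rule"                        # deterministic logic on structured data
    LLM = "LLM"                          # probabilistic extraction from clinical text
    DENIAL_PLAYBOOK = "Denial Playbook"  # automated fix derived from payer denial patterns

# Illustrative review policy keyed by provenance tag
REVIEW_POLICY = {
    Provenance.RULE: "no_review",               # confidence is always 1.0
    Provenance.LLM: "review_below_threshold",   # gate on field-level confidence
    Provenance.DENIAL_PLAYBOOK: "spot_check",   # periodic audit of automated fixes
}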
Layer 2: Confidence-Based Routing
Not all AI predictions are created equal. A well-engineered system needs to route low-confidence decisions to humans while auto-processing high-confidence cases.
Here's the architecture:
# Field-level confidence scores
field_confidences = {
"cpt": 0.94,
"icd10_primary": 0.88,
"icd10_secondary": 0.63, # LOW
"pos": 0.98,
"modifier_95": 0.91
}
# Aggregate confidence (weighted average)
aggregate_confidence = calculate_aggregate(field_confidences)
# Threshold-based routing
CONFIDENCE_THRESHOLD = 0.88
if aggregate_confidence >= CONFIDENCE_THRESHOLD:
route_to_auto_submission()
else:
route_to_human_review(
reason="Low confidence on icd10_secondary",
suggested_value="Z71.89",
confidence=0.63
)
This creates an intelligent triage system. In our production deployments, we're seeing:
85% of claims auto-submitted (high confidence across all fields)
12% require human review (one or more low-confidence fields)
3% blocked (missing required data or credentialing issues)
The key insight: You can tune the confidence threshold to match your risk tolerance. Conservative practices can set threshold = 0.95 and route more items to review. Aggressive practices can set threshold = 0.80 and maximize automation. The system adapts to your operational priorities.
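Expressed as configuration, that tuning might look like this (profile names and values are illustrative):
# Hypothetical per-practice routing profiles
ROUTING_PROFILES = {
    "conservative": 0.95,   # route more claims to human review
    "balanced": 0.88,       # the default threshold used above
    "aggressive": 0.80,     # maximize auto-submission
}

def should_auto_submit(aggregate_confidence: float, profile: str = "balanced") -> bool:
    return aggregate_confidence >= ROUTING_PROFILES[profile]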
Layer 3: Type Checking and Validation
Even when AI extraction has high confidence, you need downstream validation. This catches LLM hallucinations and formatting errors:
import re

def validate_icd10(code: str, confidence: float, procedure_codes: list) -> ValidationResult:
    """Multi-layer validation of extracted ICD-10 codes"""
    # Layer 1: Format validation
    # ICD-10-CM shape: letter, two alphanumerics, then an optional decimal and 1-4 alphanumerics
    if not re.match(r'^[A-Z][0-9][A-Z0-9](\.[A-Z0-9]{1,4})?$', code):
        return ValidationResult(
            valid=False,
            reason="Invalid ICD-10 format",
            provenance="Type Check"
        )
    # Layer 2: Code set membership
    if code not in VALID_ICD10_CODES:
        return ValidationResult(
            valid=False,
            reason="Code not in ICD-10-CM 2025 code set",
            provenance="Code Set Validation"
        )
    # Layer 3: Clinical plausibility checks
    # Example: if the procedure is a neurology consult,
    # flag a primary diagnosis from the obstetric chapter
    if is_clinical_mismatch(code, procedure_codes):
        return ValidationResult(
            valid=True,  # Don't auto-reject
            needs_review=True,
            reason="Diagnosis-procedure mismatch detected",
            confidence_penalty=-0.15
        )
    return ValidationResult(valid=True)
This catches multiple error modes:
Format errors: "G20A1" → should be "G20.A1"
Hallucinated codes: "G20.99" → doesn't exist in ICD-10-CM
Clinical implausibility: Obstetric code on male patient
Each validation layer provides a safety net. Even if the LLM produces a malformed output, type checking prevents it from reaching the clearinghouse.
Layer 4: Few-Shot Prompting for Domain Specificity
Generic AI models aren't optimized for medical coding. Few-shot prompting grounds the model in domain-specific patterns:
You are a medical coding assistant specializing in telehealth neurology.
Examples of correct ICD-10 extraction:
Input: "Patient with long-standing Parkinson's disease, no dyskinesia noted"
Output: G20.A1 (Parkinson's disease without dyskinesia)
Input: "ALS patient, bulbar onset, significant dysphagia"
Output: G12.21 (Amyotrophic lateral sclerosis)
Input: "Mild cognitive impairment, likely early Alzheimer's"
Output: F06.70 (Mild neurocognitive disorder without behavioral disturbance)
Now extract ICD-10 from this note:
[ACTUAL CLINICAL NOTE]
Output only valid ICD-10 codes with brief justification.
Few-shot prompting dramatically improves accuracy on domain-specific tasks. In our testing with specialty neurology practices, we saw:
Without few-shot prompting: 73% accuracy on sub-classified ICD-10 codes
With few-shot prompting: 89% accuracy on the same task
The improvement comes from grounding the model in specialty-specific patterns rather than expecting it to generalize from its broad pre-training.
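In practice, the examples are usually assembled programmatically from a curated, specialty-specific example bank rather than hard-coded into a prompt. A sketch, under that assumption:
# Curated example bank per specialty; entries are illustrative
FEW_SHOT_EXAMPLES = {
    "telehealth neurology": [
        ("Patient with long-standing Parkinson's disease, no dyskinesia noted",
         "G20.A1 (Parkinson's disease without dyskinesia)"),
        ("ALS patient, bulbar onset, significant dysphagia",
         "G12.21 (Amyotrophic lateral sclerosis)"),
    ],
}

def build_extraction_prompt(specialty: str, clinical_note: str) -> str:
    """Assemble the few-shot prompt from the specialty's curated examples."""
    shots = "\n\n".join(
        f'Input: "{text}"\nOutput: {code}'
        for text, code in FEW_SHOT_EXAMPLES[specialty]
    )
    return (
        f"You are a medical coding assistant specializing in {specialty}.\n\n"
        f"Examples of correct ICD-10 extraction:\n\n{shots}\n\n"
        f"Now extract ICD-10 from this note:\n{clinical_note}\n\n"
        "Output only valid ICD-10 codes with brief justification."
    )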
Layer 5: LLM-as-a-Judge for Quality Control
For critical decisions, a second AI can validate the first AI's output:
def validate_with_judge(
extracted_code: str,
clinical_text: str,
extraction_confidence: float
) -> JudgmentResult:
"""Use a second LLM to validate code extraction quality"""
judge_prompt = f"""
A medical coding system extracted the ICD-10 code {extracted_code}
from the following clinical text:
{clinical_text}
Is this code appropriate? Respond with:
- AGREE: Code is correct
- DISAGREE: Code is incorrect (provide correct code)
- UNCERTAIN: Cannot determine from available information
Explain your reasoning.
"""
judge_response = llm.complete(judge_prompt)
if judge_response.verdict == "AGREE":
return JudgmentResult(
validated=True,
confidence_boost=+0.05
)
elif judge_response.verdict == "DISAGREE":
return JudgmentResult(
validated=False,
alternate_code=judge_response.suggested_code,
route_to_human=True
)
else: # UNCERTAIN
return JudgmentResult(
validated=False,
route_to_human=True,
reason="Judge uncertain, requires specialist review"
)
This creates a consensus mechanism. When two independent AI systems agree on an extraction, confidence increases. When they disagree, the case routes to human review. This is particularly valuable for complex multi-code scenarios where no single ground truth exists.
Layer 6: Complementary Keyword Matching
AI excels at semantic understanding, but simple keyword matching still has a role for high-value flags:
CRITICAL_KEYWORDS = {
"suicidal": {"flag": "psychiatric_emergency", "block_auto_submit": True},
"suspected abuse": {"flag": "mandated_reporting", "block_auto_submit": True},
"terminal": {"flag": "hospice_consideration", "review_level": "clinical"},
"experimental": {"flag": "coverage_risk", "review_level": "billing"}
}
def scan_for_critical_flags(clinical_text: str) -> List[Flag]:
"""Keyword-based safety net for high-risk scenarios"""
flags = []
for keyword, metadata in CRITICAL_KEYWORDS.items():
if keyword.lower() in clinical_text.lower():
flags.append(Flag(
keyword=keyword,
**metadata
))
return flags
This catches edge cases that AI might miss or contextualize incorrectly. If a note mentions "suicidal ideation," the claim should route through clinical review regardless of AI confidence scores. Simple keyword matching provides an additional safety layer.
The Denial Resolution Loop: Where Hybrid Automation Gets Sophisticated
Here's where truly advanced systems differentiate themselves: automated denial remediation that combines historical pattern matching (rules) with adaptive problem-solving (AI).
Building Denial Playbooks
After processing thousands of claims, patterns emerge in payer denial codes. These patterns can be codified:
DENIAL_PLAYBOOKS = {
"CARC-96": {
"description": "Non-covered: POS inconsistent with Mod 95",
"common_causes": [
"Missing Modifier 95 on telehealth claim",
"POS 11 (office) used instead of POS 10 (home)",
"Modifier 95 present but visit not actually telehealth"
],
"auto_fix": {
"conditions": [
"visit_type == 'telehealth'",
"patient_location == 'home'"
],
"actions": [
"set_pos(10)",
"add_modifier(95)",
"add_note('Video visit conducted per telehealth guidelines')"
]
},
"confidence_threshold": 0.95
},
"CARC-197": {
"description": "Precertification/authorization absent",
"auto_fix": {
"conditions": [
"epa_status == 'approved'",
"epa_number IS NOT NULL"
],
"actions": [
"attach_auth_number(epa.pa_number)",
"add_note(f'PA #{epa.pa_number} approved {epa.approved_date}')"
]
},
"confidence_threshold": 0.99
}
}
These playbooks enable automatic resubmission for common denial types. CARC 96 (POS mismatch)? The system knows to check if it's a telehealth visit, apply the correct modifiers, and resubmit. CARC 197 (missing auth)? The system links to the approved ePA record, attaches the authorization number, and resubmits.
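For context, here is a sketch of the resolution step that consumes those playbook entries (the evaluate_condition, execute_action, and resubmit helpers are illustrative):
def apply_playbook(claim, denial_code: str) -> bool:
    """Attempt an automatic fix-and-resubmit using the matching playbook entry."""
    playbook = DENIAL_PLAYBOOKS.get(denial_code)
    if playbook is None:
        return False   # no known pattern for this denial code

    fix = playbook["auto_fix"]
    # Only auto-fix when every precondition holds for this claim
    if not all(evaluate_condition(claim, cond) for cond in fix["conditions"]):
        return False

    for action in fix["actions"]:
        execute_action(claim, action)   # e.g. set_pos(10), add_modifier(95)

    resubmit(claim, provenance="Denial Playbook")
    return True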
In production, this translates to:
68% of denials auto-resolved on first resubmission attempt
$0 human time for straightforward denial types
<24 hour turnaround from denial receipt to corrected resubmission
When Playbooks Aren't Enough: AI-Assisted Resolution
Some denials are novel or complex. CARC 11 ("Diagnosis inconsistent with procedure") doesn't have a one-size-fits-all fix. The issue could be:
Wrong procedure code for the documented diagnosis
Missing secondary diagnosis that justifies the procedure
Clinically appropriate but poorly documented in the note
This requires reasoning, not pattern matching:
def resolve_complex_denial(
claim: Claim,
denial_code: str,
denial_reason: str
) -> ResolutionStrategy:
"""Use AI to suggest fixes for non-standard denials"""
# Pull relevant context
clinical_note = claim.encounter.clinical_note
original_codes = claim.get_all_codes()
payer_guidelines = fetch_payer_policy(claim.payer, denial_code)
# Construct reasoning prompt
prompt = f"""
Analyze this claim denial and suggest resolution:
Denial: {denial_code} - {denial_reason}
Procedure codes: {original_codes['cpt']}
Diagnosis codes: {original_codes['icd10']}
Clinical documentation:
{clinical_note}
Payer guidelines:
{payer_guidelines}
Suggest:
1. Most likely cause of denial
2. Recommended code changes
3. Supporting documentation needed
4. Confidence in resolution strategy
"""
ai_suggestion = llm.complete(prompt)
if ai_suggestion.confidence >= 0.85:
return ResolutionStrategy(
suggested_fixes=ai_suggestion.fixes,
route_to_human=False,
auto_resubmit=True
)
else:
return ResolutionStrategy(
suggested_fixes=ai_suggestion.fixes,
route_to_human=True,
reason="AI uncertain, requires billing specialist review"
)
This creates an intelligent escalation path:
Try deterministic playbook (if denial code matches a known pattern)
If no playbook exists, invoke AI reasoning
If AI confidence is high, auto-resubmit with suggested fixes
If AI confidence is low, route to human with AI suggestions as starting point
The human always has final say on ambiguous cases, but the AI dramatically reduces the cognitive load by narrowing the problem space.
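Sketched as code, the escalation path looks roughly like this (reusing the playbook and AI resolution steps above; resubmit and route_to_human_review are illustrative helpers):
def handle_denial(claim, denial_code: str, denial_reason: str) -> None:
    """Deterministic playbook first, AI reasoning second, human review last."""
    if apply_playbook(claim, denial_code):
        return   # fixed and resubmitted deterministically

    strategy = resolve_complex_denial(claim, denial_code, denial_reason)
    if strategy.route_to_human:
        route_to_human_review(claim, suggestions=strategy.suggested_fixes)   # AI output as a starting point
    else:
        resubmit(claim, fixes=strategy.suggested_fixes, provenance="LLM")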
Real-World Example: Processing a Telehealth Neurology Claim
Let's walk through a complete claim lifecycle to see how rules and AI work together.
Initial Encounter Data
Patient: Jane Doe, TX resident
Provider: Dr. Smith, Neurologist (licensed: TX, FL, CA)
Visit: 32-minute video consultation for Parkinson's disease
Clinical note (excerpt):
"Video follow-up for PD management. Patient reports stable tremor control on carbidopa/levodopa 25/100 TID. No dyskinesia noted during examination. UPDRS motor score 18 (improved from 24 at last visit). Continue current regimen. Next follow-up 3 months."
Step 1: Rule-Based Pre-Processing
The system first applies deterministic rules:
# Telehealth validation
visit_type = "video_consultation" # from EHR
patient_location = "home" # from encounter metadata
→ RULE: Set POS = 10, Add Modifier = 95 ✓ (confidence: 1.0)
# Time-based E/M coding
documented_time = 32 # minutes, extracted from note
→ RULE: CPT = 99214 (30-39 minutes) ✓ (confidence: 1.0)
# Licensure validation
provider_state_licenses = ["TX", "FL", "CA"]
patient_state = "TX"
→ RULE: Licensure match ✓ (confidence: 1.0)
# Service date check (claims must be submitted within 30 days for this payer)
days_since_service = 3
→ RULE: Within timely filing window ✓ (confidence: 1.0)
Four critical fields are now set with perfect confidence. No human review needed. No AI required.
Step 2: AI Extraction from Clinical Documentation
Now the system invokes AI for unstructured data:
# ICD-10 extraction
clinical_text = "... stable tremor control... no dyskinesia noted... UPDRS motor score 18..."
llm_extraction = extract_icd10(clinical_text)
→ Primary: G20.A1 (Parkinson's disease without dyskinesia)
Confidence: 0.88
Provenance: "LLM"
Source: "No dyskinesia noted during examination" (line 2)
# Validate against code set
validate_code("G20.A1")
→ ✓ Valid ICD-10-CM code
→ ✓ Clinically consistent with documented procedure (99214 neurology visit)
The AI extracted the correct code with good confidence. The downstream validation confirms it's a real code and clinically plausible.
Step 3: Aggregate Confidence Check
field_confidences = {
"pos": 1.0, # Rule
"modifier_95": 1.0, # Rule
"cpt": 1.0, # Rule
"licensure": 1.0, # Rule
"icd10_primary": 0.88 # LLM
}
aggregate_confidence = weighted_average(field_confidences)
→ 0.95
THRESHOLD = 0.88
0.95 >= 0.88 → AUTO-SUBMIT ✓
Despite one field being AI-extracted, the aggregate confidence exceeds the threshold. The claim auto-submits.
Step 4: Claim Submission & 277CA Response
submit_claim(claim_id="CLM-8847")
→ Status: submitted
→ Listening for 277CA acknowledgment...
# 18 hours later
receive_277ca(claim_id="CLM-8847")
→ Status: accepted_277ca
→ Expected remittance: 7-10 days
Success. The claim was accepted on first submission. No human touched it.
Step 5: Learning Loop
# Log successful auto-submission
log_outcome(
claim_id="CLM-8847",
outcome="accepted_first_pass",
confidence_at_submit=0.95,
fields_used={
"rule_based": ["pos", "modifier_95", "cpt", "licensure"],
"ai_extracted": ["icd10_primary"]
}
)
# Update confidence calibration
if accepted_first_pass:
boost_confidence(field="icd10_primary", delta=+0.02)
The system learns from successful submissions to gradually increase confidence in its AI extractions for similar clinical patterns.
Alternative: Low Confidence Scenario
Now imagine a different clinical note:
"Patient with movement disorder, unclear etiology. May be PD vs essential tremor vs MSA. Plan: trial of carbidopa/levodopa, neuroimaging pending."
llm_extraction = extract_icd10(clinical_text)
→ Primary: R25.1 (Tremor, unspecified)
Confidence: 0.71
Reason: "Diagnosis uncertain in documentation"
validate_with_judge(code="R25.1", clinical_text=note)
→ UNCERTAIN ("Could be R25.1 or G20, insufficient information")
aggregate_confidence = 0.79
0.79 < 0.88 → ROUTE TO HUMAN REVIEW
surface_to_human(
reason="Low confidence on primary diagnosis",
ai_suggestion="R25.1 (Tremor, unspecified)",
alternate_options=["G20 (Parkinson's disease)", "G25.0 (Essential tremor)"],
clinical_context="Diagnosis not yet established per note"
)
The system correctly recognizes diagnostic ambiguity and routes to a human specialist. The AI suggestions give the human a starting point, but the final coding decision remains with someone who can apply clinical judgment.
This is the hybrid approach working as designed: automate the clear cases, escalate the ambiguous ones.
The Art of the Possible: What's Coming Next
The frontier of RCM automation isn't about replacing human judgment with AI. It's about building systems that amplify human expertise while handling routine work autonomously.
Predictive Denial Prevention
Current state: React to denials after they occur.
Near future: Predict denials before submission using historical patterns.
from datetime import date

def predict_denial_risk(claim: Claim) -> DenialRiskScore:
    """Predict likelihood of denial based on historical patterns"""
    # Feature engineering
    features = {
        "payer": claim.payer,
        "procedure": claim.cpt,
        "diagnosis": claim.icd10_primary,
        "provider_specialty": claim.provider.specialty,
        "patient_state": claim.patient.state,
        "claim_amount": claim.charge_amount,
        "days_since_service": (date.today() - claim.service_date).days
    }
    # Check historical denial patterns for this payer/procedure/diagnosis combination
    historical_denials = query_denials(
        payer=features["payer"],
        procedure=features["procedure"],
        diagnosis=features["diagnosis"]
    )
    if historical_denials.total >= 3:
        denial_rate = historical_denials.denied / historical_denials.total
        if denial_rate >= 0.25:  # 25%+ historical denial rate
            return DenialRiskScore(
                risk="HIGH",
                predicted_carc=historical_denials.most_common_carc,
                recommendation="Review before submission",
                historical_denial_rate=denial_rate
            )
    # Default: no meaningful denial history, let the claim flow through
    return DenialRiskScore(risk="LOW", historical_denial_rate=None)
This surfaces high-risk claims before they're submitted. A billing specialist can review the handful of high-risk claims while letting the hundreds of low-risk claims flow through automatically.
Early deployments show this reduces denial rates by 15-20% by catching issues proactively.
Specialty-Specific Rule Packs
Different specialties have different billing patterns and common denial types. The next generation of systems will ship with specialty-specific rule libraries:
Addiction Medicine Rule Pack:
Auto-populate ASAM (American Society of Addiction Medicine) levels of care
Validate buprenorphine prescribing against current DEA requirements (the X-waiver and its patient limits were eliminated in 2023)
Track PDMP (Prescription Drug Monitoring Program) check requirements by state
Auto-document substance use disorder remission status based on visit notes
Neurology Rule Pack:
Validate PD medication combinations against formulary restrictions
Auto-code UPDRS (Unified Parkinson's Disease Rating Scale) scores when documented
Track DMT (disease-modifying therapy) eligibility for MS patients
Auto-generate ABN (Advance Beneficiary Notice) when Medicare coverage is uncertain
These rule packs encode specialty expertise that would otherwise require manual configuration.
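As a sketch, a rule pack might be packaged like this (the structure and identifiers are hypothetical, not a shipped format):
# Illustrative packaging for a specialty rule pack
NEUROLOGY_RULE_PACK = {
    "name": "neurology",
    "version": "2025.1",
    "rules": [
        {
            "id": "NEURO-UPDRS-01",
            "trigger": "note_contains('UPDRS')",
            "action": "attach_structured_score('UPDRS_motor')",
        },
        {
            "id": "NEURO-ABN-01",
            "trigger": "payer == 'Medicare' and coverage_uncertain",
            "action": "generate_abn()",
        },
    ],
}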
Multi-Modal AI: Extracting from Visit Recordings
Today's AI extracts codes from clinical notes. Tomorrow's AI will extract directly from visit recordings:
Input: 32-minute video consultation recording
Output:
Transcribed conversation (with speaker labels)
Extracted diagnoses, treatments, and time segments
Auto-generated clinical note with key findings highlighted
Pre-populated claim with CPT/ICD-10 codes
This eliminates the "note-writing" step entirely. The provider conducts the visit, the AI generates all downstream documentation automatically.
The technical architecture:
Speech-to-text transcription (existing technology, HIPAA-compliant)
Speaker diarization (separate provider speech from patient speech)
Medical entity recognition (extract symptoms, diagnoses, treatments)
Clinical note generation (structure findings into standard SOAP format)
Code extraction (generate billing codes from structured note)
This is technically feasible today, but requires careful prompt engineering and validation. Early pilots show 85%+ accuracy on common visit types, with the remaining 15% routed for human review.
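A sketch of that pipeline as sequential stages (each stage function below is a placeholder for a HIPAA-compliant component, not a specific product):
def process_visit_recording(recording_path: str):
    """Recording in, pre-populated claim out; every stage's output is retained for audit."""
    transcript = transcribe(recording_path)        # 1. speech-to-text
    turns = diarize(transcript)                    # 2. separate provider speech from patient speech
    entities = extract_medical_entities(turns)     # 3. symptoms, diagnoses, treatments, time segments
    note = generate_soap_note(entities)            # 4. structured clinical note
    claim = extract_codes(note)                    # 5. CPT/ICD-10 candidates with confidence scores
    return route_by_confidence(claim)              # auto-submit or human review, as described above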
Payer-Specific Fine-Tuning
Different payers have different documentation requirements, medical necessity criteria, and common denial patterns. Advanced systems will maintain payer-specific models:
# Load payer-specific model
if payer == "Medicare":
model = load_model("rcm-medicare-2025-q3")
elif payer == "Aetna":
model = load_model("rcm-aetna-2025-q3")
else:
model = load_model("rcm-generic-2025-q3")
# Payer-specific prompting
payer_context = get_payer_guidelines(payer, procedure)
prompt = f"""
{payer_context}
Given these payer-specific requirements, extract codes from:
{clinical_note}
"""
This dramatically improves accuracy by incorporating payer-specific business logic directly into the AI reasoning process.
Continuous Learning from Adjudication Outcomes
The ultimate goal: systems that improve themselves by learning from real-world outcomes.
# After 835 ERA (remittance) received
if claim.outcome == "paid_in_full":
# Positive reinforcement
boost_confidence(
clinical_pattern=claim.clinical_note_embedding,
code_combination=claim.get_all_codes(),
delta=+0.03
)
elif claim.outcome == "denied":
# Negative reinforcement
reduce_confidence(
clinical_pattern=claim.clinical_note_embedding,
code_combination=claim.get_all_codes(),
delta=-0.05
)
# Update denial playbook
update_playbook(
denial_code=claim.denial_carc,
claim_features=claim.get_features(),
successful_resolution=claim.resolution_strategy
)
Over time, the system learns:
Which clinical patterns reliably map to which codes
Which code combinations are accepted vs rejected by each payer
Which denial resolution strategies work for different denial types
This creates a flywheel: more claims processed → better training data → higher accuracy → more claims auto-processed.
Why Foresight's Hybrid Approach Works
At Foresight, we've spent thousands of hours with telehealth providers understanding what actually breaks in RCM automation. The pattern is consistent:
Pure rules-based systems work beautifully until they encounter any variability:
Notes written in conversational language instead of formal medical terminology
Diagnosis implied by treatment rather than explicitly stated
Multi-specialty practices where visit types vary significantly
They hit a ceiling around a 60-70% automation rate, leaving a massive manual workload.
Pure AI systems promise magical end-to-end automation but create new problems:
Unexplainable code selections that fail audits
Inconsistent handling of straightforward cases
"Hallucinated" codes that don't exist in code sets
No clear mechanism for human oversight
They achieve impressive demos but fail in production when compliance matters.
Our hybrid approach uses each technology where it excels:
Deterministic rules for:
✓ Telehealth modifiers (POS 10 + Mod 95)
✓ Time-based E/M coding (documented minutes → CPT)
✓ Licensure validation (boolean logic)
✓ Required field validation (type checking)
✓ Timely filing checks (date arithmetic)
AI for:
✓ ICD-10 extraction from clinical notes
✓ CPT selection when documentation is narrative
✓ Prior auth clinical justification extraction
✓ Complex denial resolution suggestions
✓ Missing documentation identification
Human judgment for:
✓ Low-confidence AI extractions (confidence < threshold)
✓ Novel clinical scenarios not seen in training data
✓ Payer-specific edge cases
✓ Final approval on automated upcoding suggestions
This architecture achieves what neither approach can do alone:
92% first-pass acceptance rate (clean claims submitted correctly)
85% auto-handled claims (submitted without human touch)
<1 day average time to submission (from service date to clearinghouse)
$48 saved per claim (vs manual coding & resubmission)
But more importantly, it creates trust through transparency. Every automated decision carries provenance metadata showing exactly how it was made. Audit trails are bulletproof. Human oversight is built into the architecture, not bolted on as an afterthought.
The Bottom Line
The question isn't "rules or AI?" The question is: How do you architect a system that uses deterministic logic where possible, AI where necessary, and human judgment where required?
That's the art of modern RCM automation.
The providers who master this—who build or deploy systems with the right balance—will capture enormous value:
Revenue that would have leaked through coding errors or denials
Time returned to clinical teams instead of administrative work
Confidence that automation is helping, not introducing compliance risk
The providers who don't? They'll continue manually coding claims, chasing denials, and watching revenue slip through cracks in their billing cycle.
The technology exists. The safety mechanisms are understood. The only question is whether you're ready to implement automation that actually works—not because it replaces humans, but because it amplifies their judgment and handles the routine work they shouldn't waste time on.
That's the future of revenue cycle management. And it's not coming—it's already here.
Let's Talk
At Foresight, we've built the hybrid RCM automation system we wish existed when we were running telehealth operations. Rules for what's certain. AI for what requires understanding. Human judgment for what needs expertise.
Interested in seeing how this works for your practice?
Reach out: jj@have-foresight.com
We're working with specialty telehealth providers—neurology, addiction medicine, behavioral health—who are tired of losing revenue to broken billing cycles. If that's you, let's talk.