RCM Automation for Telehealth: Rules vs AI — Why the Best Systems Use Both

The sophisticated approach to revenue cycle automation isn't choosing between deterministic rules and AI. It's knowing exactly when to use each—and how to make them work together safely.

The False Binary That's Costing You Revenue

Every week, we hear the same question from telehealth providers: "Should we use rules-based automation or AI for our revenue cycle?"

It's the wrong question.

Here's why: A Parkinson's disease telehealth visit generates a predictable pattern. The provider is licensed in Texas. The patient is at home in Texas. The visit is synchronous via video. These facts don't require AI to verify—they're deterministic. If patient location = Texas AND provider license state = Texas AND visit type = video, then apply POS 10 (patient's home) + Modifier 95 (synchronous telemedicine). This rule will fire correctly 100% of the time.

But that same visit also has a 32-minute consultation note with unstructured clinical documentation. Extracting "G20.A1 - Parkinson's disease without dyskinesia" from narrative text like "Patient presents with pill-rolling tremor at rest, bradykinesia noted during finger tapping, cogwheel rigidity in upper extremities" absolutely requires AI. No rule can map that clinical description to the correct ICD-10 code with sub-classification.

The insight that drives modern RCM automation: These aren't competing approaches. They're complementary capabilities that solve fundamentally different problems.

When Rules Win: The Power of Deterministic Logic

Let's start with where rules-based automation excels—and why you should use it aggressively before reaching for AI.

The Telehealth Modifier Problem

Telehealth billing has exploded since 2020, but payers have been slow to adapt their adjudication systems. The result? A minefield of easily preventable denials. CARC 96 ("Non-covered charge: POS inconsistent with procedure") is now one of the top denial codes for virtual care providers.

The fix is mechanical:

IF visit_type = "telehealth" AND patient_location = "home"
    THEN set POS = 10 AND add modifier = 95
ELSE IF visit_type = "telehealth" AND patient_location = "clinic"
    THEN set POS = 02 AND add modifier = 95

This is a perfect use case for rules. The inputs are structured (visit type is a categorical field in your EHR). The logic is deterministic (the relationship between visit modality, location, and billing codes never changes). The consequences of getting it wrong are severe (automatic denial, $48 average rework cost per claim). And most importantly, the rule will never make a judgment call—it either fires or it doesn't.
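Expressed as code, the same edit is a few lines of boolean logic. A minimal sketch, assuming hypothetical field names on the encounter record:

def apply_telehealth_edit(encounter):
    """Deterministic telehealth edit: it either fires or it doesn't."""
    if encounter.visit_type == "telehealth" and encounter.patient_location == "home":
        encounter.pos = "10"                  # telehealth in the patient's home
        encounter.modifiers.append("95")      # synchronous telemedicine
    elif encounter.visit_type == "telehealth" and encounter.patient_location == "clinic":
        encounter.pos = "02"                  # telehealth other than the patient's home
        encounter.modifiers.append("95")
    # No confidence score, no judgment call: structured inputs in, fixed outputs out.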

Time-Based E/M Coding

Evaluation and Management coding has evolved: since the 2021 guideline changes for office visits, providers can select codes based on total time. When an encounter is documented by time, the CPT code selection is purely algorithmic:

if documented_time >= 40: code = "99215"    # 40+ minutes
elif documented_time >= 30: code = "99214"  # 30-39 minutes
elif documented_time >= 20: code = "99213"  # 20-29 minutes
elif documented_time >= 10: code = "99212"  # 10-19 minutes
else: code = "99211"                        # under 10 minutes

There's zero ambiguity here. If your clinical note includes "Total time: 32 minutes," the correct code is 99214. Every time. No exceptions. Using AI to "predict" this code would be engineering malpractice—you're introducing unnecessary complexity and potential errors into a solved problem.

Interstate Licensure Validation

Here's another scenario where rules dominate: A neurology practice treats ALS patients across multiple states. Before auto-submitting a claim, you need to verify:

provider_licenses = get_active_licenses(provider_npi)
patient_state = encounter.patient.address.state

if patient_state in provider_licenses:
    validation_status = "PASS"
else:
    validation_status = "BLOCKED"
    route_to_credentialing_team()

This is pure boolean logic. The provider either holds an active license in the patient's state or they don't. There's no "confidence score" needed. The data is structured (license records in your credentialing system). The logic is unambiguous. And if you get it wrong, you've submitted a claim for a service that wasn't legally rendered—a compliance nightmare.

Why Rules Matter for Compliance

Rules-based automation has a superpower that AI can never fully replicate: perfect reproducibility and auditability.

When a payer audits your billing practices, you need to show exactly why each claim was coded the way it was. "Our AI model predicted this code with 87% confidence" is not a defensible answer. But "This claim triggered our telehealth edit rule R-102, which deterministically applies POS 10 + Modifier 95 to all home-based video visits" is bulletproof documentation.

This matters enormously for telehealth providers who are already under heightened scrutiny. CMS and commercial payers are actively auditing telehealth billing patterns. Rule-based logic provides an audit trail that AI simply cannot match.

When AI Shines: Handling Ambiguity at Scale

Now let's examine where AI becomes indispensable—the scenarios where deterministic rules hit their limits.

Clinical Documentation Extraction

Consider this snippet from a telemedicine addiction medicine consultation:

"Patient continues buprenorphine/naloxone 16mg daily. Reports stable mood, no cravings this week. Urine tox today negative for illicit substances. PDMP check shows no concerning fills. Patient describes improved sleep since starting trazodone. Denies suicidal ideation. Will continue current regimen."

What ICD-10 codes should this encounter carry?

A human coder would identify:

  • F11.20 - Opioid use disorder, uncomplicated (primary diagnosis)

  • Z71.89 - Counseling for substance use (documented counseling)

  • G47.00 - Insomnia, unspecified (sleep issue mentioned)

No rule can extract this. The clinical reasoning is implicit. The provider didn't write "diagnosis: opioid use disorder." They wrote clinical narrative that implies the diagnosis based on medication choice, monitoring pattern, and treatment trajectory.

This is where modern large language models excel. They can:

  1. Understand medical terminology and context

  2. Recognize symptom-diagnosis relationships

  3. Extract relevant facts even when phrased conversationally

  4. Map clinical descriptions to standardized code sets

But—and this is critical—they do so probabilistically, not deterministically.
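In practice, that means extraction returns a value plus a confidence score rather than a guaranteed answer. A minimal sketch, assuming an illustrative llm.complete call and a hypothetical Extraction result type:

from dataclasses import dataclass

@dataclass
class Extraction:
    value: str          # e.g. "F11.20"
    confidence: float   # calibrated or self-reported confidence
    source_text: str    # span of the note that supports the code

def extract_icd10(clinical_text: str) -> list[Extraction]:
    """Probabilistic extraction: returns candidate codes with confidences, never certainties."""
    response = llm.complete(
        "Extract ICD-10-CM codes with confidence and supporting text:\n" + clinical_text
    )
    return [Extraction(c.code, c.confidence, c.evidence) for c in response.codes]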

Handling Variability in Clinical Documentation

Real clinical notes are messy. Some providers are verbose, others are terse. Some use formal medical terminology, others write conversationally. Some include explicit diagnosis statements, others bury diagnostic information in treatment plans.

Example variations for the same condition:

Verbose: "Patient presents with bilateral lower extremity edema, orthopnea, paroxysmal nocturnal dyspnea, and fatigue. JVD noted on exam. CXR shows pulmonary congestion. BNP elevated at 892. Echocardiogram reveals reduced LVEF of 35%. Diagnosis: Heart failure with reduced ejection fraction."

Terse: "HFrEF exacerbation. EF 35%. Increase furosemide to 40mg BID."

Conversational: "Patient's heart failure is acting up again. His EF dropped to 35% on the latest echo. Legs are really swollen. Bumping up his diuretic."

All three describe the same condition (I50.21 - Acute systolic heart failure), but no static rule can reliably extract it from all three formats. You need semantic understanding, not pattern matching.

This is where AI proves invaluable: it can handle the infinite variability of human language while still maintaining reasonable accuracy across different documentation styles.

Prior Authorization Clinical Justification

Electronic prior authorization has a particularly thorny extraction problem: supporting clinical evidence.

For a buprenorphine/naloxone PA, payers typically require:

  • Diagnosis code confirming opioid use disorder

  • Previous treatment attempts (documenting medical necessity)

  • Current assessment of stability

The first is often explicitly stated. But the second two? They're buried in free text:

"Patient initially tried methadone maintenance at outside clinic 2019-2020 but had difficulty with daily witnessed dosing due to work schedule. Transitioned to buprenorphine in 2021 with significant improvement in adherence and social functioning."

An AI can extract "previous therapy: methadone (failed due to adherence barriers)" and "current status: stable on buprenorphine with good compliance." These aren't keyword matches—they require understanding temporal relationships, causal reasoning, and clinical context.

This extraction unlocks enormous value: instead of billing specialists manually reading notes and copy-pasting information into PA forms (90+ minutes per PA), the system can auto-populate 80-90% of fields and route only ambiguous cases for human review.
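A sketch of what that auto-population step might look like, assuming a hypothetical prepopulate_pa_form helper and an illustrative 0.85 field-level threshold:

def prepopulate_pa_form(clinical_note: str, pa_form: dict) -> dict:
    """Fill PA fields from free text; leave ambiguous fields blank for human review."""
    extraction = llm.complete(
        "From this note, extract: (1) prior treatment attempts and why they ended, "
        "(2) current stability assessment. Return one confidence score per field.\n"
        + clinical_note
    )
    for field, result in extraction.fields.items():
        if result.confidence >= 0.85:
            pa_form[field] = result.value   # auto-populate the clear cases
        else:
            pa_form[field] = None           # leave blank and flag for the billing specialist
    return pa_form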

The Hybrid Architecture: Engineering Trust Through Layered Safety

Here's where sophistication enters the picture. The best RCM automation systems don't just "use AI sometimes and rules sometimes." They architect multi-layered validation that provides defense-in-depth against errors.

Layer 1: Provenance Labeling

Every automated decision must carry metadata about how it was made:

{
  "field": "icd10",
  "value": "G20.A1",
  "confidence": 0.88,
  "provenance": "LLM",
  "source_text": "Parkinson's disease without dyskinesia (line 12-14)",
  "extraction_timestamp": "2025-10-06T14:23:11Z"
}

Compare that to a rule-based decision:

{
  "field": "modifier",
  "value": "95",
  "confidence": 1.0,
  "provenance": "Rule",
  "rule_id": "R-TH-001",
  "rule_description": "Telehealth synchronous: add Modifier 95",
  "execution_timestamp": "2025-10-06T14:23:09Z"
}

Notice the critical difference: Provenance labeling makes the decision-making process auditable. A billing specialist can see immediately whether a value came from deterministic logic (confidence = 1.0, provenance = Rule) or probabilistic extraction (confidence = 0.88, provenance = LLM). This transparency builds trust and enables intelligent human review.

In our system at Foresight, we use three provenance tags:

  • Rule: Deterministic logic applied to structured data

  • LLM: AI extraction from unstructured clinical text

  • Denial Playbook: Automated fix derived from payer denial patterns

Each carries different implications for review requirements.
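One way to encode those implications is a small policy table keyed on the provenance tag. A sketch, not our exact configuration:

REVIEW_POLICY = {
    "Rule": {
        "review_required": False,          # deterministic: fires or it doesn't
    },
    "LLM": {
        "review_required": "below_threshold",
        "threshold": 0.88,                 # the routing threshold from Layer 2
    },
    "Denial Playbook": {
        "review_required": "below_threshold",
        "threshold": 0.95,                 # playbooks carry their own, stricter thresholds
    },
}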

Layer 2: Confidence-Based Routing

Not all AI predictions are created equal. A well-engineered system needs to route low-confidence decisions to humans while auto-processing high-confidence cases.

Here's the architecture:

# Field-level confidence scores
field_confidences = {
    "cpt": 0.94,
    "icd10_primary": 0.88,  
    "icd10_secondary": 0.63,  # LOW
    "pos": 0.98,
    "modifier_95": 0.91
}

# Aggregate confidence (weighted average)
aggregate_confidence = calculate_aggregate(field_confidences)

# Threshold-based routing  
CONFIDENCE_THRESHOLD = 0.88

if aggregate_confidence >= CONFIDENCE_THRESHOLD:
    route_to_auto_submission()
else:
    route_to_human_review(
        reason="Low confidence on icd10_secondary",
        suggested_value="Z71.89",
        confidence=0.63
    )

This creates an intelligent triage system. In our production deployments, we're seeing:

  • 85% of claims auto-submitted (high confidence across all fields)

  • 12% require human review (one or more low-confidence fields)

  • 3% blocked (missing required data or credentialing issues)

The key insight: You can tune the confidence threshold to match your risk tolerance. Conservative practices can set threshold = 0.95 and route more items to review. Aggressive practices can set threshold = 0.80 and maximize automation. The system adapts to your operational priorities.
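In configuration terms, that tuning is a single value per practice. A sketch with the illustrative thresholds mentioned above:

# Illustrative thresholds; each practice picks one to match its risk tolerance
RISK_PROFILES = {
    "conservative": 0.95,   # route more claims to human review
    "balanced":     0.88,   # the default used in the example above
    "aggressive":   0.80,   # maximize automation
}

CONFIDENCE_THRESHOLD = RISK_PROFILES[practice.risk_profile]  # hypothetical practice config object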

Layer 3: Type Checking and Validation

Even when AI extraction has high confidence, you need downstream validation. This catches LLM hallucinations and formatting errors:

import re

def validate_icd10(code: str, confidence: float, procedure_codes: list[str]) -> ValidationResult:
    """Multi-layer validation of extracted ICD-10 codes"""
    
    # Layer 1: Format validation
    # ICD-10-CM shape: letter, digit, alphanumeric, then an optional dot plus 1-4 alphanumerics
    if not re.match(r'^[A-Z]\d[A-Z0-9](\.[A-Z0-9]{1,4})?$', code):
        return ValidationResult(
            valid=False,
            reason="Invalid ICD-10 format",
            provenance="Type Check"
        )
    
    # Layer 2: Code set membership  
    if code not in VALID_ICD10_CODES:
        return ValidationResult(
            valid=False,
            reason="Code not in ICD-10-CM 2025 code set",
            provenance="Code Set Validation"
        )
    
    # Layer 3: Clinical plausibility checks
    # Example: if the procedure is a neurology consult,
    # flag a primary diagnosis from the obstetric chapter
    if is_clinical_mismatch(code, procedure_codes):
        return ValidationResult(
            valid=True,  # Don't auto-reject
            needs_review=True,
            reason="Diagnosis-procedure mismatch detected",
            confidence_penalty=-0.15
        )
    
    return ValidationResult(valid=True)

This catches multiple error modes:

  • Format errors: "G20A1" → should be "G20.A1"

  • Hallucinated codes: "G20.99" → doesn't exist in ICD-10-CM

  • Clinical implausibility: Obstetric code on male patient

Each validation layer provides a safety net. Even if the LLM produces a malformed output, type checking prevents it from reaching the clearinghouse.

Layer 4: Few-Shot Prompting for Domain Specificity

Generic AI models aren't optimized for medical coding. Few-shot prompting grounds the model in domain-specific patterns:

You are a medical coding assistant specializing in telehealth neurology.

Examples of correct ICD-10 extraction:

Input: "Patient with long-standing Parkinson's disease, no dyskinesia noted"
Output: G20.A1 (Parkinson's disease without dyskinesia)

Input: "ALS patient, bulbar onset, significant dysphagia"  
Output: G12.21 (Amyotrophic lateral sclerosis)

Input: "Mild cognitive impairment, likely early Alzheimer's"
Output: F06.7 (Mild neurocognitive disorder without behavioral disturbance)

Now extract ICD-10 from this note:
[ACTUAL CLINICAL NOTE]

Output only valid ICD-10 codes with brief justification.

Few-shot prompting dramatically improves accuracy on domain-specific tasks. In our testing with specialty neurology practices, we saw:

  • Without few-shot prompting: 73% accuracy on sub-classified ICD-10 codes

  • With few-shot prompting: 89% accuracy on the same task

The improvement comes from grounding the model in specialty-specific patterns rather than expecting it to generalize from its broad pre-training.
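The prompt above can be assembled programmatically from a per-specialty example library, so each practice gets examples drawn from its own case mix. A sketch, assuming a hypothetical FEW_SHOT_EXAMPLES store:

FEW_SHOT_EXAMPLES = {
    "telehealth_neurology": [
        ("Patient with long-standing Parkinson's disease, no dyskinesia noted",
         "G20.A1 (Parkinson's disease without dyskinesia)"),
        ("ALS patient, bulbar onset, significant dysphagia",
         "G12.21 (Amyotrophic lateral sclerosis)"),
    ],
}

def build_extraction_prompt(specialty: str, note: str) -> str:
    """Assemble a few-shot coding prompt from the specialty's example library."""
    examples = "\n\n".join(
        f"Input: \"{text}\"\nOutput: {code}"
        for text, code in FEW_SHOT_EXAMPLES[specialty]
    )
    return (
        f"You are a medical coding assistant specializing in {specialty.replace('_', ' ')}.\n\n"
        f"Examples of correct ICD-10 extraction:\n\n{examples}\n\n"
        f"Now extract ICD-10 from this note:\n{note}\n\n"
        "Output only valid ICD-10 codes with brief justification."
    )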

Layer 5: LLM-as-a-Judge for Quality Control

For critical decisions, a second AI can validate the first AI's output:

def validate_with_judge(
    extracted_code: str,
    clinical_text: str,
    extraction_confidence: float
) -> JudgmentResult:
    """Use a second LLM to validate code extraction quality"""
    
    judge_prompt = f"""
    A medical coding system extracted the ICD-10 code {extracted_code} 
    from the following clinical text:
    
    {clinical_text}
    
    Is this code appropriate? Respond with:
    - AGREE: Code is correct  
    - DISAGREE: Code is incorrect (provide correct code)
    - UNCERTAIN: Cannot determine from available information
    
    Explain your reasoning.
    """
    
    judge_response = llm.complete(judge_prompt)
    
    if judge_response.verdict == "AGREE":
        return JudgmentResult(
            validated=True,
            confidence_boost=+0.05
        )
    elif judge_response.verdict == "DISAGREE":
        return JudgmentResult(
            validated=False,
            alternate_code=judge_response.suggested_code,
            route_to_human=True
        )
    else:  # UNCERTAIN
        return JudgmentResult(
            validated=False,
            route_to_human=True,
            reason="Judge uncertain, requires specialist review"
        )

This creates a consensus mechanism. When two independent AI systems agree on an extraction, confidence increases. When they disagree, the case routes to human review. This is particularly valuable for complex multi-code scenarios where no single ground truth exists.

Layer 6: Complementary Keyword Matching

AI excels at semantic understanding, but simple keyword matching still has a role for high-value flags:

CRITICAL_KEYWORDS = {
    "suicidal": {"flag": "psychiatric_emergency", "block_auto_submit": True},
    "suspected abuse": {"flag": "mandated_reporting", "block_auto_submit": True},
    "terminal": {"flag": "hospice_consideration", "review_level": "clinical"},
    "experimental": {"flag": "coverage_risk", "review_level": "billing"}
}

def scan_for_critical_flags(clinical_text: str) -> List[Flag]:
    """Keyword-based safety net for high-risk scenarios"""
    flags = []
    
    for keyword, metadata in CRITICAL_KEYWORDS.items():
        if keyword.lower() in clinical_text.lower():
            flags.append(Flag(
                keyword=keyword,
                **metadata
            ))
    
    return flags

This catches edge cases that AI might miss or contextualize incorrectly. If a note mentions "suicidal ideation," the claim should route through clinical review regardless of AI confidence scores. Simple keyword matching provides an additional safety layer.

The Denial Resolution Loop: Where Hybrid Automation Gets Sophisticated

Here's where truly advanced systems differentiate themselves: automated denial remediation that combines historical pattern matching (rules) with adaptive problem-solving (AI).

Building Denial Playbooks

After processing thousands of claims, patterns emerge in payer denial codes. These patterns can be codified:

DENIAL_PLAYBOOKS = {
    "CARC-96": {
        "description": "Non-covered: POS inconsistent with Mod 95",
        "common_causes": [
            "Missing Modifier 95 on telehealth claim",
            "POS 11 (office) used instead of POS 10 (home)",
            "Modifier 95 present but visit not actually telehealth"
        ],
        "auto_fix": {
            "conditions": [
                "visit_type == 'telehealth'",
                "patient_location == 'home'"
            ],
            "actions": [
                "set_pos(10)",
                "add_modifier(95)",
                "add_note('Video visit conducted per telehealth guidelines')"
            ]
        },
        "confidence_threshold": 0.95
    },
    
    "CARC-197": {
        "description": "Precertification/authorization absent",
        "auto_fix": {
            "conditions": [
                "epa_status == 'approved'",
                "epa_number IS NOT NULL"  
            ],
            "actions": [
                "attach_auth_number(epa.pa_number)",
                "add_note(f'PA #{epa.pa_number} approved {epa.approved_date}')"
            ]
        },
        "confidence_threshold": 0.99
    }
}

These playbooks enable automatic resubmission for common denial types. CARC 96 (POS mismatch)? The system knows to check if it's a telehealth visit, apply the correct modifiers, and resubmit. CARC 197 (missing auth)? The system links to the approved ePA record, attaches the authorization number, and resubmits.
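Applying a playbook is itself deterministic: check the conditions against the claim, run the actions, resubmit. A minimal sketch of that dispatcher, assuming hypothetical evaluate and apply helpers on the claim object:

def try_playbook(claim, denial_carc: str) -> bool:
    """Return True if a playbook fix was applied and the claim resubmitted."""
    playbook = DENIAL_PLAYBOOKS.get(denial_carc)
    if playbook is None:
        return False  # no known pattern for this denial code

    fix = playbook["auto_fix"]
    if not all(claim.evaluate(condition) for condition in fix["conditions"]):
        return False  # conditions not met: don't guess, escalate instead

    for action in fix["actions"]:
        claim.apply(action)                    # e.g. set_pos(10), add_modifier(95)
    claim.resubmit(provenance="Denial Playbook")
    return True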

In production, this translates to:

  • 68% of denials auto-resolved on first resubmission attempt

  • $0 human time for straightforward denial types

  • <24 hour turnaround from denial receipt to corrected resubmission

When Playbooks Aren't Enough: AI-Assisted Resolution

Some denials are novel or complex. CARC 11 ("Diagnosis inconsistent with procedure") doesn't have a one-size-fits-all fix. The issue could be:

  • Wrong procedure code for the documented diagnosis

  • Missing secondary diagnosis that justifies the procedure

  • Clinically appropriate but poorly documented in the note

This requires reasoning, not pattern matching:

def resolve_complex_denial(
    claim: Claim,
    denial_code: str,
    denial_reason: str
) -> ResolutionStrategy:
    """Use AI to suggest fixes for non-standard denials"""
    
    # Pull relevant context
    clinical_note = claim.encounter.clinical_note
    original_codes = claim.get_all_codes()
    payer_guidelines = fetch_payer_policy(claim.payer, denial_code)
    
    # Construct reasoning prompt
    prompt = f"""
    Analyze this claim denial and suggest resolution:
    
    Denial: {denial_code} - {denial_reason}
    Procedure codes: {original_codes['cpt']}
    Diagnosis codes: {original_codes['icd10']}
    
    Clinical documentation:
    {clinical_note}
    
    Payer guidelines:
    {payer_guidelines}
    
    Suggest:
    1. Most likely cause of denial
    2. Recommended code changes  
    3. Supporting documentation needed
    4. Confidence in resolution strategy
    """
    
    ai_suggestion = llm.complete(prompt)
    
    if ai_suggestion.confidence >= 0.85:
        return ResolutionStrategy(
            suggested_fixes=ai_suggestion.fixes,
            route_to_human=False,
            auto_resubmit=True
        )
    else:
        return ResolutionStrategy(
            suggested_fixes=ai_suggestion.fixes,
            route_to_human=True,
            reason="AI uncertain, requires billing specialist review"
        )

This creates an intelligent escalation path:

  1. Try deterministic playbook (if denial code matches a known pattern)

  2. If no playbook exists, invoke AI reasoning

  3. If AI confidence is high, auto-resubmit with suggested fixes

  4. If AI confidence is low, route to human with AI suggestions as starting point

The human always has final say on ambiguous cases, but the AI dramatically reduces the cognitive load by narrowing the problem space.
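Putting the escalation path together, here is a sketch that combines a playbook dispatcher with the AI fallback above, using hypothetical helpers (try_playbook over DENIAL_PLAYBOOKS, plus the resolve_complex_denial function shown earlier):

def handle_denial(claim, denial_code: str, denial_reason: str):
    # 1. Known pattern: deterministic playbook fix
    if try_playbook(claim, denial_code):
        return

    # 2. Novel or complex denial: AI-suggested resolution
    strategy = resolve_complex_denial(claim, denial_code, denial_reason)

    # 3. High confidence: auto-resubmit with the suggested fixes
    if strategy.auto_resubmit:
        claim.apply_fixes(strategy.suggested_fixes)
        claim.resubmit(provenance="LLM")
    # 4. Low confidence: the human gets the AI's suggestions as a starting point
    else:
        route_to_human_review(claim, suggestions=strategy.suggested_fixes)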

Real-World Example: Processing a Telehealth Neurology Claim

Let's walk through a complete claim lifecycle to see how rules and AI work together.

Initial Encounter Data

Patient: Jane Doe, TX resident
Provider: Dr. Smith, Neurologist (licensed: TX, FL, CA)
Visit: 32-minute video consultation for Parkinson's disease
Clinical note (excerpt):

"Video follow-up for PD management. Patient reports stable tremor control on carbidopa/levodopa 25/100 TID. No dyskinesia noted during examination. UPDRS motor score 18 (improved from 24 at last visit). Continue current regimen. Next follow-up 3 months."

Step 1: Rule-Based Pre-Processing

The system first applies deterministic rules:

# Telehealth validation
visit_type = "video_consultation"  # from EHR
patient_location = "home"  # from encounter metadata
→ RULE: Set POS = 10, Add Modifier = 95 ✓ (confidence: 1.0)

# Time-based E/M coding  
documented_time = 32  # minutes, extracted from note
→ RULE: CPT = 99214 (30-39 minutes) ✓ (confidence: 1.0)

# Licensure validation
provider_state_licenses = ["TX", "FL", "CA"]  
patient_state = "TX"
→ RULE: Licensure match ✓ (confidence: 1.0)

# Service date check (claims must be submitted within 30 days for this payer)
days_since_service = 3
→ RULE: Within timely filing window ✓ (confidence: 1.0)

Four critical fields are now set with perfect confidence. No human review needed. No AI required.

Step 2: AI Extraction from Clinical Documentation

Now the system invokes AI for unstructured data:

# ICD-10 extraction  
clinical_text = "... stable tremor control... no dyskinesia noted... UPDRS motor score 18..."

llm_extraction = extract_icd10(clinical_text)
→ Primary: G20.A1 (Parkinson's disease without dyskinesia)
   Confidence: 0.88
   Provenance: "LLM"
   Source: "dyskinesia noted during examination" (line 2)

# Validate against code set
validate_code("G20.A1")
→ ✓ Valid ICD-10-CM code
→ ✓ Clinically consistent with documented procedure (99214 neurology visit)

The AI extracted the correct code with good confidence. The downstream validation confirms it's a real code and clinically plausible.

Step 3: Aggregate Confidence Check

field_confidences = {
    "pos": 1.0,           # Rule
    "modifier_95": 1.0,   # Rule  
    "cpt": 1.0,           # Rule
    "licensure": 1.0,     # Rule
    "icd10_primary": 0.88 # LLM
}

aggregate_confidence = weighted_average(field_confidences)
→ 0.95

THRESHOLD = 0.88
0.95 >= 0.88 → AUTO-SUBMIT ✓

Despite one field being AI-extracted, the aggregate confidence exceeds the threshold. The claim auto-submits.

Step 4: Claim Submission & 277CA Response

submit_claim(claim_id="CLM-8847")
→ Status: submitted
→ Listening for 277CA acknowledgment...

# 18 hours later
receive_277ca(claim_id="CLM-8847")
→ Status: accepted_277ca
→ Expected remittance: 7-10 days

Success. The claim was accepted on first submission. No human touched it.

Step 5: Learning Loop

# Log successful auto-submission
log_outcome(
    claim_id="CLM-8847",
    outcome="accepted_first_pass",
    confidence_at_submit=0.95,
    fields_used={
        "rule_based": ["pos", "modifier_95", "cpt", "licensure"],
        "ai_extracted": ["icd10_primary"]
    }
)

# Update confidence calibration
if accepted_first_pass:
    boost_confidence(field="icd10_primary", delta=+0.02)

The system learns from successful submissions to gradually increase confidence in its AI extractions for similar clinical patterns.

Alternative: Low Confidence Scenario

Now imagine a different clinical note:

"Patient with movement disorder, unclear etiology. May be PD vs essential tremor vs MSA. Plan: trial of carbidopa/levodopa, neuroimaging pending."

llm_extraction = extract_icd10(clinical_text)
→ Primary: R25.1 (Tremor, unspecified)
   Confidence: 0.71
   Reason: "Diagnosis uncertain in documentation"

validate_with_judge(code="R25.1", clinical_text=note)
→ UNCERTAIN ("Could be R25.1 or G20, insufficient information")

aggregate_confidence = 0.79
0.79 < 0.88 → ROUTE TO HUMAN REVIEW

surface_to_human(
    reason="Low confidence on primary diagnosis",
    ai_suggestion="R25.1 (Tremor, unspecified)",
    alternate_options=["G20 (Parkinson's disease)", "G25.0 (Essential tremor)"],
    clinical_context="Diagnosis not yet established per note"
)

The system correctly recognizes diagnostic ambiguity and routes to a human specialist. The AI suggestions give the human a starting point, but the final coding decision remains with someone who can apply clinical judgment.

This is the hybrid approach working as designed: automate the clear cases, escalate the ambiguous ones.

The Art of the Possible: What's Coming Next

The frontier of RCM automation isn't about replacing human judgment with AI. It's about building systems that amplify human expertise while handling routine work autonomously.

Predictive Denial Prevention

Current state: React to denials after they occur.

Near future: Predict denials before submission using historical patterns.

def predict_denial_risk(claim: Claim) -> DenialRiskScore:
    """Predict likelihood of denial based on historical patterns"""
    
    # Feature engineering
    features = {
        "payer": claim.payer,
        "procedure": claim.cpt,
        "diagnosis": claim.icd10_primary,
        "provider_specialty": claim.provider.specialty,
        "patient_state": claim.patient.state,
        "claim_amount": claim.charge_amount,
        "days_since_service": (today - claim.service_date).days
    }
    
    # Check historical denial patterns  
    historical_denials = query_denials(
        payer=features["payer"],
        procedure=features["procedure"],
        diagnosis=features["diagnosis"]
    )
    
    if historical_denials.total >= 3:
        denial_rate = historical_denials.denied / historical_denials.total
        
        if denial_rate >= 0.25:  # 25%+ denial rate
            return DenialRiskScore(
                risk="HIGH",
                predicted_carc=historical_denials.most_common_carc,
                recommendation="Review before submission",
                historical_denial_rate=denial_rate
            )
    
    # No strong historical signal: treat as low risk and let the claim flow through
    return DenialRiskScore(risk="LOW", historical_denial_rate=None)

This surfaces high-risk claims before they're submitted. A billing specialist can review the handful of high-risk claims while letting the hundreds of low-risk claims flow through automatically.

Early deployments show this reduces denial rates by 15-20% by catching issues proactively.

Specialty-Specific Rule Packs

Different specialties have different billing patterns and common denial types. The next generation of systems will ship with specialty-specific rule libraries:

Addiction Medicine Rule Pack:

  • Auto-populate ASAM (American Society of Addiction Medicine) levels of care

  • Validate buprenorphine dosing against DEA X-waiver limits

  • Track PDMP (Prescription Drug Monitoring Program) check requirements by state

  • Auto-document substance use disorder remission status based on visit notes

Neurology Rule Pack:

  • Validate PD medication combinations against formulary restrictions

  • Auto-code UPDRS (Unified Parkinson's Disease Rating Scale) scores when documented

  • Track DMT (disease-modifying therapy) eligibility for MS patients

  • Auto-generate ABN (Advance Beneficiary Notice) when Medicare coverage is uncertain

These rule packs encode specialty expertise that would otherwise require manual configuration.
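Structurally, a rule pack is a versioned bundle of deterministic rules scoped to one specialty. A sketch of the shape such a bundle might take:

NEUROLOGY_RULE_PACK = {
    "specialty": "neurology",
    "version": "2025.1",
    "rules": [
        {
            "id": "NEURO-UPDRS-01",
            "description": "Capture documented UPDRS motor score as structured data",
            "trigger": "note contains 'UPDRS motor score'",
            "action": "extract_score_to_field('updrs_motor')",
        },
        {
            "id": "NEURO-ABN-01",
            "description": "Generate ABN when Medicare coverage is uncertain",
            "trigger": "payer == 'Medicare' AND coverage_status == 'uncertain'",
            "action": "generate_abn()",
        },
    ],
}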

Multi-Modal AI: Extracting from Visit Recordings

Today's AI extracts codes from clinical notes. Tomorrow's AI will extract directly from visit recordings:

Input: 32-minute video consultation recording
Output:

  • Transcribed conversation (with speaker labels)

  • Extracted diagnoses, treatments, and time segments

  • Auto-generated clinical note with key findings highlighted

  • Pre-populated claim with CPT/ICD-10 codes

This eliminates the "note-writing" step entirely. The provider conducts the visit, the AI generates all downstream documentation automatically.

The technical architecture:

  1. Speech-to-text transcription (existing technology, HIPAA-compliant)

  2. Speaker diarization (separate provider speech from patient speech)

  3. Medical entity recognition (extract symptoms, diagnoses, treatments)

  4. Clinical note generation (structure findings into standard SOAP format)

  5. Code extraction (generate billing codes from structured note)

This is technically feasible today, but requires careful prompt engineering and validation. Early pilots show 85%+ accuracy on common visit types, with the remaining 15% routed for human review.
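As a pipeline, those five steps chain naturally. A sketch with placeholder functions for each stage; none of these are existing library calls:

def process_visit_recording(recording_path: str) -> Claim:
    """Sketch of a recording-to-claim pipeline; every helper here is a placeholder."""
    transcript = transcribe(recording_path)        # 1. HIPAA-compliant speech-to-text
    turns = diarize(transcript)                    # 2. separate provider and patient speech
    entities = extract_medical_entities(turns)     # 3. symptoms, diagnoses, treatments
    note = generate_soap_note(entities, turns)     # 4. structured SOAP-format note
    codes = extract_codes(note)                    # 5. CPT / ICD-10 with confidences

    claim = build_claim(note, codes)
    if codes.aggregate_confidence < CONFIDENCE_THRESHOLD:
        route_to_human_review(claim)               # the ~15% that still needs a coder
    return claim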

Payer-Specific Fine-Tuning

Different payers have different documentation requirements, medical necessity criteria, and common denial patterns. Advanced systems will maintain payer-specific models:

# Load payer-specific model
if payer == "Medicare":
    model = load_model("rcm-medicare-2025-q3")
elif payer == "Aetna":
    model = load_model("rcm-aetna-2025-q3")
else:
    model = load_model("rcm-generic-2025-q3")

# Payer-specific prompting
payer_context = get_payer_guidelines(payer, procedure)
prompt = f"""
{payer_context}

Given these payer-specific requirements, extract codes from:
{clinical_note}
"""

This dramatically improves accuracy by incorporating payer-specific business logic directly into the AI reasoning process.

Continuous Learning from Adjudication Outcomes

The ultimate goal: systems that improve themselves by learning from real-world outcomes.

# After 835 ERA (remittance) received
if claim.outcome == "paid_in_full":
    # Positive reinforcement
    boost_confidence(
        clinical_pattern=claim.clinical_note_embedding,
        code_combination=claim.get_all_codes(),
        delta=+0.03
    )
elif claim.outcome == "denied":
    # Negative reinforcement  
    reduce_confidence(
        clinical_pattern=claim.clinical_note_embedding,
        code_combination=claim.get_all_codes(),
        delta=-0.05
    )
    
    # Update denial playbook
    update_playbook(
        denial_code=claim.denial_carc,
        claim_features=claim.get_features(),
        successful_resolution=claim.resolution_strategy
    )

Over time, the system learns:

  • Which clinical patterns reliably map to which codes

  • Which code combinations are accepted vs rejected by each payer

  • Which denial resolution strategies work for different denial types

This creates a flywheel: more claims processed → better training data → higher accuracy → more claims auto-processed.

Why Foresight's Hybrid Approach Works

At Foresight, we've spent thousands of hours with telehealth providers understanding what actually breaks in RCM automation. The pattern is consistent:

Pure rules-based systems work beautifully until they encounter any variability:

  • Notes written in conversational language instead of formal medical terminology

  • Diagnosis implied by treatment rather than explicitly stated

  • Multi-specialty practices where visit types vary significantly

They hit a ceiling around a 60-70% automation rate, leaving a massive manual workload.

Pure AI systems promise magical end-to-end automation but create new problems:

  • Unexplainable code selections that fail audits

  • Inconsistent handling of straightforward cases

  • "Hallucinated" codes that don't exist in code sets

  • No clear mechanism for human oversight

They achieve impressive demos but fail in production when compliance matters.

Our hybrid approach uses each technology where it excels:

Deterministic rules for:

  • ✓ Telehealth modifiers (POS 10 + Mod 95)

  • ✓ Time-based E/M coding (documented minutes → CPT)

  • ✓ Licensure validation (boolean logic)

  • ✓ Required field validation (type checking)

  • ✓ Timely filing checks (date arithmetic)

AI for:

  • ✓ ICD-10 extraction from clinical notes

  • ✓ CPT selection when documentation is narrative

  • ✓ Prior auth clinical justification extraction

  • ✓ Complex denial resolution suggestions

  • ✓ Missing documentation identification

Human judgment for:

  • ✓ Low-confidence AI extractions (confidence < threshold)

  • ✓ Novel clinical scenarios not seen in training data

  • ✓ Payer-specific edge cases

  • ✓ Final approval on automated upcoding suggestions

This architecture achieves what neither approach can do alone:

  • 92% first-pass acceptance rate (clean claims submitted correctly)

  • 85% auto-handled claims (submitted without human touch)

  • <1 day average time to submission (from service date to clearinghouse)

  • $48 saved per claim (vs manual coding & resubmission)

But more importantly, it creates trust through transparency. Every automated decision carries provenance metadata showing exactly how it was made. Audit trails are bulletproof. Human oversight is built into the architecture, not bolted on as an afterthought.

The Bottom Line

The question isn't "rules or AI?" The question is: How do you architect a system that uses deterministic logic where possible, AI where necessary, and human judgment where required?

That's the art of modern RCM automation.

The providers who master this—who build or deploy systems with the right balance—will capture enormous value:

  • Revenue that would have leaked through coding errors or denials

  • Time returned to clinical teams instead of administrative work

  • Confidence that automation is helping, not introducing compliance risk

The providers who don't? They'll continue manually coding claims, chasing denials, and watching revenue slip through cracks in their billing cycle.

The technology exists. The safety mechanisms are understood. The only question is whether you're ready to implement automation that actually works—not because it replaces humans, but because it amplifies their judgment and handles the routine work they shouldn't waste time on.

That's the future of revenue cycle management. And it's not coming—it's already here.

Let's Talk

At Foresight, we've built the hybrid RCM automation system we wish existed when we were running telehealth operations. Rules for what's certain. AI for what requires understanding. Human judgment for what needs expertise.

Interested in seeing how this works for your practice?

Reach out: jj@have-foresight.com

We're working with specialty telehealth providers—neurology, addiction medicine, behavioral health—who are tired of losing revenue to broken billing cycles. If that's you, let's talk.
