The honest answer for most health plans is no. Most audit only 2 to 5 percent of their member service call volume [1], whether the calls are handled in-house, outsourced, or run on a hybrid model. The other 95 to 98 percent, including the calls that drive complaint patterns, CAHPS results for Medicare Advantage plans, NCQA accreditation outcomes, and grievance filings, is operationally invisible. This is not because plans are negligent. It is because manual quality assurance, at any scale a health plan would consider economically viable, was never going to review more than a fraction of total volume.
Under the sampling model, a quality assurance team, almost always internal to the plan, listens to a random sample of recordings, grades each against a rubric covering greeting, identity verification, hold etiquette, closing, and compliance disclosures, and rolls the scores up into a monthly quality report. That model was defensible when human review of every call was infeasible. The constraint has now changed, and the cost of what a sampled program does not see has risen.
Why a 2 percent sample cannot tell you what you need to know
Three things break the sampling model in the real operating environment of a health plan. First, the sample is too small to surface the outliers that drive dissatisfaction: the prior authorization status miscommunication, the Spanish-speaking caller transferred three times, the appeals question a non-clinical agent improvised on. At 2 percent of a million calls a year, you are reviewing 20,000 calls, and the handful that mattered are statistically invisible.
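The sampling arithmetic can be made concrete with a short sketch. It assumes calls are sampled uniformly at random and treats each affected call as independently sampled (a binomial approximation that is reasonable when the failure pattern is a small share of total volume). The function name and the example pattern sizes are illustrative assumptions, not figures from any plan.

```python
from math import comb

def prob_sample_catches(affected_calls: int, sample_rate: float, min_hits: int = 1) -> float:
    """Probability that a random sample contains at least `min_hits` of the affected calls.

    Binomial approximation: each affected call is sampled independently
    with probability `sample_rate`.
    """
    # Probability of seeing fewer than `min_hits` affected calls in the sample.
    miss = sum(
        comb(affected_calls, i)
        * sample_rate ** i
        * (1 - sample_rate) ** (affected_calls - i)
        for i in range(min_hits)
    )
    return 1.0 - miss

total_calls = 1_000_000   # annual volume used in the text
sample_rate = 0.02        # 2 percent audit rate used in the text
reviewed = round(total_calls * sample_rate)
print(f"calls reviewed per year: {reviewed:,}")

# Hypothetical failure patterns of different sizes (assumed values).
for affected in (10, 50, 200):
    p_one = prob_sample_catches(affected, sample_rate, min_hits=1)
    p_five = prob_sample_catches(affected, sample_rate, min_hits=5)
    print(f"{affected:>4} affected calls: P(>=1 reviewed)={p_one:.3f}, P(>=5 reviewed)={p_five:.3f}")
```

Under these assumptions, a pattern touching 50 calls a year has roughly a 36 percent chance of never appearing in the sample at all, and the chance of reviewing enough examples to recognize it as a pattern rather than a one-off is far smaller.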
Second, the scoring rubric measures script compliance, not member outcomes. An agent can hit every checkbox on a quality assurance form and leave the member with the wrong answer. Conversely, an agent can solve a complex eligibility question with empathy and skip half the script. The first call scores well, the second scores poorly, and member experience is the inverse.
Third, and most consequentially, the quality assurance score is structurally disconnected from the outcomes the plan actually cares about. A plan can run a compliance program for three years and still watch its member experience scores drift downward, because the program is measuring what is easy to grade, not what determines plan performance.
The financial stakes vary by line of business but point in the same direction. For Medicare Advantage plans, CAHPS measures were quadruple-weighted in the 2023 through 2025 Star Ratings and are double-weighted from the 2026 Star Ratings forward [2], and Quality Bonus Payment (QBP) eligibility is binary at the four-star threshold. A half-star drop from 4.0 to 3.5 eliminates the entire 5 percent QBP on the benchmark, which represents several million dollars in annual revenue exposure on a mid-sized MA plan. For Medicaid managed care plans, the same call patterns drive access audit findings and capitation negotiations. For commercial and ACA plans, they drive NCQA accreditation outcomes and employer client retention.
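As a rough illustration of the Medicare Advantage exposure, the sketch below multiplies out the benchmark dollars tied to the 5 percent bonus. The membership count and monthly benchmark are hypothetical placeholders, not figures from CMS or any plan, and the function name is an assumption. The calculation deliberately ignores bid and rebate mechanics, so it shows the benchmark dollars at stake rather than a precise revenue forecast.

```python
def benchmark_dollars_at_stake(
    members: int,
    monthly_benchmark_per_member: float,
    bonus_rate: float = 0.05,  # the 5 percent bonus described in the text
) -> float:
    """Annual benchmark dollars tied to the bonus: members x 12 months x benchmark x bonus rate."""
    return members * 12 * monthly_benchmark_per_member * bonus_rate

# Hypothetical mid-sized plan (assumed values for illustration only).
members = 10_000
monthly_benchmark = 1_000.0  # dollars per member per month

exposure = benchmark_dollars_at_stake(members, monthly_benchmark)
print(f"benchmark dollars tied to the bonus: ${exposure:,.0f}")
```

Because eligibility is binary at four stars, the full amount is either in play or not. There is no partial credit for a 3.5-star plan.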
What changes when you measure every call
Multilingual speech-to-text, conversation analysis, and intent classification now run reliably at full call volume across the languages CMS requires plans to support for limited English proficiency (LEP) populations. When Member Experience measurement covers 100 percent of member calls rather than a sample:
- You see the patterns that drive member experience scores before the next survey cycle comes back. Repeated transfers on prior authorization inquiries. Unresolved questions about formulary changes. Disconnects on appeals calls.
- You can connect call content to outcomes. A member who called three times in a month about the same issue is not a satisfied member, regardless of how each individual call scored. The pattern is only visible across the full population.
- You can detect compliance risk while it is still recoverable. A miscommunication about coverage that surfaces in week one of a transition can be corrected before it becomes a grievance, an appeal, or a CMS audit finding.
- You can act on multilingual experience as a measured outcome rather than a checkbox. Surveys rarely capture whether a Spanish-speaking member actually understood the answer they got. Conversation-level analysis can.
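The repeat-contact pattern in the list above is one example of a signal that only exists across the full call population. A minimal sketch of how it might be detected follows, assuming each call has already been tagged with a member identifier, an issue label (for example from intent classification), and a date. The field layout, issue labels, and thresholds are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

def repeat_contacts(calls, window_days=30, min_calls=3):
    """Return {(member_id, issue): [dates]} for members who called at least
    `min_calls` times about the same issue within any `window_days`-day window."""
    by_key = defaultdict(list)
    for member_id, issue, call_date in calls:
        by_key[(member_id, issue)].append(call_date)

    flagged = {}
    window = timedelta(days=window_days)
    for key, dates in by_key.items():
        dates.sort()
        # Compare each call with the one (min_calls - 1) positions later:
        # if that span fits in the window, the member is a repeat caller.
        for i in range(len(dates) - min_calls + 1):
            if dates[i + min_calls - 1] - dates[i] <= window:
                flagged[key] = dates
                break
    return flagged

# Hypothetical call records: (member_id, issue, call_date).
calls = [
    ("M1", "prior_auth", date(2025, 3, 1)),
    ("M1", "prior_auth", date(2025, 3, 9)),
    ("M1", "prior_auth", date(2025, 3, 20)),
    ("M2", "formulary", date(2025, 3, 2)),
    ("M2", "formulary", date(2025, 3, 4)),
    ("M3", "appeals", date(2025, 1, 5)),
    ("M3", "appeals", date(2025, 2, 20)),
    ("M3", "appeals", date(2025, 4, 10)),
]

flagged = repeat_contacts(calls)
for (member_id, issue), dates in flagged.items():
    print(member_id, issue, [d.isoformat() for d in dates])
```

In a 2 percent sample, the three calls from the same member would almost never be reviewed together, so each would be graded in isolation and the pattern would go unreported.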
The relationship between surveys and conversation analysis is not a replacement. It is a stack. Member experience surveys tell the plan what the member experienced and how they felt about it. Call content tells the plan why. Surveys are the outcome layer that determines QBP eligibility for MA plans and accreditation status across other lines. Conversations are the operational layer that surfaces the specific behaviors, scripts, transfers, and miscommunications driving the outcome.
Six tests for a defensible call center Member Experience program
Member Services leaders evaluating how to upgrade their Member Experience program in 2026 can apply six tests. Each isolates a structural property that determines whether the program will actually measure member experience or merely produce a number. Most arrangements on the market today, including in-house teams running on legacy infrastructure and vendor-managed programs alike, will fail at least three of these tests.
CMS is already doing this. The question is whether you are.
Every year, CMS places test calls into the prospective beneficiary and current enrollee call centers of every Medicare Advantage and Part D plan in the country. The Accuracy and Accessibility Study measures whether plan representatives provide correct benefit information, whether qualified foreign-language interpreters are reachable for LEP callers, and whether TTY services function for deaf and hard-of-hearing members. Results feed directly into the Customer Service domain of the Star Ratings calculation, and low performance can trigger compliance action [3].
CMS is, in other words, already running an independent Member Experience program on every MA plan's member-facing call center, because plan-administered quality scores alone have not been considered sufficient for regulatory measurement. But the agency's sample is tiny relative to total volume, and the protocol only measures what its test scenarios cover. Real-world failure modes that fall outside the protocol, including the prior authorization status miscommunication, the formulary question handled by an undertrained agent, and the appeals call that ends in a disconnect, are not in the data CMS sees.
What Mizzeto's Multilingual Member Experience Solution does
Mizzeto's Multilingual Member Experience Solution is built for health plans, not retrofitted from a generic CX platform, and is designed to pass all six tests above by default. Concretely:
- 100 percent analysis of member interactions, with AI-driven transcription and scoring that runs at full call volume rather than a sample.
- Native multilingual analysis across English, Spanish, Mandarin, Vietnamese, Tagalog, and the other languages CMS requires plans to support for LEP populations.
- Payer-configurable scoring rubrics the plan can update in days, with Member Experience scores linked directly to CAHPS measures, grievance volumes, appeal overturns, and disenrollment patterns.
- Plan ownership of the underlying conversation data, scoring models, and historical trend lines. Whether call handling, telephony, or transcription providers change, the member experience monitoring infrastructure stays with the plan.
- Real-time alerting and direct conversation-level access for Member Services and Compliance teams, so when a grievance trend or audit finding requires explanation, the evidence is already in the plan's possession.
The bottom line
Member Experience measurement for health plans is at an inflection point. The sampling model that defined the last twenty years cannot survive the combination of heavily weighted member experience measures in the MA Star Ratings formula, parallel quality reporting requirements for Medicaid and ACA plans, and the simple fact that the technology to measure every call exists and is being deployed by competitors. If CMS itself has concluded plan-administered scores need an independent measurement layer, the rational response is to install one at scale, on the vast majority of calls that neither the plan's sample nor CMS's test calls reach, before CMS surfaces a finding the plan has to explain. Apply the six tests above to your current arrangement, and see how Mizzeto's Multilingual Member Experience Solution measures against them.
References
- 1. SQM Group. mySQM Auto QA published guidance on manual QA sampling rates. www.sqmgroup.com/software
- 2. Centers for Medicare and Medicaid Services. 2026 Medicare Part C and D Star Ratings Technical Notes. www.cms.gov/files/document/2026-star-ratings-technical-notes.pdf. CMS Medicare Advantage and Part D Final Rule reducing CAHPS and administrative measure weights from 4x to 2x effective 2026 Star Ratings.
- 3. Centers for Medicare and Medicaid Services. Part C and Part D Call Center Monitoring, Timeliness and Accuracy and Accessibility Studies. Annual guidance memoranda and HPMS performance reports. www.cms.gov