Methodology

The work behind the score.

Most scoring tools give you a number. FirstPass gives you the number and shows every moment that produced it — tagged, traceable, calibrated against human judgment. Here's how it works.

01 The premise

Sales calls are hard to score because most of what matters happens between the lines.

A discovery call lasts 45 minutes and contains roughly 200 turns of speech. Most scoring tools collapse all of that into a single number — sometimes a letter grade, sometimes a 1-to-10 rating, occasionally a four-quadrant rubric.

The trouble is that the number doesn't tell the rep anything they can act on. A 6.4 isn't actionable. "Strong discovery, weak close" is closer, but it still doesn't get to the actual move that worked or the one that didn't.

We started from the opposite direction. Instead of asking "what's the score for this call?", we asked "what specifically happened in this call that's worth noticing?" The answer turned out to be a lot of specific things: a strategic question asked, a buying signal missed, a price objection acknowledged but not diagnosed. We made those specific moments the unit of measurement.

The score follows from the tags. Not the other way around.

Everything below explains how that works — the taxonomy, how tags become a score, the role of human annotation, and what happens when a team needs the system tuned to their specific motion.

02 The taxonomy

110 tags, 12 categories, both sides of the conversation.

70 sales-rep tags organized by call stage. 40 customer tags organized by what the customer is doing in response. The taxonomy is the same backbone for every call type — but tag weights and presence differ. A tradeshow chat doesn't get scored on Decision Criteria Explored the way a discovery call does.

Tags marked with a filled cyan dot are scorable — they move the score up or down. Tags marked with an empty dot are diagnostic — observed and recorded for context, but they don't move the score.

110 tags total

Stage 1 — Introductions & rapport

7 tags · 4 scorable

Credibility Established · Rapport Established · Business-Relevant Rapport · Tone Matched to Customer · Participants & Roles Confirmed · Excessive Small Talk · Insufficient Rapport Building

Stage 2 — Transition to business

7 tags · diagnostic only

Purpose of Meeting Stated · Agenda Set & Confirmed · Time Check / Permission Obtained · Compelling Reason to Engage Stated · Natural Transition to Business · Abrupt Transition to Business · Weak or Absent Opening

Stage 3 — Discovery & information exchange

22 tags · 11 scorable

Current State Uncovered · Desired Future State Uncovered · Gap Impact Quantified · Strategic Questions Asked · Financial / Operational Questions Asked · Open-Ended Question · Layered / Follow-Up Question · Question Demonstrating Business Acumen · Value Linked to Uncovered Need · Outcome-Based Value Statement · ROI / Financial Justification · Use of Evidence or Proof Point · Competitive Differentiation · Decision Maker Identified · Influencer / Champion Identified · Budget Discussed · Timeline Discussed · Decision Criteria Explored · Premature Solution Offering · Generic Value Statement · Missed Discovery Opportunity · Rep Dominated the Conversation

Stage 4 — Objection type

7 tags · diagnostic only

Price · Competitor · Timing / Urgency · Authority · Need · Trust / Risk · Status Quo

Stage 4 — Objection handling quality

8 tags · 5 scorable

Objection Acknowledged · Objection Root Cause Diagnosed · Strategic Reframing Applied · Evidence-Based Response · Effective Objection Resolution · Defensive Response · Objection Ignored · Missed Objection

Stage 4 — Communication & presence

8 tags · 5 scorable

Executive Presence · Confidence Under Pressure · Clarity and Conciseness · Active Listening · Use of Storytelling · Over-Talking / Long Monologue · Filler Language · Interruption

Stage 5 — Next steps & close

11 tags · 1 scorable

Customer Priorities Recapped · Mutual Understanding Confirmed · Mutual Action Plan Proposed · Next Step Confirmed · Next Step Not Established · Timeline Commitment Secured · Stakeholder Involvement Agreed · Strong Close · Weak Close · No Clear Close · Abrupt Ending

Customer · 40 tags across 5 stages · 3 scorable · 37 diagnostic

Stage 1 — Customer posture at opening

5 tags · diagnostic only

Customer Engaged at Outset · Customer Distracted / Disengaged · Customer Skeptical of Rep's Credibility · Customer Warm / Familiar with Rep · Customer Rushed / Time-Pressured

Stage 2 — Reception to setup

4 tags · diagnostic only

Customer Accepts Agenda · Customer Redirects Agenda · Customer Signals Strategic Interest · Customer in Transactional Mode

Stage 3 — Engagement in discovery

14 tags · 2 scorable

Customer Shares Business Objectives · Customer Reveals Current State Struggles · Customer Articulates Desired Future State · Customer Quantifies Business Impact · Customer Asks Strategic Questions of Rep · Customer Asks Only Tactical / Product Questions · Customer Volunteers Competitive Information · Customer Protective / Guarded · Customer Confused by Rep · Customer Challenges Rep's Knowledge · Customer Mentions Incumbent Vendor · Customer Expresses Status Quo Preference · Customer Enthusiasm — Business Issue · Customer Skepticism — Solution Fit

Stage 4 — Buying signals & objections

10 tags · 1 scorable

Positive Buying Signal · Negative Buying Signal · Customer Defers to Higher Authority · Customer Requests Proof / Evidence · Customer Requests Proposal or Pricing · Customer Enthusiasm — Solution · Customer Disengaged Mid-Call · Customer Confusion About Solution · Customer Actively Comparing Competitors · Customer Price-Focused

Stage 5 — Commitment quality

7 tags · diagnostic only

Customer Confirms Next Step · Customer Accepts Mutual Action Plan · Customer Proposes Their Own Next Step · Customer Vague on Commitment · Customer Declines Next Step · Customer Mentions Additional Stakeholders · Customer Agreement — Mutual Understanding

The customer-side taxonomy is shorter and almost entirely diagnostic by design. The customer's behavior tells you whether the call is going somewhere — but it's the rep we're scoring. The exceptions are tags that act as outcome indicators: a customer sharing business objectives, asking strategic questions, or expressing genuine enthusiasm about the solution. Those are signals that something the rep did landed.

03 How scoring works

Some tags move the score. Others sit alongside as context. Then it's calibrated.

Of the 110 tags, only 29 are scorable. The rest are diagnostic — observed and recorded, but they don't move the score up or down.

The reason is straightforward: not every observation is a quality signal. Open-Ended Question is descriptively useful but isn't itself a quality marker. A rep who asks ten open-ended questions (none strategic, none layered, none demonstrating business acumen) would score low on Discovery despite high open-ended-question volume. The tag is still applied and still useful for analysis, but it doesn't affect the score. Scorable tags are the moves a coach would notice: Strategic Questions Asked, Layered / Follow-Up Question, Gap Impact Quantified, Effective Objection Resolution.
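The 29-of-110 split can be checked against the per-category counts in the taxonomy listing. The short tally below copies those counts as written; the variable and function names are just for illustration.

```python
# (category, total tags, scorable tags), copied from the taxonomy listing above.
REP_CATEGORIES = [
    ("Introductions & rapport", 7, 4),
    ("Transition to business", 7, 0),
    ("Discovery & information exchange", 22, 11),
    ("Objection type", 7, 0),
    ("Objection handling quality", 8, 5),
    ("Communication & presence", 8, 5),
    ("Next steps & close", 11, 1),
]
CUSTOMER_CATEGORIES = [
    ("Customer posture at opening", 5, 0),
    ("Reception to setup", 4, 0),
    ("Engagement in discovery", 14, 2),
    ("Buying signals & objections", 10, 1),
    ("Commitment quality", 7, 0),
]


def totals(rows):
    """Return (total tags, scorable tags) summed over a list of categories."""
    return sum(row[1] for row in rows), sum(row[2] for row in rows)


rep_total, rep_scorable = totals(REP_CATEGORIES)
cust_total, cust_scorable = totals(CUSTOMER_CATEGORIES)

all_tags = rep_total + cust_total            # 70 + 40
all_scorable = rep_scorable + cust_scorable  # 26 + 3
all_diagnostic = all_tags - all_scorable

print(all_tags, all_scorable, all_diagnostic)
```

The rep side contributes 26 of the scorable tags and the customer side the remaining 3, which leaves 81 diagnostic tags overall.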

Once tags are applied, the score is computed in 3 steps:

Tag weighting by call type

Each call type has its own weight profile. A formulary review weighs Use of Evidence or Proof Point heavily; a tradeshow chat barely touches it.

Stage normalization

Calls don't reach every stage. The score normalizes for stages actually present rather than penalizing the absence of stages that aren't relevant.

Calibration against ground truth

The aggregate is calibrated against thousands of human-annotated calls so a 4.0 means roughly the same thing across call types and over time.
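As a rough illustration of the three steps above, the sketch below weights tags by call type, averages over only the stages that actually appear in the call, and maps the raw aggregate onto a reporting scale. This is a sketch under stated assumptions, not the production formula: the weight values, the rule that a tag missing from a profile counts as diagnostic, the per-stage averaging, and the linear calibration clipped to a 1-to-5 range are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppliedTag:
    name: str
    stage: int  # call stage (1-5) the tag belongs to
    turn: int   # transcript turn the tag was applied to


# Hypothetical weight profiles. A tag absent from a profile carries no
# weight for that call type, i.e. it is treated as diagnostic.
WEIGHT_PROFILES = {
    "discovery": {
        "Strategic Questions Asked": 1.0,
        "Gap Impact Quantified": 1.5,
        "Premature Solution Offering": -1.0,
        "Effective Objection Resolution": 1.0,
    },
    "tradeshow": {
        "Strategic Questions Asked": 0.5,
    },
}


def raw_score(tags, call_type):
    """Steps 1 and 2: weight tags, then average over stages actually present."""
    weights = WEIGHT_PROFILES[call_type]
    per_stage = {}
    for tag in tags:
        # Any observed tag marks its stage as present, scorable or not.
        per_stage.setdefault(tag.stage, 0.0)
        per_stage[tag.stage] += weights.get(tag.name, 0.0)
    if not per_stage:
        return 0.0
    # Stages that never occurred are simply not in the denominator.
    return sum(per_stage.values()) / len(per_stage)


def calibrate(raw, slope=0.5, intercept=3.0, lowest=1.0, highest=5.0):
    """Step 3: assumed linear map, standing in for a fit to annotated calls."""
    return max(lowest, min(highest, intercept + slope * raw))


call = [
    AppliedTag("Strategic Questions Asked", stage=3, turn=41),
    AppliedTag("Strategic Questions Asked", stage=3, turn=57),
    AppliedTag("Open-Ended Question", stage=3, turn=60),  # diagnostic
    AppliedTag("Premature Solution Offering", stage=3, turn=88),
    AppliedTag("Price", stage=4, turn=120),               # diagnostic
    AppliedTag("Effective Objection Resolution", stage=4, turn=131),
]

discovery_score = calibrate(raw_score(call, "discovery"))
tradeshow_score = calibrate(raw_score(call, "tradeshow"))
print(discovery_score, tradeshow_score)
```

Because each applied tag carries its transcript turn, any contribution to the score can be traced back to the moment that produced it. The same tagged call yields a different score under each weight profile, which is the point of weighting by call type.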

The output is a number — but it's the traceable provenance of the number that matters. Every score links to the specific tags that produced it, and every tag links to the moment in the transcript it was applied to. Nothing is opaque. A rep whose Discovery score dropped from 4.2 to 3.6 can see exactly why: Strategic Questions Asked went from 6 to 2, Premature Solution Offering appeared, Decision Criteria Explored was missed.
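The explanation a rep sees for a score change amounts to a tag-level comparison between two calls. A minimal sketch, assuming each call is available as a flat list of applied tag names (the `tag_delta` helper is hypothetical, not a FirstPass API):

```python
from collections import Counter


def tag_delta(previous_call, current_call):
    """Return {tag name: change in count} for every tag whose count changed."""
    before, after = Counter(previous_call), Counter(current_call)
    return {
        name: after[name] - before[name]
        for name in sorted(set(before) | set(after))
        if after[name] != before[name]
    }


# Mirrors the example in the text: 6 strategic questions down to 2,
# a premature solution offering appears, decision criteria go unexplored.
previous_call = ["Strategic Questions Asked"] * 6 + ["Decision Criteria Explored"]
current_call = ["Strategic Questions Asked"] * 2 + ["Premature Solution Offering"]

delta = tag_delta(previous_call, current_call)
print(delta)
```

Negative values are moves that happened less often than before, and positive values are moves that newly appeared or increased.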

Worth noting

Some categories are entirely diagnostic. Stage 2 (Transition to Business) and Stage 4 (Objection Type) contain no scorable tags, and Stage 5 (Next Steps) has only one. These categories describe what happened in the call, not how well something was done. The quality assessment lives in the categories where there's a clear better-or-worse axis.

04 Annotation

Humans calibrate the scoring engine. Continuously.

The reason most AI scoring is unreliable isn't that the AI is bad at recognizing tags — it's that nobody's checking the AI's work against ground truth. LLM-only scoring drifts. The same call scored twice on different days can come back differently. Without a feedback signal, the engine has no way to know it's wrong.

FirstPass maintains a continuously expanding corpus of calls annotated by humans — sales coaches, methodology specialists, and trained annotators — who tag every moment manually against the same taxonomy. The AI's tagging is then evaluated against that ground truth. Mismatches feed back into prompt engineering to handle ambiguous cases. This calibration loop runs continuously, which is why scoring accuracy improves over time rather than drifting.
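One way to picture the evaluation step is as a comparison of two tag sets for the same call. The sketch below assumes each tag is recorded as a `(turn, tag name)` pair and uses precision and recall as the agreement measures; both choices are assumptions for illustration, since the text does not say which metrics are actually used.

```python
def agreement(ai_tags, human_tags):
    """Compare AI tagging with human ground truth for a single call.

    Returns (precision, recall, mismatches), where mismatches lists every
    (turn, tag) pair that only one side applied.
    """
    ai, human = set(ai_tags), set(human_tags)
    matched = ai & human
    precision = len(matched) / len(ai) if ai else 1.0
    recall = len(matched) / len(human) if human else 1.0
    return precision, recall, sorted(ai ^ human)


human_tags = {
    (12, "Strategic Questions Asked"),
    (30, "Objection Acknowledged"),
    (31, "Objection Root Cause Diagnosed"),
    (44, "Next Step Confirmed"),
}
ai_tags = {
    (12, "Strategic Questions Asked"),
    (30, "Objection Acknowledged"),
    (44, "Next Step Confirmed"),
    (45, "Strong Close"),
}

precision, recall, mismatches = agreement(ai_tags, human_tags)
print(precision, recall, mismatches)
```

The mismatch list is the part that matters for the feedback loop: each entry is a concrete moment where the AI and the annotator disagreed, which is the kind of ambiguous case the prompt work is meant to resolve.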

An important distinction

Human annotation does not happen on every customer call. The annotation happens on a continuously growing calibration set used to ground the AI's tagging. Your team's practice sessions are scored by the calibrated AI engine — fast enough to deliver feedback in under a minute — while the underlying engine is anchored to human judgment. Every other AI scoring tool in the category has the same general architecture. The differentiator isn't the AI. It's whether the AI is calibrated against anything other than its own output.

05 Co-definition

If your sales motion is unique, the taxonomy extends with you.

The 110-tag taxonomy covers the moves that recur across most B2B sales motions. For most teams, it's enough as-is.

For teams with specific methodologies — MEDDPICC, SPICED, Sandler, Challenger, and many more — the taxonomy extends. We work with your sales operations and enablement leads to map your methodology onto existing tags, identify gaps, and add tags that capture moves specific to your motion. Scoring weights get tuned alongside the taxonomy.

A typical co-definition engagement runs 3–5 weeks:

Discovery

Week 1

Review your existing methodology, scoring rubrics, top-rep call recordings, and field-leadership feedback. Identify moves that don't yet map to existing tags. Output: a gap analysis with proposed additions.

Tag extension

Week 1–2

Write new tags with explicit definitions, pass criteria, and example moments. Added to your team's instance of the taxonomy without affecting other customers.

AI persona tuning

Week 2–3

Calibrate the AI customer's behavior to push back the way your real customers do. Industry knowledge loaded in: the regulations they care about, the objections that actually slow deals down.

Annotation calibration

Week 3–4

Annotate a sample set of your team's existing calls against the extended taxonomy. This becomes the ground truth for the AI's tagging in your instance.

Rollout

Week 4–5

Reps practice against the calibrated system. The first two weeks include light review of scoring outputs to catch any drift — after that, the engine runs against your team's standard.
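The tag-extension step in the timeline above could be modeled roughly as follows. This is a sketch only: the `TagDefinition` fields follow the description (explicit definition, pass criteria, example moments), but the field names, the definition wording, and the "Paper Process Explored" tag for a MEDDPICC team are hypothetical.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class TagDefinition:
    name: str
    definition: str
    pass_criteria: tuple[str, ...]
    example_moments: tuple[str, ...] = ()
    scorable: bool = False


def extend_taxonomy(base, additions):
    """Return a new per-team taxonomy; the shared base is never mutated."""
    clashes = sorted(set(base) & {tag.name for tag in additions})
    if clashes:
        raise ValueError(f"tag names already defined: {clashes}")
    merged = dict(base)
    merged.update({tag.name: tag for tag in additions})
    return merged


# Read-only stand-in for the shared taxonomy (one tag shown, wording invented).
BASE_TAXONOMY = MappingProxyType({
    "Decision Criteria Explored": TagDefinition(
        name="Decision Criteria Explored",
        definition="Rep asks how the buying decision will be evaluated.",
        pass_criteria=("Rep asks at least one question about evaluation criteria",),
        scorable=True,
    ),
})

# Hypothetical team-specific tag.
paper_process = TagDefinition(
    name="Paper Process Explored",
    definition="Rep asks what legal and procurement steps precede signature.",
    pass_criteria=("Rep names or asks about at least one approval step",),
    example_moments=("'Who needs to review the contract before it is signed?'",),
    scorable=True,
)

team_taxonomy = extend_taxonomy(BASE_TAXONOMY, [paper_process])
print(sorted(team_taxonomy))
```

Returning a fresh mapping rather than editing the base in place is what keeps one team's additions from leaking into another team's instance.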

The outcome is FirstPass calibrated to how your team specifically sells. What each role sees day-to-day:

For the rep

After every call: a tagged transcript with strengths and gaps, coaching tied to specific moments, and a score that breaks down to specific tags. Over time, a personal trend — skills improving, patterns identified.

For the manager

Tag-level visibility into the team — which reps consistently miss Decision Criteria Explored, who's improving on objection handling, where coaching time will return the most.

For sales ops & enablement

Tag definitions, scoring weights, AI persona profiles, call type structures — all editable, all versionable. Co-definition produces documentation that becomes part of how your team understands what good looks like.

Want to see the taxonomy applied to your actual sales motion?

Your reps are developing their skills on live calls with real revenue on the line. There's a better way.

Find out if FirstPass is right for your team →