IEEE SLT 2026 Challenge

BeTraC

Beyond Transcription Challenge

Can end-to-end audio models reason faithfully over long clinical conversations — without ever producing a transcript? BeTraC challenges participants to generate structured SOAP notes directly from raw doctor-patient audio.

🤗 Access Dataset on Hugging Face
Learn More →
8,800
Total Conversations
~1,329 hrs
Total Audio
~9 min
Avg. Duration
2
Competition Tracks
66
Ambient Sound Classes

Bridging the Gap in
End-to-End Audio Reasoning

On Synth-DoPaCo, E2E models exhibit hallucination rates of 99–100% versus 21–23% for cascaded ASR pipelines — despite identical architectures. BeTraC targets this gap directly.

Standardized Evaluation

A reproducible benchmark for long-form E2E audio reasoning in clinical settings.

Quantify the Gap

Measure the delta between E2E and cascaded systems under controlled conditions.

Stimulate Research

Drive new architectures and training strategies that close the hallucination gap.

Assess Generalization

Test transfer from synthetic to real clinical audio via post-competition evaluation.


Two Tracks,
One Hard Rule

Both tracks require open-weight models only and share the same constraint: no intermediate transcription.
Systems must reason directly from raw audio to structured SOAP notes.

Lightweight Track
≤ 3B Parameters
  • Open-weight models up to 3B parameters
  • Direct E2E audio-to-SOAP pipeline
  • No tool use or agentic pipelines
  • No intermediate transcription at any stage
  • 📌 Baseline: Qwen2.5-Omni-3B
Heavyweight Track
≤ 30B Parameters
  • Open-weight models up to 30B parameters
  • Tool use & agentic architectures allowed
  • Chain-of-thought agents, RAG pipelines
  • No intermediate transcription at any stage
  • 📌 Baseline: Best of Qwen2.5-Omni-3B / Qwen3-Omni-30B variants
Reference Topline: A cascaded system (Whisper Large V3 + Qwen3-32B-Thinking) will be reported as an upper-bound reference. Cascaded systems are not eligible for competition ranking.

Precise Rules &
Eligibility

The following rules apply to all BeTraC submissions. Detailed guidelines will be published on the challenge website before the April 2 data release.

🚫 Hard Constraints Ineligible if violated
  • No intermediate transcription at any pipeline stage — including chain-of-thought steps or tool outputs.
  • Open-weight models only. Proprietary API-based models (GPT-4o, Gemini, etc.) are not permitted.
  • Parameter cap enforced per track. Lightweight: ≤ 3B. Heavyweight: ≤ 30B. For MoE models, total (not active) parameter count applies.
  • No tool use in Lightweight Track. Agentic architectures are Heavyweight-only.
  • No use of withheld test labels at any stage.
✅ Allowed & Required
  • Fine-tuning on the provided training split (7,200 conversations) is permitted.
  • Dev set may be used freely for model selection and tuning.
  • System description paper required for all ranked submissions (due July 8, 2026).
  • Multiple submissions per team are allowed; the best result counts.
  • External data is subject to the whitelist policy below. Proposals due May 4.

📋 Submission Format
  • Plain-text SOAP note with clearly labeled sections: S, O, A, P.
  • Each audio file must be processed independently — no cross-file context.
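Since the output is plain text with labeled sections, submissions can be sanity-checked mechanically before upload. The sketch below is illustrative only: the exact section labels and layout will be fixed by the official submission instructions, and the `S:`/`O:`/`A:`/`P:` line prefixes are an assumption here, not the confirmed spec.

```python
import re

# Hypothetical section labels; the official format spec ships with the
# April 2 data release and may differ.
SOAP_SECTIONS = ("S:", "O:", "A:", "P:")

def soap_sections_present(note: str) -> bool:
    """Return True if all four SOAP labels open a line, in S-O-A-P order."""
    positions = []
    for label in SOAP_SECTIONS:
        m = re.search(rf"^{re.escape(label)}", note, flags=re.MULTILINE)
        if m is None:
            return False  # a required section label is missing
        positions.append(m.start())
    # Labels must appear in canonical order within the note.
    return positions == sorted(positions)
```

A check like this can run per audio file, matching the one-note-per-file requirement above.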
⚠️  Under Discussion — Model & Data Whitelist Policy

Whether to operate a formal whitelist of approved models and/or datasets is still being finalized by the organizing committee. This section will be updated once a decision is reached. Participants are encouraged to check back before the training data release on April 2, 2026, or contact the organizers directly.

Approved Models
TBD — whitelist policy under discussion
Approved External Datasets
TBD — whitelist policy under discussion
📬  Questions & Registration

For questions about eligibility, rule clarifications, or to register your team's intent to participate, contact the organizing committee at betrac@googlegroups.com. Full submission instructions will be published alongside the data release on April 2, 2026.


Synth-DoPaCo 📄 Paper

Fully synthetic doctor-patient conversations generated with open-weight, permissively licensed models. Speaker identities are strictly disjoint across splits. Audio features two speakers, 66 ambient sound classes, room reverberation, and Opus compression artifacts.

🤗  View on Hugging Face

✍🏻 Evaluation Metrics

  • Primary: Open Medical Concept F1 · MeSH keyword matching + NER via scispaCy · ⚙ btc-eval harness
  • Secondary: ROUGE F1 · R-2, R-3, R-L against reference notes · ⚙ btc-eval harness
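As a rough illustration of the primary metric, a set-level concept F1 fits in a few lines. This is not the btc-eval implementation: the official harness extracts concepts via scispaCy NER plus MeSH keyword matching, whereas this sketch assumes the concept sets are already given.

```python
def concept_f1(pred_concepts, ref_concepts):
    """Set-level F1 over extracted medical concepts (simplified sketch)."""
    pred, ref = set(pred_concepts), set(ref_concepts)
    if not pred or not ref:
        return 0.0
    tp = len(pred & ref)          # concepts present in both note and reference
    precision = tp / len(pred)
    recall = tp / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, a predicted set {hypertension, aspirin} against a reference {hypertension, ibuprofen} yields precision = recall = 0.5, so F1 = 0.5.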

Post-Competition Analysis (Top 5 per Track)

LLM-as-a-Judge

Reference-free scoring across 4 dimensions (1–5): Faithfulness, Coverage, Structure, and Conciseness. Additional analyses include over-/under-medicalization, over-specificity, missed facts, critical omissions, duplicated content, unsupported claims, and contradictions.

Out-of-Domain Generalization

Evaluated on 20–30 real OSCE interviews (Fareez et al.) to test transfer from synthetic to real recorded audio.


Rankings

Ranked by Open Medical Concept F1 (primary), with ROUGE-2/3/L as secondary.
Winners determined separately per track.

Ranked by Open Medical Concept F1 · Lightweight Track (≤ 3B parameters) ⏳ Submissions open June 24
# Team System / Model Med Concept F1 ROUGE-2 ROUGE-L
🏁
No submissions yet
Results will appear here after the submission deadline (June 24, 2026).

* Baseline results (Qwen2.5-Omni-3B) will be published alongside the training data release on April 2, 2026.

Ranked by Open Medical Concept F1 · Heavyweight Track (≤ 30B parameters) ⏳ Submissions open June 24
# Team System / Model Med Concept F1 ROUGE-2 ROUGE-L
🏁
No submissions yet
Results will appear here after the submission deadline (June 24, 2026).

* Baseline results (best of Qwen2.5-Omni-3B / Qwen3-Omni-30B variants) will be published alongside the training data release on April 2, 2026.


Key Milestones

April 2, 2026
Training + Dev Data Release
Audio + reference SOAP notes published on Hugging Face
May 4, 2026
Open-Source Proposals Deadline
Participants proposing additional open-source models or data
June 24, 2026
System Submission Deadline
Final submissions for Lightweight and Heavyweight tracks
July 1, 2026
Results Announcement
Automated metric results released for all submissions
July 8, 2026
Challenge Paper Submission
System description papers due for IEEE SLT 2026 proceedings

Organizers

Assistant Professor of CSE
The Ohio State University
Ph.D. Student
The Ohio State University
Postdoctoral Research Associate
Carnegie Mellon University
Ph.D. Student
Carnegie Mellon University
Ph.D. Candidate
The Ohio State University
Nationwide Children's Hospital
Principal Research Scientist
Solventum / CMU LTI
Senior Applied Scientist
Amazon
Assistant Research Scientist
Johns Hopkins University (CLSP)
Assistant Research Professor
Johns Hopkins University Bloomberg School of Public Health
Contact: For general inquiries, rule clarifications, and team registration, reach the organizing committee at betrac@googlegroups.com.