Newsroom

Most Recent

A New Look for Koel Labs

By Koel Labs

You might have noticed that we’ve given Koel Labs a fresh new look. This change is intended to better align us with our mission of pioneering inclusive speech technology, with our goals as a research-focused startup, and with our belief in openly sharing our work.

Announcement

The Underlying Intuition of Wav2Vec2’s Transformer

By Koel Labs

Wav2Vec2’s Transformer handles encoded audio features and aligns them to text. Building on our blog post about the feature extractor, this post dives into positional encodings tailored to audio and how CTC loss solves alignment without frame-level labels.

Technical Report

Technical Reports

The Underlying Intuition of Wav2Vec2’s Transformer

By Koel Labs

Wav2Vec2’s Transformer handles encoded audio features and aligns them to text. Building on our blog post about the feature extractor, this post dives into positional encodings tailored to audio and how CTC loss solves alignment without frame-level labels.

Technical Report
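
The post explains how CTC makes that alignment-free training possible. As a quick taste, here is a minimal, illustrative sketch using PyTorch's built-in CTCLoss; the tensor sizes are toy values and are not taken from the post or from Wav2Vec2 itself.

```python
# Minimal CTC loss sketch (PyTorch); sizes are illustrative only.
import torch
import torch.nn as nn

T, N, C = 50, 2, 32  # output frames, batch size, vocab size (blank at index 0)
log_probs = torch.randn(T, N, C).log_softmax(dim=-1)  # stand-in for model outputs

targets = torch.randint(1, C, (N, 10), dtype=torch.long)  # unaligned label sequences
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

# CTC marginalizes over every monotonic alignment between the T output frames
# and the 10 target labels, so no frame-level annotation is needed.
ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
print(loss.item())
```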

The Underlying Intuition of Wav2Vec2's CNN

By Koel Labs

Nearly every explanation of the Wav2Vec2 architecture begins with the iconic diagram, but without extensive background it is hard to know what the cones labeled as the CNN are really doing. What does it actually mean to extract features from audio? Let's find a stronger visual intuition for this.

Technical Report
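
For a rough feel of what "extracting features from audio" means before reading the full post, here is a toy sketch: a stack of strided 1D convolutions that turns one second of raw 16 kHz samples into a much shorter sequence of feature vectors. The layer sizes are illustrative and are not the actual Wav2Vec2 configuration.

```python
# Toy 1D-CNN feature encoder; layer sizes are illustrative, not Wav2Vec2's.
import torch
import torch.nn as nn

encoder = nn.Sequential(
    nn.Conv1d(1, 64, kernel_size=10, stride=5),  # wide first layer over raw samples
    nn.GELU(),
    nn.Conv1d(64, 64, kernel_size=3, stride=2),  # further temporal downsampling
    nn.GELU(),
    nn.Conv1d(64, 64, kernel_size=3, stride=2),
    nn.GELU(),
)

waveform = torch.randn(1, 1, 16000)  # one second of fake audio at 16 kHz
features = encoder(waveform)         # (batch, channels, frames)
print(features.shape)                # far fewer frames than input samples
```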

Building Open Source Hugging Face Leaderboards

By Koel Labs

Sometimes, the best machine learning models are hidden in plain sight. During our work on phonemic transcription, we stumbled upon a specialized ginic model that had been fine-tuned from Facebook's XLSR-53 model on the Buckeye corpus. This discovery proved significant: the ginic model performs 1.2x better than Facebook's, and iterating on their approach, our m... Read More →

Technical Report
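
For readers who want to poke at checkpoints like these themselves, here is a hedged sketch of running a phoneme-level CTC model from the Hugging Face Hub with the transformers library; the model ID and the silent audio are placeholders, not necessarily the exact checkpoint or data compared in the post.

```python
# Sketch: decode phonemes with a Wav2Vec2 CTC checkpoint from the Hub.
# The model ID is a placeholder; swap in the checkpoint you want to compare.
import torch
from transformers import AutoModelForCTC, AutoProcessor

model_id = "facebook/wav2vec2-xlsr-53-espeak-cv-ft"  # illustrative phoneme-CTC checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForCTC.from_pretrained(model_id)

audio = torch.zeros(16000)  # stand-in for one second of 16 kHz speech
inputs = processor(audio.numpy(), sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # (batch, frames, vocab)

ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(ids))  # predicted phoneme string(s)
```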

A Deep Dive into Phonemic Transcription Metrics

By Koel Labs

The International Phonetic Alphabet (IPA) is like the Swiss Army knife of pronunciation—it gives us precise symbols to represent every sound humans make in language. In recent years, predicting these phonemic transcriptions from audio has become a popular machine learning task. But how do we calculate the accuracy of these models?

Technical Report
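
The post walks through these metrics in depth; the simplest starting point, phoneme error rate (PER), is just an edit distance between predicted and reference phoneme sequences, normalized by reference length. A minimal sketch (the example words are illustrative, not from the post):

```python
# Minimal phoneme error rate (PER) sketch: Levenshtein distance over phonemes,
# normalized by the length of the reference sequence.
def edit_distance(ref, hyp):
    # Classic dynamic-programming Levenshtein distance over phoneme tokens.
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, start=1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,        # deletion
                        dp[j - 1] + 1,    # insertion
                        prev + (r != h))  # substitution (or match)
            prev = cur
    return dp[-1]

def phoneme_error_rate(ref, hyp):
    return edit_distance(ref, hyp) / max(len(ref), 1)

# Toy example with IPA symbols: one substitution over three phonemes.
print(phoneme_error_rate(list("kæt"), list("kɑt")))  # ≈ 0.33
```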

Announcements

A New Look for Koel Labs

By Koel Labs

You might have noticed that we’ve given Koel Labs a fresh new look. This change is intended to better align us with our mission of pioneering inclusive speech technology, with our goals as a research-focused startup, and with our belief in openly sharing our work.

Announcement

Hello World! — Our Open Source Project Launch

By Koel Labs

At Koel Labs, our goal is to make pronunciation learning more accessible and inclusive. To represent the diversity of language and dialects, we're excited to announce that everything from model weights and training code to datasets, research papers, and the frontend UI is officially open source!

Announcement

Early Access

Be First in Line

We’re inviting a small group for early access to our research previews. Reserve your spot today.