ODIN allows you to transcribe audio streams in real-time. This is useful if you want to build a voice assistant or a
chat bot. In this article, we will show you how to do that using our NodeJS SDK.
Use cases
There are many use cases for audio transcription. Here are some examples:
- Content moderation: You might want to ban users that use inappropriate language in your game or app.
- Voice assistant: You might want to build a voice assistant that can answer questions from your users.
- Chat bot: You might want to build a chat bot that can answer questions from your users.
While some of these use cases are quite easy to implement for a single user interacting with an AI, it's much more
complicated to do in a room with multiple users. ODIN makes that easy: you can concentrate on building the use case
while we do the heavy lifting for you.
Example
This example implements a NodeJS server that connects to an ODIN room and starts recording incoming audio streams into
a WAV file. Whenever a user stops talking for 2 seconds, the file is closed and transcribed using OpenAI's Whisper API.
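The silence handling in the example below boils down to a small per-stream state machine: start a recording when a stream becomes active, cancel the pending close if the user resumes talking in time, and schedule a close once activity stops. A minimal sketch of that decision logic as a pure function (the names are ours for illustration, not part of the ODIN SDK):

```javascript
// Decide what to do with a recorder entry when a voice-activity event arrives.
// `recorder` is our bookkeeping object for one media stream (or undefined if
// nothing is being recorded yet); `active` is the activity flag from the event.
function nextAction(recorder, active) {
  if (active) {
    if (!recorder) return 'start-recording';          // stream just became active
    if (recorder.timer) return 'cancel-close-timer';  // talking resumed in time
    return 'keep-recording';                          // still talking, nothing to do
  }
  if (recorder && !recorder.timer) return 'schedule-close'; // begin 2 s countdown
  return 'ignore';                                          // silence on an idle stream
}
```

The full example below implements exactly these transitions inside the `MediaActivity` event listener.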
Providing a UserData
object is not necessary, but it's good practice and allows you to identify your bot in the room. The
user data object used here is the JSON format understood by the Web client we use internally for testing, so you can
use it to quickly test if everything works fine. More info on the web client can be found here.
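User data travels through ODIN as raw bytes, so the JSON object is serialized with `TextEncoder` before joining and decoded with `TextDecoder` when reading another peer's data. A quick round-trip sketch (both classes are globals in Node.js):

```javascript
// The user data your bot presents to other peers in the room
const userData = {
  name: "Recorder Bot",
  platform: "ODIN JS Bot SDK",
  version: "0.1"
};

// Serialize to bytes before passing it to room.join()
const bytes = new TextEncoder().encode(JSON.stringify(userData));

// On the receiving side (e.g. in a PeerJoined handler), decode it back
const decoded = JSON.parse(new TextDecoder().decode(bytes));
console.log(decoded.name); // "Recorder Bot"
```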
import odin from '@4players/odin-nodejs';
import wav from 'wav';
import fs from 'fs';
import {Configuration, OpenAIApi} from "openai";

const {OdinClient} = odin;

const accessKey = "__YOUR_ACCESS_KEY__";
const roomName = "Lobby";
const userName = "My Bot";

// Set up the OpenAI client used for transcription
const configuration = new Configuration({
  apiKey: '__YOUR_OPENAI_API_KEY__'
});
const openai = new OpenAIApi(configuration);

// One recorder entry per active media stream, keyed by mediaId
const fileRecorder = {};

const odinClient = new OdinClient(accessKey);
const room = odinClient.createRoom(roomName, userName);

room.addEventListener('PeerJoined', (event) => {
  console.log("Received PeerJoined event", event);
  console.log(JSON.parse(new TextDecoder().decode(event.userData)));
});

room.addEventListener('PeerLeft', (event) => {
  console.log("Received PeerLeft event", event);
});

// Track voice activity: start a WAV file when a stream becomes active and
// close (and transcribe) it once the stream has been silent for 2 seconds.
room.addEventListener('MediaActivity', (event) => {
  if (event.state) {
    if (!fileRecorder[event.mediaId]) {
      const timer = new Date().getTime();
      const fileName = `./recording_${event.peerId}_${event.mediaId}_${timer}.wav`;
      console.log("Created a new recording file: ", fileName);
      fileRecorder[event.mediaId] = {
        wavEncoder: new wav.FileWriter(fileName, {
          channels: 1,
          sampleRate: 48000,
          bitDepth: 16
        }),
        fileName: fileName
      };
    } else if (fileRecorder[event.mediaId].timer) {
      // The user resumed talking before the 2-second timeout elapsed
      clearTimeout(fileRecorder[event.mediaId].timer);
      delete fileRecorder[event.mediaId].timer;
    }
  } else if (fileRecorder[event.mediaId] && !fileRecorder[event.mediaId].timer) {
    fileRecorder[event.mediaId].timer = setTimeout(() => {
      fileRecorder[event.mediaId].wavEncoder.end();
      try {
        const file = fs.createReadStream(fileRecorder[event.mediaId].fileName);
        openai.createTranscription(file, "whisper-1").then((response) => {
          console.log("OpenAI Transcription: ", response.data.text);
        });
      } catch (e) {
        console.log("Failed to transcribe: ", e);
      }
      delete fileRecorder[event.mediaId];
    }, 2000);
  }
});

// Write incoming audio samples to the recording file for that stream
room.addEventListener('AudioDataReceived', (data) => {
  if (fileRecorder[data.mediaId]) {
    fileRecorder[data.mediaId].wavEncoder.file.write(data.samples16, (error) => {
      if (error) {
        console.log("Failed to write audio file");
      }
    });
  }
});

const userData = {
  name: "Recorder Bot",
  seed: "123",
  userId: "Bot007",
  outputMuted: 1,
  platform: "ODIN JS Bot SDK",
  version: "0.1"
};

const data = new TextEncoder().encode(JSON.stringify(userData));
room.join("gateway.odin.4players.io", data);
console.log("ROOM-ID:", room.id);

const message = {
  kind: 'message',
  payload: {
    text: 'Hello World'
  }
};
room.sendMessage(new TextEncoder().encode(JSON.stringify(message)));

console.log("Press any key to stop");
const stdin = process.stdin;
stdin.resume();
stdin.setEncoding('utf8');
stdin.on('data', function (key) {
  console.log("Shutting down");
  room.close();
  // Close any recordings that are still open
  for (const mediaId of Object.keys(fileRecorder)) {
    fileRecorder[mediaId].wavEncoder.end();
  }
  process.exit();
});
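The `wav.FileWriter` used in the example emits a standard 44-byte PCM WAV header for the mono, 16-bit, 48 kHz stream. If you ever need to write the file without the dependency, the header can be built by hand with Node's `Buffer`; a sketch (field layout follows the canonical RIFF/WAVE format):

```javascript
// Build a 44-byte canonical PCM WAV header for mono, 16-bit, 48 kHz audio.
// `dataLength` is the size of the raw sample data in bytes.
function wavHeader(dataLength, sampleRate = 48000, channels = 1, bitDepth = 16) {
  const blockAlign = channels * bitDepth / 8;
  const byteRate = sampleRate * blockAlign;
  const header = Buffer.alloc(44);
  header.write('RIFF', 0);                   // chunk id
  header.writeUInt32LE(36 + dataLength, 4);  // remaining chunk size
  header.write('WAVE', 8);                   // format
  header.write('fmt ', 12);                  // sub-chunk 1 id
  header.writeUInt32LE(16, 16);              // sub-chunk 1 size (PCM)
  header.writeUInt16LE(1, 20);               // audio format: 1 = PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitDepth, 34);
  header.write('data', 36);                  // sub-chunk 2 id
  header.writeUInt32LE(dataLength, 40);      // raw sample data size
  return header;
}
```

Prepend this header to the raw 16-bit samples and the result is a playable WAV file equivalent to what the example records.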
Next steps
You might also want the bot to send audio to the room. For example, the bot could answer questions from your users or
warn a user or group of users to stop using inappropriate language. We have an example for that too. You can find it
here.
Encoding to FLAC
Some speech-to-text services might require you to deliver FLAC-encoded audio data. We have written a blog post about
that to get you started quickly. You can find
it here.
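One straightforward approach, independent of the blog post, is to re-encode the finished WAV file with ffmpeg (assuming it is installed on the host). The helper below only builds the argument list, so the actual encoding stays a one-liner with `child_process`:

```javascript
import {spawn} from 'child_process';

// Build the ffmpeg argument list for converting a WAV recording to FLAC.
// -y overwrites an existing output file; -c:a flac selects the FLAC encoder.
function flacArgs(wavFile, flacFile) {
  return ['-y', '-i', wavFile, '-c:a', 'flac', flacFile];
}

// Usage (add error handling in production):
// spawn('ffmpeg', flacArgs('recording.wav', 'recording.flac'));
```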
ODIN Bot SDK
This example is just a starting point. You can use it to build your own audio streaming application. We have built
an ODIN Bot SDK in TypeScript on top of the ODIN NodeJS SDK that you can use to build your own AI bots; it provides
simple interfaces to capture and send audio streams. We have published it as a separate NPM package. You can find it
here.