Numen is voice control for hands-free computing, letting you type efficiently by saying syllables and literal words. It works system-wide on Linux, and the speech recognition runs locally.
There’s a short demonstration at: numenvoice.org
Go (a.k.a. golang) is required.
The speech recognition library and an English model (about 40MB) can be installed with:
sudo ./install-vosk.sh && sudo ./install-model.sh
The dotool command, which simulates the input, can be installed with:
sudo ./install-dotool.sh
Finally, numen itself can be installed with:
sudo ./install-numen.sh
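Since the install scripts need Go, it can save a failed run to check for it first. The following is only a sketch and not part of Numen: missing_cmds is a made-up helper, and checking for sudo is an assumption based on the commands above.

```sh
#!/bin/sh
# Sketch: check the prerequisites before running the install scripts.
# missing_cmds is a hypothetical helper, not something Numen provides.

# missing_cmds CMD...: print each CMD that is not found on PATH, one per line.
missing_cmds() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# go is required by the README; sudo is assumed because the commands use it.
missing=$(missing_cmds go sudo)
if [ -n "$missing" ]; then
    # $missing is left unquoted so the names are joined on one line.
    echo "please install first:" $missing
else
    echo "prerequisites found, the install scripts should be able to run"
fi
```

If it reports nothing missing, carry on with the install commands as given.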
dotool requires permission to /dev/uinput to create the virtual input devices, and a udev rule grants this to users in group input.
You could try:
echo type hello | dotool
and if need be, you can run:
sudo groupadd -f input
sudo usermod -a -G input $USER
then log out and back in and trigger the udev rule, or just reboot.
If it types something other than hello, see about keyboard layouts in the manpage.
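If the dotool test does nothing, a quick look at the permissions can tell you whether the group commands are needed. This is a rough sketch, assuming the device node is /dev/uinput and the group is input as described above; in_group and can_write are hypothetical helpers, not part of dotool or Numen.

```sh
#!/bin/sh
# Sketch: report whether the current session can likely use dotool.
# Assumes the device is /dev/uinput and the group is "input".

# in_group GROUP: succeeds if the current session is a member of GROUP.
# id -nG lists the session's groups, so a fresh usermod won't show up
# until you log in again.
in_group() {
    id -nG | tr ' ' '\n' | grep -qxF "$1"
}

# can_write PATH: succeeds if PATH exists and is writable by this user.
can_write() {
    [ -w "$1" ]
}

if in_group input; then
    echo "in group input"
else
    echo "not in group input (the groupadd/usermod commands may be needed)"
fi

if can_write /dev/uinput; then
    echo "/dev/uinput is writable"
else
    echo "/dev/uinput is not writable (a re-login or reboot may be needed)"
fi
```

Triggering the udev rule without a reboot is usually done with sudo udevadm control --reload-rules followed by sudo udevadm trigger, though that is general udev usage rather than anything specific to dotool.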
Once you’ve got a microphone, you can run it with:
numen
There shouldn’t be any output, but you can try typing hey by saying “hoof each yank”, and try transcribing a sentence after saying “scribe”. Terminate it by pressing Ctrl+c (a.k.a. “troy cap”).
If nothing happened, check it’s using the right audio device with:
timeout 5 numen --verbose --audiolog=me.wav
aplay me.wav
and if not, specify a --mic from those shown by --list-mics.
Now you’re ready to have a go in your text editor! The default phrases are in the /etc/numen/phrases directory.
I just use Numen and the default phrases for all my computing, with keyboard-based programs like Neovim and qutebrowser. I also use a minimal desktop environment I made, called Tiles, that doesn’t require a pointer device for window management, file picking, etc.
The manpage covers configuring Numen.
You can send questions or patches by composing an email to ~geb/public-inbox@lists.sr.ht.
You’re also welcome to join the Matrix chat at #numen:matrix.org.
AGPLv3 only, see LICENSE.
Copyright (c) 2022-2023 John Gebbie