
Unable to use external microphone with input_device_index #276

@DRMPN

Description

Hello!

When I set input_device_index to use an external microphone, initialization fails with Exception: Selected device validation failed, and RealtimeSTT falls back to the default microphone. Both external microphones I tested (device indexes 5 and 6) produce the same [Errno -9997] Invalid sample rate error in the logs. Both devices report a defaultSampleRate of 48000 Hz, and RealtimeSTT lists [16000, 48000] as the rates to try, yet validation fails on the first attempt at 16000 Hz.
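To check which rates PortAudio actually accepts for each device, a small probe can be run outside RealtimeSTT. This is a minimal sketch, assuming PyAudio is installed. It uses PyAudio's `is_format_supported`, which raises `ValueError` when a format is rejected. The helper names `accepted_rates` and `print_input_rates` and the candidate rate list are illustrative choices, not part of RealtimeSTT. One possible cause worth checking is that ALSA `hw:` devices (both mics here are `hw:3,0` and `hw:4,0`) bypass ALSA's plug conversion layer and typically accept only the hardware's native rates.

```python
# Sketch: ask PortAudio (via PyAudio) which sample rates an input device accepts.
# Assumes PyAudio is installed; `pa` is passed in so the logic can be tried
# without real hardware. CANDIDATE_RATES is an illustrative choice.
CANDIDATE_RATES = (16000, 22050, 32000, 44100, 48000)


def accepted_rates(pa, device_index, channels, sample_format, rates=CANDIDATE_RATES):
    """Return the rates from `rates` that PortAudio reports as supported."""
    supported = []
    for rate in rates:
        try:
            # PyAudio signals an unsupported format by raising ValueError
            # (for example an invalid sample rate, error code -9997).
            if pa.is_format_supported(
                rate,
                input_device=device_index,
                input_channels=channels,
                input_format=sample_format,
            ):
                supported.append(rate)
        except ValueError:
            pass
    return supported


def print_input_rates():
    """List every input-capable device with the rates it accepts (mono, int16)."""
    import pyaudio  # imported lazily so accepted_rates() works without PyAudio

    pa = pyaudio.PyAudio()
    try:
        for idx in range(pa.get_device_count()):
            info = pa.get_device_info_by_index(idx)
            if info["maxInputChannels"] > 0:
                rates = accepted_rates(pa, idx, 1, pyaudio.paInt16)
                print(idx, info["name"], rates)
    finally:
        pa.terminate()
```

Calling `print_input_rates()` on the affected machine would show whether 16000 is missing from the accepted rates for indexes 5 and 6.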

Here's part of my script where I configure it:

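from RealtimeSTT import AudioToTextRecorder

# stop_callback, trnscr_update, start_callback and process_text are defined elsewhere in my script.

if __name__ == "__main__":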

    recorder = AudioToTextRecorder(
        spinner=False,
        model="base",
        language="ru",
        device="cpu",
        input_device_index=5,
        use_microphone=True,
        enable_realtime_transcription=True,
        on_recording_stop=stop_callback,
        on_realtime_transcription_update=trnscr_update,
        post_speech_silence_duration=1,
    )

    print("Ready.\n ")

    try:
        while True:
            start_callback()
            recorder.text(process_text)
    finally:
        recorder.shutdown()
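The two runs in the logs below report different device totals (10, then 11) and different lists of input-capable indexes. This means a hard-coded `input_device_index` could point at a different device after a USB change. The following is a minimal sketch, assuming PyAudio, that looks the index up by part of the device name instead. `find_input_device` is a hypothetical helper, not part of RealtimeSTT.

```python
# Sketch: resolve an input device index from part of its name instead of
# hard-coding it. `pa` is assumed to be a pyaudio.PyAudio instance (or any
# object with get_device_count() and get_device_info_by_index()).


def find_input_device(pa, name_fragment):
    """Return the first input-capable device index whose name contains
    `name_fragment` (case-insensitive), or None if nothing matches."""
    needle = name_fragment.lower()
    for idx in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(idx)
        if info["maxInputChannels"] > 0 and needle in info["name"].lower():
            return idx
    return None
```

For example, the result of `find_input_device(pa, "GeniusMic")` could be passed as `input_device_index`. This addresses only index drift and would not by itself fix the sample-rate error.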

Key Log Excerpts

input_device_index = 5

RealTimeSTT: realtimestt - INFO - Starting RealTimeSTT
RealTimeSTT: realtimestt - INFO - Initializing audio recording (creating pyAudio input stream, sample rate: 16000 buffer size: 512
RealTimeSTT: realtimestt - DEBUG - Starting audio data worker with target_sample_rate=16000, buffer_size=512, input_device_index=5
RealTimeSTT: realtimestt - DEBUG - Creating PyAudio interface...
RealTimeSTT: realtimestt - INFO - Initializing faster_whisper realtime transcription model tiny, default device: cpu, compute type: default, device index: 0, download root: None
RealTimeSTT: realtimestt - DEBUG - Retrieving highest sample rate for device index 5: {'index': 5, 'structVersion': 2, 'name': 'GeniusMic UC: USB Audio (hw:3,0)', 'hostApi': 0, 'maxInputChannels': 1, 'maxOutputChannels': 2, 'defaultLowInputLatency': 0.007979166666666667, 'defaultLowOutputLatency': 0.007979166666666667, 'defaultHighInputLatency': 0.032, 'defaultHighOutputLatency': 0.032, 'defaultSampleRate': 48000.0}
RealTimeSTT: realtimestt - DEBUG - Highest supported sample rate for device index 5 is 48000
RealTimeSTT: realtimestt - DEBUG - Sample rates to try for device 5: [16000, 48000]
RealTimeSTT: realtimestt - DEBUG - Attempting to initialize audio stream at 16000 Hz.
RealTimeSTT: realtimestt - DEBUG - Found 10 total audio devices on the system.
RealTimeSTT: realtimestt - DEBUG - Available input devices with input channels: [3, 4, 5, 7, 8, 9]
RealTimeSTT: realtimestt - DEBUG - Validating device index 5 with info: {'index': 5, 'structVersion': 2, 'name': 'GeniusMic UC: USB Audio (hw:3,0)', 'hostApi': 0, 'maxInputChannels': 1, 'maxOutputChannels': 2, 'defaultLowInputLatency': 0.007979166666666667, 'defaultLowOutputLatency': 0.007979166666666667, 'defaultHighInputLatency': 0.032, 'defaultHighOutputLatency': 0.032, 'defaultSampleRate': 48000.0}
Expression 'paInvalidSampleRate' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2048
Expression 'PaAlsaStreamComponent_InitialConfigure( &self->capture, inParams, self->primeBuffers, hwParamsCapture, &realSr )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2718
Expression 'PaAlsaStream_Configure( stream, inputParameters, outputParameters, sampleRate, framesPerBuffer, &inputLatency, &outputLatency, &hostBufferSizeMode )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2842
RealTimeSTT: realtimestt - DEBUG - Device validation failed for index 5: [Errno -9997] Invalid sample rate
RealTimeSTT: realtimestt - ERROR - Microphone connection failed: Selected device validation failed. Retrying...
Traceback (most recent call last):
  File "/home/ilya/perimeter-voice-assistant/venv/lib/python3.12/site-packages/RealtimeSTT/audio_recorder.py", line 1169, in initialize_audio_stream
    raise Exception("Selected device validation failed")
Exception: Selected device validation failed
[2025-07-31 10:48:22.659] [ctranslate2] [thread 45950] [warning] The compute type inferred from the saved model is float16, but the target device or backend do not support efficient float16 computation. The model weights have been automatically converted to use the float32 compute type instead.
[2025-07-31 10:48:22.825] [ctranslate2] [thread 46411] [warning] The compute type inferred from the saved model is float16, but the target device or backend do not support efficient float16 computation. The model weights have been automatically converted to use the float32 compute type instead.
RealTimeSTT: realtimestt - DEBUG - Faster_whisper realtime speech to text transcription model initialized successfully
RealTimeSTT: realtimestt - INFO - Initializing WebRTC voice with Sensitivity 3
RealTimeSTT: realtimestt - DEBUG - WebRTC VAD voice activity detection engine initialized successfully
RealTimeSTT: realtimestt - DEBUG - Silero VAD voice activity detection engine initialized successfully
RealTimeSTT: realtimestt - DEBUG - Starting realtime worker
RealTimeSTT: realtimestt - DEBUG - Waiting for main transcription model to start
RealTimeSTT: realtimestt - DEBUG - Main transcription model ready
RealTimeSTT: realtimestt - DEBUG - RealtimeSTT initialization completed successfully
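In this run RealtimeSTT lists `[16000, 48000]` as the sample rates to try. However, the excerpt shows the validation failure at 16000 Hz raising `Selected device validation failed` straight away, with no attempt at 48000 Hz before the fallback. The following is a minimal sketch of a loop that falls through to the next candidate rate instead. `open_stream` and `open_first_working_rate` are hypothetical names, and this is not RealtimeSTT's actual code.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

Stream = TypeVar("Stream")


def open_first_working_rate(
    open_stream: Callable[[int], Stream], rates: Iterable[int]
) -> Tuple[Stream, int]:
    """Try each candidate rate in order and return (stream, rate) for the
    first one that opens; raise only after every candidate has failed."""
    failures: List[Tuple[int, Exception]] = []
    for rate in rates:
        try:
            return open_stream(rate), rate
        except (OSError, ValueError) as exc:
            # PyAudio reports paInvalidSampleRate as OSError [Errno -9997];
            # record it and move on to the next rate instead of giving up.
            failures.append((rate, exc))
    raise RuntimeError(f"no candidate sample rate could be opened: {failures}")
```

With PyAudio, `open_stream` could be something like `lambda r: pa.open(format=pyaudio.paInt16, channels=1, rate=r, input=True, input_device_index=5, frames_per_buffer=512)`.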

input_device_index = 6

RealTimeSTT: realtimestt - INFO - Starting RealTimeSTT
RealTimeSTT: realtimestt - INFO - Initializing audio recording (creating pyAudio input stream, sample rate: 16000 buffer size: 512
RealTimeSTT: realtimestt - DEBUG - Starting audio data worker with target_sample_rate=16000, buffer_size=512, input_device_index=6
RealTimeSTT: realtimestt - INFO - Initializing faster_whisper realtime transcription model tiny, default device: cpu, compute type: default, device index: 0, download root: None
RealTimeSTT: realtimestt - DEBUG - Creating PyAudio interface...
RealTimeSTT: realtimestt - DEBUG - Retrieving highest sample rate for device index 6: {'index': 6, 'structVersion': 2, 'name': 'USB Audio Device: - (hw:4,0)', 'hostApi': 0, 'maxInputChannels': 2, 'maxOutputChannels': 2, 'defaultLowInputLatency': 0.007979166666666667, 'defaultLowOutputLatency': 0.007979166666666667, 'defaultHighInputLatency': 0.032, 'defaultHighOutputLatency': 0.032, 'defaultSampleRate': 48000.0}
RealTimeSTT: realtimestt - DEBUG - Highest supported sample rate for device index 6 is 48000
RealTimeSTT: realtimestt - DEBUG - Sample rates to try for device 6: [16000, 48000]
RealTimeSTT: realtimestt - DEBUG - Attempting to initialize audio stream at 16000 Hz.
RealTimeSTT: realtimestt - DEBUG - Found 11 total audio devices on the system.
RealTimeSTT: realtimestt - DEBUG - Available input devices with input channels: [3, 4, 5, 6, 8, 9, 10]
RealTimeSTT: realtimestt - DEBUG - Validating device index 6 with info: {'index': 6, 'structVersion': 2, 'name': 'USB Audio Device: - (hw:4,0)', 'hostApi': 0, 'maxInputChannels': 2, 'maxOutputChannels': 2, 'defaultLowInputLatency': 0.007979166666666667, 'defaultLowOutputLatency': 0.007979166666666667, 'defaultHighInputLatency': 0.032, 'defaultHighOutputLatency': 0.032, 'defaultSampleRate': 48000.0}
Expression 'paInvalidSampleRate' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2048
Expression 'PaAlsaStreamComponent_InitialConfigure( &self->capture, inParams, self->primeBuffers, hwParamsCapture, &realSr )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2718
Expression 'PaAlsaStream_Configure( stream, inputParameters, outputParameters, sampleRate, framesPerBuffer, &inputLatency, &outputLatency, &hostBufferSizeMode )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2842
RealTimeSTT: realtimestt - DEBUG - Device validation failed for index 6: [Errno -9997] Invalid sample rate
RealTimeSTT: realtimestt - ERROR - Microphone connection failed: Selected device validation failed. Retrying...
Traceback (most recent call last):
  File "/home/ilya/perimeter-voice-assistant/venv/lib/python3.12/site-packages/RealtimeSTT/audio_recorder.py", line 1169, in initialize_audio_stream
    raise Exception("Selected device validation failed")
Exception: Selected device validation failed
[2025-07-31 11:08:38.800] [ctranslate2] [thread 57746] [warning] The compute type inferred from the saved model is float16, but the target device or backend do not support efficient float16 computation. The model weights have been automatically converted to use the float32 compute type instead.
[2025-07-31 11:08:38.970] [ctranslate2] [thread 57952] [warning] The compute type inferred from the saved model is float16, but the target device or backend do not support efficient float16 computation. The model weights have been automatically converted to use the float32 compute type instead.
RealTimeSTT: realtimestt - DEBUG - Faster_whisper realtime speech to text transcription model initialized successfully
RealTimeSTT: realtimestt - INFO - Initializing WebRTC voice with Sensitivity 3
RealTimeSTT: realtimestt - DEBUG - WebRTC VAD voice activity detection engine initialized successfully
RealTimeSTT: realtimestt - DEBUG - Silero VAD voice activity detection engine initialized successfully
RealTimeSTT: realtimestt - DEBUG - Starting realtime worker
RealTimeSTT: realtimestt - DEBUG - Waiting for main transcription model to start
RealTimeSTT: realtimestt - DEBUG - Main transcription model ready
RealTimeSTT: realtimestt - DEBUG - RealtimeSTT initialization completed successfully
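Both devices report `defaultSampleRate: 48000.0`, while the recorder targets 16000 Hz (`target_sample_rate=16000` in the log). If a device can only be opened at 48000 Hz, the captured audio has to be brought down to 16000 Hz. Because 48000 / 16000 = 3, the simplest form keeps one averaged sample per three input samples. The following is a rough sketch, assuming mono 16-bit PCM in native byte order (what PortAudio's paInt16 delivers). `downsample_int16_by_3` is a hypothetical helper, not how RealtimeSTT resamples internally. A proper polyphase resampler such as SciPy's `resample_poly(x, 1, 3)` gives much better anti-aliasing.

```python
import array


def downsample_int16_by_3(pcm: bytes) -> bytes:
    """Convert 48 kHz mono int16 PCM to 16 kHz by averaging each group of 3 samples.

    Assumes native-endian 16-bit mono input. Averaging is only a crude
    low-pass filter; trailing samples that do not fill a group of 3 are dropped.
    """
    samples = array.array("h")
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 2])  # ignore a dangling odd byte
    usable = len(samples) - len(samples) % 3
    # Floor division keeps the average inside the int16 range.
    out = array.array("h", (sum(samples[i:i + 3]) // 3 for i in range(0, usable, 3)))
    return out.tobytes()
```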

Thank you for your great work on this project! ❤️
