Unable to use external microphone with `input_device_index` #276

Hello!

## Description

When I specify `input_device_index` to use an external microphone, initialization fails with `Exception: Selected device validation failed`. The system then falls back to the default microphone. Both external microphones I tested (device indexes 5 and 6) show the same `[Errno -9997] Invalid sample rate` error in the logs, even though both report a default sample rate of 48 kHz. In both cases the error appears while the stream is being validated at 16000 Hz.

Here's the part of my script where I configure the recorder:
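```python
from RealtimeSTT import AudioToTextRecorder

# stop_callback, trnscr_update, start_callback and process_text
# are defined elsewhere in the script.

if __name__ == "__main__":
    recorder = AudioToTextRecorder(
        spinner=False,
        model="base",
        language="ru",
        device="cpu",
        input_device_index=5,  # external USB microphone
        use_microphone=True,
        enable_realtime_transcription=True,
        on_recording_stop=stop_callback,
        on_realtime_transcription_update=trnscr_update,
        post_speech_silence_duration=1,
    )

    print("Ready.\n ")

    try:
        while True:
            start_callback()
            recorder.text(process_text)
    finally:
        # Always release the audio stream and worker processes.
        recorder.shutdown()
```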
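
To cross-check that the index passed above really points at the external microphone, the input devices can be listed with PyAudio directly. This is only a minimal sketch and not part of my script: `list_input_devices` and `print_input_devices` are hypothetical helper names, and `pa` is assumed to be a `pyaudio.PyAudio()` instance.

```python
def list_input_devices(pa):
    """Return (index, name, default sample rate) for every device with input channels."""
    devices = []
    for index in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(index)
        if info["maxInputChannels"] > 0:
            devices.append((index, info["name"], int(info["defaultSampleRate"])))
    return devices


def print_input_devices():
    """Print the input devices PyAudio sees (requires PyAudio to be installed)."""
    import pyaudio  # imported lazily so list_input_devices has no hard dependency

    pa = pyaudio.PyAudio()
    try:
        for index, name, rate in list_input_devices(pa):
            print(f"{index}: {name} (default {rate} Hz)")
    finally:
        pa.terminate()
```

Calling `print_input_devices()` should show which index belongs to which microphone on the machine where it runs.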

## Key Log Excerpts

`input_device_index` = 5
```
RealTimeSTT: realtimestt - INFO - Starting RealTimeSTT
RealTimeSTT: realtimestt - INFO - Initializing audio recording (creating pyAudio input stream, sample rate: 16000 buffer size: 512
RealTimeSTT: realtimestt - DEBUG - Starting audio data worker with target_sample_rate=16000, buffer_size=512, input_device_index=5
RealTimeSTT: realtimestt - DEBUG - Creating PyAudio interface...
RealTimeSTT: realtimestt - INFO - Initializing faster_whisper realtime transcription model tiny, default device: cpu, compute type: default, device index: 0, download root: None
RealTimeSTT: realtimestt - DEBUG - Retrieving highest sample rate for device index 5: {'index': 5, 'structVersion': 2, 'name': 'GeniusMic UC: USB Audio (hw:3,0)', 'hostApi': 0, 'maxInputChannels': 1, 'maxOutputChannels': 2, 'defaultLowInputLatency': 0.007979166666666667, 'defaultLowOutputLatency': 0.007979166666666667, 'defaultHighInputLatency': 0.032, 'defaultHighOutputLatency': 0.032, 'defaultSampleRate': 48000.0}
RealTimeSTT: realtimestt - DEBUG - Highest supported sample rate for device index 5 is 48000
RealTimeSTT: realtimestt - DEBUG - Sample rates to try for device 5: [16000, 48000]
RealTimeSTT: realtimestt - DEBUG - Attempting to initialize audio stream at 16000 Hz.
RealTimeSTT: realtimestt - DEBUG - Found 10 total audio devices on the system.
RealTimeSTT: realtimestt - DEBUG - Available input devices with input channels: [3, 4, 5, 7, 8, 9]
RealTimeSTT: realtimestt - DEBUG - Validating device index 5 with info: {'index': 5, 'structVersion': 2, 'name': 'GeniusMic UC: USB Audio (hw:3,0)', 'hostApi': 0, 'maxInputChannels': 1, 'maxOutputChannels': 2, 'defaultLowInputLatency': 0.007979166666666667, 'defaultLowOutputLatency': 0.007979166666666667, 'defaultHighInputLatency': 0.032, 'defaultHighOutputLatency': 0.032, 'defaultSampleRate': 48000.0}
Expression 'paInvalidSampleRate' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2048
Expression 'PaAlsaStreamComponent_InitialConfigure( &self->capture, inParams, self->primeBuffers, hwParamsCapture, &realSr )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2718
Expression 'PaAlsaStream_Configure( stream, inputParameters, outputParameters, sampleRate, framesPerBuffer, &inputLatency, &outputLatency, &hostBufferSizeMode )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2842
RealTimeSTT: realtimestt - DEBUG - Device validation failed for index 5: [Errno -9997] Invalid sample rate
RealTimeSTT: realtimestt - ERROR - Microphone connection failed: Selected device validation failed. Retrying...
Traceback (most recent call last):
  File "/home/ilya/perimeter-voice-assistant/venv/lib/python3.12/site-packages/RealtimeSTT/audio_recorder.py", line 1169, in initialize_audio_stream
    raise Exception("Selected device validation failed")
Exception: Selected device validation failed
[2025-07-31 10:48:22.659] [ctranslate2] [thread 45950] [warning] The compute type inferred from the saved model is float16, but the target device or backend do not support efficient float16 computation. The model weights have been automatically converted to use the float32 compute type instead.
[2025-07-31 10:48:22.825] [ctranslate2] [thread 46411] [warning] The compute type inferred from the saved model is float16, but the target device or backend do not support efficient float16 computation. The model weights have been automatically converted to use the float32 compute type instead.
RealTimeSTT: realtimestt - DEBUG - Faster_whisper realtime speech to text transcription model initialized successfully
RealTimeSTT: realtimestt - INFO - Initializing WebRTC voice with Sensitivity 3
RealTimeSTT: realtimestt - DEBUG - WebRTC VAD voice activity detection engine initialized successfully
RealTimeSTT: realtimestt - DEBUG - Silero VAD voice activity detection engine initialized successfully
RealTimeSTT: realtimestt - DEBUG - Starting realtime worker
RealTimeSTT: realtimestt - DEBUG - Waiting for main transcription model to start
RealTimeSTT: realtimestt - DEBUG - Main transcription model ready
RealTimeSTT: realtimestt - DEBUG - RealtimeSTT initialization completed successfully
```

`input_device_index` = 6
```
RealTimeSTT: realtimestt - INFO - Starting RealTimeSTT
RealTimeSTT: realtimestt - INFO - Initializing audio recording (creating pyAudio input stream, sample rate: 16000 buffer size: 512
RealTimeSTT: realtimestt - DEBUG - Starting audio data worker with target_sample_rate=16000, buffer_size=512, input_device_index=6
RealTimeSTT: realtimestt - INFO - Initializing faster_whisper realtime transcription model tiny, default device: cpu, compute type: default, device index: 0, download root: None
RealTimeSTT: realtimestt - DEBUG - Creating PyAudio interface...
RealTimeSTT: realtimestt - DEBUG - Retrieving highest sample rate for device index 6: {'index': 6, 'structVersion': 2, 'name': 'USB Audio Device: - (hw:4,0)', 'hostApi': 0, 'maxInputChannels': 2, 'maxOutputChannels': 2, 'defaultLowInputLatency': 0.007979166666666667, 'defaultLowOutputLatency': 0.007979166666666667, 'defaultHighInputLatency': 0.032, 'defaultHighOutputLatency': 0.032, 'defaultSampleRate': 48000.0}
RealTimeSTT: realtimestt - DEBUG - Highest supported sample rate for device index 6 is 48000
RealTimeSTT: realtimestt - DEBUG - Sample rates to try for device 6: [16000, 48000]
RealTimeSTT: realtimestt - DEBUG - Attempting to initialize audio stream at 16000 Hz.
RealTimeSTT: realtimestt - DEBUG - Found 11 total audio devices on the system.
RealTimeSTT: realtimestt - DEBUG - Available input devices with input channels: [3, 4, 5, 6, 8, 9, 10]
RealTimeSTT: realtimestt - DEBUG - Validating device index 6 with info: {'index': 6, 'structVersion': 2, 'name': 'USB Audio Device: - (hw:4,0)', 'hostApi': 0, 'maxInputChannels': 2, 'maxOutputChannels': 2, 'defaultLowInputLatency': 0.007979166666666667, 'defaultLowOutputLatency': 0.007979166666666667, 'defaultHighInputLatency': 0.032, 'defaultHighOutputLatency': 0.032, 'defaultSampleRate': 48000.0}
Expression 'paInvalidSampleRate' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2048
Expression 'PaAlsaStreamComponent_InitialConfigure( &self->capture, inParams, self->primeBuffers, hwParamsCapture, &realSr )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2718
Expression 'PaAlsaStream_Configure( stream, inputParameters, outputParameters, sampleRate, framesPerBuffer, &inputLatency, &outputLatency, &hostBufferSizeMode )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2842
RealTimeSTT: realtimestt - DEBUG - Device validation failed for index 6: [Errno -9997] Invalid sample rate
RealTimeSTT: realtimestt - ERROR - Microphone connection failed: Selected device validation failed. Retrying...
Traceback (most recent call last):
  File "/home/ilya/perimeter-voice-assistant/venv/lib/python3.12/site-packages/RealtimeSTT/audio_recorder.py", line 1169, in initialize_audio_stream
    raise Exception("Selected device validation failed")
Exception: Selected device validation failed
[2025-07-31 11:08:38.800] [ctranslate2] [thread 57746] [warning] The compute type inferred from the saved model is float16, but the target device or backend do not support efficient float16 computation. The model weights have been automatically converted to use the float32 compute type instead.
[2025-07-31 11:08:38.970] [ctranslate2] [thread 57952] [warning] The compute type inferred from the saved model is float16, but the target device or backend do not support efficient float16 computation. The model weights have been automatically converted to use the float32 compute type instead.
RealTimeSTT: realtimestt - DEBUG - Faster_whisper realtime speech to text transcription model initialized successfully
RealTimeSTT: realtimestt - INFO - Initializing WebRTC voice with Sensitivity 3
RealTimeSTT: realtimestt - DEBUG - WebRTC VAD voice activity detection engine initialized successfully
RealTimeSTT: realtimestt - DEBUG - Silero VAD voice activity detection engine initialized successfully
RealTimeSTT: realtimestt - DEBUG - Starting realtime worker
RealTimeSTT: realtimestt - DEBUG - Waiting for main transcription model to start
RealTimeSTT: realtimestt - DEBUG - Main transcription model ready
RealTimeSTT: realtimestt - DEBUG - RealtimeSTT initialization completed successfully
```
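
Both failures occur while the stream is being validated at 16000 Hz, so it may be that these `hw:` ALSA devices do not accept that rate natively. A minimal sketch for checking this outside RealtimeSTT, using PyAudio's `is_format_supported`. The helper names `probe_sample_rates` and `probe_with_pyaudio` are hypothetical, and the candidate rates and the 16-bit sample format are assumptions.

```python
def probe_sample_rates(pa, device_index, sample_format,
                       candidates=(16000, 44100, 48000), channels=1):
    """Return the candidate rates that PortAudio accepts for this input device."""
    accepted = []
    for rate in candidates:
        try:
            if pa.is_format_supported(
                rate,
                input_device=device_index,
                input_channels=channels,
                input_format=sample_format,
            ):
                accepted.append(rate)
        except ValueError:
            # PyAudio raises ValueError (e.g. "Invalid sample rate")
            # when the device rejects this combination.
            pass
    return accepted


def probe_with_pyaudio(device_index):
    """Run the probe against a real device (requires PyAudio to be installed)."""
    import pyaudio  # imported lazily so probe_sample_rates has no hard dependency

    pa = pyaudio.PyAudio()
    try:
        return probe_sample_rates(pa, device_index, pyaudio.paInt16)
    finally:
        pa.terminate()
```

If `probe_with_pyaudio(5)` returns a list without 16000, that would suggest the device itself rejects 16 kHz and that the stream needs to be opened at a supported rate and resampled.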


Thank you for your great work on this project! ❤️
