If you have version 1.21 of FFmpegSource, you can do that in AviSynth: ffmpegsource("your video.mkv", atrack=2)
The atrack=2 tells it to pick the second audio track, assuming that's the track with the voice. If the voice is on the first track instead, just change that to atrack=1.
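For reference, a complete script might look roughly like the sketch below. The plugin path and DLL name are placeholders I'm assuming for illustration, so adjust them to wherever you unpacked the plugin. The right atrack value may also depend on how the tracks are numbered in your particular file, so try the other value if you get the wrong audio.

```avisynth
# Minimal .avs sketch; the path and DLL name below are assumed placeholders.
# LoadPlugin is only needed if the DLL is not in AviSynth's plugins folder.
LoadPlugin("C:\path\to\FFMpegSource.dll")

# atrack=2 as described above; use atrack=1 if the voice is on the first track.
FFmpegSource("your video.mkv", atrack=2)
```

Save that as a .avs file and open it in your player or encoder to check that the right audio comes through.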
If you don't have FFmpegSource 1.21, you can get it here.