MyWhisper "Transcribe audio and video on a Windows PC"
This is a Windows application that easily transcribes audio files (wav, mp3, m4a) on your PC.
It runs its calculations on the CPU, so it works even on a PC without a GPU.
Video files (avi, mp4) are also supported.
Because all processing is completed locally (inside the PC), it is more secure than using cloud services or APIs.
Details
You can purchase the paid Professional version on this Gumroad page. The free version can be downloaded from the BOOTH page below. (It cannot be uploaded to Gumroad because of the file size limit.)
https://umiyuki.booth.pm/items/4663311
All Whisper models are available in the paid Professional version. The free version offers every model (Tiny, Base, Small, Medium) except the Large (highest quality) model. Larger models are more accurate, but take longer to process and consume more memory.
Please check that the free version works on your PC before purchasing the Professional version.
Because the download is large, it is compressed and split into multiple files. Please download all of the files and extract them with decompression software.
The source code of this application is open source and publicly available.
https://github.com/umiyuki/MyWhisper
How to use
Open the MyWhisper.exe file in the extracted folder. (You may see a "Windows protected your PC" warning. This is expected: click "More info" and then "Run anyway", and the application will start.)
From the menu, select File→Add File and choose an audio or video file (wav, mp3, m4a, avi, mp4) to add it to the file list. You can add multiple files. You can also drag and drop files directly into the file list.
Click the "Start Transcription" button to begin processing.
When the "Processing" text in the lower left corner of the screen changes to "Done", the process is complete.
The transcribed output file will be saved in the same folder as the source audio or video file.
Parameter Description
Model
Select the Whisper model to use. In increasing order of size, the models are Tiny, Base, Small, Medium, and Large (Professional version only). The larger the model, the higher the quality of the transcription, but the longer the processing time and the greater the memory consumption.
Num Threads
Increasing the number of threads lets processing use multiple CPU cores. However, more threads are not always faster: the optimal number of threads is 4 or 8.
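As a rough illustration of choosing a value for this setting, the sketch below caps the thread count at both the number of logical CPU cores and an upper limit of 8. The function name `suggest_num_threads` and the `cap` parameter are hypothetical and are not part of MyWhisper; this is only one way to pick a starting value, which you would then enter manually.

```python
import os


def suggest_num_threads(cap: int = 8) -> int:
    """Suggest a thread count: logical core count, limited to `cap`.

    Hypothetical helper for illustration; MyWhisper does not expose this.
    """
    cores = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(cores, cap))


if __name__ == "__main__":
    print("Suggested Num Threads:", suggest_num_threads())
```

On a PC with fewer than 4 cores, this simply returns the core count, since asking for more threads than cores is unlikely to help.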
Num Processes
When the number of processes is set to 4, for example, the audio file is divided into 4 parts that are processed in parallel. However, if a division point falls in the middle of speech, recognition may be inaccurate around that point.
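To show where the division points can fall, here is a minimal sketch that splits a recording into equal-length parts. It assumes the parts are of equal duration, which is an assumption for illustration and may not match how MyWhisper actually divides the file; `split_ranges` is a hypothetical name.

```python
def split_ranges(duration_sec: float, num_processes: int) -> list[tuple[float, float]]:
    """Return (start, end) times in seconds for equal-length parts.

    Assumes equal-length splitting; the real application may differ.
    """
    if num_processes < 1:
        raise ValueError("num_processes must be at least 1")
    part = duration_sec / num_processes
    ranges = []
    for i in range(num_processes):
        start = i * part
        # Pin the final end to the exact duration to avoid rounding drift.
        end = duration_sec if i == num_processes - 1 else (i + 1) * part
        ranges.append((start, end))
    return ranges


if __name__ == "__main__":
    # A 10-minute (600 s) file with Num Processes = 4
    for start, end in split_ranges(600.0, 4):
        print(f"{start:.1f}s to {end:.1f}s")
```

For a 600-second file and 4 processes, the internal boundaries land at 150, 300, and 450 seconds. If someone is speaking at one of those moments, the words around it are the ones most likely to be misrecognized.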
Language
Specify the language of the audio file. Select "japanese" for Japanese or "english" for English.
Output format
Select "txt" for a normal text file. You can also select vtt, srt, or wts.
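For readers unfamiliar with the subtitle formats, the sketch below shows how the general SRT and WebVTT layouts differ: SRT numbers each cue and uses a comma before the milliseconds, while WebVTT starts with a `WEBVTT` header and uses a period. The function names and the sample segment are hypothetical, and this illustrates the formats themselves rather than MyWhisper's exact output.

```python
def format_timestamp(seconds: float, decimal_sep: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_sep}{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Build SRT text: numbered cues, comma as the millisecond separator."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        times = f"{format_timestamp(start, ',')} --> {format_timestamp(end, ',')}"
        blocks.append(f"{index}\n{times}\n{text}\n")
    return "\n".join(blocks)


def to_vtt(segments: list[tuple[float, float, str]]) -> str:
    """Build WebVTT text: 'WEBVTT' header, period as the millisecond separator."""
    lines = ["WEBVTT", ""]
    for start, end, text in segments:
        lines.append(f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}")
        lines.append(text)
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [(0.0, 2.5, "Hello")]  # hypothetical transcription segment
    print(to_srt(sample))
    print(to_vtt(sample))
```

Choose srt or vtt when you need timestamps, for example to use the result as video subtitles, and txt when you only need the text.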
Precautions
Before purchasing the Professional version, we recommend that you try the free version to see how it works. Also, the Large model consumes more memory, so it may not run on some PCs.
Downloads may be blocked by Chrome, Windows Defender, or other antivirus software. Unfortunately, I have no way to deal with that.
Please note that refunds are not available.
Some audio and video files may not be supported, depending on the codec and other factors. Thank you for your understanding.
Support
I provide support on the "Product Support" channel of the Discord server that I manage. However, support for the free version is not guaranteed. Replies and DMs on Twitter may be missed.
System Requirements
Tested on Windows 10. It should work on other versions of Windows as well.
Update history
[2023-04-03: v1.0.1] Icon change
[2023-04-03: v1.0.0] First version