    AIBS News

    Use OpenAI Whisper for Automated Transcriptions

    By Team_AIBS News | June 26, 2025 | 8 Mins Read


    There has been a lot of development lately with large language models (LLMs). Much of the focus is on the question answering you can do with both pure text-based models and vision-language models (VLMs), where you can also input images.

    However, there’s another dimension that has advanced a ton over the past few years: audio. Models can now transcribe (speech -> text), synthesize speech (text -> speech), and also handle speech-to-speech, where you have a whole conversation with a language model, with audio going both in and out.

    The architecture and training pipeline for OpenAI’s Whisper model. Image from the OpenAI Whisper GitHub repository (MIT license).

    In this article, I’ll discuss how I’m using the developments in the audio model space to my advantage, becoming an even more efficient programmer.

    This is an example video of me using the transcription tool. I first select the prompt field in Cursor and use my hotkey to activate the microphone, which is indicated by the orange icon in the top left. I then speak the sentence I want to transcribe, and it quickly appears in the prompt window without me having to type on the keyboard at all. This is a more efficient way to enter long English prompts into your editor. Video by the author.

    Motivation

    My main motivation for writing this article is that I’m continually looking for ways to become a more efficient programmer. After using the ChatGPT mobile app for a while, I discovered its transcription option (the microphone icon to the right in the user input field). I tried it and quickly realized how much better this transcription is compared to others I’ve used before, such as Apple’s built-in iPhone transcription.

    OpenAI’s transcription almost always captures all of my words, with only a few errors. Even when I use less common words, for example, acronyms related to computer science, it’s still able to pick up what I’m saying.

    The transcription icon from the OpenAI application. Image by the author, taken from OpenAI’s ChatGPT.

    This transcription was only available in the ChatGPT app. However, I know that OpenAI has an API endpoint for their Whisper model, which is (presumably) the same model they’re using to transcribe audio in the app. I thus wanted to set this model up on my Mac to be accessible via a shortcut.

    (I know there are apps such as MacWhisper available, but I wanted to develop a completely free solution, apart from the costs of the API calls themselves.)

    Prerequisites

    • Alfred (I will be using Alfred on the Mac to trigger some scripts. However, alternatives to this also exist. In general, you need a way to trigger scripts on your Mac / PC from a hotkey.)

    Pros

    The main advantage of using this transcription is that you can input words into your computer more quickly. When I type as quickly as I can on my computer, I’m not even able to reach 100 words per minute, and if I’m to type at that speed, I really have to focus. However, the average speaking speed is at least 110 words per minute, according to this article.

    This means you can be much more effective if you are able to speak your words with transcription, instead of typing them out on the keyboard.

    I think this is especially relevant after the rise of large language models such as ChatGPT. You spend more time prompting the language models, for example, asking ChatGPT questions, or prompting Cursor to implement a feature or fix a bug. Thus, you now write far more plain English than before, relative to writing programming languages such as Python directly.

    Note: Of course, you’ll still be writing a lot of code, but from experience, I spend much more time prompting Cursor with extensive English prompts, in which case using this transcription saves me a lot of time.

    Cons

    There can, however, be some downsides to using the transcription as well. One of the main ones is that a lot of the time, you don’t want to speak out loud when programming. You might be sitting in the airport (as I am when writing this article), or even in your office. In these scenarios, you probably don’t want to disturb those around you by speaking out loud. However, if you are sitting in a home office, this is naturally not a problem.

    Another downside is that smaller prompts might not be that much faster. Consider this: if you just want to write a prompt of a single sentence, it will, in many scenarios, be faster just to type the prompt out by hand. This is because of the delay in starting, stopping, and transcribing audio into text. Sending the API call takes a little bit of time, and the shorter the prompt, the larger the fraction of the time you spend waiting for the response.
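
    The trade-off above can be put into rough numbers. The sketch below is a minimal model, not a measurement: it assumes a fixed overhead per dictation (starting, stopping, and waiting for the API response) and solves for the prompt length at which dictating and typing take the same time. The function name, the 80 words-per-minute typing speed, and the 5-second overhead are illustrative assumptions; only the 110 words-per-minute speaking speed comes from the text.

```python
import math


def break_even_words(typing_wpm, speaking_wpm, overhead_seconds):
    """Prompt length (in words) at which dictating takes as long as typing.

    Typing time:    words * 60 / typing_wpm
    Dictation time: words * 60 / speaking_wpm + overhead_seconds
    """
    if speaking_wpm <= typing_wpm:
        # Speaking is not faster per word, so the overhead is never recovered.
        return math.inf
    seconds_saved_per_word = 60 / typing_wpm - 60 / speaking_wpm
    return overhead_seconds / seconds_saved_per_word


if __name__ == "__main__":
    # Assumed values: 80 wpm typing and a 5 s overhead are hypothetical.
    words = break_even_words(typing_wpm=80, speaking_wpm=110, overhead_seconds=5)
    print(f"Dictation pays off above roughly {words:.0f} words")
```

    Under these assumed numbers, dictation only wins for prompts longer than about 24 words, which is consistent with single-sentence prompts often being faster to type.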

    How to implement

    You can see the code I used in this article on my GitHub. However, you also need to add hotkeys to run the scripts.

    First, you must:

    • Clone the GitHub repository:
    git clone https://github.com/EivindKjosbakken/whisper-shortcut.git
    • Create a virtual environment called .venv and install the required packages:
    python3 -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
    • Get an OpenAI API key. You can do that by:
      • Going to the OpenAI API Overview and logging in or creating a profile
      • Going to your profile, then API Keys
      • Creating a new key. Remember to copy the key, as you will not be able to see it again

    The scripts from the GitHub repository work as follows:

    • start_recording.sh: starts recording your voice. The first time you use this, it will ask you for permission to use the microphone
    • stop_recording.sh: sends a stop signal to the recording script, then sends the recorded audio to OpenAI for transcription. Additionally, it adds the transcribed text to your clipboard and pastes the text if you have a text field selected on your computer
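
    To make the transcription step more concrete, here is a minimal sketch of sending a recorded audio file to OpenAI’s transcription endpoint using only the Python standard library. This is not the code from the repository. The function names, the OPENAI_API_KEY environment variable, the recording.wav file name, and the generic application/octet-stream content type are assumptions made for illustration; the endpoint URL, the whisper-1 model name, and the "text" field of the JSON response follow OpenAI’s public API reference.

```python
import json
import os
import urllib.request
import uuid

API_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_transcription_request(audio_bytes, filename, api_key, model="whisper-1"):
    """Build a multipart/form-data POST request carrying the audio file."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=head + audio_bytes + tail,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(path):
    """Send the audio file at `path` to the API and return the transcribed text."""
    with open(path, "rb") as audio_file:
        audio_bytes = audio_file.read()
    request = build_transcription_request(
        audio_bytes,
        os.path.basename(path),
        os.environ["OPENAI_API_KEY"],  # assumed location of the API key
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["text"]


# Example (needs network access and a valid key; the file name is hypothetical):
# print(transcribe("recording.wav"))
```

    Splitting request construction from the network call keeps the API key handling and the multipart body easy to inspect without spending API credits.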

    The entire repository is available under an MIT license.

    Alfred

    You can find the Alfred workflow in the GitHub repository here: Transcribe.alfredworkflow.

    This is how I set up the Alfred workflow:

    My Alfred workflow. I have two hotkeys, one to start the transcription (record voice), and one to stop the transcription (stop recording, and send the audio to the OpenAI Whisper API for transcription). The option + Q command runs the start_recording.sh script, and option + W runs the stop_recording.sh script. You can, of course, change the hotkeys for these commands. Image by the author.

    You can simply download it and add it to your Alfred.

    Also, remember to have a terminal window open whenever you want to run this script, as you activate the Python script from the terminal. I had to do it this way because if the script was activated directly from Alfred, I got permission issues. The first time you run the script, you should be prompted to give your terminal access to the microphone, which you should approve.

    Cost

    An important consideration when using APIs such as OpenAI Whisper is the cost of the API usage. I’d consider the cost of using OpenAI’s Whisper model moderately high. As always, the cost is entirely dependent on how much you use the model. I’d say I use the model up to 25 times a day, for up to 150 words each time, and the cost is less than 1 dollar per day.

    This means, however, that if you use the model a lot, you can see costs of up to 30 dollars per month, which is definitely a substantial cost. However, I think it’s important to be mindful of the time savings you get from the model. If each use saves you 30 seconds, and you use it 20 times per day, you’ve just saved ten minutes of your day. Personally, I’m willing to pay one dollar to save ten minutes of my day on a task (writing on my keyboard) that doesn’t really grant me any other benefit. If anything, using your keyboard may contribute to a higher risk of injuries such as carpal tunnel syndrome. Using the model is thus definitely worth it for me.
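
    The arithmetic above can be sketched as a small calculation. The usage figures (25 uses of up to 150 words, 20 uses saving 30 seconds each) and the 110 words-per-minute speaking speed come from the article. The price of 0.006 dollars per minute of audio is an assumption based on the list price I recall for the whisper-1 model, so check OpenAI’s current pricing page before relying on it. The function names are illustrative.

```python
def daily_cost_usd(uses_per_day, words_per_use, speaking_wpm=110,
                   usd_per_audio_minute=0.006):
    """Estimate the daily API cost from how much audio is dictated.

    usd_per_audio_minute is an assumed price, not a figure from the article.
    """
    audio_minutes = uses_per_day * words_per_use / speaking_wpm
    return audio_minutes * usd_per_audio_minute


def daily_minutes_saved(uses_per_day, seconds_saved_per_use):
    """Total time saved per day, in minutes."""
    return uses_per_day * seconds_saved_per_use / 60


if __name__ == "__main__":
    cost = daily_cost_usd(uses_per_day=25, words_per_use=150)
    saved = daily_minutes_saved(uses_per_day=20, seconds_saved_per_use=30)
    print(f"Estimated cost: ${cost:.2f} per day")
    print(f"Estimated time saved: {saved:.0f} minutes per day")
```

    Under the assumed price, the estimate stays well below the one dollar per day mentioned above, while the time saved matches the ten minutes in the text.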

    Conclusion

    In this article, I started off discussing the immense advances within language models in the last few years. This has helped us create powerful chatbots, saving us vast amounts of time. However, alongside the advances in language models, we have also seen advances in voice models. Transcription using OpenAI Whisper is now near perfect (from personal experience), which makes it a powerful tool you can use to input words on your computer more effectively. I discussed the pros and cons of using OpenAI Whisper on your PC, and I also went step-by-step through how you can implement it on your own computer.


