
    Translating a Memoir: A Technical Journey | by Valeria Cortez | Dec, 2024

    By Team_AIBS News | December 12, 2024 | 8 min read


    Leveraging GPT-3.5 and Unstructured APIs for translations

    Towards Data Science

    This blog post details how I used GPT to translate the personal memoir of a family friend, making it accessible to a broader audience. Specifically, I employed GPT-3.5 for translation and Unstructured's APIs for efficient content extraction and formatting.

    The memoir, a heartfelt account by my family friend Carmen Rosa, chronicles her upbringing in Bolivia and her romantic journey in Paris with an Iranian man during the vibrant 1970s. The book was originally written in Spanish; we aimed to preserve the essence of her narrative while expanding its reach to English-speaking readers through the application of LLM technologies.

    Cover image of “Un Destino Sorprendente”, used with permission of author Carmen Rosa Wichtendahl.

    Below you can read about the translation process in more detail, or you can access the Colab Notebook.

    I followed these steps to translate the book:

    1. Import Book Data: I imported the book from a Docx document using the Unstructured API and divided it into chapters and paragraphs.
    2. Translation Technique: I translated each chapter using GPT-3.5. For each paragraph, I provided the latest three translated sentences (if available) from the same chapter. This approach served two purposes:
    • Style Consistency: Maintaining a consistent style throughout the translation by providing context from previous translations.
    • Token Limit: Limiting the number of tokens processed at once to avoid exceeding the model's context limit.

    3. Exporting translation as Docx: I used the python-docx library to save the translated content in Docx format.

    1. Libraries

    We'll start with the installation and import of the necessary libraries.

    pip install --upgrade openai
    pip install python-dotenv
    pip install unstructured
    pip install python-docx

    import openai

    # Unstructured
    from unstructured.partition.docx import partition_docx
    from unstructured.cleaners.core import group_broken_paragraphs

    # Data and other libraries
    import pandas as pd
    import re
    from typing import List, Dict
    import os
    from dotenv import load_dotenv

    2. Connecting to OpenAI’s API

    The code below sets up the OpenAI API key for use in a Python project. You need to save your API key in a .env file.

    import openai

    # Specify the path to the .env file
    dotenv_path = '/content/.env'

    _ = load_dotenv(dotenv_path)  # read local .env file

    # Create the client object that the translation functions call as `client`
    client = openai.OpenAI(api_key=os.environ['OPENAI_API_KEY'])
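If the .env file is missing or the variable name is misspelled, reading `os.environ[...]` fails with a bare `KeyError`. A minimal sketch of a friendlier check is shown below. The helper name `require_env` and the variable `DEMO_API_KEY` are hypothetical and are not part of the original notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file and call load_dotenv() first."
        )
    return value


# Demonstration with a dummy variable (hypothetical name, not a real key)
os.environ["DEMO_API_KEY"] = "sk-demo"
print(require_env("DEMO_API_KEY"))
```

In the notebook, the same helper could be called with `"OPENAI_API_KEY"` right after `load_dotenv`.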

    3. Loading the e book

    The code allows us to import the book in Docx format and divide it into individual paragraphs.

    elements = partition_docx(
        filename="/content/libro.docx",
        paragraph_grouper=group_broken_paragraphs
    )

    The code below returns the paragraph at index 10 of elements.

    print(elements[10])

    # Returns: Destino sorprendente, es el título que la autora le puso ...
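Before grouping the book into chapters, it can help to see which element categories the document contains. The sketch below counts categories with `collections.Counter`. It uses stand-in objects that only mimic the `category` and `text` attributes, so it runs without the book file. The sample data and the helper name `count_categories` are illustrative assumptions.

```python
from collections import Counter
from types import SimpleNamespace

# Stand-in objects that mimic the `category` and `text` attributes of
# Unstructured elements (illustrative data, not the real parsed book)
sample_elements = [
    SimpleNamespace(category="Title", text="Proemio"),
    SimpleNamespace(category="NarrativeText", text="La autobiografía es considerada ..."),
    SimpleNamespace(category="NarrativeText", text="Dentro de las artes literarias, ..."),
    SimpleNamespace(category="PageBreak", text=""),
]


def count_categories(elements) -> Counter:
    """Count how many elements fall into each category."""
    return Counter(element.category for element in elements)


category_counts = count_categories(sample_elements)
print(category_counts.most_common())
```

Passing the real list returned by `partition_docx` instead of `sample_elements` should give a quick overview of how many titles and narrative paragraphs were detected.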

    4. Group book into titles and chapters

    The next step involves creating a list of chapters. Each chapter will be represented as a dictionary containing a title and a list of paragraphs. This structure simplifies the process of translating each chapter and paragraph individually. Here's an example of this format:

    [
    {"title": title 1, "content": [paragraph 1, paragraph 2, ..., paragraph n]},
    {"title": title 2, "content": [paragraph 1, paragraph 2, ..., paragraph n]},
    ...
    {"title": title n, "content": [paragraph 1, paragraph 2, ..., paragraph n]},
    ]

    To achieve this, we'll create a function called group_by_chapter. Here are the key steps involved:

    1. Extract Relevant Information: We can get each narrative text and title by calling element.category. These are the only categories we're interested in translating at this point.
    2. Identify Narrative Titles: We recognise that some titles should be part of the narrative text. To account for this, we assume that italicised titles belong to the narrative paragraph.
    def group_by_chapter(elements: List) -> List[Dict]:
        chapters = []
        current_chapter = None

        for element in elements:

            # emphasized_text_tags lists the 'b' and/or 'i' tags, or is None
            text_style = element.metadata.emphasized_text_tags
            # sorted so that bold + italic always compares equal to ['b', 'i']
            unique_text_style = sorted(set(text_style)) if text_style is not None else None

            # we consider an element a title if it is a title category and the style is bold
            is_title = element.category == "Title" and unique_text_style == ['b']

            # we consider an element narrative content if it is a narrative text category or
            # if it is a title category that is unstyled, italic, or italic and bold
            is_narrative = element.category == "NarrativeText" or (
                element.category == "Title"
                and unique_text_style in (None, ['i'], ['b', 'i'])
            )

            # for new titles
            if is_title:
                print(f"Adding title {element.text}")

                # Add previous chapter when a new one comes in, unless there is none yet
                if current_chapter is not None:
                    chapters.append(current_chapter)

                current_chapter = {"title": element.text, "content": []}

            # narrative text that appears before the first title is skipped
            elif is_narrative and current_chapter is not None:
                print(f"Adding Narrative {element.text}")
                current_chapter["content"].append(element.text)

            else:
                print(f'### No need to convert. Element type: {element.category}')

        # Add the final chapter, which no later title would otherwise append
        if current_chapter is not None:
            chapters.append(current_chapter)

        return chapters

    Below is an example of one grouped chapter:

    book_chapters = group_by_chapter(elements)
    book_chapters[2]

    # Returns
    {'title': 'Proemio',
     'content': [
         'La autobiografía es considerada ...',
         'Dentro de las artes literarias, ...',
         'Se encuentra más próxima a los, ...',
     ]
    }
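The bold-title and italic-narrative rule described above can be checked in isolation. The sketch below is a simplified restatement of that rule as a pure function, written under the assumption that the style tags arrive as a list such as `['b', 'i']` or as `None`. The function name `classify` is hypothetical. The tags are sorted after de-duplication because the iteration order of a Python set is not guaranteed, so comparing an unsorted list against `['b', 'i']` could fail intermittently.

```python
from typing import List, Optional


def classify(category: str, tags: Optional[List[str]]) -> str:
    """Apply the bold-title / italic-narrative rule to a single element."""
    styles = sorted(set(tags)) if tags is not None else None
    if category == "Title" and styles == ["b"]:
        return "title"
    if category == "NarrativeText":
        return "narrative"
    if category == "Title" and styles in (None, ["i"], ["b", "i"]):
        return "narrative"
    return "skip"


print(classify("Title", ["b", "b"]))  # bold heading -> title
print(classify("Title", ["i", "b"]))  # bold italic line -> narrative
print(classify("Title", None))        # unstyled title -> narrative
print(classify("Table", None))        # anything else -> skip
```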

    5. Book translation

    To translate the book, we follow these steps:

    1. Translate Chapter Titles: We translate the title of each chapter.
    2. Translate Paragraphs: We translate each paragraph, providing the model with the latest three translated sentences as context.
    3. Save Translations: We save both the translated titles and content.

    The function below automates this process.

    def translate_book(book_chapters: List[Dict]) -> List[Dict]:
        translated_book = []
        for chapter in book_chapters:
            print(f"Translating following chapter: {chapter['title']}.")
            translated_title = translate_title(chapter['title'])
            translated_chapter_content = translate_chapter(chapter['content'])
            translated_book.append({
                "title": translated_title,
                "content": translated_chapter_content
            })
        return translated_book
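To check the chapter loop without spending API calls, one option is to pass the two translators in as arguments and substitute fakes. This is a sketch of that idea, not the notebook's code. The names `translate_book_with`, `fake_title` and `fake_chapter` are hypothetical, and the sample chapter is made-up text.

```python
from typing import Callable, Dict, List


def translate_book_with(
    book_chapters: List[Dict],
    title_fn: Callable[[str], str],
    chapter_fn: Callable[[List[str]], str],
) -> List[Dict]:
    """Run the same chapter loop, with the translators passed in as arguments."""
    return [
        {"title": title_fn(ch["title"]), "content": chapter_fn(ch["content"])}
        for ch in book_chapters
    ]


def fake_title(title: str) -> str:
    """Stand-in for the title translator: only tags the text."""
    return f"[EN] {title}"


def fake_chapter(paragraphs: List[str]) -> str:
    """Stand-in for the chapter translator: tags and joins the paragraphs."""
    return "\n\n".join(f"[EN] {p}" for p in paragraphs)


sample = [{"title": "Proemio", "content": ["Primer párrafo.", "Segundo párrafo."]}]
dry_run = translate_book_with(sample, fake_title, fake_chapter)
print(dry_run[0]["title"])  # [EN] Proemio
```

Swapping the fakes for the real title and chapter translators would run the actual translation through the same loop.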

    For the title, we ask GPT for a simple translation as follows:

    def translate_title(title: str) -> str:
        # `client` is the OpenAI client object
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{
                "role": "system",
                "content": f"Translate the following book title into English:\n{title}"
            }]
        )
        return response.choices[0].message.content

    To translate a single chapter, we provide the model with the corresponding paragraphs. We instruct the model as follows:

    1. Identify the role: We tell the model that it is a helpful translator for a book.
    2. Provide context: We share the latest three translated sentences from the chapter.
    3. Request translation: We ask the model to translate the next paragraph.

    During this process, the function combines all translated paragraphs into a single string.

    # Function to translate a chapter using OpenAI API
    def translate_chapter(chapter_paragraphs: List[str]) -> str:
        translated_content = ""

        for i, paragraph in enumerate(chapter_paragraphs):

            print(f"Translating paragraph {i + 1} out of {len(chapter_paragraphs)}")

            # Builds the messages dynamically based on whether there is previous translated content
            messages = [{
                "role": "system",
                "content": "You are a helpful translator for a book."
            }]

            if translated_content:
                latest_content = get_last_three_sentences(translated_content)
                messages.append(
                    {
                        "role": "system",
                        "content": f"This is the latest text from the book that you've translated from Spanish into English:\n{latest_content}"
                    }
                )

            # Adds the user message for the current paragraph
            messages.append(
                {
                    "role": "user",
                    "content": f"Translate the following text from the book into English:\n{paragraph}"
                }
            )

            # Calls the API
            response = client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=messages
            )

            # Extracts the translated content and appends it
            paragraph_translation = response.choices[0].message.content
            translated_content += paragraph_translation + '\n\n'

        return translated_content
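The message-building step can be separated from the API call, which makes the prompt easy to inspect. The sketch below is one way to do that, offered as an assumption about how the logic could be factored. The helper name `build_messages` is hypothetical, and the Spanish and English sample sentences are invented for illustration, not taken from the memoir.

```python
from typing import Dict, List


def build_messages(paragraph: str, latest_content: str = "") -> List[Dict[str, str]]:
    """Assemble the chat messages for one paragraph, with optional prior context."""
    messages = [
        {"role": "system", "content": "You are a helpful translator for a book."}
    ]
    if latest_content:
        # Context message: only added once something has been translated
        messages.append({
            "role": "system",
            "content": (
                "This is the latest text from the book that you've translated "
                f"from Spanish into English:\n{latest_content}"
            ),
        })
    messages.append({
        "role": "user",
        "content": f"Translate the following text from the book into English:\n{paragraph}",
    })
    return messages


first = build_messages("Nací en Santa Cruz.")
later = build_messages("Años después viajé a París.", "I was born in Santa Cruz.")
print(len(first), len(later))  # 2 3
```

The first paragraph of a chapter produces two messages, and every later paragraph produces three, because the context message is only included when earlier translated text exists.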

    Finally, below we can see the supporting function to get the latest three sentences.

    def get_last_three_sentences(paragraph: str) -> str:
        # Use regex to split the text into sentences.
        # The pattern is cut off in this copy of the post; a split on whitespace
        # that follows '.', '!' or '?' is assumed here.
        sentences = re.split(r'(?<=[.!?])\s+', paragraph.strip())

        # Get the last three sentences (or fewer if the paragraph has less than 3 sentences)
        last_three = sentences[-3:]

        # Join the sentences into a single string
        return ' '.join(last_three)
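A small worked example makes the sentence-window idea concrete. The sketch below assumes a lookbehind pattern that splits on whitespace following `.`, `!` or `?`. The pattern and the helper name `last_sentences` are illustrative assumptions. The text is stripped first because the accumulated translation ends with blank lines, which would otherwise produce an empty trailing "sentence".

```python
import re


def last_sentences(text: str, n: int = 3) -> str:
    """Return the last n sentences, splitting on whitespace after '.', '!' or '?'."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[-n:])


sample = "One. Two! Three? Four.\n\nFive.\n\n"
print(last_sentences(sample))  # Three? Four. Five.
```

A simple pattern like this treats abbreviations such as "Sr." as sentence endings, so the context window may occasionally be shorter than three real sentences.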

    6. Book export

    Finally, we pass the list of chapters to a function that adds each title as a heading and each chapter's content as a paragraph. After each chapter, a page break is added to separate the chapters. The resulting document is then saved locally as a Docx file.

    from docx import Document

    def create_docx_from_chapters(chapters: List[Dict], output_filename: str) -> None:
        doc = Document()

        for chapter in chapters:
            # Add chapter title as Heading 1
            doc.add_heading(chapter['title'], level=1)

            # Add chapter content as normal text
            doc.add_paragraph(chapter['content'])

            # Add a page break after each chapter
            doc.add_page_break()

        # Save the document
        doc.save(output_filename)
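Because each chapter's translated paragraphs are joined into one string separated by blank lines, the export writes a whole chapter as a single Docx paragraph containing line breaks. If separate paragraphs are preferred, the string could be split again before export. The sketch below shows only the splitting step and uses the standard library. The helper name `split_paragraphs` is hypothetical.

```python
from typing import List


def split_paragraphs(content: str) -> List[str]:
    """Split a chapter string on blank lines and drop empty pieces."""
    return [part.strip() for part in content.split("\n\n") if part.strip()]


chapter_text = "First paragraph.\n\nSecond paragraph.\n\n"
paragraphs = split_paragraphs(chapter_text)
print(paragraphs)  # ['First paragraph.', 'Second paragraph.']
```

Each item could then be passed to its own `doc.add_paragraph(...)` call inside the chapter loop. Note that the model may itself emit blank lines inside one translated paragraph, so the result will not always match the source paragraphs one to one.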

    While using GPT and APIs for translation is fast and efficient, there are key limitations compared to human translation:

    • Pronoun and Reference Errors: GPT misinterpreted pronouns or references in a few cases, potentially attributing actions or statements to the wrong person in the narrative. A human translator can better resolve such ambiguities.
    • Cultural Context: GPT missed subtle cultural references and idioms that a human translator could interpret more accurately. In this case, several slang terms unique to Santa Cruz, Bolivia, were retained in the original language without additional context or explanation.

    Combining AI with human review can balance speed and quality, ensuring translations are both accurate and authentic.

    This project demonstrates an approach to translating a book using a combination of GPT-3.5 and Unstructured APIs. By automating the translation process, we significantly reduced the manual effort required. While the initial translation output may require some minor human revisions to refine the nuances and ensure the highest quality, this approach serves as a strong foundation for efficient and effective book translation.

    If you have any feedback or suggestions on how to improve this process or the quality of the translations, please feel free to share them in the comments below.


