
Better AI translation using gold-standard workflows for human translation


Like it or not, hundreds of thousands of people are using general-purpose LLMs like GPT-4o for translation. A while ago, I became interested in approaches to automatically evaluating the quality of these translations, and using these evaluations to improve the final outputs. Since the launch of o1, we’ve started calling this kind of process “reasoning”, but the general idea, and the code below, predate that.

In practice, this isn’t the best approach to AI translation: you’re generally better off using a specialised tool like DeepL, or possibly even Google Translate. However, this workflow is interesting for a few reasons. It demonstrates a flexible pattern for working with LLMs (generate something, evaluate it, improve it) and shows how workflows for expert human workers (forward and backward translation) can be adapted for use with LLMs.

Before I begin, I want to call out this excellent comment on Hacker News from a professional translator who uses LLMs to assist his work (hat tip: Simon Willison).

Info
After publishing this post, I came across some similar work by Eugene Vinitsky. This overlap isn’t a bad thing: we both copied the workflow from the approach that human translators take. I think his use of structured outputs is neater than mine!

It’s also important to say: if you’re translating something important or sensitive, don’t use AI. Translators are highly trained professionals who can understand the nuance of what you’re trying to say. An LLM isn’t, and doesn’t. Even so, AI translation has its place, and there are mountains of information out there that would never be translated at all without it.

That said, let’s begin.

The Data #

As example data, I’ve taken some tricky examples from https://tianakai.com/2012/12/top-american-phrases-that-dont-translate-well/, and I’ll translate them into French because it’s the only other major language I (vaguely) understand.

source_items = [
    "you're off the hook",
    "the drinks are on the house",
    "he quit cold turkey",
    "it's about to go down",
    "I'm going to totally pig out",
    "are you shitting me?",
    "he's going to pop the question",
    "hold your horses!",
    "I'll take a rain check and see you next week",
    "I'm there like a bear",
    "Wow, he's going to town on that sandwich",
    "her drunken rage is now in full effect, watch out",
    "that's so old school",
    "she flaked out on me last night",
    "don't push your luck",
    "that girl, she's really something else!",
]
# Some imports
import json
from typing import Optional, TypeVar
from textwrap import dedent

from openai import OpenAI
from dotenv import load_dotenv
from pydantic import BaseModel

load_dotenv()
client = OpenAI()
MODEL = "gpt-4o-mini"

Approach 1: Just Ask #

The naive approach to doing this with an LLM is pretty obvious. Some terms, for reference:

  • L1 - language 1, the source language (English here)
  • L2 - language 2, the target language (French)
# For type hints
ResponseFormatType = TypeVar("ResponseFormatType", bound=BaseModel)

def llm(
    system_prompt: str,
    user_message: str,
    response_format: type[ResponseFormatType] | None = None,
) -> str | ResponseFormatType:
    """Basic wrapper for OpenAI API"""
    messages = [
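        # Dedent before stripping: stripping first removes the first line's
        # indent, so dedent would no longer find a common margin to remove
        {"role": "system", "content": dedent(system_prompt).strip()},
        {"role": "user", "content": dedent(user_message).strip()},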
    ]
    if response_format is None:
        response = client.chat.completions.create(
            messages=messages,
            model=MODEL,
        )
        content = response.choices[0].message.content
        if content is None:
            raise ValueError("No result from OpenAI")
        return content.strip()
    else:
        response = client.beta.chat.completions.parse(
            messages=messages,
            model=MODEL,
            response_format=response_format,
        )
        result = response.choices[0].message.content
        if result is None:
            raise ValueError("No result from OpenAI")
        return response_format(**json.loads(result))


def translate_v1(source: str, l1: str, l2: str) -> str:
    prompt = f"""
    Translate the following text from {l1} to {l2}.
    Output the translated text and nothing else.
    """
    return llm(prompt, source)

We can run this either over the whole text, or a phrase at a time.

batch_result_v1 = translate_v1("\n".join(source_items), "English", "French")
items_result_v1 = [translate_v1(text, "English", "French") for text in source_items]
# Although it's not totally necessary, I have this utility, so let's use it
import difflib
from rich.console import Console
from rich.text import Text


def compare_paragraphs(text1: str, text2: str):
    """Visually compare two multi-line texts"""
    signs = ["-", "+"]
    colors = ["red", "green"]

    output = Text()
    for s1, s2 in zip(text1.split("\n"), text2.split("\n")):
        if s1 == s2:
            output.append(s1 + "\n")
            output.append(s2 + "\n")
        else:
            diff = list(difflib.ndiff(s1.split(), s2.split()))
            for s, c in zip(signs, colors):
                for w in diff:
                    w = w.strip()
                    if w.startswith("?"):
                        continue
                    if w[0] in signs:
                        if w[0] == s:
                            output.append(w.split(" ")[1] + " ", style=f"bold {c}")
                    else:
                        output.append(w + " ")
                output.append("\n")
            output.append("\n")
    Console().print(output)
compare_paragraphs(batch_result_v1, "\n".join(items_result_v1))
tu es tiré d'affaire 
tu es tiré d'affaire 

les boissons sont offertes 
les boissons sont offertes 

il a arrêté du jour au lendemain 
il a arrêté du jour au lendemain 

ça va chauffer 
ça va chauffer 

je vais vraiment me goinfrer 
Je vais vraiment me goinfrer. 

tu rigoles 
tu te moques de moi 

il va faire sa demande 
il va poser la question 

calme-toi ! 
calme-toi ! 

je vais remettre ça à plus tard et te voir la semaine prochaine 
Je prendrai un report et je te verrai la semaine prochaine. 

je suis là comme un ours 
Je suis là comme un ours. 

Waouh, il se régale avec ce sandwich 
Wow, il s'attaque à ce sandwich. 

sa colère ivre est maintenant en pleine force, fais attention 
sa colère ivre est maintenant en pleine puissance, attention 

c'est tellement à l'ancienne 
c'est tellement rétro 

elle m'a laissé tomber hier soir 
elle m'a laissé tomber hier soir 

ne tente pas ta chance 
ne tente pas ta chance 

cette fille, elle est vraiment unique ! 
cette fille, elle est vraiment quelque chose ! 
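
The comparison utility above leans on `difflib.ndiff`, which marks each item with a two-character prefix: `- ` for items only in the first sequence, `+ ` for items only in the second, two spaces for shared items, and `? ` for hint lines (which `compare_paragraphs` skips). As a small standalone sketch, here is that prefix scheme applied to one pair of outputs from the comparison above. The variable names are mine, chosen for illustration.

```python
import difflib

batch = "tu rigoles".split()
single = "tu te moques de moi".split()

diff = list(difflib.ndiff(batch, single))

# Sort words into buckets by their two-character ndiff prefix
kept = [d[2:] for d in diff if d.startswith("  ")]
removed = [d[2:] for d in diff if d.startswith("- ")]
added = [d[2:] for d in diff if d.startswith("+ ")]

print("kept:   ", kept)
print("removed:", removed)
print("added:  ", added)
```

In `compare_paragraphs`, the removed words end up in red on the first line and the added words in green on the second.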


Approach 2: Add Context #

A good general way of improving the quality of requests like this is to add explanatory context. In this case, it’s not particularly helpful, because the inputs don’t have any context, but nevertheless let’s add a context argument to our function.

def translate_v2(source: str, l1: str, l2: str, context: str | None = None) -> str:
    if context is not None:
        context = f"\n\n{context}\n\n"
    else:
        context = ""
    prompt = f"""
    Translate the following text from {l1} to {l2}.    
    {context}
    Output the translated text and nothing else.
    """
    return llm(prompt, source)

Approach 3: Forward and Back-translation #

Finally, the good bit. Broadly speaking, the gold-standard approach for translation of validated surveys and other texts where it’s important that meaning is preserved is as follows (see here):

  1. Translate: Translate the original material from L1 to L2.
  2. Back-translate: Have someone who has not seen the original material translate the result back from L2 to L1.
  3. Compare: Have a third person compare the original text to the back-translated version and note any changes in meaning.
  4. Retranslate: Use these notes to retranslate any items where the meaning has been changed.
  5. Iterate: Repeat steps 2-4 until the meaning of all items is consistent in both languages.

We can approximate all of these steps using LLM calls.
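
Before bringing in an LLM, the control flow of those five steps can be sketched with plain functions. Everything below is an illustration under stated assumptions: `forward_back_loop`, `toy_translate` and `toy_compare` are hypothetical names, the toy "translations" are hard-coded stand-ins for model calls, and the back-translation step is given no notes so that it has not "seen" the original, which is a choice made for this sketch.

```python
from typing import Callable

# Hypothetical signatures: any functions with these shapes will do
Translate = Callable[[str, str, str, str], str]   # (text, from_lang, to_lang, notes)
Compare = Callable[[str, str], tuple[bool, str]]  # (source, back) -> (same, notes)


def forward_back_loop(
    source: str,
    l1: str,
    l2: str,
    translate: Translate,
    compare: Compare,
    max_iter: int = 3,
) -> list[tuple[str, str, bool]]:
    notes = ""
    attempts = []
    for _ in range(max_iter):
        forward = translate(source, l1, l2, notes)   # steps 1 and 4
        backward = translate(forward, l2, l1, "")    # step 2: no notes passed
        same, new_notes = compare(source, backward)  # step 3
        attempts.append((forward, backward, same))
        if same:
            break
        notes += new_notes + "\n"                    # step 5: feed notes forward
    return attempts


def toy_translate(text: str, from_lang: str, to_lang: str, notes: str) -> str:
    # Toy stand-in for an LLM: literal at first, idiomatic once it has feedback
    if to_lang == "French":
        return "calme-toi !" if notes else "tenez vos chevaux !"
    return {"calme-toi !": "calm down!", "tenez vos chevaux !": "grip your horses!"}[text]


def toy_compare(source: str, backward: str) -> tuple[bool, str]:
    same = backward == "calm down!"
    return same, ("" if same else "Literal rendering; the idiom means 'be patient'.")


attempts = forward_back_loop(
    "hold your horses!", "English", "French", toy_translate, toy_compare
)
for forward, backward, same in attempts:
    print(forward, "|", backward, "|", same)
```

Passing the translator and comparer in as arguments keeps the loop testable without network access; real LLM-backed functions can be swapped in later.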

This iterative process is valuable for two reasons. First, it will obviously improve the quality of the translation for some cases. Second, it provides an indication that some texts were harder to translate (they took more iterations, or the meaning still differed after the last iteration). You should make sure your domain experts or human translators review these texts in particular. Often, if something proves difficult to translate, it indicates that the meaning wasn’t clear in the source language, so the best course of action may be to improve that first.
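
As a rough sketch of that second point, the two signals (more than one iteration, or no agreement by the final iteration) can be combined into a review queue. `Outcome` and `review_queue` are hypothetical names introduced here, not part of the code in this post; the sample values are taken from the run shown further down.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """Hypothetical stand-in holding only the fields needed for triage."""
    source: str
    success: bool
    iterations: int


def review_queue(outcomes: list[Outcome]) -> list[Outcome]:
    """Texts that failed or needed retries, with outright failures first."""
    flagged = [o for o in outcomes if not o.success or o.iterations > 1]
    # False sorts before True, so failures lead; more iterations come earlier
    return sorted(flagged, key=lambda o: (o.success, -o.iterations))


outcomes = [
    Outcome("you're off the hook", success=True, iterations=1),
    Outcome("are you shitting me?", success=True, iterations=2),
    Outcome("hold your horses!", success=True, iterations=3),
    Outcome("he quit cold turkey", success=False, iterations=3),
]

queue = review_queue(outcomes)
for o in queue:
    print(o.source, o.success, o.iterations)
```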

# Used by OpenAI's structured outputs
class ComparisonResult(BaseModel):
    same_meaning: bool
    notes: str


def compare_texts(
    text1: str, text2: str, context: Optional[str] = None
) -> ComparisonResult:
    system_prompt = """
    Compare the two texts provided (source and target), and indicate:
    - Whether the target has exactly the same meaning and connotations as the source, and
    - If the meaning has changed, note how (leave blank otherwise)
    """
    if context:
        # f-string so the context is actually interpolated into the prompt
        system_prompt += f"\n\nThe following context may help:\n\n{context}"
    message = f"""
    # Source text
    {text1}
    
    # Target text
    {text2}
    """
    return llm(system_prompt, message, ComparisonResult)


# Classes used to store results
class TranslationIteration(BaseModel):
    iteration: int = 1
    l1: str
    l2: str
    source: str
    forward: str
    backward: str
    comparison: ComparisonResult


class TranslationResult(BaseModel):
    success: bool
    iterations: int
    l1: str
    l2: str
    source: str
    forward: str
    backward: str
    history: list[TranslationIteration]


def iterative_translation(
    source: str, l1: str, l2: str, context: str = "", max_iter=3
) -> TranslationResult:
    # We'll store every iteration to inspect them later
    history = []
    for i in range(max_iter):
        forward = translate_v2(source, l1, l2, context)
        backward = translate_v2(forward, l2, l1, context)
        comparison = compare_texts(source, backward, context)
        iter_result = TranslationIteration(
            iteration=i + 1,
            l1=l1,
            l2=l2,
            source=source,
            forward=forward,
            backward=backward,
            comparison=comparison,
        )
        print(iter_result)  # Something to look at while it's running
        history.append(iter_result)
        if comparison.same_meaning:
            break
        else:
            new_context = f"""
            ## Previous failed translation
            Source: {source}
            Translation: {forward}
            Backtranslation: {backward}
            Differences between backtranslation and source:
            {comparison.notes}
            
            """
            context += dedent(new_context)
        print(context)
    return TranslationResult(
        success=comparison.same_meaning,
        iterations=len(history),
        l1=l1,
        l2=l2,
        source=source,
        forward=forward,
        backward=backward,
        history=history,
    )
results = [
    iterative_translation(text, "English", "French") 
    for text in source_items
]
iteration=1 l1='English' l2='French' source="you're off the hook" forward="tu es tiré d'affaire" backward='you are out of trouble' comparison=ComparisonResult(same_meaning=True, notes='')
iteration=1 l1='English' l2='French' source='the drinks are on the house' forward='les drinks sont offerts' backward='drinks are on the house' comparison=ComparisonResult(same_meaning=True, notes='')
iteration=1 l1='English' l2='French' source='he quit cold turkey' forward='il a arrêté du jour au lendemain' backward='he stopped overnight' comparison=ComparisonResult(same_meaning=False, notes="'Quit cold turkey' implies stopping something abruptly and completely, usually referencing addiction, while 'stopped overnight' suggests a sudden change but doesn't necessarily carry the connotation of an addiction or effort involved in quitting.")

## Previous failed translation
Source: he quit cold turkey
Translation: il a arrêté du jour au lendemain
Backtranslation: he stopped overnight
Differences between backtranslation and source:
'Quit cold turkey' implies stopping something abruptly and completely, usually referencing addiction, while 'stopped overnight' suggests a sudden change but doesn't necessarily carry the connotation of an addiction or effort involved in quitting.


iteration=2 l1='English' l2='French' source='he quit cold turkey' forward='il a arrêté brusquement' backward='he quit abruptly' comparison=ComparisonResult(same_meaning=False, notes="The phrase 'cold turkey' implies quitting something suddenly and completely, often in the context of addiction, while 'abruptly' suggests a sudden change but does not specifically convey the same connotation of a complete cessation or addiction context.")

## Previous failed translation
Source: he quit cold turkey
Translation: il a arrêté du jour au lendemain
Backtranslation: he stopped overnight
Differences between backtranslation and source:
'Quit cold turkey' implies stopping something abruptly and completely, usually referencing addiction, while 'stopped overnight' suggests a sudden change but doesn't necessarily carry the connotation of an addiction or effort involved in quitting.


## Previous failed translation
Source: he quit cold turkey
Translation: il a arrêté brusquement
Backtranslation: he quit abruptly
Differences between backtranslation and source:
The phrase 'cold turkey' implies quitting something suddenly and completely, often in the context of addiction, while 'abruptly' suggests a sudden change but does not specifically convey the same connotation of a complete cessation or addiction context.


iteration=3 l1='English' l2='French' source='he quit cold turkey' forward='il a arrêté net' backward='he quit flat out' comparison=ComparisonResult(same_meaning=False, notes="The phrase 'cold turkey' implies stopping an addiction abruptly without tapering off, while 'flat out' suggests being direct or unequivocal, not necessarily related to addiction. Therefore, the meanings and connotations differ.")

## Previous failed translation
Source: he quit cold turkey
Translation: il a arrêté du jour au lendemain
Backtranslation: he stopped overnight
Differences between backtranslation and source:
'Quit cold turkey' implies stopping something abruptly and completely, usually referencing addiction, while 'stopped overnight' suggests a sudden change but doesn't necessarily carry the connotation of an addiction or effort involved in quitting.


## Previous failed translation
Source: he quit cold turkey
Translation: il a arrêté brusquement
Backtranslation: he quit abruptly
Differences between backtranslation and source:
The phrase 'cold turkey' implies quitting something suddenly and completely, often in the context of addiction, while 'abruptly' suggests a sudden change but does not specifically convey the same connotation of a complete cessation or addiction context.


## Previous failed translation
Source: he quit cold turkey
Translation: il a arrêté net
Backtranslation: he quit flat out
Differences between backtranslation and source:
The phrase 'cold turkey' implies stopping an addiction abruptly without tapering off, while 'flat out' suggests being direct or unequivocal, not necessarily related to addiction. Therefore, the meanings and connotations differ.


iteration=1 l1='English' l2='French' source="it's about to go down" forward='ça va se passer' backward="It's going to happen." comparison=ComparisonResult(same_meaning=False, notes="The target text implies that something is going to happen but lacks the urgency or dramatic connotation of 'it's about to go down,' which suggests an imminent and possibly intense event.")

## Previous failed translation
Source: it's about to go down
Translation: ça va se passer
Backtranslation: It's going to happen.
Differences between backtranslation and source:
The target text implies that something is going to happen but lacks the urgency or dramatic connotation of 'it's about to go down,' which suggests an imminent and possibly intense event.


iteration=2 l1='English' l2='French' source="it's about to go down" forward='ça va chauffer' backward="It's about to heat up." comparison=ComparisonResult(same_meaning=False, notes="The phrase 'it's about to go down' often conveys that something significant, potentially confrontational or exciting, is about to happen, while 'it's about to heat up' suggests an increase in intensity, possibly in a competitive or dramatic sense. Although both phrases indicate forthcoming action, the nuances differ.")

## Previous failed translation
Source: it's about to go down
Translation: ça va se passer
Backtranslation: It's going to happen.
Differences between backtranslation and source:
The target text implies that something is going to happen but lacks the urgency or dramatic connotation of 'it's about to go down,' which suggests an imminent and possibly intense event.


## Previous failed translation
Source: it's about to go down
Translation: ça va chauffer
Backtranslation: It's about to heat up.
Differences between backtranslation and source:
The phrase 'it's about to go down' often conveys that something significant, potentially confrontational or exciting, is about to happen, while 'it's about to heat up' suggests an increase in intensity, possibly in a competitive or dramatic sense. Although both phrases indicate forthcoming action, the nuances differ.


iteration=3 l1='English' l2='French' source="it's about to go down" forward='ça va se passer' backward="it's going to happen." comparison=ComparisonResult(same_meaning=False, notes="The target text is less informal and lacks the urgency and dramatic connotation conveyed by 'it's about to go down.'")

## Previous failed translation
Source: it's about to go down
Translation: ça va se passer
Backtranslation: It's going to happen.
Differences between backtranslation and source:
The target text implies that something is going to happen but lacks the urgency or dramatic connotation of 'it's about to go down,' which suggests an imminent and possibly intense event.


## Previous failed translation
Source: it's about to go down
Translation: ça va chauffer
Backtranslation: It's about to heat up.
Differences between backtranslation and source:
The phrase 'it's about to go down' often conveys that something significant, potentially confrontational or exciting, is about to happen, while 'it's about to heat up' suggests an increase in intensity, possibly in a competitive or dramatic sense. Although both phrases indicate forthcoming action, the nuances differ.


## Previous failed translation
Source: it's about to go down
Translation: ça va se passer
Backtranslation: it's going to happen.
Differences between backtranslation and source:
The target text is less informal and lacks the urgency and dramatic connotation conveyed by 'it's about to go down.'


iteration=1 l1='English' l2='French' source="I'm going to totally pig out" forward='Je vais vraiment me goinfrer.' backward="I'm really going to stuff myself." comparison=ComparisonResult(same_meaning=True, notes='')
iteration=1 l1='English' l2='French' source='are you shitting me?' forward='tu rigoles?' backward='Are you kidding?' comparison=ComparisonResult(same_meaning=False, notes='The source text is more vulgar and expresses disbelief or incredulity in a stronger manner than the target text, which is a more polite way of asking if someone is joking.')

## Previous failed translation
Source: are you shitting me?
Translation: tu rigoles?
Backtranslation: Are you kidding?
Differences between backtranslation and source:
The source text is more vulgar and expresses disbelief or incredulity in a stronger manner than the target text, which is a more polite way of asking if someone is joking.


iteration=2 l1='English' l2='French' source='are you shitting me?' forward='tu te fous de moi ?' backward='Are you shitting me?' comparison=ComparisonResult(same_meaning=True, notes='')
iteration=1 l1='English' l2='French' source="he's going to pop the question" forward='il va poser la question' backward='he will ask the question' comparison=ComparisonResult(same_meaning=False, notes="The phrase 'pop the question' specifically refers to proposing marriage, while 'ask the question' is vague and does not necessarily imply a marriage proposal.")

## Previous failed translation
Source: he's going to pop the question
Translation: il va poser la question
Backtranslation: he will ask the question
Differences between backtranslation and source:
The phrase 'pop the question' specifically refers to proposing marriage, while 'ask the question' is vague and does not necessarily imply a marriage proposal.


iteration=2 l1='English' l2='French' source="he's going to pop the question" forward='il va faire sa demande' backward="he's going to propose" comparison=ComparisonResult(same_meaning=True, notes='')
iteration=1 l1='English' l2='French' source='hold your horses!' forward='rendez-vous à vos chevaux !' backward='rendez-vous at your horses!' comparison=ComparisonResult(same_meaning=False, notes='The source text is an idiomatic expression meaning to wait or be patient, while the target text suggests meeting at the location of the horses, which completely changes the meaning.')

## Previous failed translation
Source: hold your horses!
Translation: rendez-vous à vos chevaux !
Backtranslation: rendez-vous at your horses!
Differences between backtranslation and source:
The source text is an idiomatic expression meaning to wait or be patient, while the target text suggests meeting at the location of the horses, which completely changes the meaning.


iteration=2 l1='English' l2='French' source='hold your horses!' forward='patientez un instant !' backward='wait a moment!' comparison=ComparisonResult(same_meaning=False, notes='The target text is a more general request for patience, whereas the source text has a more specific connotation of telling someone to be cautious or not to rush in doing something.')

## Previous failed translation
Source: hold your horses!
Translation: rendez-vous à vos chevaux !
Backtranslation: rendez-vous at your horses!
Differences between backtranslation and source:
The source text is an idiomatic expression meaning to wait or be patient, while the target text suggests meeting at the location of the horses, which completely changes the meaning.


## Previous failed translation
Source: hold your horses!
Translation: patientez un instant !
Backtranslation: wait a moment!
Differences between backtranslation and source:
The target text is a more general request for patience, whereas the source text has a more specific connotation of telling someone to be cautious or not to rush in doing something.


iteration=3 l1='English' l2='French' source='hold your horses!' forward='calme-toi !' backward='calm down!' comparison=ComparisonResult(same_meaning=True, notes='')
iteration=1 l1='English' l2='French' source="I'll take a rain check and see you next week" forward='Je prendrai un report et je te verrai la semaine prochaine.' backward='I will take a report and I will see you next week.' comparison=ComparisonResult(same_meaning=False, notes="The phrase 'take a rain check' implies postponing or rescheduling an informal invitation, while 'take a report' changes the meaning to suggest receiving a document or information.")

## Previous failed translation
Source: I'll take a rain check and see you next week
Translation: Je prendrai un report et je te verrai la semaine prochaine.
Backtranslation: I will take a report and I will see you next week.
Differences between backtranslation and source:
The phrase 'take a rain check' implies postponing or rescheduling an informal invitation, while 'take a report' changes the meaning to suggest receiving a document or information.


iteration=2 l1='English' l2='French' source="I'll take a rain check and see you next week" forward='Je prendrai un autre rendez-vous et je te verrai la semaine prochaine.' backward='I will make another appointment and I will see you next week.' comparison=ComparisonResult(same_meaning=False, notes="The phrase 'take a rain check' implies postponing a current arrangement without specifying an alternative appointment, whereas 'make another appointment' suggests a new arrangement is being proactively set which changes the nuance of the original statement.")

## Previous failed translation
Source: I'll take a rain check and see you next week
Translation: Je prendrai un report et je te verrai la semaine prochaine.
Backtranslation: I will take a report and I will see you next week.
Differences between backtranslation and source:
The phrase 'take a rain check' implies postponing or rescheduling an informal invitation, while 'take a report' changes the meaning to suggest receiving a document or information.


## Previous failed translation
Source: I'll take a rain check and see you next week
Translation: Je prendrai un autre rendez-vous et je te verrai la semaine prochaine.
Backtranslation: I will make another appointment and I will see you next week.
Differences between backtranslation and source:
The phrase 'take a rain check' implies postponing a current arrangement without specifying an alternative appointment, whereas 'make another appointment' suggests a new arrangement is being proactively set which changes the nuance of the original statement.


iteration=3 l1='English' l2='French' source="I'll take a rain check and see you next week" forward='Je prendrai un report et je te verrai la semaine prochaine.' backward='I will take a rain check and see you next week.' comparison=ComparisonResult(same_meaning=True, notes='')
iteration=1 l1='English' l2='French' source="I'm there like a bear" forward='Je suis là comme un ours.' backward='I am here like a bear.' comparison=ComparisonResult(same_meaning=False, notes="The source uses 'there' while the target uses 'here', changing the location reference.")

## Previous failed translation
Source: I'm there like a bear
Translation: Je suis là comme un ours.
Backtranslation: I am here like a bear.
Differences between backtranslation and source:
The source uses 'there' while the target uses 'here', changing the location reference.


iteration=2 l1='English' l2='French' source="I'm there like a bear" forward='Je suis là comme un ours.' backward='I am there like a bear.' comparison=ComparisonResult(same_meaning=True, notes='')
iteration=1 l1='English' l2='French' source="Wow, he's going to town on that sandwich" forward="Wow, il s'attaque à ce sandwich." backward="Wow, he's going after that sandwich." comparison=ComparisonResult(same_meaning=False, notes='The target text suggests chasing after or pursuing the sandwich rather than eating it vigorously, which may imply a different level of enthusiasm or urgency.')

## Previous failed translation
Source: Wow, he's going to town on that sandwich
Translation: Wow, il s'attaque à ce sandwich.
Backtranslation: Wow, he's going after that sandwich.
Differences between backtranslation and source:
The target text suggests chasing after or pursuing the sandwich rather than eating it vigorously, which may imply a different level of enthusiasm or urgency.


iteration=2 l1='English' l2='French' source="Wow, he's going to town on that sandwich" forward='Wow, il se régale avec ce sandwich.' backward="Wow, he's enjoying that sandwich." comparison=ComparisonResult(same_meaning=False, notes="The target text changes the connotation from a more enthusiastic or vigorous eating ('going to town') to a calmer expression of enjoyment ('enjoying').")

## Previous failed translation
Source: Wow, he's going to town on that sandwich
Translation: Wow, il s'attaque à ce sandwich.
Backtranslation: Wow, he's going after that sandwich.
Differences between backtranslation and source:
The target text suggests chasing after or pursuing the sandwich rather than eating it vigorously, which may imply a different level of enthusiasm or urgency.


## Previous failed translation
Source: Wow, he's going to town on that sandwich
Translation: Wow, il se régale avec ce sandwich.
Backtranslation: Wow, he's enjoying that sandwich.
Differences between backtranslation and source:
The target text changes the connotation from a more enthusiastic or vigorous eating ('going to town') to a calmer expression of enjoyment ('enjoying').


iteration=3 l1='English' l2='French' source="Wow, he's going to town on that sandwich" forward='Wow, il dévore ce sandwich.' backward="Wow, he's devouring that sandwich." comparison=ComparisonResult(same_meaning=True, notes='')
iteration=1 l1='English' l2='French' source='her drunken rage is now in full effect, watch out' forward='sa rage ivre est maintenant en plein effet, attention' backward='his drunken rage is now in full effect, beware' comparison=ComparisonResult(same_meaning=False, notes="The subject has changed from 'her' to 'his', which alters the meaning regarding who is experiencing rage. The warning has also changed from 'watch out' to 'beware', which may imply a slightly different tone.")

## Previous failed translation
Source: her drunken rage is now in full effect, watch out
Translation: sa rage ivre est maintenant en plein effet, attention
Backtranslation: his drunken rage is now in full effect, beware
Differences between backtranslation and source:
The subject has changed from 'her' to 'his', which alters the meaning regarding who is experiencing rage. The warning has also changed from 'watch out' to 'beware', which may imply a slightly different tone.


iteration=2 l1='English' l2='French' source='her drunken rage is now in full effect, watch out' forward='sa rage ivre est maintenant en plein effet, attention' backward='her drunken rage is now in full effect, watch out' comparison=ComparisonResult(same_meaning=True, notes='')
iteration=1 l1='English' l2='French' source="that's so old school" forward="c'est tellement à l'ancienne" backward="it's so old-fashioned" comparison=ComparisonResult(same_meaning=True, notes='')
iteration=1 l1='English' l2='French' source='she flaked out on me last night' forward="elle m'a posé un lapin hier soir" backward='She stood me up last night.' comparison=ComparisonResult(same_meaning=True, notes='')
iteration=1 l1='English' l2='French' source="don't push your luck" forward='ne tente pas ta chance' backward="don't take your chances" comparison=ComparisonResult(same_meaning=False, notes='The source text implies a warning against risking a favorable situation by pushing for more, while the target text suggests a more general caution against taking risks.')

## Previous failed translation
Source: don't push your luck
Translation: ne tente pas ta chance
Backtranslation: don't take your chances
Differences between backtranslation and source:
The source text implies a warning against risking a favorable situation by pushing for more, while the target text suggests a more general caution against taking risks.


iteration=2 l1='English' l2='French' source="don't push your luck" forward='ne tente pas ta chance' backward='don’t push your luck' comparison=ComparisonResult(same_meaning=True, notes='')
iteration=1 l1='English' l2='French' source="that girl, she's really something else!" forward="cette fille, elle est vraiment quelqu'un d'autre !" backward='this girl, she is really someone else!' comparison=ComparisonResult(same_meaning=False, notes="The source uses 'something else' which is an idiomatic expression implying that the girl is extraordinary or special. The target changes this to 'someone else', which suggests a comparison or distinction, altering the meaning.")

## Previous failed translation
Source: that girl, she's really something else!
Translation: cette fille, elle est vraiment quelqu'un d'autre !
Backtranslation: this girl, she is really someone else!
Differences between backtranslation and source:
The source uses 'something else' which is an idiomatic expression implying that the girl is extraordinary or special. The target changes this to 'someone else', which suggests a comparison or distinction, altering the meaning.


iteration=2 l1='English' l2='French' source="that girl, she's really something else!" forward='cette fille, elle est vraiment exceptionnelle !' backward='this girl, she is really exceptional!' comparison=ComparisonResult(same_meaning=True, notes='')

Let’s pull out the cases that took more than one attempt.

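# Keep only the phrases that needed more than one attempt
difficult_cases = [r for r in results if r.iterations > 1]
for case in difficult_cases:
    print(f"\n\n# Phrase: '{case.source}'")
    # Dump every attempt (forward, backward, comparison) as JSON
    for entry in case.history:
        print(entry.model_dump_json(indent=4))

# Phrase: 'he quit cold turkey'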
{
    "iteration": 1,
    "l1": "English",
    "l2": "French",
    "source": "he quit cold turkey",
    "forward": "il a arrêté du jour au lendemain",
    "backward": "he stopped overnight",
    "comparison": {
        "same_meaning": false,
        "notes": "'Quit cold turkey' implies stopping something abruptly and completely, usually referencing addiction, while 'stopped overnight' suggests a sudden change but doesn't necessarily carry the connotation of an addiction or effort involved in quitting."
    }
}
{
    "iteration": 2,
    "l1": "English",
    "l2": "French",
    "source": "he quit cold turkey",
    "forward": "il a arrêté brusquement",
    "backward": "he quit abruptly",
    "comparison": {
        "same_meaning": false,
        "notes": "The phrase 'cold turkey' implies quitting something suddenly and completely, often in the context of addiction, while 'abruptly' suggests a sudden change but does not specifically convey the same connotation of a complete cessation or addiction context."
    }
}
{
    "iteration": 3,
    "l1": "English",
    "l2": "French",
    "source": "he quit cold turkey",
    "forward": "il a arrêté net",
    "backward": "he quit flat out",
    "comparison": {
        "same_meaning": false,
        "notes": "The phrase 'cold turkey' implies stopping an addiction abruptly without tapering off, while 'flat out' suggests being direct or unequivocal, not necessarily related to addiction. Therefore, the meanings and connotations differ."
    }
}
# Phrase: 'it's about to go down'
{
    "iteration": 1,
    "l1": "English",
    "l2": "French",
    "source": "it's about to go down",
    "forward": "ça va se passer",
    "backward": "It's going to happen.",
    "comparison": {
        "same_meaning": false,
        "notes": "The target text implies that something is going to happen but lacks the urgency or dramatic connotation of 'it's about to go down,' which suggests an imminent and possibly intense event."
    }
}
{
    "iteration": 2,
    "l1": "English",
    "l2": "French",
    "source": "it's about to go down",
    "forward": "ça va chauffer",
    "backward": "It's about to heat up.",
    "comparison": {
        "same_meaning": false,
        "notes": "The phrase 'it's about to go down' often conveys that something significant, potentially confrontational or exciting, is about to happen, while 'it's about to heat up' suggests an increase in intensity, possibly in a competitive or dramatic sense. Although both phrases indicate forthcoming action, the nuances differ."
    }
}
{
    "iteration": 3,
    "l1": "English",
    "l2": "French",
    "source": "it's about to go down",
    "forward": "ça va se passer",
    "backward": "it's going to happen.",
    "comparison": {
        "same_meaning": false,
        "notes": "The target text is less informal and lacks the urgency and dramatic connotation conveyed by 'it's about to go down.'"
    }
}


# Phrase: 'are you shitting me?'
{
    "iteration": 1,
    "l1": "English",
    "l2": "French",
    "source": "are you shitting me?",
    "forward": "tu rigoles?",
    "backward": "Are you kidding?",
    "comparison": {
        "same_meaning": false,
        "notes": "The source text is more vulgar and expresses disbelief or incredulity in a stronger manner than the target text, which is a more polite way of asking if someone is joking."
    }
}
{
    "iteration": 2,
    "l1": "English",
    "l2": "French",
    "source": "are you shitting me?",
    "forward": "tu te fous de moi ?",
    "backward": "Are you shitting me?",
    "comparison": {
        "same_meaning": true,
        "notes": ""
    }
}


# Phrase: 'he's going to pop the question'
{
    "iteration": 1,
    "l1": "English",
    "l2": "French",
    "source": "he's going to pop the question",
    "forward": "il va poser la question",
    "backward": "he will ask the question",
    "comparison": {
        "same_meaning": false,
        "notes": "The phrase 'pop the question' specifically refers to proposing marriage, while 'ask the question' is vague and does not necessarily imply a marriage proposal."
    }
}
{
    "iteration": 2,
    "l1": "English",
    "l2": "French",
    "source": "he's going to pop the question",
    "forward": "il va faire sa demande",
    "backward": "he's going to propose",
    "comparison": {
        "same_meaning": true,
        "notes": ""
    }
}


# Phrase: 'hold your horses!'
{
    "iteration": 1,
    "l1": "English",
    "l2": "French",
    "source": "hold your horses!",
    "forward": "rendez-vous à vos chevaux !",
    "backward": "rendez-vous at your horses!",
    "comparison": {
        "same_meaning": false,
        "notes": "The source text is an idiomatic expression meaning to wait or be patient, while the target text suggests meeting at the location of the horses, which completely changes the meaning."
    }
}
{
    "iteration": 2,
    "l1": "English",
    "l2": "French",
    "source": "hold your horses!",
    "forward": "patientez un instant !",
    "backward": "wait a moment!",
    "comparison": {
        "same_meaning": false,
        "notes": "The target text is a more general request for patience, whereas the source text has a more specific connotation of telling someone to be cautious or not to rush in doing something."
    }
}
{
    "iteration": 3,
    "l1": "English",
    "l2": "French",
    "source": "hold your horses!",
    "forward": "calme-toi !",
    "backward": "calm down!",
    "comparison": {
        "same_meaning": true,
        "notes": ""
    }
}


# Phrase: 'I'll take a rain check and see you next week'
{
    "iteration": 1,
    "l1": "English",
    "l2": "French",
    "source": "I'll take a rain check and see you next week",
    "forward": "Je prendrai un report et je te verrai la semaine prochaine.",
    "backward": "I will take a report and I will see you next week.",
    "comparison": {
        "same_meaning": false,
        "notes": "The phrase 'take a rain check' implies postponing or rescheduling an informal invitation, while 'take a report' changes the meaning to suggest receiving a document or information."
    }
}
{
    "iteration": 2,
    "l1": "English",
    "l2": "French",
    "source": "I'll take a rain check and see you next week",
    "forward": "Je prendrai un autre rendez-vous et je te verrai la semaine prochaine.",
    "backward": "I will make another appointment and I will see you next week.",
    "comparison": {
        "same_meaning": false,
        "notes": "The phrase 'take a rain check' implies postponing a current arrangement without specifying an alternative appointment, whereas 'make another appointment' suggests a new arrangement is being proactively set which changes the nuance of the original statement."
    }
}
{
    "iteration": 3,
    "l1": "English",
    "l2": "French",
    "source": "I'll take a rain check and see you next week",
    "forward": "Je prendrai un report et je te verrai la semaine prochaine.",
    "backward": "I will take a rain check and see you next week.",
    "comparison": {
        "same_meaning": true,
        "notes": ""
    }
}


# Phrase: 'I'm there like a bear'
{
    "iteration": 1,
    "l1": "English",
    "l2": "French",
    "source": "I'm there like a bear",
    "forward": "Je suis là comme un ours.",
    "backward": "I am here like a bear.",
    "comparison": {
        "same_meaning": false,
        "notes": "The source uses 'there' while the target uses 'here', changing the location reference."
    }
}
{
    "iteration": 2,
    "l1": "English",
    "l2": "French",
    "source": "I'm there like a bear",
    "forward": "Je suis là comme un ours.",
    "backward": "I am there like a bear.",
    "comparison": {
        "same_meaning": true,
        "notes": ""
    }
}


# Phrase: 'Wow, he's going to town on that sandwich'
{
    "iteration": 1,
    "l1": "English",
    "l2": "French",
    "source": "Wow, he's going to town on that sandwich",
    "forward": "Wow, il s'attaque à ce sandwich.",
    "backward": "Wow, he's going after that sandwich.",
    "comparison": {
        "same_meaning": false,
        "notes": "The target text suggests chasing after or pursuing the sandwich rather than eating it vigorously, which may imply a different level of enthusiasm or urgency."
    }
}
{
    "iteration": 2,
    "l1": "English",
    "l2": "French",
    "source": "Wow, he's going to town on that sandwich",
    "forward": "Wow, il se régale avec ce sandwich.",
    "backward": "Wow, he's enjoying that sandwich.",
    "comparison": {
        "same_meaning": false,
        "notes": "The target text changes the connotation from a more enthusiastic or vigorous eating ('going to town') to a calmer expression of enjoyment ('enjoying')."
    }
}
{
    "iteration": 3,
    "l1": "English",
    "l2": "French",
    "source": "Wow, he's going to town on that sandwich",
    "forward": "Wow, il dévore ce sandwich.",
    "backward": "Wow, he's devouring that sandwich.",
    "comparison": {
        "same_meaning": true,
        "notes": ""
    }
}


# Phrase: 'her drunken rage is now in full effect, watch out'
{
    "iteration": 1,
    "l1": "English",
    "l2": "French",
    "source": "her drunken rage is now in full effect, watch out",
    "forward": "sa rage ivre est maintenant en plein effet, attention",
    "backward": "his drunken rage is now in full effect, beware",
    "comparison": {
        "same_meaning": false,
        "notes": "The subject has changed from 'her' to 'his', which alters the meaning regarding who is experiencing rage. The warning has also changed from 'watch out' to 'beware', which may imply a slightly different tone."
    }
}
{
    "iteration": 2,
    "l1": "English",
    "l2": "French",
    "source": "her drunken rage is now in full effect, watch out",
    "forward": "sa rage ivre est maintenant en plein effet, attention",
    "backward": "her drunken rage is now in full effect, watch out",
    "comparison": {
        "same_meaning": true,
        "notes": ""
    }
}


# Phrase: 'don't push your luck'
{
    "iteration": 1,
    "l1": "English",
    "l2": "French",
    "source": "don't push your luck",
    "forward": "ne tente pas ta chance",
    "backward": "don't take your chances",
    "comparison": {
        "same_meaning": false,
        "notes": "The source text implies a warning against risking a favorable situation by pushing for more, while the target text suggests a more general caution against taking risks."
    }
}
{
    "iteration": 2,
    "l1": "English",
    "l2": "French",
    "source": "don't push your luck",
    "forward": "ne tente pas ta chance",
    "backward": "don’t push your luck",
    "comparison": {
        "same_meaning": true,
        "notes": ""
    }
}


# Phrase: 'that girl, she's really something else!'
{
    "iteration": 1,
    "l1": "English",
    "l2": "French",
    "source": "that girl, she's really something else!",
    "forward": "cette fille, elle est vraiment quelqu'un d'autre !",
    "backward": "this girl, she is really someone else!",
    "comparison": {
        "same_meaning": false,
        "notes": "The source uses 'something else' which is an idiomatic expression implying that the girl is extraordinary or special. The target changes this to 'someone else', which suggests a comparison or distinction, altering the meaning."
    }
}
{
    "iteration": 2,
    "l1": "English",
    "l2": "French",
    "source": "that girl, she's really something else!",
    "forward": "cette fille, elle est vraiment exceptionnelle !",
    "backward": "this girl, she is really exceptional!",
    "comparison": {
        "same_meaning": true,
        "notes": ""
    }
}
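
Reading through these histories by hand gets tedious, so here is a minimal sketch of how they could be sorted into groups. It assumes each history is a list of dicts shaped like the JSON above; the `classify` helper and its three labels are hypothetical names, not part of the code earlier in this post. It separates phrases that never passed, phrases that passed with a new translation, and phrases that passed with a French translation identical to one that had already failed. That last group is worth a second look, because nothing about the French changed between the failed and the passing attempt.

```python
def classify(history: list[dict]) -> str:
    """Label one phrase's attempt history (dicts shaped like the JSON above)."""
    final = history[-1]
    if not final["comparison"]["same_meaning"]:
        return "unresolved"  # ran out of attempts without a match
    # Forward translations that were rejected in earlier attempts
    failed_forwards = {
        h["forward"] for h in history[:-1] if not h["comparison"]["same_meaning"]
    }
    if final["forward"] in failed_forwards:
        return "passed_unchanged"  # same French as a failed attempt, yet it passed
    return "resolved"


def entry(forward: str, same_meaning: bool) -> dict:
    """Build a trimmed history entry with only the fields classify() reads."""
    return {"forward": forward, "comparison": {"same_meaning": same_meaning}}


# Trimmed copies of three histories from the output above
histories = {
    "he quit cold turkey": [
        entry("il a arrêté du jour au lendemain", False),
        entry("il a arrêté brusquement", False),
        entry("il a arrêté net", False),
    ],
    "don't push your luck": [
        entry("ne tente pas ta chance", False),
        entry("ne tente pas ta chance", True),
    ],
    "he's going to pop the question": [
        entry("il va poser la question", False),
        entry("il va faire sa demande", True),
    ],
}

labels = {phrase: classify(h) for phrase, h in histories.items()}
for phrase, label in labels.items():
    print(f"{label:17} {phrase}")
```

With the real results, the same function could be fed `json.loads(entry.model_dump_json())` for each entry in a case's history, assuming the models serialise as shown above.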

Longer Texts #

While this approach works well for shorter texts, repeatedly processing a long document in full doesn’t make sense. Translating longer texts is a much more difficult problem, but I would approach it as follows:

  • First, identify any key terms in the document and translate them using the approach above (context included), or using human experts if possible.
  • Break the document down into mid-sized sections that you could confidently send to an LLM to attempt the initial translation. Make sure to include enough context for these sections to make sense.
  • Use structured outputs to generate a list of backtranslations (first API call) and comparisons to the source (second API call) for each section. You might do this separately for each paragraph in the section if the paragraphs are small enough, or for each sentence otherwise. As is often the case with LLMs, chunking strategy is important.
  • Combine these comparison results to provide an updated context that can be used to improve the translation of the section.
  • Rinse, repeat.
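
The steps above can be sketched roughly as follows. This is an untested outline under stated assumptions, not a finished implementation: `split_sections`, `translate_section`, `SectionResult` and the three callables (`forward`, `backward`, `compare`) are all hypothetical names, and the callables stand in for whatever LLM calls you would actually make. The glossary is assumed to be the output of the first step, and paragraphs are assumed to be separated by blank lines.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Comparison:
    same_meaning: bool
    notes: str = ""


@dataclass
class SectionResult:
    source: str
    translation: str
    iterations: int
    resolved: bool


# Hypothetical signatures for the three LLM-backed steps
Forward = Callable[[str, dict[str, str], list[str]], str]  # section, glossary, feedback
Backward = Callable[[list[str]], list[str]]  # one call per section
Compare = Callable[[list[str], list[str]], list[Comparison]]  # one call per section


def split_sections(text: str, max_chars: int = 1500) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into mid-sized sections."""
    sections: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if current and len(current) + len(para) + 2 > max_chars:
            sections.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        sections.append(current)
    return sections


def translate_section(
    section: str,
    glossary: dict[str, str],
    forward: Forward,
    backward: Backward,
    compare: Compare,
    max_iterations: int = 3,
) -> SectionResult:
    """Translate, backtranslate and compare one section, retrying with feedback."""
    feedback: list[str] = []
    translation = ""
    src_paras = section.split("\n\n")
    for iteration in range(1, max_iterations + 1):
        translation = forward(section, glossary, feedback)
        tgt_paras = translation.split("\n\n")
        if len(tgt_paras) != len(src_paras):
            # Paragraphs can't be aligned, so ask for the same breaks and retry
            feedback = ["Keep the same paragraph breaks as the source."]
            continue
        backs = backward(tgt_paras)
        verdicts = compare(src_paras, backs)
        # Only the failed paragraphs go into the context for the next attempt
        feedback = [
            f"Source: {src}\nTranslation: {tgt}\nDifferences: {v.notes}"
            for src, tgt, v in zip(src_paras, tgt_paras, verdicts)
            if not v.same_meaning
        ]
        if not feedback:
            return SectionResult(section, translation, iteration, True)
    return SectionResult(section, translation, max_iterations, False)
```

Passing the model calls in as plain functions keeps the loop easy to test with stubs, and makes it cheap to swap the per-paragraph comparison for a per-sentence one if the paragraphs turn out to be too long.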