This article explores some of OpenAI’s recently published patents and patent applications, shedding light on their innovative approaches to training speech recognition models and iteratively editing text with chatbots. From leveraging weakly supervised training on large datasets to refining methods for generating and editing text, these filings reveal OpenAI’s strategies for tackling complex AI challenges. However, they also raise intriguing questions about detectability and enforcement, particularly in the context of real-world applications.
Fighting Hate Speech with LLMs
US 2024/362421 A1 fits OpenAI’s ongoing strategy of securing patents quickly: the application was filed with a Track One request for prioritized examination. Despite that head start, this particular application has had an interesting journey so far.
The Journey of US 2024/362421 A1
Prosecution initially seemed straightforward: the examiner issued a notice of allowance right away. Things took a twist, however, when OpenAI submitted an Information Disclosure Statement (IDS) citing additional prior art. The examiner reconsidered, withdrew the allowance, and raised new objections. After multiple rounds of claim amendments and office actions, the application remains pending. OpenAI has also filed a priority-claiming PCT application (WO 2024/226089 A1).
A Closer Look at the Invention
The patent application tackles the increasingly relevant issue of automated content classification and moderation using a large language model (LLM). With online content increasing exponentially, moderating platforms for harmful or unwanted content, such as self-harm encouragement or hate speech, is more challenging than ever. OpenAI’s approach centers on training LLMs to identify these nuanced forms of content with greater accuracy and efficiency.
Content classification is a highly nuanced task. Fig. 4 of the application, for example, presents four similar sentences of which only the first three should be classified as hateful:
The Core Solution
The independent claims outline a process that comprises four key steps (this is a simplification of the real claim wording; a rough code sketch follows the list):
- Generating a Content Taxonomy: This taxonomy organizes content into categories and subcategories, which are ranked by a prediction metric to distinguish desired from undesired content.
- Generating Training Data: The system compiles a robust dataset tailored to the specific categories outlined in the taxonomy.
- Iteratively Optimizing the Language Model: Using both the taxonomy and the training data, the model is optimized over multiple rounds to refine its content classification.
- Moderating Content: Finally, the optimized LLM is deployed to moderate content according to the predefined taxonomy, enabling it to categorize new content in real-time with minimal human intervention.
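To make the structure more tangible, here is a minimal, runnable Python sketch of the four steps. A trivial keyword scorer stands in for the LLM, and the taxonomy, labels, and function names are my own illustration rather than anything taken from the application:

```python
from dataclasses import dataclass, field

@dataclass
class Taxonomy:
    # Category -> subcategories, ordered (ranked) by how strongly they
    # indicate undesired content.
    categories: dict[str, list[str]] = field(default_factory=dict)

def generate_taxonomy() -> Taxonomy:
    # Step 1: generate a content taxonomy.
    return Taxonomy(categories={
        "hate": ["dehumanizing comparison", "negative stereotype"],
        "self-harm": ["encouragement"],
    })

def generate_training_data(taxonomy: Taxonomy) -> list[tuple[str, str]]:
    # Step 2: compile labeled examples keyed to the taxonomy categories.
    data = [
        ("that group is subhuman", "hate"),
        ("I disagree with that group", "none"),
    ]
    assert all(label in taxonomy.categories or label == "none" for _, label in data)
    return data

class KeywordClassifier:
    """Stand-in for the LLM: a per-category keyword scorer."""

    def __init__(self) -> None:
        self.weights: dict[str, dict[str, float]] = {}

    def fit_round(self, data: list[tuple[str, str]]) -> None:
        # One optimization round: reinforce words seen with a harmful label.
        for text, label in data:
            if label == "none":
                continue
            bucket = self.weights.setdefault(label, {})
            for word in text.lower().split():
                bucket[word] = bucket.get(word, 0.0) + 1.0

    def classify(self, text: str) -> str:
        words = set(text.lower().split())
        scores = {
            label: sum(w for word, w in bucket.items() if word in words)
            for label, bucket in self.weights.items()
        }
        best = max(scores, key=scores.get, default="none")
        return best if scores.get(best, 0.0) > 0 else "none"

def optimize(model: KeywordClassifier, data: list[tuple[str, str]],
             rounds: int = 3) -> KeywordClassifier:
    # Step 3: iteratively optimize the model on the taxonomy-aligned data.
    for _ in range(rounds):
        model.fit_round(data)
    return model

def moderate(model: KeywordClassifier, content: str) -> str:
    # Step 4: runtime moderation of new content against the taxonomy labels.
    return model.classify(content)

taxonomy = generate_taxonomy()
model = optimize(KeywordClassifier(), generate_training_data(taxonomy))
print(moderate(model, "calling a group subhuman"))  # -> "hate"
```

A production system would of course replace the keyword scorer with the fine-tuned LLM, but the split between setup (steps 1 to 3) and runtime moderation (step 4) is worth keeping in mind for the enforcement discussion below.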
Potential Enforcement Challenges Ahead?
One notable issue with the claims is the lack of a clear separation between configuration and runtime processes. By merging setup actions, such as generating a content taxonomy and training data, with operational steps like content moderation, the claims could be challenging to enforce. This blending introduces ambiguity about whether a single entity must perform the entire set of activities, as configuration and runtime steps may occur separately or be carried out by different parties. Such a structure complicates enforcement, as it would require proving that an infringer conducts both setup and operational actions in a single, continuous workflow—an alignment that may be impractical in many real-world scenarios.
Iterative Text Editing with a Chatbot
OpenAI explores the concept of iteratively editing text with a chatbot in WO 2024/191475 A1. The application claims priority to two US filings, both of which have already been granted with narrower claim sets via the Track One route for expedited examination.
The specification provides a straightforward example: a user prompts the LLM to write a poem and then iteratively refines it by instructing the model to adjust its tone and format.
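In API terms, an interaction along those lines could look roughly like the following. This is my own illustration using OpenAI’s public Python client rather than the example from the specification, and the model name is arbitrary:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# First turn: ask for an initial draft.
messages = [{"role": "user", "content": "Write a four-line poem about the sea."}]
draft = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
poem = draft.choices[0].message.content

# Follow-up turn: iteratively edit the draft by adjusting tone and format.
messages += [
    {"role": "assistant", "content": poem},
    {"role": "user", "content": "Make it more melancholic and rewrite it as a haiku."},
]
revised = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(revised.choices[0].message.content)
```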
A Broad Starting Point
The originally filed claims, which are currently pending, are broad in scope. For instance, independent method claim 11 outlines the following process:
11. A computer-implemented method for automatically generating and editing text, comprising:
receiving an input text prompt;
receiving one or more user instructions;
accessing a language model based on the input text prompt and the one or more user instructions;
outputting, using the accessed language model, language model output text; and
editing the input text prompt based on the language model and the one or more user instructions by replacing at least a portion of the input text prompt with the language model output text.
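Read literally, the claimed loop amounts to something like the sketch below, where `call_llm` is a placeholder for whichever language model an implementer accesses, and replacing the whole prompt is my simplification of “replacing at least a portion” of it:

```python
def call_llm(prompt: str, instruction: str) -> str:
    # Placeholder for "accessing a language model" and producing output text.
    return f"[{prompt!r} rewritten per {instruction!r}]"

def edit_iteratively(input_text: str, instructions: list[str]) -> str:
    for instruction in instructions:
        output_text = call_llm(input_text, instruction)
        # Claimed editing step: replace (at least a portion of) the input
        # text prompt with the language model output text, then continue
        # from the edited prompt on the next instruction.
        input_text = output_text
    return input_text

print(edit_iteratively("Roses are red...", ["make it melancholic", "format it as a haiku"]))
```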
One notable strength of such a claim is its detectability. The described functionality is externally observable in practical use cases, making potential infringement easier to identify.
In that sense, this application resembles OpenAI’s other “human-machine interaction” patents, which I discussed in this video:
Potential Challenges Ahead
However, the fate of these claims in examination remains to be seen. The priority date of March 14, 2023, postdating the launch of ChatGPT in November 2022, may introduce complications regarding prior art. The International Search Report (ISR) cites two “X” references, one of which is a GPT-2 tutorial.
Training Speech Recognition Models
US patent 12,079,587 B1 tackles the challenges of training speech recognition models, focusing on the limitations of conventional supervised pre-training caused by sparse labeled training data. The patent highlights the stark contrast between 5,000 hours of publicly available labeled data for supervised training and 1 million hours of unlabeled data suitable for unsupervised training.
To bridge this gap, the patent proposes leveraging weakly supervised training on significantly larger labeled audio datasets. Furthermore, the patent suggests that multilingual and multitask training can further enhance model performance by utilizing diverse language data and multiple tasks during training.
A Closer Look at the Claims
Independent system claim 1 describes a specific training approach that includes the following operation:
obtaining a transformer model including an encoder and a decoder, the transformer model trained to transcribe or translate audio data in multiple languages using labeled audio data, the labeled audio data including first audio segments associated with first same-language transcripts of the first audio segments and second audio segments associated with second different-language transcripts of the second audio segments
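What distinguishes this from a generic training recipe is the make-up of the labeled data: audio segments paired with same-language transcripts (a transcription task) alongside segments paired with different-language transcripts (a translation task). The following is a hedged sketch of how such a mixed multitask dataset might be represented; the names and structure are my own rather than anything taken from the patent:

```python
from dataclasses import dataclass

@dataclass
class LabeledSegment:
    audio_path: str            # path to the audio segment
    audio_language: str        # language spoken in the audio
    transcript: str            # target text the decoder should produce
    transcript_language: str   # language of the target text

def task_of(segment: LabeledSegment) -> str:
    # Same-language transcript -> transcription; different-language -> translation.
    if segment.audio_language == segment.transcript_language:
        return "transcribe"
    return "translate"

dataset = [
    LabeledSegment("clip_001.wav", "de", "Guten Morgen", "de"),  # "first audio segments"
    LabeledSegment("clip_002.wav", "de", "Good morning", "en"),  # "second audio segments"
]

for segment in dataset:
    # An encoder-decoder transformer would be trained to produce the transcript
    # from the audio, conditioned on the task (and typically the language).
    print(task_of(segment), "->", segment.transcript)
```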
The Challenge of Detectability
One notable aspect of this claim is its reliance on specific labeled training data, a feature that provides clear differentiation from generic training methods. However, this specificity introduces a key challenge: the use of such training data is likely difficult to detect externally. Identifying potential infringement may prove complex, as verifying compliance with the patented method would require access to the internal training processes and datasets—information often guarded as proprietary.
Stay Updated on the Latest in AI Patents
OpenAI’s recent patent activity offers a fascinating glimpse into the cutting-edge developments shaping AI technology and its applications. If you’re interested in staying informed about the latest trends in AI innovation, patent strategies, and their real-world implications, be sure to subscribe to my newsletter: