This patent discloses techniques for adapting previously-annotated training examples into updated training examples for training machine learning models. The core idea involves identifying a specific part (the “find expression”) within a targeted subset of training examples (defined by a “filtering constraint”) and replacing it with a new part (the “replacement expression”). This process allows for efficient modification of existing training data to reflect changes in system capabilities, user expectations, or to correct inaccuracies.
Main Themes and Important Ideas/Facts:
Problem Addressed: Challenges in Maintaining and Updating AI Training Data:
- Creating human-annotated training data for AI systems, particularly conversational computing systems, is time-consuming, requires skilled annotators, and can be error-prone.
- Maintaining the correctness and relevance of large training datasets is challenging due to:
  - Human errors and inconsistencies: Annotators may interpret inputs differently.
  - Evolving system capabilities and syntax: APIs and programming languages used in annotations may change.
  - Shifting user expectations: How users intend the system to respond to certain inputs can evolve.
- These challenges necessitate methods for efficiently updating existing training data instead of relying solely on creating new annotations from scratch.
Proposed Solution: Targeted Adaptation of Training Examples:
- The patent introduces a computer-implemented method for adapting previously-annotated training examples.
- The key components of this method are:
  - Find Expression: A pattern or specific piece of content to be located within the training data.
  - Replacement Expression: The new content that will replace the identified instances of the find expression.
  - Filtering Constraint: A set of rules or criteria used to select a specific subset of the existing training examples where the replacement should occur. This allows for targeted updates without affecting the entire dataset.
- The process involves:
  - Recognizing the find expression, replacement expression, and filtering constraint.
  - Identifying instances of the find expression within the subset of training examples that meet the filtering constraint.
  - Replacing these identified instances with the replacement expression to generate an updated subset of training examples.
  - Outputting the updated subset, which can then be used for training the machine learning model.
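The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the patent's implementation: the `TrainingExample` fields, the `migrate` function, and the use of a regular expression as the find expression are all assumptions made for the example, as are the plan strings.

```python
import re
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class TrainingExample:
    # Hypothetical layout: an utterance paired with an annotated plan.
    example_id: str
    utterance: str
    plan: str

def migrate(examples, find_expr, replacement_expr, filtering_constraint):
    """Return updated copies of examples that pass the filter and contain a match."""
    pattern = re.compile(find_expr)  # assumption: find expression is a regex
    updated = []
    for ex in examples:
        if not filtering_constraint(ex):
            continue  # outside the targeted subset
        new_plan, count = pattern.subn(replacement_expr, ex.plan)
        if count:  # only emit examples that actually changed
            updated.append(replace(ex, plan=new_plan))
    return updated

examples = [
    TrainingExample("ex1", "book a room for tomorrow", "create_event(date=tomorrow())"),
    TrainingExample("ex2", "cancel my meeting", "delete_event(find_event())"),
    TrainingExample("ex3", "book lunch tomorrow", "create_event(date=tomorrow())"),
]

updated = migrate(
    examples,
    find_expr=r"\bcreate_event\(",
    replacement_expr="schedule_event(",
    filtering_constraint=lambda ex: "room" in ex.utterance,
)
```

Here only `ex1` is rewritten: `ex3` contains the find expression but fails the filtering constraint, and the original list is left unmodified.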
Filtering Constraints for Granular Control:
- Filtering constraints are crucial for ensuring that updates are applied only where necessary.
- They allow users to target specific “subregions” of the training data. These subregions can include:
  - Specific training examples identified by their identifiers.
  - Sections within a training example, such as the input utterance, the annotated plan (program fragment), or preamble information.
  - Training examples containing specific keywords within certain subregions.
  - Dialogues based on associated metadata like authors, timestamps, or descriptive tags.
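One way to picture subregion targeting is as small, composable predicates plus a replacement that touches a single named region. The sketch below is hypothetical throughout: the `Example` layout, the region names, and the helpers `by_ids`, `keyword_in`, `has_tag`, `all_of`, and `replace_in_region` are invented for illustration and do not come from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class Example:
    example_id: str
    regions: dict  # subregion name -> text, e.g. "utterance", "plan", "preamble"
    tags: set = field(default_factory=set)

def by_ids(ids):
    return lambda ex: ex.example_id in ids

def keyword_in(region, keyword):
    return lambda ex: keyword in ex.regions.get(region, "")

def has_tag(tag):
    return lambda ex: tag in ex.tags

def all_of(*constraints):
    # Combine several constraints; an example must satisfy every one.
    return lambda ex: all(c(ex) for c in constraints)

def replace_in_region(ex, region, find, replacement):
    """Return a copy with a literal find/replace applied to one subregion only."""
    regions = dict(ex.regions)
    if region in regions:
        regions[region] = regions[region].replace(find, replacement)
    return Example(ex.example_id, regions, set(ex.tags))

examples = [
    Example("d1", {"utterance": "email the report to Dan", "plan": "email(to='Dan')"}, {"mail"}),
    Example("d2", {"utterance": "email Sam", "plan": "email(to='Sam')"}, {"legacy"}),
]

constraint = all_of(keyword_in("utterance", "report"), has_tag("mail"))
migrated = [
    replace_in_region(ex, "plan", "email(", "send_mail(")
    for ex in examples
    if constraint(ex)
]
```

Because the replacement is scoped to the `plan` region, the word "email" in the utterance of `d1` is left alone even though it matches the find text.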
Migration Interface for User Interaction:
- The patent envisions a “migration interface” that allows human users (migrators) to define the find expression, replacement expression, and filtering constraints.
- This interface likely includes fields for inputting the expressions and tools for selecting and configuring filters based on various criteria (e.g., preamble inclusion, turn predicates, dialogue filters).
- Examples of dialogue filters include targeting by tags, authors, creation/update dates, dialogue IDs, annotation language, or the presence/absence of specific utterances.
- Turn predicates allow filtering based on conditions within individual turns of a dialogue (e.g., presence of specific tags or matching a given expression).
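The split between dialogue filters and turn predicates suggests a two-level selection: first narrow the dialogues, then narrow the turns inside them. The following sketch shows what a migration interface might collect from a migrator; the `MigrationSpec`, `Dialogue`, and `Turn` classes and all of their fields are assumptions for illustration, not structures named in the patent.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Turn:
    utterance: str
    plan: str
    tags: frozenset = frozenset()

@dataclass
class Dialogue:
    dialogue_id: str
    author: str
    created: date
    turns: list

@dataclass
class MigrationSpec:
    # What a migrator would enter in the (hypothetical) interface.
    find: str
    replacement: str
    dialogue_filters: list = field(default_factory=list)
    turn_predicates: list = field(default_factory=list)

    def selected_turns(self, dialogues):
        """Yield (dialogue_id, turn_index) for every turn the spec targets."""
        for d in dialogues:
            if not all(f(d) for f in self.dialogue_filters):
                continue
            for i, turn in enumerate(d.turns):
                if all(p(turn) for p in self.turn_predicates):
                    yield d.dialogue_id, i

dialogues = [
    Dialogue("a", "alice", date(2023, 1, 5), [
        Turn("hi", "greet()"),
        Turn("book it", "create_event()", frozenset({"calendar"})),
    ]),
    Dialogue("b", "bob", date(2023, 3, 1), [
        Turn("book it", "create_event()", frozenset({"calendar"})),
    ]),
]

spec = MigrationSpec(
    find="create_event",
    replacement="schedule_event",
    dialogue_filters=[lambda d: d.author == "alice",
                      lambda d: d.created >= date(2023, 1, 1)],
    turn_predicates=[lambda t: "calendar" in t.tags,
                     lambda t: "create_event" in t.plan],
)

selected = list(spec.selected_turns(dialogues))
```

Keeping the filters as plain callables makes it straightforward to add further criteria (dialogue IDs, annotation language, presence of an utterance) without changing the selection loop.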
Preview Functionality for Testing and Validation:
- Recognizing that migrations on large datasets can be time-consuming, the patent describes a “preview feature.”
- This feature allows users to apply the defined migration to a smaller “previewed portion” of the training data before committing to the full migration.
- Users can review the changes and any errors introduced in the preview to ensure the migration behaves as intended.
- A “preview constraint” can be used to define the scope or size of the previewed portion.
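A preview can be thought of as running the same find-and-replace on a bounded sample and collecting before/after pairs together with any errors. The sketch below assumes, purely for illustration, that plans are strings, that the preview constraint is a simple count (`preview_limit`), and that an optional `validate` callback raises an exception when a migrated plan is malformed; none of these names come from the patent.

```python
import ast
import itertools
import re

def preview_migration(plans, find_expr, replacement_expr, preview_limit=2, validate=None):
    """Apply the migration to at most preview_limit matching plans and report results."""
    pattern = re.compile(find_expr)
    matching = (p for p in plans if pattern.search(p))
    results = []
    for before in itertools.islice(matching, preview_limit):
        after = pattern.sub(replacement_expr, before)
        error = None
        if validate is not None:
            try:
                validate(after)
            except Exception as exc:  # record the problem instead of aborting the preview
                error = str(exc)
        results.append({"before": before, "after": after, "error": error})
    return results

plans = ["find_event(days=7)", "greet()", "find_event(days=14)", "find_event(days=21)"]

# Assumption: plans happen to be Python-like, so ast.parse stands in for a real validator.
good = preview_migration(plans, r"find_event\(", "search_events(", validate=ast.parse)
bad = preview_migration(plans, r"find_event\(", "search_events", validate=ast.parse)
```

In the second call the replacement drops the opening parenthesis, so each previewed plan carries a non-empty `error`, which is the kind of problem a migrator would want to catch before committing the full migration.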
Application to Conversational Computing Systems (Among Others):
- The patent frequently uses the example of training conversational computing systems, where training examples consist of user utterances paired with desired annotated responses (e.g., computer-executable plans).
- However, the disclosed techniques are presented as potentially applicable to adapting training data for other types of machine learning models as well.
Workflow Integration:
- The updated training examples can be used in conjunction with the original, unchanged examples to train or retrain a machine learning model.
- The patent also mentions the potential integration of annotation and migration tools within the same training program or across different systems.
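Combining updated and unchanged examples amounts to overlaying the migrated subset on the original dataset. A minimal sketch, assuming each example is a dictionary with a unique `"id"` key (a hypothetical layout chosen for the example):

```python
def merge_for_training(original, updated):
    """Return the original examples in order, with migrated versions substituted by id."""
    updated_by_id = {ex["id"]: ex for ex in updated}
    return [updated_by_id.get(ex["id"], ex) for ex in original]

original = [{"id": 1, "plan": "greet()"}, {"id": 2, "plan": "create_event()"}]
updated = [{"id": 2, "plan": "schedule_event()"}]

training_set = merge_for_training(original, updated)
```

The merged list keeps the original ordering and size, so it can be handed to whatever training or retraining step follows.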
Examples of Migration Scenarios:
- The patent provides several illustrative examples of how these techniques could be used:
  - Correcting inconsistent interpretations of user utterances in annotations.
  - Migrating annotations to use new or updated APIs.
  - Adapting annotations to reflect evolving programming syntax or formatting.
  - Adjusting system responses based on changing user expectations or regional differences.
  - Changing units in annotations (e.g., days to weeks in date calculations).
  - Modifying how the system searches for meeting attendees.
  - Refactoring annotation structures (e.g., representing date ranges with a constraint object).
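The unit-change scenario shows why a replacement expression may need to compute a value from what the find expression matched rather than substitute fixed text. The sketch below is hypothetical: the `days=` and `weeks=` argument names and the `shift_date` plan syntax are invented for illustration, and leaving non-multiples of seven untouched is a design choice of this example, not something the patent specifies.

```python
import re

DAYS_ARG = re.compile(r"\bdays=(\d+)")

def days_to_weeks(plan):
    """Rewrite days=N as weeks=N/7 when N is a whole number of weeks."""
    def convert(match):
        days = int(match.group(1))
        if days % 7 != 0:
            return match.group(0)  # not expressible in whole weeks; leave as-is
        return f"weeks={days // 7}"
    return DAYS_ARG.sub(convert, plan)

converted = days_to_weeks("shift_date(today(), days=14)")
unchanged = days_to_weeks("shift_date(today(), days=10)")
```

Plans that cannot be converted cleanly are passed through unchanged, which leaves them available for manual review instead of silently altering their meaning.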