Add backwards compatibility for add_pattern_pair
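
The compatibility shim in this commit can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in `pattern_pair_aggregator.py`: `SketchAggregator` and its `patterns` dict are hypothetical stand-ins, and only the argument mapping (`pattern_id` to `type`, `remove_match` to `MatchAction.REMOVE`/`KEEP`) mirrors the diff below.

```python
import warnings
from enum import Enum


class MatchAction(Enum):
    REMOVE = "remove"
    KEEP = "keep"
    AGGREGATE = "aggregate"


class SketchAggregator:
    """Hypothetical stand-in for PatternPairAggregator (illustration only)."""

    def __init__(self):
        # Assumption: patterns are stored in a dict keyed by type.
        self.patterns = {}

    def add_pattern(self, type, start_pattern, end_pattern, action):
        # New-style registration: the caller passes a type and a MatchAction.
        self.patterns[type] = {
            "start": start_pattern,
            "end": end_pattern,
            "action": action,
        }
        return self

    def add_pattern_pair(self, pattern_id, start_pattern, end_pattern, remove_match=True):
        # Old-style registration: warn, then forward to add_pattern.
        warnings.warn(
            "add_pattern_pair is deprecated; use add_pattern with a type and MatchAction",
            DeprecationWarning,
            stacklevel=2,
        )
        action = MatchAction.REMOVE if remove_match else MatchAction.KEEP
        return self.add_pattern(pattern_id, start_pattern, end_pattern, action)


aggregator = SketchAggregator()

# Legacy call sites keep working, but emit a DeprecationWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    aggregator.add_pattern_pair("voice", "<voice>", "</voice>", remove_match=False)

# Migrated call sites name the action explicitly.
aggregator.add_pattern("code", "<code>", "</code>", action=MatchAction.AGGREGATE)
```

Under this sketch, existing callers of `add_pattern_pair` need no immediate change, while `MatchAction.AGGREGATE` is only reachable through `add_pattern`, since `remove_match` can express just two of the three actions.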

mattieruth · mattieruth · commit 6b2d9b3bc8e9 · 2025-11-13T13:23:52.000-05:00
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -51,8 +51,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   - Introduced a new `Aggregation` dataclass to represent both the aggregated `text` and
     a string identifying the `type` of aggregation (ex. "sentence", "word", "my custom
     aggregation")
-    # MRKB TODO -- don't break. leave pattern_id as-is and remove 'type', so that the old
-              remove param can remain
   - **BREAKING**: `BaseTextAggregator.text` now returns an `Aggregation` (instead of `str`).
     To update: `aggregated_text = myAggregator.text` -> `aggregated_text = myAggregator.text.text`
   - **BREAKING**: `BaseTextAggregator.aggregate()` now returns `Optional[Aggregation]`
@@ -65,36 +63,31 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   - `SimpleTextAggregator`, `SkipTagsAggregator`, `PatternPairAggregator` updated to
     produce/consume `Aggregation` objects.
 
-- Augmented the `PatterPairAggregator`:
-  - **BREAKING CHANGES**: Support for the items below resulted in two breaking changes to the
-    `PatternPairAggregator` methods
-    1. The `add_pattern_pair` method arguments have changed:
-        - The `pattern_id` argument is now `type`
-        - The `remove_match` argument has been replaced with the `action` argument. To update,
-          change `remove_match: True` to `action: MatchAction.REMOVE` or `remove_match: False` to
-          `action: MatchAction.KEEP`
-    2. The `PatternMatch` type returned to handlers registered via `on_pattern_match` has been
-       updated to subclass from the new `Aggregation` type, which means that `content` has been
-       replaced with `text` and `pattern_id` has been replaced with `type`:
-         ```
-         async dev on_match_tag(match: PatternMatch):
-            pattern = match.type # instead of match.pattern_id
-            text = match.text # instead of match.content
-         ```
-  - `PatternPairAggregator` now supports `type` and `action` per pattern.
-  - New `MatchAction` enum: `REMOVE`, `KEEP`, `AGGREGATE`, allowing customization for how
-    a match should be handled.
-    - `REMOVE`: The text along with its delimiters will be removed from the streaming text.
-                Sentence aggregation will continue on as if this text did not exist.
-    - `KEEP`: The delimiters will be removed, but the content between them will be kept.
-              Sentence aggregation will continue on with the internal text included.
-    - `AGGREGATE`: The delimiters will be removed and the content between will be treated
-              as a separate aggregation. Any text before the start of the pattern will be
-              returned early, whether or not a complete sentence was found. Then the pattern
-              will be returned. Then the aggregation will continue on sentence matching after
-              the closing delimiter is found. The content between the delimiters is not
-              aggregated by sentence. It is aggregated as one single block of text.
-    - `PatternMatch` now extends `Aggregation` and provides richer info to handlers.
+- Augmented the `PatternPairAggregator`:
+  - Introduced a new, preferred `add_pattern` method that supports treating a match as a
+    separate aggregation returned from `aggregate()`. It replaces the now-deprecated
+    `add_pattern_pair` method and takes a `MatchAction` in lieu of the `remove_match` field.
+    - `MatchAction` enum: `REMOVE`, `KEEP`, `AGGREGATE`, allowing customization for how
+      a match should be handled.
+      - `REMOVE`: The text along with its delimiters will be removed from the streaming text.
+                  Sentence aggregation will continue on as if this text did not exist.
+      - `KEEP`: The delimiters will be removed, but the content between them will be kept.
+                Sentence aggregation will continue on with the internal text included.
+      - `AGGREGATE`: The delimiters will be removed and the content between will be treated
+                as a separate aggregation. Any text before the start of the pattern will be
+                returned early, whether or not a complete sentence was found. Then the pattern
+                will be returned. Then the aggregation will continue on sentence matching after
+                the closing delimiter is found. The content between the delimiters is not
+                aggregated by sentence. It is aggregated as one single block of text.
+      - `PatternMatch` now extends `Aggregation` and provides richer info to handlers.
+  - **BREAKING**: The `PatternMatch` type returned to handlers registered via `on_pattern_match`
+     has been updated to subclass from the new `Aggregation` type, which means that `content`
+     has been replaced with `text` and `pattern_id` has been replaced with `type`:
+       ```
+       async def on_match_tag(match: PatternMatch):
+          pattern = match.type # instead of match.pattern_id
+          text = match.text # instead of match.content
+       ```
 
 ### Changed
 
@@ -127,6 +120,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   behavior, but if you want to override the aggregation behavior, you should use the new
   processor.
 
+- Deprecated `add_pattern_pair` in the `PatternPairAggregator`, which takes a `pattern_id`
+  and a `remove_match` field, in favor of the new `add_pattern` method, which takes a `type`
+  and an `action`.
+
 ### Fixed
 
 - Fixed subtle issue of assistant context messages ending up with double spaces
diff --git a/examples/foundational/35-pattern-pair-voice-switching.py b/examples/foundational/35-pattern-pair-voice-switching.py
@@ -110,7 +110,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
     pattern_aggregator = PatternPairAggregator()
 
     # Add pattern for voice switching
-    pattern_aggregator.add_pattern_pair(
+    pattern_aggregator.add_pattern(
         type="voice",
         start_pattern="<voice>",
         end_pattern="</voice>",
diff --git a/src/pipecat/extensions/ivr/ivr_navigator.py b/src/pipecat/extensions/ivr/ivr_navigator.py
@@ -118,15 +118,15 @@ def _get_conversation_history(self) -> List[dict]:
     def _setup_xml_patterns(self):
         """Set up XML pattern detection and handlers."""
         # Register DTMF pattern
-        self._aggregator.add_pattern_pair("dtmf", "<dtmf>", "</dtmf>", action=MatchAction.REMOVE)
+        self._aggregator.add_pattern("dtmf", "<dtmf>", "</dtmf>", action=MatchAction.REMOVE)
         self._aggregator.on_pattern_match("dtmf", self._handle_dtmf_action)
 
         # Register mode pattern
-        self._aggregator.add_pattern_pair("mode", "<mode>", "</mode>", action=MatchAction.REMOVE)
+        self._aggregator.add_pattern("mode", "<mode>", "</mode>", action=MatchAction.REMOVE)
         self._aggregator.on_pattern_match("mode", self._handle_mode_action)
 
         # Register IVR pattern
-        self._aggregator.add_pattern_pair("ivr", "<ivr>", "</ivr>", action=MatchAction.REMOVE)
+        self._aggregator.add_pattern("ivr", "<ivr>", "</ivr>", action=MatchAction.REMOVE)
         self._aggregator.on_pattern_match("ivr", self._handle_ivr_action)
 
     async def process_frame(self, frame: Frame, direction: FrameDirection):
diff --git a/src/pipecat/utils/text/pattern_pair_aggregator.py b/src/pipecat/utils/text/pattern_pair_aggregator.py
@@ -25,9 +25,16 @@ class MatchAction(Enum):
     """Actions to take when a pattern pair is matched.
 
     Parameters:
-        REMOVE: Remove the matched pattern from the text.
-        KEEP: Keep the matched pattern in the text as normal text.
-        AGGREGATE: Return the matched pattern as a separate aggregation object.
+        REMOVE: The text along with its delimiters will be removed from the streaming text.
+              Sentence aggregation will continue on as if this text did not exist.
+        KEEP: The delimiters will be removed, but the content between them will be kept.
+              Sentence aggregation will continue on with the internal text included.
+        AGGREGATE: The delimiters will be removed and the content between will be treated
+              as a separate aggregation. Any text before the start of the pattern will be
+              returned early, whether or not a complete sentence was found. Then the pattern
+              will be returned. Then the aggregation will continue on sentence matching after
+              the closing delimiter is found. The content between the delimiters is not
+              aggregated by sentence. It is aggregated as one single block of text.
     """
 
     REMOVE = "remove"
@@ -106,7 +113,7 @@ def text(self) -> Aggregation:
             return Aggregation(self._text, pattern_start[1].get("type", "sentence"))
         return Aggregation(self._text, "sentence")
 
-    def add_pattern_pair(
+    def add_pattern(
         self,
         type: str,
         start_pattern: str,
@@ -148,6 +155,46 @@ def add_pattern_pair(
         }
         return self
 
+    def add_pattern_pair(
+        self, pattern_id: str, start_pattern: str, end_pattern: str, remove_match: bool = True
+    ):
+        """Add a pattern pair to detect in the text.
+
+        .. deprecated:: 0.0.95
+            This function is deprecated and will be removed in a future version.
+            Use `add_pattern` with a type and MatchAction instead.
+
+            This method calls `add_pattern` setting type with the provided pattern_id and action
+            to either MatchAction.REMOVE or MatchAction.KEEP based on `remove_match`.
+
+        Args:
+            pattern_id: Identifier for this pattern pair. Should be unique and ideally descriptive.
+                        (e.g., 'code', 'speaker', 'custom'). pattern_id cannot be 'sentence' as that is
+                        reserved for the default behavior.
+            start_pattern: Pattern that marks the beginning of content.
+            end_pattern: Pattern that marks the end of content.
+            remove_match: If True, the matched pattern will be removed from the text. (Same as MatchAction.REMOVE)
+                          If False, it will be kept and treated as normal text. (Same as MatchAction.KEEP)
+        """
+        import warnings
+
+        with warnings.catch_warnings():
+            warnings.simplefilter("once")
+            warnings.warn(
+                "add_pattern_pair with a pattern_id or remove_match is deprecated and will be"
+                " removed in a future version. Use add_pattern with a type and MatchAction instead",
+                DeprecationWarning,
+                stacklevel=2,
+            )
+
+        action = MatchAction.REMOVE if remove_match else MatchAction.KEEP
+        return self.add_pattern(
+            type=pattern_id,
+            start_pattern=start_pattern,
+            end_pattern=end_pattern,
+            action=action,
+        )
+
     def on_pattern_match(
         self, type: str, handler: Callable[[PatternMatch], Awaitable[None]]
     ) -> "PatternPairAggregator":
diff --git a/tests/test_pattern_pair_aggregator.py b/tests/test_pattern_pair_aggregator.py
@@ -22,12 +22,11 @@ def setUp(self):
 
         # Add a test pattern
         self.aggregator.add_pattern_pair(
-            type="test_pattern",
+            pattern_id="test_pattern",
             start_pattern="<test>",
             end_pattern="</test>",
-            action=MatchAction.REMOVE,
         )
-        self.aggregator.add_pattern_pair(
+        self.aggregator.add_pattern(
             type="code_pattern",
             start_pattern="<code>",
             end_pattern="</code>",
@@ -119,14 +118,14 @@ async def test_multiple_patterns(self):
         voice_handler = AsyncMock()
         emphasis_handler = AsyncMock()
 
-        self.aggregator.add_pattern_pair(
+        self.aggregator.add_pattern(
             type="voice",
             start_pattern="<voice>",
             end_pattern="</voice>",
             action=MatchAction.REMOVE,
         )
 
-        self.aggregator.add_pattern_pair(
+        self.aggregator.add_pattern(
             type="emphasis",
             start_pattern="<em>",
             end_pattern="</em>",