# Modify BulkInferrer TFX Component

| Status        | Proposed                                                                                        |
:-------------- |:----------------------------------------------------------------------------------------------|
| **RFC #**     | [NNN](https://github.com/tensorflow/community/pull/NNN) (update when you have community PR #)  |
| **Author(s)** | Abin Thomas ([email protected]), Iain Stitt ([email protected])                                 |
| **Sponsor**   | Robert Crowe ([email protected])                                                                |
| **Updated**   | 2022-06-20                                                                                      |

## Objective

Modify the [BulkInferrer](https://github.com/tensorflow/tfx/tree/master/tfx/components/bulk_inferrer) TFX component.

Changes:
* Store only a subset of features in the `output_examples` artifact.
* Support inference on multiple models.

## Motivation

The BulkInferrer TFX component performs batch inference on unlabeled tf.Examples.
The generated output examples contain the original features plus the prediction results.
Keeping all original features in the output is problematic for feature-heavy models;
for most use cases only the example identifiers and the predictions are needed in the output.

In machine learning it is common practice to train multiple models on the same feature set to perform different (or sometimes the same) tasks.
It would be convenient for BulkInferrer to support multi-model inference: the component should take a list of models and produce predictions for all of them.

## User Benefit

Filtering down the number of features in the output reduces the storage space needed for artifacts.
It also allows larger batch sizes in downstream processing and reduces the chance of OOM issues.

Running multi-model inference as separate BulkInferrer instances requires joining the outputs on some identifier, which is computationally expensive.
With this update the user can post-process the combined output directly, without joining separate outputs.

## Design Proposal

The proposal covers two independent changes: filtering the features kept in the `output_examples` artifact, and running inference with multiple models in a single component execution.

### Filter Output Features

Whether the component keeps all features or only a subset is controlled by an additional field in the `OutputExampleSpec` proto.
The updated proto will look like this:
```protobuf
message OutputExampleSpec {
  // Defines how the inference results map to columns in the output example.
  repeated OutputColumnsSpec output_columns_spec = 3;

  // List of features to keep in the output examples.
  repeated string example_features = 5;

  reserved 1, 2, 4;
}
```
`example_features` expects a list of feature names to be persisted in the output. The component will not filter if an empty list is provided.
The check and filtering will be performed in [prediction_to_example_utils.py](https://github.com/tensorflow/tfx/blob/master/tfx/components/bulk_inferrer/prediction_to_example_utils.py#L86).

Check:
```python
def convert(prediction_log: prediction_log_pb2.PredictionLog,
            output_example_spec: _OutputExampleSpecType) -> tf.train.Example:
  # ... existing conversion logic that builds `example` and `output_features` ...

  if len(output_example_spec.example_features) > 0:
    example = _filter_columns(example, output_example_spec)

  return _add_columns(example, output_features)
```
The `_filter_columns` function:
```python
def _filter_columns(example: tf.train.Example,
                    output_example_spec: _OutputExampleSpecType) -> tf.train.Example:
  """Removes features not listed in output_example_spec.example_features."""
  all_features = list(example.features.feature)
  for feature in all_features:
    if feature not in output_example_spec.example_features:
      del example.features.feature[feature]
  return example
```

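For illustration, a single-model configuration under the proposed schema could look like the sketch below. The feature names `example_id` and `timestamp` and the output key `scores` are placeholders that depend on the actual data and model signature.
```python
from tfx.proto import bulk_inferrer_pb2

# Keep only two identifier features in output_examples and attach the
# model's 'scores' output as a new 'prediction_score' column.
output_example_spec = bulk_inferrer_pb2.OutputExampleSpec(
    example_features=['example_id', 'timestamp'],
    output_columns_spec=[
        bulk_inferrer_pb2.OutputColumnsSpec(
            predict_output=bulk_inferrer_pb2.PredictOutput(output_columns=[
                bulk_inferrer_pb2.PredictOutputCol(
                    output_key='scores', output_column='prediction_score')
            ]))
    ])
```
The spec is passed to the component through the existing `output_example_spec` parameter; leaving `example_features` empty keeps all input features and matches today's behavior.
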
### Multi-model Inference

For multi-model inference, the component will expect a union channel of models as input.
The executor's [RunInference](https://github.com/tensorflow/tfx/blob/master/tfx/components/bulk_inferrer/executor.py#L253) step will use the [RunInferencePerModel](https://github.com/tensorflow/tfx-bsl/blob/master/tfx_bsl/public/beam/run_inference.py#L101) transform from tfx-bsl,
which returns a tuple of prediction logs instead of a single log.
In subsequent steps these logs are merged to produce a single tf.Example.
If raw inference results are requested, the component will save the prediction logs in per-model subdirectories of the `inference_result` artifact.

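For illustration only, a minimal standalone Beam sketch of what `RunInferencePerModel` produces; the model paths and example file pattern are placeholders, and the executor wires this up through its own read/write transforms.
```python
import apache_beam as beam
import tensorflow as tf
from tfx_bsl.public.beam import run_inference
from tfx_bsl.public.proto import model_spec_pb2

# One InferenceSpecType per model; the output tuples preserve this order.
inference_spec_types = [
    model_spec_pb2.InferenceSpecType(
        saved_model_spec=model_spec_pb2.SavedModelSpec(model_path=path))
    for path in ['/tmp/serving_model_a', '/tmp/serving_model_b']
]

with beam.Pipeline() as pipeline:
  _ = (
      pipeline
      | 'ReadExamples' >> beam.io.ReadFromTFRecord(
          '/tmp/examples-*.gz',
          coder=beam.coders.ProtoCoder(tf.train.Example))
      # Each output element is a Tuple[PredictionLog, ...], one entry per model.
      | 'RunInferencePerModel' >> run_inference.RunInferencePerModel(
          inference_spec_types))
```
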
#### Changes to input protos

The `model_spec` and `output_example_spec` parameters expect `ModelSpec` and `OutputExampleSpec` protos respectively.
To support multiple models while remaining backward compatible, self-referencing proto definitions can be used.

`model_spec`:
```protobuf
message ModelSpec {
  // Specifies the signature name to run the inference with. If multiple
  // signature names are specified (ordering doesn't matter), inference is done
  // as a multi head model. If nothing is specified, default serving signature
  // is used as a single head model.
  repeated string model_signature_name = 2;

  // Tags to select metagraph from the saved model. If unspecified, the default
  // tag selects metagraph to run inference on CPUs. See some valid values in
  // tensorflow.saved_model.tag_constants.
  repeated string tag = 5;

  // Handle multiple ModelSpec messages.
  repeated ModelSpec model_specs = 7;

  reserved 1, 3, 4, 6;
}
```

`output_example_spec`:
```protobuf
message OutputExampleSpec {
  // Defines how the inference results map to columns in the output example.
  repeated OutputColumnsSpec output_columns_spec = 3;

  // List of features to keep in the output examples.
  repeated string example_features = 5;

  // Handle multiple OutputExampleSpec messages.
  repeated OutputExampleSpec output_example_specs = 6;

  reserved 1, 2, 4;
}
```
Parsing both protos requires additional validation checks to distinguish between a single-model spec and a multi-model spec.

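For illustration, constructing the proposed self-referencing protos for two models might look like the sketch below; the output key `scores`, the column names, and the `example_id` feature name are placeholders.
```python
from tfx.proto import bulk_inferrer_pb2


def _predict_columns(prefix: str) -> bulk_inferrer_pb2.OutputColumnsSpec:
  # Maps the model's 'scores' output to a prefixed output column.
  return bulk_inferrer_pb2.OutputColumnsSpec(
      predict_output=bulk_inferrer_pb2.PredictOutput(output_columns=[
          bulk_inferrer_pb2.PredictOutputCol(
              output_key='scores', output_column=prefix + '_score')
      ]))


# Two models, each queried through its default serving signature.
model_spec = bulk_inferrer_pb2.ModelSpec(model_specs=[
    bulk_inferrer_pb2.ModelSpec(model_signature_name=['serving_default']),
    bulk_inferrer_pb2.ModelSpec(model_signature_name=['serving_default']),
])

# One nested spec per model, in the same order as model_specs; the top-level
# example_features controls which input features are kept.
output_example_spec = bulk_inferrer_pb2.OutputExampleSpec(
    example_features=['example_id'],
    output_example_specs=[
        bulk_inferrer_pb2.OutputExampleSpec(
            output_columns_spec=[_predict_columns('model_a')]),
        bulk_inferrer_pb2.OutputExampleSpec(
            output_columns_spec=[_predict_columns('model_b')]),
    ])
```
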
#### Changes to input channels

The `model` and `model_blessing` parameters can be either of type [BaseChannel](https://github.com/tensorflow/tfx/blob/master/tfx/types/channel.py#L51) or [UnionChannel](https://github.com/tensorflow/tfx/blob/master/tfx/types/channel.py#L363).
If a BaseChannel is passed as input, the component will convert it to a single-item UnionChannel before invoking the executor:
```python
if model and (not isinstance(model, types.channel.UnionChannel)):
  model = types.channel.union([model])
if model_blessing and (not isinstance(model_blessing, types.channel.UnionChannel)):
  model_blessing = types.channel.union([model_blessing])
```
If any of the models is not blessed, the executor will return without doing inference.

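For illustration, pipeline-level wiring with two upstream Trainer/Evaluator pairs might look like the sketch below; all upstream component names are hypothetical, and `model_spec`/`output_example_spec` are the nested protos sketched in the previous section.
```python
from tfx import types
from tfx.components import BulkInferrer

bulk_inferrer = BulkInferrer(
    examples=example_gen.outputs['examples'],
    model=types.channel.union([
        trainer_a.outputs['model'],
        trainer_b.outputs['model'],
    ]),
    model_blessing=types.channel.union([
        evaluator_a.outputs['blessing'],
        evaluator_b.outputs['blessing'],
    ]),
    model_spec=model_spec,
    output_example_spec=output_example_spec)
```
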
#### Changes to the `inference_result` Beam write pipeline

If raw inference results are requested, the component will save the prediction logs of the i-th model under the i-th subdirectory of the `inference_result` artifact:
```python
if inference_result:
  data = (
      data_list
      | 'FlattenInferenceResult' >> beam.Flatten(pipeline=pipeline))
  for i in range(len(inference_endpoints)):
    _ = (
        data
        # Bind i as a default argument so each stage selects its own model's log.
        | 'SelectPredictionLog[{}]'.format(i) >> beam.Map(lambda x, i=i: x[i])
        | 'WritePredictionLogs[{}]'.format(i) >> beam.io.WriteToTFRecord(
            os.path.join(inference_result.uri, str(i), _PREDICTION_LOGS_FILE_NAME),
            file_name_suffix='.gz',
            coder=beam.coders.ProtoCoder(prediction_log_pb2.PredictionLog)))
```

#### Changes to the prediction-to-example convert function

In the case of multiple prediction logs, the original features are extracted from the first one.
```python
def convert(prediction_logs: Tuple[prediction_log_pb2.PredictionLog, ...],
            output_example_spec: _OutputExampleSpecType) -> tf.train.Example:
  """Converts the given `prediction_logs` to a `tf.train.Example`.

  Args:
    prediction_logs: The input prediction logs, one per model.
    output_example_spec: The spec for how to map prediction results to columns
      in the example.

  Returns:
    A `tf.train.Example` converted from the given prediction_logs.

  Raises:
    ValueError: If the inference type or signature name in the spec does not
      match that in the prediction logs.
  """
  is_single_output_example_spec = bool(output_example_spec.output_columns_spec)
  is_multiple_output_example_spec = bool(output_example_spec.output_example_specs)

  if (not is_single_output_example_spec) and (not is_multiple_output_example_spec):
    raise ValueError('Invalid output_example_spec: %s' % output_example_spec)
  elif is_single_output_example_spec and (not is_multiple_output_example_spec):
    specs = [output_example_spec]
  elif (not is_single_output_example_spec) and is_multiple_output_example_spec:
    specs = output_example_spec.output_example_specs
    if len(prediction_logs) != len(specs):
      raise ValueError('Number of prediction logs does not match the number of '
                       'output_example_specs: %s' % output_example_spec)
  else:
    raise ValueError('output_example_spec must not set both output_columns_spec '
                     'and output_example_specs: %s' % output_example_spec)

  # Original features come from the first prediction log.
  example = _parse_examples(prediction_logs[0])
  # Prediction columns are collected from every (log, spec) pair.
  output_features = [
      _parse_output_feature(prediction_log, example_spec.output_columns_spec)
      for prediction_log, example_spec in zip(prediction_logs, specs)
  ]

  if len(output_example_spec.example_features) > 0:
    example = _filter_columns(example, output_example_spec)

  return _add_columns(example, output_features)
```

### Alternatives Considered

### Performance Implications
Neutral.

### Dependencies
No new dependencies are introduced.

### Engineering Impact

### Platforms and Environments
No special considerations across different platforms and environments.

### Best Practices
No change in best practices.

### Tutorials and Examples
API docs will be updated.

### Compatibility
The proto and input channel changes are backward compatible.

### User Impact

## Questions and Discussion Topics
* Is it okay to use self-referencing proto definitions for backward compatibility?