Skip to content

fix(prediction): output_iterator silently yields individual characters for string output#461

Open
devteamaegis wants to merge 1 commit into
replicate:mainfrom
devteamaegis:fix/output-iterator-non-list-output
Open

fix(prediction): output_iterator silently yields individual characters for string output#461
devteamaegis wants to merge 1 commit into
replicate:mainfrom
devteamaegis:fix/output-iterator-non-list-output

Conversation

@devteamaegis
Copy link
Copy Markdown

Summary

output_iterator() and async_output_iterator() use self.output or [] to coerce the model's output to a list. This is only safe when output is None (not started yet) or an actual list.

For non-array model outputs — plain strings, URLs, dicts — the falsy-or pattern silently returns the raw value unchanged:

# output_iterator with output = "hello world"
previous_output = self.output or []  # → "hello world"
# ...
output = self.output or []           # → "hello world"
new_output = output[len(previous_output):]  # string slice, not list slice
yield from new_output                # silently yields 'h', 'e', 'l', 'l', ...

Fix

Replace value or [] with an explicit isinstance check:

def _as_list(value):
    if value is None:
        return []
    if isinstance(value, list):
        return value
    raise ValueError(
        f"output_iterator requires an array output type, "
        f"but the model returned a {type(value).__name__!r}. "
        f"Use prediction.output directly for non-array outputs."
    )

This resolves the long-standing # TODO: check output is list comments in both methods.

Bonus fix

The async_output_iterator loop also had a variable-shadowing bug:

# Before — outer `output` list is clobbered by the loop variable
for output in new_output:
    yield output

# After
for item in new_output:
    yield item

Tests

Six new unit tests added to tests/test_prediction.py covering:

  • None output → empty iteration (no error)
  • String output → ValueError (sync and async)
  • Dict output → ValueError (sync)
  • Completed prediction with list output → empty iteration (all tokens were "seen" at start)

…acter-by-character

Both output_iterator and async_output_iterator used ``self.output or []``
to coerce the model's output to a list.  This idiom is only safe when
output is None (not started yet) or an actual list.

For non-array model outputs (plain string, URL, dict) the falsy-or pattern
silently returns the raw value.  Calling ``yield from "hello world"`` then
iterates over 11 individual characters instead of the intended single token,
and ``output[len(previous_output):]`` on a string produces a string slice
rather than a list slice — causing subtly wrong behaviour with no error.

Fix: replace ``value or []`` with an explicit isinstance check that accepts
None (treated as an empty list, meaning "no output yet") and list, and
raises a descriptive ValueError for anything else so callers know to use
``prediction.output`` directly for non-streaming models.

Also fixes the variable shadowing bug in async_output_iterator where the
loop variable was named ``output`` instead of ``item``, clobbering the
outer ``output`` reference on each iteration.

Resolves the long-standing ``# TODO: check output is list`` comments.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant