Skip to content

Dedicate the line-by-line reader from FD a separate method#186

Merged
praiskup merged 4 commits into
mainfrom
praiskup-library-method
May 25, 2026
Merged

Dedicate the line-by-line reader from FD a separate method#186
praiskup merged 4 commits into
mainfrom
praiskup-library-method

Conversation

@praiskup

Copy link
Copy Markdown
Owner

No description provided.

praiskup added 2 commits May 18, 2026 14:35
Separate the select/read/buffer line-splitting logic into a reusable
generator that supports multiple file descriptors.

Assisted-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Group EOF logic together and yield partial lines as-is instead of
injecting an artificial newline character.

Drop discard_partial_line parameter.  The parameter doesn't need to be
set to True value.  Partial lines at EOF are now always yielded as-is
(except for timeouts), without a trailing newline.

Assisted-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@praiskup praiskup changed the title Praiskup library method Dedicate the line-by-line reader from FD a separate method May 18, 2026
@praiskup praiskup requested a review from FrostyX May 21, 2026 06:46

@FrostyX FrostyX left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for the follow-up @praiskup,
I didn't have the energy to do this last step.

Some comments in the tests but overall it looks good to me :-)

assert result == [
(fd, b"first\n"),
(fd, b"second"),
]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a change of behavior from PR #182, so I'll explain the reasoning that we had with @nikromen.

This is a generic function, and we thought it probably depends on the situation how we want to treat partial lines. When processing some output just so that we can log it and display it to the users, for example, we probably want to read and show as much as possible. In such case, this would be the desired assert.

However, in the case of our cmd_list, the partial line might be missing more than just the \n and therefore the resource name might be mangled. Trying to delete it might lead to errors that such a resource doesn't exist, or worst-case-scenario, deleting a different resource.

I don't think it is a big concern, because this is probably a rare corner case, which probably can't even happen in our resalloc setup. But I wanted to point out the change.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually did the behavior change intentionally, because I didn't see a practical use for the other approach (and if, detecting missing \n should be easy on the caller's side).

We never yield incomplete lines? Only if it is the last one (missing one before EOF). And unless there's a timeout, it means the file was successfully read from beginning to the end. I am probably missing something important here, please elaborate (we can have a meeting if you think it is worth it).

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can talk about it on a meeting, it's likely that I am missing something, not you.

detecting missing \n should be easy on the caller's side

Agreed. And that's definitely a better place where to handle it. So +1 to this change, I see no problem with it, I just wanted to point out the difference so we are all aware of it.

assert set(result) == {
(fd1, b"from-fd1\n"),
(fd2, b"from-fd2\n"),
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this works 100% as expected. When I have this cmd_list script:

#!/usr/bin/python3 -u

import sys

sys.stdout.write("LINE1\n")
sys.stderr.write("LINE2\n")
sys.stdout.write("LINE3\n")
sys.stderr.write("LINE4\n")

then I am getting this in /var/log/resallocserver/hooks/000000_list:

LINE2
LINE4
LINE1
LINE3

(Notice the order)

Also, the server is trying to delete only two resources:

Cleaning unused resources in local_x86_64_normal_prod pool
running: /home/resalloc/provision/local-list
running: /home/resalloc/provision/local-delete
running: /home/resalloc/provision/local-delete

I suppose the stdout ones.

I don't know how much of this is intended or unexpected.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a regression by this PR? I doubt because we continue handle the stdout/stderr differently. We would have to care about flushing, and even after that I'm not really sure it would be guaranteed to not mix.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a regression.

I was trying to implement it in my PR but ran out of patience, threw all the changes away and kept reading only STDOUT. But when I was reading this test, I thought maybe you are trying to implement support for both STDOUT and STDERR at the same time.

It seems like that was not the intention, which is absolutely fine by me :-)

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding test_interleaved_stdout_stderr ...

fd = _pipe_with_data(data)
result = list(yield_lines_from_fds([fd]))
assert len(result) == 1
assert result[0][1] == data

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add two more tests? One trivial, when the pipe has no data:

fd = _pipe_with_data(b"")

It works, I tested manually, but I'd like to have a test that the read doesn't hang forever (a naive implementation would IIRC hang forever)

And very importantly, a test for the case that crashed our production the last time. It was basically this:

#!/usr/bin/python3

import time
time.sleep(99999)

So no output, but hanged forever. We need to make sure it timeouts correctly. It does, I tested manually, but having a unit test would be really nice.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_empty_input + test_timeout_deadline_expired is basically the thing you want?

@praiskup praiskup force-pushed the praiskup-library-method branch from b409a87 to 9260fc7 Compare May 25, 2026 16:33
Comment thread tests/test_yield_lines_from_fds.py Fixed
praiskup added 2 commits May 25, 2026 18:36
Assisted-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Limit line length to 10k when catch_stdout_lines_securely is set

Assisted-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@praiskup praiskup force-pushed the praiskup-library-method branch from 9260fc7 to 30d6de0 Compare May 25, 2026 16:36
@praiskup praiskup merged commit 51a8f9a into main May 25, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants