<div dir="ltr"><div>FWIW, I wonder if this is a fallout of PR #1274 : <a href="https://github.com/networkupstools/nut/pull/1274/files">https://github.com/networkupstools/nut/pull/1274/files</a></div><div>and specifically <a href="https://github.com/networkupstools/nut/commit/550064930e369fb3a322c3b28ffff8b4acf53532">https://github.com/networkupstools/nut/commit/550064930e369fb3a322c3b28ffff8b4acf53532</a> <br></div><div>which added a `sock_read()` loop continuation effectively if `pconf_char()` returned 0.</div><div><br></div><div>Just grasping at straws here (since it is a change a couple of months before the release was cut), because if ALL of those 128 reads (skipped above) did return 0, we should not have got to this line.</div><div><br></div><div>The comment added here "<i>O_NDELAY with zero bytes means nothing to read but since read() follows a successful select() with ready file descriptor, ret shouldn't be 0</i>" - that might be misleading too, since per internet:<br></div><div><br></div><div>>
The result of O_nonblock and O_ndelay is to make I/O into non-blocking
mode (non-blocking), return immediately without reading the data or the
write buffer full, without blocking the wait.
<br>
> <br>
>
The difference is that, in a read operation, if the data is not
read, O_ndelay causes the I/O function to return 0 immediately, but this
gives rise to a problem, because the read to the end of the file (EOF)
is returned 0, so it is impossible to distinguish between the cases.
Therefore, O_nonblock is generated, it will return 1 when the data is
not read, and set errno to Eagain.
<br>
>
<br>
>
O_ndelay was introduced in earlier versions of System V, and when
coding, it is recommended that POSIX-defined o_nonblock,o_nonblock be
set at open and Fcntl.
</div><div><br></div><div>Jim</div><div><br></div><div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Jun 13, 2023 at 10:41 AM Jim Klimov <<a href="mailto:jimklimov%2Bnut@gmail.com">jimklimov+nut@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div>Thanks for the interesting puzzle to crunch!</div><div><br></div><div>Looking at these few bread-crumbs, I wager an educated guess that this loops in `sendcmd()` (where CLI child processes talk to a daemonized copy which tracks the timers for events), around here: <a href="https://github.com/networkupstools/nut/blob/v2.8.0/clients/upssched.c#L583-L626" target="_blank">https://github.com/networkupstools/nut/blob/v2.8.0/clients/upssched.c#L583-L626</a> for the daemon and here for the child: <a href="https://github.com/networkupstools/nut/blob/v2.8.0/clients/upssched.c#L762-L787" target="_blank">https://github.com/networkupstools/nut/blob/v2.8.0/clients/upssched.c#L762-L787</a></div><br><div>In both cases we run an infinite loop with `select()` to wait up to 1 second and `read()` from the pipe. The common expectation is that if there is nothing to read, the program blocks in `select()` waiting for a reply from the kernel, and only returns after the specified timeout - to the point that some `sleep()` with sub-second precision implementations do just that on systems that lack `usleep()`.</div><div><br></div><div>
<div>With `select()` in your capture posting an array of FDs `[7 10]`
I'd say it is the parent loop, and one of those is the `pipefd` and
another is the FD for socket connection to talk to a particular child on
the other side of the pipe.<br></div><div><br></div>
</div><div>Here I must guess that either the `pselect6()` implementation behaves differently which would be surprising after decades worth of expected behavior, or that it for some reason decides that the FD #7 (whoever that is) raised the hand but was then silent (0 bytes read).</div><div><br></div><div>The next clue is 128 reads, which leads to <a href="https://github.com/networkupstools/nut/blob/v2.8.0/clients/upssched.c#L485" target="_blank">https://github.com/networkupstools/nut/blob/v2.8.0/clients/upssched.c#L485</a> `sock_read()` method which indeed does up to `US_MAX_READ==128` attempts to read one byte from a connection (so from the CLI/child counterpart) at a time - so that is who FD#7 gotta be.<br></div><div><br></div><div>So it is just at this point that I get stuck explaining what happens and why :) Possibly there was a child or CLI counterpart process which died and somehow the connection did not get closed (which would return a `read()` error and evict it from the conn list), e.g. frozen/zombified, or something else connected to the pipe (antivirus heh?) and loiters there, or something else I could not think of quickly...</div><div><br></div><div>The next clue could be to look at who is connected to the pipe file, maybe with `fuser`, or look in `ps -ef` if there are other `upssched` processes present, or look in `/proc/PIDNUM/fd/` of the daemonized and looping `upssched`...<br></div><div><br></div><div>If anyone decides to track through the code, the potential processing loop goes</div><div> checkconf -> conf_arg -> parse_at -> sendcmd -> daemon if needed -> start_daemon -> conn_add(pipefd) or sock_read(conn)</div><div>with short-lived child processes (command-line launched from `upsmon`) parsing the configuration to pick the part relevant to them per envvars, and talk to the daemon (fork one if needed) to start/cancel the timers.<br></div><div><br></div><div>Jim<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Jun 13, 2023 at 8:33 AM Kari Lempiäinen <<a href="mailto:kari.lempiainen@summerday.net" target="_blank">kari.lempiainen@summerday.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Hi,</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I ran the strace command while upssched was 100% CPU hungry. This is what I got:</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
1686633611.702798 read(7, "", 1) = 0 <0.000004>
<div>1686633611.702816 read(7, "", 1) = 0 <0.000004></div>
<div>1686633611.702834 pselect6(11, [7 10], NULL, NULL, {tv_sec=1, tv_nsec=0}, NULL) = 1 (in [7], left {tv_sec=0, tv_nsec=999998800}) <0.000006></div>
<div>1686633611.702862 read(7, "", 1) = 0 <0.000004></div>
<div>1686633611.702880 read(7, "", 1) = 0 <0.000004></div>
.</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
.</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
.</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
1686633611.705206 read(7, "", 1) = 0 <0.000004>
<div>1686633611.705224 read(7, "", 1) = 0 <0.000004></div>
<div>1686633611.705242 pselect6(11, [7 10], NULL, NULL, {tv_sec=1, tv_nsec=0}, NULL) = 1 (in [7], left {tv_sec=0, tv_nsec=999999204}) <0.000005></div>
<div>1686633611.705268 read(7, "", 1) = 0 <0.000004></div>
<div>1686633611.705288 read(7, "", 1) = 0 <0.000302></div>
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
there are 128 "read(7" lines between every pselect6 lines (I did count the read lines for couple of blocks, 128 in each of them). Nothing else in the strace log, just read and pselect lines for the short time I let strace to run.</div>
<div id="m_-8088851625876775577m_-3899177195465648473appendonsend"></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Any help from this?</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Best regards,</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Kari</div>
<hr style="display:inline-block;width:98%"><font face="Calibri, sans-serif"><b><more history snapped><br></b></font></div>
</div></blockquote></div></div>
</blockquote></div>