[Pkg-puppet-devel] reports retention policy

Wed Aug 28 14:26:07 BST 2024

On 2024-08-27 at 17:59, Thomas Goirand wrote:
> On 8/27/24 19:38, Antoine Beaupré wrote:
>> What did you do before the upgrade?
> 
> You mean, how did I manage with puppet 5? Well, I simply edited
> /etc/cron.daily/puppet* to set the number of days to 2 instead of 31.
> 
>> surely this is not a new problem for
>> you?
> 
> Indeed, it's been annoying me over and over again, with my puppet servers
> getting disk full all the time. I'd really love it if the Debian package
> had a better working default.
> 
>> did you look at upstream's policy? i would defer to that.
> 
> Should I care? I care about our package doing the right thing (tm). :)
> 
>> for now, i did revert to 30 days, after you said, on IRC, "I'm ok with
>> whatever improvement, really." Perhaps I misinterpreted that?
> 
> Well, on IRC we talked about handling empty dirs and such, we didn't 
> talk about retention policy. To me, reverting to 30 days is a huge 
> regression.
> 
>> The other thing I would point out is that this is a configuration file:
>> it's easily overridable locally and will survive upgrades. Another
>> alternative would be to make that (more) configurable in a systemd
>> timer.
> 
> A systemd timer wouldn't be more configurable than it is already. If
> you'd like to do a systemd timer, I don't mind that. What I do mind is my
> puppet server in my CI getting full after a few days. I don't think I
> should also have to automate "fixing" the retention policy in my CI...
> 
>> But we should probably just find a proper solution for this, which,
>> again, I would look at upstream for.
> 
> If upstream has a bad default of 30 days, we shouldn't keep it. Though 
> as Jerome wrote, I don't think upstream even has such a cron job.
> 
> On 8/27/24 19:44, Jérôme Charaoui wrote:
>  > This confirms my hunch that the optimal cleanup period is very
>  > dependent on the environment. In this case we should opt to document
>  > in README.Debian or NEWS our suggestion for administrators to deploy
>  > such a cron job, which they will be able to customize to their needs.
> 
> By all means, if you have time to write such documentation, go ahead.
> Though this is orthogonal to the discussion of shipping sensible
> defaults. And to me, 30 days isn't reasonable unless you really have a
> *HUGE* amount of spare space on your puppet server.
> 
>  > I'll also mention that although the default directory for reports is
>  > "$vardir/reports", this can be changed in puppet.conf, and reports may
>  > even be disabled altogether with "reports = none". In either
>  > situation, the cron job shipped by the package becomes wrong.
> 
> If a user decides to change the default behavior, it shouldn't be our 
> concern, IMO. Though I care that it works "in most cases". Without 
> aggressively purging the reports, I consider the package broken. This is 
> the case for the 16+ puppet clusters I maintain (with 50 to 200 servers
> in each cluster). It is my point of view that, by default, the puppet
> package should be able to handle that many servers.
> 
> I would also like to ask: what do you do with so many puppet run
> reports? Do you read them during your weekends, maybe? :)

These reports could be useful for forensics and troubleshooting.

We could argue all day long, but it seems clear that we're not going to
agree on the appropriate retention policy, so this is my plan:

- Remove the cron job from the package
- Ship a systemd service/timer pair to clean up reports, disabled on install
- Ship an example file to override the service to change retention time
- Properly announce the new service/timer pair in NEWS
- Document how to enable all this in README.Debian, and how to change
  the retention time with a service unit override
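
To make the moving parts concrete, here is a rough sketch of the logic such a
cleanup service could run. It is illustration only, not what the package
ships: it is written in Python rather than as the package's actual job, the
function name purge_old_reports is made up, and it assumes reports are stored
as *.yaml files in per-host subdirectories of the report directory. The
retention time is the single parameter a local override would change.

```python
import os
import time
from pathlib import Path


def purge_old_reports(report_dir, max_age_days, now=None):
    """Delete *.yaml reports older than max_age_days; return how many were removed.

    Hypothetical helper: the directory layout and file suffix are assumptions.
    """
    root = Path(report_dir)
    if not root.is_dir():
        # Reports disabled ("reports = none") or stored elsewhere: nothing to do.
        return 0
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = 0
    # Materialize the listing first so files are not deleted mid-iteration.
    for path in list(root.rglob("*.yaml")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed += 1
    # Remove subdirectories left empty, deepest first.
    subdirs = sorted((p for p in root.rglob("*") if p.is_dir()), reverse=True)
    for sub in subdirs:
        if not any(sub.iterdir()):
            sub.rmdir()
    return removed
```

Called as purge_old_reports(path, 2) this would give the aggressive two-day
policy, and with 30 the longer one; a missing directory is treated as "nothing
to clean" so a changed or disabled report setup does not turn into an error.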

-- Jérôme