[pkg-apparmor] UDD notifications script

Thu Feb 19 21:31:48 UTC 2015

Hello,

may I also offer some "nice to have" comments? ;-)

Am Donnerstag, 19. Februar 2015 schrieb intrigeri:
> * This looks suboptimal (but of course, as long as we haven't
>   a massive amount of usertagged bugs, it should be just fine):
> 
>  	for bug in cursor.fetchall():
> 		buglist.append(bug)
> 
>   ... so I'm curious: can't we do that in one batch operation instead?

Probably depends on the interface that psycopg offers.

I just searched a bit:

http://initd.org/psycopg/docs/cursor.html

fetchall()
    Fetch all (remaining) rows of a query result, returning them as a 
    list of tuples. An empty list is returned if there is no more record 
    to fetch.

so maybe (untested!)
    buglist = cursor.fetchall()
could work

> * When using the get_bug_list output (a list of pairs) in
>   send_team_notification, we have to access it by indices: bug[0] to
>   get the bug ID, bug[1] to get the added or removed usertag. It's not
> astonishingly obvious. I would recommend that get_bug_list outputs:
> 
>   - either a list of dictionaries so we can then write bug['id']
>     and bug['usertag']

I didn't see an obvious way to do this in the psycopg docu, so if we 
really want this, you'll probably need something like
    buglist2 = []
    for bug in buglist:
        buglist2.append( { 'id': bug[0], 'usertag': bug[1] })
which is not worth the effort IMHO.

Either keep the numbers, or define constands like
    field_id = 0
    field_usertag = 1
that you can use instead (define those constants inside each function 
to avoid conflicting field numbers for another function/query!)

>   - or a dictionary whose keys would be the bug IDs, and values would
>     be the usertag; but wait! this would be too limiting, read on:

Indeed - what happens if a bug has multiple usertags?

> * I see we first retrieve a list of usertagged bugs (in get_bug_list),
> and then run one more query per bug that had a usertag added or
> removed, in order to get its title. Alternatively, get_bug_list could
> use a JOIN to retrieve all the data we need about all usertagged bugs
> in one go. No idea the higher number of queries (current
> implementation) costs more than retrieving the title of bugs that we
> won't use (suggested implementation). 

I have some experience (from old PostfixAdmin code) that optimized 
queries are much faster than having lots of database requests. This is 
especially true if the database is on another host, but even with the 
database on localhost, some 1000 MySQL <-> PHP roundtrips took several 
minutes!

In this case, the question is how many bug ids and tags the first query 
returns, and for how many of them you need to look up the bug title.

Anyway, you can speed it up by replacing multiple calls of
    SELECT title from bugs WHERE id='1'
    SELECT title from bugs WHERE id='2'
    SELECT title from bugs WHERE id='3'
with one call of
    SELECT id, title from bugs WHERE id IN ('1', '2', '3')

Note that I included the id in the query to avoid ordering problems.

Also note that this is the first thing to fix/change if you notice 
performance problems ;-)

BTW: did I ever point you to my "1001 bugs - or: the golden rules of 
bad programming" slides? ;-)

English version:
    http://blog.cboltz.de/archives/63-1001-bugs-or-the-golden-rules-of-bad-programming.html
German version:
    http://blog.cboltz.de/archives/64-1001-Bugs-oder-die-goldenen-Regeln-fuer-schlechte-Programmierung.html

Speaking about get_bug_title():
	for bug in cursor.fetchall():
		return bug[0]

"return" also means "leave the function". This isn't a problem with the 
current implementation that expects only one row, but will break if you 
switch to the query I proposed.

> Anyhow, putting aside
> performance and resources considerations, I would find the
> implementation of send_team_notification more readable and elegant if
> get_bug_list returned a list of dictionaries, each such dictionary
> having keys (id, title, usertag). Food for thought :)

Can there be multiple usertags on a bug?

> * pylint says that "def get_bug_title(bugid) :" has one too much
>   space. It's also quite unhappy about a lot of other small things.
>   Possibly it knows Python best practices better than you and I.
>   I've no idea (really -- it might produce obsolete recommendations,
>   and I've not checked). Similarly, pyflakes is a bit mean and detects
> two unused variables.

FYI: We are using pyflakes in upstream AppArmor (make check) ;-)

> * In:
> 
> 	cursor.execute("SELECT title from bugs WHERE id='%s'" % bugid)
> 	for bug in cursor.fetchall():
> 		return bug[0]
> 
>   ... we're somewhat acting defensively as if we didn't have
>   a guarantee that the id is a unique identifier (is it the case?),
>   which can arguably be a good thing.
> 
>   But we're actually not defensive in practice, since we're always
>   returning the title of the first matching bug and ignoring any
>   other. So, retrieving all the data using fetchall(), and then
>   iterating on the first result only in a for loop feels... weird.

I'd hope that the bug id is unique, which means you'll only have one 
result (in theory no result would also be possible, but it's unlikely if 
we got a usertag for that bug number before ;-)

>   So I would simply use fetchone() instead of fetchall(), and drop the
> for loop entirely.

Or change the query as proposed above (which would also mean some 
changes in the calling code)

>   Now, if you really want to be more defensive, we can also count the
>   number of matching bugs and throw an exception if it's 2 or more,
>   but I suspect we could avoid that after a quick look at the
>   UDD schema. What's an ID if it's not an ID? :)

;-)

(hmm, I should read the whole mail before starting to write my reply ;-)

> * Mixing tabs and spaces for indentation produces very confusing
>   results in my $EDITOR:
> 
> 	except IOError as e:
> 		send_error_mail("Could not save state file.")
>         return False
> 
> 	return True
> 
>   ... here, the "return False" statement is really part of the
>   exception handling block, even if it doesn't look like it when one
>   has tabs display width = 8 spaces. I suggest applying the
>   indentation changes suggested by pylint to avoid that :)

I'm slightly surprised that the script runs at all - I remember cases 
where python complained loudly about mixed spaces/tabs and refused to 
run a script. Maybe that's python3 only? [After testing: yes, only 
python3 complains about this.]

> * In the read_statefile function, I believe that the "return old"
>   statement has no good reason to be in the try block. In general it's
> good practice to guard as little code as possible with exception
> catching blocks. In this case, as a non-expert in Python, I have no
> idea if it may have any adversarial consequences in practice to leave
> it there, but IMO it would be good code hygiene to move it after the
> exception handling block. And by the way, it'll make it clearer what
> is the expected output of the function on success :)

In general I agree, however in this case it would be the only line after 
the except block, so the current code is probably easier to understand.

The perfect solution would be:

def read_statefile(state_filename):
    statefile_content = ''
    try:
        ...
    except ...:
        ...

    return statefile_content

Final note: I wrote this mail without actually testing anything. 
Also, I only looked at the parts listed in this mail, not the whole script.

Regards,

Christian Boltz
-- 
[checkinstall] is a tool that allows you to keep your
brain in suspend mode.   [Robert Schiele in opensuse]