[Teammetrics-discuss] How does commitstat injects data
Andreas Tille
andreas at an3as.eu
Fri Jan 13 21:43:07 UTC 2012
On Sat, Jan 14, 2012 at 02:36:47AM +0530, Sukhbir Singh wrote:
> On checking vasks, I think commitstat.py has stopped. Can you please
> check the status at your end?
$ ./commitstat.py -u tille
Segmentation fault
> If this is the case, I have a very stupid solution but then there
> seems to be no other way -- run the script in parts during the first
> run. For x-y-z teams, then x-y teams and then x teams.
Well, we currently have:
teammetrics=# SELECT project, count(*) from commitstat group by project ;
     project     | count  
-----------------+--------
 debian-live     |    341
 pkg-scicomp     |    357
 pkg-openoffice  |   3913
 kernel          | 237710
 pet             |    346
 pkg-kde         |   1061
 d-i             |  82980
 pkg-osm         |   1051
 pkg-samba       |     16
 nm              |    219
 pkg-perl        |   2081
 debian-release  |   1079
 teammetrics     |    325
 perl            |    614
 pkg-common-lisp |   2005
 pkg-multimedia  |  16302
 pkg-java        |   4605
 debian-science  |   3985
 debian-med      |   1468
 debconf         |   1349
 pkg-phototools  |   1219
 debtags         |    704
 pkg-hurd        |    962
 pkg-games       |   6626
 demudi          |      8
 pkg-grass       |   2270
 pkg-postgresql  |     43
(27 rows)
I will just keep on trying as is. I also have seen
> Why it crashes
> is because there is *lots* of data to be processed and that keeps the
> CPU very busy for a long time and I think then ultimately the process
> is killed.
Hmmm, I'm a bit concerned about things on blends.d.n as well:
Jan 11 08:44:01 blends kernel: postgres[21917]: segfault at b85eb2aa ip b72d3776 sp bf84d6c0 error 4 in postgres[b7271000+4da000]
Jan 11 08:44:02 blends kernel: python[14889]: segfault at 28 ip 080ed0a9 sp bfa10a9c error 4 in python2.6[8048000+1e0000]
Jan 12 08:33:49 blends kernel: postgres[10556]: segfault at b88032aa ip b72d3776 sp bf84d6c0 error 4 in postgres[b7271000+4da000]
Jan 13 08:29:15 blends kernel: postgres[30998]: segfault at b860f2aa ip b72d3776 sp bf84d6c0 error 4 in postgres[b7271000+4da000]
Jan 13 19:55:07 blends kernel: python[11015]: segfault at 28 ip 080ed0a9 sp bfdbf6fc error 4 in python2.6[8048000+1e0000]
Finally this was the reason why I wanted to look into your insert
statements today - just to see whether I can find any explanation. I
now have three core dumps from postgresql and have no idea what to do
with these - hey, inspecting core dumps with gdb is something I have
never done before. I'm happy enough that I learned how to create those
dumps, which is not the default behaviour of postgresql.
The reason why I cared about those dumps is that in summer I observed
such crashes several times without any clue what might have caused them.
A reasonable explanation might be that this was the time when tests on
commitstat were run. Later I upgraded to postgresql 9.1 and assumed the
crashes were over because I did not observe them ... until January 11.
So let's assume commitstat is responsible for the problem - just because
we do not have any better theory.
I have no idea whether this helps, but I have an idea that might give us
a chance to avoid the crashes and, as a side effect, would be faster and
need less code.
teammetrics=# \help copy
Command: COPY
Description: copy data between a file and a table
Syntax:
COPY table_name [ ( column [, ...] ) ]
FROM { 'filename' | STDIN }
[ [ WITH ] ( option [, ...] ) ]
...
Just dive into the usage of COPY. If I understand the format of your
temporary file correctly, you can just feed this file into a COPY command
and you are done. According to the PostgreSQL docs this should perform
quite well, and it saves you some parsing and inserting code.
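To make the COPY idea a bit more concrete, here is a minimal sketch. It assumes a psycopg2 connection (copy_expert is a psycopg2 cursor method) and a temporary file that is tab-separated with its columns in the same order as the commitstat table. Both are assumptions on my side, and the function name copy_file_into_table is only made up for illustration.

```python
def copy_file_into_table(conn, path, table="commitstat"):
    """Bulk-load a tab-separated file into `table` via COPY ... FROM STDIN.

    `conn` is assumed to be a psycopg2 connection. The file is assumed to
    be in PostgreSQL's default text format (tab-delimited, one row per
    line, columns in table order).
    """
    sql = "COPY {0} FROM STDIN".format(table)
    cur = conn.cursor()
    with open(path) as data:
        # psycopg2 streams the file to the server; no per-row INSERTs.
        cur.copy_expert(sql, data)
    conn.commit()
```

It would be called as copy_file_into_table(psycopg2.connect(...), path_to_temporary_file), with the connection parameters and the path filled in for the real setup.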
Moreover, since you advise above to split the job up into per-project
chunks (I have no idea whether this helps or not), why not do it
straight in the code: run fetchrevisions.py per team, copy the
temporary file over to blends.d.n, inject it via the COPY command and
fetch the next team afterwards. The overall performance should be the
same and it might help to let vasks "relax" a bit.
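The per-team loop could look roughly like the sketch below. The callables fetch and load are hypothetical stand-ins: fetch would run the fetching step for one team and return the path of its temporary file, and load would copy that file over and inject it. I do not know how fetchrevisions.py is actually invoked, so that part is deliberately left out.

```python
def process_teams(teams, fetch, load):
    """Handle one team at a time and return the teams that failed.

    `fetch(team)` is assumed to return the path of that team's temporary
    file; `load(team, path)` is assumed to inject it into the database.
    Both are placeholders for the real steps.
    """
    failed = []
    for team in teams:
        try:
            path = fetch(team)
            load(team, path)
        except Exception as exc:
            # Remember the failure and go on with the next team, so one
            # bad team does not stop the whole run.
            failed.append((team, exc))
    return failed
```

If the fetching step is started as a separate process, a crash there only shows up as a failed team in the returned list, which would also tell us quickly whether pkg-samba is the problem.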
> Maybe they have set something up on vasks:
>
> if cpu_usage == 100% for $TIME, kill process.
>
> Possible?
I don't think so, but you might like to use the UNIX command nice to
reduce the priority of your job compared to other jobs.
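The same effect can be had from inside a Python script with os.nice, which is part of the standard library on UNIX. This is only a small sketch; the function name lower_priority and the increment of 10 are arbitrary choices of mine.

```python
import os


def lower_priority(increment=10):
    """Add `increment` to this process's niceness and return the new value.

    A higher niceness means a lower scheduling priority. An unprivileged
    process can raise its niceness but not lower it again.
    """
    return os.nice(increment)
```

Calling lower_priority() once at the start of the script would make the whole run yield to other jobs on the machine.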
> It shows that it stopped at pkg-samba. Now I suggest that you run it
> *only* for pkg-samba and then see? If it runs flawlessly, it will
> confirm my 'theory'. Else, I will check it again.
What do you think about my suggestion? Should I wait before starting
again?
> (Before running commitstat.py from scratch, please execute refresh.sh on vasks)
I did so.
Kind regards
Andreas.
--
http://fam-tille.de