[Teammetrics-discuss] Web Parser

Sukhbir Singh sukhbir.in at gmail.com
Mon Dec 5 18:24:59 UTC 2011


    git pull

`archiveparser.py` is now working without any errors for at least
`debian-accessibility`.

I am sorry for the many errors, but in our case we are dealing with
data that has no properly defined format. Until we actually hit the
input on which our parsing fails, we cannot predict what we are
dealing with. For example, we seem to have the following different
forms of the 'From' field in the web archives of lists.debian.org:

Name <email>
Name(email)
email (Name)
email <email>

We were getting errors because of this. All of these forms now seem to
be handled.
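
As an illustration only, here is a minimal sketch of how the four forms
listed above could be normalized into a (name, email) pair. The function
name `parse_from` and both patterns are hypothetical and are not taken
from `archiveparser.py`; the sketch assumes the address is the part that
contains an '@'.

```python
import re

# Hypothetical patterns for the 'From' forms listed above.
# 'Name <email>' and 'email <email>'
_ANGLE = re.compile(r'^(?P<first>.*?)\s*<(?P<email>[^<>]+)>\s*$')
# 'Name(email)' and 'email (Name)'
_PAREN = re.compile(r'^(?P<first>.*?)\s*\((?P<second>[^()]*)\)\s*$')


def parse_from(field):
    """Return (name, email) for a raw 'From' value.

    Falls back to (field, '') when no known form matches, so the
    caller can log the unexpected value instead of crashing.
    """
    field = field.strip()

    m = _ANGLE.match(field)
    if m:
        return m.group('first').strip('"\' '), m.group('email').strip()

    m = _PAREN.match(field)
    if m:
        first = m.group('first').strip()
        second = m.group('second').strip()
        # Assumption: whichever part contains '@' is the address.
        if '@' in second and '@' not in first:
            return first, second   # 'Name(email)'
        return second, first       # 'email (Name)'

    return field, ''
```

Returning a fallback value rather than raising keeps a single unexpected
'From' line from aborting the whole run, which seems useful when the
input format cannot be known in advance.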

The connection problem I was telling you about might be an issue local
to my connection only, so I won't comment on that unless I get the
same error from blends.

Please don't spend time testing it; I will do that. Once I am sure
it's ready, I will let you know. It's working for
`debian-accessibility` if you want to see the results, and it *should*
now work for other lists as well (I will confirm this soon).
