[Teammetrics-discuss] Updates and related to mbox creation.
Andreas Tille
andreas at an3as.eu
Tue Dec 13 21:54:46 UTC 2011
On Tue, Dec 13, 2011 at 08:01:16PM +0530, Sukhbir Singh wrote:
> 1. The test run of the web archive parser on blends.d.n has finished. It
> ran for 55 teams and, based on a casual glance at some popular teams I
> 'knew', the results look good. You can check them in your own way!
I'll do so soon.
> 2. Mbox creation:
>
> Because we are fetching the messages from the web archive, we don't
> know which encoding they were originally sent in. When creating an mbox,
> we need to specify the encoding of non-ASCII characters in the From,
> Subject and Body fields. As we don't know the encoding, here is
> what we can do:
>
> i. Ignore the mbox convention and save everything in UTF-8
> *without* declaring the encoding, i.e. just write the file as
> UTF-8. (This breaks the conventions for mbox files and email headers
> defined in the RFCs.)
> ii. Assume UTF-8 for all messages and declare UTF-8 as their
> encoding. (Safest?)
>
> If I am missing something, please let me know. IMHO, there is no option
> other than ii that I know of, and it makes no difference anyway since we
> store everything in UTF-8.
IMHO ii makes sense. The web archive is ultimately all we have, and it is
simply UTF-8. If the mbox format requires the encoding to be specified,
then we should just specify it.
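As an illustration of option ii, here is a minimal sketch using Python's standard `email` and `mailbox` modules. It assumes the archive parser already hands us plain Unicode strings for the sender name, address, subject and body. The helper names `make_utf8_message` and `write_mbox` are hypothetical, not part of the existing teammetrics code. Non-ASCII header text is RFC 2047 encoded with an explicit utf-8 charset, and the body is declared as `text/plain; charset="utf-8"`.

```python
import mailbox
import os
import tempfile
from email.header import Header, decode_header, make_header
from email.mime.text import MIMEText
from email.utils import formataddr, formatdate, parseaddr


def make_utf8_message(sender_name, sender_addr, subject, body):
    """Build a message that assumes, and explicitly declares, UTF-8 (option ii)."""
    # MIMEText with the utf-8 charset sets Content-Type: text/plain; charset="utf-8"
    # and base64-encodes the body, so the stored file stays 7-bit clean.
    msg = MIMEText(body, "plain", "utf-8")
    # formataddr() RFC 2047-encodes a non-ASCII display name with the given charset.
    msg["From"] = formataddr((sender_name, sender_addr), charset="utf-8")
    # Header() RFC 2047-encodes the subject as utf-8 encoded words.
    msg["Subject"] = Header(subject, "utf-8")
    msg["Date"] = formatdate(localtime=False)
    return msg


def write_mbox(path, messages):
    """Append the given messages to an mbox file at path (created if missing)."""
    box = mailbox.mbox(path)
    box.lock()
    try:
        for msg in messages:
            box.add(msg)
        box.flush()
    finally:
        box.unlock()
        box.close()


def decoded(value):
    """Turn an RFC 2047 encoded header value back into a Unicode string."""
    return str(make_header(decode_header(value)))


if __name__ == "__main__":
    # Made-up sample data, only to demonstrate the round trip.
    path = os.path.join(tempfile.mkdtemp(), "team.mbox")
    write_mbox(path, [make_utf8_message("Jörg Müller", "joerg@example.org",
                                       "Grüße aus Köln", "Hällo Wörld\n")])
    for stored in mailbox.mbox(path):
        name, addr = parseaddr(stored["From"])
        print(decoded(name), addr, decoded(stored["Subject"]))
```

Because every message carries an explicit charset, any RFC-compliant mail reader can decode the mbox, which is the advantage of option ii over silently writing raw UTF-8 (option i).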
Kind regards
Andreas.
--
http://fam-tille.de