[Teammetrics-discuss] Updates and related to mbox creation.

Andreas Tille andreas at an3as.eu
Tue Dec 13 21:54:46 UTC 2011


On Tue, Dec 13, 2011 at 08:01:16PM +0530, Sukhbir Singh wrote:
> 1. The test run for the web archive parser has ended on blends.d.n. It ran
> for 55 teams and, based on a casual glance at some popular teams I
> 'knew', it looks good. You can check in your own way!

I'll do so soon.
 
> 2. Mbox creation:
> 
> Because we are fetching the messages from the web archive, we don't
> know which encoding they were originally sent in. While creating an mbox,
> we need to specify the encoding for non-ASCII characters in the From,
> Subject and the Body fields. As we don't know the encoding, here is
> what we can do:
> 
> i. Ignore the mbox convention and save everything in utf-8
> *without* specifying the encoding. So we just save the file using
> utf-8. (This breaks the convention for mbox/email headers as defined
> in the RFCs).
> ii. Assume utf-8 for all messages and declare that as the encoding.
> (safest?)
> 
> If I am missing something, please let me know. IMHO, there is no other
> way I know of other than ii and it doesn't make a difference if we
> store everything in utf-8.

IMHO ii makes sense.  The web archive is, in the end, all we have and it
is just UTF-8.  If the mbox format needs the encoding to be specified
then we should just specify it.
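
A minimal sketch of option ii, assuming Python's standard email and
mailbox modules are used. The function names build_message and
append_to_mbox, and the sample name, address and path, are hypothetical
and only for illustration; this is not the actual teammetrics code. The
idea is to declare utf-8 explicitly for the From and Subject headers and
for the body:

```python
import mailbox
import os
import tempfile
from email.header import Header
from email.mime.text import MIMEText
from email.utils import formataddr


def build_message(name, address, subject, body):
    """Build a message that declares utf-8 for headers and body.

    Hypothetical helper: the argument names are assumptions.
    """
    # MIMEText sets 'Content-Type: text/plain; charset="utf-8"'.
    msg = MIMEText(body, "plain", "utf-8")
    # A non-ASCII display name becomes an RFC 2047 encoded word.
    msg["From"] = formataddr((name, address), charset="utf-8")
    # Header() encodes a non-ASCII subject the same way.
    msg["Subject"] = Header(subject, "utf-8")
    return msg


def append_to_mbox(path, msg):
    """Append one message to the mbox file at path (created if missing)."""
    box = mailbox.mbox(path)
    box.lock()
    try:
        box.add(msg)
        box.flush()
    finally:
        box.unlock()
        box.close()


if __name__ == "__main__":
    # Sample data, invented for the example.
    target = os.path.join(tempfile.mkdtemp(), "example.mbox")
    message = build_message("Jörg Müller", "joerg@example.org",
                            "Grüße", "Text with äöü")
    append_to_mbox(target, message)
    print(target)
```

Since the web archive only gives us UTF-8 text, nothing is lost by
declaring utf-8 everywhere, and a mail reader opening the mbox should be
able to decode the headers and the body without guessing.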

Kind regards

         Andreas. 

-- 
http://fam-tille.de


