[Pkg-crosswire-devel] Miscellaneous responses to various threads
DM Smith
dmsmith555 at yahoo.com
Mon Jan 26 21:54:20 GMT 2009
Please pardon me for not replying to threads individually. I've only
recently joined and much of what I say pertains to emails not in my inbox.
Regarding this effort:
Many, many thanks!
Regarding ICU:
osis2mod and tei2mod both require it for proper building of UTF-8 texts.
Specifically, they use it to convert cp1252 to UTF-8 and then normalize
that to NFC. It is quite possible to do this conversion before running
these command line apps by running uconv (part of ICU), or an equivalent,
on the text.
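As a rough illustration of that pre-processing step, here is a minimal sketch using only the Python standard library rather than uconv or ICU. The function name and the sample bytes are made up for the example; this is not how osis2mod or tei2mod do it internally.

```python
import unicodedata


def cp1252_to_nfc_utf8(raw: bytes) -> bytes:
    """Decode cp1252 bytes, normalize the text to NFC, re-encode as UTF-8."""
    text = raw.decode("cp1252")
    return unicodedata.normalize("NFC", text).encode("utf-8")


# Hypothetical input: 0x92 is the right single quotation mark in cp1252.
sample = b"Lord\x92s"
converted = cp1252_to_nfc_utf8(sample)

# NFC also composes base letter + combining mark into a single code point.
composed = unicodedata.normalize("NFC", "e\u0301")
```

A text cleaned this way beforehand should no longer need the conversion step at module build time.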
The SWORD lib (and it is properly branded in upper case) has one place
where ICU is critical. If ICU is not present, SWORD's upper case string
converter falls back to an ASCII upper case routine (e.g. that from
ctype); otherwise it uses a UTF-8 aware ICU routine. There are at least
two places where this makes a difference:
1) The rendering of <divineName>Lord's</divineName> into all upper-case
where the ' is a non-ASCII character. Without ICU, it breaks.
2) Dictionaries are keyed and indexed on upper case words. When a user
selects a lower case word for lookup, it is uppercased. If it has
non-ASCII characters, it breaks.
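To make the dictionary case concrete, here is a small sketch of why an ASCII-only upper case routine misses keys. It is written in Python purely for illustration; ascii_upper and the one-entry index are hypothetical stand-ins, not SWORD code.

```python
def ascii_upper(text: str) -> str:
    """Upper-case only a-z; every non-ASCII character is left untouched."""
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in text)


# Hypothetical dictionary index keyed on properly upper-cased headwords.
index = {"SE\u00d1OR": "entry for se\u00f1or"}

word = "se\u00f1or"              # the word the user selected
ascii_key = ascii_upper(word)    # the n-tilde stays lower case
unicode_key = word.upper()       # Unicode-aware upper-casing

found_ascii = index.get(ascii_key)      # lookup misses
found_unicode = index.get(unicode_key)  # lookup succeeds
```

The ASCII fallback produces a key that does not match the indexed one, so the lookup fails even though the entry exists.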
Regarding e-texts:
The SWORD library is backward compatible with e-texts. It is not forward
compatible. Modules may have features that don't work for earlier
engines and break them. (I can give specific examples if needed/wanted.)
For this reason, the module confs have a minimum SWORD version field,
MinimumVersion. If I am not mistaken, the front-ends' install managers
take this into account, warning and/or preventing the user from
installing modules that are incompatible with the SWORD engine. But,
IIRC, once the module is installed, this field is not used. I think that
modules may be another place where having libsword6 and libsword7
co-exist will be problematic.
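For packagers who want to do a similar check themselves, the following is a minimal sketch of comparing MinimumVersion against an engine version. It assumes the conf is plain key=value lines and that versions are dotted integers. The helper names, the module name, and the sample values are all hypothetical, and this is not the install managers' actual logic.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version such as '1.5.11' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def read_conf_field(conf_text: str, field: str):
    """Return the value of the first 'field=value' line, or None if absent."""
    for line in conf_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == field:
            return value.strip()
    return None


def is_installable(conf_text: str, engine_version: str) -> bool:
    """True if the engine is at least the module's MinimumVersion."""
    minimum = read_conf_field(conf_text, "MinimumVersion")
    if minimum is None:
        return True  # assumption: no field means no restriction
    return parse_version(engine_version) >= parse_version(minimum)


# Hypothetical conf for an imaginary module.
sample_conf = """[EXAMPLE]
DataPath=./modules/texts/example/
MinimumVersion=1.5.11
"""

too_old = is_installable(sample_conf, "1.5.9")
new_enough = is_installable(sample_conf, "1.6.0")
```

Comparing integer tuples rather than raw strings matters here: as strings, "1.5.9" would sort after "1.5.11".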
Other conf fields which should be considered are:
Obsoletes -- names a module that is replaced by this one
Font -- names a font which may be necessary for proper viewing of the text
For a complete definition of the conf see:
http://crosswire.org/wiki/DevTools:confFiles
Another issue is that of copyrights and licenses. We at CrossWire have
made the best attempt to properly obtain permission for copyrighted
material and to properly classify public domain texts. We simply won't
distribute copyrighted texts without written permission. We have been
wrong at times. The classic example is a Portuguese text which was
inappropriately named and dated, perhaps to obscure its ownership, and
thus labelled as public domain. When it was discovered that it was under
current copyright, we immediately took down the text. Our rigid stance
on this has helped us to successfully negotiate other texts. CrossWire
does not expect to have to let any other distribution know that a module
needs to come down. The mere fact that it is gone from the CrossWire
repository should be sufficient. While we at CrossWire are willing to
bear the risk of making such a mistake, I don't recommend it to anyone
else, especially anyone with a derived income stream. Any Linux
distribution that distributes a wide range of modules would need to be
ready to remove them quickly under such a condition.
Regarding modules with which there is no problem, I recommend KJV,
StrongsGreek, StrongsHebrew and Robinson. Perhaps Greek and Hebrew
modules as well. But this is because I use SWORD to do biblical research. For
daily reading, I'd personally never recommend the KJV. I'd recommend
texts that are under current copyright and for which CrossWire has
permission to distribute.
Regarding indexing:
SWORD has not come up with a versioning mechanism for Lucene indexing.
This is a twofold problem. The first part relates to the version of
Lucene being used. Lucene core has a strict backward compatibility
policy, such that 3.x can read 2.x indexes but possibly not 1.x
indexes. Also, within 2.1-2.9, the index is backward compatible. That is,
an index built under 2.1 can be used by 2.9. The CLucene project is
stuck on a very old version of Lucene, 1.4.3, I think.
The other is that of how the SWORD engine uses Lucene. Adding new fields
would require some kind of versioning mechanism.
For this reason, indexes won't be pre-built and distributed as part of
the module, but will need to be built on a per-module, per-install basis.
Regarding /usr/share/sword:
IMHO, the biggest breakage regarding /usr/share/sword is that unless it
is writeable one cannot use Lucene to index modules held there. Lucene
indexing is awesome. When we get problem reports regarding
/usr/share/sword at CrossWire, we recommend that the user either log on
as root and change permissions to open it up, or delete the module using
the distribution's package manager and re-install it using the SWORD
installer.
Hope this is helpful.
In Christ,
DM