[Pkg-javascript-commits] [pdf.js] 65/161: Quick notes about the format
David Prévot
taffit at moszumanska.debian.org
Sat Apr 19 14:16:25 UTC 2014
This is an automated email from the git hooks/post-receive script.
taffit pushed a commit to branch master
in repository pdf.js.
commit 3e8ea958ae64caf3af212b212c3dc690649f85e2
Author: Yury Delendik <ydelendik at mozilla.com>
Date: Fri Mar 14 15:23:01 2014 -0500
Quick notes about the format
---
external/cmapscompress/README.md | 171 +++++++++++++++++++++++++++++++++++++++
1 file changed, 171 insertions(+)
diff --git a/external/cmapscompress/README.md b/external/cmapscompress/README.md
new file mode 100644
index 0000000..9796eb2
--- /dev/null
+++ b/external/cmapscompress/README.md
@@ -0,0 +1,171 @@
+# Quick notes about binary CMap format (bcmap)
+
+The format is designed to package some information from the CMap files located at external/cmap. Note that, for size optimization, the original information blocks may be changed (split or joined) and the items within a block may be swapped.
+
+The data is stored in binary format in network byte order (big-endian).
+
+# Data primitives
+
+The following primitives are used to encode the file:
+ - byte (B) – a byte; bits are numbered from 0 (least significant) to 7 (most significant)
+ - bytes block (B[n]) – a sequence of n bytes
+ - unsigned number (UN) – the number is encoded as a sequence of bytes; bit 7 is a flag indicating that decoding continues with the next byte, and bits 6-0 store the number information, e.g. bytes 0x818407 represent 16903 (0x4207). Limited to 32 bits.
+ - signed number (SN) – the number is encoded as a sequence of bytes, as for UN, but is transformed before encoding: if n < 0, n is encoded as (-2*n-1) using UN encoding; otherwise n is encoded as (2*n) using UN encoding. The lowest bit of the encoded number therefore indicates the sign of the original number.
+ - unsigned fixed number (UB[n]) – similar to UN, but it represents an unsigned number that is stored in B[n]
+ - signed fixed number (SB[n]) – similar to SN, but it represents a signed number that is stored in B[n]
+ - string (S) – the string is encoded as a sequence of bytes. First comes the length in characters, encoded as UN, then the UTF-16 characters, each encoded as UN.
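The UN, SN and S primitives can be sketched in Python as follows. This is an illustrative reading of the descriptions above, not the code used by pdf.js; the function names are hypothetical, and the string reader assumes each UTF-16 code unit can be mapped to a character directly (surrogate pairs are not combined).

```python
def read_un(data, pos=0):
    """Decode one UN starting at data[pos]; return (value, next_pos)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)  # bits 6-0 carry the payload
        if not byte & 0x80:                   # bit 7 clear: this was the last byte
            break
    if value > 0xFFFFFFFF:
        raise ValueError("UN is limited to 32 bits")
    return value, pos


def write_un(value):
    """Encode a non-negative integer as UN (most significant group first)."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("UN is limited to 32 bits")
    groups = [value & 0x7F]                   # last byte has bit 7 clear
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)  # earlier bytes have bit 7 set
        value >>= 7
    return bytes(reversed(groups))


def encode_sn(n):
    """Map a signed integer to the unsigned value stored by SN."""
    return -2 * n - 1 if n < 0 else 2 * n


def decode_sn(v):
    """Inverse of encode_sn: the lowest bit carries the sign."""
    return -((v + 1) >> 1) if v & 1 else v >> 1


def read_string(data, pos=0):
    """Decode one S: a UN length followed by that many UN code units."""
    length, pos = read_un(data, pos)
    chars = []
    for _ in range(length):
        unit, pos = read_un(data, pos)
        chars.append(chr(unit))
    return "".join(chars), pos
```

With the example from the list above, `read_un(bytes([0x81, 0x84, 0x07]))` yields the value 16903 and consumes three bytes.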
+
+# File structure
+
+The first byte is a header:
+ - bits 2-1 – indicate the CMapType. Valid values are 1 and 2.
+ - bit 0 – indicates the WMode. Valid values are 0 and 1.
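As a minimal sketch of the header layout just described, the first byte could be split like this. The function name is hypothetical, and rejecting other CMapType values is an assumption based on the "valid values" wording.

```python
def parse_file_header(byte):
    """Split the first byte of a bcmap file into (cmap_type, wmode)."""
    cmap_type = (byte >> 1) & 0x03  # bits 2-1
    wmode = byte & 0x01             # bit 0
    if cmap_type not in (1, 2):
        raise ValueError("unexpected CMapType: %d" % cmap_type)
    return cmap_type, wmode
```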
+
+Then the records follow. Each record starts with a record header encoded as B, where bits 7-5 indicate the record type (see the description of the other bits below):
+ - 0 – codespacerange
+ - 1 – notdefrange
+ - 2 – cidchar
+ - 3 – cidrange
+ - 4 – bfchar
+ - 5 – bfrange
+ - 6 – reserved
+ - 7 – metadata
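The type lookup can be illustrated with a small table indexed by bits 7-5 of the record header. The names `RECORD_TYPES` and `record_type_name` are made up for this sketch.

```python
# Index in this tuple equals the value of bits 7-5 of the record header.
RECORD_TYPES = (
    "codespacerange",
    "notdefrange",
    "cidchar",
    "cidrange",
    "bfchar",
    "bfrange",
    "reserved",
    "metadata",
)


def record_type_name(header):
    """Return the record type name for a record header byte."""
    return RECORD_TYPES[(header >> 5) & 0x07]
```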
+
+## Metadata record
+
+Bits 4-0 of the metadata record header contain the id of the metadata:
+ - 0 – comment, body of the record is encoded comment string (S)
+ - 1 – UseCMap, body of the record is usecmap id string (S)
+
+## Data records
+
+The records of types 0 – 5 have the following fields in the header:
+ - bit 4 – indicates that the char or start/end entries are stored in a sequence in this block (sequence)
+ - bits 3-0 – contain the data size minus 1 for this block (dataSize)
+
+The number of entries, encoded as UN, follows the header. The item records follow (see below).
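A data record header could be unpacked roughly as below. The function name is hypothetical. The sketch returns the raw value of bits 3-0 without adding 1, because the text does not spell out whether `dataSize` names the stored value or the byte count.

```python
def parse_data_record_header(header):
    """Split a data record header byte into (record_type, sequence, size_field)."""
    record_type = (header >> 5) & 0x07  # bits 7-5
    if record_type > 5:
        raise ValueError("not a data record (type %d)" % record_type)
    sequence = (header >> 4) & 0x01     # bit 4
    size_field = header & 0x0F          # bits 3-0, stored as "size minus 1"
    return record_type, sequence, size_field
```

For example, the header byte 0x71 (binary 0111 0001) describes a cidrange record with the sequence flag set and a size field of 1.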
+
+
+### codespacerange (0)
+
+Represents the following CMap block:
+
+ n begincodespacerange
+ <start> <end>
+ endcodespacerange
+
+First record format is:
+
+ - start as B[dataSize]
+ - endDelta as UB[dataSize], end is calculated as (start + endDelta)
+
+Next record format is:
+
+ - startDelta as UB[dataSize], start = end + startDelta
+ - endDelta as UB[dataSize], end = start + endDelta
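To show how the deltas chain together, here is a sketch that rebuilds the ranges from values that have already been decoded to integers. It follows the formulas exactly as written above; the function name and argument layout are assumptions made for the example.

```python
def expand_codespace_ranges(start, end_delta, rest):
    """Rebuild (start, end) pairs.

    start, end_delta -- fields of the first record
    rest             -- list of (startDelta, endDelta) for the next records
    """
    end = start + end_delta
    ranges = [(start, end)]
    for start_delta, end_delta in rest:
        start = end + start_delta  # relative to the previous end
        end = start + end_delta
        ranges.append((start, end))
    return ranges
```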
+
+
+### notdefrange (1)
+
+Represents the following CMap block:
+
+ n beginnotdefrange
+ <start> <end> code
+ endnotdefrange
+
+First record format is:
+
+ - start as B[dataSize]
+ - endDelta as UB[dataSize], end is calculated as (start + endDelta)
+ - code as UN
+
+Next record format is:
+
+ - startDelta as UB[dataSize], start = end + startDelta
+ - endDelta as UB[dataSize], end = start + endDelta
+ - code as UN
+
+
+### cidchar (2)
+
+Represents the following CMap block:
+
+ n begincidchar
+ <char> code
+ endcidchar
+
+First record format is:
+
+ - char as B[dataSize]
+ - code as UN
+
+Next record format is:
+
+ - if sequence = 0, charDelta as UB[dataSize], char = char + charDelta + 1
+ - if sequence = 1, char = char + 1
+ - codeDelta as SN, code = code + codeDelta
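The effect of the sequence flag can be sketched on already-decoded integers. When sequence = 1 no charDelta is stored, which this hypothetical helper models by passing `char_deltas=None`.

```python
def expand_cidchars(char, code, code_deltas, char_deltas=None):
    """Rebuild (char, code) pairs from the first record plus deltas.

    code_deltas -- one decoded SN per following record
    char_deltas -- one decoded UB per following record, or None if sequence = 1
    """
    entries = [(char, code)]
    for i, code_delta in enumerate(code_deltas):
        if char_deltas is None:
            char += 1                   # sequence = 1: consecutive chars
        else:
            char += char_deltas[i] + 1  # sequence = 0: explicit gap
        code += code_delta
        entries.append((char, code))
    return entries
```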
+
+
+### cidrange (3)
+
+Represents the following CMap block:
+
+ n begincidrange
+ <start> <end> code
+ endcidrange
+
+First record format is:
+
+ - start as B[dataSize]
+ - endDelta as UB[dataSize], end is calculated as (start + endDelta)
+ - code as UN
+
+Next record format is:
+
+ - if sequence = 0, startDelta as UB[dataSize], start = end + startDelta + 1
+ - if sequence = 1, start = end + 1
+ - endDelta as UB[dataSize], end = start + endDelta
+ - code as UN
+
+
+### bfchar (4)
+
+Represents the following CMap block:
+
+ n beginbfchar
+ <char> <code>
+ endbfchar
+
+First record format is:
+
+ - char as B[ucs2Size], where ucs2Size = 2 (here and below)
+ - code as B[dataSize]
+
+Next record format is:
+
+ - if sequence = 0, charDelta as UB[ucs2Size], char = char + charDelta + 1
+ - if sequence = 1, char = char + 1
+ - codeDelta as SB[dataSize], code = code + codeDelta
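Because the bfchar code is a byte block rather than a number, applying codeDelta means treating the block as a big-endian integer of fixed width. The sketch below assumes that reading, assumes the char update is char + charDelta + 1 as in the cidchar rule, and works on values that are already decoded; the function name is hypothetical.

```python
def expand_bfchars(char, code, rest, sequence):
    """Rebuild (char, code_bytes) pairs.

    char, code -- first record: char as int, code as a bytes object
    rest       -- list of (charDelta, codeDelta); charDelta is ignored
                  (and may be None) when sequence is true
    """
    size = len(code)                    # width of the code block in bytes
    value = int.from_bytes(code, "big")
    entries = [(char, code)]
    for char_delta, code_delta in rest:
        char = char + 1 if sequence else char + char_delta + 1
        value += code_delta             # codeDelta may be negative
        entries.append((char, value.to_bytes(size, "big")))
    return entries
```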
+
+
+### bfrange (5)
+
+Represents the following CMap block:
+
+ n beginbfrange
+ <start> <end> <code>
+ endbfrange
+
+First record format is:
+
+ - start as B[ucs2Size]
+ - endDelta as UB[ucs2Size], end is calculated as (start + endDelta)
+ - code as B[dataSize]
+
+Next record format is:
+
+ - if sequence = 0, startDelta as UB[ucs2Size], start = end + startDelta + 1
+ - if sequence = 1, start = end + 1
+ - endDelta as UB[ucs2Size], end = start + endDelta
+ - code as B[dataSize]
+
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/pkg-javascript/pdf.js.git