Yet Another Realization of a Multiplication-Free Arithmetic Code

                            Maxim Zakharov

Algorithm description
---------------------

Encoding algorithm:
------------------

Let the alphabet consist of the symbols i = 0, 1, ..., m.  For the
symbol i, denote its count of occurrences in the text under
consideration (or in the current context) by A(i).  From the A(i) we
form the cumulative counts Q(i):

    Q(i) = A(0) + ... + A(i-1),   Q(0) = 0                        (1)

If A(i) > 0 for all i, then the sequence of cumulative counts Q(i)
uniquely determines the sequence of the symbols i.  Note that Q(i+1)
and Q(i) differ by A(i), i.e. any number K with Q(i) <= K < Q(i+1)
uniquely defines the symbol i.  Thus we can use some of the bits of
Q(i) to store the cumulative count of the next symbol of the input
stream.  Therefore the compressing unit can be defined as an
adder-accumulator with the following rule:

    C(si) = ( C(s) + Q(i) ) * 2^k(i)                              (2)

where C(s) denotes the code string for the text string s, and C(si)
denotes the code string for the text string s followed by the symbol
i (C(e) = 0, where e denotes the empty text string).

Multiplication by 2^k(i) is equivalent to an arithmetic shift of the
binary representation by k(i) bits.  The values of k(i) can be found
from the following condition, which ensures decodability:

    ( C(s) + Q(i) ) * 2^k(i) + Q(j) < ( C(s) + Q(i+1) ) * 2^k(i)

for all symbols j.  From (1) it follows that max Q(j) = Q(m), where m
is the last symbol of the alphabet.  Thus,

    ( C(s) + Q(i) ) * 2^k(i) + Q(m) < ( C(s) + Q(i+1) ) * 2^k(i),

    ( Q(i+1) - Q(i) ) * 2^k(i) > Q(m)

From (1), Q(i+1) - Q(i) = A(i).  Therefore,

    k(i) = [ log2( Q(m)/A(i) ) ] + 1                              (3)

where [] denotes truncation to an integer.

Decoding algorithm:
------------------

Suppose we use n bits for the representation of the cumulative
counts.  First, read n bits from the code stream and denote the
resulting number by G.
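The adder-and-shift rule (2) with the shift amounts (3) is easy to sketch in code. Below is a minimal Python sketch for a fixed model; the names (`build_tables`, `encode`) and the convention of flushing the n-bit accumulator window after the last symbol without its final shift (which is what the worked example later in the text appears to do) are my assumptions, not part of the original description.

```python
import math

# Fixed model for the alphabet {a, b, c, d}: occurrence counts A(i)
# (the counts used in the worked example of the text).
COUNTS = {"a": 4, "b": 2, "c": 1, "d": 2}

def build_tables(counts):
    """Compute Q(i) from equation (1) and k(i) from equation (3)."""
    q, total = {}, 0
    for sym, a in counts.items():
        q[sym] = total          # Q(i) = A(0) + ... + A(i-1)
        total += a
    q_m = max(q.values())       # Q(m): cumulative count of the last symbol
    k = {sym: int(math.log2(q_m / a)) + 1 for sym, a in counts.items()}
    return q, k

def encode(text, counts, n=3):
    """Apply C(si) = (C(s) + Q(i)) * 2^k(i) symbol by symbol."""
    q, k = build_tables(counts)
    code, length = 0, n         # C(e) = 0, held in an n-bit window
    for sym in text[:-1]:
        code = (code + q[sym]) << k[sym]   # equation (2)
        length += k[sym]
    code += q[text[-1]]         # last symbol: the window is flushed unshifted
    return format(code, "0{}b".format(length))
```

On the example text this gives `encode("abbadacad", COUNTS)` = `0101001111100111`, i.e. 16 bits for 9 symbols, about 1.778 bits per symbol, in agreement with the worked example.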
Then find the symbol r such that Q(r) <= G < Q(r+1); output r to the
output stream; subtract the value Q(r) from G, shift the new value of
G left by k(r) bits, and read the next k(r) bits of the code string
into G.  Then find the symbol r' such that Q(r') <= G < Q(r'+1), and
so on.

Simple example of algorithm operation
-------------------------------------

Take the following text: abbadacad (9 symbols).

Fixed model.
------------

Encoding:

    A(a)=4  A(b)=2  A(c)=1  A(d)=2
    Q(a)=0  Q(b)=4  Q(c)=6  Q(d)=7
    k(a)=1  k(b)=2  k(c)=3  k(d)=2

For comparison, a Huffman code built from the same counts is:

    a -> 1,  b -> 01,  c -> 000,  d -> 001

Encoding (output bits, followed by the 3-bit accumulator window):

    ------------------------------------
    Input symbol | Codeword
    ------------------------------------
       e         | 000
       a         | 0 000
       b         | 010 000
       b         | 01010 000
       a         | 010100 000
       d         | 01010011 100
       a         | 010100111 000
       c         | 010100111110 000
       a         | 0101001111100 000
       d         | 0101001111100 111
    ------------------------------------

    This coding method: 1.778 bits/symbol
    Huffman coding:     1.889 bits/symbol

This example was specially selected; in some cases Huffman coding
does better.

Decoding:

    ----------------------------------------------
    Codeword (current 3 bits) | Output symbol
    ----------------------------------------------
       010 (2)                | a
       101 (5)                | b
       100 (4)                | b
       011 (3)                | a
       111 (7)                | d
       011 (3)                | a
       110 (6)                | c
       011 (3)                | a
       111 (7)                | d
    ----------------------------------------------

Adaptive model (incomplete).
----------------------------

Encoding:

    -------------------------------------------------
    Input symbol | State of the encoding unit
    -------------------------------------------------
       e         | C = 000
                 | A(a)=1 A(b)=1 A(c)=1 A(d)=1
                 | Q(a)=0 Q(b)=1 Q(c)=2 Q(d)=3
    -------------------------------------------------
       a         | C = 00 000
                 | A(a)=2 A(b)=1 A(c)=1 A(d)=1
                 | Q(a)=0 Q(b)=2 Q(c)=3 Q(d)=4
    -------------------------------------------------
       b         | C = 00010 000
                 | A(a)=2 A(b)=2 A(c)=1 A(d)=1
                 | Q(a)=0 Q(b)=2 Q(c)=4 Q(d)=5
    -------------------------------------------------
       b         | C = 0001001 000
                 | A(a)=2 A(b)=3 A(c)=1 A(d)=1
                 | Q(a)=0 Q(b)=2 Q(c)=5 Q(d)=6
    -------------------------------------------------
       . . .

Decoding:

    --------------------------------------------------
    State of the decoding unit      | Output symbol
    --------------------------------------------------
    C = 000                         | a
    A(a)=1 A(b)=1 A(c)=1 A(d)=1     |
    Q(a)=0 Q(b)=1 Q(c)=2 Q(d)=3     |
    --------------------------------------------------
    C = 010                         | b
    A(a)=2 A(b)=1 A(c)=1 A(d)=1     |
    Q(a)=0 Q(b)=2 Q(c)=3 Q(d)=4     |
    --------------------------------------------------
    C = 010                         | b
    A(a)=2 A(b)=2 A(c)=1 A(d)=1     |
    Q(a)=0 Q(b)=2 Q(c)=4 Q(d)=5     |
    --------------------------------------------------
       . . .

Compression ratio formulation
-----------------------------

From (3) alone it is hard to judge the efficiency of this technique.
Note, however, that Q(m) <= M, where M is the total number of symbols
in the text being compressed.  From (3) it then follows that

    k(i) <= k'(i) = [ log2( M/A(i) ) ] + 1                        (4)

and, as M tends to infinity,

    k'(i) -> [ log2( 1/p(i) ) ] + 1,

where p(i) is the probability of occurrence of symbol i in the text.
Denote by R(s) the difference between the average number of bits per
symbol produced by this technique on the text s and the entropy.
Then from (4) one can see that

    0 <= R(s) < 1

I have found an analogous asymptotic estimate for Huffman coding.
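The decoding procedure described earlier (read n bits into G, locate r with Q(r) <= G < Q(r+1), output r, subtract, shift, refill) can be sketched the same way. This is a Python sketch under my own naming, and it assumes the decoder is told how many symbols to emit, a detail the text leaves open; applied to the 16-bit fixed-model code string above, it recovers the example text.

```python
import math

def decode(bits, counts, n, num_symbols):
    """Invert the adder-and-shift encoder for a fixed model."""
    syms = list(counts)
    q, bounds, total = {}, [], 0
    for s in syms:
        q[s] = total            # Q(i) from equation (1)
        total += counts[s]
        bounds.append(total)    # Q(i+1), the upper bound for symbol i
    q_m = total - counts[syms[-1]]     # Q(m)
    k = {s: int(math.log2(q_m / counts[s])) + 1 for s in syms}  # equation (3)
    g, pos, out = int(bits[:n], 2), n, []   # read the first n bits into G
    for _ in range(num_symbols):
        # find r such that Q(r) <= G < Q(r+1)
        r = next(s for s, hi in zip(syms, bounds) if g < hi)
        out.append(r)
        refill = bits[pos:pos + k[r]]       # next k(r) bits of the code string
        g = ((g - q[r]) << k[r]) + int(refill or "0", 2)
        pos += len(refill)
    return "".join(out)
```

For the fixed-model example, `decode("0101001111100111", {"a": 4, "b": 2, "c": 1, "d": 2}, 3, 9)` returns `"abbadacad"`, and the successive 3-bit values of G (2, 5, 4, 3, 7, 3, 6, 3, 7) match the decoding table above.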
Effect of externally introduced errors
--------------------------------------

Correct operation of the encoding and decoding units depends strongly
on externally introduced errors.  However, one can use a CRC for
transmission over a noisy channel.

Reference
---------

Another realization of multiplication-free arithmetic coding can be
found in:

J. Rissanen and K. M. Mohiuddin, "A Multiplication-Free Multialphabet
Arithmetic Code", IEEE Transactions on Communications, vol. 37, no. 2,
February 1989.