Understanding Encoding (Beginner’s guide)

From Wikipedia

This article describes the different processes involved in encoding data.

Encoded data is data that has been transformed into a different format so that it can be understood by different types of systems. For example, ASCII characters are encoded as numbers: ‘A’ is represented by 65, ‘B’ by 66, and so on.

As we know, computers do not understand human languages, so data must be encoded into binary, which computer systems can read directly. This is why encoding is so important. Encoding uses schemes that are publicly available, so it can easily be reversed. Encoding is data transformation, not data encryption, so decoding does not need a key.
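As a small illustration of the two points above (characters map to numbers, and the mapping is reversible without a key), here is a minimal Python sketch. The variable names are only illustrative.

```python
# Encode a string to its ASCII code numbers, then decode it again.
text = "AB"
codes = [ord(ch) for ch in text]            # character -> number
decoded = "".join(chr(n) for n in codes)    # number -> character, no key needed

print(codes)    # [65, 66]
print(decoded)  # AB
```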

URL Encoded

URLs can only be sent over the internet using characters from the ASCII set, so URL encoding replaces certain characters in a URL. Each such character is converted into a character triplet: a “%” prefix followed by the two hexadecimal digits of the character's ASCII code.

 Character   Encoded
 :           %3A
 /           %2F
 #           %23
 ?           %3F
 &           %26
 @           %40
 %           %25
 +           %2B
 (space)     %20
 ;           %3B
 =           %3D
 $           %24
 ,           %2C
 <           %3C
 >           %3E
 ^           %5E
 `           %60
 \           %5C
 [           %5B
 ]           %5D
 {           %7B
 }           %7D
 |           %7C
 "           %22
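Each entry in the table is simply the character's ASCII code written as two hex digits after a “%”. A minimal sketch of that rule, where `percent_encode` is a hypothetical helper name used only for illustration:

```python
def percent_encode(ch: str) -> str:
    """Return '%' followed by the two-digit uppercase hex ASCII code of ch."""
    return f"%{ord(ch):02X}"

# A few characters from the table.
for ch in ":/&$ ":
    print(repr(ch), percent_encode(ch))
```

This only covers single ASCII characters; it is a sketch of the rule, not a full URL encoder.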

Example :

Original URL: http://www.hackingarticles.in

Encoded URL: http%3A%2F%2Fwww.hackingarticles.in
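The same result can be reproduced with Python's standard library. This sketch assumes that every reserved character should be encoded, which is why `safe=""` is passed; by default `quote` leaves “/” untouched.

```python
from urllib.parse import quote, unquote

original = "http://www.hackingarticles.in"

# safe="" asks quote() to percent-encode ':' and '/' as well.
encoded = quote(original, safe="")
print(encoded)           # http%3A%2F%2Fwww.hackingarticles.in

# Decoding needs no key, only the reverse transformation.
print(unquote(encoded))  # http://www.hackingarticles.in
```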

Hexadecimal, or base 16, is a positional number system with 16 distinct symbols: the numerals 0 to 9, and the letters A to F (in upper or lower case), which represent the values 10 to 15.

Step 1 – Get the decimal (ASCII) value of the letter. This is different for upper and lower case, eg: A = 65 and a = 97. To find the value of any letter, count up to it from “A” or “a”; the values increase by one per letter, eg: A = 65, B = 66, C = 67 and so on / a = 97, b = 98, c = 99 and so on.

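Step 2 – To convert from decimal to hexadecimal, divide the decimal value by 16. The hex value is written starting with the quotient and ending with the remainder. 97 divided by 16 gives a quotient of 6 and a remainder of 1, so the hex value of 97 is 61. A quotient or remainder from 10 to 15 is written as a letter from a to f, eg: 106 divided by 16 gives 6 remainder 10, so the hex value of 106 is 6a.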

Eg:

 97 ÷ 16 = 6, remainder 1, so the hex digits are 6 and 1: 61

 Source               R     a     j
 Decimal value        82    97    106
 Hexadecimal value    52    61    6a
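The repeated-division method from Step 2 can be sketched in Python as follows. `to_hex` is a hypothetical helper written for illustration; in practice the built-in `format(n, "x")` does the same job.

```python
def to_hex(n: int) -> str:
    """Convert a non-negative integer to lowercase hex by repeated division by 16."""
    digits = "0123456789abcdef"
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, remainder = divmod(n, 16)   # quotient carries on, remainder is a digit
        out.append(digits[remainder])
    return "".join(reversed(out))      # remainders are read last to first

for ch in "Raj":
    print(ch, ord(ch), to_hex(ord(ch)))
```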

Base64

Base64 is a radix-64 representation of data such as an ASCII string. Each Base64 digit represents exactly 6 bits of data. Here’s how we get it:

Step 1 – Get the decimal (ASCII) value of the letter. This is different for upper and lower case, eg: A = 65 and a = 97. To find the value of any letter, count up to it from “A” or “a”; the values increase by one per letter, eg: A = 65, B = 66, C = 67 and so on / a = 97, b = 98, c = 99 and so on.

Step 2 – Divide the decimal value by 2 and note the remainder, which is either “1” or “0”. Keep dividing the quotient by 2 until you reach 1 and cannot divide any further. The binary value is the sequence of noted 1’s and 0’s read from last to first.

Eg:

 Divisor   Value   Remainder
 2         97      1
 2         48      0
 2         24      0
 2         12      0
 2         6       0
 2         3       1
           1       1

Reading the remainders from last to first gives 1100001. To get an 8-bit value we prefix a “0”, eg: 01100001, and this gives us the binary value of “a”.
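A minimal Python sketch of the divide-by-2 method, padded to 8 bits. `to_binary` is a hypothetical helper name; the built-in `format(n, "08b")` gives the same string.

```python
def to_binary(n: int, width: int = 8) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    bits = []
    while n > 0:
        n, remainder = divmod(n, 2)    # note each remainder (0 or 1)
        bits.append(str(remainder))
    # Read the remainders last to first, then prefix 0's up to the width.
    return "".join(reversed(bits)).rjust(width, "0")

for ch in "Raj":
    print(ch, ord(ch), to_binary(ord(ch)))
```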

Step 3 – Write the binary values of all the characters together and split them into groups of 6 bits, eg: “Raj” in 8-bit groups = 01010010 01100001 01101010, “Raj” in 6-bit groups = 010100 100110 000101 101010.

Step 4 – Work out the decimal value of each 6-bit group from Step 3 by adding the place values wherever there is a 1.

 32   16   8   4   2   1    Decimal
 0    1    0   1   0   0    20
 1    0    0   1   1   0    38
 0    0    0   1   0   1    5
 1    0    1   0   1   0    42

Step 5 – Use the Base64 table to look up the values we get in Step 4.

The Base64 index table:

 Value Char    Value Char    Value Char    Value Char
 0     A       16    Q       32    g       48    w
 1     B       17    R       33    h       49    x
 2     C       18    S       34    i       50    y
 3     D       19    T       35    j       51    z
 4     E       20    U       36    k       52    0
 5     F       21    V       37    l       53    1
 6     G       22    W       38    m       54    2
 7     H       23    X       39    n       55    3
 8     I       24    Y       40    o       56    4
 9     J       25    Z       41    p       57    5
 10    K       26    a       42    q       58    6
 11    L       27    b       43    r       59    7
 12    M       28    c       44    s       60    8
 13    N       29    d       45    t       61    9
 14    O       30    e       46    u       62    +
 15    P       31    f       47    v       63    /

The Base64 encoded value of Raj is UmFq. Encoded in ASCII, the characters R, a, and j are stored as the decimal values 82, 97, and 106, and their 8-bit binary values are 01010010, 01100001, and 01101010. These three values are joined together into a 24-bit string, producing 010100100110000101101010. Groups of 6 bits are then converted into individual numbers from left to right. If the bits do not divide evenly into groups of 6, 0’s are added to fill the remaining slots so that a full group of 6 can be made.
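Steps 3 to 5 can be traced for “Raj” with a short Python sketch. It assumes ASCII input whose length is a multiple of 3, so no padding is needed, and the names `ALPHABET`, `groups` and `indexes` are illustrative. The final line checks the result against the standard library's `base64` module.

```python
import base64
import string

# The Base64 index table: A-Z, a-z, 0-9, '+', '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

text = "Raj"
bits = "".join(f"{ord(ch):08b}" for ch in text)           # join the 8-bit values
groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]  # Step 3: groups of 6
indexes = [int(group, 2) for group in groups]             # Step 4: decimal values
encoded = "".join(ALPHABET[i] for i in indexes)           # Step 5: table lookup

print(bits)     # 010100100110000101101010
print(groups)   # ['010100', '100110', '000101', '101010']
print(indexes)  # [20, 38, 5, 42]
print(encoded)  # UmFq

assert encoded == base64.b64encode(text.encode("ascii")).decode("ascii")
```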

The full conversion of “Raj” to Base64 is shown in Table 1.1, and the individual conversions of “R” and “Ra” of “Raj” are shown in Tables 1.2 and 1.3 to give a breakdown of the process with explanation.

 Table 1.1

 Text       Raj
 Decimal    82 97 106
 Binary     01010010 01100001 01101010

In Table 1.2, for the character “R” of “Raj”, the values in the Bit pattern section are in 8-bit format and are converted into 6-bit groups. The decimal values of the 6-bit groups are in the Index section.

The same process is repeated in Table 1.3 for characters “R” and “a” of “Raj”.

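When the input does not fill a whole number of 6-bit groups, extra 0’s are added to complete the last group, and one “=” is appended to the output for each pair of extra 0’s. The “=” is a padding marker, not the encoded value of the 0’s. Eg: “R” needs four extra 0’s and encodes to “Ug==”, while “Ra” needs two and encodes to “UmE=”.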

Table 1.4 builds on the logic used in Tables 1.2 and 1.3: “Raaj” is converted to “UmFhag==” in Base64. With the additional “a”, the conversion becomes a little more involved. In the Index section we can see the additional values 33, 26 and 32, which result from the change in the bit pattern.

As in Tables 1.2 and 1.3, one “=” is added for each pair of extra 0’s needed to complete the last group of 6, so “Raaj”, which needs four extra 0’s, ends in “==”.
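The padding rule can be sketched in Python as follows. `b64_encode_ascii` is a hypothetical helper that assumes ASCII-only input; it is written to mirror the steps in this article, and the results are checked against the standard library's `base64` module.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def b64_encode_ascii(text: str) -> str:
    """Base64-encode an ASCII string, following the manual steps."""
    bits = "".join(f"{ord(ch):08b}" for ch in text)
    extra_zeros = (-len(bits)) % 6          # 0, 2 or 4 bits needed to fill the last group
    bits += "0" * extra_zeros
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # One '=' for each pair of extra 0's.
    return "".join(chars) + "=" * (extra_zeros // 2)

for word in ["R", "Ra", "Raj", "Raaj"]:
    result = b64_encode_ascii(word)
    assert result == base64.b64encode(word.encode("ascii")).decode("ascii")
    print(word, result)
```

Running this prints Ug==, UmE=, UmFq and UmFhag== for the four inputs.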

Rot13

ROT13 is a letter substitution cipher. It converts plain text to cipher text by dividing the alphabet into two halves, A to M and N to Z. Each letter is swapped with the letter in the same position in the other half, so A = N and N = A.

Eg: Rot13 of Raj = Enw

 A B C D E F G H I J K L M
 N O P Q R S T U V W X Y Z
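A minimal Python sketch of the substitution: each letter is shifted 13 places and wraps around the alphabet, which is the same as swapping the two halves. `rot13` is a hypothetical helper name; Python's `codecs` module also provides a built-in `rot_13` codec, used here only as a cross-check.

```python
import codecs

def rot13(text: str) -> str:
    """Shift each ASCII letter 13 places, leaving other characters unchanged."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + 13) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + 13) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)

print(rot13("Raj"))         # Enw
print(rot13(rot13("Raj")))  # Raj (applying ROT13 twice restores the text)

assert rot13("Raj") == codecs.encode("Raj", "rot_13")
```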