Programming Fundamentals

Character Set and Tokens

The words (tokens) and statements of any language are formed from its basic character set. For example, the words of the English language are formed from its alphabet of 26 letters, written either as A, B, ..., Z or as a, b, ..., z. In the same way, the characters used in a programming language are generally divided into four categories:

            1. Letters or alphabets
            2. Digits
            3. Special characters
            4. White spaces

The letters are the uppercase (A, B, ..., Z) and lowercase (a, b, ..., z) letters of the English alphabet. The digits are 0, 1, 2, ..., 9. Special characters such as ; (semicolon), ' (single quote), " (double quote), +, -, *, /, %, > and = are used for various purposes. White space characters (blank, tab and newline) separate words or tokens from one another.

Characters combine to form special symbols or words known as tokens. Examples of tokens are the words that name data types, such as int and float. Similarly, every operator (+, -, *, /, %, >, etc.) and every punctuation mark, such as ; (semicolon) and the braces { }, is a token. Constants used in programs, such as 12 and 12.65, are also tokens.