Lexical Analysis

lex-i-cal: of or relating to words or the vocabulary of a language as distinguished from its grammar and construction

Webster's Dictionary

OVERVIEW

To translate a program from one language into another, a compiler must first pull it apart and understand its structure and meaning, then put it together in a different way. The front end of the compiler performs analysis; the back end does synthesis.

The analysis is usually broken up into

Lexical analysis: breaking the input into individual words or "tokens";

Syntax analysis: parsing the phrase structure of the program; and

Semantic analysis: calculating the program's meaning.

The lexical analyzer takes a stream of characters and produces a stream of names, keywords, and punctuation marks; it discards white space and comments between the tokens. It would unduly complicate the parser to have to account for possible white space and comments at every possible point; this is the main reason for separating lexical analysis from parsing.
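As a rough illustration of the lexical analyzer's job described above, the following Python sketch turns a character stream into a token stream and drops white space and comments along the way. Everything specific in it is an assumption made for the example, not something the text prescribes: the keyword set, the token class names (NAME, KEYWORD, NUMBER, PUNCT), the C-style comment syntax, and the function name tokenize are all hypothetical.

```python
import re

# Hypothetical keyword set for a toy language (an assumption for this sketch).
KEYWORDS = {"if", "then", "else", "while", "let"}

# Token classes, tried in order. The names and patterns are illustrative only.
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/"),              # C-style comment, discarded
    ("WHITESPACE", r"\s+"),                 # blanks, tabs, newlines, discarded
    ("NUMBER", r"\d+"),
    ("NAME", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("PUNCT", r"[()+\-*/=;<>{},]"),
]

# One combined pattern with a named group per token class.
# DOTALL lets a comment span several lines.
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)


def tokenize(source):
    """Return a list of (kind, text) pairs for the tokens in source."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(
                f"illegal character {source[pos]!r} at offset {pos}"
            )
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind in ("COMMENT", "WHITESPACE"):
            continue  # never reaches the parser
        if kind == "NAME" and text in KEYWORDS:
            kind = "KEYWORD"  # reserved words are split off from names
        tokens.append((kind, text))
    return tokens


if __name__ == "__main__":
    for token in tokenize("if x /* note */ then y = 42;"):
        print(token)
```

Because the white space and the comment are consumed inside tokenize, a parser built on top of this list sees only the names, keywords, and punctuation marks, and needs no rules for what may appear between them.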

Lexical analysis is not very complicated, but we will attack it with high-powered formalisms and tools, because similar formalisms will be useful in the study of parsing and similar tools have many applications in areas other than compilation.