CSCI4160 Project 2 Solved





This assignment serves several purposes:

  • to become familiar with flex
  • to understand lexical analysis by using an existing tool



In this assignment, you are required to use flex, a scanner generator, to generate a scanner for the Tiger language. The manual for the Tiger language can be found in the class repository: Tiger\manual.pdf.



Tiger language:

The reserved words of the language are:  while, for, to, break, let, in, end, function, var, type, array, if, then, else, do, of, nil. 


The punctuation symbols used in the language are:

,  :  ;  (  )  [  ]  {  }  .  +  -  *  /  =  <>  <  <=  >  >=  &  |  :=


The string value that you return for a string literal should have all the escape sequences translated into their meanings.


There are no negative integer literals; return two separate tokens for -32.
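In flex this simply means that the integer pattern carries no sign, so the minus is matched by its own punctuation rule. The plain C++ sketch below only models the token stream you should expect; it is not a replacement for the flex rules. The names Kind, Token, and scanMinusAndInts are hypothetical (the real token names are defined in tokens.h), and integer overflow is not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds for this sketch only; the real ones live in tokens.h.
enum class Kind { Minus, Int };

struct Token {
    Kind kind;
    int value;  // meaningful only when kind == Kind::Int
};

// Models the expected result: a '-' is always its own token, and an
// integer literal is an unsigned run of digits.
std::vector<Token> scanMinusAndInts(const std::string& src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (c == '-') {
            tokens.push_back({Kind::Minus, 0});
            ++i;
        } else if (std::isdigit(c)) {
            int value = 0;
            while (i < src.size() &&
                   std::isdigit(static_cast<unsigned char>(src[i]))) {
                value = value * 10 + (src[i] - '0');
                ++i;
            }
            tokens.push_back({Kind::Int, value});
        } else {
            ++i;  // everything else is ignored in this sketch
        }
    }
    return tokens;
}
```

For the input -32, this yields a Minus token followed by an Int token with value 32.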


Detect unclosed comments (at end of file) and unclosed strings.


Tips and Requirements:


Your lexer should use regular expressions to do the work, EXCEPT for counting comment nesting. For example, do not match an opening quote and then use C++ code to search for the matching closing quote; use regular expression(s) instead.


Comments in the Tiger language look like C-style comments: they start with /* and end with the matching subsequent */. Any symbol is allowed within a comment. Unlike C comments, however, Tiger comments can be nested, e.g.,

/* this is a line

/* this is ok

As many lines as you desire

This one only has three

*/  this closes the nested comment */  // this closed the first one.


Nesting can be of any depth. Comments are ignored, just like whitespace. To handle nested comments, you will need a counter in the lexer that is incremented at each comment opening and decremented at each comment closing. The comment is complete when the counter returns to zero.
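The counting logic can be sketched in plain C++ as follows. This is only a model of the counter, assuming a hypothetical helper named skipNestedComment that is not part of the provided skeleton; in the actual lexer the same increment and decrement would sit in the actions of the rules that match /* and */.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the provided skeleton).
// Precondition: `pos` points at an opening "/*" in `src`.
// Returns the index just past the "*/" that closes that comment, or
// std::string::npos if the input ends while a comment is still open.
std::size_t skipNestedComment(const std::string& src, std::size_t pos) {
    int depth = 0;  // number of comments currently open
    while (pos < src.size()) {
        if (src.compare(pos, 2, "/*") == 0) {
            ++depth;  // a comment opens, possibly inside another one
            pos += 2;
        } else if (src.compare(pos, 2, "*/") == 0) {
            --depth;  // the innermost open comment closes
            pos += 2;
            if (depth == 0) {
                return pos;  // the outermost comment is complete
            }
        } else {
            ++pos;  // any other symbol is allowed inside a comment
        }
    }
    return std::string::npos;  // unclosed comment at end of file
}
```

Reaching the end of the input with a nonzero counter corresponds to the unclosed-comment error that the lexer must report.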


A string literal is a sequence, between double quotes ("), of zero or more printable characters, spaces, or escape sequences. Each escape sequence is introduced by the escape character \, and stands for a character sequence. The allowed escape sequences are as follows (all other uses of \ being illegal):

  • \n             a character interpreted by the system as end-of-line.
  • \t             TAB
  • \"             the double-quote character (")
  • \\             the backslash character (\)
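As an illustration of what "translated into their meanings" implies, here is a minimal C++ sketch of the mapping. The function name decodeEscapes is hypothetical and not part of the skeleton. In the lexer itself, each escape would be handled by its own rule rather than by a loop like this, so treat the code as a reference for the expected result only.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: `body` is the text between the quotes of a string
// literal. On success, stores the translated text in `out` and returns true.
// Returns false on any use of '\' other than the four allowed escapes.
bool decodeEscapes(const std::string& body, std::string& out) {
    out.clear();
    for (std::size_t i = 0; i < body.size(); ++i) {
        if (body[i] != '\\') {
            out += body[i];  // ordinary character, copied unchanged
            continue;
        }
        ++i;  // move to the character after the backslash
        if (i == body.size()) {
            return false;  // a lone trailing backslash is illegal
        }
        switch (body[i]) {
            case 'n':  out += '\n'; break;  // end-of-line
            case 't':  out += '\t'; break;  // TAB
            case '"':  out += '"';  break;  // double quote
            case '\\': out += '\\'; break;  // backslash
            default:   return false;        // illegal escape sequence
        }
    }
    return true;
}
```

For example, the six source characters a, \, t, b, \, n decode to the four characters a, TAB, b, end-of-line.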


What to do in this project?

You need to provide rules to recognize reserved words (while, for, to, break, let, in, end, function, var, type, array, if, then, else, do, of, nil), punctuation symbols (,  :  ;  (  )  [  ]  {  }  .  +  -  *  /  =  <>  <  <=  >  >=  &  |  :=), comments, identifiers, integer literals, and string literals. More specifically,

  • For white space characters like " ", \n, and \t, recognize them and discard them.
  • For each recognized reserved word and punctuation symbol, return the corresponding token in the rule action. All tokens are defined in the tokens.h file.
  • For comments, just ignore all content in the comments. Make sure your rules recognize nested comments.
  • For an integer literal, recognize it and return the INT token. Make sure you store the associated integer value in the variable ival.
  • For an identifier, recognize it and return the ID token. Make sure you store the associated name in the variable sval.
  • For string literals, it is better to use a start condition:
    • For the double quote (") marking the beginning of the string literal, reset the variable value to be an empty string, modify variables beginLine and beginCol, and start the STRING condition;
    • For all rules under the STRING condition that recognize part of the string literal, append the recognized part to the variable value;
    • For all rules under the STRING condition that find an error (like an illegal escape sequence or an unclosed string), report the error;
    • For the closing double quote ("), copy variable value to sval, just like the actions in the identifier rule, and start the INITIAL condition.


Make sure the provided function newline() is called for every newline character in the source file. Make sure the provided function error() is called for every recognized error (illegal character, unclosed comments/strings, illegal escape sequences, etc.). Other than calls to error(), there should be no output statements in rule actions.


Instructor provided files in the class repository


The following files are provided by the instructor:

  • Folder FlexProject. This contains a sample Visual Studio 2010 project. If you want to use this provided project for this assignment instead of creating your own project from scratch, please pay attention to the following:
    • Make sure you have installed Flex and Bison as specified in the class repository document:

How-to\flex and bison\ flex and bison installation.doc

Make sure the folder (like c:\gnuWin32\bin) containing flex.exe/bison.exe is added to your path.

  • Make sure instructor provided FlexLexer.h and flex.skl are copied to the correct location as specified in the first 2 pages of the following document in the class repository:

How-to\flex and bison\windows\ HOWTO-C++-flex-bison-Visual Studio.doc

  • Skeleton source files provided in the sample project are listed below:
    • tokens.h: contains the definitions of all tokens and the definition of YYSTYPE, which is a structure that holds the value associated with a matched token.
    • h: contains the definition of error handler
    • cpp: the driver
    • ll: a flex skeleton file for this project. In this assignment, you only need to work on this file. No change on other files is necessary.
    • test0.tig, test1.tig, test2.tig: different test programs written in the Tiger language
  • pdf: this file
  • doc: the rubric used to grade this assignment.
  • txt, test1.txt, and test2.txt. These are the instructor-provided outputs for test0.tig, test1.tig, and test2.tig, respectively. Your output may differ depending on how you handle string errors.