author | Chris Lattner <sabre@nondot.org> | 2007-10-22 04:32:37 +0000 |
---|---|---|
committer | Chris Lattner <sabre@nondot.org> | 2007-10-22 04:32:37 +0000 |
commit | dbca7df0f705638a0a0b336cae87829cf07eb54d | |
tree | 5e94542a46166f64687f1187dd5c685762f2b088 /docs | |
parent | 33d7a7657bc06363fe41d8773fa5467653d7e99b | |
add part 1, review appreciated.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@43215 91177308-0d34-0410-b5e6-96231b3b80d8
Diffstat (limited to 'docs')
-rw-r--r-- | docs/tutorial/LangImpl1.html | 263 |
-rw-r--r-- | docs/tutorial/index.html | 7 |
2 files changed, 267 insertions, 3 deletions
diff --git a/docs/tutorial/LangImpl1.html b/docs/tutorial/LangImpl1.html
new file mode 100644
index 0000000..49ad181
--- /dev/null
+++ b/docs/tutorial/LangImpl1.html
@@ -0,0 +1,263 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
+                      "http://www.w3.org/TR/html4/strict.dtd">
+
+<html>
+<head>
+  <title>Kaleidoscope: The basic language, with its lexer</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+  <meta name="author" content="Chris Lattner">
+  <link rel="stylesheet" href="../llvm.css" type="text/css">
+</head>
+
+<body>
+
+<div class="doc_title">Kaleidoscope: The basic language, with its lexer</div>
+
+<div class="doc_author">
+  <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
+</div>
+
+<!-- *********************************************************************** -->
+<div class="doc_section"><a name="intro">Tutorial Introduction</a></div>
+<!-- *********************************************************************** -->
+
+<div class="doc_text">
+
+<p>Welcome to the "Implementing a language with LLVM" tutorial. This tutorial
+will run through the implementation of a simple language, showing how fun and easy
+it can be. This tutorial will get you up and running and give you a framework you
+can extend to other languages and use to play with other things.
+</p>
+
+</div>
+
+<!-- *********************************************************************** -->
+<div class="doc_section"><a name="language">The basic language</a></div>
+<!-- *********************************************************************** -->
+
+<div class="doc_text">
+
+<p>This tutorial will be illustrated with a toy language that we'll call
+"<a href="http://en.wikipedia.org/wiki/Kaleidoscope">Kaleidoscope</a>".
+Kaleidoscope is a procedural language that allows you to define functions, use
+conditionals, math, etc. Over the course of the tutorial, we'll extend
+Kaleidoscope to support if/then/else, operator overloading, JIT compilation with
+a simple command line interface, etc.</p>
+
+<p>Because we want to keep things simple, in Kaleidoscope the only datatype is a
+64-bit floating point type (aka 'double' in C parlance). As such, all values
+are implicitly double precision and the language doesn't require type
+declarations. This gives the language a very nice and simple syntax. For
+example, a simple function that computes <a
+href="http://en.wikipedia.org/wiki/Fibonacci_number">Fibonacci numbers</a>
+looks like this:</p>
+
+<div class="doc_code">
+<pre>
+# Compute the x'th fibonacci number.
+def fib(x)
+  if x < 3 then
+    1
+  else
+    fib(x-1)+fib(x-2)
+
+# This expression will compute the 40th number.
+fib(40)
+</pre>
+</div>
+
+<p>We also allow Kaleidoscope to call into standard library functions (the LLVM
+JIT makes this completely trivial). This means that you can use the 'extern'
+keyword to declare a function before you use it (this is also useful for mutually
+recursive functions). For example:</p>
+
+<div class="doc_code">
+<pre>
+extern sin(arg);
+extern cos(arg);
+extern atan2(arg1 arg2);
+
+atan2(sin(.4), cos(42))
+</pre>
+</div>
+
+<p>In the first incarnation of the language, we will only support basic
+arithmetic: if/then/else will be added in a future installment. Another
+interesting aspect of the first implementation is that it is a completely
+functional language, which does not allow you to have side-effects etc. We will
+eventually add side effects for those who prefer them.</p>
+
+<p>In order to make this tutorial
+maximally understandable and hackable, we choose to implement everything in C++
+instead of using lexer and parser generators. LLVM obviously works just fine
+with these tools, and the choice of these tools doesn't impact the overall design.</p>
+
+<p>A note about this tutorial: we expect you to extend the language and play
+with it on your own. Take the code and go crazy hacking away at it. It can be
+a lot of fun to play with languages!</p>
+
+</div>
+
+<!-- *********************************************************************** -->
+<div class="doc_section"><a name="lexer">The Lexer</a></div>
+<!-- *********************************************************************** -->
+
+<div class="doc_text">
+
+<p>When it comes to implementing a language, the first thing needed is
+the ability to process a text file and recognize what it says. The traditional
+way to do this is to use a "<a
+href="http://en.wikipedia.org/wiki/Lexical_analysis">lexer</a>" (aka 'scanner')
+to break the input up into "tokens". Each token returned by the lexer includes
+a token code and potentially some metadata (e.g. the numeric value of a number).
+First, we define the possibilities:
+</p>
+
+<div class="doc_code">
+<pre>
+// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
+// of these for known things.
+enum Token {
+  tok_eof = -1,
+
+  // commands
+  tok_def = -2, tok_extern = -3,
+
+  // primary
+  tok_identifier = -4, tok_number = -5,
+};
+
+static std::string IdentifierStr; // Filled in if tok_identifier
+static double NumVal; // Filled in if tok_number
+</pre>
+</div>
+
+<p>Each token returned by our lexer will either be one of the Token enum values
+or it will be an 'unknown' character like '+', which is returned as its ASCII
+value. If the current token is an identifier, the <tt>IdentifierStr</tt>
+global variable holds the name of the identifier. If the current token is a
+numeric literal (like 1.0), <tt>NumVal</tt> holds its value. Note that we use
+global variables for simplicity; this is not the best choice for a real language
+implementation :).
+</p>
+
+<p>The actual implementation of the lexer is a single function <tt>gettok</tt>.
+<tt>gettok</tt> is called to return the next token from standard input. Its
+definition starts as:</p>
+
+<div class="doc_code">
+<pre>
+/// gettok - Return the next token from standard input.
+static int gettok() {
+  static int LastChar = ' ';
+
+  // Skip any whitespace.
+  while (isspace(LastChar))
+    LastChar = getchar();
+</pre>
+</div>
+
+<p>
+<tt>gettok</tt> works by calling the C <tt>getchar()</tt> function to read
+characters one at a time from standard input. It eats them as it recognizes
+them and stores the last character read, but not yet processed, in <tt>LastChar</tt>. The
+first thing that it has to do is ignore whitespace between tokens. This is
+accomplished with the loop above.</p>
+
+<p>The next thing it needs to do is recognize identifiers and specific keywords
+like "def". Kaleidoscope does this with this simple loop:</p>
+
+<div class="doc_code">
+<pre>
+  if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
+    IdentifierStr = LastChar;
+    while (isalnum((LastChar = getchar())))
+      IdentifierStr += LastChar;
+
+    if (IdentifierStr == "def") return tok_def;
+    if (IdentifierStr == "extern") return tok_extern;
+    return tok_identifier;
+  }
+</pre>
+</div>
+
+<p>Note that it sets the '<tt>IdentifierStr</tt>' global whenever it lexes an
+identifier. Also, since language keywords are matched by the same loop, we
+handle them here inline. Numeric values are similar:</p>
+
+<div class="doc_code">
+<pre>
+  if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
+    std::string NumStr;
+    do {
+      NumStr += LastChar;
+      LastChar = getchar();
+    } while (isdigit(LastChar) || LastChar == '.');
+
+    NumVal = strtod(NumStr.c_str(), 0);
+    return tok_number;
+  }
+</pre>
+</div>
+
+<p>This is all pretty straightforward code for processing input. When reading
+a numeric value from input, we use the C <tt>strtod</tt> function to convert it
+to a numeric value that we store in <tt>NumVal</tt>. Note that this isn't doing
+sufficient error checking: it will incorrectly read "1.23.45.67" and handle it as
+if you typed in "1.23". Feel free to extend it :). Next we handle comments:
+</p>
+
+<div class="doc_code">
+<pre>
+  if (LastChar == '#') {
+    // Comment until end of line.
+    do LastChar = getchar();
+    while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');
+
+    if (LastChar != EOF)
+      return gettok();
+  }
+</pre>
+</div>
+
+<p>We handle comments by skipping to the end of the line and then returning the
+next token. Finally, if the input doesn't match one of the above cases, it is
+either an operator character like '+' or the end of file. These are handled with
+this code:</p>
+
+<div class="doc_code">
+<pre>
+  // Check for end of file. Don't eat the EOF.
+  if (LastChar == EOF)
+    return tok_eof;
+
+  // Otherwise, just return the character as its ascii value.
+  int ThisChar = LastChar;
+  LastChar = getchar();
+  return ThisChar;
+}
+</pre>
+</div>
+
+<p>With this, we have the complete lexer for the basic Kaleidoscope language.
+Next we'll <a href="LangImpl2.html">build a simple parser that uses this to
+build an Abstract Syntax Tree</a>. If you prefer, you can jump to the <a
+href="index.html">main tutorial index page</a>.
+</p>
+
+</div>
+
+<!-- *********************************************************************** -->
+<hr>
+<address>
+  <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
+  src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
+  <a href="http://validator.w3.org/check/referer"><img
+  src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!" /></a>
+
+  <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
+  <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
+  Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $
+</address>
+</body>
+</html>
diff --git a/docs/tutorial/index.html b/docs/tutorial/index.html
index 43c5dab..acaee03 100644
--- a/docs/tutorial/index.html
+++ b/docs/tutorial/index.html
@@ -25,17 +25,18 @@
         <li><!--<a href="Tutorial5.html">-->Invoking the JIT</li>
       </ol>
     </li>
-    <li>Implementing a simple language with LLVM
+    <li>Implementing a language with LLVM: Kaleidoscope
       <ol>
         <li><a href="LangImpl1.html">The basic language, with its lexer</a></li>
         <li>Implementing a Parser and AST</li>
         <li>Implementing code generation to LLVM IR</li>
+        <li>Adding JIT codegen support</li>
         <li>Extending the language: if/then/else</li>
         <li>Extending the language: operator overloading</li>
-        <li>Adding JIT codegen support</li>
+        <li>Extending the language: mutable variables</li>
         <li>Thoughts and ideas for extensions</li>
       </ol></li>
   </ol>
 </body>
-</html>
\ No newline at end of file
+</html>
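The gettok fragments in LangImpl1.html above can be assembled into one translation unit for experimentation. What follows is a minimal sketch, not part of this commit, and it rests on stated assumptions. Characters are read through a std::istream pointer named In, which defaults to std::cin, rather than by calling getchar() directly. A hypothetical helper named setInputString points the lexer at an in-memory string. Both names are illustrative only. LastChar also moves from a static local to file scope so the helper can reset it, and the comment-skipping condition uses && for every comparison.

```cpp
#include <cctype>
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <sstream>
#include <string>

// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
// of these for known things.
enum Token {
  tok_eof = -1,

  // commands
  tok_def = -2, tok_extern = -3,

  // primary
  tok_identifier = -4, tok_number = -5,
};

static std::string IdentifierStr; // Filled in if tok_identifier
static double NumVal;             // Filled in if tok_number

// Assumption (not in the tutorial): characters come from this stream instead
// of getchar(), so the lexer can be pointed at a string for experiments.
static std::istream *In = &std::cin;

// Last character read but not yet processed. Kept at file scope (instead of a
// static local inside gettok) so setInputString can reset it.
static int LastChar = ' ';

// std::istream::get() returns the next character, or EOF at end of input.
static int nextChar() { return In->get(); }

// Hypothetical helper: lex the contents of Src instead of standard input.
static std::istringstream StringInput;
static void setInputString(const std::string &Src) {
  StringInput.clear(); // reset eof/fail flags left over from earlier input
  StringInput.str(Src);
  In = &StringInput;
  LastChar = ' ';
}

/// gettok - Return the next token from the current input stream.
static int gettok() {
  // Skip any whitespace.
  while (isspace(LastChar))
    LastChar = nextChar();

  if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
    IdentifierStr = LastChar;
    while (isalnum((LastChar = nextChar())))
      IdentifierStr += LastChar;

    if (IdentifierStr == "def") return tok_def;
    if (IdentifierStr == "extern") return tok_extern;
    return tok_identifier;
  }

  if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
    std::string NumStr;
    do {
      NumStr += LastChar;
      LastChar = nextChar();
    } while (isdigit(LastChar) || LastChar == '.');

    NumVal = strtod(NumStr.c_str(), 0);
    return tok_number;
  }

  if (LastChar == '#') {
    // Comment until end of line.
    do LastChar = nextChar();
    while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');

    if (LastChar != EOF)
      return gettok(); // lex the token that follows the comment
  }

  // Check for end of file. Don't eat the EOF.
  if (LastChar == EOF)
    return tok_eof;

  // Otherwise, just return the character as its ASCII value.
  int ThisChar = LastChar;
  LastChar = nextChar();
  return ThisChar;
}
```

A driver walks the whole input by calling gettok() repeatedly until it returns tok_eof. For example, after setInputString("def fib(x)"), successive calls return the following:

- tok_def
- tok_identifier, with IdentifierStr set to "fib"
- '('
- tok_identifier, with IdentifierStr set to "x"
- ')'
- tok_eof

As the tutorial notes, the number rule still accepts "1.23.45.67" as a single tok_number whose value strtod reports as 1.23.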