add part 1, review appreciated.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@43215 91177308-0d34-0410-b5e6-96231b3b80d8
author: Chris Lattner <sabre@nondot.org> 2007-10-22 04:32:37 +0000
committer: Chris Lattner <sabre@nondot.org> 2007-10-22 04:32:37 +0000
commit: dbca7df0f705638a0a0b336cae87829cf07eb54d
tree: 5e94542a46166f64687f1187dd5c685762f2b088 /docs
parent: 33d7a7657bc06363fe41d8773fa5467653d7e99b
2 files changed, 267 insertions, 3 deletions
diff --git a/docs/tutorial/LangImpl1.html b/docs/tutorial/LangImpl1.html
new file mode 100644
index 0000000..49ad181
--- /dev/null
+++ b/docs/tutorial/LangImpl1.html
@@ -0,0 +1,263 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
+                      "http://www.w3.org/TR/html4/strict.dtd">
+
+<html>
+<head>
+  <title>Kaleidoscope: The basic language, with its lexer</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+  <meta name="author" content="Chris Lattner">
+  <link rel="stylesheet" href="../llvm.css" type="text/css">
+</head>
+
+<body>
+
+<div class="doc_title">Kaleidoscope: The basic language, with its lexer</div>
+
+<div class="doc_author">
+  <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
+</div>
+
+<!-- *********************************************************************** -->
+<div class="doc_section"><a name="intro">Tutorial Introduction</a></div>
+<!-- *********************************************************************** -->
+
+<div class="doc_text">
+
+<p>Welcome to the "Implementing a language with LLVM" tutorial.  This tutorial
+runs through the implementation of a simple language, showing how fun and easy
+it can be.  It will get you up and running and give you a framework that you
+can extend to other languages and use to experiment with other ideas.
+</p>
+
+</div>
+
+<!-- *********************************************************************** -->
+<div class="doc_section"><a name="language">The basic language</a></div>
+<!-- *********************************************************************** -->
+
+<div class="doc_text">
+
+<p>This tutorial will be illustrated with a toy language that we'll call
+"<a href="http://en.wikipedia.org/wiki/Kaleidoscope">Kaleidoscope</a>".
+Kaleidoscope is a procedural language that allows you to define functions, use
+conditionals, math, etc.  Over the course of the tutorial, we'll extend
+Kaleidoscope to support if/then/else, operator overloading, JIT compilation with
+a simple command line interface, etc.</p>
+
+<p>Because we want to keep things simple, the only datatype in Kaleidoscope is a
+64-bit floating point type (aka 'double' in C parlance).  As such, all values
+are implicitly double precision and the language doesn't require type
+declarations.  This gives the language a very nice and simple syntax.  For
+example, the following simple function computes <a
+href="http://en.wikipedia.org/wiki/Fibonacci_number">Fibonacci numbers</a>:</p>
+
+<div class="doc_code">
+<pre>
+# Compute the x'th fibonacci number.
+def fib(x)
+  if x &lt; 3 then
+    1
+  else
+    fib(x-1)+fib(x-2)
+
+# This expression will compute the 40th number.
+fib(40)
+</pre>
+</div>
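+<p>To make the semantics concrete, here is a rough hand translation of the
+example into C++.  It is only a sketch of what the program computes, assuming
+(as described above) that every value is a <tt>double</tt>.  It does not
+describe the code that the Kaleidoscope compiler will generate.</p>

```cpp
// Hand translation of the Kaleidoscope "fib" example (illustrative only).
// Every value is a double, matching Kaleidoscope's single datatype.
double fib(double x) {
  if (x < 3)                        // if x < 3 then 1
    return 1;
  return fib(x - 1) + fib(x - 2);   // else fib(x-1)+fib(x-2)
}
```

+<p>With this definition, <tt>fib(40)</tt> evaluates to 102334155.</p>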
+
+<p>We also allow Kaleidoscope to call into standard library functions (the LLVM
+JIT makes this completely trivial).  To do so, you use the 'extern' keyword to
+declare a function before you use it (this is also useful for mutually
+recursive functions).  For example:</p>
+
+<div class="doc_code">
+<pre>
+extern sin(arg);
+extern cos(arg);
+extern atan2(arg1 arg2);
+
+atan2(sin(.4), cos(42))
+</pre>
+</div>
+
+<p>In the first incarnation of the language, we will only support basic
+arithmetic: if/then/else will be added in a future installment.  Another
+interesting aspect of the first implementation is that it is a completely
+functional language, which does not allow you to have side effects.  We will
+eventually add side effects for those who prefer them.</p>
+
+<p>In order to make this tutorial maximally understandable and hackable, we
+have chosen to implement everything in C++ instead of using lexer and parser
+generators.  LLVM works just fine with such tools, and the choice of tools
+doesn't affect the overall design.</p>
+
+<p>A note about this tutorial: we expect you to extend the language and play
+with it on your own.  Take the code and go crazy hacking away at it.  It can be
+a lot of fun to play with languages!</p>
+
+</div>
+
+<!-- *********************************************************************** -->
+<div class="doc_section"><a name="lexer">The Lexer</a></div>
+<!-- *********************************************************************** -->
+
+<div class="doc_text">
+
+<p>When it comes to implementing a language, the first thing needed is
+the ability to process a text file and recognize what it says.  The traditional
+way to do this is to use a "<a 
+href="http://en.wikipedia.org/wiki/Lexical_analysis">lexer</a>" (aka 'scanner')
+to break the input up into "tokens".  Each token returned by the lexer includes
+a token code and potentially some metadata (e.g. the numeric value of a number).
+First, we define the set of tokens the lexer can return:
+</p>
+
+<div class="doc_code">
+<pre>
+// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
+// of these for known things.
+enum Token {
+  tok_eof = -1,
+
+  // commands
+  tok_def = -2, tok_extern = -3,
+
+  // primary
+  tok_identifier = -4, tok_number = -5,
+};
+
+static std::string IdentifierStr;  // Filled in if tok_identifier
+static double NumVal;              // Filled in if tok_number
+</pre>
+</div>
+
+<p>Each token returned by our lexer will either be one of the Token enum values
+or it will be an 'unknown' character like '+', which is returned as its ASCII
+value.  If the current token is an identifier, the <tt>IdentifierStr</tt>
+global variable holds the name of the identifier.  If the current token is a
+numeric literal (like 1.0), <tt>NumVal</tt> holds its value.  Note that we use
+global variables for simplicity; this is not the best choice for a real language
+implementation :).
+</p>
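+<p>When experimenting with the lexer, it can help to print tokens in readable
+form.  The helper below is a hypothetical debugging aid and is not part of the
+tutorial's code.  <tt>tokName</tt> is an invented name, and the enum is repeated
+so that the snippet compiles on its own.</p>

```cpp
#include <string>

// Token codes as defined above (repeated so this snippet is self-contained).
// Characters the lexer does not recognize are returned as their character
// value instead of one of these negative codes.
enum Token {
  tok_eof = -1,
  tok_def = -2, tok_extern = -3,
  tok_identifier = -4, tok_number = -5
};

// Hypothetical debugging helper: describe a token code in readable form.
std::string tokName(int Tok) {
  switch (Tok) {
  case tok_eof:        return "eof";
  case tok_def:        return "def";
  case tok_extern:     return "extern";
  case tok_identifier: return "identifier";
  case tok_number:     return "number";
  default:             return std::string("char '") + static_cast<char>(Tok) + "'";
  }
}
```

+<p>For example, <tt>tokName('+')</tt> returns <tt>char '+'</tt>.</p>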
+
+<p>The actual implementation of the lexer is a single function <tt>gettok</tt>.
+<tt>gettok</tt> is called to return the next token from standard input.  Its
+definition starts as:</p>
+
+<div class="doc_code">
+<pre>
+/// gettok - Return the next token from standard input.
+static int gettok() {
+  static int LastChar = ' ';
+
+  // Skip any whitespace.
+  while (isspace(LastChar))
+    LastChar = getchar();
+</pre>
+</div>
+
+<p>
+<tt>gettok</tt> works by calling the C <tt>getchar()</tt> function to read
+characters one at a time from standard input.  It eats them as it recognizes
+them and stores the last character read, but not yet processed, in
+<tt>LastChar</tt>.  The first thing that it has to do is ignore whitespace
+between tokens.  This is accomplished with the loop above.</p>
+
+<p>The next thing it needs to do is recognize identifiers and specific keywords
+like "def".  Kaleidoscope does this with the following simple loop:</p>
+
+<div class="doc_code">
+<pre>
+  if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
+    IdentifierStr = LastChar;
+    while (isalnum((LastChar = getchar())))
+      IdentifierStr += LastChar;
+
+    if (IdentifierStr == "def") return tok_def;
+    if (IdentifierStr == "extern") return tok_extern;
+    return tok_identifier;
+  }
+</pre>
+</div>
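+<p>Because the whole identifier is read before it is compared against the
+keywords, an input such as <tt>define</tt> lexes as a single identifier rather
+than as the keyword <tt>def</tt> followed by <tt>ine</tt>.  The following sketch
+applies the same rule to a string instead of standard input.  The function name
+<tt>lexIdentifier</tt> and its signature are assumptions made for
+illustration.</p>

```cpp
#include <cctype>
#include <cstddef>
#include <string>

enum Token { tok_def = -2, tok_extern = -3, tok_identifier = -4 };

// Hypothetical string-based version of the identifier rule above.
// Precondition: Src[Pos] is a letter (gettok checks isalpha first).
// Reads [a-zA-Z][a-zA-Z0-9]* into Ident, advances Pos past it, and only
// then decides whether the complete word is a keyword.
int lexIdentifier(const std::string &Src, std::size_t &Pos, std::string &Ident) {
  Ident.clear();
  while (Pos < Src.size() &&
         std::isalnum(static_cast<unsigned char>(Src[Pos])))
    Ident += Src[Pos++];

  if (Ident == "def") return tok_def;
  if (Ident == "extern") return tok_extern;
  return tok_identifier;
}
```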
+
+<p>Note that it sets the '<tt>IdentifierStr</tt>' global whenever it lexes an
+identifier.  Also, since language keywords are matched by the same loop, we
+handle them here inline.  Numeric values are similar:</p>
+
+<div class="doc_code">
+<pre>
+  if (isdigit(LastChar) || LastChar == '.') {   // Number: [0-9.]+
+    std::string NumStr;
+    do {
+      NumStr += LastChar;
+      LastChar = getchar();
+    } while (isdigit(LastChar) || LastChar == '.');
+
+    NumVal = strtod(NumStr.c_str(), 0);
+    return tok_number;
+  }
+</pre>
+</div>
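+<p>One way to tighten the conversion at the end of this rule, offered as a
+hedged sketch rather than as the tutorial's code, is to pass a non-null end
+pointer to <tt>strtod</tt> and reject the literal unless the whole digit string
+was consumed.  The helper name <tt>parseNumber</tt> is hypothetical.  On failure,
+the caller would need to report an error in whatever way suits your
+implementation.</p>

```cpp
#include <cstdlib>
#include <string>

// Hypothetical stricter conversion for the [0-9.]+ text collected in NumStr.
// strtod sets End to the first character it did not use. If that is not the
// terminating '\0' (e.g. "1.23.45.67" stops at the second '.'), or if nothing
// was converted at all (e.g. "."), the literal is rejected.
bool parseNumber(const std::string &NumStr, double &Result) {
  const char *Begin = NumStr.c_str();
  char *End = nullptr;
  Result = std::strtod(Begin, &End);
  return End != Begin && *End == '\0';
}
```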
+
+<p>This is all pretty straightforward code for processing input.  When reading
+a numeric value from input, we use the C <tt>strtod</tt> function to convert it
+to a numeric value that we store in <tt>NumVal</tt>.  Note that this isn't doing
+sufficient error checking: it will incorrectly read "1.23.45.67" and handle it as
+if you typed in "1.23".  Feel free to extend it :).  Next we handle comments:
+</p>
+
+<div class="doc_code">
+<pre>
+  if (LastChar == '#') {
+    // Comment until end of line.
+    do LastChar = getchar();
+    while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp;&amp; LastChar != '\r');
+    
+    if (LastChar != EOF)
+      return gettok();
+  }
+</pre>
+</div>
+
+<p>We handle comments by skipping to the end of the line and then returning the
+next token.  Finally, if the input doesn't match one of the above cases, it is
+either an operator character like '+' or the end of file.  These are handled with
+this code:</p>
+
+<div class="doc_code">
+<pre>
+  // Check for end of file.  Don't eat the EOF.
+  if (LastChar == EOF)
+    return tok_eof;
+  
+  // Otherwise, just return the character as its ascii value.
+  int ThisChar = LastChar;
+  LastChar = getchar();
+  return ThisChar;
+}
+</pre>
+</div>
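+<p>Putting the pieces together, the sketch below repackages the code above as a
+class that reads from an in-memory string instead of standard input.  This makes
+the lexer easy to try out without typing input.  The class name
+<tt>StringLexer</tt> and its <tt>NextChar</tt> method are assumptions for
+illustration.  <tt>NextChar</tt> stands in for <tt>getchar()</tt> and, like it,
+returns <tt>EOF</tt> at the end of the input.</p>

```cpp
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <utility>

enum Token {
  tok_eof = -1,
  tok_def = -2, tok_extern = -3,
  tok_identifier = -4, tok_number = -5
};

// Hypothetical variant of gettok that lexes an in-memory string.
class StringLexer {
public:
  explicit StringLexer(std::string Input) : Src(std::move(Input)) {}

  std::string IdentifierStr;  // Filled in if tok_identifier
  double NumVal = 0;          // Filled in if tok_number

  /// gettok - Return the next token from the string.
  int gettok() {
    // Skip any whitespace.
    while (std::isspace(LastChar))
      LastChar = NextChar();

    if (std::isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
      IdentifierStr = static_cast<char>(LastChar);
      while (std::isalnum(LastChar = NextChar()))
        IdentifierStr += static_cast<char>(LastChar);
      if (IdentifierStr == "def") return tok_def;
      if (IdentifierStr == "extern") return tok_extern;
      return tok_identifier;
    }

    if (std::isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
      std::string NumStr;
      do {
        NumStr += static_cast<char>(LastChar);
        LastChar = NextChar();
      } while (std::isdigit(LastChar) || LastChar == '.');
      NumVal = std::strtod(NumStr.c_str(), nullptr);
      return tok_number;
    }

    if (LastChar == '#') {
      // Comment until end of line.
      do LastChar = NextChar();
      while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');
      if (LastChar != EOF)
        return gettok();
    }

    // Check for end of file.  Don't eat the EOF.
    if (LastChar == EOF)
      return tok_eof;

    // Otherwise, just return the character as its ASCII value.
    int ThisChar = LastChar;
    LastChar = NextChar();
    return ThisChar;
  }

private:
  // Stand-in for getchar(): the next character as an unsigned char value,
  // or EOF once the string is exhausted.
  int NextChar() {
    if (Pos >= Src.size()) return EOF;
    return static_cast<unsigned char>(Src[Pos++]);
  }

  std::string Src;
  std::size_t Pos = 0;
  int LastChar = ' ';
};
```

+<p>Feeding it <tt>def fib(x)</tt> produces <tt>tok_def</tt>, the identifier
+<tt>fib</tt>, the character '(', the identifier <tt>x</tt>, the character ')',
+and then <tt>tok_eof</tt>.</p>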
+
+<p>With this, we have the complete lexer for the basic Kaleidoscope language.
+Next we'll <a href="LangImpl2.html">build a simple parser that uses this to 
+build an Abstract Syntax Tree</a>.  If you prefer, you can jump to the <a
+href="index.html">main tutorial index page</a>.
+</p>
+
+</div>
+
+<!-- *********************************************************************** -->
+<hr>
+<address>
+  <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
+  src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
+  <a href="http://validator.w3.org/check/referer"><img
+  src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!" /></a>
+
+  <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
+  <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
+  Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $
+</address>
+</body>
+</html>
diff --git a/docs/tutorial/index.html b/docs/tutorial/index.html
index 43c5dab..acaee03 100644
--- a/docs/tutorial/index.html
+++ b/docs/tutorial/index.html
@@ -25,17 +25,18 @@
       <li><!--<a href="Tutorial5.html">-->Invoking the JIT</li>
     </ol>
   </li>
-  <li>Implementing a simple language with LLVM
+  <li>Implementing a language with LLVM: Kaleidoscope
   <ol>
     <li><a href="LangImpl1.html">The basic language, with its lexer</a></li>
     <li>Implementing a Parser and AST</li>
     <li>Implementing code generation to LLVM IR</li>
+    <li>Adding JIT codegen support</li>
     <li>Extending the language: if/then/else</li>
     <li>Extending the language: operator overloading</li>
-    <li>Adding JIT codegen support</li>
+    <li>Extending the language: mutable variables</li>
     <li>Thoughts and ideas for extensions</li>
   </ol></li>
 </ol>
 
 </body>
-</html>
-\ No newline at end of file
+</html>