aboutsummaryrefslogtreecommitdiffstats
path: root/docs/tutorial
diff options
context:
space:
mode:
authormike-m <mikem.llvm@gmail.com>2010-05-06 23:45:43 +0000
committermike-m <mikem.llvm@gmail.com>2010-05-06 23:45:43 +0000
commit68cb31901c590cabceee6e6356d62c84142114cb (patch)
tree6444bddc975b662fbe47d63cd98a7b776a407c1a /docs/tutorial
parentc26ae5ab7e2d65b67c97524e66f50ce86445dec7 (diff)
downloadexternal_llvm-68cb31901c590cabceee6e6356d62c84142114cb.zip
external_llvm-68cb31901c590cabceee6e6356d62c84142114cb.tar.gz
external_llvm-68cb31901c590cabceee6e6356d62c84142114cb.tar.bz2
Overhauled llvm/clang docs builds. Closes PR6613.
NOTE: 2nd part changeset for cfe trunk to follow. *** PRE-PATCH ISSUES ADDRESSED - clang api docs fail build from objdir - clang/llvm api docs collide in install PREFIX/ - clang/llvm main docs collide in install - clang/llvm main docs have full of hard coded destination assumptions and make use of absolute root in static html files; namely CommandGuide tools hard codes a website destination for cross references and some html cross references assume website root paths *** IMPROVEMENTS - bumped Doxygen from 1.4.x -> 1.6.3 - splits llvm/clang docs into 'main' and 'api' (doxygen) build trees - provide consistent, reliable doc builds for both main+api docs - support buid vs. install vs. website intentions - support objdir builds - document targets with 'make help' - correct clean and uninstall operations - use recursive dir delete only where absolutely necessary - added call function fn.RMRF which safeguards against botched 'rm -rf'; if any target (or any variable is evaluated) which attempts to remove any dirs which match a hard-coded 'safelist', a verbose error will be printed and make will error-stop. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@103213 91177308-0d34-0410-b5e6-96231b3b80d8
Diffstat (limited to 'docs/tutorial')
-rw-r--r--docs/tutorial/LangImpl1.html348
-rw-r--r--docs/tutorial/LangImpl2.html1233
-rw-r--r--docs/tutorial/LangImpl3.html1269
-rw-r--r--docs/tutorial/LangImpl4.html1132
-rw-r--r--docs/tutorial/LangImpl5-cfg.pngbin38586 -> 0 bytes
-rw-r--r--docs/tutorial/LangImpl5.html1777
-rw-r--r--docs/tutorial/LangImpl6.html1814
-rw-r--r--docs/tutorial/LangImpl7.html2164
-rw-r--r--docs/tutorial/LangImpl8.html365
-rw-r--r--docs/tutorial/Makefile28
-rw-r--r--docs/tutorial/OCamlLangImpl1.html365
-rw-r--r--docs/tutorial/OCamlLangImpl2.html1045
-rw-r--r--docs/tutorial/OCamlLangImpl3.html1093
-rw-r--r--docs/tutorial/OCamlLangImpl4.html1029
-rw-r--r--docs/tutorial/OCamlLangImpl5.html1569
-rw-r--r--docs/tutorial/OCamlLangImpl6.html1574
-rw-r--r--docs/tutorial/OCamlLangImpl7.html1907
-rw-r--r--docs/tutorial/index.html48
18 files changed, 0 insertions, 18760 deletions
diff --git a/docs/tutorial/LangImpl1.html b/docs/tutorial/LangImpl1.html
deleted file mode 100644
index 66843db..0000000
--- a/docs/tutorial/LangImpl1.html
+++ /dev/null
@@ -1,348 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
- "http://www.w3.org/TR/html4/strict.dtd">
-
-<html>
-<head>
- <title>Kaleidoscope: Tutorial Introduction and the Lexer</title>
- <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
- <meta name="author" content="Chris Lattner">
- <link rel="stylesheet" href="../llvm.css" type="text/css">
-</head>
-
-<body>
-
-<div class="doc_title">Kaleidoscope: Tutorial Introduction and the Lexer</div>
-
-<ul>
-<li><a href="index.html">Up to Tutorial Index</a></li>
-<li>Chapter 1
- <ol>
- <li><a href="#intro">Tutorial Introduction</a></li>
- <li><a href="#language">The Basic Language</a></li>
- <li><a href="#lexer">The Lexer</a></li>
- </ol>
-</li>
-<li><a href="LangImpl2.html">Chapter 2</a>: Implementing a Parser and AST</li>
-</ul>
-
-<div class="doc_author">
- <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="intro">Tutorial Introduction</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Welcome to the "Implementing a language with LLVM" tutorial. This tutorial
-runs through the implementation of a simple language, showing how fun and
-easy it can be. This tutorial will get you up and started as well as help to
-build a framework you can extend to other languages. The code in this tutorial
-can also be used as a playground to hack on other LLVM specific things.
-</p>
-
-<p>
-The goal of this tutorial is to progressively unveil our language, describing
-how it is built up over time. This will let us cover a fairly broad range of
-language design and LLVM-specific usage issues, showing and explaining the code
-for it all along the way, without overwhelming you with tons of details up
-front.</p>
-
-<p>It is useful to point out ahead of time that this tutorial is really about
-teaching compiler techniques and LLVM specifically, <em>not</em> about teaching
-modern and sane software engineering principles. In practice, this means that
-we'll take a number of shortcuts to simplify the exposition. For example, the
-code leaks memory, uses global variables all over the place, doesn't use nice
-design patterns like <a
-href="http://en.wikipedia.org/wiki/Visitor_pattern">visitors</a>, etc... but it
-is very simple. If you dig in and use the code as a basis for future projects,
-fixing these deficiencies shouldn't be hard.</p>
-
-<p>I've tried to put this tutorial together in a way that makes chapters easy to
-skip over if you are already familiar with or are uninterested in the various
-pieces. The structure of the tutorial is:
-</p>
-
-<ul>
-<li><b><a href="#language">Chapter #1</a>: Introduction to the Kaleidoscope
-language, and the definition of its Lexer</b> - This shows where we are going
-and the basic functionality that we want it to do. In order to make this
-tutorial maximally understandable and hackable, we choose to implement
-everything in C++ instead of using lexer and parser generators. LLVM obviously
-works just fine with such tools, feel free to use one if you prefer.</li>
-<li><b><a href="LangImpl2.html">Chapter #2</a>: Implementing a Parser and
-AST</b> - With the lexer in place, we can talk about parsing techniques and
-basic AST construction. This tutorial describes recursive descent parsing and
-operator precedence parsing. Nothing in Chapters 1 or 2 is LLVM-specific,
-the code doesn't even link in LLVM at this point. :)</li>
-<li><b><a href="LangImpl3.html">Chapter #3</a>: Code generation to LLVM IR</b> -
-With the AST ready, we can show off how easy generation of LLVM IR really
-is.</li>
-<li><b><a href="LangImpl4.html">Chapter #4</a>: Adding JIT and Optimizer
-Support</b> - Because a lot of people are interested in using LLVM as a JIT,
-we'll dive right into it and show you the 3 lines it takes to add JIT support.
-LLVM is also useful in many other ways, but this is one simple and "sexy" way
-to shows off its power. :)</li>
-<li><b><a href="LangImpl5.html">Chapter #5</a>: Extending the Language: Control
-Flow</b> - With the language up and running, we show how to extend it with
-control flow operations (if/then/else and a 'for' loop). This gives us a chance
-to talk about simple SSA construction and control flow.</li>
-<li><b><a href="LangImpl6.html">Chapter #6</a>: Extending the Language:
-User-defined Operators</b> - This is a silly but fun chapter that talks about
-extending the language to let the user program define their own arbitrary
-unary and binary operators (with assignable precedence!). This lets us build a
-significant piece of the "language" as library routines.</li>
-<li><b><a href="LangImpl7.html">Chapter #7</a>: Extending the Language: Mutable
-Variables</b> - This chapter talks about adding user-defined local variables
-along with an assignment operator. The interesting part about this is how
-easy and trivial it is to construct SSA form in LLVM: no, LLVM does <em>not</em>
-require your front-end to construct SSA form!</li>
-<li><b><a href="LangImpl8.html">Chapter #8</a>: Conclusion and other useful LLVM
-tidbits</b> - This chapter wraps up the series by talking about potential
-ways to extend the language, but also includes a bunch of pointers to info about
-"special topics" like adding garbage collection support, exceptions, debugging,
-support for "spaghetti stacks", and a bunch of other tips and tricks.</li>
-
-</ul>
-
-<p>By the end of the tutorial, we'll have written a bit less than 700 lines of
-non-comment, non-blank, lines of code. With this small amount of code, we'll
-have built up a very reasonable compiler for a non-trivial language including
-a hand-written lexer, parser, AST, as well as code generation support with a JIT
-compiler. While other systems may have interesting "hello world" tutorials,
-I think the breadth of this tutorial is a great testament to the strengths of
-LLVM and why you should consider it if you're interested in language or compiler
-design.</p>
-
-<p>A note about this tutorial: we expect you to extend the language and play
-with it on your own. Take the code and go crazy hacking away at it, compilers
-don't need to be scary creatures - it can be a lot of fun to play with
-languages!</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="language">The Basic Language</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>This tutorial will be illustrated with a toy language that we'll call
-"<a href="http://en.wikipedia.org/wiki/Kaleidoscope">Kaleidoscope</a>" (derived
-from "meaning beautiful, form, and view").
-Kaleidoscope is a procedural language that allows you to define functions, use
-conditionals, math, etc. Over the course of the tutorial, we'll extend
-Kaleidoscope to support the if/then/else construct, a for loop, user defined
-operators, JIT compilation with a simple command line interface, etc.</p>
-
-<p>Because we want to keep things simple, the only datatype in Kaleidoscope is a
-64-bit floating point type (aka 'double' in C parlance). As such, all values
-are implicitly double precision and the language doesn't require type
-declarations. This gives the language a very nice and simple syntax. For
-example, the following simple example computes <a
-href="http://en.wikipedia.org/wiki/Fibonacci_number">Fibonacci numbers:</a></p>
-
-<div class="doc_code">
-<pre>
-# Compute the x'th fibonacci number.
-def fib(x)
- if x &lt; 3 then
- 1
- else
- fib(x-1)+fib(x-2)
-
-# This expression will compute the 40th number.
-fib(40)
-</pre>
-</div>
-
-<p>We also allow Kaleidoscope to call into standard library functions (the LLVM
-JIT makes this completely trivial). This means that you can use the 'extern'
-keyword to define a function before you use it (this is also useful for mutually
-recursive functions). For example:</p>
-
-<div class="doc_code">
-<pre>
-extern sin(arg);
-extern cos(arg);
-extern atan2(arg1 arg2);
-
-atan2(sin(.4), cos(42))
-</pre>
-</div>
-
-<p>A more interesting example is included in Chapter 6 where we write a little
-Kaleidoscope application that <a href="LangImpl6.html#example">displays
-a Mandelbrot Set</a> at various levels of magnification.</p>
-
-<p>Lets dive into the implementation of this language!</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="lexer">The Lexer</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>When it comes to implementing a language, the first thing needed is
-the ability to process a text file and recognize what it says. The traditional
-way to do this is to use a "<a
-href="http://en.wikipedia.org/wiki/Lexical_analysis">lexer</a>" (aka 'scanner')
-to break the input up into "tokens". Each token returned by the lexer includes
-a token code and potentially some metadata (e.g. the numeric value of a number).
-First, we define the possibilities:
-</p>
-
-<div class="doc_code">
-<pre>
-// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
-// of these for known things.
-enum Token {
- tok_eof = -1,
-
- // commands
- tok_def = -2, tok_extern = -3,
-
- // primary
- tok_identifier = -4, tok_number = -5,
-};
-
-static std::string IdentifierStr; // Filled in if tok_identifier
-static double NumVal; // Filled in if tok_number
-</pre>
-</div>
-
-<p>Each token returned by our lexer will either be one of the Token enum values
-or it will be an 'unknown' character like '+', which is returned as its ASCII
-value. If the current token is an identifier, the <tt>IdentifierStr</tt>
-global variable holds the name of the identifier. If the current token is a
-numeric literal (like 1.0), <tt>NumVal</tt> holds its value. Note that we use
-global variables for simplicity, this is not the best choice for a real language
-implementation :).
-</p>
-
-<p>The actual implementation of the lexer is a single function named
-<tt>gettok</tt>. The <tt>gettok</tt> function is called to return the next token
-from standard input. Its definition starts as:</p>
-
-<div class="doc_code">
-<pre>
-/// gettok - Return the next token from standard input.
-static int gettok() {
- static int LastChar = ' ';
-
- // Skip any whitespace.
- while (isspace(LastChar))
- LastChar = getchar();
-</pre>
-</div>
-
-<p>
-<tt>gettok</tt> works by calling the C <tt>getchar()</tt> function to read
-characters one at a time from standard input. It eats them as it recognizes
-them and stores the last character read, but not processed, in LastChar. The
-first thing that it has to do is ignore whitespace between tokens. This is
-accomplished with the loop above.</p>
-
-<p>The next thing <tt>gettok</tt> needs to do is recognize identifiers and
-specific keywords like "def". Kaleidoscope does this with this simple loop:</p>
-
-<div class="doc_code">
-<pre>
- if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
- IdentifierStr = LastChar;
- while (isalnum((LastChar = getchar())))
- IdentifierStr += LastChar;
-
- if (IdentifierStr == "def") return tok_def;
- if (IdentifierStr == "extern") return tok_extern;
- return tok_identifier;
- }
-</pre>
-</div>
-
-<p>Note that this code sets the '<tt>IdentifierStr</tt>' global whenever it
-lexes an identifier. Also, since language keywords are matched by the same
-loop, we handle them here inline. Numeric values are similar:</p>
-
-<div class="doc_code">
-<pre>
- if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
- std::string NumStr;
- do {
- NumStr += LastChar;
- LastChar = getchar();
- } while (isdigit(LastChar) || LastChar == '.');
-
- NumVal = strtod(NumStr.c_str(), 0);
- return tok_number;
- }
-</pre>
-</div>
-
-<p>This is all pretty straight-forward code for processing input. When reading
-a numeric value from input, we use the C <tt>strtod</tt> function to convert it
-to a numeric value that we store in <tt>NumVal</tt>. Note that this isn't doing
-sufficient error checking: it will incorrectly read "1.23.45.67" and handle it as
-if you typed in "1.23". Feel free to extend it :). Next we handle comments:
-</p>
-
-<div class="doc_code">
-<pre>
- if (LastChar == '#') {
- // Comment until end of line.
- do LastChar = getchar();
- while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp;&amp; LastChar != '\r');
-
- if (LastChar != EOF)
- return gettok();
- }
-</pre>
-</div>
-
-<p>We handle comments by skipping to the end of the line and then return the
-next token. Finally, if the input doesn't match one of the above cases, it is
-either an operator character like '+' or the end of the file. These are handled
-with this code:</p>
-
-<div class="doc_code">
-<pre>
- // Check for end of file. Don't eat the EOF.
- if (LastChar == EOF)
- return tok_eof;
-
- // Otherwise, just return the character as its ascii value.
- int ThisChar = LastChar;
- LastChar = getchar();
- return ThisChar;
-}
-</pre>
-</div>
-
-<p>With this, we have the complete lexer for the basic Kaleidoscope language
-(the <a href="LangImpl2.html#code">full code listing</a> for the Lexer is
-available in the <a href="LangImpl2.html">next chapter</a> of the tutorial).
-Next we'll <a href="LangImpl2.html">build a simple parser that uses this to
-build an Abstract Syntax Tree</a>. When we have that, we'll include a driver
-so that you can use the lexer and parser together.
-</p>
-
-<a href="LangImpl2.html">Next: Implementing a Parser and AST</a>
-</div>
-
-<!-- *********************************************************************** -->
-<hr>
-<address>
- <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
- src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
- <a href="http://validator.w3.org/check/referer"><img
- src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
-
- <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
- <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
- Last modified: $Date$
-</address>
-</body>
-</html>
diff --git a/docs/tutorial/LangImpl2.html b/docs/tutorial/LangImpl2.html
deleted file mode 100644
index 9c13b48..0000000
--- a/docs/tutorial/LangImpl2.html
+++ /dev/null
@@ -1,1233 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
- "http://www.w3.org/TR/html4/strict.dtd">
-
-<html>
-<head>
- <title>Kaleidoscope: Implementing a Parser and AST</title>
- <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
- <meta name="author" content="Chris Lattner">
- <link rel="stylesheet" href="../llvm.css" type="text/css">
-</head>
-
-<body>
-
-<div class="doc_title">Kaleidoscope: Implementing a Parser and AST</div>
-
-<ul>
-<li><a href="index.html">Up to Tutorial Index</a></li>
-<li>Chapter 2
- <ol>
- <li><a href="#intro">Chapter 2 Introduction</a></li>
- <li><a href="#ast">The Abstract Syntax Tree (AST)</a></li>
- <li><a href="#parserbasics">Parser Basics</a></li>
- <li><a href="#parserprimexprs">Basic Expression Parsing</a></li>
- <li><a href="#parserbinops">Binary Expression Parsing</a></li>
- <li><a href="#parsertop">Parsing the Rest</a></li>
- <li><a href="#driver">The Driver</a></li>
- <li><a href="#conclusions">Conclusions</a></li>
- <li><a href="#code">Full Code Listing</a></li>
- </ol>
-</li>
-<li><a href="LangImpl3.html">Chapter 3</a>: Code generation to LLVM IR</li>
-</ul>
-
-<div class="doc_author">
- <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="intro">Chapter 2 Introduction</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Welcome to Chapter 2 of the "<a href="index.html">Implementing a language
-with LLVM</a>" tutorial. This chapter shows you how to use the lexer, built in
-<a href="LangImpl1.html">Chapter 1</a>, to build a full <a
-href="http://en.wikipedia.org/wiki/Parsing">parser</a> for
-our Kaleidoscope language. Once we have a parser, we'll define and build an <a
-href="http://en.wikipedia.org/wiki/Abstract_syntax_tree">Abstract Syntax
-Tree</a> (AST).</p>
-
-<p>The parser we will build uses a combination of <a
-href="http://en.wikipedia.org/wiki/Recursive_descent_parser">Recursive Descent
-Parsing</a> and <a href=
-"http://en.wikipedia.org/wiki/Operator-precedence_parser">Operator-Precedence
-Parsing</a> to parse the Kaleidoscope language (the latter for
-binary expressions and the former for everything else). Before we get to
-parsing though, lets talk about the output of the parser: the Abstract Syntax
-Tree.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="ast">The Abstract Syntax Tree (AST)</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>The AST for a program captures its behavior in such a way that it is easy for
-later stages of the compiler (e.g. code generation) to interpret. We basically
-want one object for each construct in the language, and the AST should closely
-model the language. In Kaleidoscope, we have expressions, a prototype, and a
-function object. We'll start with expressions first:</p>
-
-<div class="doc_code">
-<pre>
-/// ExprAST - Base class for all expression nodes.
-class ExprAST {
-public:
- virtual ~ExprAST() {}
-};
-
-/// NumberExprAST - Expression class for numeric literals like "1.0".
-class NumberExprAST : public ExprAST {
- double Val;
-public:
- NumberExprAST(double val) : Val(val) {}
-};
-</pre>
-</div>
-
-<p>The code above shows the definition of the base ExprAST class and one
-subclass which we use for numeric literals. The important thing to note about
-this code is that the NumberExprAST class captures the numeric value of the
-literal as an instance variable. This allows later phases of the compiler to
-know what the stored numeric value is.</p>
-
-<p>Right now we only create the AST, so there are no useful accessor methods on
-them. It would be very easy to add a virtual method to pretty print the code,
-for example. Here are the other expression AST node definitions that we'll use
-in the basic form of the Kaleidoscope language:
-</p>
-
-<div class="doc_code">
-<pre>
-/// VariableExprAST - Expression class for referencing a variable, like "a".
-class VariableExprAST : public ExprAST {
- std::string Name;
-public:
- VariableExprAST(const std::string &amp;name) : Name(name) {}
-};
-
-/// BinaryExprAST - Expression class for a binary operator.
-class BinaryExprAST : public ExprAST {
- char Op;
- ExprAST *LHS, *RHS;
-public:
- BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
- : Op(op), LHS(lhs), RHS(rhs) {}
-};
-
-/// CallExprAST - Expression class for function calls.
-class CallExprAST : public ExprAST {
- std::string Callee;
- std::vector&lt;ExprAST*&gt; Args;
-public:
- CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
- : Callee(callee), Args(args) {}
-};
-</pre>
-</div>
-
-<p>This is all (intentionally) rather straight-forward: variables capture the
-variable name, binary operators capture their opcode (e.g. '+'), and calls
-capture a function name as well as a list of any argument expressions. One thing
-that is nice about our AST is that it captures the language features without
-talking about the syntax of the language. Note that there is no discussion about
-precedence of binary operators, lexical structure, etc.</p>
-
-<p>For our basic language, these are all of the expression nodes we'll define.
-Because it doesn't have conditional control flow, it isn't Turing-complete;
-we'll fix that in a later installment. The two things we need next are a way
-to talk about the interface to a function, and a way to talk about functions
-themselves:</p>
-
-<div class="doc_code">
-<pre>
-/// PrototypeAST - This class represents the "prototype" for a function,
-/// which captures its name, and its argument names (thus implicitly the number
-/// of arguments the function takes).
-class PrototypeAST {
- std::string Name;
- std::vector&lt;std::string&gt; Args;
-public:
- PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args)
- : Name(name), Args(args) {}
-};
-
-/// FunctionAST - This class represents a function definition itself.
-class FunctionAST {
- PrototypeAST *Proto;
- ExprAST *Body;
-public:
- FunctionAST(PrototypeAST *proto, ExprAST *body)
- : Proto(proto), Body(body) {}
-};
-</pre>
-</div>
-
-<p>In Kaleidoscope, functions are typed with just a count of their arguments.
-Since all values are double precision floating point, the type of each argument
-doesn't need to be stored anywhere. In a more aggressive and realistic
-language, the "ExprAST" class would probably have a type field.</p>
-
-<p>With this scaffolding, we can now talk about parsing expressions and function
-bodies in Kaleidoscope.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="parserbasics">Parser Basics</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Now that we have an AST to build, we need to define the parser code to build
-it. The idea here is that we want to parse something like "x+y" (which is
-returned as three tokens by the lexer) into an AST that could be generated with
-calls like this:</p>
-
-<div class="doc_code">
-<pre>
- ExprAST *X = new VariableExprAST("x");
- ExprAST *Y = new VariableExprAST("y");
- ExprAST *Result = new BinaryExprAST('+', X, Y);
-</pre>
-</div>
-
-<p>In order to do this, we'll start by defining some basic helper routines:</p>
-
-<div class="doc_code">
-<pre>
-/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
-/// token the parser is looking at. getNextToken reads another token from the
-/// lexer and updates CurTok with its results.
-static int CurTok;
-static int getNextToken() {
- return CurTok = gettok();
-}
-</pre>
-</div>
-
-<p>
-This implements a simple token buffer around the lexer. This allows
-us to look one token ahead at what the lexer is returning. Every function in
-our parser will assume that CurTok is the current token that needs to be
-parsed.</p>
-
-<div class="doc_code">
-<pre>
-
-/// Error* - These are little helper functions for error handling.
-ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
-PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
-FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
-</pre>
-</div>
-
-<p>
-The <tt>Error</tt> routines are simple helper routines that our parser will use
-to handle errors. The error recovery in our parser will not be the best and
-is not particular user-friendly, but it will be enough for our tutorial. These
-routines make it easier to handle errors in routines that have various return
-types: they always return null.</p>
-
-<p>With these basic helper functions, we can implement the first
-piece of our grammar: numeric literals.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="parserprimexprs">Basic Expression
- Parsing</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>We start with numeric literals, because they are the simplest to process.
-For each production in our grammar, we'll define a function which parses that
-production. For numeric literals, we have:
-</p>
-
-<div class="doc_code">
-<pre>
-/// numberexpr ::= number
-static ExprAST *ParseNumberExpr() {
- ExprAST *Result = new NumberExprAST(NumVal);
- getNextToken(); // consume the number
- return Result;
-}
-</pre>
-</div>
-
-<p>This routine is very simple: it expects to be called when the current token
-is a <tt>tok_number</tt> token. It takes the current number value, creates
-a <tt>NumberExprAST</tt> node, advances the lexer to the next token, and finally
-returns.</p>
-
-<p>There are some interesting aspects to this. The most important one is that
-this routine eats all of the tokens that correspond to the production and
-returns the lexer buffer with the next token (which is not part of the grammar
-production) ready to go. This is a fairly standard way to go for recursive
-descent parsers. For a better example, the parenthesis operator is defined like
-this:</p>
-
-<div class="doc_code">
-<pre>
-/// parenexpr ::= '(' expression ')'
-static ExprAST *ParseParenExpr() {
- getNextToken(); // eat (.
- ExprAST *V = ParseExpression();
- if (!V) return 0;
-
- if (CurTok != ')')
- return Error("expected ')'");
- getNextToken(); // eat ).
- return V;
-}
-</pre>
-</div>
-
-<p>This function illustrates a number of interesting things about the
-parser:</p>
-
-<p>
-1) It shows how we use the Error routines. When called, this function expects
-that the current token is a '(' token, but after parsing the subexpression, it
-is possible that there is no ')' waiting. For example, if the user types in
-"(4 x" instead of "(4)", the parser should emit an error. Because errors can
-occur, the parser needs a way to indicate that they happened: in our parser, we
-return null on an error.</p>
-
-<p>2) Another interesting aspect of this function is that it uses recursion by
-calling <tt>ParseExpression</tt> (we will soon see that <tt>ParseExpression</tt> can call
-<tt>ParseParenExpr</tt>). This is powerful because it allows us to handle
-recursive grammars, and keeps each production very simple. Note that
-parentheses do not cause construction of AST nodes themselves. While we could
-do it this way, the most important role of parentheses are to guide the parser
-and provide grouping. Once the parser constructs the AST, parentheses are not
-needed.</p>
-
-<p>The next simple production is for handling variable references and function
-calls:</p>
-
-<div class="doc_code">
-<pre>
-/// identifierexpr
-/// ::= identifier
-/// ::= identifier '(' expression* ')'
-static ExprAST *ParseIdentifierExpr() {
- std::string IdName = IdentifierStr;
-
- getNextToken(); // eat identifier.
-
- if (CurTok != '(') // Simple variable ref.
- return new VariableExprAST(IdName);
-
- // Call.
- getNextToken(); // eat (
- std::vector&lt;ExprAST*&gt; Args;
- if (CurTok != ')') {
- while (1) {
- ExprAST *Arg = ParseExpression();
- if (!Arg) return 0;
- Args.push_back(Arg);
-
- if (CurTok == ')') break;
-
- if (CurTok != ',')
- return Error("Expected ')' or ',' in argument list");
- getNextToken();
- }
- }
-
- // Eat the ')'.
- getNextToken();
-
- return new CallExprAST(IdName, Args);
-}
-</pre>
-</div>
-
-<p>This routine follows the same style as the other routines. (It expects to be
-called if the current token is a <tt>tok_identifier</tt> token). It also has
-recursion and error handling. One interesting aspect of this is that it uses
-<em>look-ahead</em> to determine if the current identifier is a stand alone
-variable reference or if it is a function call expression. It handles this by
-checking to see if the token after the identifier is a '(' token, constructing
-either a <tt>VariableExprAST</tt> or <tt>CallExprAST</tt> node as appropriate.
-</p>
-
-<p>Now that we have all of our simple expression-parsing logic in place, we can
-define a helper function to wrap it together into one entry point. We call this
-class of expressions "primary" expressions, for reasons that will become more
-clear <a href="LangImpl6.html#unary">later in the tutorial</a>. In order to
-parse an arbitrary primary expression, we need to determine what sort of
-expression it is:</p>
-
-<div class="doc_code">
-<pre>
-/// primary
-/// ::= identifierexpr
-/// ::= numberexpr
-/// ::= parenexpr
-static ExprAST *ParsePrimary() {
- switch (CurTok) {
- default: return Error("unknown token when expecting an expression");
- case tok_identifier: return ParseIdentifierExpr();
- case tok_number: return ParseNumberExpr();
- case '(': return ParseParenExpr();
- }
-}
-</pre>
-</div>
-
-<p>Now that you see the definition of this function, it is more obvious why we
-can assume the state of CurTok in the various functions. This uses look-ahead
-to determine which sort of expression is being inspected, and then parses it
-with a function call.</p>
-
-<p>Now that basic expressions are handled, we need to handle binary expressions.
-They are a bit more complex.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="parserbinops">Binary Expression
- Parsing</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Binary expressions are significantly harder to parse because they are often
-ambiguous. For example, when given the string "x+y*z", the parser can choose
-to parse it as either "(x+y)*z" or "x+(y*z)". With common definitions from
-mathematics, we expect the later parse, because "*" (multiplication) has
-higher <em>precedence</em> than "+" (addition).</p>
-
-<p>There are many ways to handle this, but an elegant and efficient way is to
-use <a href=
-"http://en.wikipedia.org/wiki/Operator-precedence_parser">Operator-Precedence
-Parsing</a>. This parsing technique uses the precedence of binary operators to
-guide recursion. To start with, we need a table of precedences:</p>
-
-<div class="doc_code">
-<pre>
-/// BinopPrecedence - This holds the precedence for each binary operator that is
-/// defined.
-static std::map&lt;char, int&gt; BinopPrecedence;
-
-/// GetTokPrecedence - Get the precedence of the pending binary operator token.
-static int GetTokPrecedence() {
- if (!isascii(CurTok))
- return -1;
-
- // Make sure it's a declared binop.
- int TokPrec = BinopPrecedence[CurTok];
- if (TokPrec &lt;= 0) return -1;
- return TokPrec;
-}
-
-int main() {
- // Install standard binary operators.
- // 1 is lowest precedence.
- BinopPrecedence['&lt;'] = 10;
- BinopPrecedence['+'] = 20;
- BinopPrecedence['-'] = 20;
- BinopPrecedence['*'] = 40; // highest.
- ...
-}
-</pre>
-</div>
-
-<p>For the basic form of Kaleidoscope, we will only support 4 binary operators
-(this can obviously be extended by you, our brave and intrepid reader). The
-<tt>GetTokPrecedence</tt> function returns the precedence for the current token,
-or -1 if the token is not a binary operator. Having a map makes it easy to add
-new operators and makes it clear that the algorithm doesn't depend on the
-specific operators involved, but it would be easy enough to eliminate the map
-and do the comparisons in the <tt>GetTokPrecedence</tt> function. (Or just use
-a fixed-size array).</p>
-
-<p>With the helper above defined, we can now start parsing binary expressions.
-The basic idea of operator precedence parsing is to break down an expression
-with potentially ambiguous binary operators into pieces. Consider ,for example,
-the expression "a+b+(c+d)*e*f+g". Operator precedence parsing considers this
-as a stream of primary expressions separated by binary operators. As such,
-it will first parse the leading primary expression "a", then it will see the
-pairs [+, b] [+, (c+d)] [*, e] [*, f] and [+, g]. Note that because parentheses
-are primary expressions, the binary expression parser doesn't need to worry
-about nested subexpressions like (c+d) at all.
-</p>
-
-<p>
-To start, an expression is a primary expression potentially followed by a
-sequence of [binop,primaryexpr] pairs:</p>
-
-<div class="doc_code">
-<pre>
-/// expression
-/// ::= primary binoprhs
-///
-static ExprAST *ParseExpression() {
- ExprAST *LHS = ParsePrimary();
- if (!LHS) return 0;
-
- return ParseBinOpRHS(0, LHS);
-}
-</pre>
-</div>
-
-<p><tt>ParseBinOpRHS</tt> is the function that parses the sequence of pairs for
-us. It takes a precedence and a pointer to an expression for the part that has been
-parsed so far. Note that "x" is a perfectly valid expression: As such, "binoprhs" is
-allowed to be empty, in which case it returns the expression that is passed into
-it. In our example above, the code passes the expression for "a" into
-<tt>ParseBinOpRHS</tt> and the current token is "+".</p>
-
-<p>The precedence value passed into <tt>ParseBinOpRHS</tt> indicates the <em>
-minimal operator precedence</em> that the function is allowed to eat. For
-example, if the current pair stream is [+, x] and <tt>ParseBinOpRHS</tt> is
-passed in a precedence of 40, it will not consume any tokens (because the
-precedence of '+' is only 20). With this in mind, <tt>ParseBinOpRHS</tt> starts
-with:</p>
-
-<div class="doc_code">
-<pre>
-/// binoprhs
-/// ::= ('+' primary)*
-static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
- // If this is a binop, find its precedence.
- while (1) {
- int TokPrec = GetTokPrecedence();
-
- // If this is a binop that binds at least as tightly as the current binop,
- // consume it, otherwise we are done.
- if (TokPrec &lt; ExprPrec)
- return LHS;
-</pre>
-</div>
-
-<p>This code gets the precedence of the current token and checks to see if if is
-too low. Because we defined invalid tokens to have a precedence of -1, this
-check implicitly knows that the pair-stream ends when the token stream runs out
-of binary operators. If this check succeeds, we know that the token is a binary
-operator and that it will be included in this expression:</p>
-
-<div class="doc_code">
-<pre>
- // Okay, we know this is a binop.
- int BinOp = CurTok;
- getNextToken(); // eat binop
-
- // Parse the primary expression after the binary operator.
- ExprAST *RHS = ParsePrimary();
- if (!RHS) return 0;
-</pre>
-</div>
-
-<p>As such, this code eats (and remembers) the binary operator and then parses
-the primary expression that follows. This builds up the whole pair, the first of
-which is [+, b] for the running example.</p>
-
-<p>Now that we parsed the left-hand side of an expression and one pair of the
-RHS sequence, we have to decide which way the expression associates. In
-particular, we could have "(a+b) binop unparsed" or "a + (b binop unparsed)".
-To determine this, we look ahead at "binop" to determine its precedence and
-compare it to BinOp's precedence (which is '+' in this case):</p>
-
-<div class="doc_code">
-<pre>
- // If BinOp binds less tightly with RHS than the operator after RHS, let
- // the pending operator take RHS as its LHS.
- int NextPrec = GetTokPrecedence();
- if (TokPrec &lt; NextPrec) {
-</pre>
-</div>
-
-<p>If the precedence of the binop to the right of "RHS" is lower or equal to the
-precedence of our current operator, then we know that the parentheses associate
-as "(a+b) binop ...". In our example, the current operator is "+" and the next
-operator is "+", we know that they have the same precedence. In this case we'll
-create the AST node for "a+b", and then continue parsing:</p>
-
-<div class="doc_code">
-<pre>
- ... if body omitted ...
- }
-
- // Merge LHS/RHS.
- LHS = new BinaryExprAST(BinOp, LHS, RHS);
- } // loop around to the top of the while loop.
-}
-</pre>
-</div>
-
-<p>In our example above, this will turn "a+b+" into "(a+b)" and execute the next
-iteration of the loop, with "+" as the current token. The code above will eat,
-remember, and parse "(c+d)" as the primary expression, which makes the
-current pair equal to [+, (c+d)]. It will then evaluate the 'if' conditional above with
-"*" as the binop to the right of the primary. In this case, the precedence of "*" is
-higher than the precedence of "+" so the if condition will be entered.</p>
-
-<p>The critical question left here is "how can the if condition parse the right
-hand side in full"? In particular, to build the AST correctly for our example,
-it needs to get all of "(c+d)*e*f" as the RHS expression variable. The code to
-do this is surprisingly simple (code from the above two blocks duplicated for
-context):</p>
-
-<div class="doc_code">
-<pre>
- // If BinOp binds less tightly with RHS than the operator after RHS, let
- // the pending operator take RHS as its LHS.
- int NextPrec = GetTokPrecedence();
- if (TokPrec &lt; NextPrec) {
- <b>RHS = ParseBinOpRHS(TokPrec+1, RHS);
- if (RHS == 0) return 0;</b>
- }
- // Merge LHS/RHS.
- LHS = new BinaryExprAST(BinOp, LHS, RHS);
- } // loop around to the top of the while loop.
-}
-</pre>
-</div>
-
-<p>At this point, we know that the binary operator to the RHS of our primary
-has higher precedence than the binop we are currently parsing. As such, we know
-that any sequence of pairs whose operators are all higher precedence than "+"
-should be parsed together and returned as "RHS". To do this, we recursively
-invoke the <tt>ParseBinOpRHS</tt> function specifying "TokPrec+1" as the minimum
-precedence required for it to continue. In our example above, this will cause
-it to return the AST node for "(c+d)*e*f" as RHS, which is then set as the RHS
-of the '+' expression.</p>
-
-<p>Finally, on the next iteration of the while loop, the "+g" piece is parsed
-and added to the AST. With this little bit of code (14 non-trivial lines), we
-correctly handle fully general binary expression parsing in a very elegant way.
-This was a whirlwind tour of this code, and it is somewhat subtle. I recommend
-running through it with a few tough examples to see how it works.
-</p>
-
-<p>This wraps up handling of expressions. At this point, we can point the
-parser at an arbitrary token stream and build an expression from it, stopping
-at the first token that is not part of the expression. Next up we need to
-handle function definitions, etc.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="parsertop">Parsing the Rest</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>
-The next thing missing is handling of function prototypes. In Kaleidoscope,
-these are used both for 'extern' function declarations as well as function body
-definitions. The code to do this is straight-forward and not very interesting
-(once you've survived expressions):
-</p>
-
-<div class="doc_code">
-<pre>
-/// prototype
-/// ::= id '(' id* ')'
-static PrototypeAST *ParsePrototype() {
- if (CurTok != tok_identifier)
- return ErrorP("Expected function name in prototype");
-
- std::string FnName = IdentifierStr;
- getNextToken();
-
- if (CurTok != '(')
- return ErrorP("Expected '(' in prototype");
-
- // Read the list of argument names.
- std::vector&lt;std::string&gt; ArgNames;
- while (getNextToken() == tok_identifier)
- ArgNames.push_back(IdentifierStr);
- if (CurTok != ')')
- return ErrorP("Expected ')' in prototype");
-
- // success.
- getNextToken(); // eat ')'.
-
- return new PrototypeAST(FnName, ArgNames);
-}
-</pre>
-</div>
-
-<p>Given this, a function definition is very simple, just a prototype plus
-an expression to implement the body:</p>
-
-<div class="doc_code">
-<pre>
-/// definition ::= 'def' prototype expression
-static FunctionAST *ParseDefinition() {
- getNextToken(); // eat def.
- PrototypeAST *Proto = ParsePrototype();
- if (Proto == 0) return 0;
-
- if (ExprAST *E = ParseExpression())
- return new FunctionAST(Proto, E);
- return 0;
-}
-</pre>
-</div>
-
-<p>In addition, we support 'extern' to declare functions like 'sin' and 'cos' as
-well as to support forward declaration of user functions. These 'extern's are just
-prototypes with no body:</p>
-
-<div class="doc_code">
-<pre>
-/// external ::= 'extern' prototype
-static PrototypeAST *ParseExtern() {
- getNextToken(); // eat extern.
- return ParsePrototype();
-}
-</pre>
-</div>
-
-<p>Finally, we'll also let the user type in arbitrary top-level expressions and
-evaluate them on the fly. We will handle this by defining anonymous nullary
-(zero argument) functions for them:</p>
-
-<div class="doc_code">
-<pre>
-/// toplevelexpr ::= expression
-static FunctionAST *ParseTopLevelExpr() {
- if (ExprAST *E = ParseExpression()) {
- // Make an anonymous proto.
- PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
- return new FunctionAST(Proto, E);
- }
- return 0;
-}
-</pre>
-</div>
-
-<p>Now that we have all the pieces, let's build a little driver that will let us
-actually <em>execute</em> this code we've built!</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="driver">The Driver</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>The driver for this simply invokes all of the parsing pieces with a top-level
-dispatch loop. There isn't much interesting here, so I'll just include the
-top-level loop. See <a href="#code">below</a> for full code in the "Top-Level
-Parsing" section.</p>
-
-<div class="doc_code">
-<pre>
-/// top ::= definition | external | expression | ';'
-static void MainLoop() {
- while (1) {
- fprintf(stderr, "ready&gt; ");
- switch (CurTok) {
- case tok_eof: return;
- case ';': getNextToken(); break; // ignore top-level semicolons.
- case tok_def: HandleDefinition(); break;
- case tok_extern: HandleExtern(); break;
- default: HandleTopLevelExpression(); break;
- }
- }
-}
-</pre>
-</div>
-
-<p>The most interesting part of this is that we ignore top-level semicolons.
-Why is this, you ask? The basic reason is that if you type "4 + 5" at the
-command line, the parser doesn't know whether that is the end of what you will type
-or not. For example, on the next line you could type "def foo..." in which case
-4+5 is the end of a top-level expression. Alternatively you could type "* 6",
-which would continue the expression. Having top-level semicolons allows you to
-type "4+5;", and the parser will know you are done.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="conclusions">Conclusions</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>With just under 400 lines of commented code (240 lines of non-comment,
-non-blank code), we fully defined our minimal language, including a lexer,
-parser, and AST builder. With this done, the executable will validate
-Kaleidoscope code and tell us if it is grammatically invalid. For
-example, here is a sample interaction:</p>
-
-<div class="doc_code">
-<pre>
-$ <b>./a.out</b>
-ready&gt; <b>def foo(x y) x+foo(y, 4.0);</b>
-Parsed a function definition.
-ready&gt; <b>def foo(x y) x+y y;</b>
-Parsed a function definition.
-Parsed a top-level expr
-ready&gt; <b>def foo(x y) x+y );</b>
-Parsed a function definition.
-Error: unknown token when expecting an expression
-ready&gt; <b>extern sin(a);</b>
-ready&gt; Parsed an extern
-ready&gt; <b>^D</b>
-$
-</pre>
-</div>
-
-<p>There is a lot of room for extension here. You can define new AST nodes,
-extend the language in many ways, etc. In the <a href="LangImpl3.html">next
-installment</a>, we will describe how to generate LLVM Intermediate
-Representation (IR) from the AST.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="code">Full Code Listing</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>
-Here is the complete code listing for this and the previous chapter.
-Note that it is fully self-contained: you don't need LLVM or any external
-libraries at all for this. (Besides the C and C++ standard libraries, of
-course.) To build this, just compile with:</p>
-
-<div class="doc_code">
-<pre>
- # Compile
- g++ -g -O3 toy.cpp
- # Run
- ./a.out
-</pre>
-</div>
-
-<p>Here is the code:</p>
-
-<div class="doc_code">
-<pre>
-#include &lt;cstdio&gt;
-#include &lt;cstdlib&gt;
-#include &lt;string&gt;
-#include &lt;map&gt;
-#include &lt;vector&gt;
-
-//===----------------------------------------------------------------------===//
-// Lexer
-//===----------------------------------------------------------------------===//
-
-// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
-// of these for known things.
-enum Token {
- tok_eof = -1,
-
- // commands
- tok_def = -2, tok_extern = -3,
-
- // primary
- tok_identifier = -4, tok_number = -5
-};
-
-static std::string IdentifierStr; // Filled in if tok_identifier
-static double NumVal; // Filled in if tok_number
-
-/// gettok - Return the next token from standard input.
-static int gettok() {
- static int LastChar = ' ';
-
- // Skip any whitespace.
- while (isspace(LastChar))
- LastChar = getchar();
-
- if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
- IdentifierStr = LastChar;
- while (isalnum((LastChar = getchar())))
- IdentifierStr += LastChar;
-
- if (IdentifierStr == "def") return tok_def;
- if (IdentifierStr == "extern") return tok_extern;
- return tok_identifier;
- }
-
- if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
- std::string NumStr;
- do {
- NumStr += LastChar;
- LastChar = getchar();
- } while (isdigit(LastChar) || LastChar == '.');
-
- NumVal = strtod(NumStr.c_str(), 0);
- return tok_number;
- }
-
- if (LastChar == '#') {
- // Comment until end of line.
- do LastChar = getchar();
- while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp;&amp; LastChar != '\r');
-
- if (LastChar != EOF)
- return gettok();
- }
-
- // Check for end of file. Don't eat the EOF.
- if (LastChar == EOF)
- return tok_eof;
-
- // Otherwise, just return the character as its ascii value.
- int ThisChar = LastChar;
- LastChar = getchar();
- return ThisChar;
-}
-
-//===----------------------------------------------------------------------===//
-// Abstract Syntax Tree (aka Parse Tree)
-//===----------------------------------------------------------------------===//
-
-/// ExprAST - Base class for all expression nodes.
-class ExprAST {
-public:
- virtual ~ExprAST() {}
-};
-
-/// NumberExprAST - Expression class for numeric literals like "1.0".
-class NumberExprAST : public ExprAST {
- double Val;
-public:
- NumberExprAST(double val) : Val(val) {}
-};
-
-/// VariableExprAST - Expression class for referencing a variable, like "a".
-class VariableExprAST : public ExprAST {
- std::string Name;
-public:
- VariableExprAST(const std::string &amp;name) : Name(name) {}
-};
-
-/// BinaryExprAST - Expression class for a binary operator.
-class BinaryExprAST : public ExprAST {
- char Op;
- ExprAST *LHS, *RHS;
-public:
- BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
- : Op(op), LHS(lhs), RHS(rhs) {}
-};
-
-/// CallExprAST - Expression class for function calls.
-class CallExprAST : public ExprAST {
- std::string Callee;
- std::vector&lt;ExprAST*&gt; Args;
-public:
- CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
- : Callee(callee), Args(args) {}
-};
-
-/// PrototypeAST - This class represents the "prototype" for a function,
-/// which captures its name, and its argument names (thus implicitly the number
-/// of arguments the function takes).
-class PrototypeAST {
- std::string Name;
- std::vector&lt;std::string&gt; Args;
-public:
- PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args)
- : Name(name), Args(args) {}
-
-};
-
-/// FunctionAST - This class represents a function definition itself.
-class FunctionAST {
- PrototypeAST *Proto;
- ExprAST *Body;
-public:
- FunctionAST(PrototypeAST *proto, ExprAST *body)
- : Proto(proto), Body(body) {}
-
-};
-
-//===----------------------------------------------------------------------===//
-// Parser
-//===----------------------------------------------------------------------===//
-
-/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
-/// token the parser is looking at. getNextToken reads another token from the
-/// lexer and updates CurTok with its results.
-static int CurTok;
-static int getNextToken() {
- return CurTok = gettok();
-}
-
-/// BinopPrecedence - This holds the precedence for each binary operator that is
-/// defined.
-static std::map&lt;char, int&gt; BinopPrecedence;
-
-/// GetTokPrecedence - Get the precedence of the pending binary operator token.
-static int GetTokPrecedence() {
- if (!isascii(CurTok))
- return -1;
-
- // Make sure it's a declared binop.
- int TokPrec = BinopPrecedence[CurTok];
- if (TokPrec &lt;= 0) return -1;
- return TokPrec;
-}
-
-/// Error* - These are little helper functions for error handling.
-ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
-PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
-FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
-
-static ExprAST *ParseExpression();
-
-/// identifierexpr
-/// ::= identifier
-/// ::= identifier '(' expression* ')'
-static ExprAST *ParseIdentifierExpr() {
- std::string IdName = IdentifierStr;
-
- getNextToken(); // eat identifier.
-
- if (CurTok != '(') // Simple variable ref.
- return new VariableExprAST(IdName);
-
- // Call.
- getNextToken(); // eat (
- std::vector&lt;ExprAST*&gt; Args;
- if (CurTok != ')') {
- while (1) {
- ExprAST *Arg = ParseExpression();
- if (!Arg) return 0;
- Args.push_back(Arg);
-
- if (CurTok == ')') break;
-
- if (CurTok != ',')
- return Error("Expected ')' or ',' in argument list");
- getNextToken();
- }
- }
-
- // Eat the ')'.
- getNextToken();
-
- return new CallExprAST(IdName, Args);
-}
-
-/// numberexpr ::= number
-static ExprAST *ParseNumberExpr() {
- ExprAST *Result = new NumberExprAST(NumVal);
- getNextToken(); // consume the number
- return Result;
-}
-
-/// parenexpr ::= '(' expression ')'
-static ExprAST *ParseParenExpr() {
- getNextToken(); // eat (.
- ExprAST *V = ParseExpression();
- if (!V) return 0;
-
- if (CurTok != ')')
- return Error("expected ')'");
- getNextToken(); // eat ).
- return V;
-}
-
-/// primary
-/// ::= identifierexpr
-/// ::= numberexpr
-/// ::= parenexpr
-static ExprAST *ParsePrimary() {
- switch (CurTok) {
- default: return Error("unknown token when expecting an expression");
- case tok_identifier: return ParseIdentifierExpr();
- case tok_number: return ParseNumberExpr();
- case '(': return ParseParenExpr();
- }
-}
-
-/// binoprhs
-/// ::= ('+' primary)*
-static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
- // If this is a binop, find its precedence.
- while (1) {
- int TokPrec = GetTokPrecedence();
-
- // If this is a binop that binds at least as tightly as the current binop,
- // consume it, otherwise we are done.
- if (TokPrec &lt; ExprPrec)
- return LHS;
-
- // Okay, we know this is a binop.
- int BinOp = CurTok;
- getNextToken(); // eat binop
-
- // Parse the primary expression after the binary operator.
- ExprAST *RHS = ParsePrimary();
- if (!RHS) return 0;
-
- // If BinOp binds less tightly with RHS than the operator after RHS, let
- // the pending operator take RHS as its LHS.
- int NextPrec = GetTokPrecedence();
- if (TokPrec &lt; NextPrec) {
- RHS = ParseBinOpRHS(TokPrec+1, RHS);
- if (RHS == 0) return 0;
- }
-
- // Merge LHS/RHS.
- LHS = new BinaryExprAST(BinOp, LHS, RHS);
- }
-}
-
-/// expression
-/// ::= primary binoprhs
-///
-static ExprAST *ParseExpression() {
- ExprAST *LHS = ParsePrimary();
- if (!LHS) return 0;
-
- return ParseBinOpRHS(0, LHS);
-}
-
-/// prototype
-/// ::= id '(' id* ')'
-static PrototypeAST *ParsePrototype() {
- if (CurTok != tok_identifier)
- return ErrorP("Expected function name in prototype");
-
- std::string FnName = IdentifierStr;
- getNextToken();
-
- if (CurTok != '(')
- return ErrorP("Expected '(' in prototype");
-
- std::vector&lt;std::string&gt; ArgNames;
- while (getNextToken() == tok_identifier)
- ArgNames.push_back(IdentifierStr);
- if (CurTok != ')')
- return ErrorP("Expected ')' in prototype");
-
- // success.
- getNextToken(); // eat ')'.
-
- return new PrototypeAST(FnName, ArgNames);
-}
-
-/// definition ::= 'def' prototype expression
-static FunctionAST *ParseDefinition() {
- getNextToken(); // eat def.
- PrototypeAST *Proto = ParsePrototype();
- if (Proto == 0) return 0;
-
- if (ExprAST *E = ParseExpression())
- return new FunctionAST(Proto, E);
- return 0;
-}
-
-/// toplevelexpr ::= expression
-static FunctionAST *ParseTopLevelExpr() {
- if (ExprAST *E = ParseExpression()) {
- // Make an anonymous proto.
- PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
- return new FunctionAST(Proto, E);
- }
- return 0;
-}
-
-/// external ::= 'extern' prototype
-static PrototypeAST *ParseExtern() {
- getNextToken(); // eat extern.
- return ParsePrototype();
-}
-
-//===----------------------------------------------------------------------===//
-// Top-Level parsing
-//===----------------------------------------------------------------------===//
-
-static void HandleDefinition() {
- if (ParseDefinition()) {
- fprintf(stderr, "Parsed a function definition.\n");
- } else {
- // Skip token for error recovery.
- getNextToken();
- }
-}
-
-static void HandleExtern() {
- if (ParseExtern()) {
- fprintf(stderr, "Parsed an extern\n");
- } else {
- // Skip token for error recovery.
- getNextToken();
- }
-}
-
-static void HandleTopLevelExpression() {
- // Evaluate a top-level expression into an anonymous function.
- if (ParseTopLevelExpr()) {
- fprintf(stderr, "Parsed a top-level expr\n");
- } else {
- // Skip token for error recovery.
- getNextToken();
- }
-}
-
-/// top ::= definition | external | expression | ';'
-static void MainLoop() {
- while (1) {
- fprintf(stderr, "ready&gt; ");
- switch (CurTok) {
- case tok_eof: return;
- case ';': getNextToken(); break; // ignore top-level semicolons.
- case tok_def: HandleDefinition(); break;
- case tok_extern: HandleExtern(); break;
- default: HandleTopLevelExpression(); break;
- }
- }
-}
-
-//===----------------------------------------------------------------------===//
-// Main driver code.
-//===----------------------------------------------------------------------===//
-
-int main() {
- // Install standard binary operators.
- // 1 is lowest precedence.
- BinopPrecedence['&lt;'] = 10;
- BinopPrecedence['+'] = 20;
- BinopPrecedence['-'] = 20;
- BinopPrecedence['*'] = 40; // highest.
-
- // Prime the first token.
- fprintf(stderr, "ready&gt; ");
- getNextToken();
-
- // Run the main "interpreter loop" now.
- MainLoop();
-
- return 0;
-}
-</pre>
-</div>
-<a href="LangImpl3.html">Next: Implementing Code Generation to LLVM IR</a>
-</div>
-
-<!-- *********************************************************************** -->
-<hr>
-<address>
- <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
- src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
- <a href="http://validator.w3.org/check/referer"><img
- src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
-
- <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
- <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
- Last modified: $Date$
-</address>
-</body>
-</html>
diff --git a/docs/tutorial/LangImpl3.html b/docs/tutorial/LangImpl3.html
deleted file mode 100644
index fe28d41..0000000
--- a/docs/tutorial/LangImpl3.html
+++ /dev/null
@@ -1,1269 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
- "http://www.w3.org/TR/html4/strict.dtd">
-
-<html>
-<head>
- <title>Kaleidoscope: Implementing code generation to LLVM IR</title>
- <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
- <meta name="author" content="Chris Lattner">
- <link rel="stylesheet" href="../llvm.css" type="text/css">
-</head>
-
-<body>
-
-<div class="doc_title">Kaleidoscope: Code generation to LLVM IR</div>
-
-<ul>
-<li><a href="index.html">Up to Tutorial Index</a></li>
-<li>Chapter 3
- <ol>
- <li><a href="#intro">Chapter 3 Introduction</a></li>
- <li><a href="#basics">Code Generation Setup</a></li>
- <li><a href="#exprs">Expression Code Generation</a></li>
- <li><a href="#funcs">Function Code Generation</a></li>
- <li><a href="#driver">Driver Changes and Closing Thoughts</a></li>
- <li><a href="#code">Full Code Listing</a></li>
- </ol>
-</li>
-<li><a href="LangImpl4.html">Chapter 4</a>: Adding JIT and Optimizer
-Support</li>
-</ul>
-
-<div class="doc_author">
- <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="intro">Chapter 3 Introduction</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Welcome to Chapter 3 of the "<a href="index.html">Implementing a language
-with LLVM</a>" tutorial. This chapter shows you how to transform the <a
-href="LangImpl2.html">Abstract Syntax Tree</a>, built in Chapter 2, into LLVM IR.
-This will teach you a little bit about how LLVM does things, as well as
-demonstrate how easy it is to use. It's much more work to build a lexer and
-parser than it is to generate LLVM IR code. :)
-</p>
-
-<p><b>Please note</b>: the code in this chapter and later require LLVM 2.2 or
-later. LLVM 2.1 and before will not work with it. Also note that you need
-to use a version of this tutorial that matches your LLVM release: If you are
-using an official LLVM release, use the version of the documentation included
-with your release or on the <a href="http://llvm.org/releases/">llvm.org
-releases page</a>.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="basics">Code Generation Setup</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>
-In order to generate LLVM IR, we want some simple setup to get started. First
-we define virtual code generation (codegen) methods in each AST class:</p>
-
-<div class="doc_code">
-<pre>
-/// ExprAST - Base class for all expression nodes.
-class ExprAST {
-public:
- virtual ~ExprAST() {}
- <b>virtual Value *Codegen() = 0;</b>
-};
-
-/// NumberExprAST - Expression class for numeric literals like "1.0".
-class NumberExprAST : public ExprAST {
- double Val;
-public:
- NumberExprAST(double val) : Val(val) {}
- <b>virtual Value *Codegen();</b>
-};
-...
-</pre>
-</div>
-
-<p>The Codegen() method says to emit IR for that AST node along with all the things it
-depends on, and they all return an LLVM Value object.
-"Value" is the class used to represent a "<a
-href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single
-Assignment (SSA)</a> register" or "SSA value" in LLVM. The most distinct aspect
-of SSA values is that their value is computed as the related instruction
-executes, and it does not get a new value until (and if) the instruction
-re-executes. In other words, there is no way to "change" an SSA value. For
-more information, please read up on <a
-href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single
-Assignment</a> - the concepts are really quite natural once you grok them.</p>
-
-<p>Note that instead of adding virtual methods to the ExprAST class hierarchy,
-it could also make sense to use a <a
-href="http://en.wikipedia.org/wiki/Visitor_pattern">visitor pattern</a> or some
-other way to model this. Again, this tutorial won't dwell on good software
-engineering practices: for our purposes, adding a virtual method is
-simplest.</p>
-
-<p>The
-second thing we want is an "Error" method like we used for the parser, which will
-be used to report errors found during code generation (for example, use of an
-undeclared parameter):</p>
-
-<div class="doc_code">
-<pre>
-Value *ErrorV(const char *Str) { Error(Str); return 0; }
-
-static Module *TheModule;
-static IRBuilder&lt;&gt; Builder(getGlobalContext());
-static std::map&lt;std::string, Value*&gt; NamedValues;
-</pre>
-</div>
-
-<p>The static variables will be used during code generation. <tt>TheModule</tt>
-is the LLVM construct that contains all of the functions and global variables in
-a chunk of code. In many ways, it is the top-level structure that the LLVM IR
-uses to contain code.</p>
-
-<p>The <tt>Builder</tt> object is a helper object that makes it easy to generate
-LLVM instructions. Instances of the <a
-href="http://llvm.org/doxygen/IRBuilder_8h-source.html"><tt>IRBuilder</tt></a>
-class template keep track of the current place to insert instructions and has
-methods to create new instructions.</p>
-
-<p>The <tt>NamedValues</tt> map keeps track of which values are defined in the
-current scope and what their LLVM representation is. (In other words, it is a
-symbol table for the code). In this form of Kaleidoscope, the only things that
-can be referenced are function parameters. As such, function parameters will
-be in this map when generating code for their function body.</p>
-
-<p>
-With these basics in place, we can start talking about how to generate code for
-each expression. Note that this assumes that the <tt>Builder</tt> has been set
-up to generate code <em>into</em> something. For now, we'll assume that this
-has already been done, and we'll just use it to emit code.
-</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="exprs">Expression Code Generation</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Generating LLVM code for expression nodes is very straightforward: less
-than 45 lines of commented code for all four of our expression nodes. First
-we'll do numeric literals:</p>
-
-<div class="doc_code">
-<pre>
-Value *NumberExprAST::Codegen() {
- return ConstantFP::get(getGlobalContext(), APFloat(Val));
-}
-</pre>
-</div>
-
-<p>In the LLVM IR, numeric constants are represented with the
-<tt>ConstantFP</tt> class, which holds the numeric value in an <tt>APFloat</tt>
-internally (<tt>APFloat</tt> has the capability of holding floating point
-constants of <em>A</em>rbitrary <em>P</em>recision). This code basically just
-creates and returns a <tt>ConstantFP</tt>. Note that in the LLVM IR
-that constants are all uniqued together and shared. For this reason, the API
-uses the "foo::get(...)" idiom instead of "new foo(..)" or "foo::Create(..)".</p>
-
-<div class="doc_code">
-<pre>
-Value *VariableExprAST::Codegen() {
- // Look this variable up in the function.
- Value *V = NamedValues[Name];
- return V ? V : ErrorV("Unknown variable name");
-}
-</pre>
-</div>
-
-<p>References to variables are also quite simple using LLVM. In the simple version
-of Kaleidoscope, we assume that the variable has already been emitted somewhere
-and its value is available. In practice, the only values that can be in the
-<tt>NamedValues</tt> map are function arguments. This
-code simply checks to see that the specified name is in the map (if not, an
-unknown variable is being referenced) and returns the value for it. In future
-chapters, we'll add support for <a href="LangImpl5.html#for">loop induction
-variables</a> in the symbol table, and for <a
-href="LangImpl7.html#localvars">local variables</a>.</p>
-
-<div class="doc_code">
-<pre>
-Value *BinaryExprAST::Codegen() {
- Value *L = LHS-&gt;Codegen();
- Value *R = RHS-&gt;Codegen();
- if (L == 0 || R == 0) return 0;
-
- switch (Op) {
- case '+': return Builder.CreateAdd(L, R, "addtmp");
- case '-': return Builder.CreateSub(L, R, "subtmp");
- case '*': return Builder.CreateMul(L, R, "multmp");
- case '&lt;':
- L = Builder.CreateFCmpULT(L, R, "cmptmp");
- // Convert bool 0/1 to double 0.0 or 1.0
- return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
- "booltmp");
- default: return ErrorV("invalid binary operator");
- }
-}
-</pre>
-</div>
-
-<p>Binary operators start to get more interesting. The basic idea here is that
-we recursively emit code for the left-hand side of the expression, then the
-right-hand side, then we compute the result of the binary expression. In this
-code, we do a simple switch on the opcode to create the right LLVM instruction.
-</p>
-
-<p>In the example above, the LLVM builder class is starting to show its value.
-IRBuilder knows where to insert the newly created instruction, all you have to
-do is specify what instruction to create (e.g. with <tt>CreateAdd</tt>), which
-operands to use (<tt>L</tt> and <tt>R</tt> here) and optionally provide a name
-for the generated instruction.</p>
-
-<p>One nice thing about LLVM is that the name is just a hint. For instance, if
-the code above emits multiple "addtmp" variables, LLVM will automatically
-provide each one with an increasing, unique numeric suffix. Local value names
-for instructions are purely optional, but it makes it much easier to read the
-IR dumps.</p>
-
-<p><a href="../LangRef.html#instref">LLVM instructions</a> are constrained by
-strict rules: for example, the Left and Right operators of
-an <a href="../LangRef.html#i_add">add instruction</a> must have the same
-type, and the result type of the add must match the operand types. Because
-all values in Kaleidoscope are doubles, this makes for very simple code for add,
-sub and mul.</p>
-
-<p>On the other hand, LLVM specifies that the <a
-href="../LangRef.html#i_fcmp">fcmp instruction</a> always returns an 'i1' value
-(a one bit integer). The problem with this is that Kaleidoscope wants the value to be a 0.0 or 1.0 value. In order to get these semantics, we combine the fcmp instruction with
-a <a href="../LangRef.html#i_uitofp">uitofp instruction</a>. This instruction
-converts its input integer into a floating point value by treating the input
-as an unsigned value. In contrast, if we used the <a
-href="../LangRef.html#i_sitofp">sitofp instruction</a>, the Kaleidoscope '&lt;'
-operator would return 0.0 and -1.0, depending on the input value.</p>
-
-<div class="doc_code">
-<pre>
-Value *CallExprAST::Codegen() {
- // Look up the name in the global module table.
- Function *CalleeF = TheModule-&gt;getFunction(Callee);
- if (CalleeF == 0)
- return ErrorV("Unknown function referenced");
-
- // If argument mismatch error.
- if (CalleeF-&gt;arg_size() != Args.size())
- return ErrorV("Incorrect # arguments passed");
-
- std::vector&lt;Value*&gt; ArgsV;
- for (unsigned i = 0, e = Args.size(); i != e; ++i) {
- ArgsV.push_back(Args[i]-&gt;Codegen());
- if (ArgsV.back() == 0) return 0;
- }
-
- return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
-}
-</pre>
-</div>
-
-<p>Code generation for function calls is quite straightforward with LLVM. The
-code above initially does a function name lookup in the LLVM Module's symbol
-table. Recall that the LLVM Module is the container that holds all of the
-functions we are JIT'ing. By giving each function the same name as what the
-user specifies, we can use the LLVM symbol table to resolve function names for
-us.</p>
-
-<p>Once we have the function to call, we recursively codegen each argument that
-is to be passed in, and create an LLVM <a href="../LangRef.html#i_call">call
-instruction</a>. Note that LLVM uses the native C calling conventions by
-default, allowing these calls to also call into standard library functions like
-"sin" and "cos", with no additional effort.</p>
-
-<p>This wraps up our handling of the four basic expressions that we have so far
-in Kaleidoscope. Feel free to go in and add some more. For example, by
-browsing the <a href="../LangRef.html">LLVM language reference</a> you'll find
-several other interesting instructions that are really easy to plug into our
-basic framework.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="funcs">Function Code Generation</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Code generation for prototypes and functions must handle a number of
-details, which make their code less beautiful than expression code
-generation, but allows us to illustrate some important points. First, lets
-talk about code generation for prototypes: they are used both for function
-bodies and external function declarations. The code starts with:</p>
-
-<div class="doc_code">
-<pre>
-Function *PrototypeAST::Codegen() {
- // Make the function type: double(double,double) etc.
- std::vector&lt;const Type*&gt; Doubles(Args.size(),
- Type::getDoubleTy(getGlobalContext()));
- FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()),
- Doubles, false);
-
- Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule);
-</pre>
-</div>
-
-<p>This code packs a lot of power into a few lines. Note first that this
-function returns a "Function*" instead of a "Value*". Because a "prototype"
-really talks about the external interface for a function (not the value computed
-by an expression), it makes sense for it to return the LLVM Function it
-corresponds to when codegen'd.</p>
-
-<p>The call to <tt>FunctionType::get</tt> creates
-the <tt>FunctionType</tt> that should be used for a given Prototype. Since all
-function arguments in Kaleidoscope are of type double, the first line creates
-a vector of "N" LLVM double types. It then uses the <tt>Functiontype::get</tt>
-method to create a function type that takes "N" doubles as arguments, returns
-one double as a result, and that is not vararg (the false parameter indicates
-this). Note that Types in LLVM are uniqued just like Constants are, so you
-don't "new" a type, you "get" it.</p>
-
-<p>The final line above actually creates the function that the prototype will
-correspond to. This indicates the type, linkage and name to use, as well as which
-module to insert into. "<a href="../LangRef.html#linkage">external linkage</a>"
-means that the function may be defined outside the current module and/or that it
-is callable by functions outside the module. The Name passed in is the name the
-user specified: since "<tt>TheModule</tt>" is specified, this name is registered
-in "<tt>TheModule</tt>"s symbol table, which is used by the function call code
-above.</p>
-
-<div class="doc_code">
-<pre>
- // If F conflicted, there was already something named 'Name'. If it has a
- // body, don't allow redefinition or reextern.
- if (F-&gt;getName() != Name) {
- // Delete the one we just made and get the existing one.
- F-&gt;eraseFromParent();
- F = TheModule-&gt;getFunction(Name);
-</pre>
-</div>
-
-<p>The Module symbol table works just like the Function symbol table when it
-comes to name conflicts: if a new function is created with a name was previously
-added to the symbol table, it will get implicitly renamed when added to the
-Module. The code above exploits this fact to determine if there was a previous
-definition of this function.</p>
-
-<p>In Kaleidoscope, I choose to allow redefinitions of functions in two cases:
-first, we want to allow 'extern'ing a function more than once, as long as the
-prototypes for the externs match (since all arguments have the same type, we
-just have to check that the number of arguments match). Second, we want to
-allow 'extern'ing a function and then defining a body for it. This is useful
-when defining mutually recursive functions.</p>
-
-<p>In order to implement this, the code above first checks to see if there is
-a collision on the name of the function. If so, it deletes the function we just
-created (by calling <tt>eraseFromParent</tt>) and then calling
-<tt>getFunction</tt> to get the existing function with the specified name. Note
-that many APIs in LLVM have "erase" forms and "remove" forms. The "remove" form
-unlinks the object from its parent (e.g. a Function from a Module) and returns
-it. The "erase" form unlinks the object and then deletes it.</p>
-
-<div class="doc_code">
-<pre>
- // If F already has a body, reject this.
- if (!F-&gt;empty()) {
- ErrorF("redefinition of function");
- return 0;
- }
-
- // If F took a different number of args, reject.
- if (F-&gt;arg_size() != Args.size()) {
- ErrorF("redefinition of function with different # args");
- return 0;
- }
- }
-</pre>
-</div>
-
-<p>In order to verify the logic above, we first check to see if the pre-existing
-function is "empty". In this case, empty means that it has no basic blocks in
-it, which means it has no body. If it has no body, it is a forward
-declaration. Since we don't allow anything after a full definition of the
-function, the code rejects this case. If the previous reference to a function
-was an 'extern', we simply verify that the number of arguments for that
-definition and this one match up. If not, we emit an error.</p>
-
-<div class="doc_code">
-<pre>
- // Set names for all arguments.
- unsigned Idx = 0;
- for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
- ++AI, ++Idx) {
- AI-&gt;setName(Args[Idx]);
-
- // Add arguments to variable symbol table.
- NamedValues[Args[Idx]] = AI;
- }
- return F;
-}
-</pre>
-</div>
-
-<p>The last bit of code for prototypes loops over all of the arguments in the
-function, setting the name of the LLVM Argument objects to match, and registering
-the arguments in the <tt>NamedValues</tt> map for future use by the
-<tt>VariableExprAST</tt> AST node. Once this is set up, it returns the Function
-object to the caller. Note that we don't check for conflicting
-argument names here (e.g. "extern foo(a b a)"). Doing so would be very
-straight-forward with the mechanics we have already used above.</p>
-
-<div class="doc_code">
-<pre>
-Function *FunctionAST::Codegen() {
- NamedValues.clear();
-
- Function *TheFunction = Proto-&gt;Codegen();
- if (TheFunction == 0)
- return 0;
-</pre>
-</div>
-
-<p>Code generation for function definitions starts out simply enough: we just
-codegen the prototype (Proto) and verify that it is ok. We then clear out the
-<tt>NamedValues</tt> map to make sure that there isn't anything in it from the
-last function we compiled. Code generation of the prototype ensures that there
-is an LLVM Function object that is ready to go for us.</p>
-
-<div class="doc_code">
-<pre>
- // Create a new basic block to start insertion into.
- BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
- Builder.SetInsertPoint(BB);
-
- if (Value *RetVal = Body-&gt;Codegen()) {
-</pre>
-</div>
-
-<p>Now we get to the point where the <tt>Builder</tt> is set up. The first
-line creates a new <a href="http://en.wikipedia.org/wiki/Basic_block">basic
-block</a> (named "entry"), which is inserted into <tt>TheFunction</tt>. The
-second line then tells the builder that new instructions should be inserted into
-the end of the new basic block. Basic blocks in LLVM are an important part
-of functions that define the <a
-href="http://en.wikipedia.org/wiki/Control_flow_graph">Control Flow Graph</a>.
-Since we don't have any control flow, our functions will only contain one
-block at this point. We'll fix this in <a href="LangImpl5.html">Chapter 5</a> :).</p>
-
-<div class="doc_code">
-<pre>
- if (Value *RetVal = Body-&gt;Codegen()) {
- // Finish off the function.
- Builder.CreateRet(RetVal);
-
- // Validate the generated code, checking for consistency.
- verifyFunction(*TheFunction);
-
- return TheFunction;
- }
-</pre>
-</div>
-
-<p>Once the insertion point is set up, we call the <tt>CodeGen()</tt> method for
-the root expression of the function. If no error happens, this emits code to
-compute the expression into the entry block and returns the value that was
-computed. Assuming no error, we then create an LLVM <a
-href="../LangRef.html#i_ret">ret instruction</a>, which completes the function.
-Once the function is built, we call <tt>verifyFunction</tt>, which
-is provided by LLVM. This function does a variety of consistency checks on the
-generated code, to determine if our compiler is doing everything right. Using
-this is important: it can catch a lot of bugs. Once the function is finished
-and validated, we return it.</p>
-
-<div class="doc_code">
-<pre>
- // Error reading body, remove function.
- TheFunction-&gt;eraseFromParent();
- return 0;
-}
-</pre>
-</div>
-
-<p>The only piece left here is handling of the error case. For simplicity, we
-handle this by merely deleting the function we produced with the
-<tt>eraseFromParent</tt> method. This allows the user to redefine a function
-that they incorrectly typed in before: if we didn't delete it, it would live in
-the symbol table, with a body, preventing future redefinition.</p>
-
-<p>This code does have a bug, though. Since the <tt>PrototypeAST::Codegen</tt>
-can return a previously defined forward declaration, our code can actually delete
-a forward declaration. There are a number of ways to fix this bug, see what you
-can come up with! Here is a testcase:</p>
-
-<div class="doc_code">
-<pre>
-extern foo(a b); # ok, defines foo.
-def foo(a b) c; # error, 'c' is invalid.
-def bar() foo(1, 2); # error, unknown function "foo"
-</pre>
-</div>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="driver">Driver Changes and
-Closing Thoughts</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>
-For now, code generation to LLVM doesn't really get us much, except that we can
-look at the pretty IR calls. The sample code inserts calls to Codegen into the
-"<tt>HandleDefinition</tt>", "<tt>HandleExtern</tt>" etc functions, and then
-dumps out the LLVM IR. This gives a nice way to look at the LLVM IR for simple
-functions. For example:
-</p>
-
-<div class="doc_code">
-<pre>
-ready> <b>4+5</b>;
-Read top-level expression:
-define double @""() {
-entry:
- ret double 9.000000e+00
-}
-</pre>
-</div>
-
-<p>Note how the parser turns the top-level expression into anonymous functions
-for us. This will be handy when we add <a href="LangImpl4.html#jit">JIT
-support</a> in the next chapter. Also note that the code is very literally
-transcribed, no optimizations are being performed except simple constant
-folding done by IRBuilder. We will
-<a href="LangImpl4.html#trivialconstfold">add optimizations</a> explicitly in
-the next chapter.</p>
-
-<div class="doc_code">
-<pre>
-ready&gt; <b>def foo(a b) a*a + 2*a*b + b*b;</b>
-Read function definition:
-define double @foo(double %a, double %b) {
-entry:
- %multmp = fmul double %a, %a
- %multmp1 = fmul double 2.000000e+00, %a
- %multmp2 = fmul double %multmp1, %b
- %addtmp = fadd double %multmp, %multmp2
- %multmp3 = fmul double %b, %b
- %addtmp4 = fadd double %addtmp, %multmp3
- ret double %addtmp4
-}
-</pre>
-</div>
-
-<p>This shows some simple arithmetic. Notice the striking similarity to the
-LLVM builder calls that we use to create the instructions.</p>
-
-<div class="doc_code">
-<pre>
-ready&gt; <b>def bar(a) foo(a, 4.0) + bar(31337);</b>
-Read function definition:
-define double @bar(double %a) {
-entry:
- %calltmp = call double @foo( double %a, double 4.000000e+00 )
- %calltmp1 = call double @bar( double 3.133700e+04 )
- %addtmp = fadd double %calltmp, %calltmp1
- ret double %addtmp
-}
-</pre>
-</div>
-
-<p>This shows some function calls. Note that this function will take a long
-time to execute if you call it. In the future we'll add conditional control
-flow to actually make recursion useful :).</p>
-
-<div class="doc_code">
-<pre>
-ready&gt; <b>extern cos(x);</b>
-Read extern:
-declare double @cos(double)
-
-ready&gt; <b>cos(1.234);</b>
-Read top-level expression:
-define double @""() {
-entry:
- %calltmp = call double @cos( double 1.234000e+00 )
- ret double %calltmp
-}
-</pre>
-</div>
-
-<p>This shows an extern for the libm "cos" function, and a call to it.</p>
-
-
-<div class="doc_code">
-<pre>
-ready&gt; <b>^D</b>
-; ModuleID = 'my cool jit'
-
-define double @""() {
-entry:
- %addtmp = fadd double 4.000000e+00, 5.000000e+00
- ret double %addtmp
-}
-
-define double @foo(double %a, double %b) {
-entry:
- %multmp = fmul double %a, %a
- %multmp1 = fmul double 2.000000e+00, %a
- %multmp2 = fmul double %multmp1, %b
- %addtmp = fadd double %multmp, %multmp2
- %multmp3 = fmul double %b, %b
- %addtmp4 = fadd double %addtmp, %multmp3
- ret double %addtmp4
-}
-
-define double @bar(double %a) {
-entry:
- %calltmp = call double @foo( double %a, double 4.000000e+00 )
- %calltmp1 = call double @bar( double 3.133700e+04 )
- %addtmp = fadd double %calltmp, %calltmp1
- ret double %addtmp
-}
-
-declare double @cos(double)
-
-define double @""() {
-entry:
- %calltmp = call double @cos( double 1.234000e+00 )
- ret double %calltmp
-}
-</pre>
-</div>
-
-<p>When you quit the current demo, it dumps out the IR for the entire module
-generated. Here you can see the big picture with all the functions referencing
-each other.</p>
-
-<p>This wraps up the third chapter of the Kaleidoscope tutorial. Up next, we'll
-describe how to <a href="LangImpl4.html">add JIT codegen and optimizer
-support</a> to this so we can actually start running code!</p>
-
-</div>
-
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="code">Full Code Listing</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>
-Here is the complete code listing for our running example, enhanced with the
-LLVM code generator. Because this uses the LLVM libraries, we need to link
-them in. To do this, we use the <a
-href="http://llvm.org/cmds/llvm-config.html">llvm-config</a> tool to inform
-our makefile/command line about which options to use:</p>
-
-<div class="doc_code">
-<pre>
- # Compile
- g++ -g -O3 toy.cpp `llvm-config --cppflags --ldflags --libs core` -o toy
- # Run
- ./toy
-</pre>
-</div>
-
-<p>Here is the code:</p>
-
-<div class="doc_code">
-<pre>
-// To build this:
-// See example below.
-
-#include "llvm/DerivedTypes.h"
-#include "llvm/LLVMContext.h"
-#include "llvm/Module.h"
-#include "llvm/Analysis/Verifier.h"
-#include "llvm/Support/IRBuilder.h"
-#include &lt;cstdio&gt;
-#include &lt;string&gt;
-#include &lt;map&gt;
-#include &lt;vector&gt;
-using namespace llvm;
-
-//===----------------------------------------------------------------------===//
-// Lexer
-//===----------------------------------------------------------------------===//
-
-// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
-// of these for known things.
-enum Token {
- tok_eof = -1,
-
- // commands
- tok_def = -2, tok_extern = -3,
-
- // primary
- tok_identifier = -4, tok_number = -5
-};
-
-static std::string IdentifierStr; // Filled in if tok_identifier
-static double NumVal; // Filled in if tok_number
-
-/// gettok - Return the next token from standard input.
-static int gettok() {
- static int LastChar = ' ';
-
- // Skip any whitespace.
- while (isspace(LastChar))
- LastChar = getchar();
-
- if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
- IdentifierStr = LastChar;
- while (isalnum((LastChar = getchar())))
- IdentifierStr += LastChar;
-
- if (IdentifierStr == "def") return tok_def;
- if (IdentifierStr == "extern") return tok_extern;
- return tok_identifier;
- }
-
- if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
- std::string NumStr;
- do {
- NumStr += LastChar;
- LastChar = getchar();
- } while (isdigit(LastChar) || LastChar == '.');
-
- NumVal = strtod(NumStr.c_str(), 0);
- return tok_number;
- }
-
- if (LastChar == '#') {
- // Comment until end of line.
- do LastChar = getchar();
- while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp;&amp; LastChar != '\r');
-
- if (LastChar != EOF)
- return gettok();
- }
-
- // Check for end of file. Don't eat the EOF.
- if (LastChar == EOF)
- return tok_eof;
-
- // Otherwise, just return the character as its ascii value.
- int ThisChar = LastChar;
- LastChar = getchar();
- return ThisChar;
-}
-
-//===----------------------------------------------------------------------===//
-// Abstract Syntax Tree (aka Parse Tree)
-//===----------------------------------------------------------------------===//
-
-/// ExprAST - Base class for all expression nodes.
-class ExprAST {
-public:
- virtual ~ExprAST() {}
- virtual Value *Codegen() = 0;
-};
-
-/// NumberExprAST - Expression class for numeric literals like "1.0".
-class NumberExprAST : public ExprAST {
- double Val;
-public:
- NumberExprAST(double val) : Val(val) {}
- virtual Value *Codegen();
-};
-
-/// VariableExprAST - Expression class for referencing a variable, like "a".
-class VariableExprAST : public ExprAST {
- std::string Name;
-public:
- VariableExprAST(const std::string &amp;name) : Name(name) {}
- virtual Value *Codegen();
-};
-
-/// BinaryExprAST - Expression class for a binary operator.
-class BinaryExprAST : public ExprAST {
- char Op;
- ExprAST *LHS, *RHS;
-public:
- BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
- : Op(op), LHS(lhs), RHS(rhs) {}
- virtual Value *Codegen();
-};
-
-/// CallExprAST - Expression class for function calls.
-class CallExprAST : public ExprAST {
- std::string Callee;
- std::vector&lt;ExprAST*&gt; Args;
-public:
- CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
- : Callee(callee), Args(args) {}
- virtual Value *Codegen();
-};
-
-/// PrototypeAST - This class represents the "prototype" for a function,
-/// which captures its name, and its argument names (thus implicitly the number
-/// of arguments the function takes).
-class PrototypeAST {
- std::string Name;
- std::vector&lt;std::string&gt; Args;
-public:
- PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args)
- : Name(name), Args(args) {}
-
- Function *Codegen();
-};
-
-/// FunctionAST - This class represents a function definition itself.
-class FunctionAST {
- PrototypeAST *Proto;
- ExprAST *Body;
-public:
- FunctionAST(PrototypeAST *proto, ExprAST *body)
- : Proto(proto), Body(body) {}
-
- Function *Codegen();
-};
-
-//===----------------------------------------------------------------------===//
-// Parser
-//===----------------------------------------------------------------------===//
-
-/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
-/// token the parser is looking at. getNextToken reads another token from the
-/// lexer and updates CurTok with its results.
-static int CurTok;
-static int getNextToken() {
- return CurTok = gettok();
-}
-
-/// BinopPrecedence - This holds the precedence for each binary operator that is
-/// defined.
-static std::map&lt;char, int&gt; BinopPrecedence;
-
-/// GetTokPrecedence - Get the precedence of the pending binary operator token.
-static int GetTokPrecedence() {
- if (!isascii(CurTok))
- return -1;
-
- // Make sure it's a declared binop.
- int TokPrec = BinopPrecedence[CurTok];
- if (TokPrec &lt;= 0) return -1;
- return TokPrec;
-}
-
-/// Error* - These are little helper functions for error handling.
-ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
-PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
-FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
-
-static ExprAST *ParseExpression();
-
-/// identifierexpr
-/// ::= identifier
-/// ::= identifier '(' expression* ')'
-static ExprAST *ParseIdentifierExpr() {
- std::string IdName = IdentifierStr;
-
- getNextToken(); // eat identifier.
-
- if (CurTok != '(') // Simple variable ref.
- return new VariableExprAST(IdName);
-
- // Call.
- getNextToken(); // eat (
- std::vector&lt;ExprAST*&gt; Args;
- if (CurTok != ')') {
- while (1) {
- ExprAST *Arg = ParseExpression();
- if (!Arg) return 0;
- Args.push_back(Arg);
-
- if (CurTok == ')') break;
-
- if (CurTok != ',')
- return Error("Expected ')' or ',' in argument list");
- getNextToken();
- }
- }
-
- // Eat the ')'.
- getNextToken();
-
- return new CallExprAST(IdName, Args);
-}
-
-/// numberexpr ::= number
-static ExprAST *ParseNumberExpr() {
- ExprAST *Result = new NumberExprAST(NumVal);
- getNextToken(); // consume the number
- return Result;
-}
-
-/// parenexpr ::= '(' expression ')'
-static ExprAST *ParseParenExpr() {
- getNextToken(); // eat (.
- ExprAST *V = ParseExpression();
- if (!V) return 0;
-
- if (CurTok != ')')
- return Error("expected ')'");
- getNextToken(); // eat ).
- return V;
-}
-
-/// primary
-/// ::= identifierexpr
-/// ::= numberexpr
-/// ::= parenexpr
-static ExprAST *ParsePrimary() {
- switch (CurTok) {
- default: return Error("unknown token when expecting an expression");
- case tok_identifier: return ParseIdentifierExpr();
- case tok_number: return ParseNumberExpr();
- case '(': return ParseParenExpr();
- }
-}
-
-/// binoprhs
-/// ::= ('+' primary)*
-static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
- // If this is a binop, find its precedence.
- while (1) {
- int TokPrec = GetTokPrecedence();
-
- // If this is a binop that binds at least as tightly as the current binop,
- // consume it, otherwise we are done.
- if (TokPrec &lt; ExprPrec)
- return LHS;
-
- // Okay, we know this is a binop.
- int BinOp = CurTok;
- getNextToken(); // eat binop
-
- // Parse the primary expression after the binary operator.
- ExprAST *RHS = ParsePrimary();
- if (!RHS) return 0;
-
- // If BinOp binds less tightly with RHS than the operator after RHS, let
- // the pending operator take RHS as its LHS.
- int NextPrec = GetTokPrecedence();
- if (TokPrec &lt; NextPrec) {
- RHS = ParseBinOpRHS(TokPrec+1, RHS);
- if (RHS == 0) return 0;
- }
-
- // Merge LHS/RHS.
- LHS = new BinaryExprAST(BinOp, LHS, RHS);
- }
-}
-
-/// expression
-/// ::= primary binoprhs
-///
-static ExprAST *ParseExpression() {
- ExprAST *LHS = ParsePrimary();
- if (!LHS) return 0;
-
- return ParseBinOpRHS(0, LHS);
-}
-
-/// prototype
-/// ::= id '(' id* ')'
-static PrototypeAST *ParsePrototype() {
- if (CurTok != tok_identifier)
- return ErrorP("Expected function name in prototype");
-
- std::string FnName = IdentifierStr;
- getNextToken();
-
- if (CurTok != '(')
- return ErrorP("Expected '(' in prototype");
-
- std::vector&lt;std::string&gt; ArgNames;
- while (getNextToken() == tok_identifier)
- ArgNames.push_back(IdentifierStr);
- if (CurTok != ')')
- return ErrorP("Expected ')' in prototype");
-
- // success.
- getNextToken(); // eat ')'.
-
- return new PrototypeAST(FnName, ArgNames);
-}
-
-/// definition ::= 'def' prototype expression
-static FunctionAST *ParseDefinition() {
- getNextToken(); // eat def.
- PrototypeAST *Proto = ParsePrototype();
- if (Proto == 0) return 0;
-
- if (ExprAST *E = ParseExpression())
- return new FunctionAST(Proto, E);
- return 0;
-}
-
-/// toplevelexpr ::= expression
-static FunctionAST *ParseTopLevelExpr() {
- if (ExprAST *E = ParseExpression()) {
- // Make an anonymous proto.
- PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
- return new FunctionAST(Proto, E);
- }
- return 0;
-}
-
-/// external ::= 'extern' prototype
-static PrototypeAST *ParseExtern() {
- getNextToken(); // eat extern.
- return ParsePrototype();
-}
-
-//===----------------------------------------------------------------------===//
-// Code Generation
-//===----------------------------------------------------------------------===//
-
-static Module *TheModule;
-static IRBuilder&lt;&gt; Builder(getGlobalContext());
-static std::map&lt;std::string, Value*&gt; NamedValues;
-
-Value *ErrorV(const char *Str) { Error(Str); return 0; }
-
-Value *NumberExprAST::Codegen() {
- return ConstantFP::get(getGlobalContext(), APFloat(Val));
-}
-
-Value *VariableExprAST::Codegen() {
- // Look this variable up in the function.
- Value *V = NamedValues[Name];
- return V ? V : ErrorV("Unknown variable name");
-}
-
-Value *BinaryExprAST::Codegen() {
- Value *L = LHS-&gt;Codegen();
- Value *R = RHS-&gt;Codegen();
- if (L == 0 || R == 0) return 0;
-
- switch (Op) {
- case '+': return Builder.CreateAdd(L, R, "addtmp");
- case '-': return Builder.CreateSub(L, R, "subtmp");
- case '*': return Builder.CreateMul(L, R, "multmp");
- case '&lt;':
- L = Builder.CreateFCmpULT(L, R, "cmptmp");
- // Convert bool 0/1 to double 0.0 or 1.0
- return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
- "booltmp");
- default: return ErrorV("invalid binary operator");
- }
-}
-
-Value *CallExprAST::Codegen() {
- // Look up the name in the global module table.
- Function *CalleeF = TheModule-&gt;getFunction(Callee);
- if (CalleeF == 0)
- return ErrorV("Unknown function referenced");
-
- // If argument mismatch error.
- if (CalleeF-&gt;arg_size() != Args.size())
- return ErrorV("Incorrect # arguments passed");
-
- std::vector&lt;Value*&gt; ArgsV;
- for (unsigned i = 0, e = Args.size(); i != e; ++i) {
- ArgsV.push_back(Args[i]-&gt;Codegen());
- if (ArgsV.back() == 0) return 0;
- }
-
- return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
-}
-
-Function *PrototypeAST::Codegen() {
- // Make the function type: double(double,double) etc.
- std::vector&lt;const Type*&gt; Doubles(Args.size(),
- Type::getDoubleTy(getGlobalContext()));
- FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()),
- Doubles, false);
-
- Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule);
-
- // If F conflicted, there was already something named 'Name'. If it has a
- // body, don't allow redefinition or reextern.
- if (F-&gt;getName() != Name) {
- // Delete the one we just made and get the existing one.
- F-&gt;eraseFromParent();
- F = TheModule-&gt;getFunction(Name);
-
- // If F already has a body, reject this.
- if (!F-&gt;empty()) {
- ErrorF("redefinition of function");
- return 0;
- }
-
- // If F took a different number of args, reject.
- if (F-&gt;arg_size() != Args.size()) {
- ErrorF("redefinition of function with different # args");
- return 0;
- }
- }
-
- // Set names for all arguments.
- unsigned Idx = 0;
- for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
- ++AI, ++Idx) {
- AI-&gt;setName(Args[Idx]);
-
- // Add arguments to variable symbol table.
- NamedValues[Args[Idx]] = AI;
- }
-
- return F;
-}
-
-Function *FunctionAST::Codegen() {
- NamedValues.clear();
-
- Function *TheFunction = Proto-&gt;Codegen();
- if (TheFunction == 0)
- return 0;
-
- // Create a new basic block to start insertion into.
- BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
- Builder.SetInsertPoint(BB);
-
- if (Value *RetVal = Body-&gt;Codegen()) {
- // Finish off the function.
- Builder.CreateRet(RetVal);
-
- // Validate the generated code, checking for consistency.
- verifyFunction(*TheFunction);
-
- return TheFunction;
- }
-
- // Error reading body, remove function.
- TheFunction-&gt;eraseFromParent();
- return 0;
-}
-
-//===----------------------------------------------------------------------===//
-// Top-Level parsing and JIT Driver
-//===----------------------------------------------------------------------===//
-
-static void HandleDefinition() {
- if (FunctionAST *F = ParseDefinition()) {
- if (Function *LF = F-&gt;Codegen()) {
- fprintf(stderr, "Read function definition:");
- LF-&gt;dump();
- }
- } else {
- // Skip token for error recovery.
- getNextToken();
- }
-}
-
-static void HandleExtern() {
- if (PrototypeAST *P = ParseExtern()) {
- if (Function *F = P-&gt;Codegen()) {
- fprintf(stderr, "Read extern: ");
- F-&gt;dump();
- }
- } else {
- // Skip token for error recovery.
- getNextToken();
- }
-}
-
-static void HandleTopLevelExpression() {
- // Evaluate a top-level expression into an anonymous function.
- if (FunctionAST *F = ParseTopLevelExpr()) {
- if (Function *LF = F-&gt;Codegen()) {
- fprintf(stderr, "Read top-level expression:");
- LF-&gt;dump();
- }
- } else {
- // Skip token for error recovery.
- getNextToken();
- }
-}
-
-/// top ::= definition | external | expression | ';'
-static void MainLoop() {
- while (1) {
- fprintf(stderr, "ready&gt; ");
- switch (CurTok) {
- case tok_eof: return;
- case ';': getNextToken(); break; // ignore top-level semicolons.
- case tok_def: HandleDefinition(); break;
- case tok_extern: HandleExtern(); break;
- default: HandleTopLevelExpression(); break;
- }
- }
-}
-
-//===----------------------------------------------------------------------===//
-// "Library" functions that can be "extern'd" from user code.
-//===----------------------------------------------------------------------===//
-
-/// putchard - putchar that takes a double and returns 0.
-extern "C"
-double putchard(double X) {
- putchar((char)X);
- return 0;
-}
-
-//===----------------------------------------------------------------------===//
-// Main driver code.
-//===----------------------------------------------------------------------===//
-
-int main() {
- LLVMContext &amp;Context = getGlobalContext();
-
- // Install standard binary operators.
- // 1 is lowest precedence.
- BinopPrecedence['&lt;'] = 10;
- BinopPrecedence['+'] = 20;
- BinopPrecedence['-'] = 20;
- BinopPrecedence['*'] = 40; // highest.
-
- // Prime the first token.
- fprintf(stderr, "ready&gt; ");
- getNextToken();
-
- // Make the module, which holds all the code.
- TheModule = new Module("my cool jit", Context);
-
- // Run the main "interpreter loop" now.
- MainLoop();
-
- // Print out all of the generated code.
- TheModule-&gt;dump();
-
- return 0;
-}
-</pre>
-</div>
-<a href="LangImpl4.html">Next: Adding JIT and Optimizer Support</a>
-</div>
-
-<!-- *********************************************************************** -->
-<hr>
-<address>
- <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
- src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
- <a href="http://validator.w3.org/check/referer"><img
- src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
-
- <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
- <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
- Last modified: $Date$
-</address>
-</body>
-</html>
diff --git a/docs/tutorial/LangImpl4.html b/docs/tutorial/LangImpl4.html
deleted file mode 100644
index 230e6e5..0000000
--- a/docs/tutorial/LangImpl4.html
+++ /dev/null
@@ -1,1132 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
- "http://www.w3.org/TR/html4/strict.dtd">
-
-<html>
-<head>
- <title>Kaleidoscope: Adding JIT and Optimizer Support</title>
- <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
- <meta name="author" content="Chris Lattner">
- <link rel="stylesheet" href="../llvm.css" type="text/css">
-</head>
-
-<body>
-
-<div class="doc_title">Kaleidoscope: Adding JIT and Optimizer Support</div>
-
-<ul>
-<li><a href="index.html">Up to Tutorial Index</a></li>
-<li>Chapter 4
- <ol>
- <li><a href="#intro">Chapter 4 Introduction</a></li>
- <li><a href="#trivialconstfold">Trivial Constant Folding</a></li>
- <li><a href="#optimizerpasses">LLVM Optimization Passes</a></li>
- <li><a href="#jit">Adding a JIT Compiler</a></li>
- <li><a href="#code">Full Code Listing</a></li>
- </ol>
-</li>
-<li><a href="LangImpl5.html">Chapter 5</a>: Extending the Language: Control
-Flow</li>
-</ul>
-
-<div class="doc_author">
- <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="intro">Chapter 4 Introduction</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Welcome to Chapter 4 of the "<a href="index.html">Implementing a language
-with LLVM</a>" tutorial. Chapters 1-3 described the implementation of a simple
-language and added support for generating LLVM IR. This chapter describes
-two new techniques: adding optimizer support to your language, and adding JIT
-compiler support. These additions will demonstrate how to get nice, efficient code
-for the Kaleidoscope language.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="trivialconstfold">Trivial Constant
-Folding</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>
-Our demonstration for Chapter 3 is elegant and easy to extend. Unfortunately,
-it does not produce wonderful code. The IRBuilder, however, does give us
-obvious optimizations when compiling simple code:</p>
-
-<div class="doc_code">
-<pre>
-ready&gt; <b>def test(x) 1+2+x;</b>
-Read function definition:
-define double @test(double %x) {
-entry:
- %addtmp = fadd double 3.000000e+00, %x
- ret double %addtmp
-}
-</pre>
-</div>
-
-<p>This code is not a literal transcription of the AST built by parsing the
-input. That would be:
-
-<div class="doc_code">
-<pre>
-ready&gt; <b>def test(x) 1+2+x;</b>
-Read function definition:
-define double @test(double %x) {
-entry:
- %addtmp = fadd double 2.000000e+00, 1.000000e+00
- %addtmp1 = fadd double %addtmp, %x
- ret double %addtmp1
-}
-</pre>
-</div>
-
-<p>Constant folding, as seen above, in particular, is a very common and very
-important optimization: so much so that many language implementors implement
-constant folding support in their AST representation.</p>
-
-<p>With LLVM, you don't need this support in the AST. Since all calls to build
-LLVM IR go through the LLVM IR builder, the builder itself checked to see if
-there was a constant folding opportunity when you call it. If so, it just does
-the constant fold and return the constant instead of creating an instruction.
-
-<p>Well, that was easy :). In practice, we recommend always using
-<tt>IRBuilder</tt> when generating code like this. It has no
-"syntactic overhead" for its use (you don't have to uglify your compiler with
-constant checks everywhere) and it can dramatically reduce the amount of
-LLVM IR that is generated in some cases (particular for languages with a macro
-preprocessor or that use a lot of constants).</p>
-
-<p>On the other hand, the <tt>IRBuilder</tt> is limited by the fact
-that it does all of its analysis inline with the code as it is built. If you
-take a slightly more complex example:</p>
-
-<div class="doc_code">
-<pre>
-ready&gt; <b>def test(x) (1+2+x)*(x+(1+2));</b>
-ready> Read function definition:
-define double @test(double %x) {
-entry:
- %addtmp = fadd double 3.000000e+00, %x
- %addtmp1 = fadd double %x, 3.000000e+00
- %multmp = fmul double %addtmp, %addtmp1
- ret double %multmp
-}
-</pre>
-</div>
-
-<p>In this case, the LHS and RHS of the multiplication are the same value. We'd
-really like to see this generate "<tt>tmp = x+3; result = tmp*tmp;</tt>" instead
-of computing "<tt>x+3</tt>" twice.</p>
-
-<p>Unfortunately, no amount of local analysis will be able to detect and correct
-this. This requires two transformations: reassociation of expressions (to
-make the add's lexically identical) and Common Subexpression Elimination (CSE)
-to delete the redundant add instruction. Fortunately, LLVM provides a broad
-range of optimizations that you can use, in the form of "passes".</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="optimizerpasses">LLVM Optimization
- Passes</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>LLVM provides many optimization passes, which do many different sorts of
-things and have different tradeoffs. Unlike other systems, LLVM doesn't hold
-to the mistaken notion that one set of optimizations is right for all languages
-and for all situations. LLVM allows a compiler implementor to make complete
-decisions about what optimizations to use, in which order, and in what
-situation.</p>
-
-<p>As a concrete example, LLVM supports both "whole module" passes, which look
-across as large of body of code as they can (often a whole file, but if run
-at link time, this can be a substantial portion of the whole program). It also
-supports and includes "per-function" passes which just operate on a single
-function at a time, without looking at other functions. For more information
-on passes and how they are run, see the <a href="../WritingAnLLVMPass.html">How
-to Write a Pass</a> document and the <a href="../Passes.html">List of LLVM
-Passes</a>.</p>
-
-<p>For Kaleidoscope, we are currently generating functions on the fly, one at
-a time, as the user types them in. We aren't shooting for the ultimate
-optimization experience in this setting, but we also want to catch the easy and
-quick stuff where possible. As such, we will choose to run a few per-function
-optimizations as the user types the function in. If we wanted to make a "static
-Kaleidoscope compiler", we would use exactly the code we have now, except that
-we would defer running the optimizer until the entire file has been parsed.</p>
-
-<p>In order to get per-function optimizations going, we need to set up a
-<a href="../WritingAnLLVMPass.html#passmanager">FunctionPassManager</a> to hold and
-organize the LLVM optimizations that we want to run. Once we have that, we can
-add a set of optimizations to run. The code looks like this:</p>
-
-<div class="doc_code">
-<pre>
- FunctionPassManager OurFPM(TheModule);
-
- // Set up the optimizer pipeline. Start with registering info about how the
- // target lays out data structures.
- OurFPM.add(new TargetData(*TheExecutionEngine->getTargetData()));
- // Do simple "peephole" optimizations and bit-twiddling optzns.
- OurFPM.add(createInstructionCombiningPass());
- // Reassociate expressions.
- OurFPM.add(createReassociatePass());
- // Eliminate Common SubExpressions.
- OurFPM.add(createGVNPass());
- // Simplify the control flow graph (deleting unreachable blocks, etc).
- OurFPM.add(createCFGSimplificationPass());
-
- OurFPM.doInitialization();
-
- // Set the global so the code gen can use this.
- TheFPM = &amp;OurFPM;
-
- // Run the main "interpreter loop" now.
- MainLoop();
-</pre>
-</div>
-
-<p>This code defines a <tt>FunctionPassManager</tt>, "<tt>OurFPM</tt>". It
-requires a pointer to the <tt>Module</tt> to construct itself. Once it is set
-up, we use a series of "add" calls to add a bunch of LLVM passes. The first
-pass is basically boilerplate, it adds a pass so that later optimizations know
-how the data structures in the program are laid out. The
-"<tt>TheExecutionEngine</tt>" variable is related to the JIT, which we will get
-to in the next section.</p>
-
-<p>In this case, we choose to add 4 optimization passes. The passes we chose
-here are a pretty standard set of "cleanup" optimizations that are useful for
-a wide variety of code. I won't delve into what they do but, believe me,
-they are a good starting place :).</p>
-
-<p>Once the PassManager is set up, we need to make use of it. We do this by
-running it after our newly created function is constructed (in
-<tt>FunctionAST::Codegen</tt>), but before it is returned to the client:</p>
-
-<div class="doc_code">
-<pre>
- if (Value *RetVal = Body->Codegen()) {
- // Finish off the function.
- Builder.CreateRet(RetVal);
-
- // Validate the generated code, checking for consistency.
- verifyFunction(*TheFunction);
-
- <b>// Optimize the function.
- TheFPM-&gt;run(*TheFunction);</b>
-
- return TheFunction;
- }
-</pre>
-</div>
-
-<p>As you can see, this is pretty straightforward. The
-<tt>FunctionPassManager</tt> optimizes and updates the LLVM Function* in place,
-improving (hopefully) its body. With this in place, we can try our test above
-again:</p>
-
-<div class="doc_code">
-<pre>
-ready&gt; <b>def test(x) (1+2+x)*(x+(1+2));</b>
-ready> Read function definition:
-define double @test(double %x) {
-entry:
- %addtmp = fadd double %x, 3.000000e+00
- %multmp = fmul double %addtmp, %addtmp
- ret double %multmp
-}
-</pre>
-</div>
-
-<p>As expected, we now get our nicely optimized code, saving a floating point
-add instruction from every execution of this function.</p>
-
-<p>LLVM provides a wide variety of optimizations that can be used in certain
-circumstances. Some <a href="../Passes.html">documentation about the various
-passes</a> is available, but it isn't very complete. Another good source of
-ideas can come from looking at the passes that <tt>llvm-gcc</tt> or
-<tt>llvm-ld</tt> run to get started. The "<tt>opt</tt>" tool allows you to
-experiment with passes from the command line, so you can see if they do
-anything.</p>
-
-<p>Now that we have reasonable code coming out of our front-end, lets talk about
-executing it!</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="jit">Adding a JIT Compiler</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Code that is available in LLVM IR can have a wide variety of tools
-applied to it. For example, you can run optimizations on it (as we did above),
-you can dump it out in textual or binary forms, you can compile the code to an
-assembly file (.s) for some target, or you can JIT compile it. The nice thing
-about the LLVM IR representation is that it is the "common currency" between
-many different parts of the compiler.
-</p>
-
-<p>In this section, we'll add JIT compiler support to our interpreter. The
-basic idea that we want for Kaleidoscope is to have the user enter function
-bodies as they do now, but immediately evaluate the top-level expressions they
-type in. For example, if they type in "1 + 2;", we should evaluate and print
-out 3. If they define a function, they should be able to call it from the
-command line.</p>
-
-<p>In order to do this, we first declare and initialize the JIT. This is done
-by adding a global variable and a call in <tt>main</tt>:</p>
-
-<div class="doc_code">
-<pre>
-<b>static ExecutionEngine *TheExecutionEngine;</b>
-...
-int main() {
- ..
- <b>// Create the JIT. This takes ownership of the module.
- TheExecutionEngine = EngineBuilder(TheModule).create();</b>
- ..
-}
-</pre>
-</div>
-
-<p>This creates an abstract "Execution Engine" which can be either a JIT
-compiler or the LLVM interpreter. LLVM will automatically pick a JIT compiler
-for you if one is available for your platform, otherwise it will fall back to
-the interpreter.</p>
-
-<p>Once the <tt>ExecutionEngine</tt> is created, the JIT is ready to be used.
-There are a variety of APIs that are useful, but the simplest one is the
-"<tt>getPointerToFunction(F)</tt>" method. This method JIT compiles the
-specified LLVM Function and returns a function pointer to the generated machine
-code. In our case, this means that we can change the code that parses a
-top-level expression to look like this:</p>
-
-<div class="doc_code">
-<pre>
-static void HandleTopLevelExpression() {
- // Evaluate a top-level expression into an anonymous function.
- if (FunctionAST *F = ParseTopLevelExpr()) {
- if (Function *LF = F-&gt;Codegen()) {
- LF->dump(); // Dump the function for exposition purposes.
-
- <b>// JIT the function, returning a function pointer.
- void *FPtr = TheExecutionEngine-&gt;getPointerToFunction(LF);
-
- // Cast it to the right type (takes no arguments, returns a double) so we
- // can call it as a native function.
- double (*FP)() = (double (*)())(intptr_t)FPtr;
- fprintf(stderr, "Evaluated to %f\n", FP());</b>
- }
-</pre>
-</div>
-
-<p>Recall that we compile top-level expressions into a self-contained LLVM
-function that takes no arguments and returns the computed double. Because the
-LLVM JIT compiler matches the native platform ABI, this means that you can just
-cast the result pointer to a function pointer of that type and call it directly.
-This means, there is no difference between JIT compiled code and native machine
-code that is statically linked into your application.</p>
-
-<p>With just these two changes, lets see how Kaleidoscope works now!</p>
-
-<div class="doc_code">
-<pre>
-ready&gt; <b>4+5;</b>
-define double @""() {
-entry:
- ret double 9.000000e+00
-}
-
-<em>Evaluated to 9.000000</em>
-</pre>
-</div>
-
-<p>Well this looks like it is basically working. The dump of the function
-shows the "no argument function that always returns double" that we synthesize
-for each top-level expression that is typed in. This demonstrates very basic
-functionality, but can we do more?</p>
-
-<div class="doc_code">
-<pre>
-ready&gt; <b>def testfunc(x y) x + y*2; </b>
-Read function definition:
-define double @testfunc(double %x, double %y) {
-entry:
- %multmp = fmul double %y, 2.000000e+00
- %addtmp = fadd double %multmp, %x
- ret double %addtmp
-}
-
-ready&gt; <b>testfunc(4, 10);</b>
-define double @""() {
-entry:
- %calltmp = call double @testfunc( double 4.000000e+00, double 1.000000e+01 )
- ret double %calltmp
-}
-
-<em>Evaluated to 24.000000</em>
-</pre>
-</div>
-
-<p>This illustrates that we can now call user code, but there is something a bit
-subtle going on here. Note that we only invoke the JIT on the anonymous
-functions that <em>call testfunc</em>, but we never invoked it
-on <em>testfunc</em> itself. What actually happened here is that the JIT
-scanned for all non-JIT'd functions transitively called from the anonymous
-function and compiled all of them before returning
-from <tt>getPointerToFunction()</tt>.</p>
-
-<p>The JIT provides a number of other more advanced interfaces for things like
-freeing allocated machine code, rejit'ing functions to update them, etc.
-However, even with this simple code, we get some surprisingly powerful
-capabilities - check this out (I removed the dump of the anonymous functions,
-you should get the idea by now :) :</p>
-
-<div class="doc_code">
-<pre>
-ready&gt; <b>extern sin(x);</b>
-Read extern:
-declare double @sin(double)
-
-ready&gt; <b>extern cos(x);</b>
-Read extern:
-declare double @cos(double)
-
-ready&gt; <b>sin(1.0);</b>
-<em>Evaluated to 0.841471</em>
-
-ready&gt; <b>def foo(x) sin(x)*sin(x) + cos(x)*cos(x);</b>
-Read function definition:
-define double @foo(double %x) {
-entry:
- %calltmp = call double @sin( double %x )
- %multmp = fmul double %calltmp, %calltmp
- %calltmp2 = call double @cos( double %x )
- %multmp4 = fmul double %calltmp2, %calltmp2
- %addtmp = fadd double %multmp, %multmp4
- ret double %addtmp
-}
-
-ready&gt; <b>foo(4.0);</b>
-<em>Evaluated to 1.000000</em>
-</pre>
-</div>
-
-<p>Whoa, how does the JIT know about sin and cos? The answer is surprisingly
-simple: in this
-example, the JIT started execution of a function and got to a function call. It
-realized that the function was not yet JIT compiled and invoked the standard set
-of routines to resolve the function. In this case, there is no body defined
-for the function, so the JIT ended up calling "<tt>dlsym("sin")</tt>" on the
-Kaleidoscope process itself.
-Since "<tt>sin</tt>" is defined within the JIT's address space, it simply
-patches up calls in the module to call the libm version of <tt>sin</tt>
-directly.</p>
-
-<p>The LLVM JIT provides a number of interfaces (look in the
-<tt>ExecutionEngine.h</tt> file) for controlling how unknown functions get
-resolved. It allows you to establish explicit mappings between IR objects and
-addresses (useful for LLVM global variables that you want to map to static
-tables, for example), allows you to dynamically decide on the fly based on the
-function name, and even allows you to have the JIT compile functions lazily the
-first time they're called.</p>
-
-<p>One interesting application of this is that we can now extend the language
-by writing arbitrary C++ code to implement operations. For example, if we add:
-</p>
-
-<div class="doc_code">
-<pre>
-/// putchard - putchar that takes a double and returns 0.
-extern "C"
-double putchard(double X) {
- putchar((char)X);
- return 0;
-}
-</pre>
-</div>
-
-<p>Now we can produce simple output to the console by using things like:
-"<tt>extern putchard(x); putchard(120);</tt>", which prints a lowercase 'x' on
-the console (120 is the ASCII code for 'x'). Similar code could be used to
-implement file I/O, console input, and many other capabilities in
-Kaleidoscope.</p>
-
-<p>This completes the JIT and optimizer chapter of the Kaleidoscope tutorial. At
-this point, we can compile a non-Turing-complete programming language, optimize
-and JIT compile it in a user-driven way. Next up we'll look into <a
-href="LangImpl5.html">extending the language with control flow constructs</a>,
-tackling some interesting LLVM IR issues along the way.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="code">Full Code Listing</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>
-Here is the complete code listing for our running example, enhanced with the
-LLVM JIT and optimizer. To build this example, use:
-</p>
-
-<div class="doc_code">
-<pre>
- # Compile
- g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
- # Run
- ./toy
-</pre>
-</div>
-
-<p>
-If you are compiling this on Linux, make sure to add the "-rdynamic" option
-as well. This makes sure that the external functions are resolved properly
-at runtime.</p>
-
-<p>Here is the code:</p>
-
-<div class="doc_code">
-<pre>
-#include "llvm/DerivedTypes.h"
-#include "llvm/ExecutionEngine/ExecutionEngine.h"
-#include "llvm/ExecutionEngine/JIT.h"
-#include "llvm/LLVMContext.h"
-#include "llvm/Module.h"
-#include "llvm/PassManager.h"
-#include "llvm/Analysis/Verifier.h"
-#include "llvm/Target/TargetData.h"
-#include "llvm/Target/TargetSelect.h"
-#include "llvm/Transforms/Scalar.h"
-#include "llvm/Support/IRBuilder.h"
-#include &lt;cstdio&gt;
-#include &lt;string&gt;
-#include &lt;map&gt;
-#include &lt;vector&gt;
-using namespace llvm;
-
-//===----------------------------------------------------------------------===//
-// Lexer
-//===----------------------------------------------------------------------===//
-
-// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
-// of these for known things.
-enum Token {
- tok_eof = -1,
-
- // commands
- tok_def = -2, tok_extern = -3,
-
- // primary
- tok_identifier = -4, tok_number = -5
-};
-
-static std::string IdentifierStr; // Filled in if tok_identifier
-static double NumVal; // Filled in if tok_number
-
-/// gettok - Return the next token from standard input.
-static int gettok() {
- static int LastChar = ' ';
-
- // Skip any whitespace.
- while (isspace(LastChar))
- LastChar = getchar();
-
- if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
- IdentifierStr = LastChar;
- while (isalnum((LastChar = getchar())))
- IdentifierStr += LastChar;
-
- if (IdentifierStr == "def") return tok_def;
- if (IdentifierStr == "extern") return tok_extern;
- return tok_identifier;
- }
-
- if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
- std::string NumStr;
- do {
- NumStr += LastChar;
- LastChar = getchar();
- } while (isdigit(LastChar) || LastChar == '.');
-
- NumVal = strtod(NumStr.c_str(), 0);
- return tok_number;
- }
-
- if (LastChar == '#') {
- // Comment until end of line.
- do LastChar = getchar();
- while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp;&amp; LastChar != '\r');
-
- if (LastChar != EOF)
- return gettok();
- }
-
- // Check for end of file. Don't eat the EOF.
- if (LastChar == EOF)
- return tok_eof;
-
- // Otherwise, just return the character as its ascii value.
- int ThisChar = LastChar;
- LastChar = getchar();
- return ThisChar;
-}
-
-//===----------------------------------------------------------------------===//
-// Abstract Syntax Tree (aka Parse Tree)
-//===----------------------------------------------------------------------===//
-
-/// ExprAST - Base class for all expression nodes.
-class ExprAST {
-public:
- virtual ~ExprAST() {}
- virtual Value *Codegen() = 0;
-};
-
-/// NumberExprAST - Expression class for numeric literals like "1.0".
-class NumberExprAST : public ExprAST {
- double Val;
-public:
- NumberExprAST(double val) : Val(val) {}
- virtual Value *Codegen();
-};
-
-/// VariableExprAST - Expression class for referencing a variable, like "a".
-class VariableExprAST : public ExprAST {
- std::string Name;
-public:
- VariableExprAST(const std::string &amp;name) : Name(name) {}
- virtual Value *Codegen();
-};
-
-/// BinaryExprAST - Expression class for a binary operator.
-class BinaryExprAST : public ExprAST {
- char Op;
- ExprAST *LHS, *RHS;
-public:
- BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
- : Op(op), LHS(lhs), RHS(rhs) {}
- virtual Value *Codegen();
-};
-
-/// CallExprAST - Expression class for function calls.
-class CallExprAST : public ExprAST {
- std::string Callee;
- std::vector&lt;ExprAST*&gt; Args;
-public:
- CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
- : Callee(callee), Args(args) {}
- virtual Value *Codegen();
-};
-
-/// PrototypeAST - This class represents the "prototype" for a function,
-/// which captures its name, and its argument names (thus implicitly the number
-/// of arguments the function takes).
-class PrototypeAST {
- std::string Name;
- std::vector&lt;std::string&gt; Args;
-public:
- PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args)
- : Name(name), Args(args) {}
-
- Function *Codegen();
-};
-
-/// FunctionAST - This class represents a function definition itself.
-class FunctionAST {
- PrototypeAST *Proto;
- ExprAST *Body;
-public:
- FunctionAST(PrototypeAST *proto, ExprAST *body)
- : Proto(proto), Body(body) {}
-
- Function *Codegen();
-};
-
-//===----------------------------------------------------------------------===//
-// Parser
-//===----------------------------------------------------------------------===//
-
-/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
-/// token the parser is looking at. getNextToken reads another token from the
-/// lexer and updates CurTok with its results.
-static int CurTok;
-static int getNextToken() {
- return CurTok = gettok();
-}
-
-/// BinopPrecedence - This holds the precedence for each binary operator that is
-/// defined.
-static std::map&lt;char, int&gt; BinopPrecedence;
-
-/// GetTokPrecedence - Get the precedence of the pending binary operator token.
-static int GetTokPrecedence() {
- if (!isascii(CurTok))
- return -1;
-
- // Make sure it's a declared binop.
- int TokPrec = BinopPrecedence[CurTok];
- if (TokPrec &lt;= 0) return -1;
- return TokPrec;
-}
-
-/// Error* - These are little helper functions for error handling.
-ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
-PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
-FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
-
-static ExprAST *ParseExpression();
-
-/// identifierexpr
-/// ::= identifier
-/// ::= identifier '(' expression* ')'
-static ExprAST *ParseIdentifierExpr() {
- std::string IdName = IdentifierStr;
-
- getNextToken(); // eat identifier.
-
- if (CurTok != '(') // Simple variable ref.
- return new VariableExprAST(IdName);
-
- // Call.
- getNextToken(); // eat (
- std::vector&lt;ExprAST*&gt; Args;
- if (CurTok != ')') {
- while (1) {
- ExprAST *Arg = ParseExpression();
- if (!Arg) return 0;
- Args.push_back(Arg);
-
- if (CurTok == ')') break;
-
- if (CurTok != ',')
- return Error("Expected ')' or ',' in argument list");
- getNextToken();
- }
- }
-
- // Eat the ')'.
- getNextToken();
-
- return new CallExprAST(IdName, Args);
-}
-
-/// numberexpr ::= number
-static ExprAST *ParseNumberExpr() {
- ExprAST *Result = new NumberExprAST(NumVal);
- getNextToken(); // consume the number
- return Result;
-}
-
-/// parenexpr ::= '(' expression ')'
-static ExprAST *ParseParenExpr() {
- getNextToken(); // eat (.
- ExprAST *V = ParseExpression();
- if (!V) return 0;
-
- if (CurTok != ')')
- return Error("expected ')'");
- getNextToken(); // eat ).
- return V;
-}
-
-/// primary
-/// ::= identifierexpr
-/// ::= numberexpr
-/// ::= parenexpr
-static ExprAST *ParsePrimary() {
- switch (CurTok) {
- default: return Error("unknown token when expecting an expression");
- case tok_identifier: return ParseIdentifierExpr();
- case tok_number: return ParseNumberExpr();
- case '(': return ParseParenExpr();
- }
-}
-
-/// binoprhs
-/// ::= ('+' primary)*
-static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
- // If this is a binop, find its precedence.
- while (1) {
- int TokPrec = GetTokPrecedence();
-
- // If this is a binop that binds at least as tightly as the current binop,
- // consume it, otherwise we are done.
- if (TokPrec &lt; ExprPrec)
- return LHS;
-
- // Okay, we know this is a binop.
- int BinOp = CurTok;
- getNextToken(); // eat binop
-
- // Parse the primary expression after the binary operator.
- ExprAST *RHS = ParsePrimary();
- if (!RHS) return 0;
-
- // If BinOp binds less tightly with RHS than the operator after RHS, let
- // the pending operator take RHS as its LHS.
- int NextPrec = GetTokPrecedence();
- if (TokPrec &lt; NextPrec) {
- RHS = ParseBinOpRHS(TokPrec+1, RHS);
- if (RHS == 0) return 0;
- }
-
- // Merge LHS/RHS.
- LHS = new BinaryExprAST(BinOp, LHS, RHS);
- }
-}
-
-/// expression
-/// ::= primary binoprhs
-///
-static ExprAST *ParseExpression() {
- ExprAST *LHS = ParsePrimary();
- if (!LHS) return 0;
-
- return ParseBinOpRHS(0, LHS);
-}
-
-/// prototype
-/// ::= id '(' id* ')'
-static PrototypeAST *ParsePrototype() {
- if (CurTok != tok_identifier)
- return ErrorP("Expected function name in prototype");
-
- std::string FnName = IdentifierStr;
- getNextToken();
-
- if (CurTok != '(')
- return ErrorP("Expected '(' in prototype");
-
- std::vector&lt;std::string&gt; ArgNames;
- while (getNextToken() == tok_identifier)
- ArgNames.push_back(IdentifierStr);
- if (CurTok != ')')
- return ErrorP("Expected ')' in prototype");
-
- // success.
- getNextToken(); // eat ')'.
-
- return new PrototypeAST(FnName, ArgNames);
-}
-
-/// definition ::= 'def' prototype expression
-static FunctionAST *ParseDefinition() {
- getNextToken(); // eat def.
- PrototypeAST *Proto = ParsePrototype();
- if (Proto == 0) return 0;
-
- if (ExprAST *E = ParseExpression())
- return new FunctionAST(Proto, E);
- return 0;
-}
-
-/// toplevelexpr ::= expression
-static FunctionAST *ParseTopLevelExpr() {
- if (ExprAST *E = ParseExpression()) {
- // Make an anonymous proto.
- PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
- return new FunctionAST(Proto, E);
- }
- return 0;
-}
-
-/// external ::= 'extern' prototype
-static PrototypeAST *ParseExtern() {
- getNextToken(); // eat extern.
- return ParsePrototype();
-}
-
-//===----------------------------------------------------------------------===//
-// Code Generation
-//===----------------------------------------------------------------------===//
-
-static Module *TheModule;
-static IRBuilder&lt;&gt; Builder(getGlobalContext());
-static std::map&lt;std::string, Value*&gt; NamedValues;
-static FunctionPassManager *TheFPM;
-
-Value *ErrorV(const char *Str) { Error(Str); return 0; }
-
-Value *NumberExprAST::Codegen() {
- return ConstantFP::get(getGlobalContext(), APFloat(Val));
-}
-
-Value *VariableExprAST::Codegen() {
- // Look this variable up in the function.
- Value *V = NamedValues[Name];
- return V ? V : ErrorV("Unknown variable name");
-}
-
-Value *BinaryExprAST::Codegen() {
- Value *L = LHS-&gt;Codegen();
- Value *R = RHS-&gt;Codegen();
- if (L == 0 || R == 0) return 0;
-
- switch (Op) {
- case '+': return Builder.CreateAdd(L, R, "addtmp");
- case '-': return Builder.CreateSub(L, R, "subtmp");
- case '*': return Builder.CreateMul(L, R, "multmp");
- case '&lt;':
- L = Builder.CreateFCmpULT(L, R, "cmptmp");
- // Convert bool 0/1 to double 0.0 or 1.0
- return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
- "booltmp");
- default: return ErrorV("invalid binary operator");
- }
-}
-
-Value *CallExprAST::Codegen() {
- // Look up the name in the global module table.
- Function *CalleeF = TheModule-&gt;getFunction(Callee);
- if (CalleeF == 0)
- return ErrorV("Unknown function referenced");
-
- // If argument mismatch error.
- if (CalleeF-&gt;arg_size() != Args.size())
- return ErrorV("Incorrect # arguments passed");
-
- std::vector&lt;Value*&gt; ArgsV;
- for (unsigned i = 0, e = Args.size(); i != e; ++i) {
- ArgsV.push_back(Args[i]-&gt;Codegen());
- if (ArgsV.back() == 0) return 0;
- }
-
- return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
-}
-
-Function *PrototypeAST::Codegen() {
- // Make the function type: double(double,double) etc.
- std::vector&lt;const Type*&gt; Doubles(Args.size(),
- Type::getDoubleTy(getGlobalContext()));
- FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()),
- Doubles, false);
-
- Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule);
-
- // If F conflicted, there was already something named 'Name'. If it has a
- // body, don't allow redefinition or reextern.
- if (F-&gt;getName() != Name) {
- // Delete the one we just made and get the existing one.
- F-&gt;eraseFromParent();
- F = TheModule-&gt;getFunction(Name);
-
- // If F already has a body, reject this.
- if (!F-&gt;empty()) {
- ErrorF("redefinition of function");
- return 0;
- }
-
- // If F took a different number of args, reject.
- if (F-&gt;arg_size() != Args.size()) {
- ErrorF("redefinition of function with different # args");
- return 0;
- }
- }
-
- // Set names for all arguments.
- unsigned Idx = 0;
- for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
- ++AI, ++Idx) {
- AI-&gt;setName(Args[Idx]);
-
- // Add arguments to variable symbol table.
- NamedValues[Args[Idx]] = AI;
- }
-
- return F;
-}
-
-Function *FunctionAST::Codegen() {
- NamedValues.clear();
-
- Function *TheFunction = Proto-&gt;Codegen();
- if (TheFunction == 0)
- return 0;
-
- // Create a new basic block to start insertion into.
- BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
- Builder.SetInsertPoint(BB);
-
- if (Value *RetVal = Body-&gt;Codegen()) {
- // Finish off the function.
- Builder.CreateRet(RetVal);
-
- // Validate the generated code, checking for consistency.
- verifyFunction(*TheFunction);
-
- // Optimize the function.
- TheFPM-&gt;run(*TheFunction);
-
- return TheFunction;
- }
-
- // Error reading body, remove function.
- TheFunction-&gt;eraseFromParent();
- return 0;
-}
-
-//===----------------------------------------------------------------------===//
-// Top-Level parsing and JIT Driver
-//===----------------------------------------------------------------------===//
-
-static ExecutionEngine *TheExecutionEngine;
-
-static void HandleDefinition() {
- if (FunctionAST *F = ParseDefinition()) {
- if (Function *LF = F-&gt;Codegen()) {
- fprintf(stderr, "Read function definition:");
- LF-&gt;dump();
- }
- } else {
- // Skip token for error recovery.
- getNextToken();
- }
-}
-
-static void HandleExtern() {
- if (PrototypeAST *P = ParseExtern()) {
- if (Function *F = P-&gt;Codegen()) {
- fprintf(stderr, "Read extern: ");
- F-&gt;dump();
- }
- } else {
- // Skip token for error recovery.
- getNextToken();
- }
-}
-
-static void HandleTopLevelExpression() {
- // Evaluate a top-level expression into an anonymous function.
- if (FunctionAST *F = ParseTopLevelExpr()) {
- if (Function *LF = F-&gt;Codegen()) {
- // JIT the function, returning a function pointer.
- void *FPtr = TheExecutionEngine-&gt;getPointerToFunction(LF);
-
- // Cast it to the right type (takes no arguments, returns a double) so we
- // can call it as a native function.
- double (*FP)() = (double (*)())(intptr_t)FPtr;
- fprintf(stderr, "Evaluated to %f\n", FP());
- }
- } else {
- // Skip token for error recovery.
- getNextToken();
- }
-}
-
-/// top ::= definition | external | expression | ';'
-static void MainLoop() {
- while (1) {
- fprintf(stderr, "ready&gt; ");
- switch (CurTok) {
- case tok_eof: return;
- case ';': getNextToken(); break; // ignore top-level semicolons.
- case tok_def: HandleDefinition(); break;
- case tok_extern: HandleExtern(); break;
- default: HandleTopLevelExpression(); break;
- }
- }
-}
-
-//===----------------------------------------------------------------------===//
-// "Library" functions that can be "extern'd" from user code.
-//===----------------------------------------------------------------------===//
-
-/// putchard - putchar that takes a double and returns 0.
-extern "C"
-double putchard(double X) {
- putchar((char)X);
- return 0;
-}
-
-//===----------------------------------------------------------------------===//
-// Main driver code.
-//===----------------------------------------------------------------------===//
-
-int main() {
- InitializeNativeTarget();
- LLVMContext &amp;Context = getGlobalContext();
-
- // Install standard binary operators.
- // 1 is lowest precedence.
- BinopPrecedence['&lt;'] = 10;
- BinopPrecedence['+'] = 20;
- BinopPrecedence['-'] = 20;
- BinopPrecedence['*'] = 40; // highest.
-
- // Prime the first token.
- fprintf(stderr, "ready&gt; ");
- getNextToken();
-
- // Make the module, which holds all the code.
- TheModule = new Module("my cool jit", Context);
-
- // Create the JIT. This takes ownership of the module.
- std::string ErrStr;
- TheExecutionEngine = EngineBuilder(TheModule).setErrorStr(&ErrStr).create();
- if (!TheExecutionEngine) {
- fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str());
- exit(1);
- }
-
- FunctionPassManager OurFPM(TheModule);
-
- // Set up the optimizer pipeline. Start with registering info about how the
- // target lays out data structures.
- OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
- // Do simple "peephole" optimizations and bit-twiddling optzns.
- OurFPM.add(createInstructionCombiningPass());
- // Reassociate expressions.
- OurFPM.add(createReassociatePass());
- // Eliminate Common SubExpressions.
- OurFPM.add(createGVNPass());
- // Simplify the control flow graph (deleting unreachable blocks, etc).
- OurFPM.add(createCFGSimplificationPass());
-
- OurFPM.doInitialization();
-
- // Set the global so the code gen can use this.
- TheFPM = &amp;OurFPM;
-
- // Run the main "interpreter loop" now.
- MainLoop();
-
- TheFPM = 0;
-
- // Print out all of the generated code.
- TheModule-&gt;dump();
-
- return 0;
-}
-</pre>
-</div>
-
-<a href="LangImpl5.html">Next: Extending the language: control flow</a>
-</div>
-
-<!-- *********************************************************************** -->
-<hr>
-<address>
- <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
- src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
- <a href="http://validator.w3.org/check/referer"><img
- src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
-
- <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
- <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
- Last modified: $Date$
-</address>
-</body>
-</html>
diff --git a/docs/tutorial/LangImpl5-cfg.png b/docs/tutorial/LangImpl5-cfg.png
deleted file mode 100644
index cdba92f..0000000
--- a/docs/tutorial/LangImpl5-cfg.png
+++ /dev/null
Binary files differ
diff --git a/docs/tutorial/LangImpl5.html b/docs/tutorial/LangImpl5.html
deleted file mode 100644
index 7136351..0000000
--- a/docs/tutorial/LangImpl5.html
+++ /dev/null
@@ -1,1777 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
- "http://www.w3.org/TR/html4/strict.dtd">
-
-<html>
-<head>
- <title>Kaleidoscope: Extending the Language: Control Flow</title>
- <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
- <meta name="author" content="Chris Lattner">
- <link rel="stylesheet" href="../llvm.css" type="text/css">
-</head>
-
-<body>
-
-<div class="doc_title">Kaleidoscope: Extending the Language: Control Flow</div>
-
-<ul>
-<li><a href="index.html">Up to Tutorial Index</a></li>
-<li>Chapter 5
- <ol>
- <li><a href="#intro">Chapter 5 Introduction</a></li>
- <li><a href="#ifthen">If/Then/Else</a>
- <ol>
- <li><a href="#iflexer">Lexer Extensions</a></li>
- <li><a href="#ifast">AST Extensions</a></li>
- <li><a href="#ifparser">Parser Extensions</a></li>
- <li><a href="#ifir">LLVM IR</a></li>
- <li><a href="#ifcodegen">Code Generation</a></li>
- </ol>
- </li>
- <li><a href="#for">'for' Loop Expression</a>
- <ol>
- <li><a href="#forlexer">Lexer Extensions</a></li>
- <li><a href="#forast">AST Extensions</a></li>
- <li><a href="#forparser">Parser Extensions</a></li>
- <li><a href="#forir">LLVM IR</a></li>
- <li><a href="#forcodegen">Code Generation</a></li>
- </ol>
- </li>
- <li><a href="#code">Full Code Listing</a></li>
- </ol>
-</li>
-<li><a href="LangImpl6.html">Chapter 6</a>: Extending the Language:
-User-defined Operators</li>
-</ul>
-
-<div class="doc_author">
- <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="intro">Chapter 5 Introduction</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Welcome to Chapter 5 of the "<a href="index.html">Implementing a language
-with LLVM</a>" tutorial. Parts 1-4 described the implementation of the simple
-Kaleidoscope language and included support for generating LLVM IR, followed by
-optimizations and a JIT compiler. Unfortunately, as presented, Kaleidoscope is
-mostly useless: it has no control flow other than call and return. This means
-that you can't have conditional branches in the code, significantly limiting its
-power. In this episode of "build that compiler", we'll extend Kaleidoscope to
-have an if/then/else expression plus a simple 'for' loop.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="ifthen">If/Then/Else</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>
-Extending Kaleidoscope to support if/then/else is quite straightforward. It
-basically requires adding lexer support for this "new" concept to the lexer,
-parser, AST, and LLVM code emitter. This example is nice, because it shows how
-easy it is to "grow" a language over time, incrementally extending it as new
-ideas are discovered.</p>
-
-<p>Before we get going on "how" we add this extension, lets talk about "what" we
-want. The basic idea is that we want to be able to write this sort of thing:
-</p>
-
-<div class="doc_code">
-<pre>
-def fib(x)
- if x &lt; 3 then
- 1
- else
- fib(x-1)+fib(x-2);
-</pre>
-</div>
-
-<p>In Kaleidoscope, every construct is an expression: there are no statements.
-As such, the if/then/else expression needs to return a value like any other.
-Since we're using a mostly functional form, we'll have it evaluate its
-conditional, then return the 'then' or 'else' value based on how the condition
-was resolved. This is very similar to the C "?:" expression.</p>
-
-<p>The semantics of the if/then/else expression is that it evaluates the
-condition to a boolean equality value: 0.0 is considered to be false and
-everything else is considered to be true.
-If the condition is true, the first subexpression is evaluated and returned, if
-the condition is false, the second subexpression is evaluated and returned.
-Since Kaleidoscope allows side-effects, this behavior is important to nail down.
-</p>
-
-<p>Now that we know what we "want", lets break this down into its constituent
-pieces.</p>
-
-</div>
-
-<!-- ======================================================================= -->
-<div class="doc_subsubsection"><a name="iflexer">Lexer Extensions for
-If/Then/Else</a></div>
-<!-- ======================================================================= -->
-
-
-<div class="doc_text">
-
-<p>The lexer extensions are straightforward. First we add new enum values
-for the relevant tokens:</p>
-
-<div class="doc_code">
-<pre>
- // control
- tok_if = -6, tok_then = -7, tok_else = -8,
-</pre>
-</div>
-
-<p>Once we have that, we recognize the new keywords in the lexer. This is pretty simple
-stuff:</p>
-
-<div class="doc_code">
-<pre>
- ...
- if (IdentifierStr == "def") return tok_def;
- if (IdentifierStr == "extern") return tok_extern;
- <b>if (IdentifierStr == "if") return tok_if;
- if (IdentifierStr == "then") return tok_then;
- if (IdentifierStr == "else") return tok_else;</b>
- return tok_identifier;
-</pre>
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<div class="doc_subsubsection"><a name="ifast">AST Extensions for
- If/Then/Else</a></div>
-<!-- ======================================================================= -->
-
-<div class="doc_text">
-
-<p>To represent the new expression we add a new AST node for it:</p>
-
-<div class="doc_code">
-<pre>
-/// IfExprAST - Expression class for if/then/else.
-class IfExprAST : public ExprAST {
- ExprAST *Cond, *Then, *Else;
-public:
- IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
- : Cond(cond), Then(then), Else(_else) {}
- virtual Value *Codegen();
-};
-</pre>
-</div>
-
-<p>The AST node just has pointers to the various subexpressions.</p>
-
-</div>
-
-<!-- ======================================================================= -->
-<div class="doc_subsubsection"><a name="ifparser">Parser Extensions for
-If/Then/Else</a></div>
-<!-- ======================================================================= -->
-
-<div class="doc_text">
-
-<p>Now that we have the relevant tokens coming from the lexer and we have the
-AST node to build, our parsing logic is relatively straightforward. First we
-define a new parsing function:</p>
-
-<div class="doc_code">
-<pre>
-/// ifexpr ::= 'if' expression 'then' expression 'else' expression
-static ExprAST *ParseIfExpr() {
- getNextToken(); // eat the if.
-
- // condition.
- ExprAST *Cond = ParseExpression();
- if (!Cond) return 0;
-
- if (CurTok != tok_then)
- return Error("expected then");
- getNextToken(); // eat the then
-
- ExprAST *Then = ParseExpression();
- if (Then == 0) return 0;
-
- if (CurTok != tok_else)
- return Error("expected else");
-
- getNextToken();
-
- ExprAST *Else = ParseExpression();
- if (!Else) return 0;
-
- return new IfExprAST(Cond, Then, Else);
-}
-</pre>
-</div>
-
-<p>Next we hook it up as a primary expression:</p>
-
-<div class="doc_code">
-<pre>
-static ExprAST *ParsePrimary() {
- switch (CurTok) {
- default: return Error("unknown token when expecting an expression");
- case tok_identifier: return ParseIdentifierExpr();
- case tok_number: return ParseNumberExpr();
- case '(': return ParseParenExpr();
- <b>case tok_if: return ParseIfExpr();</b>
- }
-}
-</pre>
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<div class="doc_subsubsection"><a name="ifir">LLVM IR for If/Then/Else</a></div>
-<!-- ======================================================================= -->
-
-<div class="doc_text">
-
-<p>Now that we have it parsing and building the AST, the final piece is adding
-LLVM code generation support. This is the most interesting part of the
-if/then/else example, because this is where it starts to introduce new concepts.
-All of the code above has been thoroughly described in previous chapters.
-</p>
-
-<p>To motivate the code we want to produce, lets take a look at a simple
-example. Consider:</p>
-
-<div class="doc_code">
-<pre>
-extern foo();
-extern bar();
-def baz(x) if x then foo() else bar();
-</pre>
-</div>
-
-<p>If you disable optimizations, the code you'll (soon) get from Kaleidoscope
-looks like this:</p>
-
-<div class="doc_code">
-<pre>
-declare double @foo()
-
-declare double @bar()
-
-define double @baz(double %x) {
-entry:
- %ifcond = fcmp one double %x, 0.000000e+00
- br i1 %ifcond, label %then, label %else
-
-then: ; preds = %entry
- %calltmp = call double @foo()
- br label %ifcont
-
-else: ; preds = %entry
- %calltmp1 = call double @bar()
- br label %ifcont
-
-ifcont: ; preds = %else, %then
- %iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ]
- ret double %iftmp
-}
-</pre>
-</div>
-
-<p>To visualize the control flow graph, you can use a nifty feature of the LLVM
-'<a href="http://llvm.org/cmds/opt.html">opt</a>' tool. If you put this LLVM IR
-into "t.ll" and run "<tt>llvm-as &lt; t.ll | opt -analyze -view-cfg</tt>", <a
-href="../ProgrammersManual.html#ViewGraph">a window will pop up</a> and you'll
-see this graph:</p>
-
-<div style="text-align: center"><img src="LangImpl5-cfg.png" alt="Example CFG" width="423"
-height="315"></div>
-
-<p>Another way to get this is to call "<tt>F-&gt;viewCFG()</tt>" or
-"<tt>F-&gt;viewCFGOnly()</tt>" (where F is a "<tt>Function*</tt>") either by
-inserting actual calls into the code and recompiling or by calling these in the
-debugger. LLVM has many nice features for visualizing various graphs.</p>
-
-<p>Getting back to the generated code, it is fairly simple: the entry block
-evaluates the conditional expression ("x" in our case here) and compares the
-result to 0.0 with the "<tt><a href="../LangRef.html#i_fcmp">fcmp</a> one</tt>"
-instruction ('one' is "Ordered and Not Equal"). Based on the result of this
-expression, the code jumps to either the "then" or "else" blocks, which contain
-the expressions for the true/false cases.</p>
-
-<p>Once the then/else blocks are finished executing, they both branch back to the
-'ifcont' block to execute the code that happens after the if/then/else. In this
-case the only thing left to do is to return to the caller of the function. The
-question then becomes: how does the code know which expression to return?</p>
-
-<p>The answer to this question involves an important SSA operation: the
-<a href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Phi
-operation</a>. If you're not familiar with SSA, <a
-href="http://en.wikipedia.org/wiki/Static_single_assignment_form">the wikipedia
-article</a> is a good introduction and there are various other introductions to
-it available on your favorite search engine. The short version is that
-"execution" of the Phi operation requires "remembering" which block control came
-from. The Phi operation takes on the value corresponding to the input control
-block. In this case, if control comes in from the "then" block, it gets the
-value of "calltmp". If control comes from the "else" block, it gets the value
-of "calltmp1".</p>
-
-<p>At this point, you are probably starting to think "Oh no! This means my
-simple and elegant front-end will have to start generating SSA form in order to
-use LLVM!". Fortunately, this is not the case, and we strongly advise
-<em>not</em> implementing an SSA construction algorithm in your front-end
-unless there is an amazingly good reason to do so. In practice, there are two
-sorts of values that float around in code written for your average imperative
-programming language that might need Phi nodes:</p>
-
-<ol>
-<li>Code that involves user variables: <tt>x = 1; x = x + 1; </tt></li>
-<li>Values that are implicit in the structure of your AST, such as the Phi node
-in this case.</li>
-</ol>
-
-<p>In <a href="LangImpl7.html">Chapter 7</a> of this tutorial ("mutable
-variables"), we'll talk about #1
-in depth. For now, just believe me that you don't need SSA construction to
-handle this case. For #2, you have the choice of using the techniques that we will
-describe for #1, or you can insert Phi nodes directly, if convenient. In this
-case, it is really really easy to generate the Phi node, so we choose to do it
-directly.</p>
-
-<p>Okay, enough of the motivation and overview, lets generate code!</p>
-
-</div>
-
-<!-- ======================================================================= -->
-<div class="doc_subsubsection"><a name="ifcodegen">Code Generation for
-If/Then/Else</a></div>
-<!-- ======================================================================= -->
-
-<div class="doc_text">
-
-<p>In order to generate code for this, we implement the <tt>Codegen</tt> method
-for <tt>IfExprAST</tt>:</p>
-
-<div class="doc_code">
-<pre>
-Value *IfExprAST::Codegen() {
- Value *CondV = Cond-&gt;Codegen();
- if (CondV == 0) return 0;
-
- // Convert condition to a bool by comparing equal to 0.0.
- CondV = Builder.CreateFCmpONE(CondV,
- ConstantFP::get(getGlobalContext(), APFloat(0.0)),
- "ifcond");
-</pre>
-</div>
-
-<p>This code is straightforward and similar to what we saw before. We emit the
-expression for the condition, then compare that value to zero to get a truth
-value as a 1-bit (bool) value.</p>
-
-<div class="doc_code">
-<pre>
- Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
-
- // Create blocks for the then and else cases. Insert the 'then' block at the
- // end of the function.
- BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction);
- BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else");
- BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont");
-
- Builder.CreateCondBr(CondV, ThenBB, ElseBB);
-</pre>
-</div>
-
-<p>This code creates the basic blocks that are related to the if/then/else
-statement, and correspond directly to the blocks in the example above. The
-first line gets the current Function object that is being built. It
-gets this by asking the builder for the current BasicBlock, and asking that
-block for its "parent" (the function it is currently embedded into).</p>
-
-<p>Once it has that, it creates three blocks. Note that it passes "TheFunction"
-into the constructor for the "then" block. This causes the constructor to
-automatically insert the new block into the end of the specified function. The
-other two blocks are created, but aren't yet inserted into the function.</p>
-
-<p>Once the blocks are created, we can emit the conditional branch that chooses
-between them. Note that creating new blocks does not implicitly affect the
-IRBuilder, so it is still inserting into the block that the condition
-went into. Also note that it is creating a branch to the "then" block and the
-"else" block, even though the "else" block isn't inserted into the function yet.
-This is all ok: it is the standard way that LLVM supports forward
-references.</p>
-
-<div class="doc_code">
-<pre>
- // Emit then value.
- Builder.SetInsertPoint(ThenBB);
-
- Value *ThenV = Then-&gt;Codegen();
- if (ThenV == 0) return 0;
-
- Builder.CreateBr(MergeBB);
- // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
- ThenBB = Builder.GetInsertBlock();
-</pre>
-</div>
-
-<p>After the conditional branch is inserted, we move the builder to start
-inserting into the "then" block. Strictly speaking, this call moves the
-insertion point to be at the end of the specified block. However, since the
-"then" block is empty, it also starts out by inserting at the beginning of the
-block. :)</p>
-
-<p>Once the insertion point is set, we recursively codegen the "then" expression
-from the AST. To finish off the "then" block, we create an unconditional branch
-to the merge block. One interesting (and very important) aspect of the LLVM IR
-is that it <a href="../LangRef.html#functionstructure">requires all basic blocks
-to be "terminated"</a> with a <a href="../LangRef.html#terminators">control flow
-instruction</a> such as return or branch. This means that all control flow,
-<em>including fall throughs</em> must be made explicit in the LLVM IR. If you
-violate this rule, the verifier will emit an error.</p>
-
-<p>The final line here is quite subtle, but is very important. The basic issue
-is that when we create the Phi node in the merge block, we need to set up the
-block/value pairs that indicate how the Phi will work. Importantly, the Phi
-node expects to have an entry for each predecessor of the block in the CFG. Why
-then, are we getting the current block when we just set it to ThenBB 5 lines
-above? The problem is that the "Then" expression may actually itself change the
-block that the Builder is emitting into if, for example, it contains a nested
-"if/then/else" expression. Because calling Codegen recursively could
-arbitrarily change the notion of the current block, we are required to get an
-up-to-date value for code that will set up the Phi node.</p>
-
-<div class="doc_code">
-<pre>
- // Emit else block.
- TheFunction-&gt;getBasicBlockList().push_back(ElseBB);
- Builder.SetInsertPoint(ElseBB);
-
- Value *ElseV = Else-&gt;Codegen();
- if (ElseV == 0) return 0;
-
- Builder.CreateBr(MergeBB);
- // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
- ElseBB = Builder.GetInsertBlock();
-</pre>
-</div>
-
-<p>Code generation for the 'else' block is basically identical to codegen for
-the 'then' block. The only significant difference is the first line, which adds
-the 'else' block to the function. Recall previously that the 'else' block was
-created, but not added to the function. Now that the 'then' and 'else' blocks
-are emitted, we can finish up with the merge code:</p>
-
-<div class="doc_code">
-<pre>
- // Emit merge block.
- TheFunction->getBasicBlockList().push_back(MergeBB);
- Builder.SetInsertPoint(MergeBB);
- PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()),
- "iftmp");
-
- PN->addIncoming(ThenV, ThenBB);
- PN->addIncoming(ElseV, ElseBB);
- return PN;
-}
-</pre>
-</div>
-
-<p>The first two lines here are now familiar: the first adds the "merge" block
-to the Function object (it was previously floating, like the else block above).
-The second block changes the insertion point so that newly created code will go
-into the "merge" block. Once that is done, we need to create the PHI node and
-set up the block/value pairs for the PHI.</p>
-
-<p>Finally, the CodeGen function returns the phi node as the value computed by
-the if/then/else expression. In our example above, this returned value will
-feed into the code for the top-level function, which will create the return
-instruction.</p>
-
-<p>Overall, we now have the ability to execute conditional code in
-Kaleidoscope. With this extension, Kaleidoscope is a fairly complete language
-that can calculate a wide variety of numeric functions. Next up we'll add
-another useful expression that is familiar from non-functional languages...</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="for">'for' Loop Expression</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Now that we know how to add basic control flow constructs to the language,
-we have the tools to add more powerful things. Lets add something more
-aggressive, a 'for' expression:</p>
-
-<div class="doc_code">
-<pre>
- extern putchard(char)
- def printstar(n)
- for i = 1, i &lt; n, 1.0 in
- putchard(42); # ascii 42 = '*'
-
- # print 100 '*' characters
- printstar(100);
-</pre>
-</div>
-
-<p>This expression defines a new variable ("i" in this case) which iterates from
-a starting value, while the condition ("i &lt; n" in this case) is true,
-incrementing by an optional step value ("1.0" in this case). If the step value
-is omitted, it defaults to 1.0. While the loop is true, it executes its
-body expression. Because we don't have anything better to return, we'll just
-define the loop as always returning 0.0. In the future when we have mutable
-variables, it will get more useful.</p>
-
-<p>As before, lets talk about the changes that we need to Kaleidoscope to
-support this.</p>
-
-</div>
-
-<!-- ======================================================================= -->
-<div class="doc_subsubsection"><a name="forlexer">Lexer Extensions for
-the 'for' Loop</a></div>
-<!-- ======================================================================= -->
-
-<div class="doc_text">
-
-<p>The lexer extensions are the same sort of thing as for if/then/else:</p>
-
-<div class="doc_code">
-<pre>
- ... in enum Token ...
- // control
- tok_if = -6, tok_then = -7, tok_else = -8,
-<b> tok_for = -9, tok_in = -10</b>
-
- ... in gettok ...
- if (IdentifierStr == "def") return tok_def;
- if (IdentifierStr == "extern") return tok_extern;
- if (IdentifierStr == "if") return tok_if;
- if (IdentifierStr == "then") return tok_then;
- if (IdentifierStr == "else") return tok_else;
- <b>if (IdentifierStr == "for") return tok_for;
- if (IdentifierStr == "in") return tok_in;</b>
- return tok_identifier;
-</pre>
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<div class="doc_subsubsection"><a name="forast">AST Extensions for
-the 'for' Loop</a></div>
-<!-- ======================================================================= -->
-
-<div class="doc_text">
-
-<p>The AST node is just as simple. It basically boils down to capturing
-the variable name and the constituent expressions in the node.</p>
-
-<div class="doc_code">
-<pre>
-/// ForExprAST - Expression class for for/in.
-class ForExprAST : public ExprAST {
- std::string VarName;
- ExprAST *Start, *End, *Step, *Body;
-public:
- ForExprAST(const std::string &amp;varname, ExprAST *start, ExprAST *end,
- ExprAST *step, ExprAST *body)
- : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
- virtual Value *Codegen();
-};
-</pre>
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<div class="doc_subsubsection"><a name="forparser">Parser Extensions for
-the 'for' Loop</a></div>
-<!-- ======================================================================= -->
-
-<div class="doc_text">
-
-<p>The parser code is also fairly standard. The only interesting thing here is
-handling of the optional step value. The parser code handles it by checking to
-see if the second comma is present. If not, it sets the step value to null in
-the AST node:</p>
-
-<div class="doc_code">
-<pre>
-/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
-static ExprAST *ParseForExpr() {
- getNextToken(); // eat the for.
-
- if (CurTok != tok_identifier)
- return Error("expected identifier after for");
-
- std::string IdName = IdentifierStr;
- getNextToken(); // eat identifier.
-
- if (CurTok != '=')
- return Error("expected '=' after for");
- getNextToken(); // eat '='.
-
-
- ExprAST *Start = ParseExpression();
- if (Start == 0) return 0;
- if (CurTok != ',')
- return Error("expected ',' after for start value");
- getNextToken();
-
- ExprAST *End = ParseExpression();
- if (End == 0) return 0;
-
- // The step value is optional.
- ExprAST *Step = 0;
- if (CurTok == ',') {
- getNextToken();
- Step = ParseExpression();
- if (Step == 0) return 0;
- }
-
- if (CurTok != tok_in)
- return Error("expected 'in' after for");
- getNextToken(); // eat 'in'.
-
- ExprAST *Body = ParseExpression();
- if (Body == 0) return 0;
-
- return new ForExprAST(IdName, Start, End, Step, Body);
-}
-</pre>
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<div class="doc_subsubsection"><a name="forir">LLVM IR for
-the 'for' Loop</a></div>
-<!-- ======================================================================= -->
-
-<div class="doc_text">
-
-<p>Now we get to the good part: the LLVM IR we want to generate for this thing.
-With the simple example above, we get this LLVM IR (note that this dump is
-generated with optimizations disabled for clarity):
-</p>
-
-<div class="doc_code">
-<pre>
-declare double @putchard(double)
-
-define double @printstar(double %n) {
-entry:
- ; initial value = 1.0 (inlined into phi)
- br label %loop
-
-loop: ; preds = %loop, %entry
- %i = phi double [ 1.000000e+00, %entry ], [ %nextvar, %loop ]
- ; body
- %calltmp = call double @putchard( double 4.200000e+01 )
- ; increment
- %nextvar = fadd double %i, 1.000000e+00
-
- ; termination test
- %cmptmp = fcmp ult double %i, %n
- %booltmp = uitofp i1 %cmptmp to double
- %loopcond = fcmp one double %booltmp, 0.000000e+00
- br i1 %loopcond, label %loop, label %afterloop
-
-afterloop: ; preds = %loop
- ; loop always returns 0.0
- ret double 0.000000e+00
-}
-</pre>
-</div>
-
-<p>This loop contains all the same constructs we saw before: a phi node, several
-expressions, and some basic blocks. Lets see how this fits together.</p>
-
-</div>
-
-<!-- ======================================================================= -->
-<div class="doc_subsubsection"><a name="forcodegen">Code Generation for
-the 'for' Loop</a></div>
-<!-- ======================================================================= -->
-
-<div class="doc_text">
-
-<p>The first part of Codegen is very simple: we just output the start expression
-for the loop value:</p>
-
-<div class="doc_code">
-<pre>
-Value *ForExprAST::Codegen() {
- // Emit the start code first, without 'variable' in scope.
- Value *StartVal = Start-&gt;Codegen();
- if (StartVal == 0) return 0;
-</pre>
-</div>
-
-<p>With this out of the way, the next step is to set up the LLVM basic block
-for the start of the loop body. In the case above, the whole loop body is one
-block, but remember that the body code itself could consist of multiple blocks
-(e.g. if it contains an if/then/else or a for/in expression).</p>
-
-<div class="doc_code">
-<pre>
- // Make the new basic block for the loop header, inserting after current
- // block.
- Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
- BasicBlock *PreheaderBB = Builder.GetInsertBlock();
- BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction);
-
- // Insert an explicit fall through from the current block to the LoopBB.
- Builder.CreateBr(LoopBB);
-</pre>
-</div>
-
-<p>This code is similar to what we saw for if/then/else. Because we will need
-it to create the Phi node, we remember the block that falls through into the
-loop. Once we have that, we create the actual block that starts the loop and
-create an unconditional branch for the fall-through between the two blocks.</p>
-
-<div class="doc_code">
-<pre>
- // Start insertion in LoopBB.
- Builder.SetInsertPoint(LoopBB);
-
- // Start the PHI node with an entry for Start.
- PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), VarName.c_str());
- Variable-&gt;addIncoming(StartVal, PreheaderBB);
-</pre>
-</div>
-
-<p>Now that the "preheader" for the loop is set up, we switch to emitting code
-for the loop body. To begin with, we move the insertion point and create the
-PHI node for the loop induction variable. Since we already know the incoming
-value for the starting value, we add it to the Phi node. Note that the Phi will
-eventually get a second value for the backedge, but we can't set it up yet
-(because it doesn't exist!).</p>
-
-<div class="doc_code">
-<pre>
- // Within the loop, the variable is defined equal to the PHI node. If it
- // shadows an existing variable, we have to restore it, so save it now.
- Value *OldVal = NamedValues[VarName];
- NamedValues[VarName] = Variable;
-
- // Emit the body of the loop. This, like any other expr, can change the
- // current BB. Note that we ignore the value computed by the body, but don't
- // allow an error.
- if (Body-&gt;Codegen() == 0)
- return 0;
-</pre>
-</div>
-
-<p>Now the code starts to get more interesting. Our 'for' loop introduces a new
-variable to the symbol table. This means that our symbol table can now contain
-either function arguments or loop variables. To handle this, before we codegen
-the body of the loop, we add the loop variable as the current value for its
-name. Note that it is possible that there is a variable of the same name in the
-outer scope. It would be easy to make this an error (emit an error and return
-null if there is already an entry for VarName) but we choose to allow shadowing
-of variables. In order to handle this correctly, we remember the Value that
-we are potentially shadowing in <tt>OldVal</tt> (which will be null if there is
-no shadowed variable).</p>
-
-<p>Once the loop variable is set into the symbol table, the code recursively
-codegen's the body. This allows the body to use the loop variable: any
-references to it will naturally find it in the symbol table.</p>
-
-<div class="doc_code">
-<pre>
- // Emit the step value.
- Value *StepVal;
- if (Step) {
- StepVal = Step-&gt;Codegen();
- if (StepVal == 0) return 0;
- } else {
- // If not specified, use 1.0.
- StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0));
- }
-
- Value *NextVar = Builder.CreateAdd(Variable, StepVal, "nextvar");
-</pre>
-</div>
-
-<p>Now that the body is emitted, we compute the next value of the iteration
-variable by adding the step value, or 1.0 if it isn't present. '<tt>NextVar</tt>'
-will be the value of the loop variable on the next iteration of the loop.</p>
-
-<div class="doc_code">
-<pre>
- // Compute the end condition.
- Value *EndCond = End-&gt;Codegen();
- if (EndCond == 0) return EndCond;
-
- // Convert condition to a bool by comparing equal to 0.0.
- EndCond = Builder.CreateFCmpONE(EndCond,
- ConstantFP::get(getGlobalContext(), APFloat(0.0)),
- "loopcond");
-</pre>
-</div>
-
-<p>Finally, we evaluate the exit value of the loop, to determine whether the
-loop should exit. This mirrors the condition evaluation for the if/then/else
-statement.</p>
-
-<div class="doc_code">
-<pre>
- // Create the "after loop" block and insert it.
- BasicBlock *LoopEndBB = Builder.GetInsertBlock();
- BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction);
-
- // Insert the conditional branch into the end of LoopEndBB.
- Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
-
- // Any new code will be inserted in AfterBB.
- Builder.SetInsertPoint(AfterBB);
-</pre>
-</div>
-
-<p>With the code for the body of the loop complete, we just need to finish up
-the control flow for it. This code remembers the end block (for the phi node), then creates the block for the loop exit ("afterloop"). Based on the value of the
-exit condition, it creates a conditional branch that chooses between executing
-the loop again and exiting the loop. Any future code is emitted in the
-"afterloop" block, so it sets the insertion position to it.</p>
-
-<div class="doc_code">
-<pre>
- // Add a new entry to the PHI node for the backedge.
- Variable-&gt;addIncoming(NextVar, LoopEndBB);
-
- // Restore the unshadowed variable.
- if (OldVal)
- NamedValues[VarName] = OldVal;
- else
- NamedValues.erase(VarName);
-
- // for expr always returns 0.0.
- return Constant::getNullValue(Type::getDoubleTy(getGlobalContext()));
-}
-</pre>
-</div>
-
-<p>The final code handles various cleanups: now that we have the "NextVar"
-value, we can add the incoming value to the loop PHI node. After that, we
-remove the loop variable from the symbol table, so that it isn't in scope after
-the for loop. Finally, code generation of the for loop always returns 0.0, so
-that is what we return from <tt>ForExprAST::Codegen</tt>.</p>
-
-<p>With this, we conclude the "adding control flow to Kaleidoscope" chapter of
-the tutorial. In this chapter we added two control flow constructs, and used them to motivate a couple of aspects of the LLVM IR that are important for front-end implementors
-to know. In the next chapter of our saga, we will get a bit crazier and add
-<a href="LangImpl6.html">user-defined operators</a> to our poor innocent
-language.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="code">Full Code Listing</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>
-Here is the complete code listing for our running example, enhanced with the
-if/then/else and for expressions.. To build this example, use:
-</p>
-
-<div class="doc_code">
-<pre>
- # Compile
- g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
- # Run
- ./toy
-</pre>
-</div>
-
-<p>Here is the code:</p>
-
-<div class="doc_code">
-<pre>
-#include "llvm/DerivedTypes.h"
-#include "llvm/ExecutionEngine/ExecutionEngine.h"
-#include "llvm/ExecutionEngine/JIT.h"
-#include "llvm/LLVMContext.h"
-#include "llvm/Module.h"
-#include "llvm/PassManager.h"
-#include "llvm/Analysis/Verifier.h"
-#include "llvm/Target/TargetData.h"
-#include "llvm/Target/TargetSelect.h"
-#include "llvm/Transforms/Scalar.h"
-#include "llvm/Support/IRBuilder.h"
-#include &lt;cstdio&gt;
-#include &lt;string&gt;
-#include &lt;map&gt;
-#include &lt;vector&gt;
-using namespace llvm;
-
-//===----------------------------------------------------------------------===//
-// Lexer
-//===----------------------------------------------------------------------===//
-
-// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
-// of these for known things.
-enum Token {
- tok_eof = -1,
-
- // commands
- tok_def = -2, tok_extern = -3,
-
- // primary
- tok_identifier = -4, tok_number = -5,
-
- // control
- tok_if = -6, tok_then = -7, tok_else = -8,
- tok_for = -9, tok_in = -10
-};
-
-static std::string IdentifierStr; // Filled in if tok_identifier
-static double NumVal; // Filled in if tok_number
-
-/// gettok - Return the next token from standard input.
-static int gettok() {
- static int LastChar = ' ';
-
- // Skip any whitespace.
- while (isspace(LastChar))
- LastChar = getchar();
-
- if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
- IdentifierStr = LastChar;
- while (isalnum((LastChar = getchar())))
- IdentifierStr += LastChar;
-
- if (IdentifierStr == "def") return tok_def;
- if (IdentifierStr == "extern") return tok_extern;
- if (IdentifierStr == "if") return tok_if;
- if (IdentifierStr == "then") return tok_then;
- if (IdentifierStr == "else") return tok_else;
- if (IdentifierStr == "for") return tok_for;
- if (IdentifierStr == "in") return tok_in;
- return tok_identifier;
- }
-
- if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
- std::string NumStr;
- do {
- NumStr += LastChar;
- LastChar = getchar();
- } while (isdigit(LastChar) || LastChar == '.');
-
- NumVal = strtod(NumStr.c_str(), 0);
- return tok_number;
- }
-
- if (LastChar == '#') {
- // Comment until end of line.
- do LastChar = getchar();
- while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp;&amp; LastChar != '\r');
-
- if (LastChar != EOF)
- return gettok();
- }
-
- // Check for end of file. Don't eat the EOF.
- if (LastChar == EOF)
- return tok_eof;
-
- // Otherwise, just return the character as its ascii value.
- int ThisChar = LastChar;
- LastChar = getchar();
- return ThisChar;
-}
-
-//===----------------------------------------------------------------------===//
-// Abstract Syntax Tree (aka Parse Tree)
-//===----------------------------------------------------------------------===//
-
-/// ExprAST - Base class for all expression nodes.
-class ExprAST {
-public:
- virtual ~ExprAST() {}
- virtual Value *Codegen() = 0;
-};
-
-/// NumberExprAST - Expression class for numeric literals like "1.0".
-class NumberExprAST : public ExprAST {
- double Val;
-public:
- NumberExprAST(double val) : Val(val) {}
- virtual Value *Codegen();
-};
-
-/// VariableExprAST - Expression class for referencing a variable, like "a".
-class VariableExprAST : public ExprAST {
- std::string Name;
-public:
- VariableExprAST(const std::string &amp;name) : Name(name) {}
- virtual Value *Codegen();
-};
-
-/// BinaryExprAST - Expression class for a binary operator.
-class BinaryExprAST : public ExprAST {
- char Op;
- ExprAST *LHS, *RHS;
-public:
- BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
- : Op(op), LHS(lhs), RHS(rhs) {}
- virtual Value *Codegen();
-};
-
-/// CallExprAST - Expression class for function calls.
-class CallExprAST : public ExprAST {
- std::string Callee;
- std::vector&lt;ExprAST*&gt; Args;
-public:
- CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
- : Callee(callee), Args(args) {}
- virtual Value *Codegen();
-};
-
-/// IfExprAST - Expression class for if/then/else.
-class IfExprAST : public ExprAST {
- ExprAST *Cond, *Then, *Else;
-public:
- IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
- : Cond(cond), Then(then), Else(_else) {}
- virtual Value *Codegen();
-};
-
-/// ForExprAST - Expression class for for/in.
-class ForExprAST : public ExprAST {
- std::string VarName;
- ExprAST *Start, *End, *Step, *Body;
-public:
- ForExprAST(const std::string &amp;varname, ExprAST *start, ExprAST *end,
- ExprAST *step, ExprAST *body)
- : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
- virtual Value *Codegen();
-};
-
-/// PrototypeAST - This class represents the "prototype" for a function,
-/// which captures its name, and its argument names (thus implicitly the number
-/// of arguments the function takes).
-class PrototypeAST {
- std::string Name;
- std::vector&lt;std::string&gt; Args;
-public:
- PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args)
- : Name(name), Args(args) {}
-
- Function *Codegen();
-};
-
-/// FunctionAST - This class represents a function definition itself.
-class FunctionAST {
- PrototypeAST *Proto;
- ExprAST *Body;
-public:
- FunctionAST(PrototypeAST *proto, ExprAST *body)
- : Proto(proto), Body(body) {}
-
- Function *Codegen();
-};
-
-//===----------------------------------------------------------------------===//
-// Parser
-//===----------------------------------------------------------------------===//
-
-/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
-/// token the parser is looking at. getNextToken reads another token from the
-/// lexer and updates CurTok with its results.
-static int CurTok;
-static int getNextToken() {
- return CurTok = gettok();
-}
-
-/// BinopPrecedence - This holds the precedence for each binary operator that is
-/// defined.
-static std::map&lt;char, int&gt; BinopPrecedence;
-
-/// GetTokPrecedence - Get the precedence of the pending binary operator token.
-static int GetTokPrecedence() {
- if (!isascii(CurTok))
- return -1;
-
- // Make sure it's a declared binop.
- int TokPrec = BinopPrecedence[CurTok];
- if (TokPrec &lt;= 0) return -1;
- return TokPrec;
-}
-
-/// Error* - These are little helper functions for error handling.
-ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
-PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
-FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
-
-static ExprAST *ParseExpression();
-
-/// identifierexpr
-/// ::= identifier
-/// ::= identifier '(' expression* ')'
-static ExprAST *ParseIdentifierExpr() {
- std::string IdName = IdentifierStr;
-
- getNextToken(); // eat identifier.
-
- if (CurTok != '(') // Simple variable ref.
- return new VariableExprAST(IdName);
-
- // Call.
- getNextToken(); // eat (
- std::vector&lt;ExprAST*&gt; Args;
- if (CurTok != ')') {
- while (1) {
- ExprAST *Arg = ParseExpression();
- if (!Arg) return 0;
- Args.push_back(Arg);
-
- if (CurTok == ')') break;
-
- if (CurTok != ',')
- return Error("Expected ')' or ',' in argument list");
- getNextToken();
- }
- }
-
- // Eat the ')'.
- getNextToken();
-
- return new CallExprAST(IdName, Args);
-}
-
-/// numberexpr ::= number
-static ExprAST *ParseNumberExpr() {
- ExprAST *Result = new NumberExprAST(NumVal);
- getNextToken(); // consume the number
- return Result;
-}
-
-/// parenexpr ::= '(' expression ')'
-static ExprAST *ParseParenExpr() {
- getNextToken(); // eat (.
- ExprAST *V = ParseExpression();
- if (!V) return 0;
-
- if (CurTok != ')')
- return Error("expected ')'");
- getNextToken(); // eat ).
- return V;
-}
-
-/// ifexpr ::= 'if' expression 'then' expression 'else' expression
-static ExprAST *ParseIfExpr() {
- getNextToken(); // eat the if.
-
- // condition.
- ExprAST *Cond = ParseExpression();
- if (!Cond) return 0;
-
- if (CurTok != tok_then)
- return Error("expected then");
- getNextToken(); // eat the then
-
- ExprAST *Then = ParseExpression();
- if (Then == 0) return 0;
-
- if (CurTok != tok_else)
- return Error("expected else");
-
- getNextToken();
-
- ExprAST *Else = ParseExpression();
- if (!Else) return 0;
-
- return new IfExprAST(Cond, Then, Else);
-}
-
-/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
-static ExprAST *ParseForExpr() {
- getNextToken(); // eat the for.
-
- if (CurTok != tok_identifier)
- return Error("expected identifier after for");
-
- std::string IdName = IdentifierStr;
- getNextToken(); // eat identifier.
-
- if (CurTok != '=')
- return Error("expected '=' after for");
- getNextToken(); // eat '='.
-
-
- ExprAST *Start = ParseExpression();
- if (Start == 0) return 0;
- if (CurTok != ',')
- return Error("expected ',' after for start value");
- getNextToken();
-
- ExprAST *End = ParseExpression();
- if (End == 0) return 0;
-
- // The step value is optional.
- ExprAST *Step = 0;
- if (CurTok == ',') {
- getNextToken();
- Step = ParseExpression();
- if (Step == 0) return 0;
- }
-
- if (CurTok != tok_in)
- return Error("expected 'in' after for");
- getNextToken(); // eat 'in'.
-
- ExprAST *Body = ParseExpression();
- if (Body == 0) return 0;
-
- return new ForExprAST(IdName, Start, End, Step, Body);
-}
-
-/// primary
-/// ::= identifierexpr
-/// ::= numberexpr
-/// ::= parenexpr
-/// ::= ifexpr
-/// ::= forexpr
-static ExprAST *ParsePrimary() {
- switch (CurTok) {
- default: return Error("unknown token when expecting an expression");
- case tok_identifier: return ParseIdentifierExpr();
- case tok_number: return ParseNumberExpr();
- case '(': return ParseParenExpr();
- case tok_if: return ParseIfExpr();
- case tok_for: return ParseForExpr();
- }
-}
-
-/// binoprhs
-/// ::= ('+' primary)*
-static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
- // If this is a binop, find its precedence.
- while (1) {
- int TokPrec = GetTokPrecedence();
-
- // If this is a binop that binds at least as tightly as the current binop,
- // consume it, otherwise we are done.
- if (TokPrec &lt; ExprPrec)
- return LHS;
-
- // Okay, we know this is a binop.
- int BinOp = CurTok;
- getNextToken(); // eat binop
-
- // Parse the primary expression after the binary operator.
- ExprAST *RHS = ParsePrimary();
- if (!RHS) return 0;
-
- // If BinOp binds less tightly with RHS than the operator after RHS, let
- // the pending operator take RHS as its LHS.
- int NextPrec = GetTokPrecedence();
- if (TokPrec &lt; NextPrec) {
- RHS = ParseBinOpRHS(TokPrec+1, RHS);
- if (RHS == 0) return 0;
- }
-
- // Merge LHS/RHS.
- LHS = new BinaryExprAST(BinOp, LHS, RHS);
- }
-}
-
-/// expression
-/// ::= primary binoprhs
-///
-static ExprAST *ParseExpression() {
- ExprAST *LHS = ParsePrimary();
- if (!LHS) return 0;
-
- return ParseBinOpRHS(0, LHS);
-}
-
-/// prototype
-/// ::= id '(' id* ')'
-static PrototypeAST *ParsePrototype() {
- if (CurTok != tok_identifier)
- return ErrorP("Expected function name in prototype");
-
- std::string FnName = IdentifierStr;
- getNextToken();
-
- if (CurTok != '(')
- return ErrorP("Expected '(' in prototype");
-
- std::vector&lt;std::string&gt; ArgNames;
- while (getNextToken() == tok_identifier)
- ArgNames.push_back(IdentifierStr);
- if (CurTok != ')')
- return ErrorP("Expected ')' in prototype");
-
- // success.
- getNextToken(); // eat ')'.
-
- return new PrototypeAST(FnName, ArgNames);
-}
-
-/// definition ::= 'def' prototype expression
-static FunctionAST *ParseDefinition() {
- getNextToken(); // eat def.
- PrototypeAST *Proto = ParsePrototype();
- if (Proto == 0) return 0;
-
- if (ExprAST *E = ParseExpression())
- return new FunctionAST(Proto, E);
- return 0;
-}
-
-/// toplevelexpr ::= expression
-static FunctionAST *ParseTopLevelExpr() {
- if (ExprAST *E = ParseExpression()) {
- // Make an anonymous proto.
- PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
- return new FunctionAST(Proto, E);
- }
- return 0;
-}
-
-/// external ::= 'extern' prototype
-static PrototypeAST *ParseExtern() {
- getNextToken(); // eat extern.
- return ParsePrototype();
-}
-
-//===----------------------------------------------------------------------===//
-// Code Generation
-//===----------------------------------------------------------------------===//
-
-static Module *TheModule;
-static IRBuilder&lt;&gt; Builder(getGlobalContext());
-static std::map&lt;std::string, Value*&gt; NamedValues;
-static FunctionPassManager *TheFPM;
-
-Value *ErrorV(const char *Str) { Error(Str); return 0; }
-
-Value *NumberExprAST::Codegen() {
- return ConstantFP::get(getGlobalContext(), APFloat(Val));
-}
-
-Value *VariableExprAST::Codegen() {
- // Look this variable up in the function.
- Value *V = NamedValues[Name];
- return V ? V : ErrorV("Unknown variable name");
-}
-
-Value *BinaryExprAST::Codegen() {
- Value *L = LHS-&gt;Codegen();
- Value *R = RHS-&gt;Codegen();
- if (L == 0 || R == 0) return 0;
-
- switch (Op) {
- case '+': return Builder.CreateAdd(L, R, "addtmp");
- case '-': return Builder.CreateSub(L, R, "subtmp");
- case '*': return Builder.CreateMul(L, R, "multmp");
- case '&lt;':
- L = Builder.CreateFCmpULT(L, R, "cmptmp");
- // Convert bool 0/1 to double 0.0 or 1.0
- return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
- "booltmp");
- default: return ErrorV("invalid binary operator");
- }
-}
-
-Value *CallExprAST::Codegen() {
- // Look up the name in the global module table.
- Function *CalleeF = TheModule-&gt;getFunction(Callee);
- if (CalleeF == 0)
- return ErrorV("Unknown function referenced");
-
- // If argument mismatch error.
- if (CalleeF-&gt;arg_size() != Args.size())
- return ErrorV("Incorrect # arguments passed");
-
- std::vector&lt;Value*&gt; ArgsV;
- for (unsigned i = 0, e = Args.size(); i != e; ++i) {
- ArgsV.push_back(Args[i]-&gt;Codegen());
- if (ArgsV.back() == 0) return 0;
- }
-
- return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
-}
-
-Value *IfExprAST::Codegen() {
- Value *CondV = Cond-&gt;Codegen();
- if (CondV == 0) return 0;
-
- // Convert condition to a bool by comparing equal to 0.0.
- CondV = Builder.CreateFCmpONE(CondV,
- ConstantFP::get(getGlobalContext(), APFloat(0.0)),
- "ifcond");
-
- Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
-
- // Create blocks for the then and else cases. Insert the 'then' block at the
- // end of the function.
- BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction);
- BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else");
- BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont");
-
- Builder.CreateCondBr(CondV, ThenBB, ElseBB);
-
- // Emit then value.
- Builder.SetInsertPoint(ThenBB);
-
- Value *ThenV = Then-&gt;Codegen();
- if (ThenV == 0) return 0;
-
- Builder.CreateBr(MergeBB);
- // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
- ThenBB = Builder.GetInsertBlock();
-
- // Emit else block.
- TheFunction-&gt;getBasicBlockList().push_back(ElseBB);
- Builder.SetInsertPoint(ElseBB);
-
- Value *ElseV = Else-&gt;Codegen();
- if (ElseV == 0) return 0;
-
- Builder.CreateBr(MergeBB);
- // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
- ElseBB = Builder.GetInsertBlock();
-
- // Emit merge block.
- TheFunction-&gt;getBasicBlockList().push_back(MergeBB);
- Builder.SetInsertPoint(MergeBB);
- PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()),
- "iftmp");
-
- PN-&gt;addIncoming(ThenV, ThenBB);
- PN-&gt;addIncoming(ElseV, ElseBB);
- return PN;
-}
-
-Value *ForExprAST::Codegen() {
- // Output this as:
- // ...
- // start = startexpr
- // goto loop
- // loop:
- // variable = phi [start, loopheader], [nextvariable, loopend]
- // ...
- // bodyexpr
- // ...
- // loopend:
- // step = stepexpr
- // nextvariable = variable + step
- // endcond = endexpr
- // br endcond, loop, endloop
- // outloop:
-
- // Emit the start code first, without 'variable' in scope.
- Value *StartVal = Start-&gt;Codegen();
- if (StartVal == 0) return 0;
-
- // Make the new basic block for the loop header, inserting after current
- // block.
- Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
- BasicBlock *PreheaderBB = Builder.GetInsertBlock();
- BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction);
-
- // Insert an explicit fall through from the current block to the LoopBB.
- Builder.CreateBr(LoopBB);
-
- // Start insertion in LoopBB.
- Builder.SetInsertPoint(LoopBB);
-
- // Start the PHI node with an entry for Start.
- PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), VarName.c_str());
- Variable-&gt;addIncoming(StartVal, PreheaderBB);
-
- // Within the loop, the variable is defined equal to the PHI node. If it
- // shadows an existing variable, we have to restore it, so save it now.
- Value *OldVal = NamedValues[VarName];
- NamedValues[VarName] = Variable;
-
- // Emit the body of the loop. This, like any other expr, can change the
- // current BB. Note that we ignore the value computed by the body, but don't
- // allow an error.
- if (Body-&gt;Codegen() == 0)
- return 0;
-
- // Emit the step value.
- Value *StepVal;
- if (Step) {
- StepVal = Step-&gt;Codegen();
- if (StepVal == 0) return 0;
- } else {
- // If not specified, use 1.0.
- StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0));
- }
-
- Value *NextVar = Builder.CreateAdd(Variable, StepVal, "nextvar");
-
- // Compute the end condition.
- Value *EndCond = End-&gt;Codegen();
- if (EndCond == 0) return EndCond;
-
- // Convert condition to a bool by comparing equal to 0.0.
- EndCond = Builder.CreateFCmpONE(EndCond,
- ConstantFP::get(getGlobalContext(), APFloat(0.0)),
- "loopcond");
-
- // Create the "after loop" block and insert it.
- BasicBlock *LoopEndBB = Builder.GetInsertBlock();
- BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction);
-
- // Insert the conditional branch into the end of LoopEndBB.
- Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
-
- // Any new code will be inserted in AfterBB.
- Builder.SetInsertPoint(AfterBB);
-
- // Add a new entry to the PHI node for the backedge.
- Variable-&gt;addIncoming(NextVar, LoopEndBB);
-
- // Restore the unshadowed variable.
- if (OldVal)
- NamedValues[VarName] = OldVal;
- else
- NamedValues.erase(VarName);
-
-
- // for expr always returns 0.0.
- return Constant::getNullValue(Type::getDoubleTy(getGlobalContext()));
-}
-
-Function *PrototypeAST::Codegen() {
- // Make the function type: double(double,double) etc.
- std::vector&lt;const Type*&gt; Doubles(Args.size(),
- Type::getDoubleTy(getGlobalContext()));
- FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()),
- Doubles, false);
-
- Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule);
-
- // If F conflicted, there was already something named 'Name'. If it has a
- // body, don't allow redefinition or reextern.
- if (F-&gt;getName() != Name) {
- // Delete the one we just made and get the existing one.
- F-&gt;eraseFromParent();
- F = TheModule-&gt;getFunction(Name);
-
- // If F already has a body, reject this.
- if (!F-&gt;empty()) {
- ErrorF("redefinition of function");
- return 0;
- }
-
- // If F took a different number of args, reject.
- if (F-&gt;arg_size() != Args.size()) {
- ErrorF("redefinition of function with different # args");
- return 0;
- }
- }
-
- // Set names for all arguments.
- unsigned Idx = 0;
- for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
- ++AI, ++Idx) {
- AI-&gt;setName(Args[Idx]);
-
- // Add arguments to variable symbol table.
- NamedValues[Args[Idx]] = AI;
- }
-
- return F;
-}
-
-Function *FunctionAST::Codegen() {
- NamedValues.clear();
-
- Function *TheFunction = Proto-&gt;Codegen();
- if (TheFunction == 0)
- return 0;
-
- // Create a new basic block to start insertion into.
- BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
- Builder.SetInsertPoint(BB);
-
- if (Value *RetVal = Body-&gt;Codegen()) {
- // Finish off the function.
- Builder.CreateRet(RetVal);
-
- // Validate the generated code, checking for consistency.
- verifyFunction(*TheFunction);
-
- // Optimize the function.
- TheFPM-&gt;run(*TheFunction);
-
- return TheFunction;
- }
-
- // Error reading body, remove function.
- TheFunction-&gt;eraseFromParent();
- return 0;
-}
-
-//===----------------------------------------------------------------------===//
-// Top-Level parsing and JIT Driver
-//===----------------------------------------------------------------------===//
-
-static ExecutionEngine *TheExecutionEngine;
-
-static void HandleDefinition() {
- if (FunctionAST *F = ParseDefinition()) {
- if (Function *LF = F-&gt;Codegen()) {
- fprintf(stderr, "Read function definition:");
- LF-&gt;dump();
- }
- } else {
- // Skip token for error recovery.
- getNextToken();
- }
-}
-
-static void HandleExtern() {
- if (PrototypeAST *P = ParseExtern()) {
- if (Function *F = P-&gt;Codegen()) {
- fprintf(stderr, "Read extern: ");
- F-&gt;dump();
- }
- } else {
- // Skip token for error recovery.
- getNextToken();
- }
-}
-
-static void HandleTopLevelExpression() {
- // Evaluate a top-level expression into an anonymous function.
- if (FunctionAST *F = ParseTopLevelExpr()) {
- if (Function *LF = F-&gt;Codegen()) {
- // JIT the function, returning a function pointer.
- void *FPtr = TheExecutionEngine-&gt;getPointerToFunction(LF);
-
- // Cast it to the right type (takes no arguments, returns a double) so we
- // can call it as a native function.
- double (*FP)() = (double (*)())(intptr_t)FPtr;
- fprintf(stderr, "Evaluated to %f\n", FP());
- }
- } else {
- // Skip token for error recovery.
- getNextToken();
- }
-}
-
-/// top ::= definition | external | expression | ';'
-static void MainLoop() {
- while (1) {
- fprintf(stderr, "ready&gt; ");
- switch (CurTok) {
- case tok_eof: return;
- case ';': getNextToken(); break; // ignore top-level semicolons.
- case tok_def: HandleDefinition(); break;
- case tok_extern: HandleExtern(); break;
- default: HandleTopLevelExpression(); break;
- }
- }
-}
-
-//===----------------------------------------------------------------------===//
-// "Library" functions that can be "extern'd" from user code.
-//===----------------------------------------------------------------------===//
-
-/// putchard - putchar that takes a double and returns 0.
-extern "C"
-double putchard(double X) {
- putchar((char)X);
- return 0;
-}
-
-//===----------------------------------------------------------------------===//
-// Main driver code.
-//===----------------------------------------------------------------------===//
-
-int main() {
- InitializeNativeTarget();
- LLVMContext &amp;Context = getGlobalContext();
-
- // Install standard binary operators.
- // 1 is lowest precedence.
- BinopPrecedence['&lt;'] = 10;
- BinopPrecedence['+'] = 20;
- BinopPrecedence['-'] = 20;
- BinopPrecedence['*'] = 40; // highest.
-
- // Prime the first token.
- fprintf(stderr, "ready&gt; ");
- getNextToken();
-
- // Make the module, which holds all the code.
- TheModule = new Module("my cool jit", Context);
-
- // Create the JIT. This takes ownership of the module.
- std::string ErrStr;
- TheExecutionEngine = EngineBuilder(TheModule).setErrorStr(&ErrStr).create();
- if (!TheExecutionEngine) {
- fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str());
- exit(1);
- }
-
- FunctionPassManager OurFPM(TheModule);
-
- // Set up the optimizer pipeline. Start with registering info about how the
- // target lays out data structures.
- OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
- // Do simple "peephole" optimizations and bit-twiddling optzns.
- OurFPM.add(createInstructionCombiningPass());
- // Reassociate expressions.
- OurFPM.add(createReassociatePass());
- // Eliminate Common SubExpressions.
- OurFPM.add(createGVNPass());
- // Simplify the control flow graph (deleting unreachable blocks, etc).
- OurFPM.add(createCFGSimplificationPass());
-
- OurFPM.doInitialization();
-
- // Set the global so the code gen can use this.
- TheFPM = &amp;OurFPM;
-
- // Run the main "interpreter loop" now.
- MainLoop();
-
- TheFPM = 0;
-
- // Print out all of the generated code.
- TheModule-&gt;dump();
-
- return 0;
-}
-</pre>
-</div>
-
-<a href="LangImpl6.html">Next: Extending the language: user-defined operators</a>
-</div>
-
-<!-- *********************************************************************** -->
-<hr>
-<address>
- <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
- src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
- <a href="http://validator.w3.org/check/referer"><img
- src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
-
- <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
- <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
- Last modified: $Date$
-</address>
-</body>
-</html>
diff --git a/docs/tutorial/LangImpl6.html b/docs/tutorial/LangImpl6.html
deleted file mode 100644
index 5fae906..0000000
--- a/docs/tutorial/LangImpl6.html
+++ /dev/null
@@ -1,1814 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
- "http://www.w3.org/TR/html4/strict.dtd">
-
-<html>
-<head>
- <title>Kaleidoscope: Extending the Language: User-defined Operators</title>
- <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
- <meta name="author" content="Chris Lattner">
- <link rel="stylesheet" href="../llvm.css" type="text/css">
-</head>
-
-<body>
-
-<div class="doc_title">Kaleidoscope: Extending the Language: User-defined Operators</div>
-
-<ul>
-<li><a href="index.html">Up to Tutorial Index</a></li>
-<li>Chapter 6
- <ol>
- <li><a href="#intro">Chapter 6 Introduction</a></li>
- <li><a href="#idea">User-defined Operators: the Idea</a></li>
- <li><a href="#binary">User-defined Binary Operators</a></li>
- <li><a href="#unary">User-defined Unary Operators</a></li>
- <li><a href="#example">Kicking the Tires</a></li>
- <li><a href="#code">Full Code Listing</a></li>
- </ol>
-</li>
-<li><a href="LangImpl7.html">Chapter 7</a>: Extending the Language: Mutable
-Variables / SSA Construction</li>
-</ul>
-
-<div class="doc_author">
- <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="intro">Chapter 6 Introduction</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Welcome to Chapter 6 of the "<a href="index.html">Implementing a language
-with LLVM</a>" tutorial. At this point in our tutorial, we now have a fully
-functional language that is fairly minimal, but also useful. There
-is still one big problem with it, however. Our language doesn't have many
-useful operators (like division, logical negation, or even any comparisons
-besides less-than).</p>
-
-<p>This chapter of the tutorial takes a wild digression into adding user-defined
-operators to the simple and beautiful Kaleidoscope language. This digression now gives
-us a simple and ugly language in some ways, but also a powerful one at the same time.
-One of the great things about creating your own language is that you get to
-decide what is good or bad. In this tutorial we'll assume that it is okay to
-use this as a way to show some interesting parsing techniques.</p>
-
-<p>At the end of this tutorial, we'll run through an example Kaleidoscope
-application that <a href="#example">renders the Mandelbrot set</a>. This gives
-an example of what you can build with Kaleidoscope and its feature set.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="idea">User-defined Operators: the Idea</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>
-The "operator overloading" that we will add to Kaleidoscope is more general than
-languages like C++. In C++, you are only allowed to redefine existing
-operators: you can't programatically change the grammar, introduce new
-operators, change precedence levels, etc. In this chapter, we will add this
-capability to Kaleidoscope, which will let the user round out the set of
-operators that are supported.</p>
-
-<p>The point of going into user-defined operators in a tutorial like this is to
-show the power and flexibility of using a hand-written parser. Thus far, the parser
-we have been implementing uses recursive descent for most parts of the grammar and
-operator precedence parsing for the expressions. See <a
-href="LangImpl2.html">Chapter 2</a> for details. Without using operator
-precedence parsing, it would be very difficult to allow the programmer to
-introduce new operators into the grammar: the grammar is dynamically extensible
-as the JIT runs.</p>
-
-<p>The two specific features we'll add are programmable unary operators (right
-now, Kaleidoscope has no unary operators at all) as well as binary operators.
-An example of this is:</p>
-
-<div class="doc_code">
-<pre>
-# Logical unary not.
-def unary!(v)
- if v then
- 0
- else
- 1;
-
-# Define &gt; with the same precedence as &lt;.
-def binary&gt; 10 (LHS RHS)
- RHS &lt; LHS;
-
-# Binary "logical or", (note that it does not "short circuit")
-def binary| 5 (LHS RHS)
- if LHS then
- 1
- else if RHS then
- 1
- else
- 0;
-
-# Define = with slightly lower precedence than relationals.
-def binary= 9 (LHS RHS)
- !(LHS &lt; RHS | LHS &gt; RHS);
-</pre>
-</div>
-
-<p>Many languages aspire to being able to implement their standard runtime
-library in the language itself. In Kaleidoscope, we can implement significant
-parts of the language in the library!</p>
-
-<p>We will break down implementation of these features into two parts:
-implementing support for user-defined binary operators and adding unary
-operators.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="binary">User-defined Binary Operators</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Adding support for user-defined binary operators is pretty simple with our
-current framework. We'll first add support for the unary/binary keywords:</p>
-
-<div class="doc_code">
-<pre>
-enum Token {
- ...
- <b>// operators
- tok_binary = -11, tok_unary = -12</b>
-};
-...
-static int gettok() {
-...
- if (IdentifierStr == "for") return tok_for;
- if (IdentifierStr == "in") return tok_in;
- <b>if (IdentifierStr == "binary") return tok_binary;
- if (IdentifierStr == "unary") return tok_unary;</b>
- return tok_identifier;
-</pre>
-</div>
-
-<p>This just adds lexer support for the unary and binary keywords, like we
-did in <a href="LangImpl5.html#iflexer">previous chapters</a>. One nice thing
-about our current AST, is that we represent binary operators with full generalisation
-by using their ASCII code as the opcode. For our extended operators, we'll use this
-same representation, so we don't need any new AST or parser support.</p>
-
-<p>On the other hand, we have to be able to represent the definitions of these
-new operators, in the "def binary| 5" part of the function definition. In our
-grammar so far, the "name" for the function definition is parsed as the
-"prototype" production and into the <tt>PrototypeAST</tt> AST node. To
-represent our new user-defined operators as prototypes, we have to extend
-the <tt>PrototypeAST</tt> AST node like this:</p>
-
-<div class="doc_code">
-<pre>
-/// PrototypeAST - This class represents the "prototype" for a function,
-/// which captures its argument names as well as if it is an operator.
-class PrototypeAST {
- std::string Name;
- std::vector&lt;std::string&gt; Args;
- <b>bool isOperator;
- unsigned Precedence; // Precedence if a binary op.</b>
-public:
- PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args,
- <b>bool isoperator = false, unsigned prec = 0</b>)
- : Name(name), Args(args), <b>isOperator(isoperator), Precedence(prec)</b> {}
-
- <b>bool isUnaryOp() const { return isOperator &amp;&amp; Args.size() == 1; }
- bool isBinaryOp() const { return isOperator &amp;&amp; Args.size() == 2; }
-
- char getOperatorName() const {
- assert(isUnaryOp() || isBinaryOp());
- return Name[Name.size()-1];
- }
-
- unsigned getBinaryPrecedence() const { return Precedence; }</b>
-
- Function *Codegen();
-};
-</pre>
-</div>
-
-<p>Basically, in addition to knowing a name for the prototype, we now keep track
-of whether it was an operator, and if it was, what precedence level the operator
-is at. The precedence is only used for binary operators (as you'll see below,
-it just doesn't apply for unary operators). Now that we have a way to represent
-the prototype for a user-defined operator, we need to parse it:</p>
-
-<div class="doc_code">
-<pre>
-/// prototype
-/// ::= id '(' id* ')'
-<b>/// ::= binary LETTER number? (id, id)</b>
-static PrototypeAST *ParsePrototype() {
- std::string FnName;
-
- <b>unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
- unsigned BinaryPrecedence = 30;</b>
-
- switch (CurTok) {
- default:
- return ErrorP("Expected function name in prototype");
- case tok_identifier:
- FnName = IdentifierStr;
- Kind = 0;
- getNextToken();
- break;
- <b>case tok_binary:
- getNextToken();
- if (!isascii(CurTok))
- return ErrorP("Expected binary operator");
- FnName = "binary";
- FnName += (char)CurTok;
- Kind = 2;
- getNextToken();
-
- // Read the precedence if present.
- if (CurTok == tok_number) {
- if (NumVal &lt; 1 || NumVal &gt; 100)
- return ErrorP("Invalid precedecnce: must be 1..100");
- BinaryPrecedence = (unsigned)NumVal;
- getNextToken();
- }
- break;</b>
- }
-
- if (CurTok != '(')
- return ErrorP("Expected '(' in prototype");
-
- std::vector&lt;std::string&gt; ArgNames;
- while (getNextToken() == tok_identifier)
- ArgNames.push_back(IdentifierStr);
- if (CurTok != ')')
- return ErrorP("Expected ')' in prototype");
-
- // success.
- getNextToken(); // eat ')'.
-
- <b>// Verify right number of names for operator.
- if (Kind &amp;&amp; ArgNames.size() != Kind)
- return ErrorP("Invalid number of operands for operator");
-
- return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence);</b>
-}
-</pre>
-</div>
-
-<p>This is all fairly straightforward parsing code, and we have already seen
-a lot of similar code in the past. One interesting part about the code above is
-the couple lines that set up <tt>FnName</tt> for binary operators. This builds names
-like "binary@" for a newly defined "@" operator. This then takes advantage of the
-fact that symbol names in the LLVM symbol table are allowed to have any character in
-them, including embedded nul characters.</p>
-
-<p>The next interesting thing to add, is codegen support for these binary operators.
-Given our current structure, this is a simple addition of a default case for our
-existing binary operator node:</p>
-
-<div class="doc_code">
-<pre>
-Value *BinaryExprAST::Codegen() {
- Value *L = LHS-&gt;Codegen();
- Value *R = RHS-&gt;Codegen();
- if (L == 0 || R == 0) return 0;
-
- switch (Op) {
- case '+': return Builder.CreateAdd(L, R, "addtmp");
- case '-': return Builder.CreateSub(L, R, "subtmp");
- case '*': return Builder.CreateMul(L, R, "multmp");
- case '&lt;':
- L = Builder.CreateFCmpULT(L, R, "cmptmp");
- // Convert bool 0/1 to double 0.0 or 1.0
- return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
- "booltmp");
- <b>default: break;</b>
- }
-
- <b>// If it wasn't a builtin binary operator, it must be a user defined one. Emit
- // a call to it.
- Function *F = TheModule-&gt;getFunction(std::string("binary")+Op);
- assert(F &amp;&amp; "binary operator not found!");
-
- Value *Ops[] = { L, R };
- return Builder.CreateCall(F, Ops, Ops+2, "binop");</b>
-}
-
-</pre>
-</div>
-
-<p>As you can see above, the new code is actually really simple. It just does
-a lookup for the appropriate operator in the symbol table and generates a
-function call to it. Since user-defined operators are just built as normal
-functions (because the "prototype" boils down to a function with the right
-name) everything falls into place.</p>
-
-<p>The final piece of code we are missing, is a bit of top-level magic:</p>
-
-<div class="doc_code">
-<pre>
-Function *FunctionAST::Codegen() {
- NamedValues.clear();
-
- Function *TheFunction = Proto->Codegen();
- if (TheFunction == 0)
- return 0;
-
- <b>// If this is an operator, install it.
- if (Proto-&gt;isBinaryOp())
- BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence();</b>
-
- // Create a new basic block to start insertion into.
- BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
- Builder.SetInsertPoint(BB);
-
- if (Value *RetVal = Body-&gt;Codegen()) {
- ...
-</pre>
-</div>
-
-<p>Basically, before codegening a function, if it is a user-defined operator, we
-register it in the precedence table. This allows the binary operator parsing
-logic we already have in place to handle it. Since we are working on a fully-general operator precedence parser, this is all we need to do to "extend the grammar".</p>
-
-<p>Now we have useful user-defined binary operators. This builds a lot
-on the previous framework we built for other operators. Adding unary operators
-is a bit more challenging, because we don't have any framework for it yet - lets
-see what it takes.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="unary">User-defined Unary Operators</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Since we don't currently support unary operators in the Kaleidoscope
-language, we'll need to add everything to support them. Above, we added simple
-support for the 'unary' keyword to the lexer. In addition to that, we need an
-AST node:</p>
-
-<div class="doc_code">
-<pre>
-/// UnaryExprAST - Expression class for a unary operator.
-class UnaryExprAST : public ExprAST {
- char Opcode;
- ExprAST *Operand;
-public:
- UnaryExprAST(char opcode, ExprAST *operand)
- : Opcode(opcode), Operand(operand) {}
- virtual Value *Codegen();
-};
-</pre>
-</div>
-
-<p>This AST node is very simple and obvious by now. It directly mirrors the
-binary operator AST node, except that it only has one child. With this, we
-need to add the parsing logic. Parsing a unary operator is pretty simple: we'll
-add a new function to do it:</p>
-
-<div class="doc_code">
-<pre>
-/// unary
-/// ::= primary
-/// ::= '!' unary
-static ExprAST *ParseUnary() {
- // If the current token is not an operator, it must be a primary expr.
- if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
- return ParsePrimary();
-
- // If this is a unary operator, read it.
- int Opc = CurTok;
- getNextToken();
- if (ExprAST *Operand = ParseUnary())
- return new UnaryExprAST(Opc, Operand);
- return 0;
-}
-</pre>
-</div>
-
-<p>The grammar we add is pretty straightforward here. If we see a unary
-operator when parsing a primary operator, we eat the operator as a prefix and
-parse the remaining piece as another unary operator. This allows us to handle
-multiple unary operators (e.g. "!!x"). Note that unary operators can't have
-ambiguous parses like binary operators can, so there is no need for precedence
-information.</p>
-
-<p>The problem with this function, is that we need to call ParseUnary from somewhere.
-To do this, we change previous callers of ParsePrimary to call ParseUnary
-instead:</p>
-
-<div class="doc_code">
-<pre>
-/// binoprhs
-/// ::= ('+' unary)*
-static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
- ...
- <b>// Parse the unary expression after the binary operator.
- ExprAST *RHS = ParseUnary();
- if (!RHS) return 0;</b>
- ...
-}
-/// expression
-/// ::= unary binoprhs
-///
-static ExprAST *ParseExpression() {
- <b>ExprAST *LHS = ParseUnary();</b>
- if (!LHS) return 0;
-
- return ParseBinOpRHS(0, LHS);
-}
-</pre>
-</div>
-
-<p>With these two simple changes, we are now able to parse unary operators and build the
-AST for them. Next up, we need to add parser support for prototypes, to parse
-the unary operator prototype. We extend the binary operator code above
-with:</p>
-
-<div class="doc_code">
-<pre>
-/// prototype
-/// ::= id '(' id* ')'
-/// ::= binary LETTER number? (id, id)
-<b>/// ::= unary LETTER (id)</b>
-static PrototypeAST *ParsePrototype() {
- std::string FnName;
-
- unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
- unsigned BinaryPrecedence = 30;
-
- switch (CurTok) {
- default:
- return ErrorP("Expected function name in prototype");
- case tok_identifier:
- FnName = IdentifierStr;
- Kind = 0;
- getNextToken();
- break;
- <b>case tok_unary:
- getNextToken();
- if (!isascii(CurTok))
- return ErrorP("Expected unary operator");
- FnName = "unary";
- FnName += (char)CurTok;
- Kind = 1;
- getNextToken();
- break;</b>
- case tok_binary:
- ...
-</pre>
-</div>
-
-<p>As with binary operators, we name unary operators with a name that includes
-the operator character. This assists us at code generation time. Speaking of,
-the final piece we need to add is codegen support for unary operators. It looks
-like this:</p>
-
-<div class="doc_code">
-<pre>
-Value *UnaryExprAST::Codegen() {
- Value *OperandV = Operand->Codegen();
- if (OperandV == 0) return 0;
-
- Function *F = TheModule->getFunction(std::string("unary")+Opcode);
- if (F == 0)
- return ErrorV("Unknown unary operator");
-
- return Builder.CreateCall(F, OperandV, "unop");
-}
-</pre>
-</div>
-
-<p>This code is similar to, but simpler than, the code for binary operators. It
-is simpler primarily because it doesn't need to handle any predefined operators.
-</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="example">Kicking the Tires</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>It is somewhat hard to believe, but with a few simple extensions we've
-covered in the last chapters, we have grown a real-ish language. With this, we
-can do a lot of interesting things, including I/O, math, and a bunch of other
-things. For example, we can now add a nice sequencing operator (printd is
-defined to print out the specified value and a newline):</p>
-
-<div class="doc_code">
-<pre>
-ready&gt; <b>extern printd(x);</b>
-Read extern: declare double @printd(double)
-ready&gt; <b>def binary : 1 (x y) 0; # Low-precedence operator that ignores operands.</b>
-..
-ready&gt; <b>printd(123) : printd(456) : printd(789);</b>
-123.000000
-456.000000
-789.000000
-Evaluated to 0.000000
-</pre>
-</div>
-
-<p>We can also define a bunch of other "primitive" operations, such as:</p>
-
-<div class="doc_code">
-<pre>
-# Logical unary not.
-def unary!(v)
- if v then
- 0
- else
- 1;
-
-# Unary negate.
-def unary-(v)
- 0-v;
-
-# Define &gt; with the same precedence as &gt;.
-def binary&gt; 10 (LHS RHS)
- RHS &lt; LHS;
-
-# Binary logical or, which does not short circuit.
-def binary| 5 (LHS RHS)
- if LHS then
- 1
- else if RHS then
- 1
- else
- 0;
-
-# Binary logical and, which does not short circuit.
-def binary&amp; 6 (LHS RHS)
- if !LHS then
- 0
- else
- !!RHS;
-
-# Define = with slightly lower precedence than relationals.
-def binary = 9 (LHS RHS)
- !(LHS &lt; RHS | LHS &gt; RHS);
-
-</pre>
-</div>
-
-
-<p>Given the previous if/then/else support, we can also define interesting
-functions for I/O. For example, the following prints out a character whose
-"density" reflects the value passed in: the lower the value, the denser the
-character:</p>
-
-<div class="doc_code">
-<pre>
-ready&gt;
-<b>
-extern putchard(char)
-def printdensity(d)
- if d &gt; 8 then
- putchard(32) # ' '
- else if d &gt; 4 then
- putchard(46) # '.'
- else if d &gt; 2 then
- putchard(43) # '+'
- else
- putchard(42); # '*'</b>
-...
-ready&gt; <b>printdensity(1): printdensity(2): printdensity(3) :
- printdensity(4): printdensity(5): printdensity(9): putchard(10);</b>
-*++..
-Evaluated to 0.000000
-</pre>
-</div>
-
-<p>Based on these simple primitive operations, we can start to define more
-interesting things. For example, here's a little function that solves for the
-number of iterations it takes a function in the complex plane to
-converge:</p>
-
-<div class="doc_code">
-<pre>
-# determine whether the specific location diverges.
-# Solve for z = z^2 + c in the complex plane.
-def mandleconverger(real imag iters creal cimag)
- if iters &gt; 255 | (real*real + imag*imag &gt; 4) then
- iters
- else
- mandleconverger(real*real - imag*imag + creal,
- 2*real*imag + cimag,
- iters+1, creal, cimag);
-
-# return the number of iterations required for the iteration to escape
-def mandleconverge(real imag)
- mandleconverger(real, imag, 0, real, imag);
-</pre>
-</div>
-
-<p>This "z = z<sup>2</sup> + c" function is a beautiful little creature that is the basis
-for computation of the <a
-href="http://en.wikipedia.org/wiki/Mandelbrot_set">Mandelbrot Set</a>. Our
-<tt>mandelconverge</tt> function returns the number of iterations that it takes
-for a complex orbit to escape, saturating to 255. This is not a very useful
-function by itself, but if you plot its value over a two-dimensional plane,
-you can see the Mandelbrot set. Given that we are limited to using putchard
-here, our amazing graphical output is limited, but we can whip together
-something using the density plotter above:</p>
-
-<div class="doc_code">
-<pre>
-# compute and plot the mandlebrot set with the specified 2 dimensional range
-# info.
-def mandelhelp(xmin xmax xstep ymin ymax ystep)
- for y = ymin, y &lt; ymax, ystep in (
- (for x = xmin, x &lt; xmax, xstep in
- printdensity(mandleconverge(x,y)))
- : putchard(10)
- )
-
-# mandel - This is a convenient helper function for ploting the mandelbrot set
-# from the specified position with the specified Magnification.
-def mandel(realstart imagstart realmag imagmag)
- mandelhelp(realstart, realstart+realmag*78, realmag,
- imagstart, imagstart+imagmag*40, imagmag);
-</pre>
-</div>
-
-<p>Given this, we can try plotting out the mandlebrot set! Lets try it out:</p>
-
-<div class="doc_code">
-<pre>
-ready&gt; <b>mandel(-2.3, -1.3, 0.05, 0.07);</b>
-*******************************+++++++++++*************************************
-*************************+++++++++++++++++++++++*******************************
-**********************+++++++++++++++++++++++++++++****************************
-*******************+++++++++++++++++++++.. ...++++++++*************************
-*****************++++++++++++++++++++++.... ...+++++++++***********************
-***************+++++++++++++++++++++++..... ...+++++++++*********************
-**************+++++++++++++++++++++++.... ....+++++++++********************
-*************++++++++++++++++++++++...... .....++++++++*******************
-************+++++++++++++++++++++....... .......+++++++******************
-***********+++++++++++++++++++.... ... .+++++++*****************
-**********+++++++++++++++++....... .+++++++****************
-*********++++++++++++++........... ...+++++++***************
-********++++++++++++............ ...++++++++**************
-********++++++++++... .......... .++++++++**************
-*******+++++++++..... .+++++++++*************
-*******++++++++...... ..+++++++++*************
-*******++++++....... ..+++++++++*************
-*******+++++...... ..+++++++++*************
-*******.... .... ...+++++++++*************
-*******.... . ...+++++++++*************
-*******+++++...... ...+++++++++*************
-*******++++++....... ..+++++++++*************
-*******++++++++...... .+++++++++*************
-*******+++++++++..... ..+++++++++*************
-********++++++++++... .......... .++++++++**************
-********++++++++++++............ ...++++++++**************
-*********++++++++++++++.......... ...+++++++***************
-**********++++++++++++++++........ .+++++++****************
-**********++++++++++++++++++++.... ... ..+++++++****************
-***********++++++++++++++++++++++....... .......++++++++*****************
-************+++++++++++++++++++++++...... ......++++++++******************
-**************+++++++++++++++++++++++.... ....++++++++********************
-***************+++++++++++++++++++++++..... ...+++++++++*********************
-*****************++++++++++++++++++++++.... ...++++++++***********************
-*******************+++++++++++++++++++++......++++++++*************************
-*********************++++++++++++++++++++++.++++++++***************************
-*************************+++++++++++++++++++++++*******************************
-******************************+++++++++++++************************************
-*******************************************************************************
-*******************************************************************************
-*******************************************************************************
-Evaluated to 0.000000
-ready&gt; <b>mandel(-2, -1, 0.02, 0.04);</b>
-**************************+++++++++++++++++++++++++++++++++++++++++++++++++++++
-***********************++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-*********************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.
-*******************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++...
-*****************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.....
-***************++++++++++++++++++++++++++++++++++++++++++++++++++++++++........
-**************++++++++++++++++++++++++++++++++++++++++++++++++++++++...........
-************+++++++++++++++++++++++++++++++++++++++++++++++++++++..............
-***********++++++++++++++++++++++++++++++++++++++++++++++++++........ .
-**********++++++++++++++++++++++++++++++++++++++++++++++.............
-********+++++++++++++++++++++++++++++++++++++++++++..................
-*******+++++++++++++++++++++++++++++++++++++++.......................
-******+++++++++++++++++++++++++++++++++++...........................
-*****++++++++++++++++++++++++++++++++............................
-*****++++++++++++++++++++++++++++...............................
-****++++++++++++++++++++++++++...... .........................
-***++++++++++++++++++++++++......... ...... ...........
-***++++++++++++++++++++++............
-**+++++++++++++++++++++..............
-**+++++++++++++++++++................
-*++++++++++++++++++.................
-*++++++++++++++++............ ...
-*++++++++++++++..............
-*+++....++++................
-*.......... ...........
-*
-*.......... ...........
-*+++....++++................
-*++++++++++++++..............
-*++++++++++++++++............ ...
-*++++++++++++++++++.................
-**+++++++++++++++++++................
-**+++++++++++++++++++++..............
-***++++++++++++++++++++++............
-***++++++++++++++++++++++++......... ...... ...........
-****++++++++++++++++++++++++++...... .........................
-*****++++++++++++++++++++++++++++...............................
-*****++++++++++++++++++++++++++++++++............................
-******+++++++++++++++++++++++++++++++++++...........................
-*******+++++++++++++++++++++++++++++++++++++++.......................
-********+++++++++++++++++++++++++++++++++++++++++++..................
-Evaluated to 0.000000
-ready&gt; <b>mandel(-0.9, -1.4, 0.02, 0.03);</b>
-*******************************************************************************
-*******************************************************************************
-*******************************************************************************
-**********+++++++++++++++++++++************************************************
-*+++++++++++++++++++++++++++++++++++++++***************************************
-+++++++++++++++++++++++++++++++++++++++++++++**********************************
-++++++++++++++++++++++++++++++++++++++++++++++++++*****************************
-++++++++++++++++++++++++++++++++++++++++++++++++++++++*************************
-+++++++++++++++++++++++++++++++++++++++++++++++++++++++++**********************
-+++++++++++++++++++++++++++++++++.........++++++++++++++++++*******************
-+++++++++++++++++++++++++++++++.... ......+++++++++++++++++++****************
-+++++++++++++++++++++++++++++....... ........+++++++++++++++++++**************
-++++++++++++++++++++++++++++........ ........++++++++++++++++++++************
-+++++++++++++++++++++++++++......... .. ...+++++++++++++++++++++**********
-++++++++++++++++++++++++++........... ....++++++++++++++++++++++********
-++++++++++++++++++++++++............. .......++++++++++++++++++++++******
-+++++++++++++++++++++++............. ........+++++++++++++++++++++++****
-++++++++++++++++++++++........... ..........++++++++++++++++++++++***
-++++++++++++++++++++........... .........++++++++++++++++++++++*
-++++++++++++++++++............ ...........++++++++++++++++++++
-++++++++++++++++............... .............++++++++++++++++++
-++++++++++++++................. ...............++++++++++++++++
-++++++++++++.................. .................++++++++++++++
-+++++++++.................. .................+++++++++++++
-++++++........ . ......... ..++++++++++++
-++............ ...... ....++++++++++
-.............. ...++++++++++
-.............. ....+++++++++
-.............. .....++++++++
-............. ......++++++++
-........... .......++++++++
-......... ........+++++++
-......... ........+++++++
-......... ....+++++++
-........ ...+++++++
-....... ...+++++++
- ....+++++++
- .....+++++++
- ....+++++++
- ....+++++++
- ....+++++++
-Evaluated to 0.000000
-ready&gt; <b>^D</b>
-</pre>
-</div>
-
-<p>At this point, you may be starting to realize that Kaleidoscope is a real
-and powerful language. It may not be self-similar :), but it can be used to
-plot things that are!</p>
-
-<p>With this, we conclude the "adding user-defined operators" chapter of the
-tutorial. We have successfully augmented our language, adding the ability to extend the
-language in the library, and we have shown how this can be used to build a simple but
-interesting end-user application in Kaleidoscope. At this point, Kaleidoscope
-can build a variety of applications that are functional and can call functions
-with side-effects, but it can't actually define and mutate a variable itself.
-</p>
-
-<p>Strikingly, variable mutation is an important feature of some
-languages, and it is not at all obvious how to <a href="LangImpl7.html">add
-support for mutable variables</a> without having to add an "SSA construction"
-phase to your front-end. In the next chapter, we will describe how you can
-add variable mutation without building SSA in your front-end.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="code">Full Code Listing</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>
-Here is the complete code listing for our running example, enhanced with the
-if/then/else and for expressions.. To build this example, use:
-</p>
-
-<div class="doc_code">
-<pre>
- # Compile
- g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
- # Run
- ./toy
-</pre>
-</div>
-
-<p>Here is the code:</p>
-
-<div class="doc_code">
-<pre>
-#include "llvm/DerivedTypes.h"
-#include "llvm/ExecutionEngine/ExecutionEngine.h"
-#include "llvm/ExecutionEngine/JIT.h"
-#include "llvm/LLVMContext.h"
-#include "llvm/Module.h"
-#include "llvm/PassManager.h"
-#include "llvm/Analysis/Verifier.h"
-#include "llvm/Target/TargetData.h"
-#include "llvm/Target/TargetSelect.h"
-#include "llvm/Transforms/Scalar.h"
-#include "llvm/Support/IRBuilder.h"
-#include &lt;cstdio&gt;
-#include &lt;string&gt;
-#include &lt;map&gt;
-#include &lt;vector&gt;
-using namespace llvm;
-
-//===----------------------------------------------------------------------===//
-// Lexer
-//===----------------------------------------------------------------------===//
-
-// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
-// of these for known things.
-enum Token {
- tok_eof = -1,
-
- // commands
- tok_def = -2, tok_extern = -3,
-
- // primary
- tok_identifier = -4, tok_number = -5,
-
- // control
- tok_if = -6, tok_then = -7, tok_else = -8,
- tok_for = -9, tok_in = -10,
-
- // operators
- tok_binary = -11, tok_unary = -12
-};
-
-static std::string IdentifierStr; // Filled in if tok_identifier
-static double NumVal; // Filled in if tok_number
-
-/// gettok - Return the next token from standard input.
-static int gettok() {
- static int LastChar = ' ';
-
- // Skip any whitespace.
- while (isspace(LastChar))
- LastChar = getchar();
-
- if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
- IdentifierStr = LastChar;
- while (isalnum((LastChar = getchar())))
- IdentifierStr += LastChar;
-
- if (IdentifierStr == "def") return tok_def;
- if (IdentifierStr == "extern") return tok_extern;
- if (IdentifierStr == "if") return tok_if;
- if (IdentifierStr == "then") return tok_then;
- if (IdentifierStr == "else") return tok_else;
- if (IdentifierStr == "for") return tok_for;
- if (IdentifierStr == "in") return tok_in;
- if (IdentifierStr == "binary") return tok_binary;
- if (IdentifierStr == "unary") return tok_unary;
- return tok_identifier;
- }
-
- if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
- std::string NumStr;
- do {
- NumStr += LastChar;
- LastChar = getchar();
- } while (isdigit(LastChar) || LastChar == '.');
-
- NumVal = strtod(NumStr.c_str(), 0);
- return tok_number;
- }
-
- if (LastChar == '#') {
- // Comment until end of line.
- do LastChar = getchar();
- while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp;&amp; LastChar != '\r');
-
- if (LastChar != EOF)
- return gettok();
- }
-
- // Check for end of file. Don't eat the EOF.
- if (LastChar == EOF)
- return tok_eof;
-
- // Otherwise, just return the character as its ascii value.
- int ThisChar = LastChar;
- LastChar = getchar();
- return ThisChar;
-}
-
-//===----------------------------------------------------------------------===//
-// Abstract Syntax Tree (aka Parse Tree)
-//===----------------------------------------------------------------------===//
-
-/// ExprAST - Base class for all expression nodes.
-class ExprAST {
-public:
- virtual ~ExprAST() {}
- virtual Value *Codegen() = 0;
-};
-
-/// NumberExprAST - Expression class for numeric literals like "1.0".
-class NumberExprAST : public ExprAST {
- double Val;
-public:
- NumberExprAST(double val) : Val(val) {}
- virtual Value *Codegen();
-};
-
-/// VariableExprAST - Expression class for referencing a variable, like "a".
-class VariableExprAST : public ExprAST {
- std::string Name;
-public:
- VariableExprAST(const std::string &amp;name) : Name(name) {}
- virtual Value *Codegen();
-};
-
-/// UnaryExprAST - Expression class for a unary operator.
-class UnaryExprAST : public ExprAST {
- char Opcode;
- ExprAST *Operand;
-public:
- UnaryExprAST(char opcode, ExprAST *operand)
- : Opcode(opcode), Operand(operand) {}
- virtual Value *Codegen();
-};
-
-/// BinaryExprAST - Expression class for a binary operator.
-class BinaryExprAST : public ExprAST {
- char Op;
- ExprAST *LHS, *RHS;
-public:
- BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
- : Op(op), LHS(lhs), RHS(rhs) {}
- virtual Value *Codegen();
-};
-
-/// CallExprAST - Expression class for function calls.
-class CallExprAST : public ExprAST {
- std::string Callee;
- std::vector&lt;ExprAST*&gt; Args;
-public:
- CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
- : Callee(callee), Args(args) {}
- virtual Value *Codegen();
-};
-
-/// IfExprAST - Expression class for if/then/else.
-class IfExprAST : public ExprAST {
- ExprAST *Cond, *Then, *Else;
-public:
- IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
- : Cond(cond), Then(then), Else(_else) {}
- virtual Value *Codegen();
-};
-
-/// ForExprAST - Expression class for for/in.
-class ForExprAST : public ExprAST {
- std::string VarName;
- ExprAST *Start, *End, *Step, *Body;
-public:
- ForExprAST(const std::string &amp;varname, ExprAST *start, ExprAST *end,
- ExprAST *step, ExprAST *body)
- : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
- virtual Value *Codegen();
-};
-
-/// PrototypeAST - This class represents the "prototype" for a function,
-/// which captures its name, and its argument names (thus implicitly the number
-/// of arguments the function takes), as well as if it is an operator.
-class PrototypeAST {
- std::string Name;
- std::vector&lt;std::string&gt; Args;
- bool isOperator;
- unsigned Precedence; // Precedence if a binary op.
-public:
- PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args,
- bool isoperator = false, unsigned prec = 0)
- : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {}
-
- bool isUnaryOp() const { return isOperator &amp;&amp; Args.size() == 1; }
- bool isBinaryOp() const { return isOperator &amp;&amp; Args.size() == 2; }
-
- char getOperatorName() const {
- assert(isUnaryOp() || isBinaryOp());
- return Name[Name.size()-1];
- }
-
- unsigned getBinaryPrecedence() const { return Precedence; }
-
- Function *Codegen();
-};
-
-/// FunctionAST - This class represents a function definition itself.
-class FunctionAST {
- PrototypeAST *Proto;
- ExprAST *Body;
-public:
- FunctionAST(PrototypeAST *proto, ExprAST *body)
- : Proto(proto), Body(body) {}
-
- Function *Codegen();
-};
-
-//===----------------------------------------------------------------------===//
-// Parser
-//===----------------------------------------------------------------------===//
-
-/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
-/// token the parser is looking at. getNextToken reads another token from the
-/// lexer and updates CurTok with its results.
-static int CurTok;
-static int getNextToken() {
- return CurTok = gettok();
-}
-
-/// BinopPrecedence - This holds the precedence for each binary operator that is
-/// defined.
-static std::map&lt;char, int&gt; BinopPrecedence;
-
-/// GetTokPrecedence - Get the precedence of the pending binary operator token.
-static int GetTokPrecedence() {
- if (!isascii(CurTok))
- return -1;
-
- // Make sure it's a declared binop.
- int TokPrec = BinopPrecedence[CurTok];
- if (TokPrec &lt;= 0) return -1;
- return TokPrec;
-}
-
-/// Error* - These are little helper functions for error handling.
-ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
-PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
-FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
-
-static ExprAST *ParseExpression();
-
-/// identifierexpr
-/// ::= identifier
-/// ::= identifier '(' expression* ')'
-static ExprAST *ParseIdentifierExpr() {
- std::string IdName = IdentifierStr;
-
- getNextToken(); // eat identifier.
-
- if (CurTok != '(') // Simple variable ref.
- return new VariableExprAST(IdName);
-
- // Call.
- getNextToken(); // eat (
- std::vector&lt;ExprAST*&gt; Args;
- if (CurTok != ')') {
- while (1) {
- ExprAST *Arg = ParseExpression();
- if (!Arg) return 0;
- Args.push_back(Arg);
-
- if (CurTok == ')') break;
-
- if (CurTok != ',')
- return Error("Expected ')' or ',' in argument list");
- getNextToken();
- }
- }
-
- // Eat the ')'.
- getNextToken();
-
- return new CallExprAST(IdName, Args);
-}
-
-/// numberexpr ::= number
-static ExprAST *ParseNumberExpr() {
- ExprAST *Result = new NumberExprAST(NumVal);
- getNextToken(); // consume the number
- return Result;
-}
-
-/// parenexpr ::= '(' expression ')'
-static ExprAST *ParseParenExpr() {
- getNextToken(); // eat (.
- ExprAST *V = ParseExpression();
- if (!V) return 0;
-
- if (CurTok != ')')
- return Error("expected ')'");
- getNextToken(); // eat ).
- return V;
-}
-
-/// ifexpr ::= 'if' expression 'then' expression 'else' expression
-static ExprAST *ParseIfExpr() {
- getNextToken(); // eat the if.
-
- // condition.
- ExprAST *Cond = ParseExpression();
- if (!Cond) return 0;
-
- if (CurTok != tok_then)
- return Error("expected then");
- getNextToken(); // eat the then
-
- ExprAST *Then = ParseExpression();
- if (Then == 0) return 0;
-
- if (CurTok != tok_else)
- return Error("expected else");
-
- getNextToken();
-
- ExprAST *Else = ParseExpression();
- if (!Else) return 0;
-
- return new IfExprAST(Cond, Then, Else);
-}
-
-/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
-static ExprAST *ParseForExpr() {
- getNextToken(); // eat the for.
-
- if (CurTok != tok_identifier)
- return Error("expected identifier after for");
-
- std::string IdName = IdentifierStr;
- getNextToken(); // eat identifier.
-
- if (CurTok != '=')
- return Error("expected '=' after for");
- getNextToken(); // eat '='.
-
-
- ExprAST *Start = ParseExpression();
- if (Start == 0) return 0;
- if (CurTok != ',')
- return Error("expected ',' after for start value");
- getNextToken();
-
- ExprAST *End = ParseExpression();
- if (End == 0) return 0;
-
- // The step value is optional.
- ExprAST *Step = 0;
- if (CurTok == ',') {
- getNextToken();
- Step = ParseExpression();
- if (Step == 0) return 0;
- }
-
- if (CurTok != tok_in)
- return Error("expected 'in' after for");
- getNextToken(); // eat 'in'.
-
- ExprAST *Body = ParseExpression();
- if (Body == 0) return 0;
-
- return new ForExprAST(IdName, Start, End, Step, Body);
-}
-
-/// primary
-/// ::= identifierexpr
-/// ::= numberexpr
-/// ::= parenexpr
-/// ::= ifexpr
-/// ::= forexpr
-static ExprAST *ParsePrimary() {
- switch (CurTok) {
- default: return Error("unknown token when expecting an expression");
- case tok_identifier: return ParseIdentifierExpr();
- case tok_number: return ParseNumberExpr();
- case '(': return ParseParenExpr();
- case tok_if: return ParseIfExpr();
- case tok_for: return ParseForExpr();
- }
-}
-
-/// unary
-/// ::= primary
-/// ::= '!' unary
-static ExprAST *ParseUnary() {
- // If the current token is not an operator, it must be a primary expr.
- if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
- return ParsePrimary();
-
- // If this is a unary operator, read it.
- int Opc = CurTok;
- getNextToken();
- if (ExprAST *Operand = ParseUnary())
- return new UnaryExprAST(Opc, Operand);
- return 0;
-}
-
-/// binoprhs
-/// ::= ('+' unary)*
-static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
- // If this is a binop, find its precedence.
- while (1) {
- int TokPrec = GetTokPrecedence();
-
- // If this is a binop that binds at least as tightly as the current binop,
- // consume it, otherwise we are done.
- if (TokPrec &lt; ExprPrec)
- return LHS;
-
- // Okay, we know this is a binop.
- int BinOp = CurTok;
- getNextToken(); // eat binop
-
- // Parse the unary expression after the binary operator.
- ExprAST *RHS = ParseUnary();
- if (!RHS) return 0;
-
- // If BinOp binds less tightly with RHS than the operator after RHS, let
- // the pending operator take RHS as its LHS.
- int NextPrec = GetTokPrecedence();
- if (TokPrec &lt; NextPrec) {
- RHS = ParseBinOpRHS(TokPrec+1, RHS);
- if (RHS == 0) return 0;
- }
-
- // Merge LHS/RHS.
- LHS = new BinaryExprAST(BinOp, LHS, RHS);
- }
-}
-
-/// expression
-/// ::= unary binoprhs
-///
-static ExprAST *ParseExpression() {
- ExprAST *LHS = ParseUnary();
- if (!LHS) return 0;
-
- return ParseBinOpRHS(0, LHS);
-}
-
-/// prototype
-/// ::= id '(' id* ')'
-/// ::= binary LETTER number? (id, id)
-/// ::= unary LETTER (id)
-static PrototypeAST *ParsePrototype() {
- std::string FnName;
-
- unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
- unsigned BinaryPrecedence = 30;
-
- switch (CurTok) {
- default:
- return ErrorP("Expected function name in prototype");
- case tok_identifier:
- FnName = IdentifierStr;
- Kind = 0;
- getNextToken();
- break;
- case tok_unary:
- getNextToken();
- if (!isascii(CurTok))
- return ErrorP("Expected unary operator");
- FnName = "unary";
- FnName += (char)CurTok;
- Kind = 1;
- getNextToken();
- break;
- case tok_binary:
- getNextToken();
- if (!isascii(CurTok))
- return ErrorP("Expected binary operator");
- FnName = "binary";
- FnName += (char)CurTok;
- Kind = 2;
- getNextToken();
-
- // Read the precedence if present.
- if (CurTok == tok_number) {
- if (NumVal &lt; 1 || NumVal &gt; 100)
- return ErrorP("Invalid precedecnce: must be 1..100");
- BinaryPrecedence = (unsigned)NumVal;
- getNextToken();
- }
- break;
- }
-
- if (CurTok != '(')
- return ErrorP("Expected '(' in prototype");
-
- std::vector&lt;std::string&gt; ArgNames;
- while (getNextToken() == tok_identifier)
- ArgNames.push_back(IdentifierStr);
- if (CurTok != ')')
- return ErrorP("Expected ')' in prototype");
-
- // success.
- getNextToken(); // eat ')'.
-
- // Verify right number of names for operator.
- if (Kind &amp;&amp; ArgNames.size() != Kind)
- return ErrorP("Invalid number of operands for operator");
-
- return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence);
-}
-
-/// definition ::= 'def' prototype expression
-static FunctionAST *ParseDefinition() {
- getNextToken(); // eat def.
- PrototypeAST *Proto = ParsePrototype();
- if (Proto == 0) return 0;
-
- if (ExprAST *E = ParseExpression())
- return new FunctionAST(Proto, E);
- return 0;
-}
-
-/// toplevelexpr ::= expression
-static FunctionAST *ParseTopLevelExpr() {
- if (ExprAST *E = ParseExpression()) {
- // Make an anonymous proto.
- PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
- return new FunctionAST(Proto, E);
- }
- return 0;
-}
-
-/// external ::= 'extern' prototype
-static PrototypeAST *ParseExtern() {
- getNextToken(); // eat extern.
- return ParsePrototype();
-}
-
-//===----------------------------------------------------------------------===//
-// Code Generation
-//===----------------------------------------------------------------------===//
-
-static Module *TheModule;
-static IRBuilder&lt;&gt; Builder(getGlobalContext());
-static std::map&lt;std::string, Value*&gt; NamedValues;
-static FunctionPassManager *TheFPM;
-
-Value *ErrorV(const char *Str) { Error(Str); return 0; }
-
-Value *NumberExprAST::Codegen() {
- return ConstantFP::get(getGlobalContext(), APFloat(Val));
-}
-
-Value *VariableExprAST::Codegen() {
- // Look this variable up in the function.
- Value *V = NamedValues[Name];
- return V ? V : ErrorV("Unknown variable name");
-}
-
-Value *UnaryExprAST::Codegen() {
- Value *OperandV = Operand-&gt;Codegen();
- if (OperandV == 0) return 0;
-
- Function *F = TheModule-&gt;getFunction(std::string("unary")+Opcode);
- if (F == 0)
- return ErrorV("Unknown unary operator");
-
- return Builder.CreateCall(F, OperandV, "unop");
-}
-
-Value *BinaryExprAST::Codegen() {
- Value *L = LHS-&gt;Codegen();
- Value *R = RHS-&gt;Codegen();
- if (L == 0 || R == 0) return 0;
-
- switch (Op) {
- case '+': return Builder.CreateAdd(L, R, "addtmp");
- case '-': return Builder.CreateSub(L, R, "subtmp");
- case '*': return Builder.CreateMul(L, R, "multmp");
- case '&lt;':
- L = Builder.CreateFCmpULT(L, R, "cmptmp");
- // Convert bool 0/1 to double 0.0 or 1.0
- return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
- "booltmp");
- default: break;
- }
-
- // If it wasn't a builtin binary operator, it must be a user defined one. Emit
- // a call to it.
- Function *F = TheModule-&gt;getFunction(std::string("binary")+Op);
- assert(F &amp;&amp; "binary operator not found!");
-
- Value *Ops[] = { L, R };
- return Builder.CreateCall(F, Ops, Ops+2, "binop");
-}
-
-Value *CallExprAST::Codegen() {
- // Look up the name in the global module table.
- Function *CalleeF = TheModule-&gt;getFunction(Callee);
- if (CalleeF == 0)
- return ErrorV("Unknown function referenced");
-
- // If argument mismatch error.
- if (CalleeF-&gt;arg_size() != Args.size())
- return ErrorV("Incorrect # arguments passed");
-
- std::vector&lt;Value*&gt; ArgsV;
- for (unsigned i = 0, e = Args.size(); i != e; ++i) {
- ArgsV.push_back(Args[i]-&gt;Codegen());
- if (ArgsV.back() == 0) return 0;
- }
-
- return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
-}
-
-Value *IfExprAST::Codegen() {
- Value *CondV = Cond-&gt;Codegen();
- if (CondV == 0) return 0;
-
- // Convert condition to a bool by comparing equal to 0.0.
- CondV = Builder.CreateFCmpONE(CondV,
- ConstantFP::get(getGlobalContext(), APFloat(0.0)),
- "ifcond");
-
- Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
-
- // Create blocks for the then and else cases. Insert the 'then' block at the
- // end of the function.
- BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction);
- BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else");
- BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont");
-
- Builder.CreateCondBr(CondV, ThenBB, ElseBB);
-
- // Emit then value.
- Builder.SetInsertPoint(ThenBB);
-
- Value *ThenV = Then-&gt;Codegen();
- if (ThenV == 0) return 0;
-
- Builder.CreateBr(MergeBB);
- // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
- ThenBB = Builder.GetInsertBlock();
-
- // Emit else block.
- TheFunction-&gt;getBasicBlockList().push_back(ElseBB);
- Builder.SetInsertPoint(ElseBB);
-
- Value *ElseV = Else-&gt;Codegen();
- if (ElseV == 0) return 0;
-
- Builder.CreateBr(MergeBB);
- // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
- ElseBB = Builder.GetInsertBlock();
-
- // Emit merge block.
- TheFunction-&gt;getBasicBlockList().push_back(MergeBB);
- Builder.SetInsertPoint(MergeBB);
- PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()),
- "iftmp");
-
- PN-&gt;addIncoming(ThenV, ThenBB);
- PN-&gt;addIncoming(ElseV, ElseBB);
- return PN;
-}
-
-Value *ForExprAST::Codegen() {
- // Output this as:
- // ...
- // start = startexpr
- // goto loop
- // loop:
- // variable = phi [start, loopheader], [nextvariable, loopend]
- // ...
- // bodyexpr
- // ...
- // loopend:
- // step = stepexpr
- // nextvariable = variable + step
- // endcond = endexpr
- // br endcond, loop, endloop
- // outloop:
-
- // Emit the start code first, without 'variable' in scope.
- Value *StartVal = Start-&gt;Codegen();
- if (StartVal == 0) return 0;
-
- // Make the new basic block for the loop header, inserting after current
- // block.
- Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
- BasicBlock *PreheaderBB = Builder.GetInsertBlock();
- BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction);
-
- // Insert an explicit fall through from the current block to the LoopBB.
- Builder.CreateBr(LoopBB);
-
- // Start insertion in LoopBB.
- Builder.SetInsertPoint(LoopBB);
-
- // Start the PHI node with an entry for Start.
- PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), VarName.c_str());
- Variable-&gt;addIncoming(StartVal, PreheaderBB);
-
- // Within the loop, the variable is defined equal to the PHI node. If it
- // shadows an existing variable, we have to restore it, so save it now.
- Value *OldVal = NamedValues[VarName];
- NamedValues[VarName] = Variable;
-
- // Emit the body of the loop. This, like any other expr, can change the
- // current BB. Note that we ignore the value computed by the body, but don't
- // allow an error.
- if (Body-&gt;Codegen() == 0)
- return 0;
-
- // Emit the step value.
- Value *StepVal;
- if (Step) {
- StepVal = Step-&gt;Codegen();
- if (StepVal == 0) return 0;
- } else {
- // If not specified, use 1.0.
- StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0));
- }
-
- Value *NextVar = Builder.CreateAdd(Variable, StepVal, "nextvar");
-
- // Compute the end condition.
- Value *EndCond = End-&gt;Codegen();
- if (EndCond == 0) return EndCond;
-
- // Convert condition to a bool by comparing equal to 0.0.
- EndCond = Builder.CreateFCmpONE(EndCond,
- ConstantFP::get(getGlobalContext(), APFloat(0.0)),
- "loopcond");
-
- // Create the "after loop" block and insert it.
- BasicBlock *LoopEndBB = Builder.GetInsertBlock();
- BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction);
-
- // Insert the conditional branch into the end of LoopEndBB.
- Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
-
- // Any new code will be inserted in AfterBB.
- Builder.SetInsertPoint(AfterBB);
-
- // Add a new entry to the PHI node for the backedge.
- Variable-&gt;addIncoming(NextVar, LoopEndBB);
-
- // Restore the unshadowed variable.
- if (OldVal)
- NamedValues[VarName] = OldVal;
- else
- NamedValues.erase(VarName);
-
-
- // for expr always returns 0.0.
- return Constant::getNullValue(Type::getDoubleTy(getGlobalContext()));
-}
-
-Function *PrototypeAST::Codegen() {
- // Make the function type: double(double,double) etc.
- std::vector&lt;const Type*&gt; Doubles(Args.size(),
- Type::getDoubleTy(getGlobalContext()));
- FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()),
- Doubles, false);
-
- Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule);
-
- // If F conflicted, there was already something named 'Name'. If it has a
- // body, don't allow redefinition or reextern.
- if (F-&gt;getName() != Name) {
- // Delete the one we just made and get the existing one.
- F-&gt;eraseFromParent();
- F = TheModule-&gt;getFunction(Name);
-
- // If F already has a body, reject this.
- if (!F-&gt;empty()) {
- ErrorF("redefinition of function");
- return 0;
- }
-
- // If F took a different number of args, reject.
- if (F-&gt;arg_size() != Args.size()) {
- ErrorF("redefinition of function with different # args");
- return 0;
- }
- }
-
- // Set names for all arguments.
- unsigned Idx = 0;
- for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
- ++AI, ++Idx) {
- AI-&gt;setName(Args[Idx]);
-
- // Add arguments to variable symbol table.
- NamedValues[Args[Idx]] = AI;
- }
-
- return F;
-}
-
-Function *FunctionAST::Codegen() {
- NamedValues.clear();
-
- Function *TheFunction = Proto-&gt;Codegen();
- if (TheFunction == 0)
- return 0;
-
- // If this is an operator, install it.
- if (Proto-&gt;isBinaryOp())
- BinopPrecedence[Proto-&gt;getOperatorName()] = Proto-&gt;getBinaryPrecedence();
-
- // Create a new basic block to start insertion into.
- BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
- Builder.SetInsertPoint(BB);
-
- if (Value *RetVal = Body-&gt;Codegen()) {
- // Finish off the function.
- Builder.CreateRet(RetVal);
-
- // Validate the generated code, checking for consistency.
- verifyFunction(*TheFunction);
-
- // Optimize the function.
- TheFPM-&gt;run(*TheFunction);
-
- return TheFunction;
- }
-
- // Error reading body, remove function.
- TheFunction-&gt;eraseFromParent();
-
- if (Proto-&gt;isBinaryOp())
- BinopPrecedence.erase(Proto-&gt;getOperatorName());
- return 0;
-}
-
-//===----------------------------------------------------------------------===//
-// Top-Level parsing and JIT Driver
-//===----------------------------------------------------------------------===//
-
-static ExecutionEngine *TheExecutionEngine;
-
-static void HandleDefinition() {
- if (FunctionAST *F = ParseDefinition()) {
- if (Function *LF = F-&gt;Codegen()) {
- fprintf(stderr, "Read function definition:");
- LF-&gt;dump();
- }
- } else {
- // Skip token for error recovery.
- getNextToken();
- }
-}
-
-static void HandleExtern() {
- if (PrototypeAST *P = ParseExtern()) {
- if (Function *F = P-&gt;Codegen()) {
- fprintf(stderr, "Read extern: ");
- F-&gt;dump();
- }
- } else {
- // Skip token for error recovery.
- getNextToken();
- }
-}
-
-static void HandleTopLevelExpression() {
- // Evaluate a top-level expression into an anonymous function.
- if (FunctionAST *F = ParseTopLevelExpr()) {
- if (Function *LF = F-&gt;Codegen()) {
- // JIT the function, returning a function pointer.
- void *FPtr = TheExecutionEngine-&gt;getPointerToFunction(LF);
-
- // Cast it to the right type (takes no arguments, returns a double) so we
- // can call it as a native function.
- double (*FP)() = (double (*)())(intptr_t)FPtr;
- fprintf(stderr, "Evaluated to %f\n", FP());
- }
- } else {
- // Skip token for error recovery.
- getNextToken();
- }
-}
-
-/// top ::= definition | external | expression | ';'
-static void MainLoop() {
- while (1) {
- fprintf(stderr, "ready&gt; ");
- switch (CurTok) {
- case tok_eof: return;
- case ';': getNextToken(); break; // ignore top-level semicolons.
- case tok_def: HandleDefinition(); break;
- case tok_extern: HandleExtern(); break;
- default: HandleTopLevelExpression(); break;
- }
- }
-}
-
-//===----------------------------------------------------------------------===//
-// "Library" functions that can be "extern'd" from user code.
-//===----------------------------------------------------------------------===//
-
-/// putchard - putchar that takes a double and returns 0.
-extern "C"
-double putchard(double X) {
- putchar((char)X);
- return 0;
-}
-
-/// printd - printf that takes a double prints it as "%f\n", returning 0.
-extern "C"
-double printd(double X) {
- printf("%f\n", X);
- return 0;
-}
-
-//===----------------------------------------------------------------------===//
-// Main driver code.
-//===----------------------------------------------------------------------===//
-
-int main() {
- InitializeNativeTarget();
- LLVMContext &amp;Context = getGlobalContext();
-
- // Install standard binary operators.
- // 1 is lowest precedence.
- BinopPrecedence['&lt;'] = 10;
- BinopPrecedence['+'] = 20;
- BinopPrecedence['-'] = 20;
- BinopPrecedence['*'] = 40; // highest.
-
- // Prime the first token.
- fprintf(stderr, "ready&gt; ");
- getNextToken();
-
- // Make the module, which holds all the code.
- TheModule = new Module("my cool jit", Context);
-
- // Create the JIT. This takes ownership of the module.
- std::string ErrStr;
- TheExecutionEngine = EngineBuilder(TheModule).setErrorStr(&ErrStr).create();
- if (!TheExecutionEngine) {
- fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str());
- exit(1);
- }
-
- FunctionPassManager OurFPM(TheModule);
-
- // Set up the optimizer pipeline. Start with registering info about how the
- // target lays out data structures.
- OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
- // Do simple "peephole" optimizations and bit-twiddling optzns.
- OurFPM.add(createInstructionCombiningPass());
- // Reassociate expressions.
- OurFPM.add(createReassociatePass());
- // Eliminate Common SubExpressions.
- OurFPM.add(createGVNPass());
- // Simplify the control flow graph (deleting unreachable blocks, etc).
- OurFPM.add(createCFGSimplificationPass());
-
- OurFPM.doInitialization();
-
- // Set the global so the code gen can use this.
- TheFPM = &amp;OurFPM;
-
- // Run the main "interpreter loop" now.
- MainLoop();
-
- TheFPM = 0;
-
- // Print out all of the generated code.
- TheModule-&gt;dump();
-
- return 0;
-}
-</pre>
-</div>
-
-<a href="LangImpl7.html">Next: Extending the language: mutable variables / SSA construction</a>
-</div>
-
-<!-- *********************************************************************** -->
-<hr>
-<address>
- <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
- src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
- <a href="http://validator.w3.org/check/referer"><img
- src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
-
- <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
- <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
- Last modified: $Date$
-</address>
-</body>
-</html>
diff --git a/docs/tutorial/LangImpl7.html b/docs/tutorial/LangImpl7.html
deleted file mode 100644
index 0b46ba5..0000000
--- a/docs/tutorial/LangImpl7.html
+++ /dev/null
@@ -1,2164 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
- "http://www.w3.org/TR/html4/strict.dtd">
-
-<html>
-<head>
- <title>Kaleidoscope: Extending the Language: Mutable Variables / SSA
- construction</title>
- <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
- <meta name="author" content="Chris Lattner">
- <link rel="stylesheet" href="../llvm.css" type="text/css">
-</head>
-
-<body>
-
-<div class="doc_title">Kaleidoscope: Extending the Language: Mutable Variables</div>
-
-<ul>
-<li><a href="index.html">Up to Tutorial Index</a></li>
-<li>Chapter 7
- <ol>
- <li><a href="#intro">Chapter 7 Introduction</a></li>
- <li><a href="#why">Why is this a hard problem?</a></li>
- <li><a href="#memory">Memory in LLVM</a></li>
- <li><a href="#kalvars">Mutable Variables in Kaleidoscope</a></li>
- <li><a href="#adjustments">Adjusting Existing Variables for
- Mutation</a></li>
- <li><a href="#assignment">New Assignment Operator</a></li>
- <li><a href="#localvars">User-defined Local Variables</a></li>
- <li><a href="#code">Full Code Listing</a></li>
- </ol>
-</li>
-<li><a href="LangImpl8.html">Chapter 8</a>: Conclusion and other useful LLVM
- tidbits</li>
-</ul>
-
-<div class="doc_author">
- <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="intro">Chapter 7 Introduction</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Welcome to Chapter 7 of the "<a href="index.html">Implementing a language
-with LLVM</a>" tutorial. In chapters 1 through 6, we've built a very
-respectable, albeit simple, <a
-href="http://en.wikipedia.org/wiki/Functional_programming">functional
-programming language</a>. In our journey, we learned some parsing techniques,
-how to build and represent an AST, how to build LLVM IR, and how to optimize
-the resultant code as well as JIT compile it.</p>
-
-<p>While Kaleidoscope is interesting as a functional language, the fact that it
-is functional makes it "too easy" to generate LLVM IR for it. In particular, a
-functional language makes it very easy to build LLVM IR directly in <a
-href="http://en.wikipedia.org/wiki/Static_single_assignment_form">SSA form</a>.
-Since LLVM requires that the input code be in SSA form, this is a very nice
-property and it is often unclear to newcomers how to generate code for an
-imperative language with mutable variables.</p>
-
-<p>The short (and happy) summary of this chapter is that there is no need for
-your front-end to build SSA form: LLVM provides highly tuned and well tested
-support for this, though the way it works is a bit unexpected for some.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="why">Why is this a hard problem?</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>
-To understand why mutable variables cause complexities in SSA construction,
-consider this extremely simple C example:
-</p>
-
-<div class="doc_code">
-<pre>
-int G, H;
-int test(_Bool Condition) {
- int X;
- if (Condition)
- X = G;
- else
- X = H;
- return X;
-}
-</pre>
-</div>
-
-<p>In this case, we have the variable "X", whose value depends on the path
-executed in the program. Because there are two different possible values for X
-before the return instruction, a PHI node is inserted to merge the two values.
-The LLVM IR that we want for this example looks like this:</p>
-
-<div class="doc_code">
-<pre>
-@G = weak global i32 0 ; type of @G is i32*
-@H = weak global i32 0 ; type of @H is i32*
-
-define i32 @test(i1 %Condition) {
-entry:
- br i1 %Condition, label %cond_true, label %cond_false
-
-cond_true:
- %X.0 = load i32* @G
- br label %cond_next
-
-cond_false:
- %X.1 = load i32* @H
- br label %cond_next
-
-cond_next:
- %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
- ret i32 %X.2
-}
-</pre>
-</div>
-
-<p>In this example, the loads from the G and H global variables are explicit in
-the LLVM IR, and they live in the then/else branches of the if statement
-(cond_true/cond_false). In order to merge the incoming values, the X.2 phi node
-in the cond_next block selects the right value to use based on where control
-flow is coming from: if control flow comes from the cond_false block, X.2 gets
-the value of X.1. Alternatively, if control flow comes from cond_true, it gets
-the value of X.0. The intent of this chapter is not to explain the details of
-SSA form. For more information, see one of the many <a
-href="http://en.wikipedia.org/wiki/Static_single_assignment_form">online
-references</a>.</p>
-
-<p>The question for this article is "who places the phi nodes when lowering
-assignments to mutable variables?". The issue here is that LLVM
-<em>requires</em> that its IR be in SSA form: there is no "non-ssa" mode for it.
-However, SSA construction requires non-trivial algorithms and data structures,
-so it is inconvenient and wasteful for every front-end to have to reproduce this
-logic.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="memory">Memory in LLVM</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>The 'trick' here is that while LLVM does require all register values to be
-in SSA form, it does not require (or permit) memory objects to be in SSA form.
-In the example above, note that the loads from G and H are direct accesses to
-G and H: they are not renamed or versioned. This differs from some other
-compiler systems, which do try to version memory objects. In LLVM, instead of
-encoding dataflow analysis of memory into the LLVM IR, it is handled with <a
-href="../WritingAnLLVMPass.html">Analysis Passes</a> which are computed on
-demand.</p>
-
-<p>
-With this in mind, the high-level idea is that we want to make a stack variable
-(which lives in memory, because it is on the stack) for each mutable object in
-a function. To take advantage of this trick, we need to talk about how LLVM
-represents stack variables.
-</p>
-
-<p>In LLVM, all memory accesses are explicit with load/store instructions, and
-it is carefully designed not to have (or need) an "address-of" operator. Notice
-how the type of the @G/@H global variables is actually "i32*" even though the
-variable is defined as "i32". What this means is that @G defines <em>space</em>
-for an i32 in the global data area, but its <em>name</em> actually refers to the
-address for that space. Stack variables work the same way, except that instead of
-being declared with global variable definitions, they are declared with the
-<a href="../LangRef.html#i_alloca">LLVM alloca instruction</a>:</p>
-
-<div class="doc_code">
-<pre>
-define i32 @example() {
-entry:
- %X = alloca i32 ; type of %X is i32*.
- ...
- %tmp = load i32* %X ; load the stack value %X from the stack.
- %tmp2 = add i32 %tmp, 1 ; increment it
- store i32 %tmp2, i32* %X ; store it back
- ...
-</pre>
-</div>
-
-<p>This code shows an example of how you can declare and manipulate a stack
-variable in the LLVM IR. Stack memory allocated with the alloca instruction is
-fully general: you can pass the address of the stack slot to functions, you can
-store it in other variables, etc. In our example above, we could rewrite the
-example to use the alloca technique to avoid using a PHI node:</p>
-
-<div class="doc_code">
-<pre>
-@G = weak global i32 0 ; type of @G is i32*
-@H = weak global i32 0 ; type of @H is i32*
-
-define i32 @test(i1 %Condition) {
-entry:
- %X = alloca i32 ; type of %X is i32*.
- br i1 %Condition, label %cond_true, label %cond_false
-
-cond_true:
- %X.0 = load i32* @G
- store i32 %X.0, i32* %X ; Update X
- br label %cond_next
-
-cond_false:
- %X.1 = load i32* @H
- store i32 %X.1, i32* %X ; Update X
- br label %cond_next
-
-cond_next:
- %X.2 = load i32* %X ; Read X
- ret i32 %X.2
-}
-</pre>
-</div>
-
-<p>With this, we have discovered a way to handle arbitrary mutable variables
-without the need to create Phi nodes at all:</p>
-
-<ol>
-<li>Each mutable variable becomes a stack allocation.</li>
-<li>Each read of the variable becomes a load from the stack.</li>
-<li>Each update of the variable becomes a store to the stack.</li>
-<li>Taking the address of a variable just uses the stack address directly.</li>
-</ol>
-
-<p>While this solution has solved our immediate problem, it introduced another
-one: we have now apparently introduced a lot of stack traffic for very simple
-and common operations, a major performance problem. Fortunately for us, the
-LLVM optimizer has a highly-tuned optimization pass named "mem2reg" that handles
-this case, promoting allocas like this into SSA registers, inserting Phi nodes
-as appropriate. If you run this example through the pass, for example, you'll
-get:</p>
-
-<div class="doc_code">
-<pre>
-$ <b>llvm-as &lt; example.ll | opt -mem2reg | llvm-dis</b>
-@G = weak global i32 0
-@H = weak global i32 0
-
-define i32 @test(i1 %Condition) {
-entry:
- br i1 %Condition, label %cond_true, label %cond_false
-
-cond_true:
- %X.0 = load i32* @G
- br label %cond_next
-
-cond_false:
- %X.1 = load i32* @H
- br label %cond_next
-
-cond_next:
- %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
- ret i32 %X.01
-}
-</pre>
-</div>
-
-<p>The mem2reg pass implements the standard "iterated dominance frontier"
-algorithm for constructing SSA form and has a number of optimizations that speed
-up (very common) degenerate cases. The mem2reg optimization pass is the answer to dealing
-with mutable variables, and we highly recommend that you depend on it. Note that
-mem2reg only works on variables in certain circumstances:</p>
-
-<ol>
-<li>mem2reg is alloca-driven: it looks for allocas and if it can handle them, it
-promotes them. It does not apply to global variables or heap allocations.</li>
-
-<li>mem2reg only looks for alloca instructions in the entry block of the
-function. Being in the entry block guarantees that the alloca is only executed
-once, which makes analysis simpler.</li>
-
-<li>mem2reg only promotes allocas whose uses are direct loads and stores. If
-the address of the stack object is passed to a function, or if any funny pointer
-arithmetic is involved, the alloca will not be promoted.</li>
-
-<li>mem2reg only works on allocas of <a
-href="../LangRef.html#t_classifications">first class</a>
-values (such as pointers, scalars and vectors), and only if the array size
-of the allocation is 1 (or missing in the .ll file). mem2reg is not capable of
-promoting structs or arrays to registers. Note that the "scalarrepl" pass is
-more powerful and can promote structs, "unions", and arrays in many cases.</li>
-
-</ol>
-
-<p>
-All of these properties are easy to satisfy for most imperative languages, and
-we'll illustrate it below with Kaleidoscope. The final question you may be
-asking is: should I bother with this nonsense for my front-end? Wouldn't it be
-better if I just did SSA construction directly, avoiding use of the mem2reg
-optimization pass? In short, we strongly recommend that you use this technique
-for building SSA form, unless there is an extremely good reason not to. Using
-this technique is:</p>
-
-<ul>
-<li>Proven and well tested: llvm-gcc and clang both use this technique for local
-mutable variables. As such, the most common clients of LLVM are using this to
-handle a bulk of their variables. You can be sure that bugs are found fast and
-fixed early.</li>
-
-<li>Extremely Fast: mem2reg has a number of special cases that make it fast in
-common cases as well as fully general. For example, it has fast-paths for
-variables that are only used in a single block, variables that only have one
-assignment point, good heuristics to avoid insertion of unneeded phi nodes, etc.
-</li>
-
-<li>Needed for debug info generation: <a href="../SourceLevelDebugging.html">
-Debug information in LLVM</a> relies on having the address of the variable
-exposed so that debug info can be attached to it. This technique dovetails
-very naturally with this style of debug info.</li>
-</ul>
-
-<p>If nothing else, this makes it much easier to get your front-end up and
-running, and is very simple to implement. Lets extend Kaleidoscope with mutable
-variables now!
-</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="kalvars">Mutable Variables in
-Kaleidoscope</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Now that we know the sort of problem we want to tackle, lets see what this
-looks like in the context of our little Kaleidoscope language. We're going to
-add two features:</p>
-
-<ol>
-<li>The ability to mutate variables with the '=' operator.</li>
-<li>The ability to define new variables.</li>
-</ol>
-
-<p>While the first item is really what this is about, we only have variables
-for incoming arguments as well as for induction variables, and redefining those only
-goes so far :). Also, the ability to define new variables is a
-useful thing regardless of whether you will be mutating them. Here's a
-motivating example that shows how we could use these:</p>
-
-<div class="doc_code">
-<pre>
-# Define ':' for sequencing: as a low-precedence operator that ignores operands
-# and just returns the RHS.
-def binary : 1 (x y) y;
-
-# Recursive fib, we could do this before.
-def fib(x)
- if (x &lt; 3) then
- 1
- else
- fib(x-1)+fib(x-2);
-
-# Iterative fib.
-def fibi(x)
- <b>var a = 1, b = 1, c in</b>
- (for i = 3, i &lt; x in
- <b>c = a + b</b> :
- <b>a = b</b> :
- <b>b = c</b>) :
- b;
-
-# Call it.
-fibi(10);
-</pre>
-</div>
-
-<p>
-In order to mutate variables, we have to change our existing variables to use
-the "alloca trick". Once we have that, we'll add our new operator, then extend
-Kaleidoscope to support new variable definitions.
-</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="adjustments">Adjusting Existing Variables for
-Mutation</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>
-The symbol table in Kaleidoscope is managed at code generation time by the
-'<tt>NamedValues</tt>' map. This map currently keeps track of the LLVM "Value*"
-that holds the double value for the named variable. In order to support
-mutation, we need to change this slightly, so that it <tt>NamedValues</tt> holds
-the <em>memory location</em> of the variable in question. Note that this
-change is a refactoring: it changes the structure of the code, but does not
-(by itself) change the behavior of the compiler. All of these changes are
-isolated in the Kaleidoscope code generator.</p>
-
-<p>
-At this point in Kaleidoscope's development, it only supports variables for two
-things: incoming arguments to functions and the induction variable of 'for'
-loops. For consistency, we'll allow mutation of these variables in addition to
-other user-defined variables. This means that these will both need memory
-locations.
-</p>
-
-<p>To start our transformation of Kaleidoscope, we'll change the NamedValues
-map so that it maps to AllocaInst* instead of Value*. Once we do this, the C++
-compiler will tell us what parts of the code we need to update:</p>
-
-<div class="doc_code">
-<pre>
-static std::map&lt;std::string, AllocaInst*&gt; NamedValues;
-</pre>
-</div>
-
-<p>Also, since we will need to create these alloca's, we'll use a helper
-function that ensures that the allocas are created in the entry block of the
-function:</p>
-
-<div class="doc_code">
-<pre>
-/// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
-/// the function. This is used for mutable variables etc.
-static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
- const std::string &amp;VarName) {
- IRBuilder&lt;&gt; TmpB(&amp;TheFunction-&gt;getEntryBlock(),
- TheFunction-&gt;getEntryBlock().begin());
- return TmpB.CreateAlloca(Type::getDoubleTy(getGlobalContext()), 0,
- VarName.c_str());
-}
-</pre>
-</div>
-
-<p>This funny looking code creates an IRBuilder object that is pointing at
-the first instruction (.begin()) of the entry block. It then creates an alloca
-with the expected name and returns it. Because all values in Kaleidoscope are
-doubles, there is no need to pass in a type to use.</p>
-
-<p>With this in place, the first functionality change we want to make is to
-variable references. In our new scheme, variables live on the stack, so code
-generating a reference to them actually needs to produce a load from the stack
-slot:</p>
-
-<div class="doc_code">
-<pre>
-Value *VariableExprAST::Codegen() {
- // Look this variable up in the function.
- Value *V = NamedValues[Name];
- if (V == 0) return ErrorV("Unknown variable name");
-
- <b>// Load the value.
- return Builder.CreateLoad(V, Name.c_str());</b>
-}
-</pre>
-</div>
-
-<p>As you can see, this is pretty straightforward. Now we need to update the
-things that define the variables to set up the alloca. We'll start with
-<tt>ForExprAST::Codegen</tt> (see the <a href="#code">full code listing</a> for
-the unabridged code):</p>
-
-<div class="doc_code">
-<pre>
- Function *TheFunction = Builder.GetInsertBlock()->getParent();
-
- <b>// Create an alloca for the variable in the entry block.
- AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);</b>
-
- // Emit the start code first, without 'variable' in scope.
- Value *StartVal = Start-&gt;Codegen();
- if (StartVal == 0) return 0;
-
- <b>// Store the value into the alloca.
- Builder.CreateStore(StartVal, Alloca);</b>
- ...
-
- // Compute the end condition.
- Value *EndCond = End-&gt;Codegen();
- if (EndCond == 0) return EndCond;
-
- <b>// Reload, increment, and restore the alloca. This handles the case where
- // the body of the loop mutates the variable.
- Value *CurVar = Builder.CreateLoad(Alloca);
- Value *NextVar = Builder.CreateAdd(CurVar, StepVal, "nextvar");
- Builder.CreateStore(NextVar, Alloca);</b>
- ...
-</pre>
-</div>
-
-<p>This code is virtually identical to the code <a
-href="LangImpl5.html#forcodegen">before we allowed mutable variables</a>. The
-big difference is that we no longer have to construct a PHI node, and we use
-load/store to access the variable as needed.</p>
-
-<p>To support mutable argument variables, we need to also make allocas for them.
-The code for this is also pretty simple:</p>
-
-<div class="doc_code">
-<pre>
-/// CreateArgumentAllocas - Create an alloca for each argument and register the
-/// argument in the symbol table so that references to it will succeed.
-void PrototypeAST::CreateArgumentAllocas(Function *F) {
- Function::arg_iterator AI = F-&gt;arg_begin();
- for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
- // Create an alloca for this variable.
- AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
-
- // Store the initial value into the alloca.
- Builder.CreateStore(AI, Alloca);
-
- // Add arguments to variable symbol table.
- NamedValues[Args[Idx]] = Alloca;
- }
-}
-</pre>
-</div>
-
-<p>For each argument, we make an alloca, store the input value to the function
-into the alloca, and register the alloca as the memory location for the
-argument. This method gets invoked by <tt>FunctionAST::Codegen</tt> right after
-it sets up the entry block for the function.</p>
-
-<p>The final missing piece is adding the mem2reg pass, which allows us to get
-good codegen once again:</p>
-
-<div class="doc_code">
-<pre>
- // Set up the optimizer pipeline. Start with registering info about how the
- // target lays out data structures.
- OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
- <b>// Promote allocas to registers.
- OurFPM.add(createPromoteMemoryToRegisterPass());</b>
- // Do simple "peephole" optimizations and bit-twiddling optzns.
- OurFPM.add(createInstructionCombiningPass());
- // Reassociate expressions.
- OurFPM.add(createReassociatePass());
-</pre>
-</div>
-
-<p>It is interesting to see what the code looks like before and after the
-mem2reg optimization runs. For example, this is the before/after code for our
-recursive fib function. Before the optimization:</p>
-
-<div class="doc_code">
-<pre>
-define double @fib(double %x) {
-entry:
- <b>%x1 = alloca double
- store double %x, double* %x1
- %x2 = load double* %x1</b>
- %cmptmp = fcmp ult double %x2, 3.000000e+00
- %booltmp = uitofp i1 %cmptmp to double
- %ifcond = fcmp one double %booltmp, 0.000000e+00
- br i1 %ifcond, label %then, label %else
-
-then: ; preds = %entry
- br label %ifcont
-
-else: ; preds = %entry
- <b>%x3 = load double* %x1</b>
- %subtmp = fsub double %x3, 1.000000e+00
- %calltmp = call double @fib( double %subtmp )
- <b>%x4 = load double* %x1</b>
- %subtmp5 = fsub double %x4, 2.000000e+00
- %calltmp6 = call double @fib( double %subtmp5 )
- %addtmp = fadd double %calltmp, %calltmp6
- br label %ifcont
-
-ifcont: ; preds = %else, %then
- %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
- ret double %iftmp
-}
-</pre>
-</div>
-
-<p>Here there is only one variable (x, the input argument) but you can still
-see the extremely simple-minded code generation strategy we are using. In the
-entry block, an alloca is created, and the initial input value is stored into
-it. Each reference to the variable does a reload from the stack. Also, note
-that we didn't modify the if/then/else expression, so it still inserts a PHI
-node. While we could make an alloca for it, it is actually easier to create a
-PHI node for it, so we still just make the PHI.</p>
-
-<p>Here is the code after the mem2reg pass runs:</p>
-
-<div class="doc_code">
-<pre>
-define double @fib(double %x) {
-entry:
- %cmptmp = fcmp ult double <b>%x</b>, 3.000000e+00
- %booltmp = uitofp i1 %cmptmp to double
- %ifcond = fcmp one double %booltmp, 0.000000e+00
- br i1 %ifcond, label %then, label %else
-
-then:
- br label %ifcont
-
-else:
- %subtmp = fsub double <b>%x</b>, 1.000000e+00
- %calltmp = call double @fib( double %subtmp )
- %subtmp5 = fsub double <b>%x</b>, 2.000000e+00
- %calltmp6 = call double @fib( double %subtmp5 )
- %addtmp = fadd double %calltmp, %calltmp6
- br label %ifcont
-
-ifcont: ; preds = %else, %then
- %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
- ret double %iftmp
-}
-</pre>
-</div>
-
-<p>This is a trivial case for mem2reg, since there are no redefinitions of the
-variable. The point of showing this is to calm your tension about inserting
-such blatent inefficiencies :).</p>
-
-<p>After the rest of the optimizers run, we get:</p>
-
-<div class="doc_code">
-<pre>
-define double @fib(double %x) {
-entry:
- %cmptmp = fcmp ult double %x, 3.000000e+00
- %booltmp = uitofp i1 %cmptmp to double
- %ifcond = fcmp ueq double %booltmp, 0.000000e+00
- br i1 %ifcond, label %else, label %ifcont
-
-else:
- %subtmp = fsub double %x, 1.000000e+00
- %calltmp = call double @fib( double %subtmp )
- %subtmp5 = fsub double %x, 2.000000e+00
- %calltmp6 = call double @fib( double %subtmp5 )
- %addtmp = fadd double %calltmp, %calltmp6
- ret double %addtmp
-
-ifcont:
- ret double 1.000000e+00
-}
-</pre>
-</div>
-
-<p>Here we see that the simplifycfg pass decided to clone the return instruction
-into the end of the 'else' block. This allowed it to eliminate some branches
-and the PHI node.</p>
-
-<p>Now that all symbol table references are updated to use stack variables,
-we'll add the assignment operator.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="assignment">New Assignment Operator</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>With our current framework, adding a new assignment operator is really
-simple. We will parse it just like any other binary operator, but handle it
-internally (instead of allowing the user to define it). The first step is to
-set a precedence:</p>
-
-<div class="doc_code">
-<pre>
- int main() {
- // Install standard binary operators.
- // 1 is lowest precedence.
- <b>BinopPrecedence['='] = 2;</b>
- BinopPrecedence['&lt;'] = 10;
- BinopPrecedence['+'] = 20;
- BinopPrecedence['-'] = 20;
-</pre>
-</div>
-
-<p>Now that the parser knows the precedence of the binary operator, it takes
-care of all the parsing and AST generation. We just need to implement codegen
-for the assignment operator. This looks like:</p>
-
-<div class="doc_code">
-<pre>
-Value *BinaryExprAST::Codegen() {
- // Special case '=' because we don't want to emit the LHS as an expression.
- if (Op == '=') {
- // Assignment requires the LHS to be an identifier.
- VariableExprAST *LHSE = dynamic_cast&lt;VariableExprAST*&gt;(LHS);
- if (!LHSE)
- return ErrorV("destination of '=' must be a variable");
-</pre>
-</div>
-
-<p>Unlike the rest of the binary operators, our assignment operator doesn't
-follow the "emit LHS, emit RHS, do computation" model. As such, it is handled
-as a special case before the other binary operators are handled. The other
-strange thing is that it requires the LHS to be a variable. It is invalid to
-have "(x+1) = expr" - only things like "x = expr" are allowed.
-</p>
-
-<div class="doc_code">
-<pre>
- // Codegen the RHS.
- Value *Val = RHS-&gt;Codegen();
- if (Val == 0) return 0;
-
- // Look up the name.
- Value *Variable = NamedValues[LHSE-&gt;getName()];
- if (Variable == 0) return ErrorV("Unknown variable name");
-
- Builder.CreateStore(Val, Variable);
- return Val;
- }
- ...
-</pre>
-</div>
-
-<p>Once we have the variable, codegen'ing the assignment is straightforward:
-we emit the RHS of the assignment, create a store, and return the computed
-value. Returning a value allows for chained assignments like "X = (Y = Z)".</p>
-
-<p>Now that we have an assignment operator, we can mutate loop variables and
-arguments. For example, we can now run code like this:</p>
-
-<div class="doc_code">
-<pre>
-# Function to print a double.
-extern printd(x);
-
-# Define ':' for sequencing: as a low-precedence operator that ignores operands
-# and just returns the RHS.
-def binary : 1 (x y) y;
-
-def test(x)
- printd(x) :
- x = 4 :
- printd(x);
-
-test(123);
-</pre>
-</div>
-
-<p>When run, this example prints "123" and then "4", showing that we did
-actually mutate the value! Okay, we have now officially implemented our goal:
-getting this to work requires SSA construction in the general case. However,
-to be really useful, we want the ability to define our own local variables, lets
-add this next!
-</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="localvars">User-defined Local
-Variables</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Adding var/in is just like any other other extensions we made to
-Kaleidoscope: we extend the lexer, the parser, the AST and the code generator.
-The first step for adding our new 'var/in' construct is to extend the lexer.
-As before, this is pretty trivial, the code looks like this:</p>
-
-<div class="doc_code">
-<pre>
-enum Token {
- ...
- <b>// var definition
- tok_var = -13</b>
-...
-}
-...
-static int gettok() {
-...
- if (IdentifierStr == "in") return tok_in;
- if (IdentifierStr == "binary") return tok_binary;
- if (IdentifierStr == "unary") return tok_unary;
- <b>if (IdentifierStr == "var") return tok_var;</b>
- return tok_identifier;
-...
-</pre>
-</div>
-
-<p>The next step is to define the AST node that we will construct. For var/in,
-it looks like this:</p>
-
-<div class="doc_code">
-<pre>
-/// VarExprAST - Expression class for var/in
-class VarExprAST : public ExprAST {
- std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
- ExprAST *Body;
-public:
- VarExprAST(const std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; &amp;varnames,
- ExprAST *body)
- : VarNames(varnames), Body(body) {}
-
- virtual Value *Codegen();
-};
-</pre>
-</div>
-
-<p>var/in allows a list of names to be defined all at once, and each name can
-optionally have an initializer value. As such, we capture this information in
-the VarNames vector. Also, var/in has a body, this body is allowed to access
-the variables defined by the var/in.</p>
-
-<p>With this in place, we can define the parser pieces. The first thing we do is add
-it as a primary expression:</p>
-
-<div class="doc_code">
-<pre>
-/// primary
-/// ::= identifierexpr
-/// ::= numberexpr
-/// ::= parenexpr
-/// ::= ifexpr
-/// ::= forexpr
-<b>/// ::= varexpr</b>
-static ExprAST *ParsePrimary() {
- switch (CurTok) {
- default: return Error("unknown token when expecting an expression");
- case tok_identifier: return ParseIdentifierExpr();
- case tok_number: return ParseNumberExpr();
- case '(': return ParseParenExpr();
- case tok_if: return ParseIfExpr();
- case tok_for: return ParseForExpr();
- <b>case tok_var: return ParseVarExpr();</b>
- }
-}
-</pre>
-</div>
-
-<p>Next we define ParseVarExpr:</p>
-
-<div class="doc_code">
-<pre>
-/// varexpr ::= 'var' identifier ('=' expression)?
-// (',' identifier ('=' expression)?)* 'in' expression
-static ExprAST *ParseVarExpr() {
- getNextToken(); // eat the var.
-
- std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
-
- // At least one variable name is required.
- if (CurTok != tok_identifier)
- return Error("expected identifier after var");
-</pre>
-</div>
-
-<p>The first part of this code parses the list of identifier/expr pairs into the
-local <tt>VarNames</tt> vector.
-
-<div class="doc_code">
-<pre>
- while (1) {
- std::string Name = IdentifierStr;
- getNextToken(); // eat identifier.
-
- // Read the optional initializer.
- ExprAST *Init = 0;
- if (CurTok == '=') {
- getNextToken(); // eat the '='.
-
- Init = ParseExpression();
- if (Init == 0) return 0;
- }
-
- VarNames.push_back(std::make_pair(Name, Init));
-
- // End of var list, exit loop.
- if (CurTok != ',') break;
- getNextToken(); // eat the ','.
-
- if (CurTok != tok_identifier)
- return Error("expected identifier list after var");
- }
-</pre>
-</div>
-
-<p>Once all the variables are parsed, we then parse the body and create the
-AST node:</p>
-
-<div class="doc_code">
-<pre>
- // At this point, we have to have 'in'.
- if (CurTok != tok_in)
- return Error("expected 'in' keyword after 'var'");
- getNextToken(); // eat 'in'.
-
- ExprAST *Body = ParseExpression();
- if (Body == 0) return 0;
-
- return new VarExprAST(VarNames, Body);
-}
-</pre>
-</div>
-
-<p>Now that we can parse and represent the code, we need to support emission of
-LLVM IR for it. This code starts out with:</p>
-
-<div class="doc_code">
-<pre>
-Value *VarExprAST::Codegen() {
- std::vector&lt;AllocaInst *&gt; OldBindings;
-
- Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
-
- // Register all variables and emit their initializer.
- for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
- const std::string &amp;VarName = VarNames[i].first;
- ExprAST *Init = VarNames[i].second;
-</pre>
-</div>
-
-<p>Basically it loops over all the variables, installing them one at a time.
-For each variable we put into the symbol table, we remember the previous value
-that we replace in OldBindings.</p>
-
-<div class="doc_code">
-<pre>
- // Emit the initializer before adding the variable to scope, this prevents
- // the initializer from referencing the variable itself, and permits stuff
- // like this:
- // var a = 1 in
- // var a = a in ... # refers to outer 'a'.
- Value *InitVal;
- if (Init) {
- InitVal = Init-&gt;Codegen();
- if (InitVal == 0) return 0;
- } else { // If not specified, use 0.0.
- InitVal = ConstantFP::get(getGlobalContext(), APFloat(0.0));
- }
-
- AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
- Builder.CreateStore(InitVal, Alloca);
-
- // Remember the old variable binding so that we can restore the binding when
- // we unrecurse.
- OldBindings.push_back(NamedValues[VarName]);
-
- // Remember this binding.
- NamedValues[VarName] = Alloca;
- }
-</pre>
-</div>
-
-<p>There are more comments here than code. The basic idea is that we emit the
-initializer, create the alloca, then update the symbol table to point to it.
-Once all the variables are installed in the symbol table, we evaluate the body
-of the var/in expression:</p>
-
-<div class="doc_code">
-<pre>
- // Codegen the body, now that all vars are in scope.
- Value *BodyVal = Body-&gt;Codegen();
- if (BodyVal == 0) return 0;
-</pre>
-</div>
-
-<p>Finally, before returning, we restore the previous variable bindings:</p>
-
-<div class="doc_code">
-<pre>
- // Pop all our variables from scope.
- for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
- NamedValues[VarNames[i].first] = OldBindings[i];
-
- // Return the body computation.
- return BodyVal;
-}
-</pre>
-</div>
-
-<p>The end result of all of this is that we get properly scoped variable
-definitions, and we even (trivially) allow mutation of them :).</p>
-
-<p>With this, we completed what we set out to do. Our nice iterative fib
-example from the intro compiles and runs just fine. The mem2reg pass optimizes
-all of our stack variables into SSA registers, inserting PHI nodes where needed,
-and our front-end remains simple: no "iterated dominance frontier" computation
-anywhere in sight.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="code">Full Code Listing</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>
-Here is the complete code listing for our running example, enhanced with mutable
-variables and var/in support. To build this example, use:
-</p>
-
-<div class="doc_code">
-<pre>
- # Compile
- g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
- # Run
- ./toy
-</pre>
-</div>
-
-<p>Here is the code:</p>
-
-<div class="doc_code">
-<pre>
-#include "llvm/DerivedTypes.h"
-#include "llvm/ExecutionEngine/ExecutionEngine.h"
-#include "llvm/ExecutionEngine/JIT.h"
-#include "llvm/LLVMContext.h"
-#include "llvm/Module.h"
-#include "llvm/PassManager.h"
-#include "llvm/Analysis/Verifier.h"
-#include "llvm/Target/TargetData.h"
-#include "llvm/Target/TargetSelect.h"
-#include "llvm/Transforms/Scalar.h"
-#include "llvm/Support/IRBuilder.h"
-#include &lt;cstdio&gt;
-#include &lt;string&gt;
-#include &lt;map&gt;
-#include &lt;vector&gt;
-using namespace llvm;
-
-//===----------------------------------------------------------------------===//
-// Lexer
-//===----------------------------------------------------------------------===//
-
-// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
-// of these for known things.
-enum Token {
- tok_eof = -1,
-
- // commands
- tok_def = -2, tok_extern = -3,
-
- // primary
- tok_identifier = -4, tok_number = -5,
-
- // control
- tok_if = -6, tok_then = -7, tok_else = -8,
- tok_for = -9, tok_in = -10,
-
- // operators
- tok_binary = -11, tok_unary = -12,
-
- // var definition
- tok_var = -13
-};
-
-static std::string IdentifierStr; // Filled in if tok_identifier
-static double NumVal; // Filled in if tok_number
-
-/// gettok - Return the next token from standard input.
-static int gettok() {
- static int LastChar = ' ';
-
- // Skip any whitespace.
- while (isspace(LastChar))
- LastChar = getchar();
-
- if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
- IdentifierStr = LastChar;
- while (isalnum((LastChar = getchar())))
- IdentifierStr += LastChar;
-
- if (IdentifierStr == "def") return tok_def;
- if (IdentifierStr == "extern") return tok_extern;
- if (IdentifierStr == "if") return tok_if;
- if (IdentifierStr == "then") return tok_then;
- if (IdentifierStr == "else") return tok_else;
- if (IdentifierStr == "for") return tok_for;
- if (IdentifierStr == "in") return tok_in;
- if (IdentifierStr == "binary") return tok_binary;
- if (IdentifierStr == "unary") return tok_unary;
- if (IdentifierStr == "var") return tok_var;
- return tok_identifier;
- }
-
- if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
- std::string NumStr;
- do {
- NumStr += LastChar;
- LastChar = getchar();
- } while (isdigit(LastChar) || LastChar == '.');
-
- NumVal = strtod(NumStr.c_str(), 0);
- return tok_number;
- }
-
- if (LastChar == '#') {
- // Comment until end of line.
- do LastChar = getchar();
- while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp;&amp; LastChar != '\r');
-
- if (LastChar != EOF)
- return gettok();
- }
-
- // Check for end of file. Don't eat the EOF.
- if (LastChar == EOF)
- return tok_eof;
-
- // Otherwise, just return the character as its ascii value.
- int ThisChar = LastChar;
- LastChar = getchar();
- return ThisChar;
-}
-
-//===----------------------------------------------------------------------===//
-// Abstract Syntax Tree (aka Parse Tree)
-//===----------------------------------------------------------------------===//
-
-/// ExprAST - Base class for all expression nodes.
-class ExprAST {
-public:
- virtual ~ExprAST() {}
- virtual Value *Codegen() = 0;
-};
-
-/// NumberExprAST - Expression class for numeric literals like "1.0".
-class NumberExprAST : public ExprAST {
- double Val;
-public:
- NumberExprAST(double val) : Val(val) {}
- virtual Value *Codegen();
-};
-
-/// VariableExprAST - Expression class for referencing a variable, like "a".
-class VariableExprAST : public ExprAST {
- std::string Name;
-public:
- VariableExprAST(const std::string &amp;name) : Name(name) {}
- const std::string &amp;getName() const { return Name; }
- virtual Value *Codegen();
-};
-
-/// UnaryExprAST - Expression class for a unary operator.
-class UnaryExprAST : public ExprAST {
- char Opcode;
- ExprAST *Operand;
-public:
- UnaryExprAST(char opcode, ExprAST *operand)
- : Opcode(opcode), Operand(operand) {}
- virtual Value *Codegen();
-};
-
-/// BinaryExprAST - Expression class for a binary operator.
-class BinaryExprAST : public ExprAST {
- char Op;
- ExprAST *LHS, *RHS;
-public:
- BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
- : Op(op), LHS(lhs), RHS(rhs) {}
- virtual Value *Codegen();
-};
-
-/// CallExprAST - Expression class for function calls.
-class CallExprAST : public ExprAST {
- std::string Callee;
- std::vector&lt;ExprAST*&gt; Args;
-public:
- CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
- : Callee(callee), Args(args) {}
- virtual Value *Codegen();
-};
-
-/// IfExprAST - Expression class for if/then/else.
-class IfExprAST : public ExprAST {
- ExprAST *Cond, *Then, *Else;
-public:
- IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
- : Cond(cond), Then(then), Else(_else) {}
- virtual Value *Codegen();
-};
-
-/// ForExprAST - Expression class for for/in.
-class ForExprAST : public ExprAST {
- std::string VarName;
- ExprAST *Start, *End, *Step, *Body;
-public:
- ForExprAST(const std::string &amp;varname, ExprAST *start, ExprAST *end,
- ExprAST *step, ExprAST *body)
- : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
- virtual Value *Codegen();
-};
-
-/// VarExprAST - Expression class for var/in
-class VarExprAST : public ExprAST {
- std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
- ExprAST *Body;
-public:
- VarExprAST(const std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; &amp;varnames,
- ExprAST *body)
- : VarNames(varnames), Body(body) {}
-
- virtual Value *Codegen();
-};
-
-/// PrototypeAST - This class represents the "prototype" for a function,
-/// which captures its name, and its argument names (thus implicitly the number
-/// of arguments the function takes), as well as if it is an operator.
-class PrototypeAST {
- std::string Name;
- std::vector&lt;std::string&gt; Args;
- bool isOperator;
- unsigned Precedence; // Precedence if a binary op.
-public:
- PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args,
- bool isoperator = false, unsigned prec = 0)
- : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {}
-
- bool isUnaryOp() const { return isOperator &amp;&amp; Args.size() == 1; }
- bool isBinaryOp() const { return isOperator &amp;&amp; Args.size() == 2; }
-
- char getOperatorName() const {
- assert(isUnaryOp() || isBinaryOp());
- return Name[Name.size()-1];
- }
-
- unsigned getBinaryPrecedence() const { return Precedence; }
-
- Function *Codegen();
-
- void CreateArgumentAllocas(Function *F);
-};
-
-/// FunctionAST - This class represents a function definition itself.
-class FunctionAST {
- PrototypeAST *Proto;
- ExprAST *Body;
-public:
- FunctionAST(PrototypeAST *proto, ExprAST *body)
- : Proto(proto), Body(body) {}
-
- Function *Codegen();
-};
-
-//===----------------------------------------------------------------------===//
-// Parser
-//===----------------------------------------------------------------------===//
-
-/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
-/// token the parser is looking at. getNextToken reads another token from the
-/// lexer and updates CurTok with its results.
-static int CurTok;
-static int getNextToken() {
- return CurTok = gettok();
-}
-
-/// BinopPrecedence - This holds the precedence for each binary operator that is
-/// defined.
-static std::map&lt;char, int&gt; BinopPrecedence;
-
-/// GetTokPrecedence - Get the precedence of the pending binary operator token.
-static int GetTokPrecedence() {
- if (!isascii(CurTok))
- return -1;
-
- // Make sure it's a declared binop.
- int TokPrec = BinopPrecedence[CurTok];
- if (TokPrec &lt;= 0) return -1;
- return TokPrec;
-}
-
-/// Error* - These are little helper functions for error handling.
-ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
-PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
-FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
-
-static ExprAST *ParseExpression();
-
-/// identifierexpr
-/// ::= identifier
-/// ::= identifier '(' expression* ')'
-static ExprAST *ParseIdentifierExpr() {
- std::string IdName = IdentifierStr;
-
- getNextToken(); // eat identifier.
-
- if (CurTok != '(') // Simple variable ref.
- return new VariableExprAST(IdName);
-
- // Call.
- getNextToken(); // eat (
- std::vector&lt;ExprAST*&gt; Args;
- if (CurTok != ')') {
- while (1) {
- ExprAST *Arg = ParseExpression();
- if (!Arg) return 0;
- Args.push_back(Arg);
-
- if (CurTok == ')') break;
-
- if (CurTok != ',')
- return Error("Expected ')' or ',' in argument list");
- getNextToken();
- }
- }
-
- // Eat the ')'.
- getNextToken();
-
- return new CallExprAST(IdName, Args);
-}
-
-/// numberexpr ::= number
-static ExprAST *ParseNumberExpr() {
- ExprAST *Result = new NumberExprAST(NumVal);
- getNextToken(); // consume the number
- return Result;
-}
-
-/// parenexpr ::= '(' expression ')'
-static ExprAST *ParseParenExpr() {
- getNextToken(); // eat (.
- ExprAST *V = ParseExpression();
- if (!V) return 0;
-
- if (CurTok != ')')
- return Error("expected ')'");
- getNextToken(); // eat ).
- return V;
-}
-
-/// ifexpr ::= 'if' expression 'then' expression 'else' expression
-static ExprAST *ParseIfExpr() {
- getNextToken(); // eat the if.
-
- // condition.
- ExprAST *Cond = ParseExpression();
- if (!Cond) return 0;
-
- if (CurTok != tok_then)
- return Error("expected then");
- getNextToken(); // eat the then
-
- ExprAST *Then = ParseExpression();
- if (Then == 0) return 0;
-
- if (CurTok != tok_else)
- return Error("expected else");
-
- getNextToken();
-
- ExprAST *Else = ParseExpression();
- if (!Else) return 0;
-
- return new IfExprAST(Cond, Then, Else);
-}
-
-/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
-static ExprAST *ParseForExpr() {
- getNextToken(); // eat the for.
-
- if (CurTok != tok_identifier)
- return Error("expected identifier after for");
-
- std::string IdName = IdentifierStr;
- getNextToken(); // eat identifier.
-
- if (CurTok != '=')
- return Error("expected '=' after for");
- getNextToken(); // eat '='.
-
-
- ExprAST *Start = ParseExpression();
- if (Start == 0) return 0;
- if (CurTok != ',')
- return Error("expected ',' after for start value");
- getNextToken();
-
- ExprAST *End = ParseExpression();
- if (End == 0) return 0;
-
- // The step value is optional.
- ExprAST *Step = 0;
- if (CurTok == ',') {
- getNextToken();
- Step = ParseExpression();
- if (Step == 0) return 0;
- }
-
- if (CurTok != tok_in)
- return Error("expected 'in' after for");
- getNextToken(); // eat 'in'.
-
- ExprAST *Body = ParseExpression();
- if (Body == 0) return 0;
-
- return new ForExprAST(IdName, Start, End, Step, Body);
-}
-
-/// varexpr ::= 'var' identifier ('=' expression)?
-// (',' identifier ('=' expression)?)* 'in' expression
-static ExprAST *ParseVarExpr() {
- getNextToken(); // eat the var.
-
- std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
-
- // At least one variable name is required.
- if (CurTok != tok_identifier)
- return Error("expected identifier after var");
-
- while (1) {
- std::string Name = IdentifierStr;
- getNextToken(); // eat identifier.
-
- // Read the optional initializer.
- ExprAST *Init = 0;
- if (CurTok == '=') {
- getNextToken(); // eat the '='.
-
- Init = ParseExpression();
- if (Init == 0) return 0;
- }
-
- VarNames.push_back(std::make_pair(Name, Init));
-
- // End of var list, exit loop.
- if (CurTok != ',') break;
- getNextToken(); // eat the ','.
-
- if (CurTok != tok_identifier)
- return Error("expected identifier list after var");
- }
-
- // At this point, we have to have 'in'.
- if (CurTok != tok_in)
- return Error("expected 'in' keyword after 'var'");
- getNextToken(); // eat 'in'.
-
- ExprAST *Body = ParseExpression();
- if (Body == 0) return 0;
-
- return new VarExprAST(VarNames, Body);
-}
-
-/// primary
-/// ::= identifierexpr
-/// ::= numberexpr
-/// ::= parenexpr
-/// ::= ifexpr
-/// ::= forexpr
-/// ::= varexpr
-static ExprAST *ParsePrimary() {
- switch (CurTok) {
- default: return Error("unknown token when expecting an expression");
- case tok_identifier: return ParseIdentifierExpr();
- case tok_number: return ParseNumberExpr();
- case '(': return ParseParenExpr();
- case tok_if: return ParseIfExpr();
- case tok_for: return ParseForExpr();
- case tok_var: return ParseVarExpr();
- }
-}
-
-/// unary
-/// ::= primary
-/// ::= '!' unary
-static ExprAST *ParseUnary() {
- // If the current token is not an operator, it must be a primary expr.
- if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
- return ParsePrimary();
-
- // If this is a unary operator, read it.
- int Opc = CurTok;
- getNextToken();
- if (ExprAST *Operand = ParseUnary())
- return new UnaryExprAST(Opc, Operand);
- return 0;
-}
-
-/// binoprhs
-/// ::= ('+' unary)*
-static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
- // If this is a binop, find its precedence.
- while (1) {
- int TokPrec = GetTokPrecedence();
-
- // If this is a binop that binds at least as tightly as the current binop,
- // consume it, otherwise we are done.
- if (TokPrec &lt; ExprPrec)
- return LHS;
-
- // Okay, we know this is a binop.
- int BinOp = CurTok;
- getNextToken(); // eat binop
-
- // Parse the unary expression after the binary operator.
- ExprAST *RHS = ParseUnary();
- if (!RHS) return 0;
-
- // If BinOp binds less tightly with RHS than the operator after RHS, let
- // the pending operator take RHS as its LHS.
- int NextPrec = GetTokPrecedence();
- if (TokPrec &lt; NextPrec) {
- RHS = ParseBinOpRHS(TokPrec+1, RHS);
- if (RHS == 0) return 0;
- }
-
- // Merge LHS/RHS.
- LHS = new BinaryExprAST(BinOp, LHS, RHS);
- }
-}
-
-/// expression
-/// ::= unary binoprhs
-///
-static ExprAST *ParseExpression() {
- ExprAST *LHS = ParseUnary();
- if (!LHS) return 0;
-
- return ParseBinOpRHS(0, LHS);
-}
-
-/// prototype
-/// ::= id '(' id* ')'
-/// ::= binary LETTER number? (id, id)
-/// ::= unary LETTER (id)
-static PrototypeAST *ParsePrototype() {
- std::string FnName;
-
- unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
- unsigned BinaryPrecedence = 30;
-
- switch (CurTok) {
- default:
- return ErrorP("Expected function name in prototype");
- case tok_identifier:
- FnName = IdentifierStr;
- Kind = 0;
- getNextToken();
- break;
- case tok_unary:
- getNextToken();
- if (!isascii(CurTok))
- return ErrorP("Expected unary operator");
- FnName = "unary";
- FnName += (char)CurTok;
- Kind = 1;
- getNextToken();
- break;
- case tok_binary:
- getNextToken();
- if (!isascii(CurTok))
- return ErrorP("Expected binary operator");
- FnName = "binary";
- FnName += (char)CurTok;
- Kind = 2;
- getNextToken();
-
- // Read the precedence if present.
- if (CurTok == tok_number) {
- if (NumVal &lt; 1 || NumVal &gt; 100)
- return ErrorP("Invalid precedecnce: must be 1..100");
- BinaryPrecedence = (unsigned)NumVal;
- getNextToken();
- }
- break;
- }
-
- if (CurTok != '(')
- return ErrorP("Expected '(' in prototype");
-
- std::vector&lt;std::string&gt; ArgNames;
- while (getNextToken() == tok_identifier)
- ArgNames.push_back(IdentifierStr);
- if (CurTok != ')')
- return ErrorP("Expected ')' in prototype");
-
- // success.
- getNextToken(); // eat ')'.
-
- // Verify right number of names for operator.
- if (Kind &amp;&amp; ArgNames.size() != Kind)
- return ErrorP("Invalid number of operands for operator");
-
- return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence);
-}
-
-/// definition ::= 'def' prototype expression
-static FunctionAST *ParseDefinition() {
- getNextToken(); // eat def.
- PrototypeAST *Proto = ParsePrototype();
- if (Proto == 0) return 0;
-
- if (ExprAST *E = ParseExpression())
- return new FunctionAST(Proto, E);
- return 0;
-}
-
-/// toplevelexpr ::= expression
-static FunctionAST *ParseTopLevelExpr() {
- if (ExprAST *E = ParseExpression()) {
- // Make an anonymous proto.
- PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
- return new FunctionAST(Proto, E);
- }
- return 0;
-}
-
-/// external ::= 'extern' prototype
-static PrototypeAST *ParseExtern() {
- getNextToken(); // eat extern.
- return ParsePrototype();
-}
-
-//===----------------------------------------------------------------------===//
-// Code Generation
-//===----------------------------------------------------------------------===//
-
-static Module *TheModule;
-static IRBuilder&lt;&gt; Builder(getGlobalContext());
-static std::map&lt;std::string, AllocaInst*&gt; NamedValues;
-static FunctionPassManager *TheFPM;
-
-Value *ErrorV(const char *Str) { Error(Str); return 0; }
-
-/// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
-/// the function. This is used for mutable variables etc.
-static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
- const std::string &amp;VarName) {
- IRBuilder&lt;&gt; TmpB(&amp;TheFunction-&gt;getEntryBlock(),
- TheFunction-&gt;getEntryBlock().begin());
- return TmpB.CreateAlloca(Type::getDoubleTy(getGlobalContext()), 0,
- VarName.c_str());
-}
-
-Value *NumberExprAST::Codegen() {
- return ConstantFP::get(getGlobalContext(), APFloat(Val));
-}
-
-Value *VariableExprAST::Codegen() {
- // Look this variable up in the function.
- Value *V = NamedValues[Name];
- if (V == 0) return ErrorV("Unknown variable name");
-
- // Load the value.
- return Builder.CreateLoad(V, Name.c_str());
-}
-
-Value *UnaryExprAST::Codegen() {
- Value *OperandV = Operand-&gt;Codegen();
- if (OperandV == 0) return 0;
-
- Function *F = TheModule-&gt;getFunction(std::string("unary")+Opcode);
- if (F == 0)
- return ErrorV("Unknown unary operator");
-
- return Builder.CreateCall(F, OperandV, "unop");
-}
-
-Value *BinaryExprAST::Codegen() {
- // Special case '=' because we don't want to emit the LHS as an expression.
- if (Op == '=') {
- // Assignment requires the LHS to be an identifier.
- VariableExprAST *LHSE = dynamic_cast&lt;VariableExprAST*&gt;(LHS);
- if (!LHSE)
- return ErrorV("destination of '=' must be a variable");
- // Codegen the RHS.
- Value *Val = RHS-&gt;Codegen();
- if (Val == 0) return 0;
-
- // Look up the name.
- Value *Variable = NamedValues[LHSE-&gt;getName()];
- if (Variable == 0) return ErrorV("Unknown variable name");
-
- Builder.CreateStore(Val, Variable);
- return Val;
- }
-
- Value *L = LHS-&gt;Codegen();
- Value *R = RHS-&gt;Codegen();
- if (L == 0 || R == 0) return 0;
-
- switch (Op) {
- case '+': return Builder.CreateAdd(L, R, "addtmp");
- case '-': return Builder.CreateSub(L, R, "subtmp");
- case '*': return Builder.CreateMul(L, R, "multmp");
- case '&lt;':
- L = Builder.CreateFCmpULT(L, R, "cmptmp");
- // Convert bool 0/1 to double 0.0 or 1.0
- return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
- "booltmp");
- default: break;
- }
-
- // If it wasn't a builtin binary operator, it must be a user defined one. Emit
- // a call to it.
- Function *F = TheModule-&gt;getFunction(std::string("binary")+Op);
- assert(F &amp;&amp; "binary operator not found!");
-
- Value *Ops[] = { L, R };
- return Builder.CreateCall(F, Ops, Ops+2, "binop");
-}
-
-Value *CallExprAST::Codegen() {
- // Look up the name in the global module table.
- Function *CalleeF = TheModule-&gt;getFunction(Callee);
- if (CalleeF == 0)
- return ErrorV("Unknown function referenced");
-
- // If argument mismatch error.
- if (CalleeF-&gt;arg_size() != Args.size())
- return ErrorV("Incorrect # arguments passed");
-
- std::vector&lt;Value*&gt; ArgsV;
- for (unsigned i = 0, e = Args.size(); i != e; ++i) {
- ArgsV.push_back(Args[i]-&gt;Codegen());
- if (ArgsV.back() == 0) return 0;
- }
-
- return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
-}
-
-Value *IfExprAST::Codegen() {
- Value *CondV = Cond-&gt;Codegen();
- if (CondV == 0) return 0;
-
- // Convert condition to a bool by comparing equal to 0.0.
- CondV = Builder.CreateFCmpONE(CondV,
- ConstantFP::get(getGlobalContext(), APFloat(0.0)),
- "ifcond");
-
- Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
-
- // Create blocks for the then and else cases. Insert the 'then' block at the
- // end of the function.
- BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction);
- BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else");
- BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont");
-
- Builder.CreateCondBr(CondV, ThenBB, ElseBB);
-
- // Emit then value.
- Builder.SetInsertPoint(ThenBB);
-
- Value *ThenV = Then-&gt;Codegen();
- if (ThenV == 0) return 0;
-
- Builder.CreateBr(MergeBB);
- // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
- ThenBB = Builder.GetInsertBlock();
-
- // Emit else block.
- TheFunction-&gt;getBasicBlockList().push_back(ElseBB);
- Builder.SetInsertPoint(ElseBB);
-
- Value *ElseV = Else-&gt;Codegen();
- if (ElseV == 0) return 0;
-
- Builder.CreateBr(MergeBB);
- // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
- ElseBB = Builder.GetInsertBlock();
-
- // Emit merge block.
- TheFunction-&gt;getBasicBlockList().push_back(MergeBB);
- Builder.SetInsertPoint(MergeBB);
- PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()),
- "iftmp");
-
- PN-&gt;addIncoming(ThenV, ThenBB);
- PN-&gt;addIncoming(ElseV, ElseBB);
- return PN;
-}
-
-Value *ForExprAST::Codegen() {
- // Output this as:
- // var = alloca double
- // ...
- // start = startexpr
- // store start -&gt; var
- // goto loop
- // loop:
- // ...
- // bodyexpr
- // ...
- // loopend:
- // step = stepexpr
- // endcond = endexpr
- //
- // curvar = load var
- // nextvar = curvar + step
- // store nextvar -&gt; var
- // br endcond, loop, endloop
- // outloop:
-
- Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
-
- // Create an alloca for the variable in the entry block.
- AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
-
- // Emit the start code first, without 'variable' in scope.
- Value *StartVal = Start-&gt;Codegen();
- if (StartVal == 0) return 0;
-
- // Store the value into the alloca.
- Builder.CreateStore(StartVal, Alloca);
-
- // Make the new basic block for the loop header, inserting after current
- // block.
- BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction);
-
- // Insert an explicit fall through from the current block to the LoopBB.
- Builder.CreateBr(LoopBB);
-
- // Start insertion in LoopBB.
- Builder.SetInsertPoint(LoopBB);
-
- // Within the loop, the variable is defined equal to the PHI node. If it
- // shadows an existing variable, we have to restore it, so save it now.
- AllocaInst *OldVal = NamedValues[VarName];
- NamedValues[VarName] = Alloca;
-
- // Emit the body of the loop. This, like any other expr, can change the
- // current BB. Note that we ignore the value computed by the body, but don't
- // allow an error.
- if (Body-&gt;Codegen() == 0)
- return 0;
-
- // Emit the step value.
- Value *StepVal;
- if (Step) {
- StepVal = Step-&gt;Codegen();
- if (StepVal == 0) return 0;
- } else {
- // If not specified, use 1.0.
- StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0));
- }
-
- // Compute the end condition.
- Value *EndCond = End-&gt;Codegen();
- if (EndCond == 0) return EndCond;
-
- // Reload, increment, and restore the alloca. This handles the case where
- // the body of the loop mutates the variable.
- Value *CurVar = Builder.CreateLoad(Alloca, VarName.c_str());
- Value *NextVar = Builder.CreateAdd(CurVar, StepVal, "nextvar");
- Builder.CreateStore(NextVar, Alloca);
-
- // Convert condition to a bool by comparing equal to 0.0.
- EndCond = Builder.CreateFCmpONE(EndCond,
- ConstantFP::get(getGlobalContext(), APFloat(0.0)),
- "loopcond");
-
- // Create the "after loop" block and insert it.
- BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction);
-
- // Insert the conditional branch into the end of LoopEndBB.
- Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
-
- // Any new code will be inserted in AfterBB.
- Builder.SetInsertPoint(AfterBB);
-
- // Restore the unshadowed variable.
- if (OldVal)
- NamedValues[VarName] = OldVal;
- else
- NamedValues.erase(VarName);
-
-
- // for expr always returns 0.0.
- return Constant::getNullValue(Type::getDoubleTy(getGlobalContext()));
-}
-
-Value *VarExprAST::Codegen() {
- std::vector&lt;AllocaInst *&gt; OldBindings;
-
- Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
-
- // Register all variables and emit their initializer.
- for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
- const std::string &amp;VarName = VarNames[i].first;
- ExprAST *Init = VarNames[i].second;
-
- // Emit the initializer before adding the variable to scope, this prevents
- // the initializer from referencing the variable itself, and permits stuff
- // like this:
- // var a = 1 in
- // var a = a in ... # refers to outer 'a'.
- Value *InitVal;
- if (Init) {
- InitVal = Init-&gt;Codegen();
- if (InitVal == 0) return 0;
- } else { // If not specified, use 0.0.
- InitVal = ConstantFP::get(getGlobalContext(), APFloat(0.0));
- }
-
- AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
- Builder.CreateStore(InitVal, Alloca);
-
- // Remember the old variable binding so that we can restore the binding when
- // we unrecurse.
- OldBindings.push_back(NamedValues[VarName]);
-
- // Remember this binding.
- NamedValues[VarName] = Alloca;
- }
-
- // Codegen the body, now that all vars are in scope.
- Value *BodyVal = Body-&gt;Codegen();
- if (BodyVal == 0) return 0;
-
- // Pop all our variables from scope.
- for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
- NamedValues[VarNames[i].first] = OldBindings[i];
-
- // Return the body computation.
- return BodyVal;
-}
-
-Function *PrototypeAST::Codegen() {
- // Make the function type: double(double,double) etc.
- std::vector&lt;const Type*&gt; Doubles(Args.size(),
- Type::getDoubleTy(getGlobalContext()));
- FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()),
- Doubles, false);
-
- Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule);
-
- // If F conflicted, there was already something named 'Name'. If it has a
- // body, don't allow redefinition or reextern.
- if (F-&gt;getName() != Name) {
- // Delete the one we just made and get the existing one.
- F-&gt;eraseFromParent();
- F = TheModule-&gt;getFunction(Name);
-
- // If F already has a body, reject this.
- if (!F-&gt;empty()) {
- ErrorF("redefinition of function");
- return 0;
- }
-
- // If F took a different number of args, reject.
- if (F-&gt;arg_size() != Args.size()) {
- ErrorF("redefinition of function with different # args");
- return 0;
- }
- }
-
- // Set names for all arguments.
- unsigned Idx = 0;
- for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
- ++AI, ++Idx)
- AI-&gt;setName(Args[Idx]);
-
- return F;
-}
-
-/// CreateArgumentAllocas - Create an alloca for each argument and register the
-/// argument in the symbol table so that references to it will succeed.
-void PrototypeAST::CreateArgumentAllocas(Function *F) {
- Function::arg_iterator AI = F-&gt;arg_begin();
- for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
- // Create an alloca for this variable.
- AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
-
- // Store the initial value into the alloca.
- Builder.CreateStore(AI, Alloca);
-
- // Add arguments to variable symbol table.
- NamedValues[Args[Idx]] = Alloca;
- }
-}
-
-Function *FunctionAST::Codegen() {
- NamedValues.clear();
-
- Function *TheFunction = Proto-&gt;Codegen();
- if (TheFunction == 0)
- return 0;
-
- // If this is an operator, install it.
- if (Proto-&gt;isBinaryOp())
- BinopPrecedence[Proto-&gt;getOperatorName()] = Proto-&gt;getBinaryPrecedence();
-
- // Create a new basic block to start insertion into.
- BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
- Builder.SetInsertPoint(BB);
-
- // Add all arguments to the symbol table and create their allocas.
- Proto-&gt;CreateArgumentAllocas(TheFunction);
-
- if (Value *RetVal = Body-&gt;Codegen()) {
- // Finish off the function.
- Builder.CreateRet(RetVal);
-
- // Validate the generated code, checking for consistency.
- verifyFunction(*TheFunction);
-
- // Optimize the function.
- TheFPM-&gt;run(*TheFunction);
-
- return TheFunction;
- }
-
- // Error reading body, remove function.
- TheFunction-&gt;eraseFromParent();
-
- if (Proto-&gt;isBinaryOp())
- BinopPrecedence.erase(Proto-&gt;getOperatorName());
- return 0;
-}
-
-//===----------------------------------------------------------------------===//
-// Top-Level parsing and JIT Driver
-//===----------------------------------------------------------------------===//
-
-static ExecutionEngine *TheExecutionEngine;
-
-static void HandleDefinition() {
- if (FunctionAST *F = ParseDefinition()) {
- if (Function *LF = F-&gt;Codegen()) {
- fprintf(stderr, "Read function definition:");
- LF-&gt;dump();
- }
- } else {
- // Skip token for error recovery.
- getNextToken();
- }
-}
-
-static void HandleExtern() {
- if (PrototypeAST *P = ParseExtern()) {
- if (Function *F = P-&gt;Codegen()) {
- fprintf(stderr, "Read extern: ");
- F-&gt;dump();
- }
- } else {
- // Skip token for error recovery.
- getNextToken();
- }
-}
-
-static void HandleTopLevelExpression() {
- // Evaluate a top-level expression into an anonymous function.
- if (FunctionAST *F = ParseTopLevelExpr()) {
- if (Function *LF = F-&gt;Codegen()) {
- // JIT the function, returning a function pointer.
- void *FPtr = TheExecutionEngine-&gt;getPointerToFunction(LF);
-
- // Cast it to the right type (takes no arguments, returns a double) so we
- // can call it as a native function.
- double (*FP)() = (double (*)())(intptr_t)FPtr;
- fprintf(stderr, "Evaluated to %f\n", FP());
- }
- } else {
- // Skip token for error recovery.
- getNextToken();
- }
-}
-
-/// top ::= definition | external | expression | ';'
-static void MainLoop() {
- while (1) {
- fprintf(stderr, "ready&gt; ");
- switch (CurTok) {
- case tok_eof: return;
- case ';': getNextToken(); break; // ignore top-level semicolons.
- case tok_def: HandleDefinition(); break;
- case tok_extern: HandleExtern(); break;
- default: HandleTopLevelExpression(); break;
- }
- }
-}
-
-//===----------------------------------------------------------------------===//
-// "Library" functions that can be "extern'd" from user code.
-//===----------------------------------------------------------------------===//
-
-/// putchard - putchar that takes a double and returns 0.
-extern "C"
-double putchard(double X) {
- putchar((char)X);
- return 0;
-}
-
-/// printd - printf that takes a double prints it as "%f\n", returning 0.
-extern "C"
-double printd(double X) {
- printf("%f\n", X);
- return 0;
-}
-
-//===----------------------------------------------------------------------===//
-// Main driver code.
-//===----------------------------------------------------------------------===//
-
-int main() {
- InitializeNativeTarget();
- LLVMContext &amp;Context = getGlobalContext();
-
- // Install standard binary operators.
- // 1 is lowest precedence.
- BinopPrecedence['='] = 2;
- BinopPrecedence['&lt;'] = 10;
- BinopPrecedence['+'] = 20;
- BinopPrecedence['-'] = 20;
- BinopPrecedence['*'] = 40; // highest.
-
- // Prime the first token.
- fprintf(stderr, "ready&gt; ");
- getNextToken();
-
- // Make the module, which holds all the code.
- TheModule = new Module("my cool jit", Context);
-
- // Create the JIT. This takes ownership of the module.
- std::string ErrStr;
- TheExecutionEngine = EngineBuilder(TheModule).setErrorStr(&amp;ErrStr).create();
- if (!TheExecutionEngine) {
- fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str());
- exit(1);
- }
-
- FunctionPassManager OurFPM(TheModule);
-
- // Set up the optimizer pipeline. Start with registering info about how the
- // target lays out data structures.
- OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
- // Promote allocas to registers.
- OurFPM.add(createPromoteMemoryToRegisterPass());
- // Do simple "peephole" optimizations and bit-twiddling optzns.
- OurFPM.add(createInstructionCombiningPass());
- // Reassociate expressions.
- OurFPM.add(createReassociatePass());
- // Eliminate Common SubExpressions.
- OurFPM.add(createGVNPass());
- // Simplify the control flow graph (deleting unreachable blocks, etc).
- OurFPM.add(createCFGSimplificationPass());
-
- OurFPM.doInitialization();
-
- // Set the global so the code gen can use this.
- TheFPM = &amp;OurFPM;
-
- // Run the main "interpreter loop" now.
- MainLoop();
-
- TheFPM = 0;
-
- // Print out all of the generated code.
- TheModule-&gt;dump();
-
- return 0;
-}
-</pre>
-</div>
-
-<a href="LangImpl8.html">Next: Conclusion and other useful LLVM tidbits</a>
-</div>
-
-<!-- *********************************************************************** -->
-<hr>
-<address>
- <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
- src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
- <a href="http://validator.w3.org/check/referer"><img
- src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
-
- <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
- <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
- Last modified: $Date$
-</address>
-</body>
-</html>
diff --git a/docs/tutorial/LangImpl8.html b/docs/tutorial/LangImpl8.html
deleted file mode 100644
index 64a6200..0000000
--- a/docs/tutorial/LangImpl8.html
+++ /dev/null
@@ -1,365 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
- "http://www.w3.org/TR/html4/strict.dtd">
-
-<html>
-<head>
- <title>Kaleidoscope: Conclusion and other useful LLVM tidbits</title>
- <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
- <meta name="author" content="Chris Lattner">
- <link rel="stylesheet" href="../llvm.css" type="text/css">
-</head>
-
-<body>
-
-<div class="doc_title">Kaleidoscope: Conclusion and other useful LLVM
- tidbits</div>
-
-<ul>
-<li><a href="index.html">Up to Tutorial Index</a></li>
-<li>Chapter 8
- <ol>
- <li><a href="#conclusion">Tutorial Conclusion</a></li>
- <li><a href="#llvmirproperties">Properties of LLVM IR</a>
- <ul>
- <li><a href="#targetindep">Target Independence</a></li>
- <li><a href="#safety">Safety Guarantees</a></li>
- <li><a href="#langspecific">Language-Specific Optimizations</a></li>
- </ul>
- </li>
- <li><a href="#tipsandtricks">Tips and Tricks</a>
- <ul>
- <li><a href="#offsetofsizeof">Implementing portable
- offsetof/sizeof</a></li>
- <li><a href="#gcstack">Garbage Collected Stack Frames</a></li>
- </ul>
- </li>
- </ol>
-</li>
-</ul>
-
-
-<div class="doc_author">
- <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="conclusion">Tutorial Conclusion</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Welcome to the the final chapter of the "<a href="index.html">Implementing a
-language with LLVM</a>" tutorial. In the course of this tutorial, we have grown
-our little Kaleidoscope language from being a useless toy, to being a
-semi-interesting (but probably still useless) toy. :)</p>
-
-<p>It is interesting to see how far we've come, and how little code it has
-taken. We built the entire lexer, parser, AST, code generator, and an
-interactive run-loop (with a JIT!) by-hand in under 700 lines of
-(non-comment/non-blank) code.</p>
-
-<p>Our little language supports a couple of interesting features: it supports
-user defined binary and unary operators, it uses JIT compilation for immediate
-evaluation, and it supports a few control flow constructs with SSA construction.
-</p>
-
-<p>Part of the idea of this tutorial was to show you how easy and fun it can be
-to define, build, and play with languages. Building a compiler need not be a
-scary or mystical process! Now that you've seen some of the basics, I strongly
-encourage you to take the code and hack on it. For example, try adding:</p>
-
-<ul>
-<li><b>global variables</b> - While global variables have questional value in
-modern software engineering, they are often useful when putting together quick
-little hacks like the Kaleidoscope compiler itself. Fortunately, our current
-setup makes it very easy to add global variables: just have value lookup check
-to see if an unresolved variable is in the global variable symbol table before
-rejecting it. To create a new global variable, make an instance of the LLVM
-<tt>GlobalVariable</tt> class.</li>
-
-<li><b>typed variables</b> - Kaleidoscope currently only supports variables of
-type double. This gives the language a very nice elegance, because only
-supporting one type means that you never have to specify types. Different
-languages have different ways of handling this. The easiest way is to require
-the user to specify types for every variable definition, and record the type
-of the variable in the symbol table along with its Value*.</li>
-
-<li><b>arrays, structs, vectors, etc</b> - Once you add types, you can start
-extending the type system in all sorts of interesting ways. Simple arrays are
-very easy and are quite useful for many different applications. Adding them is
-mostly an exercise in learning how the LLVM <a
-href="../LangRef.html#i_getelementptr">getelementptr</a> instruction works: it
-is so nifty/unconventional, it <a
-href="../GetElementPtr.html">has its own FAQ</a>! If you add support
-for recursive types (e.g. linked lists), make sure to read the <a
-href="../ProgrammersManual.html#TypeResolve">section in the LLVM
-Programmer's Manual</a> that describes how to construct them.</li>
-
-<li><b>standard runtime</b> - Our current language allows the user to access
-arbitrary external functions, and we use it for things like "printd" and
-"putchard". As you extend the language to add higher-level constructs, often
-these constructs make the most sense if they are lowered to calls into a
-language-supplied runtime. For example, if you add hash tables to the language,
-it would probably make sense to add the routines to a runtime, instead of
-inlining them all the way.</li>
-
-<li><b>memory management</b> - Currently we can only access the stack in
-Kaleidoscope. It would also be useful to be able to allocate heap memory,
-either with calls to the standard libc malloc/free interface or with a garbage
-collector. If you would like to use garbage collection, note that LLVM fully
-supports <a href="../GarbageCollection.html">Accurate Garbage Collection</a>
-including algorithms that move objects and need to scan/update the stack.</li>
-
-<li><b>debugger support</b> - LLVM supports generation of <a
-href="../SourceLevelDebugging.html">DWARF Debug info</a> which is understood by
-common debuggers like GDB. Adding support for debug info is fairly
-straightforward. The best way to understand it is to compile some C/C++ code
-with "<tt>llvm-gcc -g -O0</tt>" and taking a look at what it produces.</li>
-
-<li><b>exception handling support</b> - LLVM supports generation of <a
-href="../ExceptionHandling.html">zero cost exceptions</a> which interoperate
-with code compiled in other languages. You could also generate code by
-implicitly making every function return an error value and checking it. You
-could also make explicit use of setjmp/longjmp. There are many different ways
-to go here.</li>
-
-<li><b>object orientation, generics, database access, complex numbers,
-geometric programming, ...</b> - Really, there is
-no end of crazy features that you can add to the language.</li>
-
-<li><b>unusual domains</b> - We've been talking about applying LLVM to a domain
-that many people are interested in: building a compiler for a specific language.
-However, there are many other domains that can use compiler technology that are
-not typically considered. For example, LLVM has been used to implement OpenGL
-graphics acceleration, translate C++ code to ActionScript, and many other
-cute and clever things. Maybe you will be the first to JIT compile a regular
-expression interpreter into native code with LLVM?</li>
-
-</ul>
-
-<p>
-Have fun - try doing something crazy and unusual. Building a language like
-everyone else always has, is much less fun than trying something a little crazy
-or off the wall and seeing how it turns out. If you get stuck or want to talk
-about it, feel free to email the <a
-href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">llvmdev mailing
-list</a>: it has lots of people who are interested in languages and are often
-willing to help out.
-</p>
-
-<p>Before we end this tutorial, I want to talk about some "tips and tricks" for generating
-LLVM IR. These are some of the more subtle things that may not be obvious, but
-are very useful if you want to take advantage of LLVM's capabilities.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="llvmirproperties">Properties of the LLVM
-IR</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>We have a couple common questions about code in the LLVM IR form - lets just
-get these out of the way right now, shall we?</p>
-
-</div>
-
-<!-- ======================================================================= -->
-<div class="doc_subsubsection"><a name="targetindep">Target
-Independence</a></div>
-<!-- ======================================================================= -->
-
-<div class="doc_text">
-
-<p>Kaleidoscope is an example of a "portable language": any program written in
-Kaleidoscope will work the same way on any target that it runs on. Many other
-languages have this property, e.g. lisp, java, haskell, javascript, python, etc
-(note that while these languages are portable, not all their libraries are).</p>
-
-<p>One nice aspect of LLVM is that it is often capable of preserving target
-independence in the IR: you can take the LLVM IR for a Kaleidoscope-compiled
-program and run it on any target that LLVM supports, even emitting C code and
-compiling that on targets that LLVM doesn't support natively. You can trivially
-tell that the Kaleidoscope compiler generates target-independent code because it
-never queries for any target-specific information when generating code.</p>
-
-<p>The fact that LLVM provides a compact, target-independent, representation for
-code gets a lot of people excited. Unfortunately, these people are usually
-thinking about C or a language from the C family when they are asking questions
-about language portability. I say "unfortunately", because there is really no
-way to make (fully general) C code portable, other than shipping the source code
-around (and of course, C source code is not actually portable in general
-either - ever port a really old application from 32- to 64-bits?).</p>
-
-<p>The problem with C (again, in its full generality) is that it is heavily
-laden with target specific assumptions. As one simple example, the preprocessor
-often destructively removes target-independence from the code when it processes
-the input text:</p>
-
-<div class="doc_code">
-<pre>
-#ifdef __i386__
- int X = 1;
-#else
- int X = 42;
-#endif
-</pre>
-</div>
-
-<p>While it is possible to engineer more and more complex solutions to problems
-like this, it cannot be solved in full generality in a way that is better than shipping
-the actual source code.</p>
-
-<p>That said, there are interesting subsets of C that can be made portable. If
-you are willing to fix primitive types to a fixed size (say int = 32-bits,
-and long = 64-bits), don't care about ABI compatibility with existing binaries,
-and are willing to give up some other minor features, you can have portable
-code. This can make sense for specialized domains such as an
-in-kernel language.</p>
-
-</div>
-
-<!-- ======================================================================= -->
-<div class="doc_subsubsection"><a name="safety">Safety Guarantees</a></div>
-<!-- ======================================================================= -->
-
-<div class="doc_text">
-
-<p>Many of the languages above are also "safe" languages: it is impossible for
-a program written in Java to corrupt its address space and crash the process
-(assuming the JVM has no bugs).
-Safety is an interesting property that requires a combination of language
-design, runtime support, and often operating system support.</p>
-
-<p>It is certainly possible to implement a safe language in LLVM, but LLVM IR
-does not itself guarantee safety. The LLVM IR allows unsafe pointer casts,
-use after free bugs, buffer over-runs, and a variety of other problems. Safety
-needs to be implemented as a layer on top of LLVM and, conveniently, several
-groups have investigated this. Ask on the <a
-href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">llvmdev mailing
-list</a> if you are interested in more details.</p>
-
-</div>
-
-<!-- ======================================================================= -->
-<div class="doc_subsubsection"><a name="langspecific">Language-Specific
-Optimizations</a></div>
-<!-- ======================================================================= -->
-
-<div class="doc_text">
-
-<p>One thing about LLVM that turns off many people is that it does not solve all
-the world's problems in one system (sorry 'world hunger', someone else will have
-to solve you some other day). One specific complaint is that people perceive
-LLVM as being incapable of performing high-level language-specific optimization:
-LLVM "loses too much information".</p>
-
-<p>Unfortunately, this is really not the place to give you a full and unified
-version of "Chris Lattner's theory of compiler design". Instead, I'll make a
-few observations:</p>
-
-<p>First, you're right that LLVM does lose information. For example, as of this
-writing, there is no way to distinguish in the LLVM IR whether an SSA-value came
-from a C "int" or a C "long" on an ILP32 machine (other than debug info). Both
-get compiled down to an 'i32' value and the information about what it came from
-is lost. The more general issue here, is that the LLVM type system uses
-"structural equivalence" instead of "name equivalence". Another place this
-surprises people is if you have two types in a high-level language that have the
-same structure (e.g. two different structs that have a single int field): these
-types will compile down into a single LLVM type and it will be impossible to
-tell what it came from.</p>
-
-<p>Second, while LLVM does lose information, LLVM is not a fixed target: we
-continue to enhance and improve it in many different ways. In addition to
-adding new features (LLVM did not always support exceptions or debug info), we
-also extend the IR to capture important information for optimization (e.g.
-whether an argument is sign or zero extended, information about pointers
-aliasing, etc). Many of the enhancements are user-driven: people want LLVM to
-include some specific feature, so they go ahead and extend it.</p>
-
-<p>Third, it is <em>possible and easy</em> to add language-specific
-optimizations, and you have a number of choices in how to do it. As one trivial
-example, it is easy to add language-specific optimization passes that
-"know" things about code compiled for a language. In the case of the C family,
-there is an optimization pass that "knows" about the standard C library
-functions. If you call "exit(0)" in main(), it knows that it is safe to
-optimize that into "return 0;" because C specifies what the 'exit'
-function does.</p>
-
-<p>In addition to simple library knowledge, it is possible to embed a variety of
-other language-specific information into the LLVM IR. If you have a specific
-need and run into a wall, please bring the topic up on the llvmdev list. At the
-very worst, you can always treat LLVM as if it were a "dumb code generator" and
-implement the high-level optimizations you desire in your front-end, on the
-language-specific AST.
-</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="tipsandtricks">Tips and Tricks</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>There is a variety of useful tips and tricks that you come to know after
-working on/with LLVM that aren't obvious at first glance. Instead of letting
-everyone rediscover them, this section talks about some of these issues.</p>
-
-</div>
-
-<!-- ======================================================================= -->
-<div class="doc_subsubsection"><a name="offsetofsizeof">Implementing portable
-offsetof/sizeof</a></div>
-<!-- ======================================================================= -->
-
-<div class="doc_text">
-
-<p>One interesting thing that comes up, if you are trying to keep the code
-generated by your compiler "target independent", is that you often need to know
-the size of some LLVM type or the offset of some field in an llvm structure.
-For example, you might need to pass the size of a type into a function that
-allocates memory.</p>
-
-<p>Unfortunately, this can vary widely across targets: for example the width of
-a pointer is trivially target-specific. However, there is a <a
-href="http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt">clever
-way to use the getelementptr instruction</a> that allows you to compute this
-in a portable way.</p>
-
-</div>
-
-<!-- ======================================================================= -->
-<div class="doc_subsubsection"><a name="gcstack">Garbage Collected
-Stack Frames</a></div>
-<!-- ======================================================================= -->
-
-<div class="doc_text">
-
-<p>Some languages want to explicitly manage their stack frames, often so that
-they are garbage collected or to allow easy implementation of closures. There
-are often better ways to implement these features than explicit stack frames,
-but <a
-href="http://nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt">LLVM
-does support them,</a> if you want. It requires your front-end to convert the
-code into <a
-href="http://en.wikipedia.org/wiki/Continuation-passing_style">Continuation
-Passing Style</a> and the use of tail calls (which LLVM also supports).</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<hr>
-<address>
- <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
- src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
- <a href="http://validator.w3.org/check/referer"><img
- src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
-
- <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
- <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
- Last modified: $Date$
-</address>
-</body>
-</html>
diff --git a/docs/tutorial/Makefile b/docs/tutorial/Makefile
deleted file mode 100644
index 9082ad4..0000000
--- a/docs/tutorial/Makefile
+++ /dev/null
@@ -1,28 +0,0 @@
-##===- docs/tutorial/Makefile ------------------------------*- Makefile -*-===##
-#
-# The LLVM Compiler Infrastructure
-#
-# This file is distributed under the University of Illinois Open Source
-# License. See LICENSE.TXT for details.
-#
-##===----------------------------------------------------------------------===##
-
-LEVEL := ../..
-include $(LEVEL)/Makefile.common
-
-HTML := $(wildcard $(PROJ_SRC_DIR)/*.html)
-EXTRA_DIST := $(HTML) index.html
-HTML_DIR := $(DESTDIR)$(PROJ_docsdir)/html/tutorial
-
-install-local:: $(HTML)
- $(Echo) Installing HTML Tutorial Documentation
- $(Verb) $(MKDIR) $(HTML_DIR)
- $(Verb) $(DataInstall) $(HTML) $(HTML_DIR)
- $(Verb) $(DataInstall) $(PROJ_SRC_DIR)/index.html $(HTML_DIR)
-
-uninstall-local::
- $(Echo) Uninstalling Tutorial Documentation
- $(Verb) $(RM) -rf $(HTML_DIR)
-
-printvars::
- $(Echo) "HTML : " '$(HTML)'
diff --git a/docs/tutorial/OCamlLangImpl1.html b/docs/tutorial/OCamlLangImpl1.html
deleted file mode 100644
index 98c1124..0000000
--- a/docs/tutorial/OCamlLangImpl1.html
+++ /dev/null
@@ -1,365 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
- "http://www.w3.org/TR/html4/strict.dtd">
-
-<html>
-<head>
- <title>Kaleidoscope: Tutorial Introduction and the Lexer</title>
- <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
- <meta name="author" content="Chris Lattner">
- <meta name="author" content="Erick Tryzelaar">
- <link rel="stylesheet" href="../llvm.css" type="text/css">
-</head>
-
-<body>
-
-<div class="doc_title">Kaleidoscope: Tutorial Introduction and the Lexer</div>
-
-<ul>
-<li><a href="index.html">Up to Tutorial Index</a></li>
-<li>Chapter 1
- <ol>
- <li><a href="#intro">Tutorial Introduction</a></li>
- <li><a href="#language">The Basic Language</a></li>
- <li><a href="#lexer">The Lexer</a></li>
- </ol>
-</li>
-<li><a href="OCamlLangImpl2.html">Chapter 2</a>: Implementing a Parser and
-AST</li>
-</ul>
-
-<div class="doc_author">
- <p>
- Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a>
- and <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a>
- </p>
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="intro">Tutorial Introduction</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Welcome to the "Implementing a language with LLVM" tutorial. This tutorial
-runs through the implementation of a simple language, showing how fun and
-easy it can be. This tutorial will get you up and started as well as help to
-build a framework you can extend to other languages. The code in this tutorial
-can also be used as a playground to hack on other LLVM specific things.
-</p>
-
-<p>
-The goal of this tutorial is to progressively unveil our language, describing
-how it is built up over time. This will let us cover a fairly broad range of
-language design and LLVM-specific usage issues, showing and explaining the code
-for it all along the way, without overwhelming you with tons of details up
-front.</p>
-
-<p>It is useful to point out ahead of time that this tutorial is really about
-teaching compiler techniques and LLVM specifically, <em>not</em> about teaching
-modern and sane software engineering principles. In practice, this means that
-we'll take a number of shortcuts to simplify the exposition. For example, the
-code leaks memory, uses global variables all over the place, doesn't use nice
-design patterns like <a
-href="http://en.wikipedia.org/wiki/Visitor_pattern">visitors</a>, etc... but it
-is very simple. If you dig in and use the code as a basis for future projects,
-fixing these deficiencies shouldn't be hard.</p>
-
-<p>I've tried to put this tutorial together in a way that makes chapters easy to
-skip over if you are already familiar with or are uninterested in the various
-pieces. The structure of the tutorial is:
-</p>
-
-<ul>
-<li><b><a href="#language">Chapter #1</a>: Introduction to the Kaleidoscope
-language, and the definition of its Lexer</b> - This shows where we are going
-and the basic functionality that we want it to do. In order to make this
-tutorial maximally understandable and hackable, we choose to implement
-everything in Objective Caml instead of using lexer and parser generators.
-LLVM obviously works just fine with such tools, feel free to use one if you
-prefer.</li>
-<li><b><a href="OCamlLangImpl2.html">Chapter #2</a>: Implementing a Parser and
-AST</b> - With the lexer in place, we can talk about parsing techniques and
-basic AST construction. This tutorial describes recursive descent parsing and
-operator precedence parsing. Nothing in Chapters 1 or 2 is LLVM-specific,
-the code doesn't even link in LLVM at this point. :)</li>
-<li><b><a href="OCamlLangImpl3.html">Chapter #3</a>: Code generation to LLVM
-IR</b> - With the AST ready, we can show off how easy generation of LLVM IR
-really is.</li>
-<li><b><a href="OCamlLangImpl4.html">Chapter #4</a>: Adding JIT and Optimizer
-Support</b> - Because a lot of people are interested in using LLVM as a JIT,
-we'll dive right into it and show you the 3 lines it takes to add JIT support.
-LLVM is also useful in many other ways, but this is one simple and "sexy" way
-to shows off its power. :)</li>
-<li><b><a href="OCamlLangImpl5.html">Chapter #5</a>: Extending the Language:
-Control Flow</b> - With the language up and running, we show how to extend it
-with control flow operations (if/then/else and a 'for' loop). This gives us a
-chance to talk about simple SSA construction and control flow.</li>
-<li><b><a href="OCamlLangImpl6.html">Chapter #6</a>: Extending the Language:
-User-defined Operators</b> - This is a silly but fun chapter that talks about
-extending the language to let the user program define their own arbitrary
-unary and binary operators (with assignable precedence!). This lets us build a
-significant piece of the "language" as library routines.</li>
-<li><b><a href="OCamlLangImpl7.html">Chapter #7</a>: Extending the Language:
-Mutable Variables</b> - This chapter talks about adding user-defined local
-variables along with an assignment operator. The interesting part about this
-is how easy and trivial it is to construct SSA form in LLVM: no, LLVM does
-<em>not</em> require your front-end to construct SSA form!</li>
-<li><b><a href="OCamlLangImpl8.html">Chapter #8</a>: Conclusion and other
-useful LLVM tidbits</b> - This chapter wraps up the series by talking about
-potential ways to extend the language, but also includes a bunch of pointers to
-info about "special topics" like adding garbage collection support, exceptions,
-debugging, support for "spaghetti stacks", and a bunch of other tips and
-tricks.</li>
-
-</ul>
-
-<p>By the end of the tutorial, we'll have written a bit less than 700 lines of
-non-comment, non-blank, lines of code. With this small amount of code, we'll
-have built up a very reasonable compiler for a non-trivial language including
-a hand-written lexer, parser, AST, as well as code generation support with a JIT
-compiler. While other systems may have interesting "hello world" tutorials,
-I think the breadth of this tutorial is a great testament to the strengths of
-LLVM and why you should consider it if you're interested in language or compiler
-design.</p>
-
-<p>A note about this tutorial: we expect you to extend the language and play
-with it on your own. Take the code and go crazy hacking away at it, compilers
-don't need to be scary creatures - it can be a lot of fun to play with
-languages!</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="language">The Basic Language</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>This tutorial will be illustrated with a toy language that we'll call
-"<a href="http://en.wikipedia.org/wiki/Kaleidoscope">Kaleidoscope</a>" (derived
-from "meaning beautiful, form, and view").
-Kaleidoscope is a procedural language that allows you to define functions, use
-conditionals, math, etc. Over the course of the tutorial, we'll extend
-Kaleidoscope to support the if/then/else construct, a for loop, user defined
-operators, JIT compilation with a simple command line interface, etc.</p>
-
-<p>Because we want to keep things simple, the only datatype in Kaleidoscope is a
-64-bit floating point type (aka 'float' in O'Caml parlance). As such, all
-values are implicitly double precision and the language doesn't require type
-declarations. This gives the language a very nice and simple syntax. For
-example, the following simple example computes <a
-href="http://en.wikipedia.org/wiki/Fibonacci_number">Fibonacci numbers:</a></p>
-
-<div class="doc_code">
-<pre>
-# Compute the x'th fibonacci number.
-def fib(x)
- if x &lt; 3 then
- 1
- else
- fib(x-1)+fib(x-2)
-
-# This expression will compute the 40th number.
-fib(40)
-</pre>
-</div>
-
-<p>We also allow Kaleidoscope to call into standard library functions (the LLVM
-JIT makes this completely trivial). This means that you can use the 'extern'
-keyword to define a function before you use it (this is also useful for mutually
-recursive functions). For example:</p>
-
-<div class="doc_code">
-<pre>
-extern sin(arg);
-extern cos(arg);
-extern atan2(arg1 arg2);
-
-atan2(sin(.4), cos(42))
-</pre>
-</div>
-
-<p>A more interesting example is included in Chapter 6 where we write a little
-Kaleidoscope application that <a href="OCamlLangImpl6.html#example">displays
-a Mandelbrot Set</a> at various levels of magnification.</p>
-
-<p>Lets dive into the implementation of this language!</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="lexer">The Lexer</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>When it comes to implementing a language, the first thing needed is
-the ability to process a text file and recognize what it says. The traditional
-way to do this is to use a "<a
-href="http://en.wikipedia.org/wiki/Lexical_analysis">lexer</a>" (aka 'scanner')
-to break the input up into "tokens". Each token returned by the lexer includes
-a token code and potentially some metadata (e.g. the numeric value of a number).
-First, we define the possibilities:
-</p>
-
-<div class="doc_code">
-<pre>
-(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of
- * these others for known things. *)
-type token =
- (* commands *)
- | Def | Extern
-
- (* primary *)
- | Ident of string | Number of float
-
- (* unknown *)
- | Kwd of char
-</pre>
-</div>
-
-<p>Each token returned by our lexer will be one of the token variant values.
-An unknown character like '+' will be returned as <tt>Token.Kwd '+'</tt>. If
-the curr token is an identifier, the value will be <tt>Token.Ident s</tt>. If
-the current token is a numeric literal (like 1.0), the value will be
-<tt>Token.Number 1.0</tt>.
-</p>
-
-<p>The actual implementation of the lexer is a collection of functions driven
-by a function named <tt>Lexer.lex</tt>. The <tt>Lexer.lex</tt> function is
-called to return the next token from standard input. We will use
-<a href="http://caml.inria.fr/pub/docs/manual-camlp4/index.html">Camlp4</a>
-to simplify the tokenization of the standard input. Its definition starts
-as:</p>
-
-<div class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Lexer
- *===----------------------------------------------------------------------===*)
-
-let rec lex = parser
- (* Skip any whitespace. *)
- | [&lt; ' (' ' | '\n' | '\r' | '\t'); stream &gt;] -&gt; lex stream
-</pre>
-</div>
-
-<p>
-<tt>Lexer.lex</tt> works by recursing over a <tt>char Stream.t</tt> to read
-characters one at a time from the standard input. It eats them as it recognizes
-them and stores them in in a <tt>Token.token</tt> variant. The first thing that
-it has to do is ignore whitespace between tokens. This is accomplished with the
-recursive call above.</p>
-
-<p>The next thing <tt>Lexer.lex</tt> needs to do is recognize identifiers and
-specific keywords like "def". Kaleidoscope does this with a pattern match
-and a helper function.<p>
-
-<div class="doc_code">
-<pre>
- (* identifier: [a-zA-Z][a-zA-Z0-9] *)
- | [&lt; ' ('A' .. 'Z' | 'a' .. 'z' as c); stream &gt;] -&gt;
- let buffer = Buffer.create 1 in
- Buffer.add_char buffer c;
- lex_ident buffer stream
-
-...
-
-and lex_ident buffer = parser
- | [&lt; ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream &gt;] -&gt;
- Buffer.add_char buffer c;
- lex_ident buffer stream
- | [&lt; stream=lex &gt;] -&gt;
- match Buffer.contents buffer with
- | "def" -&gt; [&lt; 'Token.Def; stream &gt;]
- | "extern" -&gt; [&lt; 'Token.Extern; stream &gt;]
- | id -&gt; [&lt; 'Token.Ident id; stream &gt;]
-</pre>
-</div>
-
-<p>Numeric values are similar:</p>
-
-<div class="doc_code">
-<pre>
- (* number: [0-9.]+ *)
- | [&lt; ' ('0' .. '9' as c); stream &gt;] -&gt;
- let buffer = Buffer.create 1 in
- Buffer.add_char buffer c;
- lex_number buffer stream
-
-...
-
-and lex_number buffer = parser
- | [&lt; ' ('0' .. '9' | '.' as c); stream &gt;] -&gt;
- Buffer.add_char buffer c;
- lex_number buffer stream
- | [&lt; stream=lex &gt;] -&gt;
- [&lt; 'Token.Number (float_of_string (Buffer.contents buffer)); stream &gt;]
-</pre>
-</div>
-
-<p>This is all pretty straight-forward code for processing input. When reading
-a numeric value from input, we use the ocaml <tt>float_of_string</tt> function
-to convert it to a numeric value that we store in <tt>Token.Number</tt>. Note
-that this isn't doing sufficient error checking: it will raise <tt>Failure</tt>
-if the string "1.23.45.67". Feel free to extend it :). Next we handle
-comments:
-</p>
-
-<div class="doc_code">
-<pre>
- (* Comment until end of line. *)
- | [&lt; ' ('#'); stream &gt;] -&gt;
- lex_comment stream
-
-...
-
-and lex_comment = parser
- | [&lt; ' ('\n'); stream=lex &gt;] -&gt; stream
- | [&lt; 'c; e=lex_comment &gt;] -&gt; e
- | [&lt; &gt;] -&gt; [&lt; &gt;]
-</pre>
-</div>
-
-<p>We handle comments by skipping to the end of the line and then return the
-next token. Finally, if the input doesn't match one of the above cases, it is
-either an operator character like '+' or the end of the file. These are handled
-with this code:</p>
-
-<div class="doc_code">
-<pre>
- (* Otherwise, just return the character as its ascii value. *)
- | [&lt; 'c; stream &gt;] -&gt;
- [&lt; 'Token.Kwd c; lex stream &gt;]
-
- (* end of stream. *)
- | [&lt; &gt;] -&gt; [&lt; &gt;]
-</pre>
-</div>
-
-<p>With this, we have the complete lexer for the basic Kaleidoscope language
-(the <a href="OCamlLangImpl2.html#code">full code listing</a> for the Lexer is
-available in the <a href="OCamlLangImpl2.html">next chapter</a> of the
-tutorial). Next we'll <a href="OCamlLangImpl2.html">build a simple parser that
-uses this to build an Abstract Syntax Tree</a>. When we have that, we'll
-include a driver so that you can use the lexer and parser together.
-</p>
-
-<a href="OCamlLangImpl2.html">Next: Implementing a Parser and AST</a>
-</div>
-
-<!-- *********************************************************************** -->
-<hr>
-<address>
- <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
- src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
- <a href="http://validator.w3.org/check/referer"><img
- src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
-
- <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
- <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a><br>
- <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
- Last modified: $Date$
-</address>
-</body>
-</html>
diff --git a/docs/tutorial/OCamlLangImpl2.html b/docs/tutorial/OCamlLangImpl2.html
deleted file mode 100644
index 6665109..0000000
--- a/docs/tutorial/OCamlLangImpl2.html
+++ /dev/null
@@ -1,1045 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
- "http://www.w3.org/TR/html4/strict.dtd">
-
-<html>
-<head>
- <title>Kaleidoscope: Implementing a Parser and AST</title>
- <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
- <meta name="author" content="Chris Lattner">
- <meta name="author" content="Erick Tryzelaar">
- <link rel="stylesheet" href="../llvm.css" type="text/css">
-</head>
-
-<body>
-
-<div class="doc_title">Kaleidoscope: Implementing a Parser and AST</div>
-
-<ul>
-<li><a href="index.html">Up to Tutorial Index</a></li>
-<li>Chapter 2
- <ol>
- <li><a href="#intro">Chapter 2 Introduction</a></li>
- <li><a href="#ast">The Abstract Syntax Tree (AST)</a></li>
- <li><a href="#parserbasics">Parser Basics</a></li>
- <li><a href="#parserprimexprs">Basic Expression Parsing</a></li>
- <li><a href="#parserbinops">Binary Expression Parsing</a></li>
- <li><a href="#parsertop">Parsing the Rest</a></li>
- <li><a href="#driver">The Driver</a></li>
- <li><a href="#conclusions">Conclusions</a></li>
- <li><a href="#code">Full Code Listing</a></li>
- </ol>
-</li>
-<li><a href="OCamlLangImpl3.html">Chapter 3</a>: Code generation to LLVM IR</li>
-</ul>
-
-<div class="doc_author">
- <p>
- Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a>
- and <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a>
- </p>
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="intro">Chapter 2 Introduction</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Welcome to Chapter 2 of the "<a href="index.html">Implementing a language
-with LLVM in Objective Caml</a>" tutorial. This chapter shows you how to use
-the lexer, built in <a href="OCamlLangImpl1.html">Chapter 1</a>, to build a
-full <a href="http://en.wikipedia.org/wiki/Parsing">parser</a> for our
-Kaleidoscope language. Once we have a parser, we'll define and build an <a
-href="http://en.wikipedia.org/wiki/Abstract_syntax_tree">Abstract Syntax
-Tree</a> (AST).</p>
-
-<p>The parser we will build uses a combination of <a
-href="http://en.wikipedia.org/wiki/Recursive_descent_parser">Recursive Descent
-Parsing</a> and <a href=
-"http://en.wikipedia.org/wiki/Operator-precedence_parser">Operator-Precedence
-Parsing</a> to parse the Kaleidoscope language (the latter for
-binary expressions and the former for everything else). Before we get to
-parsing though, lets talk about the output of the parser: the Abstract Syntax
-Tree.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="ast">The Abstract Syntax Tree (AST)</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>The AST for a program captures its behavior in such a way that it is easy for
-later stages of the compiler (e.g. code generation) to interpret. We basically
-want one object for each construct in the language, and the AST should closely
-model the language. In Kaleidoscope, we have expressions, a prototype, and a
-function object. We'll start with expressions first:</p>
-
-<div class="doc_code">
-<pre>
-(* expr - Base type for all expression nodes. *)
-type expr =
- (* variant for numeric literals like "1.0". *)
- | Number of float
-</pre>
-</div>
-
-<p>The code above shows the definition of the base ExprAST class and one
-subclass which we use for numeric literals. The important thing to note about
-this code is that the Number variant captures the numeric value of the
-literal as an instance variable. This allows later phases of the compiler to
-know what the stored numeric value is.</p>
-
-<p>Right now we only create the AST, so there are no useful functions on
-them. It would be very easy to add a function to pretty print the code,
-for example. Here are the other expression AST node definitions that we'll use
-in the basic form of the Kaleidoscope language:
-</p>
-
-<div class="doc_code">
-<pre>
- (* variant for referencing a variable, like "a". *)
- | Variable of string
-
- (* variant for a binary operator. *)
- | Binary of char * expr * expr
-
- (* variant for function calls. *)
- | Call of string * expr array
-</pre>
-</div>
-
-<p>This is all (intentionally) rather straight-forward: variables capture the
-variable name, binary operators capture their opcode (e.g. '+'), and calls
-capture a function name as well as a list of any argument expressions. One thing
-that is nice about our AST is that it captures the language features without
-talking about the syntax of the language. Note that there is no discussion about
-precedence of binary operators, lexical structure, etc.</p>
-
-<p>For our basic language, these are all of the expression nodes we'll define.
-Because it doesn't have conditional control flow, it isn't Turing-complete;
-we'll fix that in a later installment. The two things we need next are a way
-to talk about the interface to a function, and a way to talk about functions
-themselves:</p>
-
-<div class="doc_code">
-<pre>
-(* proto - This type represents the "prototype" for a function, which captures
- * its name, and its argument names (thus implicitly the number of arguments the
- * function takes). *)
-type proto = Prototype of string * string array
-
-(* func - This type represents a function definition itself. *)
-type func = Function of proto * expr
-</pre>
-</div>
-
-<p>In Kaleidoscope, functions are typed with just a count of their arguments.
-Since all values are double precision floating point, the type of each argument
-doesn't need to be stored anywhere. In a more aggressive and realistic
-language, the "expr" variants would probably have a type field.</p>
-
-<p>With this scaffolding, we can now talk about parsing expressions and function
-bodies in Kaleidoscope.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="parserbasics">Parser Basics</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Now that we have an AST to build, we need to define the parser code to build
-it. The idea here is that we want to parse something like "x+y" (which is
-returned as three tokens by the lexer) into an AST that could be generated with
-calls like this:</p>
-
-<div class="doc_code">
-<pre>
- let x = Variable "x" in
- let y = Variable "y" in
- let result = Binary ('+', x, y) in
- ...
-</pre>
-</div>
-
-<p>
-The error handling routines make use of the builtin <tt>Stream.Failure</tt> and
-<tt>Stream.Error</tt>s. <tt>Stream.Failure</tt> is raised when the parser is
-unable to find any matching token in the first position of a pattern.
-<tt>Stream.Error</tt> is raised when the first token matches, but the rest do
-not. The error recovery in our parser will not be the best and is not
-particular user-friendly, but it will be enough for our tutorial. These
-exceptions make it easier to handle errors in routines that have various return
-types.</p>
-
-<p>With these basic types and exceptions, we can implement the first
-piece of our grammar: numeric literals.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="parserprimexprs">Basic Expression
- Parsing</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>We start with numeric literals, because they are the simplest to process.
-For each production in our grammar, we'll define a function which parses that
-production. We call this class of expressions "primary" expressions, for
-reasons that will become more clear <a href="OCamlLangImpl6.html#unary">
-later in the tutorial</a>. In order to parse an arbitrary primary expression,
-we need to determine what sort of expression it is. For numeric literals, we
-have:</p>
-
-<div class="doc_code">
-<pre>
-(* primary
- * ::= identifier
- * ::= numberexpr
- * ::= parenexpr *)
-parse_primary = parser
- (* numberexpr ::= number *)
- | [&lt; 'Token.Number n &gt;] -&gt; Ast.Number n
-</pre>
-</div>
-
-<p>This routine is very simple: it expects to be called when the current token
-is a <tt>Token.Number</tt> token. It takes the current number value, creates
-a <tt>Ast.Number</tt> node, advances the lexer to the next token, and finally
-returns.</p>
-
-<p>There are some interesting aspects to this. The most important one is that
-this routine eats all of the tokens that correspond to the production and
-returns the lexer buffer with the next token (which is not part of the grammar
-production) ready to go. This is a fairly standard way to go for recursive
-descent parsers. For a better example, the parenthesis operator is defined like
-this:</p>
-
-<div class="doc_code">
-<pre>
- (* parenexpr ::= '(' expression ')' *)
- | [&lt; 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" &gt;] -&gt; e
-</pre>
-</div>
-
-<p>This function illustrates a number of interesting things about the
-parser:</p>
-
-<p>
-1) It shows how we use the <tt>Stream.Error</tt> exception. When called, this
-function expects that the current token is a '(' token, but after parsing the
-subexpression, it is possible that there is no ')' waiting. For example, if
-the user types in "(4 x" instead of "(4)", the parser should emit an error.
-Because errors can occur, the parser needs a way to indicate that they
-happened. In our parser, we use the camlp4 shortcut syntax <tt>token ?? "parse
-error"</tt>, where if the token before the <tt>??</tt> does not match, then
-<tt>Stream.Error "parse error"</tt> will be raised.</p>
-
-<p>2) Another interesting aspect of this function is that it uses recursion by
-calling <tt>Parser.parse_primary</tt> (we will soon see that
-<tt>Parser.parse_primary</tt> can call <tt>Parser.parse_primary</tt>). This is
-powerful because it allows us to handle recursive grammars, and keeps each
-production very simple. Note that parentheses do not cause construction of AST
-nodes themselves. While we could do it this way, the most important role of
-parentheses are to guide the parser and provide grouping. Once the parser
-constructs the AST, parentheses are not needed.</p>
-
-<p>The next simple production is for handling variable references and function
-calls:</p>
-
-<div class="doc_code">
-<pre>
- (* identifierexpr
- * ::= identifier
- * ::= identifier '(' argumentexpr ')' *)
- | [&lt; 'Token.Ident id; stream &gt;] -&gt;
- let rec parse_args accumulator = parser
- | [&lt; e=parse_expr; stream &gt;] -&gt;
- begin parser
- | [&lt; 'Token.Kwd ','; e=parse_args (e :: accumulator) &gt;] -&gt; e
- | [&lt; &gt;] -&gt; e :: accumulator
- end stream
- | [&lt; &gt;] -&gt; accumulator
- in
- let rec parse_ident id = parser
- (* Call. *)
- | [&lt; 'Token.Kwd '(';
- args=parse_args [];
- 'Token.Kwd ')' ?? "expected ')'"&gt;] -&gt;
- Ast.Call (id, Array.of_list (List.rev args))
-
- (* Simple variable ref. *)
- | [&lt; &gt;] -&gt; Ast.Variable id
- in
- parse_ident id stream
-</pre>
-</div>
-
-<p>This routine follows the same style as the other routines. (It expects to be
-called if the current token is a <tt>Token.Ident</tt> token). It also has
-recursion and error handling. One interesting aspect of this is that it uses
-<em>look-ahead</em> to determine if the current identifier is a stand alone
-variable reference or if it is a function call expression. It handles this by
-checking to see if the token after the identifier is a '(' token, constructing
-either a <tt>Ast.Variable</tt> or <tt>Ast.Call</tt> node as appropriate.
-</p>
-
-<p>We finish up by raising an exception if we received a token we didn't
-expect:</p>
-
-<div class="doc_code">
-<pre>
- | [&lt; &gt;] -&gt; raise (Stream.Error "unknown token when expecting an expression.")
-</pre>
-</div>
-
-<p>Now that basic expressions are handled, we need to handle binary expressions.
-They are a bit more complex.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="parserbinops">Binary Expression
- Parsing</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Binary expressions are significantly harder to parse because they are often
-ambiguous. For example, when given the string "x+y*z", the parser can choose
-to parse it as either "(x+y)*z" or "x+(y*z)". With common definitions from
-mathematics, we expect the later parse, because "*" (multiplication) has
-higher <em>precedence</em> than "+" (addition).</p>
-
-<p>There are many ways to handle this, but an elegant and efficient way is to
-use <a href=
-"http://en.wikipedia.org/wiki/Operator-precedence_parser">Operator-Precedence
-Parsing</a>. This parsing technique uses the precedence of binary operators to
-guide recursion. To start with, we need a table of precedences:</p>
-
-<div class="doc_code">
-<pre>
-(* binop_precedence - This holds the precedence for each binary operator that is
- * defined *)
-let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10
-
-(* precedence - Get the precedence of the pending binary operator token. *)
-let precedence c = try Hashtbl.find binop_precedence c with Not_found -&gt; -1
-
-...
-
-let main () =
- (* Install standard binary operators.
- * 1 is the lowest precedence. *)
- Hashtbl.add Parser.binop_precedence '&lt;' 10;
- Hashtbl.add Parser.binop_precedence '+' 20;
- Hashtbl.add Parser.binop_precedence '-' 20;
- Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *)
- ...
-</pre>
-</div>
-
-<p>For the basic form of Kaleidoscope, we will only support 4 binary operators
-(this can obviously be extended by you, our brave and intrepid reader). The
-<tt>Parser.precedence</tt> function returns the precedence for the current
-token, or -1 if the token is not a binary operator. Having a <tt>Hashtbl.t</tt>
-makes it easy to add new operators and makes it clear that the algorithm doesn't
-depend on the specific operators involved, but it would be easy enough to
-eliminate the <tt>Hashtbl.t</tt> and do the comparisons in the
-<tt>Parser.precedence</tt> function. (Or just use a fixed-size array).</p>
-
-<p>With the helper above defined, we can now start parsing binary expressions.
-The basic idea of operator precedence parsing is to break down an expression
-with potentially ambiguous binary operators into pieces. Consider ,for example,
-the expression "a+b+(c+d)*e*f+g". Operator precedence parsing considers this
-as a stream of primary expressions separated by binary operators. As such,
-it will first parse the leading primary expression "a", then it will see the
-pairs [+, b] [+, (c+d)] [*, e] [*, f] and [+, g]. Note that because parentheses
-are primary expressions, the binary expression parser doesn't need to worry
-about nested subexpressions like (c+d) at all.
-</p>
-
-<p>
-To start, an expression is a primary expression potentially followed by a
-sequence of [binop,primaryexpr] pairs:</p>
-
-<div class="doc_code">
-<pre>
-(* expression
- * ::= primary binoprhs *)
-and parse_expr = parser
- | [&lt; lhs=parse_primary; stream &gt;] -&gt; parse_bin_rhs 0 lhs stream
-</pre>
-</div>
-
-<p><tt>Parser.parse_bin_rhs</tt> is the function that parses the sequence of
-pairs for us. It takes a precedence and a pointer to an expression for the part
-that has been parsed so far. Note that "x" is a perfectly valid expression: As
-such, "binoprhs" is allowed to be empty, in which case it returns the expression
-that is passed into it. In our example above, the code passes the expression for
-"a" into <tt>Parser.parse_bin_rhs</tt> and the current token is "+".</p>
-
-<p>The precedence value passed into <tt>Parser.parse_bin_rhs</tt> indicates the
-<em>minimal operator precedence</em> that the function is allowed to eat. For
-example, if the current pair stream is [+, x] and <tt>Parser.parse_bin_rhs</tt>
-is passed in a precedence of 40, it will not consume any tokens (because the
-precedence of '+' is only 20). With this in mind, <tt>Parser.parse_bin_rhs</tt>
-starts with:</p>
-
-<div class="doc_code">
-<pre>
-(* binoprhs
- * ::= ('+' primary)* *)
-and parse_bin_rhs expr_prec lhs stream =
- match Stream.peek stream with
- (* If this is a binop, find its precedence. *)
- | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -&gt;
- let token_prec = precedence c in
-
- (* If this is a binop that binds at least as tightly as the current binop,
- * consume it, otherwise we are done. *)
- if token_prec &lt; expr_prec then lhs else begin
-</pre>
-</div>
-
-<p>This code gets the precedence of the current token and checks to see if if is
-too low. Because we defined invalid tokens to have a precedence of -1, this
-check implicitly knows that the pair-stream ends when the token stream runs out
-of binary operators. If this check succeeds, we know that the token is a binary
-operator and that it will be included in this expression:</p>
-
-<div class="doc_code">
-<pre>
- (* Eat the binop. *)
- Stream.junk stream;
-
- (* Okay, we know this is a binop. *)
- let rhs =
- match Stream.peek stream with
- | Some (Token.Kwd c2) -&gt;
-</pre>
-</div>
-
-<p>As such, this code eats (and remembers) the binary operator and then parses
-the primary expression that follows. This builds up the whole pair, the first of
-which is [+, b] for the running example.</p>
-
-<p>Now that we parsed the left-hand side of an expression and one pair of the
-RHS sequence, we have to decide which way the expression associates. In
-particular, we could have "(a+b) binop unparsed" or "a + (b binop unparsed)".
-To determine this, we look ahead at "binop" to determine its precedence and
-compare it to BinOp's precedence (which is '+' in this case):</p>
-
-<div class="doc_code">
-<pre>
- (* If BinOp binds less tightly with rhs than the operator after
- * rhs, let the pending operator take rhs as its lhs. *)
- let next_prec = precedence c2 in
- if token_prec &lt; next_prec
-</pre>
-</div>
-
-<p>If the precedence of the binop to the right of "RHS" is lower or equal to the
-precedence of our current operator, then we know that the parentheses associate
-as "(a+b) binop ...". In our example, the current operator is "+" and the next
-operator is "+", we know that they have the same precedence. In this case we'll
-create the AST node for "a+b", and then continue parsing:</p>
-
-<div class="doc_code">
-<pre>
- ... if body omitted ...
- in
-
- (* Merge lhs/rhs. *)
- let lhs = Ast.Binary (c, lhs, rhs) in
- parse_bin_rhs expr_prec lhs stream
- end
-</pre>
-</div>
-
-<p>In our example above, this will turn "a+b+" into "(a+b)" and execute the next
-iteration of the loop, with "+" as the current token. The code above will eat,
-remember, and parse "(c+d)" as the primary expression, which makes the
-current pair equal to [+, (c+d)]. It will then evaluate the 'if' conditional above with
-"*" as the binop to the right of the primary. In this case, the precedence of "*" is
-higher than the precedence of "+" so the if condition will be entered.</p>
-
-<p>The critical question left here is "how can the if condition parse the right
-hand side in full"? In particular, to build the AST correctly for our example,
-it needs to get all of "(c+d)*e*f" as the RHS expression variable. The code to
-do this is surprisingly simple (code from the above two blocks duplicated for
-context):</p>
-
-<div class="doc_code">
-<pre>
- match Stream.peek stream with
- | Some (Token.Kwd c2) -&gt;
- (* If BinOp binds less tightly with rhs than the operator after
- * rhs, let the pending operator take rhs as its lhs. *)
- if token_prec &lt; precedence c2
- then <b>parse_bin_rhs (token_prec + 1) rhs stream</b>
- else rhs
- | _ -&gt; rhs
- in
-
- (* Merge lhs/rhs. *)
- let lhs = Ast.Binary (c, lhs, rhs) in
- parse_bin_rhs expr_prec lhs stream
- end
-</pre>
-</div>
-
-<p>At this point, we know that the binary operator to the RHS of our primary
-has higher precedence than the binop we are currently parsing. As such, we know
-that any sequence of pairs whose operators are all higher precedence than "+"
-should be parsed together and returned as "RHS". To do this, we recursively
-invoke the <tt>Parser.parse_bin_rhs</tt> function specifying "token_prec+1" as
-the minimum precedence required for it to continue. In our example above, this
-will cause it to return the AST node for "(c+d)*e*f" as RHS, which is then set
-as the RHS of the '+' expression.</p>
-
-<p>Finally, on the next iteration of the while loop, the "+g" piece is parsed
-and added to the AST. With this little bit of code (14 non-trivial lines), we
-correctly handle fully general binary expression parsing in a very elegant way.
-This was a whirlwind tour of this code, and it is somewhat subtle. I recommend
-running through it with a few tough examples to see how it works.
-</p>
-
-<p>This wraps up handling of expressions. At this point, we can point the
-parser at an arbitrary token stream and build an expression from it, stopping
-at the first token that is not part of the expression. Next up we need to
-handle function definitions, etc.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="parsertop">Parsing the Rest</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>
-The next thing missing is handling of function prototypes. In Kaleidoscope,
-these are used both for 'extern' function declarations as well as function body
-definitions. The code to do this is straight-forward and not very interesting
-(once you've survived expressions):
-</p>
-
-<div class="doc_code">
-<pre>
-(* prototype
- * ::= id '(' id* ')' *)
-let parse_prototype =
- let rec parse_args accumulator = parser
- | [&lt; 'Token.Ident id; e=parse_args (id::accumulator) &gt;] -&gt; e
- | [&lt; &gt;] -&gt; accumulator
- in
-
- parser
- | [&lt; 'Token.Ident id;
- 'Token.Kwd '(' ?? "expected '(' in prototype";
- args=parse_args [];
- 'Token.Kwd ')' ?? "expected ')' in prototype" &gt;] -&gt;
- (* success. *)
- Ast.Prototype (id, Array.of_list (List.rev args))
-
- | [&lt; &gt;] -&gt;
- raise (Stream.Error "expected function name in prototype")
-</pre>
-</div>
-
-<p>Given this, a function definition is very simple, just a prototype plus
-an expression to implement the body:</p>
-
-<div class="doc_code">
-<pre>
-(* definition ::= 'def' prototype expression *)
-let parse_definition = parser
- | [&lt; 'Token.Def; p=parse_prototype; e=parse_expr &gt;] -&gt;
- Ast.Function (p, e)
-</pre>
-</div>
-
-<p>In addition, we support 'extern' to declare functions like 'sin' and 'cos' as
-well as to support forward declaration of user functions. These 'extern's are just
-prototypes with no body:</p>
-
-<div class="doc_code">
-<pre>
-(* external ::= 'extern' prototype *)
-let parse_extern = parser
- | [&lt; 'Token.Extern; e=parse_prototype &gt;] -&gt; e
-</pre>
-</div>
-
-<p>Finally, we'll also let the user type in arbitrary top-level expressions and
-evaluate them on the fly. We will handle this by defining anonymous nullary
-(zero argument) functions for them:</p>
-
-<div class="doc_code">
-<pre>
-(* toplevelexpr ::= expression *)
-let parse_toplevel = parser
- | [&lt; e=parse_expr &gt;] -&gt;
- (* Make an anonymous proto. *)
- Ast.Function (Ast.Prototype ("", [||]), e)
-</pre>
-</div>
-
-<p>Now that we have all the pieces, let's build a little driver that will let us
-actually <em>execute</em> this code we've built!</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="driver">The Driver</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>The driver for this simply invokes all of the parsing pieces with a top-level
-dispatch loop. There isn't much interesting here, so I'll just include the
-top-level loop. See <a href="#code">below</a> for full code in the "Top-Level
-Parsing" section.</p>
-
-<div class="doc_code">
-<pre>
-(* top ::= definition | external | expression | ';' *)
-let rec main_loop stream =
- match Stream.peek stream with
- | None -&gt; ()
-
- (* ignore top-level semicolons. *)
- | Some (Token.Kwd ';') -&gt;
- Stream.junk stream;
- main_loop stream
-
- | Some token -&gt;
- begin
- try match token with
- | Token.Def -&gt;
- ignore(Parser.parse_definition stream);
- print_endline "parsed a function definition.";
- | Token.Extern -&gt;
- ignore(Parser.parse_extern stream);
- print_endline "parsed an extern.";
- | _ -&gt;
- (* Evaluate a top-level expression into an anonymous function. *)
- ignore(Parser.parse_toplevel stream);
- print_endline "parsed a top-level expr";
- with Stream.Error s -&gt;
- (* Skip token for error recovery. *)
- Stream.junk stream;
- print_endline s;
- end;
- print_string "ready&gt; "; flush stdout;
- main_loop stream
-</pre>
-</div>
-
-<p>The most interesting part of this is that we ignore top-level semicolons.
-Why is this, you ask? The basic reason is that if you type "4 + 5" at the
-command line, the parser doesn't know whether that is the end of what you will type
-or not. For example, on the next line you could type "def foo..." in which case
-4+5 is the end of a top-level expression. Alternatively you could type "* 6",
-which would continue the expression. Having top-level semicolons allows you to
-type "4+5;", and the parser will know you are done.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="conclusions">Conclusions</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>With just under 300 lines of commented code (240 lines of non-comment,
-non-blank code), we fully defined our minimal language, including a lexer,
-parser, and AST builder. With this done, the executable will validate
-Kaleidoscope code and tell us if it is grammatically invalid. For
-example, here is a sample interaction:</p>
-
-<div class="doc_code">
-<pre>
-$ <b>./toy.byte</b>
-ready&gt; <b>def foo(x y) x+foo(y, 4.0);</b>
-Parsed a function definition.
-ready&gt; <b>def foo(x y) x+y y;</b>
-Parsed a function definition.
-Parsed a top-level expr
-ready&gt; <b>def foo(x y) x+y );</b>
-Parsed a function definition.
-Error: unknown token when expecting an expression
-ready&gt; <b>extern sin(a);</b>
-ready&gt; Parsed an extern
-ready&gt; <b>^D</b>
-$
-</pre>
-</div>
-
-<p>There is a lot of room for extension here. You can define new AST nodes,
-extend the language in many ways, etc. In the <a href="OCamlLangImpl3.html">
-next installment</a>, we will describe how to generate LLVM Intermediate
-Representation (IR) from the AST.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="code">Full Code Listing</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>
-Here is the complete code listing for this and the previous chapter.
-Note that it is fully self-contained: you don't need LLVM or any external
-libraries at all for this. (Besides the ocaml standard libraries, of
-course.) To build this, just compile with:</p>
-
-<div class="doc_code">
-<pre>
-# Compile
-ocamlbuild toy.byte
-# Run
-./toy.byte
-</pre>
-</div>
-
-<p>Here is the code:</p>
-
-<dl>
-<dt>_tags:</dt>
-<dd class="doc_code">
-<pre>
-&lt;{lexer,parser}.ml&gt;: use_camlp4, pp(camlp4of)
-</pre>
-</dd>
-
-<dt>token.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Lexer Tokens
- *===----------------------------------------------------------------------===*)
-
-(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of
- * these others for known things. *)
-type token =
- (* commands *)
- | Def | Extern
-
- (* primary *)
- | Ident of string | Number of float
-
- (* unknown *)
- | Kwd of char
-</pre>
-</dd>
-
-<dt>lexer.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Lexer
- *===----------------------------------------------------------------------===*)
-
-let rec lex = parser
- (* Skip any whitespace. *)
- | [&lt; ' (' ' | '\n' | '\r' | '\t'); stream &gt;] -&gt; lex stream
-
- (* identifier: [a-zA-Z][a-zA-Z0-9] *)
- | [&lt; ' ('A' .. 'Z' | 'a' .. 'z' as c); stream &gt;] -&gt;
- let buffer = Buffer.create 1 in
- Buffer.add_char buffer c;
- lex_ident buffer stream
-
- (* number: [0-9.]+ *)
- | [&lt; ' ('0' .. '9' as c); stream &gt;] -&gt;
- let buffer = Buffer.create 1 in
- Buffer.add_char buffer c;
- lex_number buffer stream
-
- (* Comment until end of line. *)
- | [&lt; ' ('#'); stream &gt;] -&gt;
- lex_comment stream
-
- (* Otherwise, just return the character as its ascii value. *)
- | [&lt; 'c; stream &gt;] -&gt;
- [&lt; 'Token.Kwd c; lex stream &gt;]
-
- (* end of stream. *)
- | [&lt; &gt;] -&gt; [&lt; &gt;]
-
-and lex_number buffer = parser
- | [&lt; ' ('0' .. '9' | '.' as c); stream &gt;] -&gt;
- Buffer.add_char buffer c;
- lex_number buffer stream
- | [&lt; stream=lex &gt;] -&gt;
- [&lt; 'Token.Number (float_of_string (Buffer.contents buffer)); stream &gt;]
-
-and lex_ident buffer = parser
- | [&lt; ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream &gt;] -&gt;
- Buffer.add_char buffer c;
- lex_ident buffer stream
- | [&lt; stream=lex &gt;] -&gt;
- match Buffer.contents buffer with
- | "def" -&gt; [&lt; 'Token.Def; stream &gt;]
- | "extern" -&gt; [&lt; 'Token.Extern; stream &gt;]
- | id -&gt; [&lt; 'Token.Ident id; stream &gt;]
-
-and lex_comment = parser
- | [&lt; ' ('\n'); stream=lex &gt;] -&gt; stream
- | [&lt; 'c; e=lex_comment &gt;] -&gt; e
- | [&lt; &gt;] -&gt; [&lt; &gt;]
-</pre>
-</dd>
-
-<dt>ast.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Abstract Syntax Tree (aka Parse Tree)
- *===----------------------------------------------------------------------===*)
-
-(* expr - Base type for all expression nodes. *)
-type expr =
- (* variant for numeric literals like "1.0". *)
- | Number of float
-
- (* variant for referencing a variable, like "a". *)
- | Variable of string
-
- (* variant for a binary operator. *)
- | Binary of char * expr * expr
-
- (* variant for function calls. *)
- | Call of string * expr array
-
-(* proto - This type represents the "prototype" for a function, which captures
- * its name, and its argument names (thus implicitly the number of arguments the
- * function takes). *)
-type proto = Prototype of string * string array
-
-(* func - This type represents a function definition itself. *)
-type func = Function of proto * expr
-</pre>
-</dd>
-
-<dt>parser.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===---------------------------------------------------------------------===
- * Parser
- *===---------------------------------------------------------------------===*)
-
-(* binop_precedence - This holds the precedence for each binary operator that is
- * defined *)
-let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10
-
-(* precedence - Get the precedence of the pending binary operator token. *)
-let precedence c = try Hashtbl.find binop_precedence c with Not_found -&gt; -1
-
-(* primary
- * ::= identifier
- * ::= numberexpr
- * ::= parenexpr *)
-let rec parse_primary = parser
- (* numberexpr ::= number *)
- | [&lt; 'Token.Number n &gt;] -&gt; Ast.Number n
-
- (* parenexpr ::= '(' expression ')' *)
- | [&lt; 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" &gt;] -&gt; e
-
- (* identifierexpr
- * ::= identifier
- * ::= identifier '(' argumentexpr ')' *)
- | [&lt; 'Token.Ident id; stream &gt;] -&gt;
- let rec parse_args accumulator = parser
- | [&lt; e=parse_expr; stream &gt;] -&gt;
- begin parser
- | [&lt; 'Token.Kwd ','; e=parse_args (e :: accumulator) &gt;] -&gt; e
- | [&lt; &gt;] -&gt; e :: accumulator
- end stream
- | [&lt; &gt;] -&gt; accumulator
- in
- let rec parse_ident id = parser
- (* Call. *)
- | [&lt; 'Token.Kwd '(';
- args=parse_args [];
- 'Token.Kwd ')' ?? "expected ')'"&gt;] -&gt;
- Ast.Call (id, Array.of_list (List.rev args))
-
- (* Simple variable ref. *)
- | [&lt; &gt;] -&gt; Ast.Variable id
- in
- parse_ident id stream
-
- | [&lt; &gt;] -&gt; raise (Stream.Error "unknown token when expecting an expression.")
-
-(* binoprhs
- * ::= ('+' primary)* *)
-and parse_bin_rhs expr_prec lhs stream =
- match Stream.peek stream with
- (* If this is a binop, find its precedence. *)
- | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -&gt;
- let token_prec = precedence c in
-
- (* If this is a binop that binds at least as tightly as the current binop,
- * consume it, otherwise we are done. *)
- if token_prec &lt; expr_prec then lhs else begin
- (* Eat the binop. *)
- Stream.junk stream;
-
- (* Parse the primary expression after the binary operator. *)
- let rhs = parse_primary stream in
-
- (* Okay, we know this is a binop. *)
- let rhs =
- match Stream.peek stream with
- | Some (Token.Kwd c2) -&gt;
- (* If BinOp binds less tightly with rhs than the operator after
- * rhs, let the pending operator take rhs as its lhs. *)
- let next_prec = precedence c2 in
- if token_prec &lt; next_prec
- then parse_bin_rhs (token_prec + 1) rhs stream
- else rhs
- | _ -&gt; rhs
- in
-
- (* Merge lhs/rhs. *)
- let lhs = Ast.Binary (c, lhs, rhs) in
- parse_bin_rhs expr_prec lhs stream
- end
- | _ -&gt; lhs
-
-(* expression
- * ::= primary binoprhs *)
-and parse_expr = parser
- | [&lt; lhs=parse_primary; stream &gt;] -&gt; parse_bin_rhs 0 lhs stream
-
-(* prototype
- * ::= id '(' id* ')' *)
-let parse_prototype =
- let rec parse_args accumulator = parser
- | [&lt; 'Token.Ident id; e=parse_args (id::accumulator) &gt;] -&gt; e
- | [&lt; &gt;] -&gt; accumulator
- in
-
- parser
- | [&lt; 'Token.Ident id;
- 'Token.Kwd '(' ?? "expected '(' in prototype";
- args=parse_args [];
- 'Token.Kwd ')' ?? "expected ')' in prototype" &gt;] -&gt;
- (* success. *)
- Ast.Prototype (id, Array.of_list (List.rev args))
-
- | [&lt; &gt;] -&gt;
- raise (Stream.Error "expected function name in prototype")
-
-(* definition ::= 'def' prototype expression *)
-let parse_definition = parser
- | [&lt; 'Token.Def; p=parse_prototype; e=parse_expr &gt;] -&gt;
- Ast.Function (p, e)
-
-(* toplevelexpr ::= expression *)
-let parse_toplevel = parser
- | [&lt; e=parse_expr &gt;] -&gt;
- (* Make an anonymous proto. *)
- Ast.Function (Ast.Prototype ("", [||]), e)
-
-(* external ::= 'extern' prototype *)
-let parse_extern = parser
- | [&lt; 'Token.Extern; e=parse_prototype &gt;] -&gt; e
-</pre>
-</dd>
-
-<dt>toplevel.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Top-Level parsing and JIT Driver
- *===----------------------------------------------------------------------===*)
-
-(* top ::= definition | external | expression | ';' *)
-let rec main_loop stream =
- match Stream.peek stream with
- | None -&gt; ()
-
- (* ignore top-level semicolons. *)
- | Some (Token.Kwd ';') -&gt;
- Stream.junk stream;
- main_loop stream
-
- | Some token -&gt;
- begin
- try match token with
- | Token.Def -&gt;
- ignore(Parser.parse_definition stream);
- print_endline "parsed a function definition.";
- | Token.Extern -&gt;
- ignore(Parser.parse_extern stream);
- print_endline "parsed an extern.";
- | _ -&gt;
- (* Evaluate a top-level expression into an anonymous function. *)
- ignore(Parser.parse_toplevel stream);
- print_endline "parsed a top-level expr";
- with Stream.Error s -&gt;
- (* Skip token for error recovery. *)
- Stream.junk stream;
- print_endline s;
- end;
- print_string "ready&gt; "; flush stdout;
- main_loop stream
-</pre>
-</dd>
-
-<dt>toy.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Main driver code.
- *===----------------------------------------------------------------------===*)
-
-let main () =
- (* Install standard binary operators.
- * 1 is the lowest precedence. *)
- Hashtbl.add Parser.binop_precedence '&lt;' 10;
- Hashtbl.add Parser.binop_precedence '+' 20;
- Hashtbl.add Parser.binop_precedence '-' 20;
- Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *)
-
- (* Prime the first token. *)
- print_string "ready&gt; "; flush stdout;
- let stream = Lexer.lex (Stream.of_channel stdin) in
-
- (* Run the main "interpreter loop" now. *)
- Toplevel.main_loop stream;
-;;
-
-main ()
-</pre>
-</dd>
-</dl>
-
-<a href="OCamlLangImpl3.html">Next: Implementing Code Generation to LLVM IR</a>
-</div>
-
-<!-- *********************************************************************** -->
-<hr>
-<address>
- <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
- src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
- <a href="http://validator.w3.org/check/referer"><img
- src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
-
- <a href="mailto:sabre@nondot.org">Chris Lattner</a>
- <a href="mailto:erickt@users.sourceforge.net">Erick Tryzelaar</a><br>
- <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
- Last modified: $Date$
-</address>
-</body>
-</html>
diff --git a/docs/tutorial/OCamlLangImpl3.html b/docs/tutorial/OCamlLangImpl3.html
deleted file mode 100644
index febd7f5..0000000
--- a/docs/tutorial/OCamlLangImpl3.html
+++ /dev/null
@@ -1,1093 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
- "http://www.w3.org/TR/html4/strict.dtd">
-
-<html>
-<head>
- <title>Kaleidoscope: Implementing code generation to LLVM IR</title>
- <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
- <meta name="author" content="Chris Lattner">
- <meta name="author" content="Erick Tryzelaar">
- <link rel="stylesheet" href="../llvm.css" type="text/css">
-</head>
-
-<body>
-
-<div class="doc_title">Kaleidoscope: Code generation to LLVM IR</div>
-
-<ul>
-<li><a href="index.html">Up to Tutorial Index</a></li>
-<li>Chapter 3
- <ol>
- <li><a href="#intro">Chapter 3 Introduction</a></li>
- <li><a href="#basics">Code Generation Setup</a></li>
- <li><a href="#exprs">Expression Code Generation</a></li>
- <li><a href="#funcs">Function Code Generation</a></li>
- <li><a href="#driver">Driver Changes and Closing Thoughts</a></li>
- <li><a href="#code">Full Code Listing</a></li>
- </ol>
-</li>
-<li><a href="OCamlLangImpl4.html">Chapter 4</a>: Adding JIT and Optimizer
-Support</li>
-</ul>
-
-<div class="doc_author">
- <p>
- Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a>
- and <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a>
- </p>
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="intro">Chapter 3 Introduction</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Welcome to Chapter 3 of the "<a href="index.html">Implementing a language
-with LLVM</a>" tutorial. This chapter shows you how to transform the <a
-href="OCamlLangImpl2.html">Abstract Syntax Tree</a>, built in Chapter 2, into
-LLVM IR. This will teach you a little bit about how LLVM does things, as well
-as demonstrate how easy it is to use. It's much more work to build a lexer and
-parser than it is to generate LLVM IR code. :)
-</p>
-
-<p><b>Please note</b>: the code in this chapter and later require LLVM 2.3 or
-LLVM SVN to work. LLVM 2.2 and before will not work with it.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="basics">Code Generation Setup</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>
-In order to generate LLVM IR, we want some simple setup to get started. First
-we define virtual code generation (codegen) methods in each AST class:</p>
-
-<div class="doc_code">
-<pre>
-let rec codegen_expr = function
- | Ast.Number n -&gt; ...
- | Ast.Variable name -&gt; ...
-</pre>
-</div>
-
-<p>The <tt>Codegen.codegen_expr</tt> function says to emit IR for that AST node
-along with all the things it depends on, and they all return an LLVM Value
-object. "Value" is the class used to represent a "<a
-href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single
-Assignment (SSA)</a> register" or "SSA value" in LLVM. The most distinct aspect
-of SSA values is that their value is computed as the related instruction
-executes, and it does not get a new value until (and if) the instruction
-re-executes. In other words, there is no way to "change" an SSA value. For
-more information, please read up on <a
-href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single
-Assignment</a> - the concepts are really quite natural once you grok them.</p>
-
-<p>The
-second thing we want is an "Error" exception like we used for the parser, which
-will be used to report errors found during code generation (for example, use of
-an undeclared parameter):</p>
-
-<div class="doc_code">
-<pre>
-exception Error of string
-
-let the_module = create_module (global_context ()) "my cool jit"
-let builder = builder (global_context ())
-let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10
-let double_type = double_type context
-</pre>
-</div>
-
-<p>The static variables will be used during code generation.
-<tt>Codgen.the_module</tt> is the LLVM construct that contains all of the
-functions and global variables in a chunk of code. In many ways, it is the
-top-level structure that the LLVM IR uses to contain code.</p>
-
-<p>The <tt>Codegen.builder</tt> object is a helper object that makes it easy to
-generate LLVM instructions. Instances of the <a
-href="http://llvm.org/doxygen/IRBuilder_8h-source.html"><tt>IRBuilder</tt></a>
-class keep track of the current place to insert instructions and has methods to
-create new instructions.</p>
-
-<p>The <tt>Codegen.named_values</tt> map keeps track of which values are defined
-in the current scope and what their LLVM representation is. (In other words, it
-is a symbol table for the code). In this form of Kaleidoscope, the only things
-that can be referenced are function parameters. As such, function parameters
-will be in this map when generating code for their function body.</p>
-
-<p>
-With these basics in place, we can start talking about how to generate code for
-each expression. Note that this assumes that the <tt>Codgen.builder</tt> has
-been set up to generate code <em>into</em> something. For now, we'll assume
-that this has already been done, and we'll just use it to emit code.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="exprs">Expression Code Generation</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Generating LLVM code for expression nodes is very straightforward: less
-than 30 lines of commented code for all four of our expression nodes. First
-we'll do numeric literals:</p>
-
-<div class="doc_code">
-<pre>
- | Ast.Number n -&gt; const_float double_type n
-</pre>
-</div>
-
-<p>In the LLVM IR, numeric constants are represented with the
-<tt>ConstantFP</tt> class, which holds the numeric value in an <tt>APFloat</tt>
-internally (<tt>APFloat</tt> has the capability of holding floating point
-constants of <em>A</em>rbitrary <em>P</em>recision). This code basically just
-creates and returns a <tt>ConstantFP</tt>. Note that in the LLVM IR
-that constants are all uniqued together and shared. For this reason, the API
-uses "the foo::get(..)" idiom instead of "new foo(..)" or "foo::Create(..)".</p>
-
-<div class="doc_code">
-<pre>
- | Ast.Variable name -&gt;
- (try Hashtbl.find named_values name with
- | Not_found -&gt; raise (Error "unknown variable name"))
-</pre>
-</div>
-
-<p>References to variables are also quite simple using LLVM. In the simple
-version of Kaleidoscope, we assume that the variable has already been emitted
-somewhere and its value is available. In practice, the only values that can be
-in the <tt>Codegen.named_values</tt> map are function arguments. This code
-simply checks to see that the specified name is in the map (if not, an unknown
-variable is being referenced) and returns the value for it. In future chapters,
-we'll add support for <a href="LangImpl5.html#for">loop induction variables</a>
-in the symbol table, and for <a href="LangImpl7.html#localvars">local
-variables</a>.</p>
-
-<div class="doc_code">
-<pre>
- | Ast.Binary (op, lhs, rhs) -&gt;
- let lhs_val = codegen_expr lhs in
- let rhs_val = codegen_expr rhs in
- begin
- match op with
- | '+' -&gt; build_add lhs_val rhs_val "addtmp" builder
- | '-' -&gt; build_sub lhs_val rhs_val "subtmp" builder
- | '*' -&gt; build_mul lhs_val rhs_val "multmp" builder
- | '&lt;' -&gt;
- (* Convert bool 0/1 to double 0.0 or 1.0 *)
- let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in
- build_uitofp i double_type "booltmp" builder
- | _ -&gt; raise (Error "invalid binary operator")
- end
-</pre>
-</div>
-
-<p>Binary operators start to get more interesting. The basic idea here is that
-we recursively emit code for the left-hand side of the expression, then the
-right-hand side, then we compute the result of the binary expression. In this
-code, we do a simple switch on the opcode to create the right LLVM instruction.
-</p>
-
-<p>In the example above, the LLVM builder class is starting to show its value.
-IRBuilder knows where to insert the newly created instruction, all you have to
-do is specify what instruction to create (e.g. with <tt>Llvm.create_add</tt>),
-which operands to use (<tt>lhs</tt> and <tt>rhs</tt> here) and optionally
-provide a name for the generated instruction.</p>
-
-<p>One nice thing about LLVM is that the name is just a hint. For instance, if
-the code above emits multiple "addtmp" variables, LLVM will automatically
-provide each one with an increasing, unique numeric suffix. Local value names
-for instructions are purely optional, but it makes it much easier to read the
-IR dumps.</p>
-
-<p><a href="../LangRef.html#instref">LLVM instructions</a> are constrained by
-strict rules: for example, the Left and Right operators of
-an <a href="../LangRef.html#i_add">add instruction</a> must have the same
-type, and the result type of the add must match the operand types. Because
-all values in Kaleidoscope are doubles, this makes for very simple code for add,
-sub and mul.</p>
-
-<p>On the other hand, LLVM specifies that the <a
-href="../LangRef.html#i_fcmp">fcmp instruction</a> always returns an 'i1' value
-(a one bit integer). The problem with this is that Kaleidoscope wants the value to be a 0.0 or 1.0 value. In order to get these semantics, we combine the fcmp instruction with
-a <a href="../LangRef.html#i_uitofp">uitofp instruction</a>. This instruction
-converts its input integer into a floating point value by treating the input
-as an unsigned value. In contrast, if we used the <a
-href="../LangRef.html#i_sitofp">sitofp instruction</a>, the Kaleidoscope '&lt;'
-operator would return 0.0 and -1.0, depending on the input value.</p>
-
-<div class="doc_code">
-<pre>
- | Ast.Call (callee, args) -&gt;
- (* Look up the name in the module table. *)
- let callee =
- match lookup_function callee the_module with
- | Some callee -&gt; callee
- | None -&gt; raise (Error "unknown function referenced")
- in
- let params = params callee in
-
- (* If argument mismatch error. *)
- if Array.length params == Array.length args then () else
- raise (Error "incorrect # arguments passed");
- let args = Array.map codegen_expr args in
- build_call callee args "calltmp" builder
-</pre>
-</div>
-
-<p>Code generation for function calls is quite straightforward with LLVM. The
-code above initially does a function name lookup in the LLVM Module's symbol
-table. Recall that the LLVM Module is the container that holds all of the
-functions we are JIT'ing. By giving each function the same name as what the
-user specifies, we can use the LLVM symbol table to resolve function names for
-us.</p>
-
-<p>Once we have the function to call, we recursively codegen each argument that
-is to be passed in, and create an LLVM <a href="../LangRef.html#i_call">call
-instruction</a>. Note that LLVM uses the native C calling conventions by
-default, allowing these calls to also call into standard library functions like
-"sin" and "cos", with no additional effort.</p>
-
-<p>This wraps up our handling of the four basic expressions that we have so far
-in Kaleidoscope. Feel free to go in and add some more. For example, by
-browsing the <a href="../LangRef.html">LLVM language reference</a> you'll find
-several other interesting instructions that are really easy to plug into our
-basic framework.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="funcs">Function Code Generation</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Code generation for prototypes and functions must handle a number of
-details, which make their code less beautiful than expression code
-generation, but allows us to illustrate some important points. First, lets
-talk about code generation for prototypes: they are used both for function
-bodies and external function declarations. The code starts with:</p>
-
-<div class="doc_code">
-<pre>
-let codegen_proto = function
- | Ast.Prototype (name, args) -&gt;
- (* Make the function type: double(double,double) etc. *)
- let doubles = Array.make (Array.length args) double_type in
- let ft = function_type double_type doubles in
- let f =
- match lookup_function name the_module with
-</pre>
-</div>
-
-<p>This code packs a lot of power into a few lines. Note first that this
-function returns a "Function*" instead of a "Value*" (although at the moment
-they both are modeled by <tt>llvalue</tt> in ocaml). Because a "prototype"
-really talks about the external interface for a function (not the value computed
-by an expression), it makes sense for it to return the LLVM Function it
-corresponds to when codegen'd.</p>
-
-<p>The call to <tt>Llvm.function_type</tt> creates the <tt>Llvm.llvalue</tt>
-that should be used for a given Prototype. Since all function arguments in
-Kaleidoscope are of type double, the first line creates a vector of "N" LLVM
-double types. It then uses the <tt>Llvm.function_type</tt> method to create a
-function type that takes "N" doubles as arguments, returns one double as a
-result, and that is not vararg (that uses the function
-<tt>Llvm.var_arg_function_type</tt>). Note that Types in LLVM are uniqued just
-like <tt>Constant</tt>s are, so you don't "new" a type, you "get" it.</p>
-
-<p>The final line above checks if the function has already been defined in
-<tt>Codegen.the_module</tt>. If not, we will create it.</p>
-
-<div class="doc_code">
-<pre>
- | None -&gt; declare_function name ft the_module
-</pre>
-</div>
-
-<p>This indicates the type and name to use, as well as which module to insert
-into. By default we assume a function has
-<tt>Llvm.Linkage.ExternalLinkage</tt>. "<a href="LangRef.html#linkage">external
-linkage</a>" means that the function may be defined outside the current module
-and/or that it is callable by functions outside the module. The "<tt>name</tt>"
-passed in is the name the user specified: this name is registered in
-"<tt>Codegen.the_module</tt>"s symbol table, which is used by the function call
-code above.</p>
-
-<p>In Kaleidoscope, I choose to allow redefinitions of functions in two cases:
-first, we want to allow 'extern'ing a function more than once, as long as the
-prototypes for the externs match (since all arguments have the same type, we
-just have to check that the number of arguments match). Second, we want to
-allow 'extern'ing a function and then defining a body for it. This is useful
-when defining mutually recursive functions.</p>
-
-<div class="doc_code">
-<pre>
- (* If 'f' conflicted, there was already something named 'name'. If it
- * has a body, don't allow redefinition or reextern. *)
- | Some f -&gt;
- (* If 'f' already has a body, reject this. *)
- if Array.length (basic_blocks f) == 0 then () else
- raise (Error "redefinition of function");
-
- (* If 'f' took a different number of arguments, reject. *)
- if Array.length (params f) == Array.length args then () else
- raise (Error "redefinition of function with different # args");
- f
- in
-</pre>
-</div>
-
-<p>In order to verify the logic above, we first check to see if the pre-existing
-function is "empty". In this case, empty means that it has no basic blocks in
-it, which means it has no body. If it has no body, it is a forward
-declaration. Since we don't allow anything after a full definition of the
-function, the code rejects this case. If the previous reference to a function
-was an 'extern', we simply verify that the number of arguments for that
-definition and this one match up. If not, we emit an error.</p>
-
-<div class="doc_code">
-<pre>
- (* Set names for all arguments. *)
- Array.iteri (fun i a -&gt;
- let n = args.(i) in
- set_value_name n a;
- Hashtbl.add named_values n a;
- ) (params f);
- f
-</pre>
-</div>
-
-<p>The last bit of code for prototypes loops over all of the arguments in the
-function, setting the name of the LLVM Argument objects to match, and registering
-the arguments in the <tt>Codegen.named_values</tt> map for future use by the
-<tt>Ast.Variable</tt> variant. Once this is set up, it returns the Function
-object to the caller. Note that we don't check for conflicting
-argument names here (e.g. "extern foo(a b a)"). Doing so would be very
-straight-forward with the mechanics we have already used above.</p>
-
-<div class="doc_code">
-<pre>
-let codegen_func = function
- | Ast.Function (proto, body) -&gt;
- Hashtbl.clear named_values;
- let the_function = codegen_proto proto in
-</pre>
-</div>
-
-<p>Code generation for function definitions starts out simply enough: we just
-codegen the prototype (Proto) and verify that it is ok. We then clear out the
-<tt>Codegen.named_values</tt> map to make sure that there isn't anything in it
-from the last function we compiled. Code generation of the prototype ensures
-that there is an LLVM Function object that is ready to go for us.</p>
-
-<div class="doc_code">
-<pre>
- (* Create a new basic block to start insertion into. *)
- let bb = append_block context "entry" the_function in
- position_at_end bb builder;
-
- try
- let ret_val = codegen_expr body in
-</pre>
-</div>
-
-<p>Now we get to the point where the <tt>Codegen.builder</tt> is set up. The
-first line creates a new
-<a href="http://en.wikipedia.org/wiki/Basic_block">basic block</a> (named
-"entry"), which is inserted into <tt>the_function</tt>. The second line then
-tells the builder that new instructions should be inserted into the end of the
-new basic block. Basic blocks in LLVM are an important part of functions that
-define the <a
-href="http://en.wikipedia.org/wiki/Control_flow_graph">Control Flow Graph</a>.
-Since we don't have any control flow, our functions will only contain one
-block at this point. We'll fix this in <a href="OCamlLangImpl5.html">Chapter
-5</a> :).</p>
-
-<div class="doc_code">
-<pre>
- let ret_val = codegen_expr body in
-
- (* Finish off the function. *)
- let _ = build_ret ret_val builder in
-
- (* Validate the generated code, checking for consistency. *)
- Llvm_analysis.assert_valid_function the_function;
-
- the_function
-</pre>
-</div>
-
-<p>Once the insertion point is set up, we call the <tt>Codegen.codegen_func</tt>
-method for the root expression of the function. If no error happens, this emits
-code to compute the expression into the entry block and returns the value that
-was computed. Assuming no error, we then create an LLVM <a
-href="../LangRef.html#i_ret">ret instruction</a>, which completes the function.
-Once the function is built, we call
-<tt>Llvm_analysis.assert_valid_function</tt>, which is provided by LLVM. This
-function does a variety of consistency checks on the generated code, to
-determine if our compiler is doing everything right. Using this is important:
-it can catch a lot of bugs. Once the function is finished and validated, we
-return it.</p>
-
-<div class="doc_code">
-<pre>
- with e -&gt;
- delete_function the_function;
- raise e
-</pre>
-</div>
-
-<p>The only piece left here is handling of the error case. For simplicity, we
-handle this by merely deleting the function we produced with the
-<tt>Llvm.delete_function</tt> method. This allows the user to redefine a
-function that they incorrectly typed in before: if we didn't delete it, it
-would live in the symbol table, with a body, preventing future redefinition.</p>
-
-<p>This code does have a bug, though. Since the <tt>Codegen.codegen_proto</tt>
-can return a previously defined forward declaration, our code can actually delete
-a forward declaration. There are a number of ways to fix this bug, see what you
-can come up with! Here is a testcase:</p>
-
-<div class="doc_code">
-<pre>
-extern foo(a b); # ok, defines foo.
-def foo(a b) c; # error, 'c' is invalid.
-def bar() foo(1, 2); # error, unknown function "foo"
-</pre>
-</div>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="driver">Driver Changes and
-Closing Thoughts</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>
-For now, code generation to LLVM doesn't really get us much, except that we can
-look at the pretty IR calls. The sample code inserts calls to Codegen into the
-"<tt>Toplevel.main_loop</tt>", and then dumps out the LLVM IR. This gives a
-nice way to look at the LLVM IR for simple functions. For example:
-</p>
-
-<div class="doc_code">
-<pre>
-ready&gt; <b>4+5</b>;
-Read top-level expression:
-define double @""() {
-entry:
- %addtmp = fadd double 4.000000e+00, 5.000000e+00
- ret double %addtmp
-}
-</pre>
-</div>
-
-<p>Note how the parser turns the top-level expression into anonymous functions
-for us. This will be handy when we add <a href="OCamlLangImpl4.html#jit">JIT
-support</a> in the next chapter. Also note that the code is very literally
-transcribed, no optimizations are being performed. We will
-<a href="OCamlLangImpl4.html#trivialconstfold">add optimizations</a> explicitly
-in the next chapter.</p>
-
-<div class="doc_code">
-<pre>
-ready&gt; <b>def foo(a b) a*a + 2*a*b + b*b;</b>
-Read function definition:
-define double @foo(double %a, double %b) {
-entry:
- %multmp = fmul double %a, %a
- %multmp1 = fmul double 2.000000e+00, %a
- %multmp2 = fmul double %multmp1, %b
- %addtmp = fadd double %multmp, %multmp2
- %multmp3 = fmul double %b, %b
- %addtmp4 = fadd double %addtmp, %multmp3
- ret double %addtmp4
-}
-</pre>
-</div>
-
-<p>This shows some simple arithmetic. Notice the striking similarity to the
-LLVM builder calls that we use to create the instructions.</p>
-
-<div class="doc_code">
-<pre>
-ready&gt; <b>def bar(a) foo(a, 4.0) + bar(31337);</b>
-Read function definition:
-define double @bar(double %a) {
-entry:
- %calltmp = call double @foo( double %a, double 4.000000e+00 )
- %calltmp1 = call double @bar( double 3.133700e+04 )
- %addtmp = fadd double %calltmp, %calltmp1
- ret double %addtmp
-}
-</pre>
-</div>
-
-<p>This shows some function calls. Note that this function will take a long
-time to execute if you call it. In the future we'll add conditional control
-flow to actually make recursion useful :).</p>
-
-<div class="doc_code">
-<pre>
-ready&gt; <b>extern cos(x);</b>
-Read extern:
-declare double @cos(double)
-
-ready&gt; <b>cos(1.234);</b>
-Read top-level expression:
-define double @""() {
-entry:
- %calltmp = call double @cos( double 1.234000e+00 )
- ret double %calltmp
-}
-</pre>
-</div>
-
-<p>This shows an extern for the libm "cos" function, and a call to it.</p>
-
-
-<div class="doc_code">
-<pre>
-ready&gt; <b>^D</b>
-; ModuleID = 'my cool jit'
-
-define double @""() {
-entry:
- %addtmp = fadd double 4.000000e+00, 5.000000e+00
- ret double %addtmp
-}
-
-define double @foo(double %a, double %b) {
-entry:
- %multmp = fmul double %a, %a
- %multmp1 = fmul double 2.000000e+00, %a
- %multmp2 = fmul double %multmp1, %b
- %addtmp = fadd double %multmp, %multmp2
- %multmp3 = fmul double %b, %b
- %addtmp4 = fadd double %addtmp, %multmp3
- ret double %addtmp4
-}
-
-define double @bar(double %a) {
-entry:
- %calltmp = call double @foo( double %a, double 4.000000e+00 )
- %calltmp1 = call double @bar( double 3.133700e+04 )
- %addtmp = fadd double %calltmp, %calltmp1
- ret double %addtmp
-}
-
-declare double @cos(double)
-
-define double @""() {
-entry:
- %calltmp = call double @cos( double 1.234000e+00 )
- ret double %calltmp
-}
-</pre>
-</div>
-
-<p>When you quit the current demo, it dumps out the IR for the entire module
-generated. Here you can see the big picture with all the functions referencing
-each other.</p>
-
-<p>This wraps up the third chapter of the Kaleidoscope tutorial. Up next, we'll
-describe how to <a href="OCamlLangImpl4.html">add JIT codegen and optimizer
-support</a> to this so we can actually start running code!</p>
-
-</div>
-
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="code">Full Code Listing</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>
-Here is the complete code listing for our running example, enhanced with the
-LLVM code generator. Because this uses the LLVM libraries, we need to link
-them in. To do this, we use the <a
-href="http://llvm.org/cmds/llvm-config.html">llvm-config</a> tool to inform
-our makefile/command line about which options to use:</p>
-
-<div class="doc_code">
-<pre>
-# Compile
-ocamlbuild toy.byte
-# Run
-./toy.byte
-</pre>
-</div>
-
-<p>Here is the code:</p>
-
-<dl>
-<dt>_tags:</dt>
-<dd class="doc_code">
-<pre>
-&lt;{lexer,parser}.ml&gt;: use_camlp4, pp(camlp4of)
-&lt;*.{byte,native}&gt;: g++, use_llvm, use_llvm_analysis
-</pre>
-</dd>
-
-<dt>myocamlbuild.ml:</dt>
-<dd class="doc_code">
-<pre>
-open Ocamlbuild_plugin;;
-
-ocaml_lib ~extern:true "llvm";;
-ocaml_lib ~extern:true "llvm_analysis";;
-
-flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"]);;
-</pre>
-</dd>
-
-<dt>token.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Lexer Tokens
- *===----------------------------------------------------------------------===*)
-
-(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of
- * these others for known things. *)
-type token =
- (* commands *)
- | Def | Extern
-
- (* primary *)
- | Ident of string | Number of float
-
- (* unknown *)
- | Kwd of char
-</pre>
-</dd>
-
-<dt>lexer.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Lexer
- *===----------------------------------------------------------------------===*)
-
-let rec lex = parser
- (* Skip any whitespace. *)
- | [&lt; ' (' ' | '\n' | '\r' | '\t'); stream &gt;] -&gt; lex stream
-
- (* identifier: [a-zA-Z][a-zA-Z0-9] *)
- | [&lt; ' ('A' .. 'Z' | 'a' .. 'z' as c); stream &gt;] -&gt;
- let buffer = Buffer.create 1 in
- Buffer.add_char buffer c;
- lex_ident buffer stream
-
- (* number: [0-9.]+ *)
- | [&lt; ' ('0' .. '9' as c); stream &gt;] -&gt;
- let buffer = Buffer.create 1 in
- Buffer.add_char buffer c;
- lex_number buffer stream
-
- (* Comment until end of line. *)
- | [&lt; ' ('#'); stream &gt;] -&gt;
- lex_comment stream
-
- (* Otherwise, just return the character as its ascii value. *)
- | [&lt; 'c; stream &gt;] -&gt;
- [&lt; 'Token.Kwd c; lex stream &gt;]
-
- (* end of stream. *)
- | [&lt; &gt;] -&gt; [&lt; &gt;]
-
-and lex_number buffer = parser
- | [&lt; ' ('0' .. '9' | '.' as c); stream &gt;] -&gt;
- Buffer.add_char buffer c;
- lex_number buffer stream
- | [&lt; stream=lex &gt;] -&gt;
- [&lt; 'Token.Number (float_of_string (Buffer.contents buffer)); stream &gt;]
-
-and lex_ident buffer = parser
- | [&lt; ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream &gt;] -&gt;
- Buffer.add_char buffer c;
- lex_ident buffer stream
- | [&lt; stream=lex &gt;] -&gt;
- match Buffer.contents buffer with
- | "def" -&gt; [&lt; 'Token.Def; stream &gt;]
- | "extern" -&gt; [&lt; 'Token.Extern; stream &gt;]
- | id -&gt; [&lt; 'Token.Ident id; stream &gt;]
-
-and lex_comment = parser
- | [&lt; ' ('\n'); stream=lex &gt;] -&gt; stream
- | [&lt; 'c; e=lex_comment &gt;] -&gt; e
- | [&lt; &gt;] -&gt; [&lt; &gt;]
-</pre>
-</dd>
-
-<dt>ast.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Abstract Syntax Tree (aka Parse Tree)
- *===----------------------------------------------------------------------===*)
-
-(* expr - Base type for all expression nodes. *)
-type expr =
- (* variant for numeric literals like "1.0". *)
- | Number of float
-
- (* variant for referencing a variable, like "a". *)
- | Variable of string
-
- (* variant for a binary operator. *)
- | Binary of char * expr * expr
-
- (* variant for function calls. *)
- | Call of string * expr array
-
-(* proto - This type represents the "prototype" for a function, which captures
- * its name, and its argument names (thus implicitly the number of arguments the
- * function takes). *)
-type proto = Prototype of string * string array
-
-(* func - This type represents a function definition itself. *)
-type func = Function of proto * expr
-</pre>
-</dd>
-
-<dt>parser.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===---------------------------------------------------------------------===
- * Parser
- *===---------------------------------------------------------------------===*)
-
-(* binop_precedence - This holds the precedence for each binary operator that is
- * defined *)
-let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10
-
-(* precedence - Get the precedence of the pending binary operator token. *)
-let precedence c = try Hashtbl.find binop_precedence c with Not_found -&gt; -1
-
-(* primary
- * ::= identifier
- * ::= numberexpr
- * ::= parenexpr *)
-let rec parse_primary = parser
- (* numberexpr ::= number *)
- | [&lt; 'Token.Number n &gt;] -&gt; Ast.Number n
-
- (* parenexpr ::= '(' expression ')' *)
- | [&lt; 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" &gt;] -&gt; e
-
- (* identifierexpr
- * ::= identifier
- * ::= identifier '(' argumentexpr ')' *)
- | [&lt; 'Token.Ident id; stream &gt;] -&gt;
- let rec parse_args accumulator = parser
- | [&lt; e=parse_expr; stream &gt;] -&gt;
- begin parser
- | [&lt; 'Token.Kwd ','; e=parse_args (e :: accumulator) &gt;] -&gt; e
- | [&lt; &gt;] -&gt; e :: accumulator
- end stream
- | [&lt; &gt;] -&gt; accumulator
- in
- let rec parse_ident id = parser
- (* Call. *)
- | [&lt; 'Token.Kwd '(';
- args=parse_args [];
- 'Token.Kwd ')' ?? "expected ')'"&gt;] -&gt;
- Ast.Call (id, Array.of_list (List.rev args))
-
- (* Simple variable ref. *)
- | [&lt; &gt;] -&gt; Ast.Variable id
- in
- parse_ident id stream
-
- | [&lt; &gt;] -&gt; raise (Stream.Error "unknown token when expecting an expression.")
-
-(* binoprhs
- * ::= ('+' primary)* *)
-and parse_bin_rhs expr_prec lhs stream =
- match Stream.peek stream with
- (* If this is a binop, find its precedence. *)
- | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -&gt;
- let token_prec = precedence c in
-
- (* If this is a binop that binds at least as tightly as the current binop,
- * consume it, otherwise we are done. *)
- if token_prec &lt; expr_prec then lhs else begin
- (* Eat the binop. *)
- Stream.junk stream;
-
- (* Parse the primary expression after the binary operator. *)
- let rhs = parse_primary stream in
-
- (* Okay, we know this is a binop. *)
- let rhs =
- match Stream.peek stream with
- | Some (Token.Kwd c2) -&gt;
- (* If BinOp binds less tightly with rhs than the operator after
- * rhs, let the pending operator take rhs as its lhs. *)
- let next_prec = precedence c2 in
- if token_prec &lt; next_prec
- then parse_bin_rhs (token_prec + 1) rhs stream
- else rhs
- | _ -&gt; rhs
- in
-
- (* Merge lhs/rhs. *)
- let lhs = Ast.Binary (c, lhs, rhs) in
- parse_bin_rhs expr_prec lhs stream
- end
- | _ -&gt; lhs
-
-(* expression
- * ::= primary binoprhs *)
-and parse_expr = parser
- | [&lt; lhs=parse_primary; stream &gt;] -&gt; parse_bin_rhs 0 lhs stream
-
-(* prototype
- * ::= id '(' id* ')' *)
-let parse_prototype =
- let rec parse_args accumulator = parser
- | [&lt; 'Token.Ident id; e=parse_args (id::accumulator) &gt;] -&gt; e
- | [&lt; &gt;] -&gt; accumulator
- in
-
- parser
- | [&lt; 'Token.Ident id;
- 'Token.Kwd '(' ?? "expected '(' in prototype";
- args=parse_args [];
- 'Token.Kwd ')' ?? "expected ')' in prototype" &gt;] -&gt;
- (* success. *)
- Ast.Prototype (id, Array.of_list (List.rev args))
-
- | [&lt; &gt;] -&gt;
- raise (Stream.Error "expected function name in prototype")
-
-(* definition ::= 'def' prototype expression *)
-let parse_definition = parser
- | [&lt; 'Token.Def; p=parse_prototype; e=parse_expr &gt;] -&gt;
- Ast.Function (p, e)
-
-(* toplevelexpr ::= expression *)
-let parse_toplevel = parser
- | [&lt; e=parse_expr &gt;] -&gt;
- (* Make an anonymous proto. *)
- Ast.Function (Ast.Prototype ("", [||]), e)
-
-(* external ::= 'extern' prototype *)
-let parse_extern = parser
- | [&lt; 'Token.Extern; e=parse_prototype &gt;] -&gt; e
-</pre>
-</dd>
-
-<dt>codegen.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Code Generation
- *===----------------------------------------------------------------------===*)
-
-open Llvm
-
-exception Error of string
-
-let context = global_context ()
-let the_module = create_module context "my cool jit"
-let builder = builder context
-let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10
-let double_type = double_type context
-
-let rec codegen_expr = function
- | Ast.Number n -&gt; const_float double_type n
- | Ast.Variable name -&gt;
- (try Hashtbl.find named_values name with
- | Not_found -&gt; raise (Error "unknown variable name"))
- | Ast.Binary (op, lhs, rhs) -&gt;
- let lhs_val = codegen_expr lhs in
- let rhs_val = codegen_expr rhs in
- begin
- match op with
- | '+' -&gt; build_add lhs_val rhs_val "addtmp" builder
- | '-' -&gt; build_sub lhs_val rhs_val "subtmp" builder
- | '*' -&gt; build_mul lhs_val rhs_val "multmp" builder
- | '&lt;' -&gt;
- (* Convert bool 0/1 to double 0.0 or 1.0 *)
- let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in
- build_uitofp i double_type "booltmp" builder
- | _ -&gt; raise (Error "invalid binary operator")
- end
- | Ast.Call (callee, args) -&gt;
- (* Look up the name in the module table. *)
- let callee =
- match lookup_function callee the_module with
- | Some callee -&gt; callee
- | None -&gt; raise (Error "unknown function referenced")
- in
- let params = params callee in
-
- (* If argument mismatch error. *)
- if Array.length params == Array.length args then () else
- raise (Error "incorrect # arguments passed");
- let args = Array.map codegen_expr args in
- build_call callee args "calltmp" builder
-
-let codegen_proto = function
- | Ast.Prototype (name, args) -&gt;
- (* Make the function type: double(double,double) etc. *)
- let doubles = Array.make (Array.length args) double_type in
- let ft = function_type double_type doubles in
- let f =
- match lookup_function name the_module with
- | None -&gt; declare_function name ft the_module
-
- (* If 'f' conflicted, there was already something named 'name'. If it
- * has a body, don't allow redefinition or reextern. *)
- | Some f -&gt;
- (* If 'f' already has a body, reject this. *)
- if block_begin f &lt;&gt; At_end f then
- raise (Error "redefinition of function");
-
- (* If 'f' took a different number of arguments, reject. *)
- if element_type (type_of f) &lt;&gt; ft then
- raise (Error "redefinition of function with different # args");
- f
- in
-
- (* Set names for all arguments. *)
- Array.iteri (fun i a -&gt;
- let n = args.(i) in
- set_value_name n a;
- Hashtbl.add named_values n a;
- ) (params f);
- f
-
-let codegen_func = function
- | Ast.Function (proto, body) -&gt;
- Hashtbl.clear named_values;
- let the_function = codegen_proto proto in
-
- (* Create a new basic block to start insertion into. *)
- let bb = append_block context "entry" the_function in
- position_at_end bb builder;
-
- try
- let ret_val = codegen_expr body in
-
- (* Finish off the function. *)
- let _ = build_ret ret_val builder in
-
- (* Validate the generated code, checking for consistency. *)
- Llvm_analysis.assert_valid_function the_function;
-
- the_function
- with e -&gt;
- delete_function the_function;
- raise e
-</pre>
-</dd>
-
-<dt>toplevel.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Top-Level parsing and JIT Driver
- *===----------------------------------------------------------------------===*)
-
-open Llvm
-
-(* top ::= definition | external | expression | ';' *)
-let rec main_loop stream =
- match Stream.peek stream with
- | None -&gt; ()
-
- (* ignore top-level semicolons. *)
- | Some (Token.Kwd ';') -&gt;
- Stream.junk stream;
- main_loop stream
-
- | Some token -&gt;
- begin
- try match token with
- | Token.Def -&gt;
- let e = Parser.parse_definition stream in
- print_endline "parsed a function definition.";
- dump_value (Codegen.codegen_func e);
- | Token.Extern -&gt;
- let e = Parser.parse_extern stream in
- print_endline "parsed an extern.";
- dump_value (Codegen.codegen_proto e);
- | _ -&gt;
- (* Evaluate a top-level expression into an anonymous function. *)
- let e = Parser.parse_toplevel stream in
- print_endline "parsed a top-level expr";
- dump_value (Codegen.codegen_func e);
- with Stream.Error s | Codegen.Error s -&gt;
- (* Skip token for error recovery. *)
- Stream.junk stream;
- print_endline s;
- end;
- print_string "ready&gt; "; flush stdout;
- main_loop stream
-</pre>
-</dd>
-
-<dt>toy.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Main driver code.
- *===----------------------------------------------------------------------===*)
-
-open Llvm
-
-let main () =
- (* Install standard binary operators.
- * 1 is the lowest precedence. *)
- Hashtbl.add Parser.binop_precedence '&lt;' 10;
- Hashtbl.add Parser.binop_precedence '+' 20;
- Hashtbl.add Parser.binop_precedence '-' 20;
- Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *)
-
- (* Prime the first token. *)
- print_string "ready&gt; "; flush stdout;
- let stream = Lexer.lex (Stream.of_channel stdin) in
-
- (* Run the main "interpreter loop" now. *)
- Toplevel.main_loop stream;
-
- (* Print out all the generated code. *)
- dump_module Codegen.the_module
-;;
-
-main ()
-</pre>
-</dd>
-</dl>
-
-<a href="OCamlLangImpl4.html">Next: Adding JIT and Optimizer Support</a>
-</div>
-
-<!-- *********************************************************************** -->
-<hr>
-<address>
- <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
- src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
- <a href="http://validator.w3.org/check/referer"><img
- src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
-
- <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
- <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a><br>
- <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
- Last modified: $Date$
-</address>
-</body>
-</html>
diff --git a/docs/tutorial/OCamlLangImpl4.html b/docs/tutorial/OCamlLangImpl4.html
deleted file mode 100644
index 116c618..0000000
--- a/docs/tutorial/OCamlLangImpl4.html
+++ /dev/null
@@ -1,1029 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
- "http://www.w3.org/TR/html4/strict.dtd">
-
-<html>
-<head>
- <title>Kaleidoscope: Adding JIT and Optimizer Support</title>
- <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
- <meta name="author" content="Chris Lattner">
- <meta name="author" content="Erick Tryzelaar">
- <link rel="stylesheet" href="../llvm.css" type="text/css">
-</head>
-
-<body>
-
-<div class="doc_title">Kaleidoscope: Adding JIT and Optimizer Support</div>
-
-<ul>
-<li><a href="index.html">Up to Tutorial Index</a></li>
-<li>Chapter 4
- <ol>
- <li><a href="#intro">Chapter 4 Introduction</a></li>
- <li><a href="#trivialconstfold">Trivial Constant Folding</a></li>
- <li><a href="#optimizerpasses">LLVM Optimization Passes</a></li>
- <li><a href="#jit">Adding a JIT Compiler</a></li>
- <li><a href="#code">Full Code Listing</a></li>
- </ol>
-</li>
-<li><a href="OCamlLangImpl5.html">Chapter 5</a>: Extending the Language: Control
-Flow</li>
-</ul>
-
-<div class="doc_author">
- <p>
- Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a>
- and <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a>
- </p>
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="intro">Chapter 4 Introduction</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Welcome to Chapter 4 of the "<a href="index.html">Implementing a language
-with LLVM</a>" tutorial. Chapters 1-3 described the implementation of a simple
-language and added support for generating LLVM IR. This chapter describes
-two new techniques: adding optimizer support to your language, and adding JIT
-compiler support. These additions will demonstrate how to get nice, efficient code
-for the Kaleidoscope language.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="trivialconstfold">Trivial Constant
-Folding</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p><b>Note:</b> the default <tt>IRBuilder</tt> now always includes the constant
-folding optimisations below.<p>
-
-<p>
-Our demonstration for Chapter 3 is elegant and easy to extend. Unfortunately,
-it does not produce wonderful code. For example, when compiling simple code,
-we don't get obvious optimizations:</p>
-
-<div class="doc_code">
-<pre>
-ready&gt; <b>def test(x) 1+2+x;</b>
-Read function definition:
-define double @test(double %x) {
-entry:
- %addtmp = fadd double 1.000000e+00, 2.000000e+00
- %addtmp1 = fadd double %addtmp, %x
- ret double %addtmp1
-}
-</pre>
-</div>
-
-<p>This code is a very, very literal transcription of the AST built by parsing
-the input. As such, this transcription lacks optimizations like constant folding
-(we'd like to get "<tt>add x, 3.0</tt>" in the example above) as well as other
-more important optimizations. Constant folding, in particular, is a very common
-and very important optimization: so much so that many language implementors
-implement constant folding support in their AST representation.</p>
-
-<p>With LLVM, you don't need this support in the AST. Since all calls to build
-LLVM IR go through the LLVM builder, it would be nice if the builder itself
-checked to see if there was a constant folding opportunity when you call it.
-If so, it could just do the constant fold and return the constant instead of
-creating an instruction. This is exactly what the <tt>LLVMFoldingBuilder</tt>
-class does.
-
-<p>All we did was switch from <tt>LLVMBuilder</tt> to
-<tt>LLVMFoldingBuilder</tt>. Though we change no other code, we now have all of our
-instructions implicitly constant folded without us having to do anything
-about it. For example, the input above now compiles to:</p>
-
-<div class="doc_code">
-<pre>
-ready&gt; <b>def test(x) 1+2+x;</b>
-Read function definition:
-define double @test(double %x) {
-entry:
- %addtmp = fadd double 3.000000e+00, %x
- ret double %addtmp
-}
-</pre>
-</div>
-
-<p>Well, that was easy :). In practice, we recommend always using
-<tt>LLVMFoldingBuilder</tt> when generating code like this. It has no
-"syntactic overhead" for its use (you don't have to uglify your compiler with
-constant checks everywhere) and it can dramatically reduce the amount of
-LLVM IR that is generated in some cases (particular for languages with a macro
-preprocessor or that use a lot of constants).</p>
-
-<p>On the other hand, the <tt>LLVMFoldingBuilder</tt> is limited by the fact
-that it does all of its analysis inline with the code as it is built. If you
-take a slightly more complex example:</p>
-
-<div class="doc_code">
-<pre>
-ready&gt; <b>def test(x) (1+2+x)*(x+(1+2));</b>
-ready&gt; Read function definition:
-define double @test(double %x) {
-entry:
- %addtmp = fadd double 3.000000e+00, %x
- %addtmp1 = fadd double %x, 3.000000e+00
- %multmp = fmul double %addtmp, %addtmp1
- ret double %multmp
-}
-</pre>
-</div>
-
-<p>In this case, the LHS and RHS of the multiplication are the same value. We'd
-really like to see this generate "<tt>tmp = x+3; result = tmp*tmp;</tt>" instead
-of computing "<tt>x*3</tt>" twice.</p>
-
-<p>Unfortunately, no amount of local analysis will be able to detect and correct
-this. This requires two transformations: reassociation of expressions (to
-make the add's lexically identical) and Common Subexpression Elimination (CSE)
-to delete the redundant add instruction. Fortunately, LLVM provides a broad
-range of optimizations that you can use, in the form of "passes".</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="optimizerpasses">LLVM Optimization
- Passes</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>LLVM provides many optimization passes, which do many different sorts of
-things and have different tradeoffs. Unlike other systems, LLVM doesn't hold
-to the mistaken notion that one set of optimizations is right for all languages
-and for all situations. LLVM allows a compiler implementor to make complete
-decisions about what optimizations to use, in which order, and in what
-situation.</p>
-
-<p>As a concrete example, LLVM supports both "whole module" passes, which look
-across as large of body of code as they can (often a whole file, but if run
-at link time, this can be a substantial portion of the whole program). It also
-supports and includes "per-function" passes which just operate on a single
-function at a time, without looking at other functions. For more information
-on passes and how they are run, see the <a href="../WritingAnLLVMPass.html">How
-to Write a Pass</a> document and the <a href="../Passes.html">List of LLVM
-Passes</a>.</p>
-
-<p>For Kaleidoscope, we are currently generating functions on the fly, one at
-a time, as the user types them in. We aren't shooting for the ultimate
-optimization experience in this setting, but we also want to catch the easy and
-quick stuff where possible. As such, we will choose to run a few per-function
-optimizations as the user types the function in. If we wanted to make a "static
-Kaleidoscope compiler", we would use exactly the code we have now, except that
-we would defer running the optimizer until the entire file has been parsed.</p>
-
-<p>In order to get per-function optimizations going, we need to set up a
-<a href="../WritingAnLLVMPass.html#passmanager">Llvm.PassManager</a> to hold and
-organize the LLVM optimizations that we want to run. Once we have that, we can
-add a set of optimizations to run. The code looks like this:</p>
-
-<div class="doc_code">
-<pre>
- (* Create the JIT. *)
- let the_execution_engine = ExecutionEngine.create Codegen.the_module in
- let the_fpm = PassManager.create_function Codegen.the_module in
-
- (* Set up the optimizer pipeline. Start with registering info about how the
- * target lays out data structures. *)
- TargetData.add (ExecutionEngine.target_data the_execution_engine) the_fpm;
-
- (* Do simple "peephole" optimizations and bit-twiddling optzn. *)
- add_instruction_combining the_fpm;
-
- (* reassociate expressions. *)
- add_reassociation the_fpm;
-
- (* Eliminate Common SubExpressions. *)
- add_gvn the_fpm;
-
- (* Simplify the control flow graph (deleting unreachable blocks, etc). *)
- add_cfg_simplification the_fpm;
-
- ignore (PassManager.initialize the_fpm);
-
- (* Run the main "interpreter loop" now. *)
- Toplevel.main_loop the_fpm the_execution_engine stream;
-</pre>
-</div>
-
-<p>The meat of the matter here, is the definition of "<tt>the_fpm</tt>". It
-requires a pointer to the <tt>the_module</tt> to construct itself. Once it is
-set up, we use a series of "add" calls to add a bunch of LLVM passes. The
-first pass is basically boilerplate, it adds a pass so that later optimizations
-know how the data structures in the program are laid out. The
-"<tt>the_execution_engine</tt>" variable is related to the JIT, which we will
-get to in the next section.</p>
-
-<p>In this case, we choose to add 4 optimization passes. The passes we chose
-here are a pretty standard set of "cleanup" optimizations that are useful for
-a wide variety of code. I won't delve into what they do but, believe me,
-they are a good starting place :).</p>
-
-<p>Once the <tt>Llvm.PassManager.</tt> is set up, we need to make use of it.
-We do this by running it after our newly created function is constructed (in
-<tt>Codegen.codegen_func</tt>), but before it is returned to the client:</p>
-
-<div class="doc_code">
-<pre>
-let codegen_func the_fpm = function
- ...
- try
- let ret_val = codegen_expr body in
-
- (* Finish off the function. *)
- let _ = build_ret ret_val builder in
-
- (* Validate the generated code, checking for consistency. *)
- Llvm_analysis.assert_valid_function the_function;
-
- (* Optimize the function. *)
- let _ = PassManager.run_function the_function the_fpm in
-
- the_function
-</pre>
-</div>
-
-<p>As you can see, this is pretty straightforward. The <tt>the_fpm</tt>
-optimizes and updates the LLVM Function* in place, improving (hopefully) its
-body. With this in place, we can try our test above again:</p>
-
-<div class="doc_code">
-<pre>
-ready&gt; <b>def test(x) (1+2+x)*(x+(1+2));</b>
-ready&gt; Read function definition:
-define double @test(double %x) {
-entry:
- %addtmp = fadd double %x, 3.000000e+00
- %multmp = fmul double %addtmp, %addtmp
- ret double %multmp
-}
-</pre>
-</div>
-
-<p>As expected, we now get our nicely optimized code, saving a floating point
-add instruction from every execution of this function.</p>
-
-<p>LLVM provides a wide variety of optimizations that can be used in certain
-circumstances. Some <a href="../Passes.html">documentation about the various
-passes</a> is available, but it isn't very complete. Another good source of
-ideas can come from looking at the passes that <tt>llvm-gcc</tt> or
-<tt>llvm-ld</tt> run to get started. The "<tt>opt</tt>" tool allows you to
-experiment with passes from the command line, so you can see if they do
-anything.</p>
-
-<p>Now that we have reasonable code coming out of our front-end, lets talk about
-executing it!</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="jit">Adding a JIT Compiler</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Code that is available in LLVM IR can have a wide variety of tools
-applied to it. For example, you can run optimizations on it (as we did above),
-you can dump it out in textual or binary forms, you can compile the code to an
-assembly file (.s) for some target, or you can JIT compile it. The nice thing
-about the LLVM IR representation is that it is the "common currency" between
-many different parts of the compiler.
-</p>
-
-<p>In this section, we'll add JIT compiler support to our interpreter. The
-basic idea that we want for Kaleidoscope is to have the user enter function
-bodies as they do now, but immediately evaluate the top-level expressions they
-type in. For example, if they type in "1 + 2;", we should evaluate and print
-out 3. If they define a function, they should be able to call it from the
-command line.</p>
-
-<p>In order to do this, we first declare and initialize the JIT. This is done
-by adding a global variable and a call in <tt>main</tt>:</p>
-
-<div class="doc_code">
-<pre>
-...
-let main () =
- ...
- <b>(* Create the JIT. *)
- let the_execution_engine = ExecutionEngine.create Codegen.the_module in</b>
- ...
-</pre>
-</div>
-
-<p>This creates an abstract "Execution Engine" which can be either a JIT
-compiler or the LLVM interpreter. LLVM will automatically pick a JIT compiler
-for you if one is available for your platform, otherwise it will fall back to
-the interpreter.</p>
-
-<p>Once the <tt>Llvm_executionengine.ExecutionEngine.t</tt> is created, the JIT
-is ready to be used. There are a variety of APIs that are useful, but the
-simplest one is the "<tt>Llvm_executionengine.ExecutionEngine.run_function</tt>"
-function. This method JIT compiles the specified LLVM Function and returns a
-function pointer to the generated machine code. In our case, this means that we
-can change the code that parses a top-level expression to look like this:</p>
-
-<div class="doc_code">
-<pre>
- (* Evaluate a top-level expression into an anonymous function. *)
- let e = Parser.parse_toplevel stream in
- print_endline "parsed a top-level expr";
- let the_function = Codegen.codegen_func the_fpm e in
- dump_value the_function;
-
- (* JIT the function, returning a function pointer. *)
- let result = ExecutionEngine.run_function the_function [||]
- the_execution_engine in
-
- print_string "Evaluated to ";
- print_float (GenericValue.as_float Codegen.double_type result);
- print_newline ();
-</pre>
-</div>
-
-<p>Recall that we compile top-level expressions into a self-contained LLVM
-function that takes no arguments and returns the computed double. Because the
-LLVM JIT compiler matches the native platform ABI, this means that you can just
-cast the result pointer to a function pointer of that type and call it directly.
-This means, there is no difference between JIT compiled code and native machine
-code that is statically linked into your application.</p>
-
-<p>With just these two changes, lets see how Kaleidoscope works now!</p>
-
-<div class="doc_code">
-<pre>
-ready&gt; <b>4+5;</b>
-define double @""() {
-entry:
- ret double 9.000000e+00
-}
-
-<em>Evaluated to 9.000000</em>
-</pre>
-</div>
-
-<p>Well this looks like it is basically working. The dump of the function
-shows the "no argument function that always returns double" that we synthesize
-for each top level expression that is typed in. This demonstrates very basic
-functionality, but can we do more?</p>
-
-<div class="doc_code">
-<pre>
-ready&gt; <b>def testfunc(x y) x + y*2; </b>
-Read function definition:
-define double @testfunc(double %x, double %y) {
-entry:
- %multmp = fmul double %y, 2.000000e+00
- %addtmp = fadd double %multmp, %x
- ret double %addtmp
-}
-
-ready&gt; <b>testfunc(4, 10);</b>
-define double @""() {
-entry:
- %calltmp = call double @testfunc( double 4.000000e+00, double 1.000000e+01 )
- ret double %calltmp
-}
-
-<em>Evaluated to 24.000000</em>
-</pre>
-</div>
-
-<p>This illustrates that we can now call user code, but there is something a bit
-subtle going on here. Note that we only invoke the JIT on the anonymous
-functions that <em>call testfunc</em>, but we never invoked it
-on <em>testfunc</em> itself. What actually happened here is that the JIT
-scanned for all non-JIT'd functions transitively called from the anonymous
-function and compiled all of them before returning
-from <tt>run_function</tt>.</p>
-
-<p>The JIT provides a number of other more advanced interfaces for things like
-freeing allocated machine code, rejit'ing functions to update them, etc.
-However, even with this simple code, we get some surprisingly powerful
-capabilities - check this out (I removed the dump of the anonymous functions,
-you should get the idea by now :) :</p>
-
-<div class="doc_code">
-<pre>
-ready&gt; <b>extern sin(x);</b>
-Read extern:
-declare double @sin(double)
-
-ready&gt; <b>extern cos(x);</b>
-Read extern:
-declare double @cos(double)
-
-ready&gt; <b>sin(1.0);</b>
-<em>Evaluated to 0.841471</em>
-
-ready&gt; <b>def foo(x) sin(x)*sin(x) + cos(x)*cos(x);</b>
-Read function definition:
-define double @foo(double %x) {
-entry:
- %calltmp = call double @sin( double %x )
- %multmp = fmul double %calltmp, %calltmp
- %calltmp2 = call double @cos( double %x )
- %multmp4 = fmul double %calltmp2, %calltmp2
- %addtmp = fadd double %multmp, %multmp4
- ret double %addtmp
-}
-
-ready&gt; <b>foo(4.0);</b>
-<em>Evaluated to 1.000000</em>
-</pre>
-</div>
-
-<p>Whoa, how does the JIT know about sin and cos? The answer is surprisingly
-simple: in this example, the JIT started execution of a function and got to a
-function call. It realized that the function was not yet JIT compiled and
-invoked the standard set of routines to resolve the function. In this case,
-there is no body defined for the function, so the JIT ended up calling
-"<tt>dlsym("sin")</tt>" on the Kaleidoscope process itself. Since
-"<tt>sin</tt>" is defined within the JIT's address space, it simply patches up
-calls in the module to call the libm version of <tt>sin</tt> directly.</p>
-
-<p>The LLVM JIT provides a number of interfaces (look in the
-<tt>llvm_executionengine.mli</tt> file) for controlling how unknown functions
-get resolved. It allows you to establish explicit mappings between IR objects
-and addresses (useful for LLVM global variables that you want to map to static
-tables, for example), allows you to dynamically decide on the fly based on the
-function name, and even allows you to have the JIT compile functions lazily the
-first time they're called.</p>
-
-<p>One interesting application of this is that we can now extend the language
-by writing arbitrary C code to implement operations. For example, if we add:
-</p>
-
-<div class="doc_code">
-<pre>
-/* putchard - putchar that takes a double and returns 0. */
-extern "C"
-double putchard(double X) {
- putchar((char)X);
- return 0;
-}
-</pre>
-</div>
-
-<p>Now we can produce simple output to the console by using things like:
-"<tt>extern putchard(x); putchard(120);</tt>", which prints a lowercase 'x' on
-the console (120 is the ASCII code for 'x'). Similar code could be used to
-implement file I/O, console input, and many other capabilities in
-Kaleidoscope.</p>
-
-<p>This completes the JIT and optimizer chapter of the Kaleidoscope tutorial. At
-this point, we can compile a non-Turing-complete programming language, optimize
-and JIT compile it in a user-driven way. Next up we'll look into <a
-href="OCamlLangImpl5.html">extending the language with control flow
-constructs</a>, tackling some interesting LLVM IR issues along the way.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="code">Full Code Listing</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>
-Here is the complete code listing for our running example, enhanced with the
-LLVM JIT and optimizer. To build this example, use:
-</p>
-
-<div class="doc_code">
-<pre>
-# Compile
-ocamlbuild toy.byte
-# Run
-./toy.byte
-</pre>
-</div>
-
-<p>Here is the code:</p>
-
-<dl>
-<dt>_tags:</dt>
-<dd class="doc_code">
-<pre>
-&lt;{lexer,parser}.ml&gt;: use_camlp4, pp(camlp4of)
-&lt;*.{byte,native}&gt;: g++, use_llvm, use_llvm_analysis
-&lt;*.{byte,native}&gt;: use_llvm_executionengine, use_llvm_target
-&lt;*.{byte,native}&gt;: use_llvm_scalar_opts, use_bindings
-</pre>
-</dd>
-
-<dt>myocamlbuild.ml:</dt>
-<dd class="doc_code">
-<pre>
-open Ocamlbuild_plugin;;
-
-ocaml_lib ~extern:true "llvm";;
-ocaml_lib ~extern:true "llvm_analysis";;
-ocaml_lib ~extern:true "llvm_executionengine";;
-ocaml_lib ~extern:true "llvm_target";;
-ocaml_lib ~extern:true "llvm_scalar_opts";;
-
-flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"]);;
-dep ["link"; "ocaml"; "use_bindings"] ["bindings.o"];;
-</pre>
-</dd>
-
-<dt>token.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Lexer Tokens
- *===----------------------------------------------------------------------===*)
-
-(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of
- * these others for known things. *)
-type token =
- (* commands *)
- | Def | Extern
-
- (* primary *)
- | Ident of string | Number of float
-
- (* unknown *)
- | Kwd of char
-</pre>
-</dd>
-
-<dt>lexer.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Lexer
- *===----------------------------------------------------------------------===*)
-
-let rec lex = parser
- (* Skip any whitespace. *)
- | [&lt; ' (' ' | '\n' | '\r' | '\t'); stream &gt;] -&gt; lex stream
-
- (* identifier: [a-zA-Z][a-zA-Z0-9] *)
- | [&lt; ' ('A' .. 'Z' | 'a' .. 'z' as c); stream &gt;] -&gt;
- let buffer = Buffer.create 1 in
- Buffer.add_char buffer c;
- lex_ident buffer stream
-
- (* number: [0-9.]+ *)
- | [&lt; ' ('0' .. '9' as c); stream &gt;] -&gt;
- let buffer = Buffer.create 1 in
- Buffer.add_char buffer c;
- lex_number buffer stream
-
- (* Comment until end of line. *)
- | [&lt; ' ('#'); stream &gt;] -&gt;
- lex_comment stream
-
- (* Otherwise, just return the character as its ascii value. *)
- | [&lt; 'c; stream &gt;] -&gt;
- [&lt; 'Token.Kwd c; lex stream &gt;]
-
- (* end of stream. *)
- | [&lt; &gt;] -&gt; [&lt; &gt;]
-
-and lex_number buffer = parser
- | [&lt; ' ('0' .. '9' | '.' as c); stream &gt;] -&gt;
- Buffer.add_char buffer c;
- lex_number buffer stream
- | [&lt; stream=lex &gt;] -&gt;
- [&lt; 'Token.Number (float_of_string (Buffer.contents buffer)); stream &gt;]
-
-and lex_ident buffer = parser
- | [&lt; ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream &gt;] -&gt;
- Buffer.add_char buffer c;
- lex_ident buffer stream
- | [&lt; stream=lex &gt;] -&gt;
- match Buffer.contents buffer with
- | "def" -&gt; [&lt; 'Token.Def; stream &gt;]
- | "extern" -&gt; [&lt; 'Token.Extern; stream &gt;]
- | id -&gt; [&lt; 'Token.Ident id; stream &gt;]
-
-and lex_comment = parser
- | [&lt; ' ('\n'); stream=lex &gt;] -&gt; stream
- | [&lt; 'c; e=lex_comment &gt;] -&gt; e
- | [&lt; &gt;] -&gt; [&lt; &gt;]
-</pre>
-</dd>
-
-<dt>ast.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Abstract Syntax Tree (aka Parse Tree)
- *===----------------------------------------------------------------------===*)
-
-(* expr - Base type for all expression nodes. *)
-type expr =
- (* variant for numeric literals like "1.0". *)
- | Number of float
-
- (* variant for referencing a variable, like "a". *)
- | Variable of string
-
- (* variant for a binary operator. *)
- | Binary of char * expr * expr
-
- (* variant for function calls. *)
- | Call of string * expr array
-
-(* proto - This type represents the "prototype" for a function, which captures
- * its name, and its argument names (thus implicitly the number of arguments the
- * function takes). *)
-type proto = Prototype of string * string array
-
-(* func - This type represents a function definition itself. *)
-type func = Function of proto * expr
-</pre>
-</dd>
-
-<dt>parser.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===---------------------------------------------------------------------===
- * Parser
- *===---------------------------------------------------------------------===*)
-
-(* binop_precedence - This holds the precedence for each binary operator that is
- * defined *)
-let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10
-
-(* precedence - Get the precedence of the pending binary operator token. *)
-let precedence c = try Hashtbl.find binop_precedence c with Not_found -&gt; -1
-
-(* primary
- * ::= identifier
- * ::= numberexpr
- * ::= parenexpr *)
-let rec parse_primary = parser
- (* numberexpr ::= number *)
- | [&lt; 'Token.Number n &gt;] -&gt; Ast.Number n
-
- (* parenexpr ::= '(' expression ')' *)
- | [&lt; 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" &gt;] -&gt; e
-
- (* identifierexpr
- * ::= identifier
- * ::= identifier '(' argumentexpr ')' *)
- | [&lt; 'Token.Ident id; stream &gt;] -&gt;
- let rec parse_args accumulator = parser
- | [&lt; e=parse_expr; stream &gt;] -&gt;
- begin parser
- | [&lt; 'Token.Kwd ','; e=parse_args (e :: accumulator) &gt;] -&gt; e
- | [&lt; &gt;] -&gt; e :: accumulator
- end stream
- | [&lt; &gt;] -&gt; accumulator
- in
- let rec parse_ident id = parser
- (* Call. *)
- | [&lt; 'Token.Kwd '(';
- args=parse_args [];
- 'Token.Kwd ')' ?? "expected ')'"&gt;] -&gt;
- Ast.Call (id, Array.of_list (List.rev args))
-
- (* Simple variable ref. *)
- | [&lt; &gt;] -&gt; Ast.Variable id
- in
- parse_ident id stream
-
- | [&lt; &gt;] -&gt; raise (Stream.Error "unknown token when expecting an expression.")
-
-(* binoprhs
- * ::= ('+' primary)* *)
-and parse_bin_rhs expr_prec lhs stream =
- match Stream.peek stream with
- (* If this is a binop, find its precedence. *)
- | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -&gt;
- let token_prec = precedence c in
-
- (* If this is a binop that binds at least as tightly as the current binop,
- * consume it, otherwise we are done. *)
- if token_prec &lt; expr_prec then lhs else begin
- (* Eat the binop. *)
- Stream.junk stream;
-
- (* Parse the primary expression after the binary operator. *)
- let rhs = parse_primary stream in
-
- (* Okay, we know this is a binop. *)
- let rhs =
- match Stream.peek stream with
- | Some (Token.Kwd c2) -&gt;
- (* If BinOp binds less tightly with rhs than the operator after
- * rhs, let the pending operator take rhs as its lhs. *)
- let next_prec = precedence c2 in
- if token_prec &lt; next_prec
- then parse_bin_rhs (token_prec + 1) rhs stream
- else rhs
- | _ -&gt; rhs
- in
-
- (* Merge lhs/rhs. *)
- let lhs = Ast.Binary (c, lhs, rhs) in
- parse_bin_rhs expr_prec lhs stream
- end
- | _ -&gt; lhs
-
-(* expression
- * ::= primary binoprhs *)
-and parse_expr = parser
- | [&lt; lhs=parse_primary; stream &gt;] -&gt; parse_bin_rhs 0 lhs stream
-
-(* prototype
- * ::= id '(' id* ')' *)
-let parse_prototype =
- let rec parse_args accumulator = parser
- | [&lt; 'Token.Ident id; e=parse_args (id::accumulator) &gt;] -&gt; e
- | [&lt; &gt;] -&gt; accumulator
- in
-
- parser
- | [&lt; 'Token.Ident id;
- 'Token.Kwd '(' ?? "expected '(' in prototype";
- args=parse_args [];
- 'Token.Kwd ')' ?? "expected ')' in prototype" &gt;] -&gt;
- (* success. *)
- Ast.Prototype (id, Array.of_list (List.rev args))
-
- | [&lt; &gt;] -&gt;
- raise (Stream.Error "expected function name in prototype")
-
-(* definition ::= 'def' prototype expression *)
-let parse_definition = parser
- | [&lt; 'Token.Def; p=parse_prototype; e=parse_expr &gt;] -&gt;
- Ast.Function (p, e)
-
-(* toplevelexpr ::= expression *)
-let parse_toplevel = parser
- | [&lt; e=parse_expr &gt;] -&gt;
- (* Make an anonymous proto. *)
- Ast.Function (Ast.Prototype ("", [||]), e)
-
-(* external ::= 'extern' prototype *)
-let parse_extern = parser
- | [&lt; 'Token.Extern; e=parse_prototype &gt;] -&gt; e
-</pre>
-</dd>
-
-<dt>codegen.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Code Generation
- *===----------------------------------------------------------------------===*)
-
-open Llvm
-
-exception Error of string
-
-let context = global_context ()
-let the_module = create_module context "my cool jit"
-let builder = builder context
-let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10
-let double_type = double_type context
-
-let rec codegen_expr = function
- | Ast.Number n -&gt; const_float double_type n
- | Ast.Variable name -&gt;
- (try Hashtbl.find named_values name with
- | Not_found -&gt; raise (Error "unknown variable name"))
- | Ast.Binary (op, lhs, rhs) -&gt;
- let lhs_val = codegen_expr lhs in
- let rhs_val = codegen_expr rhs in
- begin
- match op with
- | '+' -&gt; build_add lhs_val rhs_val "addtmp" builder
- | '-' -&gt; build_sub lhs_val rhs_val "subtmp" builder
- | '*' -&gt; build_mul lhs_val rhs_val "multmp" builder
- | '&lt;' -&gt;
- (* Convert bool 0/1 to double 0.0 or 1.0 *)
- let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in
- build_uitofp i double_type "booltmp" builder
- | _ -&gt; raise (Error "invalid binary operator")
- end
- | Ast.Call (callee, args) -&gt;
- (* Look up the name in the module table. *)
- let callee =
- match lookup_function callee the_module with
- | Some callee -&gt; callee
- | None -&gt; raise (Error "unknown function referenced")
- in
- let params = params callee in
-
- (* If argument mismatch error. *)
- if Array.length params == Array.length args then () else
- raise (Error "incorrect # arguments passed");
- let args = Array.map codegen_expr args in
- build_call callee args "calltmp" builder
-
-let codegen_proto = function
- | Ast.Prototype (name, args) -&gt;
- (* Make the function type: double(double,double) etc. *)
- let doubles = Array.make (Array.length args) double_type in
- let ft = function_type double_type doubles in
- let f =
- match lookup_function name the_module with
- | None -&gt; declare_function name ft the_module
-
- (* If 'f' conflicted, there was already something named 'name'. If it
- * has a body, don't allow redefinition or reextern. *)
- | Some f -&gt;
- (* If 'f' already has a body, reject this. *)
- if block_begin f &lt;&gt; At_end f then
- raise (Error "redefinition of function");
-
- (* If 'f' took a different number of arguments, reject. *)
- if element_type (type_of f) &lt;&gt; ft then
- raise (Error "redefinition of function with different # args");
- f
- in
-
- (* Set names for all arguments. *)
- Array.iteri (fun i a -&gt;
- let n = args.(i) in
- set_value_name n a;
- Hashtbl.add named_values n a;
- ) (params f);
- f
-
-let codegen_func the_fpm = function
- | Ast.Function (proto, body) -&gt;
- Hashtbl.clear named_values;
- let the_function = codegen_proto proto in
-
- (* Create a new basic block to start insertion into. *)
- let bb = append_block context "entry" the_function in
- position_at_end bb builder;
-
- try
- let ret_val = codegen_expr body in
-
- (* Finish off the function. *)
- let _ = build_ret ret_val builder in
-
- (* Validate the generated code, checking for consistency. *)
- Llvm_analysis.assert_valid_function the_function;
-
- (* Optimize the function. *)
- let _ = PassManager.run_function the_function the_fpm in
-
- the_function
- with e -&gt;
- delete_function the_function;
- raise e
-</pre>
-</dd>
-
-<dt>toplevel.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Top-Level parsing and JIT Driver
- *===----------------------------------------------------------------------===*)
-
-open Llvm
-open Llvm_executionengine
-
-(* top ::= definition | external | expression | ';' *)
-let rec main_loop the_fpm the_execution_engine stream =
- match Stream.peek stream with
- | None -&gt; ()
-
- (* ignore top-level semicolons. *)
- | Some (Token.Kwd ';') -&gt;
- Stream.junk stream;
- main_loop the_fpm the_execution_engine stream
-
- | Some token -&gt;
- begin
- try match token with
- | Token.Def -&gt;
- let e = Parser.parse_definition stream in
- print_endline "parsed a function definition.";
- dump_value (Codegen.codegen_func the_fpm e);
- | Token.Extern -&gt;
- let e = Parser.parse_extern stream in
- print_endline "parsed an extern.";
- dump_value (Codegen.codegen_proto e);
- | _ -&gt;
- (* Evaluate a top-level expression into an anonymous function. *)
- let e = Parser.parse_toplevel stream in
- print_endline "parsed a top-level expr";
- let the_function = Codegen.codegen_func the_fpm e in
- dump_value the_function;
-
- (* JIT the function, returning a function pointer. *)
- let result = ExecutionEngine.run_function the_function [||]
- the_execution_engine in
-
- print_string "Evaluated to ";
- print_float (GenericValue.as_float Codegen.double_type result);
- print_newline ();
- with Stream.Error s | Codegen.Error s -&gt;
- (* Skip token for error recovery. *)
- Stream.junk stream;
- print_endline s;
- end;
- print_string "ready&gt; "; flush stdout;
- main_loop the_fpm the_execution_engine stream
-</pre>
-</dd>
-
-<dt>toy.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Main driver code.
- *===----------------------------------------------------------------------===*)
-
-open Llvm
-open Llvm_executionengine
-open Llvm_target
-open Llvm_scalar_opts
-
-let main () =
- ignore (initialize_native_target ());
-
- (* Install standard binary operators.
- * 1 is the lowest precedence. *)
- Hashtbl.add Parser.binop_precedence '&lt;' 10;
- Hashtbl.add Parser.binop_precedence '+' 20;
- Hashtbl.add Parser.binop_precedence '-' 20;
- Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *)
-
- (* Prime the first token. *)
- print_string "ready&gt; "; flush stdout;
- let stream = Lexer.lex (Stream.of_channel stdin) in
-
- (* Create the JIT. *)
- let the_execution_engine = ExecutionEngine.create Codegen.the_module in
- let the_fpm = PassManager.create_function Codegen.the_module in
-
- (* Set up the optimizer pipeline. Start with registering info about how the
- * target lays out data structures. *)
- TargetData.add (ExecutionEngine.target_data the_execution_engine) the_fpm;
-
- (* Do simple "peephole" optimizations and bit-twiddling optzn. *)
- add_instruction_combination the_fpm;
-
- (* reassociate expressions. *)
- add_reassociation the_fpm;
-
- (* Eliminate Common SubExpressions. *)
- add_gvn the_fpm;
-
- (* Simplify the control flow graph (deleting unreachable blocks, etc). *)
- add_cfg_simplification the_fpm;
-
- ignore (PassManager.initialize the_fpm);
-
- (* Run the main "interpreter loop" now. *)
- Toplevel.main_loop the_fpm the_execution_engine stream;
-
- (* Print out all the generated code. *)
- dump_module Codegen.the_module
-;;
-
-main ()
-</pre>
-</dd>
-
-<dt>bindings.c</dt>
-<dd class="doc_code">
-<pre>
-#include &lt;stdio.h&gt;
-
-/* putchard - putchar that takes a double and returns 0. */
-extern double putchard(double X) {
- putchar((char)X);
- return 0;
-}
-</pre>
-</dd>
-</dl>
-
-<a href="OCamlLangImpl5.html">Next: Extending the language: control flow</a>
-</div>
-
-<!-- *********************************************************************** -->
-<hr>
-<address>
- <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
- src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
- <a href="http://validator.w3.org/check/referer"><img
- src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
-
- <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
- <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a><br>
- <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
- Last modified: $Date$
-</address>
-</body>
-</html>
diff --git a/docs/tutorial/OCamlLangImpl5.html b/docs/tutorial/OCamlLangImpl5.html
deleted file mode 100644
index 131d5b2..0000000
--- a/docs/tutorial/OCamlLangImpl5.html
+++ /dev/null
@@ -1,1569 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
- "http://www.w3.org/TR/html4/strict.dtd">
-
-<html>
-<head>
- <title>Kaleidoscope: Extending the Language: Control Flow</title>
- <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
- <meta name="author" content="Chris Lattner">
- <meta name="author" content="Erick Tryzelaar">
- <link rel="stylesheet" href="../llvm.css" type="text/css">
-</head>
-
-<body>
-
-<div class="doc_title">Kaleidoscope: Extending the Language: Control Flow</div>
-
-<ul>
-<li><a href="index.html">Up to Tutorial Index</a></li>
-<li>Chapter 5
- <ol>
- <li><a href="#intro">Chapter 5 Introduction</a></li>
- <li><a href="#ifthen">If/Then/Else</a>
- <ol>
- <li><a href="#iflexer">Lexer Extensions</a></li>
- <li><a href="#ifast">AST Extensions</a></li>
- <li><a href="#ifparser">Parser Extensions</a></li>
- <li><a href="#ifir">LLVM IR</a></li>
- <li><a href="#ifcodegen">Code Generation</a></li>
- </ol>
- </li>
- <li><a href="#for">'for' Loop Expression</a>
- <ol>
- <li><a href="#forlexer">Lexer Extensions</a></li>
- <li><a href="#forast">AST Extensions</a></li>
- <li><a href="#forparser">Parser Extensions</a></li>
- <li><a href="#forir">LLVM IR</a></li>
- <li><a href="#forcodegen">Code Generation</a></li>
- </ol>
- </li>
- <li><a href="#code">Full Code Listing</a></li>
- </ol>
-</li>
-<li><a href="OCamlLangImpl6.html">Chapter 6</a>: Extending the Language:
-User-defined Operators</li>
-</ul>
-
-<div class="doc_author">
- <p>
- Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a>
- and <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a>
- </p>
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="intro">Chapter 5 Introduction</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Welcome to Chapter 5 of the "<a href="index.html">Implementing a language
-with LLVM</a>" tutorial. Parts 1-4 described the implementation of the simple
-Kaleidoscope language and included support for generating LLVM IR, followed by
-optimizations and a JIT compiler. Unfortunately, as presented, Kaleidoscope is
-mostly useless: it has no control flow other than call and return. This means
-that you can't have conditional branches in the code, significantly limiting its
-power. In this episode of "build that compiler", we'll extend Kaleidoscope to
-have an if/then/else expression plus a simple 'for' loop.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="ifthen">If/Then/Else</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>
-Extending Kaleidoscope to support if/then/else is quite straightforward. It
-basically requires adding lexer support for this "new" concept to the lexer,
-parser, AST, and LLVM code emitter. This example is nice, because it shows how
-easy it is to "grow" a language over time, incrementally extending it as new
-ideas are discovered.</p>
-
-<p>Before we get going on "how" we add this extension, lets talk about "what" we
-want. The basic idea is that we want to be able to write this sort of thing:
-</p>
-
-<div class="doc_code">
-<pre>
-def fib(x)
- if x &lt; 3 then
- 1
- else
- fib(x-1)+fib(x-2);
-</pre>
-</div>
-
-<p>In Kaleidoscope, every construct is an expression: there are no statements.
-As such, the if/then/else expression needs to return a value like any other.
-Since we're using a mostly functional form, we'll have it evaluate its
-conditional, then return the 'then' or 'else' value based on how the condition
-was resolved. This is very similar to the C "?:" expression.</p>
-
-<p>The semantics of the if/then/else expression is that it evaluates the
-condition to a boolean equality value: 0.0 is considered to be false and
-everything else is considered to be true.
-If the condition is true, the first subexpression is evaluated and returned, if
-the condition is false, the second subexpression is evaluated and returned.
-Since Kaleidoscope allows side-effects, this behavior is important to nail down.
-</p>
-
-<p>Now that we know what we "want", lets break this down into its constituent
-pieces.</p>
-
-</div>
-
-<!-- ======================================================================= -->
-<div class="doc_subsubsection"><a name="iflexer">Lexer Extensions for
-If/Then/Else</a></div>
-<!-- ======================================================================= -->
-
-
-<div class="doc_text">
-
-<p>The lexer extensions are straightforward. First we add new variants
-for the relevant tokens:</p>
-
-<div class="doc_code">
-<pre>
- (* control *)
- | If | Then | Else | For | In
-</pre>
-</div>
-
-<p>Once we have that, we recognize the new keywords in the lexer. This is pretty simple
-stuff:</p>
-
-<div class="doc_code">
-<pre>
- ...
- match Buffer.contents buffer with
- | "def" -&gt; [&lt; 'Token.Def; stream &gt;]
- | "extern" -&gt; [&lt; 'Token.Extern; stream &gt;]
- | "if" -&gt; [&lt; 'Token.If; stream &gt;]
- | "then" -&gt; [&lt; 'Token.Then; stream &gt;]
- | "else" -&gt; [&lt; 'Token.Else; stream &gt;]
- | "for" -&gt; [&lt; 'Token.For; stream &gt;]
- | "in" -&gt; [&lt; 'Token.In; stream &gt;]
- | id -&gt; [&lt; 'Token.Ident id; stream &gt;]
-</pre>
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<div class="doc_subsubsection"><a name="ifast">AST Extensions for
- If/Then/Else</a></div>
-<!-- ======================================================================= -->
-
-<div class="doc_text">
-
-<p>To represent the new expression we add a new AST variant for it:</p>
-
-<div class="doc_code">
-<pre>
-type expr =
- ...
- (* variant for if/then/else. *)
- | If of expr * expr * expr
-</pre>
-</div>
-
-<p>The AST variant just has pointers to the various subexpressions.</p>
-
-</div>
-
-<!-- ======================================================================= -->
-<div class="doc_subsubsection"><a name="ifparser">Parser Extensions for
-If/Then/Else</a></div>
-<!-- ======================================================================= -->
-
-<div class="doc_text">
-
-<p>Now that we have the relevant tokens coming from the lexer and we have the
-AST node to build, our parsing logic is relatively straightforward. First we
-define a new parsing function:</p>
-
-<div class="doc_code">
-<pre>
-let rec parse_primary = parser
- ...
- (* ifexpr ::= 'if' expr 'then' expr 'else' expr *)
- | [&lt; 'Token.If; c=parse_expr;
- 'Token.Then ?? "expected 'then'"; t=parse_expr;
- 'Token.Else ?? "expected 'else'"; e=parse_expr &gt;] -&gt;
- Ast.If (c, t, e)
-</pre>
-</div>
-
-<p>Next we hook it up as a primary expression:</p>
-
-<div class="doc_code">
-<pre>
-let rec parse_primary = parser
- ...
- (* ifexpr ::= 'if' expr 'then' expr 'else' expr *)
- | [&lt; 'Token.If; c=parse_expr;
- 'Token.Then ?? "expected 'then'"; t=parse_expr;
- 'Token.Else ?? "expected 'else'"; e=parse_expr &gt;] -&gt;
- Ast.If (c, t, e)
-</pre>
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<div class="doc_subsubsection"><a name="ifir">LLVM IR for If/Then/Else</a></div>
-<!-- ======================================================================= -->
-
-<div class="doc_text">
-
-<p>Now that we have it parsing and building the AST, the final piece is adding
-LLVM code generation support. This is the most interesting part of the
-if/then/else example, because this is where it starts to introduce new concepts.
-All of the code above has been thoroughly described in previous chapters.
-</p>
-
-<p>To motivate the code we want to produce, lets take a look at a simple
-example. Consider:</p>
-
-<div class="doc_code">
-<pre>
-extern foo();
-extern bar();
-def baz(x) if x then foo() else bar();
-</pre>
-</div>
-
-<p>If you disable optimizations, the code you'll (soon) get from Kaleidoscope
-looks like this:</p>
-
-<div class="doc_code">
-<pre>
-declare double @foo()
-
-declare double @bar()
-
-define double @baz(double %x) {
-entry:
- %ifcond = fcmp one double %x, 0.000000e+00
- br i1 %ifcond, label %then, label %else
-
-then: ; preds = %entry
- %calltmp = call double @foo()
- br label %ifcont
-
-else: ; preds = %entry
- %calltmp1 = call double @bar()
- br label %ifcont
-
-ifcont: ; preds = %else, %then
- %iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ]
- ret double %iftmp
-}
-</pre>
-</div>
-
-<p>To visualize the control flow graph, you can use a nifty feature of the LLVM
-'<a href="http://llvm.org/cmds/opt.html">opt</a>' tool. If you put this LLVM IR
-into "t.ll" and run "<tt>llvm-as &lt; t.ll | opt -analyze -view-cfg</tt>", <a
-href="../ProgrammersManual.html#ViewGraph">a window will pop up</a> and you'll
-see this graph:</p>
-
-<div style="text-align: center"><img src="LangImpl5-cfg.png" alt="Example CFG" width="423"
-height="315"></div>
-
-<p>Another way to get this is to call "<tt>Llvm_analysis.view_function_cfg
-f</tt>" or "<tt>Llvm_analysis.view_function_cfg_only f</tt>" (where <tt>f</tt>
-is a "<tt>Function</tt>") either by inserting actual calls into the code and
-recompiling or by calling these in the debugger. LLVM has many nice features
-for visualizing various graphs.</p>
-
-<p>Getting back to the generated code, it is fairly simple: the entry block
-evaluates the conditional expression ("x" in our case here) and compares the
-result to 0.0 with the "<tt><a href="../LangRef.html#i_fcmp">fcmp</a> one</tt>"
-instruction ('one' is "Ordered and Not Equal"). Based on the result of this
-expression, the code jumps to either the "then" or "else" blocks, which contain
-the expressions for the true/false cases.</p>
-
-<p>Once the then/else blocks are finished executing, they both branch back to the
-'ifcont' block to execute the code that happens after the if/then/else. In this
-case the only thing left to do is to return to the caller of the function. The
-question then becomes: how does the code know which expression to return?</p>
-
-<p>The answer to this question involves an important SSA operation: the
-<a href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Phi
-operation</a>. If you're not familiar with SSA, <a
-href="http://en.wikipedia.org/wiki/Static_single_assignment_form">the wikipedia
-article</a> is a good introduction and there are various other introductions to
-it available on your favorite search engine. The short version is that
-"execution" of the Phi operation requires "remembering" which block control came
-from. The Phi operation takes on the value corresponding to the input control
-block. In this case, if control comes in from the "then" block, it gets the
-value of "calltmp". If control comes from the "else" block, it gets the value
-of "calltmp1".</p>
-
-<p>At this point, you are probably starting to think "Oh no! This means my
-simple and elegant front-end will have to start generating SSA form in order to
-use LLVM!". Fortunately, this is not the case, and we strongly advise
-<em>not</em> implementing an SSA construction algorithm in your front-end
-unless there is an amazingly good reason to do so. In practice, there are two
-sorts of values that float around in code written for your average imperative
-programming language that might need Phi nodes:</p>
-
-<ol>
-<li>Code that involves user variables: <tt>x = 1; x = x + 1; </tt></li>
-<li>Values that are implicit in the structure of your AST, such as the Phi node
-in this case.</li>
-</ol>
-
-<p>In <a href="OCamlLangImpl7.html">Chapter 7</a> of this tutorial ("mutable
-variables"), we'll talk about #1
-in depth. For now, just believe me that you don't need SSA construction to
-handle this case. For #2, you have the choice of using the techniques that we will
-describe for #1, or you can insert Phi nodes directly, if convenient. In this
-case, it is really really easy to generate the Phi node, so we choose to do it
-directly.</p>
-
-<p>Okay, enough of the motivation and overview, lets generate code!</p>
-
-</div>
-
-<!-- ======================================================================= -->
-<div class="doc_subsubsection"><a name="ifcodegen">Code Generation for
-If/Then/Else</a></div>
-<!-- ======================================================================= -->
-
-<div class="doc_text">
-
-<p>In order to generate code for this, we implement the <tt>Codegen</tt> method
-for <tt>IfExprAST</tt>:</p>
-
-<div class="doc_code">
-<pre>
-let rec codegen_expr = function
- ...
- | Ast.If (cond, then_, else_) -&gt;
- let cond = codegen_expr cond in
-
- (* Convert condition to a bool by comparing equal to 0.0 *)
- let zero = const_float double_type 0.0 in
- let cond_val = build_fcmp Fcmp.One cond zero "ifcond" builder in
-</pre>
-</div>
-
-<p>This code is straightforward and similar to what we saw before. We emit the
-expression for the condition, then compare that value to zero to get a truth
-value as a 1-bit (bool) value.</p>
-
-<div class="doc_code">
-<pre>
- (* Grab the first block so that we might later add the conditional branch
- * to it at the end of the function. *)
- let start_bb = insertion_block builder in
- let the_function = block_parent start_bb in
-
- let then_bb = append_block context "then" the_function in
- position_at_end then_bb builder;
-</pre>
-</div>
-
-<p>
-As opposed to the <a href="LangImpl5.html">C++ tutorial</a>, we have to build
-our basic blocks bottom up since we can't have dangling BasicBlocks. We start
-off by saving a pointer to the first block (which might not be the entry
-block), which we'll need to build a conditional branch later. We do this by
-asking the <tt>builder</tt> for the current BasicBlock. The fourth line
-gets the current Function object that is being built. It gets this by the
-<tt>start_bb</tt> for its "parent" (the function it is currently embedded
-into).</p>
-
-<p>Once it has that, it creates one block. It is automatically appended into
-the function's list of blocks.</p>
-
-<div class="doc_code">
-<pre>
- (* Emit 'then' value. *)
- position_at_end then_bb builder;
- let then_val = codegen_expr then_ in
-
- (* Codegen of 'then' can change the current block, update then_bb for the
- * phi. We create a new name because one is used for the phi node, and the
- * other is used for the conditional branch. *)
- let new_then_bb = insertion_block builder in
-</pre>
-</div>
-
-<p>We move the builder to start inserting into the "then" block. Strictly
-speaking, this call moves the insertion point to be at the end of the specified
-block. However, since the "then" block is empty, it also starts out by
-inserting at the beginning of the block. :)</p>
-
-<p>Once the insertion point is set, we recursively codegen the "then" expression
-from the AST.</p>
-
-<p>The final line here is quite subtle, but is very important. The basic issue
-is that when we create the Phi node in the merge block, we need to set up the
-block/value pairs that indicate how the Phi will work. Importantly, the Phi
-node expects to have an entry for each predecessor of the block in the CFG. Why
-then, are we getting the current block when we just set it to ThenBB 5 lines
-above? The problem is that the "Then" expression may actually itself change the
-block that the Builder is emitting into if, for example, it contains a nested
-"if/then/else" expression. Because calling Codegen recursively could
-arbitrarily change the notion of the current block, we are required to get an
-up-to-date value for code that will set up the Phi node.</p>
-
-<div class="doc_code">
-<pre>
- (* Emit 'else' value. *)
- let else_bb = append_block context "else" the_function in
- position_at_end else_bb builder;
- let else_val = codegen_expr else_ in
-
- (* Codegen of 'else' can change the current block, update else_bb for the
- * phi. *)
- let new_else_bb = insertion_block builder in
-</pre>
-</div>
-
-<p>Code generation for the 'else' block is basically identical to codegen for
-the 'then' block.</p>
-
-<div class="doc_code">
-<pre>
- (* Emit merge block. *)
- let merge_bb = append_block context "ifcont" the_function in
- position_at_end merge_bb builder;
- let incoming = [(then_val, new_then_bb); (else_val, new_else_bb)] in
- let phi = build_phi incoming "iftmp" builder in
-</pre>
-</div>
-
-<p>The first two lines here are now familiar: the first adds the "merge" block
-to the Function object. The second block changes the insertion point so that
-newly created code will go into the "merge" block. Once that is done, we need
-to create the PHI node and set up the block/value pairs for the PHI.</p>
-
-<div class="doc_code">
-<pre>
- (* Return to the start block to add the conditional branch. *)
- position_at_end start_bb builder;
- ignore (build_cond_br cond_val then_bb else_bb builder);
-</pre>
-</div>
-
-<p>Once the blocks are created, we can emit the conditional branch that chooses
-between them. Note that creating new blocks does not implicitly affect the
-IRBuilder, so it is still inserting into the block that the condition
-went into. This is why we needed to save the "start" block.</p>
-
-<div class="doc_code">
-<pre>
- (* Set a unconditional branch at the end of the 'then' block and the
- * 'else' block to the 'merge' block. *)
- position_at_end new_then_bb builder; ignore (build_br merge_bb builder);
- position_at_end new_else_bb builder; ignore (build_br merge_bb builder);
-
- (* Finally, set the builder to the end of the merge block. *)
- position_at_end merge_bb builder;
-
- phi
-</pre>
-</div>
-
-<p>To finish off the blocks, we create an unconditional branch
-to the merge block. One interesting (and very important) aspect of the LLVM IR
-is that it <a href="../LangRef.html#functionstructure">requires all basic blocks
-to be "terminated"</a> with a <a href="../LangRef.html#terminators">control flow
-instruction</a> such as return or branch. This means that all control flow,
-<em>including fall throughs</em> must be made explicit in the LLVM IR. If you
-violate this rule, the verifier will emit an error.
-
-<p>Finally, the CodeGen function returns the phi node as the value computed by
-the if/then/else expression. In our example above, this returned value will
-feed into the code for the top-level function, which will create the return
-instruction.</p>
-
-<p>Overall, we now have the ability to execute conditional code in
-Kaleidoscope. With this extension, Kaleidoscope is a fairly complete language
-that can calculate a wide variety of numeric functions. Next up we'll add
-another useful expression that is familiar from non-functional languages...</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="for">'for' Loop Expression</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Now that we know how to add basic control flow constructs to the language,
-we have the tools to add more powerful things. Lets add something more
-aggressive, a 'for' expression:</p>
-
-<div class="doc_code">
-<pre>
- extern putchard(char);
- def printstar(n)
- for i = 1, i &lt; n, 1.0 in
- putchard(42); # ascii 42 = '*'
-
- # print 100 '*' characters
- printstar(100);
-</pre>
-</div>
-
-<p>This expression defines a new variable ("i" in this case) which iterates from
-a starting value, while the condition ("i &lt; n" in this case) is true,
-incrementing by an optional step value ("1.0" in this case). If the step value
-is omitted, it defaults to 1.0. While the loop is true, it executes its
-body expression. Because we don't have anything better to return, we'll just
-define the loop as always returning 0.0. In the future when we have mutable
-variables, it will get more useful.</p>
-
-<p>As before, lets talk about the changes that we need to Kaleidoscope to
-support this.</p>
-
-</div>
-
-<!-- ======================================================================= -->
-<div class="doc_subsubsection"><a name="forlexer">Lexer Extensions for
-the 'for' Loop</a></div>
-<!-- ======================================================================= -->
-
-<div class="doc_text">
-
-<p>The lexer extensions are the same sort of thing as for if/then/else:</p>
-
-<div class="doc_code">
-<pre>
- ... in Token.token ...
- (* control *)
- | If | Then | Else
- <b>| For | In</b>
-
- ... in Lexer.lex_ident...
- match Buffer.contents buffer with
- | "def" -&gt; [&lt; 'Token.Def; stream &gt;]
- | "extern" -&gt; [&lt; 'Token.Extern; stream &gt;]
- | "if" -&gt; [&lt; 'Token.If; stream &gt;]
- | "then" -&gt; [&lt; 'Token.Then; stream &gt;]
- | "else" -&gt; [&lt; 'Token.Else; stream &gt;]
- <b>| "for" -&gt; [&lt; 'Token.For; stream &gt;]
- | "in" -&gt; [&lt; 'Token.In; stream &gt;]</b>
- | id -&gt; [&lt; 'Token.Ident id; stream &gt;]
-</pre>
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<div class="doc_subsubsection"><a name="forast">AST Extensions for
-the 'for' Loop</a></div>
-<!-- ======================================================================= -->
-
-<div class="doc_text">
-
-<p>The AST variant is just as simple. It basically boils down to capturing
-the variable name and the constituent expressions in the node.</p>
-
-<div class="doc_code">
-<pre>
-type expr =
- ...
- (* variant for for/in. *)
- | For of string * expr * expr * expr option * expr
-</pre>
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<div class="doc_subsubsection"><a name="forparser">Parser Extensions for
-the 'for' Loop</a></div>
-<!-- ======================================================================= -->
-
-<div class="doc_text">
-
-<p>The parser code is also fairly standard. The only interesting thing here is
-handling of the optional step value. The parser code handles it by checking to
-see if the second comma is present. If not, it sets the step value to null in
-the AST node:</p>
-
-<div class="doc_code">
-<pre>
-let rec parse_primary = parser
- ...
- (* forexpr
- ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression *)
- | [&lt; 'Token.For;
- 'Token.Ident id ?? "expected identifier after for";
- 'Token.Kwd '=' ?? "expected '=' after for";
- stream &gt;] -&gt;
- begin parser
- | [&lt;
- start=parse_expr;
- 'Token.Kwd ',' ?? "expected ',' after for";
- end_=parse_expr;
- stream &gt;] -&gt;
- let step =
- begin parser
- | [&lt; 'Token.Kwd ','; step=parse_expr &gt;] -&gt; Some step
- | [&lt; &gt;] -&gt; None
- end stream
- in
- begin parser
- | [&lt; 'Token.In; body=parse_expr &gt;] -&gt;
- Ast.For (id, start, end_, step, body)
- | [&lt; &gt;] -&gt;
- raise (Stream.Error "expected 'in' after for")
- end stream
- | [&lt; &gt;] -&gt;
- raise (Stream.Error "expected '=' after for")
- end stream
-</pre>
-</div>
-
-</div>
-
-<!-- ======================================================================= -->
-<div class="doc_subsubsection"><a name="forir">LLVM IR for
-the 'for' Loop</a></div>
-<!-- ======================================================================= -->
-
-<div class="doc_text">
-
-<p>Now we get to the good part: the LLVM IR we want to generate for this thing.
-With the simple example above, we get this LLVM IR (note that this dump is
-generated with optimizations disabled for clarity):
-</p>
-
-<div class="doc_code">
-<pre>
-declare double @putchard(double)
-
-define double @printstar(double %n) {
-entry:
- ; initial value = 1.0 (inlined into phi)
- br label %loop
-
-loop: ; preds = %loop, %entry
- %i = phi double [ 1.000000e+00, %entry ], [ %nextvar, %loop ]
- ; body
- %calltmp = call double @putchard( double 4.200000e+01 )
- ; increment
- %nextvar = fadd double %i, 1.000000e+00
-
- ; termination test
- %cmptmp = fcmp ult double %i, %n
- %booltmp = uitofp i1 %cmptmp to double
- %loopcond = fcmp one double %booltmp, 0.000000e+00
- br i1 %loopcond, label %loop, label %afterloop
-
-afterloop: ; preds = %loop
- ; loop always returns 0.0
- ret double 0.000000e+00
-}
-</pre>
-</div>
-
-<p>This loop contains all the same constructs we saw before: a phi node, several
-expressions, and some basic blocks. Lets see how this fits together.</p>
-
-</div>
-
-<!-- ======================================================================= -->
-<div class="doc_subsubsection"><a name="forcodegen">Code Generation for
-the 'for' Loop</a></div>
-<!-- ======================================================================= -->
-
-<div class="doc_text">
-
-<p>The first part of Codegen is very simple: we just output the start expression
-for the loop value:</p>
-
-<div class="doc_code">
-<pre>
-let rec codegen_expr = function
- ...
- | Ast.For (var_name, start, end_, step, body) -&gt;
- (* Emit the start code first, without 'variable' in scope. *)
- let start_val = codegen_expr start in
-</pre>
-</div>
-
-<p>With this out of the way, the next step is to set up the LLVM basic block
-for the start of the loop body. In the case above, the whole loop body is one
-block, but remember that the body code itself could consist of multiple blocks
-(e.g. if it contains an if/then/else or a for/in expression).</p>
-
-<div class="doc_code">
-<pre>
- (* Make the new basic block for the loop header, inserting after current
- * block. *)
- let preheader_bb = insertion_block builder in
- let the_function = block_parent preheader_bb in
- let loop_bb = append_block context "loop" the_function in
-
- (* Insert an explicit fall through from the current block to the
- * loop_bb. *)
- ignore (build_br loop_bb builder);
-</pre>
-</div>
-
-<p>This code is similar to what we saw for if/then/else. Because we will need
-it to create the Phi node, we remember the block that falls through into the
-loop. Once we have that, we create the actual block that starts the loop and
-create an unconditional branch for the fall-through between the two blocks.</p>
-
-<div class="doc_code">
-<pre>
- (* Start insertion in loop_bb. *)
- position_at_end loop_bb builder;
-
- (* Start the PHI node with an entry for start. *)
- let variable = build_phi [(start_val, preheader_bb)] var_name builder in
-</pre>
-</div>
-
-<p>Now that the "preheader" for the loop is set up, we switch to emitting code
-for the loop body. To begin with, we move the insertion point and create the
-PHI node for the loop induction variable. Since we already know the incoming
-value for the starting value, we add it to the Phi node. Note that the Phi will
-eventually get a second value for the backedge, but we can't set it up yet
-(because it doesn't exist!).</p>
-
-<div class="doc_code">
-<pre>
- (* Within the loop, the variable is defined equal to the PHI node. If it
- * shadows an existing variable, we have to restore it, so save it
- * now. *)
- let old_val =
- try Some (Hashtbl.find named_values var_name) with Not_found -&gt; None
- in
- Hashtbl.add named_values var_name variable;
-
- (* Emit the body of the loop. This, like any other expr, can change the
- * current BB. Note that we ignore the value computed by the body, but
- * don't allow an error *)
- ignore (codegen_expr body);
-</pre>
-</div>
-
-<p>Now the code starts to get more interesting. Our 'for' loop introduces a new
-variable to the symbol table. This means that our symbol table can now contain
-either function arguments or loop variables. To handle this, before we codegen
-the body of the loop, we add the loop variable as the current value for its
-name. Note that it is possible that there is a variable of the same name in the
-outer scope. It would be easy to make this an error (emit an error and return
-null if there is already an entry for VarName) but we choose to allow shadowing
-of variables. In order to handle this correctly, we remember the Value that
-we are potentially shadowing in <tt>old_val</tt> (which will be None if there is
-no shadowed variable).</p>
-
-<p>Once the loop variable is set into the symbol table, the code recursively
-codegen's the body. This allows the body to use the loop variable: any
-references to it will naturally find it in the symbol table.</p>
-
-<div class="doc_code">
-<pre>
- (* Emit the step value. *)
- let step_val =
- match step with
- | Some step -&gt; codegen_expr step
- (* If not specified, use 1.0. *)
- | None -&gt; const_float double_type 1.0
- in
-
- let next_var = build_add variable step_val "nextvar" builder in
-</pre>
-</div>
-
-<p>Now that the body is emitted, we compute the next value of the iteration
-variable by adding the step value, or 1.0 if it isn't present.
-'<tt>next_var</tt>' will be the value of the loop variable on the next iteration
-of the loop.</p>
-
-<div class="doc_code">
-<pre>
- (* Compute the end condition. *)
- let end_cond = codegen_expr end_ in
-
- (* Convert condition to a bool by comparing equal to 0.0. *)
- let zero = const_float double_type 0.0 in
- let end_cond = build_fcmp Fcmp.One end_cond zero "loopcond" builder in
-</pre>
-</div>
-
-<p>Finally, we evaluate the exit value of the loop, to determine whether the
-loop should exit. This mirrors the condition evaluation for the if/then/else
-statement.</p>
-
-<div class="doc_code">
-<pre>
- (* Create the "after loop" block and insert it. *)
- let loop_end_bb = insertion_block builder in
- let after_bb = append_block context "afterloop" the_function in
-
- (* Insert the conditional branch into the end of loop_end_bb. *)
- ignore (build_cond_br end_cond loop_bb after_bb builder);
-
- (* Any new code will be inserted in after_bb. *)
- position_at_end after_bb builder;
-</pre>
-</div>
-
-<p>With the code for the body of the loop complete, we just need to finish up
-the control flow for it. This code remembers the end block (for the phi node), then creates the block for the loop exit ("afterloop"). Based on the value of the
-exit condition, it creates a conditional branch that chooses between executing
-the loop again and exiting the loop. Any future code is emitted in the
-"afterloop" block, so it sets the insertion position to it.</p>
-
-<div class="doc_code">
-<pre>
- (* Add a new entry to the PHI node for the backedge. *)
- add_incoming (next_var, loop_end_bb) variable;
-
- (* Restore the unshadowed variable. *)
- begin match old_val with
- | Some old_val -&gt; Hashtbl.add named_values var_name old_val
- | None -&gt; ()
- end;
-
- (* for expr always returns 0.0. *)
- const_null double_type
-</pre>
-</div>
-
-<p>The final code handles various cleanups: now that we have the
-"<tt>next_var</tt>" value, we can add the incoming value to the loop PHI node.
-After that, we remove the loop variable from the symbol table, so that it isn't
-in scope after the for loop. Finally, code generation of the for loop always
-returns 0.0, so that is what we return from <tt>Codegen.codegen_expr</tt>.</p>
-
-<p>With this, we conclude the "adding control flow to Kaleidoscope" chapter of
-the tutorial. In this chapter we added two control flow constructs, and used
-them to motivate a couple of aspects of the LLVM IR that are important for
-front-end implementors to know. In the next chapter of our saga, we will get
-a bit crazier and add <a href="OCamlLangImpl6.html">user-defined operators</a>
-to our poor innocent language.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="code">Full Code Listing</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>
-Here is the complete code listing for our running example, enhanced with the
-if/then/else and for expressions.. To build this example, use:
-</p>
-
-<div class="doc_code">
-<pre>
-# Compile
-ocamlbuild toy.byte
-# Run
-./toy.byte
-</pre>
-</div>
-
-<p>Here is the code:</p>
-
-<dl>
-<dt>_tags:</dt>
-<dd class="doc_code">
-<pre>
-&lt;{lexer,parser}.ml&gt;: use_camlp4, pp(camlp4of)
-&lt;*.{byte,native}&gt;: g++, use_llvm, use_llvm_analysis
-&lt;*.{byte,native}&gt;: use_llvm_executionengine, use_llvm_target
-&lt;*.{byte,native}&gt;: use_llvm_scalar_opts, use_bindings
-</pre>
-</dd>
-
-<dt>myocamlbuild.ml:</dt>
-<dd class="doc_code">
-<pre>
-open Ocamlbuild_plugin;;
-
-ocaml_lib ~extern:true "llvm";;
-ocaml_lib ~extern:true "llvm_analysis";;
-ocaml_lib ~extern:true "llvm_executionengine";;
-ocaml_lib ~extern:true "llvm_target";;
-ocaml_lib ~extern:true "llvm_scalar_opts";;
-
-flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"]);;
-dep ["link"; "ocaml"; "use_bindings"] ["bindings.o"];;
-</pre>
-</dd>
-
-<dt>token.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Lexer Tokens
- *===----------------------------------------------------------------------===*)
-
-(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of
- * these others for known things. *)
-type token =
- (* commands *)
- | Def | Extern
-
- (* primary *)
- | Ident of string | Number of float
-
- (* unknown *)
- | Kwd of char
-
- (* control *)
- | If | Then | Else
- | For | In
-</pre>
-</dd>
-
-<dt>lexer.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Lexer
- *===----------------------------------------------------------------------===*)
-
-let rec lex = parser
- (* Skip any whitespace. *)
- | [&lt; ' (' ' | '\n' | '\r' | '\t'); stream &gt;] -&gt; lex stream
-
- (* identifier: [a-zA-Z][a-zA-Z0-9] *)
- | [&lt; ' ('A' .. 'Z' | 'a' .. 'z' as c); stream &gt;] -&gt;
- let buffer = Buffer.create 1 in
- Buffer.add_char buffer c;
- lex_ident buffer stream
-
- (* number: [0-9.]+ *)
- | [&lt; ' ('0' .. '9' as c); stream &gt;] -&gt;
- let buffer = Buffer.create 1 in
- Buffer.add_char buffer c;
- lex_number buffer stream
-
- (* Comment until end of line. *)
- | [&lt; ' ('#'); stream &gt;] -&gt;
- lex_comment stream
-
- (* Otherwise, just return the character as its ascii value. *)
- | [&lt; 'c; stream &gt;] -&gt;
- [&lt; 'Token.Kwd c; lex stream &gt;]
-
- (* end of stream. *)
- | [&lt; &gt;] -&gt; [&lt; &gt;]
-
-and lex_number buffer = parser
- | [&lt; ' ('0' .. '9' | '.' as c); stream &gt;] -&gt;
- Buffer.add_char buffer c;
- lex_number buffer stream
- | [&lt; stream=lex &gt;] -&gt;
- [&lt; 'Token.Number (float_of_string (Buffer.contents buffer)); stream &gt;]
-
-and lex_ident buffer = parser
- | [&lt; ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream &gt;] -&gt;
- Buffer.add_char buffer c;
- lex_ident buffer stream
- | [&lt; stream=lex &gt;] -&gt;
- match Buffer.contents buffer with
- | "def" -&gt; [&lt; 'Token.Def; stream &gt;]
- | "extern" -&gt; [&lt; 'Token.Extern; stream &gt;]
- | "if" -&gt; [&lt; 'Token.If; stream &gt;]
- | "then" -&gt; [&lt; 'Token.Then; stream &gt;]
- | "else" -&gt; [&lt; 'Token.Else; stream &gt;]
- | "for" -&gt; [&lt; 'Token.For; stream &gt;]
- | "in" -&gt; [&lt; 'Token.In; stream &gt;]
- | id -&gt; [&lt; 'Token.Ident id; stream &gt;]
-
-and lex_comment = parser
- | [&lt; ' ('\n'); stream=lex &gt;] -&gt; stream
- | [&lt; 'c; e=lex_comment &gt;] -&gt; e
- | [&lt; &gt;] -&gt; [&lt; &gt;]
-</pre>
-</dd>
-
-<dt>ast.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Abstract Syntax Tree (aka Parse Tree)
- *===----------------------------------------------------------------------===*)
-
-(* expr - Base type for all expression nodes. *)
-type expr =
- (* variant for numeric literals like "1.0". *)
- | Number of float
-
- (* variant for referencing a variable, like "a". *)
- | Variable of string
-
- (* variant for a binary operator. *)
- | Binary of char * expr * expr
-
- (* variant for function calls. *)
- | Call of string * expr array
-
- (* variant for if/then/else. *)
- | If of expr * expr * expr
-
- (* variant for for/in. *)
- | For of string * expr * expr * expr option * expr
-
-(* proto - This type represents the "prototype" for a function, which captures
- * its name, and its argument names (thus implicitly the number of arguments the
- * function takes). *)
-type proto = Prototype of string * string array
-
-(* func - This type represents a function definition itself. *)
-type func = Function of proto * expr
-</pre>
-</dd>
-
-<dt>parser.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===---------------------------------------------------------------------===
- * Parser
- *===---------------------------------------------------------------------===*)
-
-(* binop_precedence - This holds the precedence for each binary operator that is
- * defined *)
-let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10
-
-(* precedence - Get the precedence of the pending binary operator token. *)
-let precedence c = try Hashtbl.find binop_precedence c with Not_found -&gt; -1
-
-(* primary
- * ::= identifier
- * ::= numberexpr
- * ::= parenexpr
- * ::= ifexpr
- * ::= forexpr *)
-let rec parse_primary = parser
- (* numberexpr ::= number *)
- | [&lt; 'Token.Number n &gt;] -&gt; Ast.Number n
-
- (* parenexpr ::= '(' expression ')' *)
- | [&lt; 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" &gt;] -&gt; e
-
- (* identifierexpr
- * ::= identifier
- * ::= identifier '(' argumentexpr ')' *)
- | [&lt; 'Token.Ident id; stream &gt;] -&gt;
- let rec parse_args accumulator = parser
- | [&lt; e=parse_expr; stream &gt;] -&gt;
- begin parser
- | [&lt; 'Token.Kwd ','; e=parse_args (e :: accumulator) &gt;] -&gt; e
- | [&lt; &gt;] -&gt; e :: accumulator
- end stream
- | [&lt; &gt;] -&gt; accumulator
- in
- let rec parse_ident id = parser
- (* Call. *)
- | [&lt; 'Token.Kwd '(';
- args=parse_args [];
- 'Token.Kwd ')' ?? "expected ')'"&gt;] -&gt;
- Ast.Call (id, Array.of_list (List.rev args))
-
- (* Simple variable ref. *)
- | [&lt; &gt;] -&gt; Ast.Variable id
- in
- parse_ident id stream
-
- (* ifexpr ::= 'if' expr 'then' expr 'else' expr *)
- | [&lt; 'Token.If; c=parse_expr;
- 'Token.Then ?? "expected 'then'"; t=parse_expr;
- 'Token.Else ?? "expected 'else'"; e=parse_expr &gt;] -&gt;
- Ast.If (c, t, e)
-
- (* forexpr
- ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression *)
- | [&lt; 'Token.For;
- 'Token.Ident id ?? "expected identifier after for";
- 'Token.Kwd '=' ?? "expected '=' after for";
- stream &gt;] -&gt;
- begin parser
- | [&lt;
- start=parse_expr;
- 'Token.Kwd ',' ?? "expected ',' after for";
- end_=parse_expr;
- stream &gt;] -&gt;
- let step =
- begin parser
- | [&lt; 'Token.Kwd ','; step=parse_expr &gt;] -&gt; Some step
- | [&lt; &gt;] -&gt; None
- end stream
- in
- begin parser
- | [&lt; 'Token.In; body=parse_expr &gt;] -&gt;
- Ast.For (id, start, end_, step, body)
- | [&lt; &gt;] -&gt;
- raise (Stream.Error "expected 'in' after for")
- end stream
- | [&lt; &gt;] -&gt;
- raise (Stream.Error "expected '=' after for")
- end stream
-
- | [&lt; &gt;] -&gt; raise (Stream.Error "unknown token when expecting an expression.")
-
-(* binoprhs
- * ::= ('+' primary)* *)
-and parse_bin_rhs expr_prec lhs stream =
- match Stream.peek stream with
- (* If this is a binop, find its precedence. *)
- | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -&gt;
- let token_prec = precedence c in
-
- (* If this is a binop that binds at least as tightly as the current binop,
- * consume it, otherwise we are done. *)
- if token_prec &lt; expr_prec then lhs else begin
- (* Eat the binop. *)
- Stream.junk stream;
-
- (* Parse the primary expression after the binary operator. *)
- let rhs = parse_primary stream in
-
- (* Okay, we know this is a binop. *)
- let rhs =
- match Stream.peek stream with
- | Some (Token.Kwd c2) -&gt;
- (* If BinOp binds less tightly with rhs than the operator after
- * rhs, let the pending operator take rhs as its lhs. *)
- let next_prec = precedence c2 in
- if token_prec &lt; next_prec
- then parse_bin_rhs (token_prec + 1) rhs stream
- else rhs
- | _ -&gt; rhs
- in
-
- (* Merge lhs/rhs. *)
- let lhs = Ast.Binary (c, lhs, rhs) in
- parse_bin_rhs expr_prec lhs stream
- end
- | _ -&gt; lhs
-
-(* expression
- * ::= primary binoprhs *)
-and parse_expr = parser
- | [&lt; lhs=parse_primary; stream &gt;] -&gt; parse_bin_rhs 0 lhs stream
-
-(* prototype
- * ::= id '(' id* ')' *)
-let parse_prototype =
- let rec parse_args accumulator = parser
- | [&lt; 'Token.Ident id; e=parse_args (id::accumulator) &gt;] -&gt; e
- | [&lt; &gt;] -&gt; accumulator
- in
-
- parser
- | [&lt; 'Token.Ident id;
- 'Token.Kwd '(' ?? "expected '(' in prototype";
- args=parse_args [];
- 'Token.Kwd ')' ?? "expected ')' in prototype" &gt;] -&gt;
- (* success. *)
- Ast.Prototype (id, Array.of_list (List.rev args))
-
- | [&lt; &gt;] -&gt;
- raise (Stream.Error "expected function name in prototype")
-
-(* definition ::= 'def' prototype expression *)
-let parse_definition = parser
- | [&lt; 'Token.Def; p=parse_prototype; e=parse_expr &gt;] -&gt;
- Ast.Function (p, e)
-
-(* toplevelexpr ::= expression *)
-let parse_toplevel = parser
- | [&lt; e=parse_expr &gt;] -&gt;
- (* Make an anonymous proto. *)
- Ast.Function (Ast.Prototype ("", [||]), e)
-
-(* external ::= 'extern' prototype *)
-let parse_extern = parser
- | [&lt; 'Token.Extern; e=parse_prototype &gt;] -&gt; e
-</pre>
-</dd>
-
-<dt>codegen.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Code Generation
- *===----------------------------------------------------------------------===*)
-
-open Llvm
-
-exception Error of string
-
-let context = global_context ()
-let the_module = create_module context "my cool jit"
-let builder = builder context
-let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10
-let double_type = double_type context
-
-let rec codegen_expr = function
- | Ast.Number n -&gt; const_float double_type n
- | Ast.Variable name -&gt;
- (try Hashtbl.find named_values name with
- | Not_found -&gt; raise (Error "unknown variable name"))
- | Ast.Binary (op, lhs, rhs) -&gt;
- let lhs_val = codegen_expr lhs in
- let rhs_val = codegen_expr rhs in
- begin
- match op with
- | '+' -&gt; build_add lhs_val rhs_val "addtmp" builder
- | '-' -&gt; build_sub lhs_val rhs_val "subtmp" builder
- | '*' -&gt; build_mul lhs_val rhs_val "multmp" builder
- | '&lt;' -&gt;
- (* Convert bool 0/1 to double 0.0 or 1.0 *)
- let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in
- build_uitofp i double_type "booltmp" builder
- | _ -&gt; raise (Error "invalid binary operator")
- end
- | Ast.Call (callee, args) -&gt;
- (* Look up the name in the module table. *)
- let callee =
- match lookup_function callee the_module with
- | Some callee -&gt; callee
- | None -&gt; raise (Error "unknown function referenced")
- in
- let params = params callee in
-
- (* If argument mismatch error. *)
- if Array.length params == Array.length args then () else
- raise (Error "incorrect # arguments passed");
- let args = Array.map codegen_expr args in
- build_call callee args "calltmp" builder
- | Ast.If (cond, then_, else_) -&gt;
- let cond = codegen_expr cond in
-
- (* Convert condition to a bool by comparing equal to 0.0 *)
- let zero = const_float double_type 0.0 in
- let cond_val = build_fcmp Fcmp.One cond zero "ifcond" builder in
-
- (* Grab the first block so that we might later add the conditional branch
- * to it at the end of the function. *)
- let start_bb = insertion_block builder in
- let the_function = block_parent start_bb in
-
- let then_bb = append_block context "then" the_function in
-
- (* Emit 'then' value. *)
- position_at_end then_bb builder;
- let then_val = codegen_expr then_ in
-
- (* Codegen of 'then' can change the current block, update then_bb for the
- * phi. We create a new name because one is used for the phi node, and the
- * other is used for the conditional branch. *)
- let new_then_bb = insertion_block builder in
-
- (* Emit 'else' value. *)
- let else_bb = append_block context "else" the_function in
- position_at_end else_bb builder;
- let else_val = codegen_expr else_ in
-
- (* Codegen of 'else' can change the current block, update else_bb for the
- * phi. *)
- let new_else_bb = insertion_block builder in
-
- (* Emit merge block. *)
- let merge_bb = append_block context "ifcont" the_function in
- position_at_end merge_bb builder;
- let incoming = [(then_val, new_then_bb); (else_val, new_else_bb)] in
- let phi = build_phi incoming "iftmp" builder in
-
- (* Return to the start block to add the conditional branch. *)
- position_at_end start_bb builder;
- ignore (build_cond_br cond_val then_bb else_bb builder);
-
- (* Set a unconditional branch at the end of the 'then' block and the
- * 'else' block to the 'merge' block. *)
- position_at_end new_then_bb builder; ignore (build_br merge_bb builder);
- position_at_end new_else_bb builder; ignore (build_br merge_bb builder);
-
- (* Finally, set the builder to the end of the merge block. *)
- position_at_end merge_bb builder;
-
- phi
- | Ast.For (var_name, start, end_, step, body) -&gt;
- (* Emit the start code first, without 'variable' in scope. *)
- let start_val = codegen_expr start in
-
- (* Make the new basic block for the loop header, inserting after current
- * block. *)
- let preheader_bb = insertion_block builder in
- let the_function = block_parent preheader_bb in
- let loop_bb = append_block context "loop" the_function in
-
- (* Insert an explicit fall through from the current block to the
- * loop_bb. *)
- ignore (build_br loop_bb builder);
-
- (* Start insertion in loop_bb. *)
- position_at_end loop_bb builder;
-
- (* Start the PHI node with an entry for start. *)
- let variable = build_phi [(start_val, preheader_bb)] var_name builder in
-
- (* Within the loop, the variable is defined equal to the PHI node. If it
- * shadows an existing variable, we have to restore it, so save it
- * now. *)
- let old_val =
- try Some (Hashtbl.find named_values var_name) with Not_found -&gt; None
- in
- Hashtbl.add named_values var_name variable;
-
- (* Emit the body of the loop. This, like any other expr, can change the
- * current BB. Note that we ignore the value computed by the body, but
- * don't allow an error *)
- ignore (codegen_expr body);
-
- (* Emit the step value. *)
- let step_val =
- match step with
- | Some step -&gt; codegen_expr step
- (* If not specified, use 1.0. *)
- | None -&gt; const_float double_type 1.0
- in
-
- let next_var = build_add variable step_val "nextvar" builder in
-
- (* Compute the end condition. *)
- let end_cond = codegen_expr end_ in
-
- (* Convert condition to a bool by comparing equal to 0.0. *)
- let zero = const_float double_type 0.0 in
- let end_cond = build_fcmp Fcmp.One end_cond zero "loopcond" builder in
-
- (* Create the "after loop" block and insert it. *)
- let loop_end_bb = insertion_block builder in
- let after_bb = append_block context "afterloop" the_function in
-
- (* Insert the conditional branch into the end of loop_end_bb. *)
- ignore (build_cond_br end_cond loop_bb after_bb builder);
-
- (* Any new code will be inserted in after_bb. *)
- position_at_end after_bb builder;
-
- (* Add a new entry to the PHI node for the backedge. *)
- add_incoming (next_var, loop_end_bb) variable;
-
- (* Restore the unshadowed variable. *)
- begin match old_val with
- | Some old_val -&gt; Hashtbl.add named_values var_name old_val
- | None -&gt; ()
- end;
-
- (* for expr always returns 0.0. *)
- const_null double_type
-
-let codegen_proto = function
- | Ast.Prototype (name, args) -&gt;
- (* Make the function type: double(double,double) etc. *)
- let doubles = Array.make (Array.length args) double_type in
- let ft = function_type double_type doubles in
- let f =
- match lookup_function name the_module with
- | None -&gt; declare_function name ft the_module
-
- (* If 'f' conflicted, there was already something named 'name'. If it
- * has a body, don't allow redefinition or reextern. *)
- | Some f -&gt;
- (* If 'f' already has a body, reject this. *)
- if block_begin f &lt;&gt; At_end f then
- raise (Error "redefinition of function");
-
- (* If 'f' took a different number of arguments, reject. *)
- if element_type (type_of f) &lt;&gt; ft then
- raise (Error "redefinition of function with different # args");
- f
- in
-
- (* Set names for all arguments. *)
- Array.iteri (fun i a -&gt;
- let n = args.(i) in
- set_value_name n a;
- Hashtbl.add named_values n a;
- ) (params f);
- f
-
-let codegen_func the_fpm = function
- | Ast.Function (proto, body) -&gt;
- Hashtbl.clear named_values;
- let the_function = codegen_proto proto in
-
- (* Create a new basic block to start insertion into. *)
- let bb = append_block context "entry" the_function in
- position_at_end bb builder;
-
- try
- let ret_val = codegen_expr body in
-
- (* Finish off the function. *)
- let _ = build_ret ret_val builder in
-
- (* Validate the generated code, checking for consistency. *)
- Llvm_analysis.assert_valid_function the_function;
-
- (* Optimize the function. *)
- let _ = PassManager.run_function the_function the_fpm in
-
- the_function
- with e -&gt;
- delete_function the_function;
- raise e
-</pre>
-</dd>
-
-<dt>toplevel.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Top-Level parsing and JIT Driver
- *===----------------------------------------------------------------------===*)
-
-open Llvm
-open Llvm_executionengine
-
-(* top ::= definition | external | expression | ';' *)
-let rec main_loop the_fpm the_execution_engine stream =
- match Stream.peek stream with
- | None -&gt; ()
-
- (* ignore top-level semicolons. *)
- | Some (Token.Kwd ';') -&gt;
- Stream.junk stream;
- main_loop the_fpm the_execution_engine stream
-
- | Some token -&gt;
- begin
- try match token with
- | Token.Def -&gt;
- let e = Parser.parse_definition stream in
- print_endline "parsed a function definition.";
- dump_value (Codegen.codegen_func the_fpm e);
- | Token.Extern -&gt;
- let e = Parser.parse_extern stream in
- print_endline "parsed an extern.";
- dump_value (Codegen.codegen_proto e);
- | _ -&gt;
- (* Evaluate a top-level expression into an anonymous function. *)
- let e = Parser.parse_toplevel stream in
- print_endline "parsed a top-level expr";
- let the_function = Codegen.codegen_func the_fpm e in
- dump_value the_function;
-
- (* JIT the function, returning a function pointer. *)
- let result = ExecutionEngine.run_function the_function [||]
- the_execution_engine in
-
- print_string "Evaluated to ";
- print_float (GenericValue.as_float Codegen.double_type result);
- print_newline ();
- with Stream.Error s | Codegen.Error s -&gt;
- (* Skip token for error recovery. *)
- Stream.junk stream;
- print_endline s;
- end;
- print_string "ready&gt; "; flush stdout;
- main_loop the_fpm the_execution_engine stream
-</pre>
-</dd>
-
-<dt>toy.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Main driver code.
- *===----------------------------------------------------------------------===*)
-
-open Llvm
-open Llvm_executionengine
-open Llvm_target
-open Llvm_scalar_opts
-
-let main () =
- ignore (initialize_native_target ());
-
- (* Install standard binary operators.
- * 1 is the lowest precedence. *)
- Hashtbl.add Parser.binop_precedence '&lt;' 10;
- Hashtbl.add Parser.binop_precedence '+' 20;
- Hashtbl.add Parser.binop_precedence '-' 20;
- Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *)
-
- (* Prime the first token. *)
- print_string "ready&gt; "; flush stdout;
- let stream = Lexer.lex (Stream.of_channel stdin) in
-
- (* Create the JIT. *)
- let the_execution_engine = ExecutionEngine.create Codegen.the_module in
- let the_fpm = PassManager.create_function Codegen.the_module in
-
- (* Set up the optimizer pipeline. Start with registering info about how the
- * target lays out data structures. *)
- TargetData.add (ExecutionEngine.target_data the_execution_engine) the_fpm;
-
- (* Do simple "peephole" optimizations and bit-twiddling optzn. *)
- add_instruction_combination the_fpm;
-
- (* reassociate expressions. *)
- add_reassociation the_fpm;
-
- (* Eliminate Common SubExpressions. *)
- add_gvn the_fpm;
-
- (* Simplify the control flow graph (deleting unreachable blocks, etc). *)
- add_cfg_simplification the_fpm;
-
- ignore (PassManager.initialize the_fpm);
-
- (* Run the main "interpreter loop" now. *)
- Toplevel.main_loop the_fpm the_execution_engine stream;
-
- (* Print out all the generated code. *)
- dump_module Codegen.the_module
-;;
-
-main ()
-</pre>
-</dd>
-
-<dt>bindings.c</dt>
-<dd class="doc_code">
-<pre>
-#include &lt;stdio.h&gt;
-
-/* putchard - putchar that takes a double and returns 0. */
-extern double putchard(double X) {
- putchar((char)X);
- return 0;
-}
-</pre>
-</dd>
-</dl>
-
-<a href="OCamlLangImpl6.html">Next: Extending the language: user-defined
-operators</a>
-</div>
-
-<!-- *********************************************************************** -->
-<hr>
-<address>
- <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
- src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
- <a href="http://validator.w3.org/check/referer"><img
- src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
-
- <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
- <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a><br>
- <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
- Last modified: $Date$
-</address>
-</body>
-</html>
diff --git a/docs/tutorial/OCamlLangImpl6.html b/docs/tutorial/OCamlLangImpl6.html
deleted file mode 100644
index b444fff..0000000
--- a/docs/tutorial/OCamlLangImpl6.html
+++ /dev/null
@@ -1,1574 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
- "http://www.w3.org/TR/html4/strict.dtd">
-
-<html>
-<head>
- <title>Kaleidoscope: Extending the Language: User-defined Operators</title>
- <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
- <meta name="author" content="Chris Lattner">
- <meta name="author" content="Erick Tryzelaar">
- <link rel="stylesheet" href="../llvm.css" type="text/css">
-</head>
-
-<body>
-
-<div class="doc_title">Kaleidoscope: Extending the Language: User-defined Operators</div>
-
-<ul>
-<li><a href="index.html">Up to Tutorial Index</a></li>
-<li>Chapter 6
- <ol>
- <li><a href="#intro">Chapter 6 Introduction</a></li>
- <li><a href="#idea">User-defined Operators: the Idea</a></li>
- <li><a href="#binary">User-defined Binary Operators</a></li>
- <li><a href="#unary">User-defined Unary Operators</a></li>
- <li><a href="#example">Kicking the Tires</a></li>
- <li><a href="#code">Full Code Listing</a></li>
- </ol>
-</li>
-<li><a href="OCamlLangImpl7.html">Chapter 7</a>: Extending the Language: Mutable
-Variables / SSA Construction</li>
-</ul>
-
-<div class="doc_author">
- <p>
- Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a>
- and <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a>
- </p>
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="intro">Chapter 6 Introduction</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Welcome to Chapter 6 of the "<a href="index.html">Implementing a language
-with LLVM</a>" tutorial. At this point in our tutorial, we now have a fully
-functional language that is fairly minimal, but also useful. There
-is still one big problem with it, however. Our language doesn't have many
-useful operators (like division, logical negation, or even any comparisons
-besides less-than).</p>
-
-<p>This chapter of the tutorial takes a wild digression into adding user-defined
-operators to the simple and beautiful Kaleidoscope language. This digression now
-gives us a simple and ugly language in some ways, but also a powerful one at the
-same time. One of the great things about creating your own language is that you
-get to decide what is good or bad. In this tutorial we'll assume that it is
-okay to use this as a way to show some interesting parsing techniques.</p>
-
-<p>At the end of this tutorial, we'll run through an example Kaleidoscope
-application that <a href="#example">renders the Mandelbrot set</a>. This gives
-an example of what you can build with Kaleidoscope and its feature set.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="idea">User-defined Operators: the Idea</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>
-The "operator overloading" that we will add to Kaleidoscope is more general than
-languages like C++. In C++, you are only allowed to redefine existing
-operators: you can't programatically change the grammar, introduce new
-operators, change precedence levels, etc. In this chapter, we will add this
-capability to Kaleidoscope, which will let the user round out the set of
-operators that are supported.</p>
-
-<p>The point of going into user-defined operators in a tutorial like this is to
-show the power and flexibility of using a hand-written parser. Thus far, the parser
-we have been implementing uses recursive descent for most parts of the grammar and
-operator precedence parsing for the expressions. See <a
-href="OCamlLangImpl2.html">Chapter 2</a> for details. Without using operator
-precedence parsing, it would be very difficult to allow the programmer to
-introduce new operators into the grammar: the grammar is dynamically extensible
-as the JIT runs.</p>
-
-<p>The two specific features we'll add are programmable unary operators (right
-now, Kaleidoscope has no unary operators at all) as well as binary operators.
-An example of this is:</p>
-
-<div class="doc_code">
-<pre>
-# Logical unary not.
-def unary!(v)
- if v then
- 0
- else
- 1;
-
-# Define &gt; with the same precedence as &lt;.
-def binary&gt; 10 (LHS RHS)
- RHS &lt; LHS;
-
-# Binary "logical or", (note that it does not "short circuit")
-def binary| 5 (LHS RHS)
- if LHS then
- 1
- else if RHS then
- 1
- else
- 0;
-
-# Define = with slightly lower precedence than relationals.
-def binary= 9 (LHS RHS)
- !(LHS &lt; RHS | LHS &gt; RHS);
-</pre>
-</div>
-
-<p>Many languages aspire to being able to implement their standard runtime
-library in the language itself. In Kaleidoscope, we can implement significant
-parts of the language in the library!</p>
-
-<p>We will break down implementation of these features into two parts:
-implementing support for user-defined binary operators and adding unary
-operators.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="binary">User-defined Binary Operators</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Adding support for user-defined binary operators is pretty simple with our
-current framework. We'll first add support for the unary/binary keywords:</p>
-
-<div class="doc_code">
-<pre>
-type token =
- ...
- <b>(* operators *)
- | Binary | Unary</b>
-
-...
-
-and lex_ident buffer = parser
- ...
- | "for" -&gt; [&lt; 'Token.For; stream &gt;]
- | "in" -&gt; [&lt; 'Token.In; stream &gt;]
- <b>| "binary" -&gt; [&lt; 'Token.Binary; stream &gt;]
- | "unary" -&gt; [&lt; 'Token.Unary; stream &gt;]</b>
-</pre>
-</div>
-
-<p>This just adds lexer support for the unary and binary keywords, like we
-did in <a href="OCamlLangImpl5.html#iflexer">previous chapters</a>. One nice
-thing about our current AST, is that we represent binary operators with full
-generalisation by using their ASCII code as the opcode. For our extended
-operators, we'll use this same representation, so we don't need any new AST or
-parser support.</p>
-
-<p>On the other hand, we have to be able to represent the definitions of these
-new operators, in the "def binary| 5" part of the function definition. In our
-grammar so far, the "name" for the function definition is parsed as the
-"prototype" production and into the <tt>Ast.Prototype</tt> AST node. To
-represent our new user-defined operators as prototypes, we have to extend
-the <tt>Ast.Prototype</tt> AST node like this:</p>
-
-<div class="doc_code">
-<pre>
-(* proto - This type represents the "prototype" for a function, which captures
- * its name, and its argument names (thus implicitly the number of arguments the
- * function takes). *)
-type proto =
- | Prototype of string * string array
- <b>| BinOpPrototype of string * string array * int</b>
-</pre>
-</div>
-
-<p>Basically, in addition to knowing a name for the prototype, we now keep track
-of whether it was an operator, and if it was, what precedence level the operator
-is at. The precedence is only used for binary operators (as you'll see below,
-it just doesn't apply for unary operators). Now that we have a way to represent
-the prototype for a user-defined operator, we need to parse it:</p>
-
-<div class="doc_code">
-<pre>
-(* prototype
- * ::= id '(' id* ')'
- <b>* ::= binary LETTER number? (id, id)
- * ::= unary LETTER number? (id) *)</b>
-let parse_prototype =
- let rec parse_args accumulator = parser
- | [&lt; 'Token.Ident id; e=parse_args (id::accumulator) &gt;] -&gt; e
- | [&lt; &gt;] -&gt; accumulator
- in
- let parse_operator = parser
- | [&lt; 'Token.Unary &gt;] -&gt; "unary", 1
- | [&lt; 'Token.Binary &gt;] -&gt; "binary", 2
- in
- let parse_binary_precedence = parser
- | [&lt; 'Token.Number n &gt;] -&gt; int_of_float n
- | [&lt; &gt;] -&gt; 30
- in
- parser
- | [&lt; 'Token.Ident id;
- 'Token.Kwd '(' ?? "expected '(' in prototype";
- args=parse_args [];
- 'Token.Kwd ')' ?? "expected ')' in prototype" &gt;] -&gt;
- (* success. *)
- Ast.Prototype (id, Array.of_list (List.rev args))
- <b>| [&lt; (prefix, kind)=parse_operator;
- 'Token.Kwd op ?? "expected an operator";
- (* Read the precedence if present. *)
- binary_precedence=parse_binary_precedence;
- 'Token.Kwd '(' ?? "expected '(' in prototype";
- args=parse_args [];
- 'Token.Kwd ')' ?? "expected ')' in prototype" &gt;] -&gt;
- let name = prefix ^ (String.make 1 op) in
- let args = Array.of_list (List.rev args) in
-
- (* Verify right number of arguments for operator. *)
- if Array.length args != kind
- then raise (Stream.Error "invalid number of operands for operator")
- else
- if kind == 1 then
- Ast.Prototype (name, args)
- else
- Ast.BinOpPrototype (name, args, binary_precedence)</b>
- | [&lt; &gt;] -&gt;
- raise (Stream.Error "expected function name in prototype")
-</pre>
-</div>
-
-<p>This is all fairly straightforward parsing code, and we have already seen
-a lot of similar code in the past. One interesting part about the code above is
-the couple lines that set up <tt>name</tt> for binary operators. This builds
-names like "binary@" for a newly defined "@" operator. This then takes
-advantage of the fact that symbol names in the LLVM symbol table are allowed to
-have any character in them, including embedded nul characters.</p>
-
-<p>The next interesting thing to add, is codegen support for these binary
-operators. Given our current structure, this is a simple addition of a default
-case for our existing binary operator node:</p>
-
-<div class="doc_code">
-<pre>
-let codegen_expr = function
- ...
- | Ast.Binary (op, lhs, rhs) -&gt;
- let lhs_val = codegen_expr lhs in
- let rhs_val = codegen_expr rhs in
- begin
- match op with
- | '+' -&gt; build_add lhs_val rhs_val "addtmp" builder
- | '-' -&gt; build_sub lhs_val rhs_val "subtmp" builder
- | '*' -&gt; build_mul lhs_val rhs_val "multmp" builder
- | '&lt;' -&gt;
- (* Convert bool 0/1 to double 0.0 or 1.0 *)
- let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in
- build_uitofp i double_type "booltmp" builder
- <b>| _ -&gt;
- (* If it wasn't a builtin binary operator, it must be a user defined
- * one. Emit a call to it. *)
- let callee = "binary" ^ (String.make 1 op) in
- let callee =
- match lookup_function callee the_module with
- | Some callee -&gt; callee
- | None -&gt; raise (Error "binary operator not found!")
- in
- build_call callee [|lhs_val; rhs_val|] "binop" builder</b>
- end
-</pre>
-</div>
-
-<p>As you can see above, the new code is actually really simple. It just does
-a lookup for the appropriate operator in the symbol table and generates a
-function call to it. Since user-defined operators are just built as normal
-functions (because the "prototype" boils down to a function with the right
-name) everything falls into place.</p>
-
-<p>The final piece of code we are missing, is a bit of top level magic:</p>
-
-<div class="doc_code">
-<pre>
-let codegen_func the_fpm = function
- | Ast.Function (proto, body) -&gt;
- Hashtbl.clear named_values;
- let the_function = codegen_proto proto in
-
- <b>(* If this is an operator, install it. *)
- begin match proto with
- | Ast.BinOpPrototype (name, args, prec) -&gt;
- let op = name.[String.length name - 1] in
- Hashtbl.add Parser.binop_precedence op prec;
- | _ -&gt; ()
- end;</b>
-
- (* Create a new basic block to start insertion into. *)
- let bb = append_block context "entry" the_function in
- position_at_end bb builder;
- ...
-</pre>
-</div>
-
-<p>Basically, before codegening a function, if it is a user-defined operator, we
-register it in the precedence table. This allows the binary operator parsing
-logic we already have in place to handle it. Since we are working on a
-fully-general operator precedence parser, this is all we need to do to "extend
-the grammar".</p>
-
-<p>Now we have useful user-defined binary operators. This builds a lot
-on the previous framework we built for other operators. Adding unary operators
-is a bit more challenging, because we don't have any framework for it yet - lets
-see what it takes.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="unary">User-defined Unary Operators</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Since we don't currently support unary operators in the Kaleidoscope
-language, we'll need to add everything to support them. Above, we added simple
-support for the 'unary' keyword to the lexer. In addition to that, we need an
-AST node:</p>
-
-<div class="doc_code">
-<pre>
-type expr =
- ...
- (* variant for a unary operator. *)
- | Unary of char * expr
- ...
-</pre>
-</div>
-
-<p>This AST node is very simple and obvious by now. It directly mirrors the
-binary operator AST node, except that it only has one child. With this, we
-need to add the parsing logic. Parsing a unary operator is pretty simple: we'll
-add a new function to do it:</p>
-
-<div class="doc_code">
-<pre>
-(* unary
- * ::= primary
- * ::= '!' unary *)
-and parse_unary = parser
- (* If this is a unary operator, read it. *)
- | [&lt; 'Token.Kwd op when op != '(' &amp;&amp; op != ')'; operand=parse_expr &gt;] -&gt;
- Ast.Unary (op, operand)
-
- (* If the current token is not an operator, it must be a primary expr. *)
- | [&lt; stream &gt;] -&gt; parse_primary stream
-</pre>
-</div>
-
-<p>The grammar we add is pretty straightforward here. If we see a unary
-operator when parsing a primary operator, we eat the operator as a prefix and
-parse the remaining piece as another unary operator. This allows us to handle
-multiple unary operators (e.g. "!!x"). Note that unary operators can't have
-ambiguous parses like binary operators can, so there is no need for precedence
-information.</p>
-
-<p>The problem with this function, is that we need to call ParseUnary from
-somewhere. To do this, we change previous callers of ParsePrimary to call
-<tt>parse_unary</tt> instead:</p>
-
-<div class="doc_code">
-<pre>
-(* binoprhs
- * ::= ('+' primary)* *)
-and parse_bin_rhs expr_prec lhs stream =
- ...
- <b>(* Parse the unary expression after the binary operator. *)
- let rhs = parse_unary stream in</b>
- ...
-
-...
-
-(* expression
- * ::= primary binoprhs *)
-and parse_expr = parser
- | [&lt; lhs=<b>parse_unary</b>; stream &gt;] -&gt; parse_bin_rhs 0 lhs stream
-</pre>
-</div>
-
-<p>With these two simple changes, we are now able to parse unary operators and build the
-AST for them. Next up, we need to add parser support for prototypes, to parse
-the unary operator prototype. We extend the binary operator code above
-with:</p>
-
-<div class="doc_code">
-<pre>
-(* prototype
- * ::= id '(' id* ')'
- * ::= binary LETTER number? (id, id)
- <b>* ::= unary LETTER number? (id)</b> *)
-let parse_prototype =
- let rec parse_args accumulator = parser
- | [&lt; 'Token.Ident id; e=parse_args (id::accumulator) &gt;] -&gt; e
- | [&lt; &gt;] -&gt; accumulator
- in
- <b>let parse_operator = parser
- | [&lt; 'Token.Unary &gt;] -&gt; "unary", 1
- | [&lt; 'Token.Binary &gt;] -&gt; "binary", 2
- in</b>
- let parse_binary_precedence = parser
- | [&lt; 'Token.Number n &gt;] -&gt; int_of_float n
- | [&lt; &gt;] -&gt; 30
- in
- parser
- | [&lt; 'Token.Ident id;
- 'Token.Kwd '(' ?? "expected '(' in prototype";
- args=parse_args [];
- 'Token.Kwd ')' ?? "expected ')' in prototype" &gt;] -&gt;
- (* success. *)
- Ast.Prototype (id, Array.of_list (List.rev args))
- <b>| [&lt; (prefix, kind)=parse_operator;
- 'Token.Kwd op ?? "expected an operator";
- (* Read the precedence if present. *)
- binary_precedence=parse_binary_precedence;
- 'Token.Kwd '(' ?? "expected '(' in prototype";
- args=parse_args [];
- 'Token.Kwd ')' ?? "expected ')' in prototype" &gt;] -&gt;
- let name = prefix ^ (String.make 1 op) in
- let args = Array.of_list (List.rev args) in
-
- (* Verify right number of arguments for operator. *)
- if Array.length args != kind
- then raise (Stream.Error "invalid number of operands for operator")
- else
- if kind == 1 then
- Ast.Prototype (name, args)
- else
- Ast.BinOpPrototype (name, args, binary_precedence)</b>
- | [&lt; &gt;] -&gt;
- raise (Stream.Error "expected function name in prototype")
-</pre>
-</div>
-
-<p>As with binary operators, we name unary operators with a name that includes
-the operator character. This assists us at code generation time. Speaking of,
-the final piece we need to add is codegen support for unary operators. It looks
-like this:</p>
-
-<div class="doc_code">
-<pre>
-let rec codegen_expr = function
- ...
- | Ast.Unary (op, operand) -&gt;
- let operand = codegen_expr operand in
- let callee = "unary" ^ (String.make 1 op) in
- let callee =
- match lookup_function callee the_module with
- | Some callee -&gt; callee
- | None -&gt; raise (Error "unknown unary operator")
- in
- build_call callee [|operand|] "unop" builder
-</pre>
-</div>
-
-<p>This code is similar to, but simpler than, the code for binary operators. It
-is simpler primarily because it doesn't need to handle any predefined operators.
-</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="example">Kicking the Tires</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>It is somewhat hard to believe, but with a few simple extensions we've
-covered in the last chapters, we have grown a real-ish language. With this, we
-can do a lot of interesting things, including I/O, math, and a bunch of other
-things. For example, we can now add a nice sequencing operator (printd is
-defined to print out the specified value and a newline):</p>
-
-<div class="doc_code">
-<pre>
-ready&gt; <b>extern printd(x);</b>
-Read extern: declare double @printd(double)
-ready&gt; <b>def binary : 1 (x y) 0; # Low-precedence operator that ignores operands.</b>
-..
-ready&gt; <b>printd(123) : printd(456) : printd(789);</b>
-123.000000
-456.000000
-789.000000
-Evaluated to 0.000000
-</pre>
-</div>
-
-<p>We can also define a bunch of other "primitive" operations, such as:</p>
-
-<div class="doc_code">
-<pre>
-# Logical unary not.
-def unary!(v)
- if v then
- 0
- else
- 1;
-
-# Unary negate.
-def unary-(v)
- 0-v;
-
-# Define &gt; with the same precedence as &gt;.
-def binary&gt; 10 (LHS RHS)
- RHS &lt; LHS;
-
-# Binary logical or, which does not short circuit.
-def binary| 5 (LHS RHS)
- if LHS then
- 1
- else if RHS then
- 1
- else
- 0;
-
-# Binary logical and, which does not short circuit.
-def binary&amp; 6 (LHS RHS)
- if !LHS then
- 0
- else
- !!RHS;
-
-# Define = with slightly lower precedence than relationals.
-def binary = 9 (LHS RHS)
- !(LHS &lt; RHS | LHS &gt; RHS);
-
-</pre>
-</div>
-
-
-<p>Given the previous if/then/else support, we can also define interesting
-functions for I/O. For example, the following prints out a character whose
-"density" reflects the value passed in: the lower the value, the denser the
-character:</p>
-
-<div class="doc_code">
-<pre>
-ready&gt;
-<b>
-extern putchard(char)
-def printdensity(d)
- if d &gt; 8 then
- putchard(32) # ' '
- else if d &gt; 4 then
- putchard(46) # '.'
- else if d &gt; 2 then
- putchard(43) # '+'
- else
- putchard(42); # '*'</b>
-...
-ready&gt; <b>printdensity(1): printdensity(2): printdensity(3) :
- printdensity(4): printdensity(5): printdensity(9): putchard(10);</b>
-*++..
-Evaluated to 0.000000
-</pre>
-</div>
-
-<p>Based on these simple primitive operations, we can start to define more
-interesting things. For example, here's a little function that solves for the
-number of iterations it takes a function in the complex plane to
-converge:</p>
-
-<div class="doc_code">
-<pre>
-# determine whether the specific location diverges.
-# Solve for z = z^2 + c in the complex plane.
-def mandleconverger(real imag iters creal cimag)
- if iters &gt; 255 | (real*real + imag*imag &gt; 4) then
- iters
- else
- mandleconverger(real*real - imag*imag + creal,
- 2*real*imag + cimag,
- iters+1, creal, cimag);
-
-# return the number of iterations required for the iteration to escape
-def mandleconverge(real imag)
- mandleconverger(real, imag, 0, real, imag);
-</pre>
-</div>
-
-<p>This "z = z<sup>2</sup> + c" function is a beautiful little creature that is the basis
-for computation of the <a
-href="http://en.wikipedia.org/wiki/Mandelbrot_set">Mandelbrot Set</a>. Our
-<tt>mandelconverge</tt> function returns the number of iterations that it takes
-for a complex orbit to escape, saturating to 255. This is not a very useful
-function by itself, but if you plot its value over a two-dimensional plane,
-you can see the Mandelbrot set. Given that we are limited to using putchard
-here, our amazing graphical output is limited, but we can whip together
-something using the density plotter above:</p>
-
-<div class="doc_code">
-<pre>
-# compute and plot the mandlebrot set with the specified 2 dimensional range
-# info.
-def mandelhelp(xmin xmax xstep ymin ymax ystep)
- for y = ymin, y &lt; ymax, ystep in (
- (for x = xmin, x &lt; xmax, xstep in
- printdensity(mandleconverge(x,y)))
- : putchard(10)
- )
-
-# mandel - This is a convenient helper function for ploting the mandelbrot set
-# from the specified position with the specified Magnification.
-def mandel(realstart imagstart realmag imagmag)
- mandelhelp(realstart, realstart+realmag*78, realmag,
- imagstart, imagstart+imagmag*40, imagmag);
-</pre>
-</div>
-
-<p>Given this, we can try plotting out the mandlebrot set! Lets try it out:</p>
-
-<div class="doc_code">
-<pre>
-ready&gt; <b>mandel(-2.3, -1.3, 0.05, 0.07);</b>
-*******************************+++++++++++*************************************
-*************************+++++++++++++++++++++++*******************************
-**********************+++++++++++++++++++++++++++++****************************
-*******************+++++++++++++++++++++.. ...++++++++*************************
-*****************++++++++++++++++++++++.... ...+++++++++***********************
-***************+++++++++++++++++++++++..... ...+++++++++*********************
-**************+++++++++++++++++++++++.... ....+++++++++********************
-*************++++++++++++++++++++++...... .....++++++++*******************
-************+++++++++++++++++++++....... .......+++++++******************
-***********+++++++++++++++++++.... ... .+++++++*****************
-**********+++++++++++++++++....... .+++++++****************
-*********++++++++++++++........... ...+++++++***************
-********++++++++++++............ ...++++++++**************
-********++++++++++... .......... .++++++++**************
-*******+++++++++..... .+++++++++*************
-*******++++++++...... ..+++++++++*************
-*******++++++....... ..+++++++++*************
-*******+++++...... ..+++++++++*************
-*******.... .... ...+++++++++*************
-*******.... . ...+++++++++*************
-*******+++++...... ...+++++++++*************
-*******++++++....... ..+++++++++*************
-*******++++++++...... .+++++++++*************
-*******+++++++++..... ..+++++++++*************
-********++++++++++... .......... .++++++++**************
-********++++++++++++............ ...++++++++**************
-*********++++++++++++++.......... ...+++++++***************
-**********++++++++++++++++........ .+++++++****************
-**********++++++++++++++++++++.... ... ..+++++++****************
-***********++++++++++++++++++++++....... .......++++++++*****************
-************+++++++++++++++++++++++...... ......++++++++******************
-**************+++++++++++++++++++++++.... ....++++++++********************
-***************+++++++++++++++++++++++..... ...+++++++++*********************
-*****************++++++++++++++++++++++.... ...++++++++***********************
-*******************+++++++++++++++++++++......++++++++*************************
-*********************++++++++++++++++++++++.++++++++***************************
-*************************+++++++++++++++++++++++*******************************
-******************************+++++++++++++************************************
-*******************************************************************************
-*******************************************************************************
-*******************************************************************************
-Evaluated to 0.000000
-ready&gt; <b>mandel(-2, -1, 0.02, 0.04);</b>
-**************************+++++++++++++++++++++++++++++++++++++++++++++++++++++
-***********************++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-*********************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.
-*******************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++...
-*****************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.....
-***************++++++++++++++++++++++++++++++++++++++++++++++++++++++++........
-**************++++++++++++++++++++++++++++++++++++++++++++++++++++++...........
-************+++++++++++++++++++++++++++++++++++++++++++++++++++++..............
-***********++++++++++++++++++++++++++++++++++++++++++++++++++........ .
-**********++++++++++++++++++++++++++++++++++++++++++++++.............
-********+++++++++++++++++++++++++++++++++++++++++++..................
-*******+++++++++++++++++++++++++++++++++++++++.......................
-******+++++++++++++++++++++++++++++++++++...........................
-*****++++++++++++++++++++++++++++++++............................
-*****++++++++++++++++++++++++++++...............................
-****++++++++++++++++++++++++++...... .........................
-***++++++++++++++++++++++++......... ...... ...........
-***++++++++++++++++++++++............
-**+++++++++++++++++++++..............
-**+++++++++++++++++++................
-*++++++++++++++++++.................
-*++++++++++++++++............ ...
-*++++++++++++++..............
-*+++....++++................
-*.......... ...........
-*
-*.......... ...........
-*+++....++++................
-*++++++++++++++..............
-*++++++++++++++++............ ...
-*++++++++++++++++++.................
-**+++++++++++++++++++................
-**+++++++++++++++++++++..............
-***++++++++++++++++++++++............
-***++++++++++++++++++++++++......... ...... ...........
-****++++++++++++++++++++++++++...... .........................
-*****++++++++++++++++++++++++++++...............................
-*****++++++++++++++++++++++++++++++++............................
-******+++++++++++++++++++++++++++++++++++...........................
-*******+++++++++++++++++++++++++++++++++++++++.......................
-********+++++++++++++++++++++++++++++++++++++++++++..................
-Evaluated to 0.000000
-ready&gt; <b>mandel(-0.9, -1.4, 0.02, 0.03);</b>
-*******************************************************************************
-*******************************************************************************
-*******************************************************************************
-**********+++++++++++++++++++++************************************************
-*+++++++++++++++++++++++++++++++++++++++***************************************
-+++++++++++++++++++++++++++++++++++++++++++++**********************************
-++++++++++++++++++++++++++++++++++++++++++++++++++*****************************
-++++++++++++++++++++++++++++++++++++++++++++++++++++++*************************
-+++++++++++++++++++++++++++++++++++++++++++++++++++++++++**********************
-+++++++++++++++++++++++++++++++++.........++++++++++++++++++*******************
-+++++++++++++++++++++++++++++++.... ......+++++++++++++++++++****************
-+++++++++++++++++++++++++++++....... ........+++++++++++++++++++**************
-++++++++++++++++++++++++++++........ ........++++++++++++++++++++************
-+++++++++++++++++++++++++++......... .. ...+++++++++++++++++++++**********
-++++++++++++++++++++++++++........... ....++++++++++++++++++++++********
-++++++++++++++++++++++++............. .......++++++++++++++++++++++******
-+++++++++++++++++++++++............. ........+++++++++++++++++++++++****
-++++++++++++++++++++++........... ..........++++++++++++++++++++++***
-++++++++++++++++++++........... .........++++++++++++++++++++++*
-++++++++++++++++++............ ...........++++++++++++++++++++
-++++++++++++++++............... .............++++++++++++++++++
-++++++++++++++................. ...............++++++++++++++++
-++++++++++++.................. .................++++++++++++++
-+++++++++.................. .................+++++++++++++
-++++++........ . ......... ..++++++++++++
-++............ ...... ....++++++++++
-.............. ...++++++++++
-.............. ....+++++++++
-.............. .....++++++++
-............. ......++++++++
-........... .......++++++++
-......... ........+++++++
-......... ........+++++++
-......... ....+++++++
-........ ...+++++++
-....... ...+++++++
- ....+++++++
- .....+++++++
- ....+++++++
- ....+++++++
- ....+++++++
-Evaluated to 0.000000
-ready&gt; <b>^D</b>
-</pre>
-</div>
-
-<p>At this point, you may be starting to realize that Kaleidoscope is a real
-and powerful language. It may not be self-similar :), but it can be used to
-plot things that are!</p>
-
-<p>With this, we conclude the "adding user-defined operators" chapter of the
-tutorial. We have successfully augmented our language, adding the ability to
-extend the language in the library, and we have shown how this can be used to
-build a simple but interesting end-user application in Kaleidoscope. At this
-point, Kaleidoscope can build a variety of applications that are functional and
-can call functions with side-effects, but it can't actually define and mutate a
-variable itself.</p>
-
-<p>Strikingly, variable mutation is an important feature of some
-languages, and it is not at all obvious how to <a href="OCamlLangImpl7.html">add
-support for mutable variables</a> without having to add an "SSA construction"
-phase to your front-end. In the next chapter, we will describe how you can
-add variable mutation without building SSA in your front-end.</p>
-
-</div>
-
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="code">Full Code Listing</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>
-Here is the complete code listing for our running example, enhanced with the
-if/then/else and for expressions.. To build this example, use:
-</p>
-
-<div class="doc_code">
-<pre>
-# Compile
-ocamlbuild toy.byte
-# Run
-./toy.byte
-</pre>
-</div>
-
-<p>Here is the code:</p>
-
-<dl>
-<dt>_tags:</dt>
-<dd class="doc_code">
-<pre>
-&lt;{lexer,parser}.ml&gt;: use_camlp4, pp(camlp4of)
-&lt;*.{byte,native}&gt;: g++, use_llvm, use_llvm_analysis
-&lt;*.{byte,native}&gt;: use_llvm_executionengine, use_llvm_target
-&lt;*.{byte,native}&gt;: use_llvm_scalar_opts, use_bindings
-</pre>
-</dd>
-
-<dt>myocamlbuild.ml:</dt>
-<dd class="doc_code">
-<pre>
-open Ocamlbuild_plugin;;
-
-ocaml_lib ~extern:true "llvm";;
-ocaml_lib ~extern:true "llvm_analysis";;
-ocaml_lib ~extern:true "llvm_executionengine";;
-ocaml_lib ~extern:true "llvm_target";;
-ocaml_lib ~extern:true "llvm_scalar_opts";;
-
-flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"; A"-cclib"; A"-rdynamic"]);;
-dep ["link"; "ocaml"; "use_bindings"] ["bindings.o"];;
-</pre>
-</dd>
-
-<dt>token.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Lexer Tokens
- *===----------------------------------------------------------------------===*)
-
-(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of
- * these others for known things. *)
-type token =
- (* commands *)
- | Def | Extern
-
- (* primary *)
- | Ident of string | Number of float
-
- (* unknown *)
- | Kwd of char
-
- (* control *)
- | If | Then | Else
- | For | In
-
- (* operators *)
- | Binary | Unary
-</pre>
-</dd>
-
-<dt>lexer.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Lexer
- *===----------------------------------------------------------------------===*)
-
-let rec lex = parser
- (* Skip any whitespace. *)
- | [&lt; ' (' ' | '\n' | '\r' | '\t'); stream &gt;] -&gt; lex stream
-
- (* identifier: [a-zA-Z][a-zA-Z0-9] *)
- | [&lt; ' ('A' .. 'Z' | 'a' .. 'z' as c); stream &gt;] -&gt;
- let buffer = Buffer.create 1 in
- Buffer.add_char buffer c;
- lex_ident buffer stream
-
- (* number: [0-9.]+ *)
- | [&lt; ' ('0' .. '9' as c); stream &gt;] -&gt;
- let buffer = Buffer.create 1 in
- Buffer.add_char buffer c;
- lex_number buffer stream
-
- (* Comment until end of line. *)
- | [&lt; ' ('#'); stream &gt;] -&gt;
- lex_comment stream
-
- (* Otherwise, just return the character as its ascii value. *)
- | [&lt; 'c; stream &gt;] -&gt;
- [&lt; 'Token.Kwd c; lex stream &gt;]
-
- (* end of stream. *)
- | [&lt; &gt;] -&gt; [&lt; &gt;]
-
-and lex_number buffer = parser
- | [&lt; ' ('0' .. '9' | '.' as c); stream &gt;] -&gt;
- Buffer.add_char buffer c;
- lex_number buffer stream
- | [&lt; stream=lex &gt;] -&gt;
- [&lt; 'Token.Number (float_of_string (Buffer.contents buffer)); stream &gt;]
-
-and lex_ident buffer = parser
- | [&lt; ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream &gt;] -&gt;
- Buffer.add_char buffer c;
- lex_ident buffer stream
- | [&lt; stream=lex &gt;] -&gt;
- match Buffer.contents buffer with
- | "def" -&gt; [&lt; 'Token.Def; stream &gt;]
- | "extern" -&gt; [&lt; 'Token.Extern; stream &gt;]
- | "if" -&gt; [&lt; 'Token.If; stream &gt;]
- | "then" -&gt; [&lt; 'Token.Then; stream &gt;]
- | "else" -&gt; [&lt; 'Token.Else; stream &gt;]
- | "for" -&gt; [&lt; 'Token.For; stream &gt;]
- | "in" -&gt; [&lt; 'Token.In; stream &gt;]
- | "binary" -&gt; [&lt; 'Token.Binary; stream &gt;]
- | "unary" -&gt; [&lt; 'Token.Unary; stream &gt;]
- | id -&gt; [&lt; 'Token.Ident id; stream &gt;]
-
-and lex_comment = parser
- | [&lt; ' ('\n'); stream=lex &gt;] -&gt; stream
- | [&lt; 'c; e=lex_comment &gt;] -&gt; e
- | [&lt; &gt;] -&gt; [&lt; &gt;]
-</pre>
-</dd>
-
-<dt>ast.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Abstract Syntax Tree (aka Parse Tree)
- *===----------------------------------------------------------------------===*)
-
-(* expr - Base type for all expression nodes. *)
-type expr =
- (* variant for numeric literals like "1.0". *)
- | Number of float
-
- (* variant for referencing a variable, like "a". *)
- | Variable of string
-
- (* variant for a unary operator. *)
- | Unary of char * expr
-
- (* variant for a binary operator. *)
- | Binary of char * expr * expr
-
- (* variant for function calls. *)
- | Call of string * expr array
-
- (* variant for if/then/else. *)
- | If of expr * expr * expr
-
- (* variant for for/in. *)
- | For of string * expr * expr * expr option * expr
-
-(* proto - This type represents the "prototype" for a function, which captures
- * its name, and its argument names (thus implicitly the number of arguments the
- * function takes). *)
-type proto =
- | Prototype of string * string array
- | BinOpPrototype of string * string array * int
-
-(* func - This type represents a function definition itself. *)
-type func = Function of proto * expr
-</pre>
-</dd>
-
-<dt>parser.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===---------------------------------------------------------------------===
- * Parser
- *===---------------------------------------------------------------------===*)
-
-(* binop_precedence - This holds the precedence for each binary operator that is
- * defined *)
-let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10
-
-(* precedence - Get the precedence of the pending binary operator token. *)
-let precedence c = try Hashtbl.find binop_precedence c with Not_found -&gt; -1
-
-(* primary
- * ::= identifier
- * ::= numberexpr
- * ::= parenexpr
- * ::= ifexpr
- * ::= forexpr *)
-let rec parse_primary = parser
- (* numberexpr ::= number *)
- | [&lt; 'Token.Number n &gt;] -&gt; Ast.Number n
-
- (* parenexpr ::= '(' expression ')' *)
- | [&lt; 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" &gt;] -&gt; e
-
- (* identifierexpr
- * ::= identifier
- * ::= identifier '(' argumentexpr ')' *)
- | [&lt; 'Token.Ident id; stream &gt;] -&gt;
- let rec parse_args accumulator = parser
- | [&lt; e=parse_expr; stream &gt;] -&gt;
- begin parser
- | [&lt; 'Token.Kwd ','; e=parse_args (e :: accumulator) &gt;] -&gt; e
- | [&lt; &gt;] -&gt; e :: accumulator
- end stream
- | [&lt; &gt;] -&gt; accumulator
- in
- let rec parse_ident id = parser
- (* Call. *)
- | [&lt; 'Token.Kwd '(';
- args=parse_args [];
- 'Token.Kwd ')' ?? "expected ')'"&gt;] -&gt;
- Ast.Call (id, Array.of_list (List.rev args))
-
- (* Simple variable ref. *)
- | [&lt; &gt;] -&gt; Ast.Variable id
- in
- parse_ident id stream
-
- (* ifexpr ::= 'if' expr 'then' expr 'else' expr *)
- | [&lt; 'Token.If; c=parse_expr;
- 'Token.Then ?? "expected 'then'"; t=parse_expr;
- 'Token.Else ?? "expected 'else'"; e=parse_expr &gt;] -&gt;
- Ast.If (c, t, e)
-
- (* forexpr
- ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression *)
- | [&lt; 'Token.For;
- 'Token.Ident id ?? "expected identifier after for";
- 'Token.Kwd '=' ?? "expected '=' after for";
- stream &gt;] -&gt;
- begin parser
- | [&lt;
- start=parse_expr;
- 'Token.Kwd ',' ?? "expected ',' after for";
- end_=parse_expr;
- stream &gt;] -&gt;
- let step =
- begin parser
- | [&lt; 'Token.Kwd ','; step=parse_expr &gt;] -&gt; Some step
- | [&lt; &gt;] -&gt; None
- end stream
- in
- begin parser
- | [&lt; 'Token.In; body=parse_expr &gt;] -&gt;
- Ast.For (id, start, end_, step, body)
- | [&lt; &gt;] -&gt;
- raise (Stream.Error "expected 'in' after for")
- end stream
- | [&lt; &gt;] -&gt;
- raise (Stream.Error "expected '=' after for")
- end stream
-
- | [&lt; &gt;] -&gt; raise (Stream.Error "unknown token when expecting an expression.")
-
-(* unary
- * ::= primary
- * ::= '!' unary *)
-and parse_unary = parser
- (* If this is a unary operator, read it. *)
- | [&lt; 'Token.Kwd op when op != '(' &amp;&amp; op != ')'; operand=parse_expr &gt;] -&gt;
- Ast.Unary (op, operand)
-
- (* If the current token is not an operator, it must be a primary expr. *)
- | [&lt; stream &gt;] -&gt; parse_primary stream
-
-(* binoprhs
- * ::= ('+' primary)* *)
-and parse_bin_rhs expr_prec lhs stream =
- match Stream.peek stream with
- (* If this is a binop, find its precedence. *)
- | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -&gt;
- let token_prec = precedence c in
-
- (* If this is a binop that binds at least as tightly as the current binop,
- * consume it, otherwise we are done. *)
- if token_prec &lt; expr_prec then lhs else begin
- (* Eat the binop. *)
- Stream.junk stream;
-
- (* Parse the unary expression after the binary operator. *)
- let rhs = parse_unary stream in
-
- (* Okay, we know this is a binop. *)
- let rhs =
- match Stream.peek stream with
- | Some (Token.Kwd c2) -&gt;
- (* If BinOp binds less tightly with rhs than the operator after
- * rhs, let the pending operator take rhs as its lhs. *)
- let next_prec = precedence c2 in
- if token_prec &lt; next_prec
- then parse_bin_rhs (token_prec + 1) rhs stream
- else rhs
- | _ -&gt; rhs
- in
-
- (* Merge lhs/rhs. *)
- let lhs = Ast.Binary (c, lhs, rhs) in
- parse_bin_rhs expr_prec lhs stream
- end
- | _ -&gt; lhs
-
-(* expression
- * ::= primary binoprhs *)
-and parse_expr = parser
- | [&lt; lhs=parse_unary; stream &gt;] -&gt; parse_bin_rhs 0 lhs stream
-
-(* prototype
- * ::= id '(' id* ')'
- * ::= binary LETTER number? (id, id)
- * ::= unary LETTER number? (id) *)
-let parse_prototype =
- let rec parse_args accumulator = parser
- | [&lt; 'Token.Ident id; e=parse_args (id::accumulator) &gt;] -&gt; e
- | [&lt; &gt;] -&gt; accumulator
- in
- let parse_operator = parser
- | [&lt; 'Token.Unary &gt;] -&gt; "unary", 1
- | [&lt; 'Token.Binary &gt;] -&gt; "binary", 2
- in
- let parse_binary_precedence = parser
- | [&lt; 'Token.Number n &gt;] -&gt; int_of_float n
- | [&lt; &gt;] -&gt; 30
- in
- parser
- | [&lt; 'Token.Ident id;
- 'Token.Kwd '(' ?? "expected '(' in prototype";
- args=parse_args [];
- 'Token.Kwd ')' ?? "expected ')' in prototype" &gt;] -&gt;
- (* success. *)
- Ast.Prototype (id, Array.of_list (List.rev args))
- | [&lt; (prefix, kind)=parse_operator;
- 'Token.Kwd op ?? "expected an operator";
- (* Read the precedence if present. *)
- binary_precedence=parse_binary_precedence;
- 'Token.Kwd '(' ?? "expected '(' in prototype";
- args=parse_args [];
- 'Token.Kwd ')' ?? "expected ')' in prototype" &gt;] -&gt;
- let name = prefix ^ (String.make 1 op) in
- let args = Array.of_list (List.rev args) in
-
- (* Verify right number of arguments for operator. *)
- if Array.length args != kind
- then raise (Stream.Error "invalid number of operands for operator")
- else
- if kind == 1 then
- Ast.Prototype (name, args)
- else
- Ast.BinOpPrototype (name, args, binary_precedence)
- | [&lt; &gt;] -&gt;
- raise (Stream.Error "expected function name in prototype")
-
-(* definition ::= 'def' prototype expression *)
-let parse_definition = parser
- | [&lt; 'Token.Def; p=parse_prototype; e=parse_expr &gt;] -&gt;
- Ast.Function (p, e)
-
-(* toplevelexpr ::= expression *)
-let parse_toplevel = parser
- | [&lt; e=parse_expr &gt;] -&gt;
- (* Make an anonymous proto. *)
- Ast.Function (Ast.Prototype ("", [||]), e)
-
-(* external ::= 'extern' prototype *)
-let parse_extern = parser
- | [&lt; 'Token.Extern; e=parse_prototype &gt;] -&gt; e
-</pre>
-</dd>
-
-<dt>codegen.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Code Generation
- *===----------------------------------------------------------------------===*)
-
-open Llvm
-
-exception Error of string
-
-let context = global_context ()
-let the_module = create_module context "my cool jit"
-let builder = builder context
-let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10
-let double_type = double_type context
-
-let rec codegen_expr = function
- | Ast.Number n -&gt; const_float double_type n
- | Ast.Variable name -&gt;
- (try Hashtbl.find named_values name with
- | Not_found -&gt; raise (Error "unknown variable name"))
- | Ast.Unary (op, operand) -&gt;
- let operand = codegen_expr operand in
- let callee = "unary" ^ (String.make 1 op) in
- let callee =
- match lookup_function callee the_module with
- | Some callee -&gt; callee
- | None -&gt; raise (Error "unknown unary operator")
- in
- build_call callee [|operand|] "unop" builder
- | Ast.Binary (op, lhs, rhs) -&gt;
- let lhs_val = codegen_expr lhs in
- let rhs_val = codegen_expr rhs in
- begin
- match op with
- | '+' -&gt; build_add lhs_val rhs_val "addtmp" builder
- | '-' -&gt; build_sub lhs_val rhs_val "subtmp" builder
- | '*' -&gt; build_mul lhs_val rhs_val "multmp" builder
- | '&lt;' -&gt;
- (* Convert bool 0/1 to double 0.0 or 1.0 *)
- let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in
- build_uitofp i double_type "booltmp" builder
- | _ -&gt;
- (* If it wasn't a builtin binary operator, it must be a user defined
- * one. Emit a call to it. *)
- let callee = "binary" ^ (String.make 1 op) in
- let callee =
- match lookup_function callee the_module with
- | Some callee -&gt; callee
- | None -&gt; raise (Error "binary operator not found!")
- in
- build_call callee [|lhs_val; rhs_val|] "binop" builder
- end
- | Ast.Call (callee, args) -&gt;
- (* Look up the name in the module table. *)
- let callee =
- match lookup_function callee the_module with
- | Some callee -&gt; callee
- | None -&gt; raise (Error "unknown function referenced")
- in
- let params = params callee in
-
- (* If argument mismatch error. *)
- if Array.length params == Array.length args then () else
- raise (Error "incorrect # arguments passed");
- let args = Array.map codegen_expr args in
- build_call callee args "calltmp" builder
- | Ast.If (cond, then_, else_) -&gt;
- let cond = codegen_expr cond in
-
- (* Convert condition to a bool by comparing equal to 0.0 *)
- let zero = const_float double_type 0.0 in
- let cond_val = build_fcmp Fcmp.One cond zero "ifcond" builder in
-
- (* Grab the first block so that we might later add the conditional branch
- * to it at the end of the function. *)
- let start_bb = insertion_block builder in
- let the_function = block_parent start_bb in
-
- let then_bb = append_block context "then" the_function in
-
- (* Emit 'then' value. *)
- position_at_end then_bb builder;
- let then_val = codegen_expr then_ in
-
- (* Codegen of 'then' can change the current block, update then_bb for the
- * phi. We create a new name because one is used for the phi node, and the
- * other is used for the conditional branch. *)
- let new_then_bb = insertion_block builder in
-
- (* Emit 'else' value. *)
- let else_bb = append_block context "else" the_function in
- position_at_end else_bb builder;
- let else_val = codegen_expr else_ in
-
- (* Codegen of 'else' can change the current block, update else_bb for the
- * phi. *)
- let new_else_bb = insertion_block builder in
-
- (* Emit merge block. *)
- let merge_bb = append_block context "ifcont" the_function in
- position_at_end merge_bb builder;
- let incoming = [(then_val, new_then_bb); (else_val, new_else_bb)] in
- let phi = build_phi incoming "iftmp" builder in
-
- (* Return to the start block to add the conditional branch. *)
- position_at_end start_bb builder;
- ignore (build_cond_br cond_val then_bb else_bb builder);
-
- (* Set a unconditional branch at the end of the 'then' block and the
- * 'else' block to the 'merge' block. *)
- position_at_end new_then_bb builder; ignore (build_br merge_bb builder);
- position_at_end new_else_bb builder; ignore (build_br merge_bb builder);
-
- (* Finally, set the builder to the end of the merge block. *)
- position_at_end merge_bb builder;
-
- phi
- | Ast.For (var_name, start, end_, step, body) -&gt;
- (* Emit the start code first, without 'variable' in scope. *)
- let start_val = codegen_expr start in
-
- (* Make the new basic block for the loop header, inserting after current
- * block. *)
- let preheader_bb = insertion_block builder in
- let the_function = block_parent preheader_bb in
- let loop_bb = append_block context "loop" the_function in
-
- (* Insert an explicit fall through from the current block to the
- * loop_bb. *)
- ignore (build_br loop_bb builder);
-
- (* Start insertion in loop_bb. *)
- position_at_end loop_bb builder;
-
- (* Start the PHI node with an entry for start. *)
- let variable = build_phi [(start_val, preheader_bb)] var_name builder in
-
- (* Within the loop, the variable is defined equal to the PHI node. If it
- * shadows an existing variable, we have to restore it, so save it
- * now. *)
- let old_val =
- try Some (Hashtbl.find named_values var_name) with Not_found -&gt; None
- in
- Hashtbl.add named_values var_name variable;
-
- (* Emit the body of the loop. This, like any other expr, can change the
- * current BB. Note that we ignore the value computed by the body, but
- * don't allow an error *)
- ignore (codegen_expr body);
-
- (* Emit the step value. *)
- let step_val =
- match step with
- | Some step -&gt; codegen_expr step
- (* If not specified, use 1.0. *)
- | None -&gt; const_float double_type 1.0
- in
-
- let next_var = build_add variable step_val "nextvar" builder in
-
- (* Compute the end condition. *)
- let end_cond = codegen_expr end_ in
-
- (* Convert condition to a bool by comparing equal to 0.0. *)
- let zero = const_float double_type 0.0 in
- let end_cond = build_fcmp Fcmp.One end_cond zero "loopcond" builder in
-
- (* Create the "after loop" block and insert it. *)
- let loop_end_bb = insertion_block builder in
- let after_bb = append_block context "afterloop" the_function in
-
- (* Insert the conditional branch into the end of loop_end_bb. *)
- ignore (build_cond_br end_cond loop_bb after_bb builder);
-
- (* Any new code will be inserted in after_bb. *)
- position_at_end after_bb builder;
-
- (* Add a new entry to the PHI node for the backedge. *)
- add_incoming (next_var, loop_end_bb) variable;
-
- (* Restore the unshadowed variable. *)
- begin match old_val with
- | Some old_val -&gt; Hashtbl.add named_values var_name old_val
- | None -&gt; ()
- end;
-
- (* for expr always returns 0.0. *)
- const_null double_type
-
-let codegen_proto = function
- | Ast.Prototype (name, args) | Ast.BinOpPrototype (name, args, _) -&gt;
- (* Make the function type: double(double,double) etc. *)
- let doubles = Array.make (Array.length args) double_type in
- let ft = function_type double_type doubles in
- let f =
- match lookup_function name the_module with
- | None -&gt; declare_function name ft the_module
-
- (* If 'f' conflicted, there was already something named 'name'. If it
- * has a body, don't allow redefinition or reextern. *)
- | Some f -&gt;
- (* If 'f' already has a body, reject this. *)
- if block_begin f &lt;&gt; At_end f then
- raise (Error "redefinition of function");
-
- (* If 'f' took a different number of arguments, reject. *)
- if element_type (type_of f) &lt;&gt; ft then
- raise (Error "redefinition of function with different # args");
- f
- in
-
- (* Set names for all arguments. *)
- Array.iteri (fun i a -&gt;
- let n = args.(i) in
- set_value_name n a;
- Hashtbl.add named_values n a;
- ) (params f);
- f
-
-let codegen_func the_fpm = function
- | Ast.Function (proto, body) -&gt;
- Hashtbl.clear named_values;
- let the_function = codegen_proto proto in
-
- (* If this is an operator, install it. *)
- begin match proto with
- | Ast.BinOpPrototype (name, args, prec) -&gt;
- let op = name.[String.length name - 1] in
- Hashtbl.add Parser.binop_precedence op prec;
- | _ -&gt; ()
- end;
-
- (* Create a new basic block to start insertion into. *)
- let bb = append_block context "entry" the_function in
- position_at_end bb builder;
-
- try
- let ret_val = codegen_expr body in
-
- (* Finish off the function. *)
- let _ = build_ret ret_val builder in
-
- (* Validate the generated code, checking for consistency. *)
- Llvm_analysis.assert_valid_function the_function;
-
- (* Optimize the function. *)
- let _ = PassManager.run_function the_function the_fpm in
-
- the_function
- with e -&gt;
- delete_function the_function;
- raise e
-</pre>
-</dd>
-
-<dt>toplevel.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Top-Level parsing and JIT Driver
- *===----------------------------------------------------------------------===*)
-
-open Llvm
-open Llvm_executionengine
-
-(* top ::= definition | external | expression | ';' *)
-let rec main_loop the_fpm the_execution_engine stream =
- match Stream.peek stream with
- | None -&gt; ()
-
- (* ignore top-level semicolons. *)
- | Some (Token.Kwd ';') -&gt;
- Stream.junk stream;
- main_loop the_fpm the_execution_engine stream
-
- | Some token -&gt;
- begin
- try match token with
- | Token.Def -&gt;
- let e = Parser.parse_definition stream in
- print_endline "parsed a function definition.";
- dump_value (Codegen.codegen_func the_fpm e);
- | Token.Extern -&gt;
- let e = Parser.parse_extern stream in
- print_endline "parsed an extern.";
- dump_value (Codegen.codegen_proto e);
- | _ -&gt;
- (* Evaluate a top-level expression into an anonymous function. *)
- let e = Parser.parse_toplevel stream in
- print_endline "parsed a top-level expr";
- let the_function = Codegen.codegen_func the_fpm e in
- dump_value the_function;
-
- (* JIT the function, returning a function pointer. *)
- let result = ExecutionEngine.run_function the_function [||]
- the_execution_engine in
-
- print_string "Evaluated to ";
- print_float (GenericValue.as_float Codegen.double_type result);
- print_newline ();
- with Stream.Error s | Codegen.Error s -&gt;
- (* Skip token for error recovery. *)
- Stream.junk stream;
- print_endline s;
- end;
- print_string "ready&gt; "; flush stdout;
- main_loop the_fpm the_execution_engine stream
-</pre>
-</dd>
-
-<dt>toy.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Main driver code.
- *===----------------------------------------------------------------------===*)
-
-open Llvm
-open Llvm_executionengine
-open Llvm_target
-open Llvm_scalar_opts
-
-let main () =
- ignore (initialize_native_target ());
-
- (* Install standard binary operators.
- * 1 is the lowest precedence. *)
- Hashtbl.add Parser.binop_precedence '&lt;' 10;
- Hashtbl.add Parser.binop_precedence '+' 20;
- Hashtbl.add Parser.binop_precedence '-' 20;
- Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *)
-
- (* Prime the first token. *)
- print_string "ready&gt; "; flush stdout;
- let stream = Lexer.lex (Stream.of_channel stdin) in
-
- (* Create the JIT. *)
- let the_execution_engine = ExecutionEngine.create Codegen.the_module in
- let the_fpm = PassManager.create_function Codegen.the_module in
-
- (* Set up the optimizer pipeline. Start with registering info about how the
- * target lays out data structures. *)
- TargetData.add (ExecutionEngine.target_data the_execution_engine) the_fpm;
-
- (* Do simple "peephole" optimizations and bit-twiddling optzn. *)
- add_instruction_combination the_fpm;
-
- (* reassociate expressions. *)
- add_reassociation the_fpm;
-
- (* Eliminate Common SubExpressions. *)
- add_gvn the_fpm;
-
- (* Simplify the control flow graph (deleting unreachable blocks, etc). *)
- add_cfg_simplification the_fpm;
-
- ignore (PassManager.initialize the_fpm);
-
- (* Run the main "interpreter loop" now. *)
- Toplevel.main_loop the_fpm the_execution_engine stream;
-
- (* Print out all the generated code. *)
- dump_module Codegen.the_module
-;;
-
-main ()
-</pre>
-</dd>
-
-<dt>bindings.c</dt>
-<dd class="doc_code">
-<pre>
-#include &lt;stdio.h&gt;
-
-/* putchard - putchar that takes a double and returns 0. */
-extern double putchard(double X) {
- putchar((char)X);
- return 0;
-}
-
-/* printd - printf that takes a double prints it as "%f\n", returning 0. */
-extern double printd(double X) {
- printf("%f\n", X);
- return 0;
-}
-</pre>
-</dd>
-</dl>
-
-<a href="OCamlLangImpl7.html">Next: Extending the language: mutable variables /
-SSA construction</a>
-</div>
-
-<!-- *********************************************************************** -->
-<hr>
-<address>
- <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
- src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
- <a href="http://validator.w3.org/check/referer"><img
- src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
-
- <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
- <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a><br>
- <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
- Last modified: $Date$
-</address>
-</body>
-</html>
diff --git a/docs/tutorial/OCamlLangImpl7.html b/docs/tutorial/OCamlLangImpl7.html
deleted file mode 100644
index c140888..0000000
--- a/docs/tutorial/OCamlLangImpl7.html
+++ /dev/null
@@ -1,1907 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
- "http://www.w3.org/TR/html4/strict.dtd">
-
-<html>
-<head>
- <title>Kaleidoscope: Extending the Language: Mutable Variables / SSA
- construction</title>
- <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
- <meta name="author" content="Chris Lattner">
- <meta name="author" content="Erick Tryzelaar">
- <link rel="stylesheet" href="../llvm.css" type="text/css">
-</head>
-
-<body>
-
-<div class="doc_title">Kaleidoscope: Extending the Language: Mutable Variables</div>
-
-<ul>
-<li><a href="index.html">Up to Tutorial Index</a></li>
-<li>Chapter 7
- <ol>
- <li><a href="#intro">Chapter 7 Introduction</a></li>
- <li><a href="#why">Why is this a hard problem?</a></li>
- <li><a href="#memory">Memory in LLVM</a></li>
- <li><a href="#kalvars">Mutable Variables in Kaleidoscope</a></li>
- <li><a href="#adjustments">Adjusting Existing Variables for
- Mutation</a></li>
- <li><a href="#assignment">New Assignment Operator</a></li>
- <li><a href="#localvars">User-defined Local Variables</a></li>
- <li><a href="#code">Full Code Listing</a></li>
- </ol>
-</li>
-<li><a href="LangImpl8.html">Chapter 8</a>: Conclusion and other useful LLVM
- tidbits</li>
-</ul>
-
-<div class="doc_author">
- <p>
- Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a>
- and <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a>
- </p>
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="intro">Chapter 7 Introduction</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Welcome to Chapter 7 of the "<a href="index.html">Implementing a language
-with LLVM</a>" tutorial. In chapters 1 through 6, we've built a very
-respectable, albeit simple, <a
-href="http://en.wikipedia.org/wiki/Functional_programming">functional
-programming language</a>. In our journey, we learned some parsing techniques,
-how to build and represent an AST, how to build LLVM IR, and how to optimize
-the resultant code as well as JIT compile it.</p>
-
-<p>While Kaleidoscope is interesting as a functional language, the fact that it
-is functional makes it "too easy" to generate LLVM IR for it. In particular, a
-functional language makes it very easy to build LLVM IR directly in <a
-href="http://en.wikipedia.org/wiki/Static_single_assignment_form">SSA form</a>.
-Since LLVM requires that the input code be in SSA form, this is a very nice
-property and it is often unclear to newcomers how to generate code for an
-imperative language with mutable variables.</p>
-
-<p>The short (and happy) summary of this chapter is that there is no need for
-your front-end to build SSA form: LLVM provides highly tuned and well tested
-support for this, though the way it works is a bit unexpected for some.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="why">Why is this a hard problem?</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>
-To understand why mutable variables cause complexities in SSA construction,
-consider this extremely simple C example:
-</p>
-
-<div class="doc_code">
-<pre>
-int G, H;
-int test(_Bool Condition) {
- int X;
- if (Condition)
- X = G;
- else
- X = H;
- return X;
-}
-</pre>
-</div>
-
-<p>In this case, we have the variable "X", whose value depends on the path
-executed in the program. Because there are two different possible values for X
-before the return instruction, a PHI node is inserted to merge the two values.
-The LLVM IR that we want for this example looks like this:</p>
-
-<div class="doc_code">
-<pre>
-@G = weak global i32 0 ; type of @G is i32*
-@H = weak global i32 0 ; type of @H is i32*
-
-define i32 @test(i1 %Condition) {
-entry:
- br i1 %Condition, label %cond_true, label %cond_false
-
-cond_true:
- %X.0 = load i32* @G
- br label %cond_next
-
-cond_false:
- %X.1 = load i32* @H
- br label %cond_next
-
-cond_next:
- %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
- ret i32 %X.2
-}
-</pre>
-</div>
-
-<p>In this example, the loads from the G and H global variables are explicit in
-the LLVM IR, and they live in the then/else branches of the if statement
-(cond_true/cond_false). In order to merge the incoming values, the X.2 phi node
-in the cond_next block selects the right value to use based on where control
-flow is coming from: if control flow comes from the cond_false block, X.2 gets
-the value of X.1. Alternatively, if control flow comes from cond_true, it gets
-the value of X.0. The intent of this chapter is not to explain the details of
-SSA form. For more information, see one of the many <a
-href="http://en.wikipedia.org/wiki/Static_single_assignment_form">online
-references</a>.</p>
-
-<p>The question for this article is "who places the phi nodes when lowering
-assignments to mutable variables?". The issue here is that LLVM
-<em>requires</em> that its IR be in SSA form: there is no "non-ssa" mode for it.
-However, SSA construction requires non-trivial algorithms and data structures,
-so it is inconvenient and wasteful for every front-end to have to reproduce this
-logic.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="memory">Memory in LLVM</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>The 'trick' here is that while LLVM does require all register values to be
-in SSA form, it does not require (or permit) memory objects to be in SSA form.
-In the example above, note that the loads from G and H are direct accesses to
-G and H: they are not renamed or versioned. This differs from some other
-compiler systems, which do try to version memory objects. In LLVM, instead of
-encoding dataflow analysis of memory into the LLVM IR, it is handled with <a
-href="../WritingAnLLVMPass.html">Analysis Passes</a> which are computed on
-demand.</p>
-
-<p>
-With this in mind, the high-level idea is that we want to make a stack variable
-(which lives in memory, because it is on the stack) for each mutable object in
-a function. To take advantage of this trick, we need to talk about how LLVM
-represents stack variables.
-</p>
-
-<p>In LLVM, all memory accesses are explicit with load/store instructions, and
-it is carefully designed not to have (or need) an "address-of" operator. Notice
-how the type of the @G/@H global variables is actually "i32*" even though the
-variable is defined as "i32". What this means is that @G defines <em>space</em>
-for an i32 in the global data area, but its <em>name</em> actually refers to the
-address for that space. Stack variables work the same way, except that instead of
-being declared with global variable definitions, they are declared with the
-<a href="../LangRef.html#i_alloca">LLVM alloca instruction</a>:</p>
-
-<div class="doc_code">
-<pre>
-define i32 @example() {
-entry:
- %X = alloca i32 ; type of %X is i32*.
- ...
- %tmp = load i32* %X ; load the stack value %X from the stack.
- %tmp2 = add i32 %tmp, 1 ; increment it
- store i32 %tmp2, i32* %X ; store it back
- ...
-</pre>
-</div>
-
-<p>This code shows an example of how you can declare and manipulate a stack
-variable in the LLVM IR. Stack memory allocated with the alloca instruction is
-fully general: you can pass the address of the stack slot to functions, you can
-store it in other variables, etc. In our example above, we could rewrite the
-example to use the alloca technique to avoid using a PHI node:</p>
-
-<div class="doc_code">
-<pre>
-@G = weak global i32 0 ; type of @G is i32*
-@H = weak global i32 0 ; type of @H is i32*
-
-define i32 @test(i1 %Condition) {
-entry:
- %X = alloca i32 ; type of %X is i32*.
- br i1 %Condition, label %cond_true, label %cond_false
-
-cond_true:
- %X.0 = load i32* @G
- store i32 %X.0, i32* %X ; Update X
- br label %cond_next
-
-cond_false:
- %X.1 = load i32* @H
- store i32 %X.1, i32* %X ; Update X
- br label %cond_next
-
-cond_next:
- %X.2 = load i32* %X ; Read X
- ret i32 %X.2
-}
-</pre>
-</div>
-
-<p>With this, we have discovered a way to handle arbitrary mutable variables
-without the need to create Phi nodes at all:</p>
-
-<ol>
-<li>Each mutable variable becomes a stack allocation.</li>
-<li>Each read of the variable becomes a load from the stack.</li>
-<li>Each update of the variable becomes a store to the stack.</li>
-<li>Taking the address of a variable just uses the stack address directly.</li>
-</ol>
-
-<p>While this solution has solved our immediate problem, it introduced another
-one: we have now apparently introduced a lot of stack traffic for very simple
-and common operations, a major performance problem. Fortunately for us, the
-LLVM optimizer has a highly-tuned optimization pass named "mem2reg" that handles
-this case, promoting allocas like this into SSA registers, inserting Phi nodes
-as appropriate. If you run this example through the pass, for example, you'll
-get:</p>
-
-<div class="doc_code">
-<pre>
-$ <b>llvm-as &lt; example.ll | opt -mem2reg | llvm-dis</b>
-@G = weak global i32 0
-@H = weak global i32 0
-
-define i32 @test(i1 %Condition) {
-entry:
- br i1 %Condition, label %cond_true, label %cond_false
-
-cond_true:
- %X.0 = load i32* @G
- br label %cond_next
-
-cond_false:
- %X.1 = load i32* @H
- br label %cond_next
-
-cond_next:
- %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
- ret i32 %X.01
-}
-</pre>
-</div>
-
-<p>The mem2reg pass implements the standard "iterated dominance frontier"
-algorithm for constructing SSA form and has a number of optimizations that speed
-up (very common) degenerate cases. The mem2reg optimization pass is the answer
-to dealing with mutable variables, and we highly recommend that you depend on
-it. Note that mem2reg only works on variables in certain circumstances:</p>
-
-<ol>
-<li>mem2reg is alloca-driven: it looks for allocas and if it can handle them, it
-promotes them. It does not apply to global variables or heap allocations.</li>
-
-<li>mem2reg only looks for alloca instructions in the entry block of the
-function. Being in the entry block guarantees that the alloca is only executed
-once, which makes analysis simpler.</li>
-
-<li>mem2reg only promotes allocas whose uses are direct loads and stores. If
-the address of the stack object is passed to a function, or if any funny pointer
-arithmetic is involved, the alloca will not be promoted.</li>
-
-<li>mem2reg only works on allocas of <a
-href="../LangRef.html#t_classifications">first class</a>
-values (such as pointers, scalars and vectors), and only if the array size
-of the allocation is 1 (or missing in the .ll file). mem2reg is not capable of
-promoting structs or arrays to registers. Note that the "scalarrepl" pass is
-more powerful and can promote structs, "unions", and arrays in many cases.</li>
-
-</ol>
-
-<p>
-All of these properties are easy to satisfy for most imperative languages, and
-we'll illustrate it below with Kaleidoscope. The final question you may be
-asking is: should I bother with this nonsense for my front-end? Wouldn't it be
-better if I just did SSA construction directly, avoiding use of the mem2reg
-optimization pass? In short, we strongly recommend that you use this technique
-for building SSA form, unless there is an extremely good reason not to. Using
-this technique is:</p>
-
-<ul>
-<li>Proven and well tested: llvm-gcc and clang both use this technique for local
-mutable variables. As such, the most common clients of LLVM are using this to
-handle a bulk of their variables. You can be sure that bugs are found fast and
-fixed early.</li>
-
-<li>Extremely Fast: mem2reg has a number of special cases that make it fast in
-common cases as well as fully general. For example, it has fast-paths for
-variables that are only used in a single block, variables that only have one
-assignment point, good heuristics to avoid insertion of unneeded phi nodes, etc.
-</li>
-
-<li>Needed for debug info generation: <a href="../SourceLevelDebugging.html">
-Debug information in LLVM</a> relies on having the address of the variable
-exposed so that debug info can be attached to it. This technique dovetails
-very naturally with this style of debug info.</li>
-</ul>
-
-<p>If nothing else, this makes it much easier to get your front-end up and
-running, and is very simple to implement. Lets extend Kaleidoscope with mutable
-variables now!
-</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="kalvars">Mutable Variables in
-Kaleidoscope</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Now that we know the sort of problem we want to tackle, lets see what this
-looks like in the context of our little Kaleidoscope language. We're going to
-add two features:</p>
-
-<ol>
-<li>The ability to mutate variables with the '=' operator.</li>
-<li>The ability to define new variables.</li>
-</ol>
-
-<p>While the first item is really what this is about, we only have variables
-for incoming arguments as well as for induction variables, and redefining those only
-goes so far :). Also, the ability to define new variables is a
-useful thing regardless of whether you will be mutating them. Here's a
-motivating example that shows how we could use these:</p>
-
-<div class="doc_code">
-<pre>
-# Define ':' for sequencing: as a low-precedence operator that ignores operands
-# and just returns the RHS.
-def binary : 1 (x y) y;
-
-# Recursive fib, we could do this before.
-def fib(x)
- if (x &lt; 3) then
- 1
- else
- fib(x-1)+fib(x-2);
-
-# Iterative fib.
-def fibi(x)
- <b>var a = 1, b = 1, c in</b>
- (for i = 3, i &lt; x in
- <b>c = a + b</b> :
- <b>a = b</b> :
- <b>b = c</b>) :
- b;
-
-# Call it.
-fibi(10);
-</pre>
-</div>
-
-<p>
-In order to mutate variables, we have to change our existing variables to use
-the "alloca trick". Once we have that, we'll add our new operator, then extend
-Kaleidoscope to support new variable definitions.
-</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="adjustments">Adjusting Existing Variables for
-Mutation</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>
-The symbol table in Kaleidoscope is managed at code generation time by the
-'<tt>named_values</tt>' map. This map currently keeps track of the LLVM
-"Value*" that holds the double value for the named variable. In order to
-support mutation, we need to change this slightly, so that it
-<tt>named_values</tt> holds the <em>memory location</em> of the variable in
-question. Note that this change is a refactoring: it changes the structure of
-the code, but does not (by itself) change the behavior of the compiler. All of
-these changes are isolated in the Kaleidoscope code generator.</p>
-
-<p>
-At this point in Kaleidoscope's development, it only supports variables for two
-things: incoming arguments to functions and the induction variable of 'for'
-loops. For consistency, we'll allow mutation of these variables in addition to
-other user-defined variables. This means that these will both need memory
-locations.
-</p>
-
-<p>To start our transformation of Kaleidoscope, we'll change the
-<tt>named_values</tt> map so that it maps to AllocaInst* instead of Value*.
-Once we do this, the C++ compiler will tell us what parts of the code we need to
-update:</p>
-
-<p><b>Note:</b> the ocaml bindings currently model both <tt>Value*</tt>s and
-<tt>AllocInst*</tt>s as <tt>Llvm.llvalue</tt>s, but this may change in the
-future to be more type safe.</p>
-
-<div class="doc_code">
-<pre>
-let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10
-</pre>
-</div>
-
-<p>Also, since we will need to create these alloca's, we'll use a helper
-function that ensures that the allocas are created in the entry block of the
-function:</p>
-
-<div class="doc_code">
-<pre>
-(* Create an alloca instruction in the entry block of the function. This
- * is used for mutable variables etc. *)
-let create_entry_block_alloca the_function var_name =
- let builder = builder_at (instr_begin (entry_block the_function)) in
- build_alloca double_type var_name builder
-</pre>
-</div>
-
-<p>This funny looking code creates an <tt>Llvm.llbuilder</tt> object that is
-pointing at the first instruction of the entry block. It then creates an alloca
-with the expected name and returns it. Because all values in Kaleidoscope are
-doubles, there is no need to pass in a type to use.</p>
-
-<p>With this in place, the first functionality change we want to make is to
-variable references. In our new scheme, variables live on the stack, so code
-generating a reference to them actually needs to produce a load from the stack
-slot:</p>
-
-<div class="doc_code">
-<pre>
-let rec codegen_expr = function
- ...
- | Ast.Variable name -&gt;
- let v = try Hashtbl.find named_values name with
- | Not_found -&gt; raise (Error "unknown variable name")
- in
- <b>(* Load the value. *)
- build_load v name builder</b>
-</pre>
-</div>
-
-<p>As you can see, this is pretty straightforward. Now we need to update the
-things that define the variables to set up the alloca. We'll start with
-<tt>codegen_expr Ast.For ...</tt> (see the <a href="#code">full code listing</a>
-for the unabridged code):</p>
-
-<div class="doc_code">
-<pre>
- | Ast.For (var_name, start, end_, step, body) -&gt;
- let the_function = block_parent (insertion_block builder) in
-
- (* Create an alloca for the variable in the entry block. *)
- <b>let alloca = create_entry_block_alloca the_function var_name in</b>
-
- (* Emit the start code first, without 'variable' in scope. *)
- let start_val = codegen_expr start in
-
- <b>(* Store the value into the alloca. *)
- ignore(build_store start_val alloca builder);</b>
-
- ...
-
- (* Within the loop, the variable is defined equal to the PHI node. If it
- * shadows an existing variable, we have to restore it, so save it
- * now. *)
- let old_val =
- try Some (Hashtbl.find named_values var_name) with Not_found -&gt; None
- in
- <b>Hashtbl.add named_values var_name alloca;</b>
-
- ...
-
- (* Compute the end condition. *)
- let end_cond = codegen_expr end_ in
-
- <b>(* Reload, increment, and restore the alloca. This handles the case where
- * the body of the loop mutates the variable. *)
- let cur_var = build_load alloca var_name builder in
- let next_var = build_add cur_var step_val "nextvar" builder in
- ignore(build_store next_var alloca builder);</b>
- ...
-</pre>
-</div>
-
-<p>This code is virtually identical to the code <a
-href="OCamlLangImpl5.html#forcodegen">before we allowed mutable variables</a>.
-The big difference is that we no longer have to construct a PHI node, and we use
-load/store to access the variable as needed.</p>
-
-<p>To support mutable argument variables, we need to also make allocas for them.
-The code for this is also pretty simple:</p>
-
-<div class="doc_code">
-<pre>
-(* Create an alloca for each argument and register the argument in the symbol
- * table so that references to it will succeed. *)
-let create_argument_allocas the_function proto =
- let args = match proto with
- | Ast.Prototype (_, args) | Ast.BinOpPrototype (_, args, _) -&gt; args
- in
- Array.iteri (fun i ai -&gt;
- let var_name = args.(i) in
- (* Create an alloca for this variable. *)
- let alloca = create_entry_block_alloca the_function var_name in
-
- (* Store the initial value into the alloca. *)
- ignore(build_store ai alloca builder);
-
- (* Add arguments to variable symbol table. *)
- Hashtbl.add named_values var_name alloca;
- ) (params the_function)
-</pre>
-</div>
-
-<p>For each argument, we make an alloca, store the input value to the function
-into the alloca, and register the alloca as the memory location for the
-argument. This method gets invoked by <tt>Codegen.codegen_func</tt> right after
-it sets up the entry block for the function.</p>
-
-<p>The final missing piece is adding the mem2reg pass, which allows us to get
-good codegen once again:</p>
-
-<div class="doc_code">
-<pre>
-let main () =
- ...
- let the_fpm = PassManager.create_function Codegen.the_module in
-
- (* Set up the optimizer pipeline. Start with registering info about how the
- * target lays out data structures. *)
- TargetData.add (ExecutionEngine.target_data the_execution_engine) the_fpm;
-
- <b>(* Promote allocas to registers. *)
- add_memory_to_register_promotion the_fpm;</b>
-
- (* Do simple "peephole" optimizations and bit-twiddling optzn. *)
- add_instruction_combining the_fpm;
-
- (* reassociate expressions. *)
- add_reassociation the_fpm;
-</pre>
-</div>
-
-<p>It is interesting to see what the code looks like before and after the
-mem2reg optimization runs. For example, this is the before/after code for our
-recursive fib function. Before the optimization:</p>
-
-<div class="doc_code">
-<pre>
-define double @fib(double %x) {
-entry:
- <b>%x1 = alloca double
- store double %x, double* %x1
- %x2 = load double* %x1</b>
- %cmptmp = fcmp ult double %x2, 3.000000e+00
- %booltmp = uitofp i1 %cmptmp to double
- %ifcond = fcmp one double %booltmp, 0.000000e+00
- br i1 %ifcond, label %then, label %else
-
-then: ; preds = %entry
- br label %ifcont
-
-else: ; preds = %entry
- <b>%x3 = load double* %x1</b>
- %subtmp = fsub double %x3, 1.000000e+00
- %calltmp = call double @fib( double %subtmp )
- <b>%x4 = load double* %x1</b>
- %subtmp5 = fsub double %x4, 2.000000e+00
- %calltmp6 = call double @fib( double %subtmp5 )
- %addtmp = fadd double %calltmp, %calltmp6
- br label %ifcont
-
-ifcont: ; preds = %else, %then
- %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
- ret double %iftmp
-}
-</pre>
-</div>
-
-<p>Here there is only one variable (x, the input argument) but you can still
-see the extremely simple-minded code generation strategy we are using. In the
-entry block, an alloca is created, and the initial input value is stored into
-it. Each reference to the variable does a reload from the stack. Also, note
-that we didn't modify the if/then/else expression, so it still inserts a PHI
-node. While we could make an alloca for it, it is actually easier to create a
-PHI node for it, so we still just make the PHI.</p>
-
-<p>Here is the code after the mem2reg pass runs:</p>
-
-<div class="doc_code">
-<pre>
-define double @fib(double %x) {
-entry:
- %cmptmp = fcmp ult double <b>%x</b>, 3.000000e+00
- %booltmp = uitofp i1 %cmptmp to double
- %ifcond = fcmp one double %booltmp, 0.000000e+00
- br i1 %ifcond, label %then, label %else
-
-then:
- br label %ifcont
-
-else:
- %subtmp = fsub double <b>%x</b>, 1.000000e+00
- %calltmp = call double @fib( double %subtmp )
- %subtmp5 = fsub double <b>%x</b>, 2.000000e+00
- %calltmp6 = call double @fib( double %subtmp5 )
- %addtmp = fadd double %calltmp, %calltmp6
- br label %ifcont
-
-ifcont: ; preds = %else, %then
- %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
- ret double %iftmp
-}
-</pre>
-</div>
-
-<p>This is a trivial case for mem2reg, since there are no redefinitions of the
-variable. The point of showing this is to calm your tension about inserting
-such blatent inefficiencies :).</p>
-
-<p>After the rest of the optimizers run, we get:</p>
-
-<div class="doc_code">
-<pre>
-define double @fib(double %x) {
-entry:
- %cmptmp = fcmp ult double %x, 3.000000e+00
- %booltmp = uitofp i1 %cmptmp to double
- %ifcond = fcmp ueq double %booltmp, 0.000000e+00
- br i1 %ifcond, label %else, label %ifcont
-
-else:
- %subtmp = fsub double %x, 1.000000e+00
- %calltmp = call double @fib( double %subtmp )
- %subtmp5 = fsub double %x, 2.000000e+00
- %calltmp6 = call double @fib( double %subtmp5 )
- %addtmp = fadd double %calltmp, %calltmp6
- ret double %addtmp
-
-ifcont:
- ret double 1.000000e+00
-}
-</pre>
-</div>
-
-<p>Here we see that the simplifycfg pass decided to clone the return instruction
-into the end of the 'else' block. This allowed it to eliminate some branches
-and the PHI node.</p>
-
-<p>Now that all symbol table references are updated to use stack variables,
-we'll add the assignment operator.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="assignment">New Assignment Operator</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>With our current framework, adding a new assignment operator is really
-simple. We will parse it just like any other binary operator, but handle it
-internally (instead of allowing the user to define it). The first step is to
-set a precedence:</p>
-
-<div class="doc_code">
-<pre>
-let main () =
- (* Install standard binary operators.
- * 1 is the lowest precedence. *)
- <b>Hashtbl.add Parser.binop_precedence '=' 2;</b>
- Hashtbl.add Parser.binop_precedence '&lt;' 10;
- Hashtbl.add Parser.binop_precedence '+' 20;
- Hashtbl.add Parser.binop_precedence '-' 20;
- ...
-</pre>
-</div>
-
-<p>Now that the parser knows the precedence of the binary operator, it takes
-care of all the parsing and AST generation. We just need to implement codegen
-for the assignment operator. This looks like:</p>
-
-<div class="doc_code">
-<pre>
-let rec codegen_expr = function
- begin match op with
- | '=' -&gt;
- (* Special case '=' because we don't want to emit the LHS as an
- * expression. *)
- let name =
- match lhs with
- | Ast.Variable name -&gt; name
- | _ -&gt; raise (Error "destination of '=' must be a variable")
- in
-</pre>
-</div>
-
-<p>Unlike the rest of the binary operators, our assignment operator doesn't
-follow the "emit LHS, emit RHS, do computation" model. As such, it is handled
-as a special case before the other binary operators are handled. The other
-strange thing is that it requires the LHS to be a variable. It is invalid to
-have "(x+1) = expr" - only things like "x = expr" are allowed.
-</p>
-
-
-<div class="doc_code">
-<pre>
- (* Codegen the rhs. *)
- let val_ = codegen_expr rhs in
-
- (* Lookup the name. *)
- let variable = try Hashtbl.find named_values name with
- | Not_found -&gt; raise (Error "unknown variable name")
- in
- ignore(build_store val_ variable builder);
- val_
- | _ -&gt;
- ...
-</pre>
-</div>
-
-<p>Once we have the variable, codegen'ing the assignment is straightforward:
-we emit the RHS of the assignment, create a store, and return the computed
-value. Returning a value allows for chained assignments like "X = (Y = Z)".</p>
-
-<p>Now that we have an assignment operator, we can mutate loop variables and
-arguments. For example, we can now run code like this:</p>
-
-<div class="doc_code">
-<pre>
-# Function to print a double.
-extern printd(x);
-
-# Define ':' for sequencing: as a low-precedence operator that ignores operands
-# and just returns the RHS.
-def binary : 1 (x y) y;
-
-def test(x)
- printd(x) :
- x = 4 :
- printd(x);
-
-test(123);
-</pre>
-</div>
-
-<p>When run, this example prints "123" and then "4", showing that we did
-actually mutate the value! Okay, we have now officially implemented our goal:
-getting this to work requires SSA construction in the general case. However,
-to be really useful, we want the ability to define our own local variables, lets
-add this next!
-</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="localvars">User-defined Local
-Variables</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Adding var/in is just like any other other extensions we made to
-Kaleidoscope: we extend the lexer, the parser, the AST and the code generator.
-The first step for adding our new 'var/in' construct is to extend the lexer.
-As before, this is pretty trivial, the code looks like this:</p>
-
-<div class="doc_code">
-<pre>
-type token =
- ...
- <b>(* var definition *)
- | Var</b>
-
-...
-
-and lex_ident buffer = parser
- ...
- | "in" -&gt; [&lt; 'Token.In; stream &gt;]
- | "binary" -&gt; [&lt; 'Token.Binary; stream &gt;]
- | "unary" -&gt; [&lt; 'Token.Unary; stream &gt;]
- <b>| "var" -&gt; [&lt; 'Token.Var; stream &gt;]</b>
- ...
-</pre>
-</div>
-
-<p>The next step is to define the AST node that we will construct. For var/in,
-it looks like this:</p>
-
-<div class="doc_code">
-<pre>
-type expr =
- ...
- (* variant for var/in. *)
- | Var of (string * expr option) array * expr
- ...
-</pre>
-</div>
-
-<p>var/in allows a list of names to be defined all at once, and each name can
-optionally have an initializer value. As such, we capture this information in
-the VarNames vector. Also, var/in has a body, this body is allowed to access
-the variables defined by the var/in.</p>
-
-<p>With this in place, we can define the parser pieces. The first thing we do
-is add it as a primary expression:</p>
-
-<div class="doc_code">
-<pre>
-(* primary
- * ::= identifier
- * ::= numberexpr
- * ::= parenexpr
- * ::= ifexpr
- * ::= forexpr
- <b>* ::= varexpr</b> *)
-let rec parse_primary = parser
- ...
- <b>(* varexpr
- * ::= 'var' identifier ('=' expression?
- * (',' identifier ('=' expression)?)* 'in' expression *)
- | [&lt; 'Token.Var;
- (* At least one variable name is required. *)
- 'Token.Ident id ?? "expected identifier after var";
- init=parse_var_init;
- var_names=parse_var_names [(id, init)];
- (* At this point, we have to have 'in'. *)
- 'Token.In ?? "expected 'in' keyword after 'var'";
- body=parse_expr &gt;] -&gt;
- Ast.Var (Array.of_list (List.rev var_names), body)</b>
-
-...
-
-and parse_var_init = parser
- (* read in the optional initializer. *)
- | [&lt; 'Token.Kwd '='; e=parse_expr &gt;] -&gt; Some e
- | [&lt; &gt;] -&gt; None
-
-and parse_var_names accumulator = parser
- | [&lt; 'Token.Kwd ',';
- 'Token.Ident id ?? "expected identifier list after var";
- init=parse_var_init;
- e=parse_var_names ((id, init) :: accumulator) &gt;] -&gt; e
- | [&lt; &gt;] -&gt; accumulator
-</pre>
-</div>
-
-<p>Now that we can parse and represent the code, we need to support emission of
-LLVM IR for it. This code starts out with:</p>
-
-<div class="doc_code">
-<pre>
-let rec codegen_expr = function
- ...
- | Ast.Var (var_names, body)
- let old_bindings = ref [] in
-
- let the_function = block_parent (insertion_block builder) in
-
- (* Register all variables and emit their initializer. *)
- Array.iter (fun (var_name, init) -&gt;
-</pre>
-</div>
-
-<p>Basically it loops over all the variables, installing them one at a time.
-For each variable we put into the symbol table, we remember the previous value
-that we replace in OldBindings.</p>
-
-<div class="doc_code">
-<pre>
- (* Emit the initializer before adding the variable to scope, this
- * prevents the initializer from referencing the variable itself, and
- * permits stuff like this:
- * var a = 1 in
- * var a = a in ... # refers to outer 'a'. *)
- let init_val =
- match init with
- | Some init -&gt; codegen_expr init
- (* If not specified, use 0.0. *)
- | None -&gt; const_float double_type 0.0
- in
-
- let alloca = create_entry_block_alloca the_function var_name in
- ignore(build_store init_val alloca builder);
-
- (* Remember the old variable binding so that we can restore the binding
- * when we unrecurse. *)
-
- begin
- try
- let old_value = Hashtbl.find named_values var_name in
- old_bindings := (var_name, old_value) :: !old_bindings;
- with Not_found &gt; ()
- end;
-
- (* Remember this binding. *)
- Hashtbl.add named_values var_name alloca;
- ) var_names;
-</pre>
-</div>
-
-<p>There are more comments here than code. The basic idea is that we emit the
-initializer, create the alloca, then update the symbol table to point to it.
-Once all the variables are installed in the symbol table, we evaluate the body
-of the var/in expression:</p>
-
-<div class="doc_code">
-<pre>
- (* Codegen the body, now that all vars are in scope. *)
- let body_val = codegen_expr body in
-</pre>
-</div>
-
-<p>Finally, before returning, we restore the previous variable bindings:</p>
-
-<div class="doc_code">
-<pre>
- (* Pop all our variables from scope. *)
- List.iter (fun (var_name, old_value) -&gt;
- Hashtbl.add named_values var_name old_value
- ) !old_bindings;
-
- (* Return the body computation. *)
- body_val
-</pre>
-</div>
-
-<p>The end result of all of this is that we get properly scoped variable
-definitions, and we even (trivially) allow mutation of them :).</p>
-
-<p>With this, we completed what we set out to do. Our nice iterative fib
-example from the intro compiles and runs just fine. The mem2reg pass optimizes
-all of our stack variables into SSA registers, inserting PHI nodes where needed,
-and our front-end remains simple: no "iterated dominance frontier" computation
-anywhere in sight.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="code">Full Code Listing</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>
-Here is the complete code listing for our running example, enhanced with mutable
-variables and var/in support. To build this example, use:
-</p>
-
-<div class="doc_code">
-<pre>
-# Compile
-ocamlbuild toy.byte
-# Run
-./toy.byte
-</pre>
-</div>
-
-<p>Here is the code:</p>
-
-<dl>
-<dt>_tags:</dt>
-<dd class="doc_code">
-<pre>
-&lt;{lexer,parser}.ml&gt;: use_camlp4, pp(camlp4of)
-&lt;*.{byte,native}&gt;: g++, use_llvm, use_llvm_analysis
-&lt;*.{byte,native}&gt;: use_llvm_executionengine, use_llvm_target
-&lt;*.{byte,native}&gt;: use_llvm_scalar_opts, use_bindings
-</pre>
-</dd>
-
-<dt>myocamlbuild.ml:</dt>
-<dd class="doc_code">
-<pre>
-open Ocamlbuild_plugin;;
-
-ocaml_lib ~extern:true "llvm";;
-ocaml_lib ~extern:true "llvm_analysis";;
-ocaml_lib ~extern:true "llvm_executionengine";;
-ocaml_lib ~extern:true "llvm_target";;
-ocaml_lib ~extern:true "llvm_scalar_opts";;
-
-flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"; A"-cclib"; A"-rdynamic"]);;
-dep ["link"; "ocaml"; "use_bindings"] ["bindings.o"];;
-</pre>
-</dd>
-
-<dt>token.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Lexer Tokens
- *===----------------------------------------------------------------------===*)
-
-(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of
- * these others for known things. *)
-type token =
- (* commands *)
- | Def | Extern
-
- (* primary *)
- | Ident of string | Number of float
-
- (* unknown *)
- | Kwd of char
-
- (* control *)
- | If | Then | Else
- | For | In
-
- (* operators *)
- | Binary | Unary
-
- (* var definition *)
- | Var
-</pre>
-</dd>
-
-<dt>lexer.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Lexer
- *===----------------------------------------------------------------------===*)
-
-let rec lex = parser
- (* Skip any whitespace. *)
- | [&lt; ' (' ' | '\n' | '\r' | '\t'); stream &gt;] -&gt; lex stream
-
- (* identifier: [a-zA-Z][a-zA-Z0-9] *)
- | [&lt; ' ('A' .. 'Z' | 'a' .. 'z' as c); stream &gt;] -&gt;
- let buffer = Buffer.create 1 in
- Buffer.add_char buffer c;
- lex_ident buffer stream
-
- (* number: [0-9.]+ *)
- | [&lt; ' ('0' .. '9' as c); stream &gt;] -&gt;
- let buffer = Buffer.create 1 in
- Buffer.add_char buffer c;
- lex_number buffer stream
-
- (* Comment until end of line. *)
- | [&lt; ' ('#'); stream &gt;] -&gt;
- lex_comment stream
-
- (* Otherwise, just return the character as its ascii value. *)
- | [&lt; 'c; stream &gt;] -&gt;
- [&lt; 'Token.Kwd c; lex stream &gt;]
-
- (* end of stream. *)
- | [&lt; &gt;] -&gt; [&lt; &gt;]
-
-and lex_number buffer = parser
- | [&lt; ' ('0' .. '9' | '.' as c); stream &gt;] -&gt;
- Buffer.add_char buffer c;
- lex_number buffer stream
- | [&lt; stream=lex &gt;] -&gt;
- [&lt; 'Token.Number (float_of_string (Buffer.contents buffer)); stream &gt;]
-
-and lex_ident buffer = parser
- | [&lt; ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream &gt;] -&gt;
- Buffer.add_char buffer c;
- lex_ident buffer stream
- | [&lt; stream=lex &gt;] -&gt;
- match Buffer.contents buffer with
- | "def" -&gt; [&lt; 'Token.Def; stream &gt;]
- | "extern" -&gt; [&lt; 'Token.Extern; stream &gt;]
- | "if" -&gt; [&lt; 'Token.If; stream &gt;]
- | "then" -&gt; [&lt; 'Token.Then; stream &gt;]
- | "else" -&gt; [&lt; 'Token.Else; stream &gt;]
- | "for" -&gt; [&lt; 'Token.For; stream &gt;]
- | "in" -&gt; [&lt; 'Token.In; stream &gt;]
- | "binary" -&gt; [&lt; 'Token.Binary; stream &gt;]
- | "unary" -&gt; [&lt; 'Token.Unary; stream &gt;]
- | "var" -&gt; [&lt; 'Token.Var; stream &gt;]
- | id -&gt; [&lt; 'Token.Ident id; stream &gt;]
-
-and lex_comment = parser
- | [&lt; ' ('\n'); stream=lex &gt;] -&gt; stream
- | [&lt; 'c; e=lex_comment &gt;] -&gt; e
- | [&lt; &gt;] -&gt; [&lt; &gt;]
-</pre>
-</dd>
-
-<dt>ast.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Abstract Syntax Tree (aka Parse Tree)
- *===----------------------------------------------------------------------===*)
-
-(* expr - Base type for all expression nodes. *)
-type expr =
- (* variant for numeric literals like "1.0". *)
- | Number of float
-
- (* variant for referencing a variable, like "a". *)
- | Variable of string
-
- (* variant for a unary operator. *)
- | Unary of char * expr
-
- (* variant for a binary operator. *)
- | Binary of char * expr * expr
-
- (* variant for function calls. *)
- | Call of string * expr array
-
- (* variant for if/then/else. *)
- | If of expr * expr * expr
-
- (* variant for for/in. *)
- | For of string * expr * expr * expr option * expr
-
- (* variant for var/in. *)
- | Var of (string * expr option) array * expr
-
-(* proto - This type represents the "prototype" for a function, which captures
- * its name, and its argument names (thus implicitly the number of arguments the
- * function takes). *)
-type proto =
- | Prototype of string * string array
- | BinOpPrototype of string * string array * int
-
-(* func - This type represents a function definition itself. *)
-type func = Function of proto * expr
-</pre>
-</dd>
-
-<dt>parser.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===---------------------------------------------------------------------===
- * Parser
- *===---------------------------------------------------------------------===*)
-
-(* binop_precedence - This holds the precedence for each binary operator that is
- * defined *)
-let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10
-
-(* precedence - Get the precedence of the pending binary operator token. *)
-let precedence c = try Hashtbl.find binop_precedence c with Not_found -&gt; -1
-
-(* primary
- * ::= identifier
- * ::= numberexpr
- * ::= parenexpr
- * ::= ifexpr
- * ::= forexpr
- * ::= varexpr *)
-let rec parse_primary = parser
- (* numberexpr ::= number *)
- | [&lt; 'Token.Number n &gt;] -&gt; Ast.Number n
-
- (* parenexpr ::= '(' expression ')' *)
- | [&lt; 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" &gt;] -&gt; e
-
- (* identifierexpr
- * ::= identifier
- * ::= identifier '(' argumentexpr ')' *)
- | [&lt; 'Token.Ident id; stream &gt;] -&gt;
- let rec parse_args accumulator = parser
- | [&lt; e=parse_expr; stream &gt;] -&gt;
- begin parser
- | [&lt; 'Token.Kwd ','; e=parse_args (e :: accumulator) &gt;] -&gt; e
- | [&lt; &gt;] -&gt; e :: accumulator
- end stream
- | [&lt; &gt;] -&gt; accumulator
- in
- let rec parse_ident id = parser
- (* Call. *)
- | [&lt; 'Token.Kwd '(';
- args=parse_args [];
- 'Token.Kwd ')' ?? "expected ')'"&gt;] -&gt;
- Ast.Call (id, Array.of_list (List.rev args))
-
- (* Simple variable ref. *)
- | [&lt; &gt;] -&gt; Ast.Variable id
- in
- parse_ident id stream
-
- (* ifexpr ::= 'if' expr 'then' expr 'else' expr *)
- | [&lt; 'Token.If; c=parse_expr;
- 'Token.Then ?? "expected 'then'"; t=parse_expr;
- 'Token.Else ?? "expected 'else'"; e=parse_expr &gt;] -&gt;
- Ast.If (c, t, e)
-
- (* forexpr
- ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression *)
- | [&lt; 'Token.For;
- 'Token.Ident id ?? "expected identifier after for";
- 'Token.Kwd '=' ?? "expected '=' after for";
- stream &gt;] -&gt;
- begin parser
- | [&lt;
- start=parse_expr;
- 'Token.Kwd ',' ?? "expected ',' after for";
- end_=parse_expr;
- stream &gt;] -&gt;
- let step =
- begin parser
- | [&lt; 'Token.Kwd ','; step=parse_expr &gt;] -&gt; Some step
- | [&lt; &gt;] -&gt; None
- end stream
- in
- begin parser
- | [&lt; 'Token.In; body=parse_expr &gt;] -&gt;
- Ast.For (id, start, end_, step, body)
- | [&lt; &gt;] -&gt;
- raise (Stream.Error "expected 'in' after for")
- end stream
- | [&lt; &gt;] -&gt;
- raise (Stream.Error "expected '=' after for")
- end stream
-
- (* varexpr
- * ::= 'var' identifier ('=' expression?
- * (',' identifier ('=' expression)?)* 'in' expression *)
- | [&lt; 'Token.Var;
- (* At least one variable name is required. *)
- 'Token.Ident id ?? "expected identifier after var";
- init=parse_var_init;
- var_names=parse_var_names [(id, init)];
- (* At this point, we have to have 'in'. *)
- 'Token.In ?? "expected 'in' keyword after 'var'";
- body=parse_expr &gt;] -&gt;
- Ast.Var (Array.of_list (List.rev var_names), body)
-
- | [&lt; &gt;] -&gt; raise (Stream.Error "unknown token when expecting an expression.")
-
-(* unary
- * ::= primary
- * ::= '!' unary *)
-and parse_unary = parser
- (* If this is a unary operator, read it. *)
- | [&lt; 'Token.Kwd op when op != '(' &amp;&amp; op != ')'; operand=parse_expr &gt;] -&gt;
- Ast.Unary (op, operand)
-
- (* If the current token is not an operator, it must be a primary expr. *)
- | [&lt; stream &gt;] -&gt; parse_primary stream
-
-(* binoprhs
- * ::= ('+' primary)* *)
-and parse_bin_rhs expr_prec lhs stream =
- match Stream.peek stream with
- (* If this is a binop, find its precedence. *)
- | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -&gt;
- let token_prec = precedence c in
-
- (* If this is a binop that binds at least as tightly as the current binop,
- * consume it, otherwise we are done. *)
- if token_prec &lt; expr_prec then lhs else begin
- (* Eat the binop. *)
- Stream.junk stream;
-
- (* Parse the primary expression after the binary operator. *)
- let rhs = parse_unary stream in
-
- (* Okay, we know this is a binop. *)
- let rhs =
- match Stream.peek stream with
- | Some (Token.Kwd c2) -&gt;
- (* If BinOp binds less tightly with rhs than the operator after
- * rhs, let the pending operator take rhs as its lhs. *)
- let next_prec = precedence c2 in
- if token_prec &lt; next_prec
- then parse_bin_rhs (token_prec + 1) rhs stream
- else rhs
- | _ -&gt; rhs
- in
-
- (* Merge lhs/rhs. *)
- let lhs = Ast.Binary (c, lhs, rhs) in
- parse_bin_rhs expr_prec lhs stream
- end
- | _ -&gt; lhs
-
-and parse_var_init = parser
- (* read in the optional initializer. *)
- | [&lt; 'Token.Kwd '='; e=parse_expr &gt;] -&gt; Some e
- | [&lt; &gt;] -&gt; None
-
-and parse_var_names accumulator = parser
- | [&lt; 'Token.Kwd ',';
- 'Token.Ident id ?? "expected identifier list after var";
- init=parse_var_init;
- e=parse_var_names ((id, init) :: accumulator) &gt;] -&gt; e
- | [&lt; &gt;] -&gt; accumulator
-
-(* expression
- * ::= primary binoprhs *)
-and parse_expr = parser
- | [&lt; lhs=parse_unary; stream &gt;] -&gt; parse_bin_rhs 0 lhs stream
-
-(* prototype
- * ::= id '(' id* ')'
- * ::= binary LETTER number? (id, id)
- * ::= unary LETTER number? (id) *)
-let parse_prototype =
- let rec parse_args accumulator = parser
- | [&lt; 'Token.Ident id; e=parse_args (id::accumulator) &gt;] -&gt; e
- | [&lt; &gt;] -&gt; accumulator
- in
- let parse_operator = parser
- | [&lt; 'Token.Unary &gt;] -&gt; "unary", 1
- | [&lt; 'Token.Binary &gt;] -&gt; "binary", 2
- in
- let parse_binary_precedence = parser
- | [&lt; 'Token.Number n &gt;] -&gt; int_of_float n
- | [&lt; &gt;] -&gt; 30
- in
- parser
- | [&lt; 'Token.Ident id;
- 'Token.Kwd '(' ?? "expected '(' in prototype";
- args=parse_args [];
- 'Token.Kwd ')' ?? "expected ')' in prototype" &gt;] -&gt;
- (* success. *)
- Ast.Prototype (id, Array.of_list (List.rev args))
- | [&lt; (prefix, kind)=parse_operator;
- 'Token.Kwd op ?? "expected an operator";
- (* Read the precedence if present. *)
- binary_precedence=parse_binary_precedence;
- 'Token.Kwd '(' ?? "expected '(' in prototype";
- args=parse_args [];
- 'Token.Kwd ')' ?? "expected ')' in prototype" &gt;] -&gt;
- let name = prefix ^ (String.make 1 op) in
- let args = Array.of_list (List.rev args) in
-
- (* Verify right number of arguments for operator. *)
- if Array.length args != kind
- then raise (Stream.Error "invalid number of operands for operator")
- else
- if kind == 1 then
- Ast.Prototype (name, args)
- else
- Ast.BinOpPrototype (name, args, binary_precedence)
- | [&lt; &gt;] -&gt;
- raise (Stream.Error "expected function name in prototype")
-
-(* definition ::= 'def' prototype expression *)
-let parse_definition = parser
- | [&lt; 'Token.Def; p=parse_prototype; e=parse_expr &gt;] -&gt;
- Ast.Function (p, e)
-
-(* toplevelexpr ::= expression *)
-let parse_toplevel = parser
- | [&lt; e=parse_expr &gt;] -&gt;
- (* Make an anonymous proto. *)
- Ast.Function (Ast.Prototype ("", [||]), e)
-
-(* external ::= 'extern' prototype *)
-let parse_extern = parser
- | [&lt; 'Token.Extern; e=parse_prototype &gt;] -&gt; e
-</pre>
-</dd>
-
-<dt>codegen.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Code Generation
- *===----------------------------------------------------------------------===*)
-
-open Llvm
-
-exception Error of string
-
-let context = global_context ()
-let the_module = create_module context "my cool jit"
-let builder = builder context
-let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10
-let double_type = double_type context
-
-(* Create an alloca instruction in the entry block of the function. This
- * is used for mutable variables etc. *)
-let create_entry_block_alloca the_function var_name =
- let builder = builder_at context (instr_begin (entry_block the_function)) in
- build_alloca double_type var_name builder
-
-let rec codegen_expr = function
- | Ast.Number n -&gt; const_float double_type n
- | Ast.Variable name -&gt;
- let v = try Hashtbl.find named_values name with
- | Not_found -&gt; raise (Error "unknown variable name")
- in
- (* Load the value. *)
- build_load v name builder
- | Ast.Unary (op, operand) -&gt;
- let operand = codegen_expr operand in
- let callee = "unary" ^ (String.make 1 op) in
- let callee =
- match lookup_function callee the_module with
- | Some callee -&gt; callee
- | None -&gt; raise (Error "unknown unary operator")
- in
- build_call callee [|operand|] "unop" builder
- | Ast.Binary (op, lhs, rhs) -&gt;
- begin match op with
- | '=' -&gt;
- (* Special case '=' because we don't want to emit the LHS as an
- * expression. *)
- let name =
- match lhs with
- | Ast.Variable name -&gt; name
- | _ -&gt; raise (Error "destination of '=' must be a variable")
- in
-
- (* Codegen the rhs. *)
- let val_ = codegen_expr rhs in
-
- (* Lookup the name. *)
- let variable = try Hashtbl.find named_values name with
- | Not_found -&gt; raise (Error "unknown variable name")
- in
- ignore(build_store val_ variable builder);
- val_
- | _ -&gt;
- let lhs_val = codegen_expr lhs in
- let rhs_val = codegen_expr rhs in
- begin
- match op with
- | '+' -&gt; build_add lhs_val rhs_val "addtmp" builder
- | '-' -&gt; build_sub lhs_val rhs_val "subtmp" builder
- | '*' -&gt; build_mul lhs_val rhs_val "multmp" builder
- | '&lt;' -&gt;
- (* Convert bool 0/1 to double 0.0 or 1.0 *)
- let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in
- build_uitofp i double_type "booltmp" builder
- | _ -&gt;
- (* If it wasn't a builtin binary operator, it must be a user defined
- * one. Emit a call to it. *)
- let callee = "binary" ^ (String.make 1 op) in
- let callee =
- match lookup_function callee the_module with
- | Some callee -&gt; callee
- | None -&gt; raise (Error "binary operator not found!")
- in
- build_call callee [|lhs_val; rhs_val|] "binop" builder
- end
- end
- | Ast.Call (callee, args) -&gt;
- (* Look up the name in the module table. *)
- let callee =
- match lookup_function callee the_module with
- | Some callee -&gt; callee
- | None -&gt; raise (Error "unknown function referenced")
- in
- let params = params callee in
-
- (* If argument mismatch error. *)
- if Array.length params == Array.length args then () else
- raise (Error "incorrect # arguments passed");
- let args = Array.map codegen_expr args in
- build_call callee args "calltmp" builder
- | Ast.If (cond, then_, else_) -&gt;
- let cond = codegen_expr cond in
-
- (* Convert condition to a bool by comparing equal to 0.0 *)
- let zero = const_float double_type 0.0 in
- let cond_val = build_fcmp Fcmp.One cond zero "ifcond" builder in
-
- (* Grab the first block so that we might later add the conditional branch
- * to it at the end of the function. *)
- let start_bb = insertion_block builder in
- let the_function = block_parent start_bb in
-
- let then_bb = append_block context "then" the_function in
-
- (* Emit 'then' value. *)
- position_at_end then_bb builder;
- let then_val = codegen_expr then_ in
-
- (* Codegen of 'then' can change the current block, update then_bb for the
- * phi. We create a new name because one is used for the phi node, and the
- * other is used for the conditional branch. *)
- let new_then_bb = insertion_block builder in
-
- (* Emit 'else' value. *)
- let else_bb = append_block context "else" the_function in
- position_at_end else_bb builder;
- let else_val = codegen_expr else_ in
-
- (* Codegen of 'else' can change the current block, update else_bb for the
- * phi. *)
- let new_else_bb = insertion_block builder in
-
- (* Emit merge block. *)
- let merge_bb = append_block context "ifcont" the_function in
- position_at_end merge_bb builder;
- let incoming = [(then_val, new_then_bb); (else_val, new_else_bb)] in
- let phi = build_phi incoming "iftmp" builder in
-
- (* Return to the start block to add the conditional branch. *)
- position_at_end start_bb builder;
- ignore (build_cond_br cond_val then_bb else_bb builder);
-
- (* Set a unconditional branch at the end of the 'then' block and the
- * 'else' block to the 'merge' block. *)
- position_at_end new_then_bb builder; ignore (build_br merge_bb builder);
- position_at_end new_else_bb builder; ignore (build_br merge_bb builder);
-
- (* Finally, set the builder to the end of the merge block. *)
- position_at_end merge_bb builder;
-
- phi
- | Ast.For (var_name, start, end_, step, body) -&gt;
- (* Output this as:
- * var = alloca double
- * ...
- * start = startexpr
- * store start -&gt; var
- * goto loop
- * loop:
- * ...
- * bodyexpr
- * ...
- * loopend:
- * step = stepexpr
- * endcond = endexpr
- *
- * curvar = load var
- * nextvar = curvar + step
- * store nextvar -&gt; var
- * br endcond, loop, endloop
- * outloop: *)
-
- let the_function = block_parent (insertion_block builder) in
-
- (* Create an alloca for the variable in the entry block. *)
- let alloca = create_entry_block_alloca the_function var_name in
-
- (* Emit the start code first, without 'variable' in scope. *)
- let start_val = codegen_expr start in
-
- (* Store the value into the alloca. *)
- ignore(build_store start_val alloca builder);
-
- (* Make the new basic block for the loop header, inserting after current
- * block. *)
- let loop_bb = append_block context "loop" the_function in
-
- (* Insert an explicit fall through from the current block to the
- * loop_bb. *)
- ignore (build_br loop_bb builder);
-
- (* Start insertion in loop_bb. *)
- position_at_end loop_bb builder;
-
- (* Within the loop, the variable is defined equal to the PHI node. If it
- * shadows an existing variable, we have to restore it, so save it
- * now. *)
- let old_val =
- try Some (Hashtbl.find named_values var_name) with Not_found -&gt; None
- in
- Hashtbl.add named_values var_name alloca;
-
- (* Emit the body of the loop. This, like any other expr, can change the
- * current BB. Note that we ignore the value computed by the body, but
- * don't allow an error *)
- ignore (codegen_expr body);
-
- (* Emit the step value. *)
- let step_val =
- match step with
- | Some step -&gt; codegen_expr step
- (* If not specified, use 1.0. *)
- | None -&gt; const_float double_type 1.0
- in
-
- (* Compute the end condition. *)
- let end_cond = codegen_expr end_ in
-
- (* Reload, increment, and restore the alloca. This handles the case where
- * the body of the loop mutates the variable. *)
- let cur_var = build_load alloca var_name builder in
- let next_var = build_add cur_var step_val "nextvar" builder in
- ignore(build_store next_var alloca builder);
-
- (* Convert condition to a bool by comparing equal to 0.0. *)
- let zero = const_float double_type 0.0 in
- let end_cond = build_fcmp Fcmp.One end_cond zero "loopcond" builder in
-
- (* Create the "after loop" block and insert it. *)
- let after_bb = append_block context "afterloop" the_function in
-
- (* Insert the conditional branch into the end of loop_end_bb. *)
- ignore (build_cond_br end_cond loop_bb after_bb builder);
-
- (* Any new code will be inserted in after_bb. *)
- position_at_end after_bb builder;
-
- (* Restore the unshadowed variable. *)
- begin match old_val with
- | Some old_val -&gt; Hashtbl.add named_values var_name old_val
- | None -&gt; ()
- end;
-
- (* for expr always returns 0.0. *)
- const_null double_type
- | Ast.Var (var_names, body) -&gt;
- let old_bindings = ref [] in
-
- let the_function = block_parent (insertion_block builder) in
-
- (* Register all variables and emit their initializer. *)
- Array.iter (fun (var_name, init) -&gt;
- (* Emit the initializer before adding the variable to scope, this
- * prevents the initializer from referencing the variable itself, and
- * permits stuff like this:
- * var a = 1 in
- * var a = a in ... # refers to outer 'a'. *)
- let init_val =
- match init with
- | Some init -&gt; codegen_expr init
- (* If not specified, use 0.0. *)
- | None -&gt; const_float double_type 0.0
- in
-
- let alloca = create_entry_block_alloca the_function var_name in
- ignore(build_store init_val alloca builder);
-
- (* Remember the old variable binding so that we can restore the binding
- * when we unrecurse. *)
- begin
- try
- let old_value = Hashtbl.find named_values var_name in
- old_bindings := (var_name, old_value) :: !old_bindings;
- with Not_found -&gt; ()
- end;
-
- (* Remember this binding. *)
- Hashtbl.add named_values var_name alloca;
- ) var_names;
-
- (* Codegen the body, now that all vars are in scope. *)
- let body_val = codegen_expr body in
-
- (* Pop all our variables from scope. *)
- List.iter (fun (var_name, old_value) -&gt;
- Hashtbl.add named_values var_name old_value
- ) !old_bindings;
-
- (* Return the body computation. *)
- body_val
-
-let codegen_proto = function
- | Ast.Prototype (name, args) | Ast.BinOpPrototype (name, args, _) -&gt;
- (* Make the function type: double(double,double) etc. *)
- let doubles = Array.make (Array.length args) double_type in
- let ft = function_type double_type doubles in
- let f =
- match lookup_function name the_module with
- | None -&gt; declare_function name ft the_module
-
- (* If 'f' conflicted, there was already something named 'name'. If it
- * has a body, don't allow redefinition or reextern. *)
- | Some f -&gt;
- (* If 'f' already has a body, reject this. *)
- if block_begin f &lt;&gt; At_end f then
- raise (Error "redefinition of function");
-
- (* If 'f' took a different number of arguments, reject. *)
- if element_type (type_of f) &lt;&gt; ft then
- raise (Error "redefinition of function with different # args");
- f
- in
-
- (* Set names for all arguments. *)
- Array.iteri (fun i a -&gt;
- let n = args.(i) in
- set_value_name n a;
- Hashtbl.add named_values n a;
- ) (params f);
- f
-
-(* Create an alloca for each argument and register the argument in the symbol
- * table so that references to it will succeed. *)
-let create_argument_allocas the_function proto =
- let args = match proto with
- | Ast.Prototype (_, args) | Ast.BinOpPrototype (_, args, _) -&gt; args
- in
- Array.iteri (fun i ai -&gt;
- let var_name = args.(i) in
- (* Create an alloca for this variable. *)
- let alloca = create_entry_block_alloca the_function var_name in
-
- (* Store the initial value into the alloca. *)
- ignore(build_store ai alloca builder);
-
- (* Add arguments to variable symbol table. *)
- Hashtbl.add named_values var_name alloca;
- ) (params the_function)
-
-let codegen_func the_fpm = function
- | Ast.Function (proto, body) -&gt;
- Hashtbl.clear named_values;
- let the_function = codegen_proto proto in
-
- (* If this is an operator, install it. *)
- begin match proto with
- | Ast.BinOpPrototype (name, args, prec) -&gt;
- let op = name.[String.length name - 1] in
- Hashtbl.add Parser.binop_precedence op prec;
- | _ -&gt; ()
- end;
-
- (* Create a new basic block to start insertion into. *)
- let bb = append_block context "entry" the_function in
- position_at_end bb builder;
-
- try
- (* Add all arguments to the symbol table and create their allocas. *)
- create_argument_allocas the_function proto;
-
- let ret_val = codegen_expr body in
-
- (* Finish off the function. *)
- let _ = build_ret ret_val builder in
-
- (* Validate the generated code, checking for consistency. *)
- Llvm_analysis.assert_valid_function the_function;
-
- (* Optimize the function. *)
- let _ = PassManager.run_function the_function the_fpm in
-
- the_function
- with e -&gt;
- delete_function the_function;
- raise e
-</pre>
-</dd>
-
-<dt>toplevel.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Top-Level parsing and JIT Driver
- *===----------------------------------------------------------------------===*)
-
-open Llvm
-open Llvm_executionengine
-
-(* top ::= definition | external | expression | ';' *)
-let rec main_loop the_fpm the_execution_engine stream =
- match Stream.peek stream with
- | None -&gt; ()
-
- (* ignore top-level semicolons. *)
- | Some (Token.Kwd ';') -&gt;
- Stream.junk stream;
- main_loop the_fpm the_execution_engine stream
-
- | Some token -&gt;
- begin
- try match token with
- | Token.Def -&gt;
- let e = Parser.parse_definition stream in
- print_endline "parsed a function definition.";
- dump_value (Codegen.codegen_func the_fpm e);
- | Token.Extern -&gt;
- let e = Parser.parse_extern stream in
- print_endline "parsed an extern.";
- dump_value (Codegen.codegen_proto e);
- | _ -&gt;
- (* Evaluate a top-level expression into an anonymous function. *)
- let e = Parser.parse_toplevel stream in
- print_endline "parsed a top-level expr";
- let the_function = Codegen.codegen_func the_fpm e in
- dump_value the_function;
-
- (* JIT the function, returning a function pointer. *)
- let result = ExecutionEngine.run_function the_function [||]
- the_execution_engine in
-
- print_string "Evaluated to ";
- print_float (GenericValue.as_float Codegen.double_type result);
- print_newline ();
- with Stream.Error s | Codegen.Error s -&gt;
- (* Skip token for error recovery. *)
- Stream.junk stream;
- print_endline s;
- end;
- print_string "ready&gt; "; flush stdout;
- main_loop the_fpm the_execution_engine stream
-</pre>
-</dd>
-
-<dt>toy.ml:</dt>
-<dd class="doc_code">
-<pre>
-(*===----------------------------------------------------------------------===
- * Main driver code.
- *===----------------------------------------------------------------------===*)
-
-open Llvm
-open Llvm_executionengine
-open Llvm_target
-open Llvm_scalar_opts
-
-let main () =
- ignore (initialize_native_target ());
-
- (* Install standard binary operators.
- * 1 is the lowest precedence. *)
- Hashtbl.add Parser.binop_precedence '=' 2;
- Hashtbl.add Parser.binop_precedence '&lt;' 10;
- Hashtbl.add Parser.binop_precedence '+' 20;
- Hashtbl.add Parser.binop_precedence '-' 20;
- Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *)
-
- (* Prime the first token. *)
- print_string "ready&gt; "; flush stdout;
- let stream = Lexer.lex (Stream.of_channel stdin) in
-
- (* Create the JIT. *)
- let the_execution_engine = ExecutionEngine.create Codegen.the_module in
- let the_fpm = PassManager.create_function Codegen.the_module in
-
- (* Set up the optimizer pipeline. Start with registering info about how the
- * target lays out data structures. *)
- TargetData.add (ExecutionEngine.target_data the_execution_engine) the_fpm;
-
- (* Promote allocas to registers. *)
- add_memory_to_register_promotion the_fpm;
-
- (* Do simple "peephole" optimizations and bit-twiddling optzn. *)
- add_instruction_combination the_fpm;
-
- (* reassociate expressions. *)
- add_reassociation the_fpm;
-
- (* Eliminate Common SubExpressions. *)
- add_gvn the_fpm;
-
- (* Simplify the control flow graph (deleting unreachable blocks, etc). *)
- add_cfg_simplification the_fpm;
-
- ignore (PassManager.initialize the_fpm);
-
- (* Run the main "interpreter loop" now. *)
- Toplevel.main_loop the_fpm the_execution_engine stream;
-
- (* Print out all the generated code. *)
- dump_module Codegen.the_module
-;;
-
-main ()
-</pre>
-</dd>
-
-<dt>bindings.c</dt>
-<dd class="doc_code">
-<pre>
-#include &lt;stdio.h&gt;
-
-/* putchard - putchar that takes a double and returns 0. */
-extern double putchard(double X) {
- putchar((char)X);
- return 0;
-}
-
-/* printd - printf that takes a double prints it as "%f\n", returning 0. */
-extern double printd(double X) {
- printf("%f\n", X);
- return 0;
-}
-</pre>
-</dd>
-</dl>
-
-<a href="LangImpl8.html">Next: Conclusion and other useful LLVM tidbits</a>
-</div>
-
-<!-- *********************************************************************** -->
-<hr>
-<address>
- <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
- src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
- <a href="http://validator.w3.org/check/referer"><img
- src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
-
- <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
- <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
- <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a><br>
- Last modified: $Date$
-</address>
-</body>
-</html>
diff --git a/docs/tutorial/index.html b/docs/tutorial/index.html
deleted file mode 100644
index 250b533..0000000
--- a/docs/tutorial/index.html
+++ /dev/null
@@ -1,48 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
- "http://www.w3.org/TR/html4/strict.dtd">
-<html>
-<head>
- <title>LLVM Tutorial: Table of Contents</title>
- <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
- <meta name="author" content="Owen Anderson">
- <meta name="description"
- content="LLVM Tutorial: Table of Contents.">
- <link rel="stylesheet" href="../llvm.css" type="text/css">
-</head>
-
-<body>
-
-<div class="doc_title"> LLVM Tutorial: Table of Contents </div>
-
-<ol>
- <li>Kaleidoscope: Implementing a Language with LLVM
- <ol>
- <li><a href="LangImpl1.html">Tutorial Introduction and the Lexer</a></li>
- <li><a href="LangImpl2.html">Implementing a Parser and AST</a></li>
- <li><a href="LangImpl3.html">Implementing Code Generation to LLVM IR</a></li>
- <li><a href="LangImpl4.html">Adding JIT and Optimizer Support</a></li>
- <li><a href="LangImpl5.html">Extending the language: control flow</a></li>
- <li><a href="LangImpl6.html">Extending the language: user-defined operators</a></li>
- <li><a href="LangImpl7.html">Extending the language: mutable variables / SSA construction</a></li>
- <li><a href="LangImpl8.html">Conclusion and other useful LLVM tidbits</a></li>
- </ol></li>
- <li>Kaleidoscope: Implementing a Language with LLVM in Objective Caml
- <ol>
- <li><a href="OCamlLangImpl1.html">Tutorial Introduction and the Lexer</a></li>
- <li><a href="OCamlLangImpl2.html">Implementing a Parser and AST</a></li>
- <li><a href="OCamlLangImpl3.html">Implementing Code Generation to LLVM IR</a></li>
- <li><a href="OCamlLangImpl4.html">Adding JIT and Optimizer Support</a></li>
- <li><a href="OCamlLangImpl5.html">Extending the language: control flow</a></li>
- <li><a href="OCamlLangImpl6.html">Extending the language: user-defined operators</a></li>
- <li><a href="OCamlLangImpl7.html">Extending the language: mutable variables / SSA construction</a></li>
- <li><a href="LangImpl8.html">Conclusion and other useful LLVM tidbits</a></li>
- </ol></li>
- <li>Advanced Topics
- <ol>
- <li><a href="http://llvm.org/pubs/2004-09-22-LCPCLLVMTutorial.html">Writing
- an Optimization for LLVM</a></li>
- </ol></li>
-</ol>
-
-</body>
-</html>