| Field | Value | Date |
|---|---|---|
| author | mike-m <mikem.llvm@gmail.com> | 2010-05-06 23:45:43 +0000 |
| committer | mike-m <mikem.llvm@gmail.com> | 2010-05-06 23:45:43 +0000 |
| commit | 68cb31901c590cabceee6e6356d62c84142114cb | |
| tree | 6444bddc975b662fbe47d63cd98a7b776a407c1a /docs/tutorial | |
| parent | c26ae5ab7e2d65b67c97524e66f50ce86445dec7 | |

Overhauled llvm/clang docs builds. Closes PR6613.
NOTE: the 2nd part of this changeset, for cfe trunk, is to follow.
*** PRE-PATCH ISSUES ADDRESSED
- clang API docs fail to build from objdir
- clang/llvm api docs collide in install PREFIX/
- clang/llvm main docs collide in install
- clang/llvm main docs are full of hard-coded destination
assumptions and use absolute root paths in static HTML files;
namely, the CommandGuide tools hard-code a website destination
for cross-references, and some HTML cross-references assume
website root paths
*** IMPROVEMENTS
- bumped Doxygen from 1.4.x -> 1.6.3
- split llvm/clang docs into 'main' and 'api' (doxygen) build trees
- provide consistent, reliable doc builds for both main+api docs
- support build vs. install vs. website use cases
- support objdir builds
- document targets with 'make help'
- correct clean and uninstall operations
- use recursive dir delete only where absolutely necessary
- added the call function fn.RMRF, which safeguards against a botched 'rm -rf':
if any target (or any evaluated variable) attempts to remove a directory
that matches a hard-coded 'safelist', a verbose error is printed and make
stops with an error.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@103213 91177308-0d34-0410-b5e6-96231b3b80d8
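The fn.RMRF safeguard described in the commit message amounts to a guard check before any recursive delete. The C++ sketch below illustrates that idea only. The real fn.RMRF is a make function whose source is not shown here. The safelist entries and the function names `normalizeDir` and `mayRemoveDir` are hypothetical.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical safelist: directories a recursive delete must never remove.
// These entries are illustrative assumptions, not the list used by fn.RMRF.
static const std::vector<std::string> SafeList = {
    "/", "/usr", "/usr/local", "/opt", "/home"};

// Drop trailing slashes so "/usr/" and "/usr" compare equal; a lone "/" is kept.
static std::string normalizeDir(std::string Dir) {
  while (Dir.size() > 1 && Dir.back() == '/')
    Dir.pop_back();
  return Dir;
}

// Returns true if Dir may be passed to 'rm -rf'. If Dir is empty or matches
// the safelist, prints a verbose error and returns false; a caller would then
// stop, mirroring the "error-stop" behavior described in the commit message.
static bool mayRemoveDir(const std::string &Dir) {
  const std::string D = normalizeDir(Dir);
  bool Blocked = D.empty();
  for (const std::string &Safe : SafeList)
    if (D == Safe)
      Blocked = true;
  if (Blocked)
    std::fprintf(stderr,
                 "error: refusing to run 'rm -rf %s': directory is protected\n",
                 Dir.c_str());
  return !Blocked;
}
```

With this sketch, `mayRemoveDir("/usr/")` prints the error and returns false. The same happens for `"/"`, which is what a path such as `$(EMPTY_VAR)/` expands to when the variable is unset. A build subdirectory such as the hypothetical `docs/tutorial/_build` is allowed. A makefile version would more likely express the same check with GNU make functions such as `$(filter ...)` and `$(error ...)`.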
Diffstat (limited to 'docs/tutorial')

| Mode | File | Changes |
|---|---|---|
| -rw-r--r-- | docs/tutorial/LangImpl1.html | 348 |
| -rw-r--r-- | docs/tutorial/LangImpl2.html | 1233 |
| -rw-r--r-- | docs/tutorial/LangImpl3.html | 1269 |
| -rw-r--r-- | docs/tutorial/LangImpl4.html | 1132 |
| -rw-r--r-- | docs/tutorial/LangImpl5-cfg.png | bin 38586 -> 0 bytes |
| -rw-r--r-- | docs/tutorial/LangImpl5.html | 1777 |
| -rw-r--r-- | docs/tutorial/LangImpl6.html | 1814 |
| -rw-r--r-- | docs/tutorial/LangImpl7.html | 2164 |
| -rw-r--r-- | docs/tutorial/LangImpl8.html | 365 |
| -rw-r--r-- | docs/tutorial/Makefile | 28 |
| -rw-r--r-- | docs/tutorial/OCamlLangImpl1.html | 365 |
| -rw-r--r-- | docs/tutorial/OCamlLangImpl2.html | 1045 |
| -rw-r--r-- | docs/tutorial/OCamlLangImpl3.html | 1093 |
| -rw-r--r-- | docs/tutorial/OCamlLangImpl4.html | 1029 |
| -rw-r--r-- | docs/tutorial/OCamlLangImpl5.html | 1569 |
| -rw-r--r-- | docs/tutorial/OCamlLangImpl6.html | 1574 |
| -rw-r--r-- | docs/tutorial/OCamlLangImpl7.html | 1907 |
| -rw-r--r-- | docs/tutorial/index.html | 48 |

18 files changed, 0 insertions, 18760 deletions
diff --git a/docs/tutorial/LangImpl1.html b/docs/tutorial/LangImpl1.html deleted file mode 100644 index 66843db..0000000 --- a/docs/tutorial/LangImpl1.html +++ /dev/null @@ -1,348 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> - -<html> -<head> - <title>Kaleidoscope: Tutorial Introduction and the Lexer</title> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <meta name="author" content="Chris Lattner"> - <link rel="stylesheet" href="../llvm.css" type="text/css"> -</head> - -<body> - -<div class="doc_title">Kaleidoscope: Tutorial Introduction and the Lexer</div> - -<ul> -<li><a href="index.html">Up to Tutorial Index</a></li> -<li>Chapter 1 - <ol> - <li><a href="#intro">Tutorial Introduction</a></li> - <li><a href="#language">The Basic Language</a></li> - <li><a href="#lexer">The Lexer</a></li> - </ol> -</li> -<li><a href="LangImpl2.html">Chapter 2</a>: Implementing a Parser and AST</li> -</ul> - -<div class="doc_author"> - <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p> -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="intro">Tutorial Introduction</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Welcome to the "Implementing a language with LLVM" tutorial. This tutorial -runs through the implementation of a simple language, showing how fun and -easy it can be. This tutorial will get you up and started as well as help to -build a framework you can extend to other languages. The code in this tutorial -can also be used as a playground to hack on other LLVM specific things. -</p> - -<p> -The goal of this tutorial is to progressively unveil our language, describing -how it is built up over time. 
This will let us cover a fairly broad range of -language design and LLVM-specific usage issues, showing and explaining the code -for it all along the way, without overwhelming you with tons of details up -front.</p> - -<p>It is useful to point out ahead of time that this tutorial is really about -teaching compiler techniques and LLVM specifically, <em>not</em> about teaching -modern and sane software engineering principles. In practice, this means that -we'll take a number of shortcuts to simplify the exposition. For example, the -code leaks memory, uses global variables all over the place, doesn't use nice -design patterns like <a -href="http://en.wikipedia.org/wiki/Visitor_pattern">visitors</a>, etc... but it -is very simple. If you dig in and use the code as a basis for future projects, -fixing these deficiencies shouldn't be hard.</p> - -<p>I've tried to put this tutorial together in a way that makes chapters easy to -skip over if you are already familiar with or are uninterested in the various -pieces. The structure of the tutorial is: -</p> - -<ul> -<li><b><a href="#language">Chapter #1</a>: Introduction to the Kaleidoscope -language, and the definition of its Lexer</b> - This shows where we are going -and the basic functionality that we want it to do. In order to make this -tutorial maximally understandable and hackable, we choose to implement -everything in C++ instead of using lexer and parser generators. LLVM obviously -works just fine with such tools, feel free to use one if you prefer.</li> -<li><b><a href="LangImpl2.html">Chapter #2</a>: Implementing a Parser and -AST</b> - With the lexer in place, we can talk about parsing techniques and -basic AST construction. This tutorial describes recursive descent parsing and -operator precedence parsing. Nothing in Chapters 1 or 2 is LLVM-specific, -the code doesn't even link in LLVM at this point. 
:)</li> -<li><b><a href="LangImpl3.html">Chapter #3</a>: Code generation to LLVM IR</b> - -With the AST ready, we can show off how easy generation of LLVM IR really -is.</li> -<li><b><a href="LangImpl4.html">Chapter #4</a>: Adding JIT and Optimizer -Support</b> - Because a lot of people are interested in using LLVM as a JIT, -we'll dive right into it and show you the 3 lines it takes to add JIT support. -LLVM is also useful in many other ways, but this is one simple and "sexy" way -to shows off its power. :)</li> -<li><b><a href="LangImpl5.html">Chapter #5</a>: Extending the Language: Control -Flow</b> - With the language up and running, we show how to extend it with -control flow operations (if/then/else and a 'for' loop). This gives us a chance -to talk about simple SSA construction and control flow.</li> -<li><b><a href="LangImpl6.html">Chapter #6</a>: Extending the Language: -User-defined Operators</b> - This is a silly but fun chapter that talks about -extending the language to let the user program define their own arbitrary -unary and binary operators (with assignable precedence!). This lets us build a -significant piece of the "language" as library routines.</li> -<li><b><a href="LangImpl7.html">Chapter #7</a>: Extending the Language: Mutable -Variables</b> - This chapter talks about adding user-defined local variables -along with an assignment operator. 
The interesting part about this is how -easy and trivial it is to construct SSA form in LLVM: no, LLVM does <em>not</em> -require your front-end to construct SSA form!</li> -<li><b><a href="LangImpl8.html">Chapter #8</a>: Conclusion and other useful LLVM -tidbits</b> - This chapter wraps up the series by talking about potential -ways to extend the language, but also includes a bunch of pointers to info about -"special topics" like adding garbage collection support, exceptions, debugging, -support for "spaghetti stacks", and a bunch of other tips and tricks.</li> - -</ul> - -<p>By the end of the tutorial, we'll have written a bit less than 700 lines of -non-comment, non-blank, lines of code. With this small amount of code, we'll -have built up a very reasonable compiler for a non-trivial language including -a hand-written lexer, parser, AST, as well as code generation support with a JIT -compiler. While other systems may have interesting "hello world" tutorials, -I think the breadth of this tutorial is a great testament to the strengths of -LLVM and why you should consider it if you're interested in language or compiler -design.</p> - -<p>A note about this tutorial: we expect you to extend the language and play -with it on your own. Take the code and go crazy hacking away at it, compilers -don't need to be scary creatures - it can be a lot of fun to play with -languages!</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="language">The Basic Language</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>This tutorial will be illustrated with a toy language that we'll call -"<a href="http://en.wikipedia.org/wiki/Kaleidoscope">Kaleidoscope</a>" (derived -from "meaning beautiful, form, and view"). -Kaleidoscope is a procedural language that allows you to define functions, use -conditionals, math, etc. 
Over the course of the tutorial, we'll extend -Kaleidoscope to support the if/then/else construct, a for loop, user defined -operators, JIT compilation with a simple command line interface, etc.</p> - -<p>Because we want to keep things simple, the only datatype in Kaleidoscope is a -64-bit floating point type (aka 'double' in C parlance). As such, all values -are implicitly double precision and the language doesn't require type -declarations. This gives the language a very nice and simple syntax. For -example, the following simple example computes <a -href="http://en.wikipedia.org/wiki/Fibonacci_number">Fibonacci numbers:</a></p> - -<div class="doc_code"> -<pre> -# Compute the x'th fibonacci number. -def fib(x) - if x < 3 then - 1 - else - fib(x-1)+fib(x-2) - -# This expression will compute the 40th number. -fib(40) -</pre> -</div> - -<p>We also allow Kaleidoscope to call into standard library functions (the LLVM -JIT makes this completely trivial). This means that you can use the 'extern' -keyword to define a function before you use it (this is also useful for mutually -recursive functions). For example:</p> - -<div class="doc_code"> -<pre> -extern sin(arg); -extern cos(arg); -extern atan2(arg1 arg2); - -atan2(sin(.4), cos(42)) -</pre> -</div> - -<p>A more interesting example is included in Chapter 6 where we write a little -Kaleidoscope application that <a href="LangImpl6.html#example">displays -a Mandelbrot Set</a> at various levels of magnification.</p> - -<p>Lets dive into the implementation of this language!</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="lexer">The Lexer</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>When it comes to implementing a language, the first thing needed is -the ability to process a text file and recognize what it says. 
The traditional -way to do this is to use a "<a -href="http://en.wikipedia.org/wiki/Lexical_analysis">lexer</a>" (aka 'scanner') -to break the input up into "tokens". Each token returned by the lexer includes -a token code and potentially some metadata (e.g. the numeric value of a number). -First, we define the possibilities: -</p> - -<div class="doc_code"> -<pre> -// The lexer returns tokens [0-255] if it is an unknown character, otherwise one -// of these for known things. -enum Token { - tok_eof = -1, - - // commands - tok_def = -2, tok_extern = -3, - - // primary - tok_identifier = -4, tok_number = -5, -}; - -static std::string IdentifierStr; // Filled in if tok_identifier -static double NumVal; // Filled in if tok_number -</pre> -</div> - -<p>Each token returned by our lexer will either be one of the Token enum values -or it will be an 'unknown' character like '+', which is returned as its ASCII -value. If the current token is an identifier, the <tt>IdentifierStr</tt> -global variable holds the name of the identifier. If the current token is a -numeric literal (like 1.0), <tt>NumVal</tt> holds its value. Note that we use -global variables for simplicity, this is not the best choice for a real language -implementation :). -</p> - -<p>The actual implementation of the lexer is a single function named -<tt>gettok</tt>. The <tt>gettok</tt> function is called to return the next token -from standard input. Its definition starts as:</p> - -<div class="doc_code"> -<pre> -/// gettok - Return the next token from standard input. -static int gettok() { - static int LastChar = ' '; - - // Skip any whitespace. - while (isspace(LastChar)) - LastChar = getchar(); -</pre> -</div> - -<p> -<tt>gettok</tt> works by calling the C <tt>getchar()</tt> function to read -characters one at a time from standard input. It eats them as it recognizes -them and stores the last character read, but not processed, in LastChar. 
The -first thing that it has to do is ignore whitespace between tokens. This is -accomplished with the loop above.</p> - -<p>The next thing <tt>gettok</tt> needs to do is recognize identifiers and -specific keywords like "def". Kaleidoscope does this with this simple loop:</p> - -<div class="doc_code"> -<pre> - if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]* - IdentifierStr = LastChar; - while (isalnum((LastChar = getchar()))) - IdentifierStr += LastChar; - - if (IdentifierStr == "def") return tok_def; - if (IdentifierStr == "extern") return tok_extern; - return tok_identifier; - } -</pre> -</div> - -<p>Note that this code sets the '<tt>IdentifierStr</tt>' global whenever it -lexes an identifier. Also, since language keywords are matched by the same -loop, we handle them here inline. Numeric values are similar:</p> - -<div class="doc_code"> -<pre> - if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+ - std::string NumStr; - do { - NumStr += LastChar; - LastChar = getchar(); - } while (isdigit(LastChar) || LastChar == '.'); - - NumVal = strtod(NumStr.c_str(), 0); - return tok_number; - } -</pre> -</div> - -<p>This is all pretty straight-forward code for processing input. When reading -a numeric value from input, we use the C <tt>strtod</tt> function to convert it -to a numeric value that we store in <tt>NumVal</tt>. Note that this isn't doing -sufficient error checking: it will incorrectly read "1.23.45.67" and handle it as -if you typed in "1.23". Feel free to extend it :). Next we handle comments: -</p> - -<div class="doc_code"> -<pre> - if (LastChar == '#') { - // Comment until end of line. - do LastChar = getchar(); - while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); - - if (LastChar != EOF) - return gettok(); - } -</pre> -</div> - -<p>We handle comments by skipping to the end of the line and then return the -next token. 
Finally, if the input doesn't match one of the above cases, it is -either an operator character like '+' or the end of the file. These are handled -with this code:</p> - -<div class="doc_code"> -<pre> - // Check for end of file. Don't eat the EOF. - if (LastChar == EOF) - return tok_eof; - - // Otherwise, just return the character as its ascii value. - int ThisChar = LastChar; - LastChar = getchar(); - return ThisChar; -} -</pre> -</div> - -<p>With this, we have the complete lexer for the basic Kaleidoscope language -(the <a href="LangImpl2.html#code">full code listing</a> for the Lexer is -available in the <a href="LangImpl2.html">next chapter</a> of the tutorial). -Next we'll <a href="LangImpl2.html">build a simple parser that uses this to -build an Abstract Syntax Tree</a>. When we have that, we'll include a driver -so that you can use the lexer and parser together. -</p> - -<a href="LangImpl2.html">Next: Implementing a Parser and AST</a> -</div> - -<!-- *********************************************************************** --> -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> - - <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> - <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/tutorial/LangImpl2.html b/docs/tutorial/LangImpl2.html deleted file mode 100644 index 9c13b48..0000000 --- a/docs/tutorial/LangImpl2.html +++ /dev/null @@ -1,1233 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> - -<html> -<head> - <title>Kaleidoscope: Implementing a Parser and AST</title> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <meta name="author" content="Chris Lattner"> 
- <link rel="stylesheet" href="../llvm.css" type="text/css"> -</head> - -<body> - -<div class="doc_title">Kaleidoscope: Implementing a Parser and AST</div> - -<ul> -<li><a href="index.html">Up to Tutorial Index</a></li> -<li>Chapter 2 - <ol> - <li><a href="#intro">Chapter 2 Introduction</a></li> - <li><a href="#ast">The Abstract Syntax Tree (AST)</a></li> - <li><a href="#parserbasics">Parser Basics</a></li> - <li><a href="#parserprimexprs">Basic Expression Parsing</a></li> - <li><a href="#parserbinops">Binary Expression Parsing</a></li> - <li><a href="#parsertop">Parsing the Rest</a></li> - <li><a href="#driver">The Driver</a></li> - <li><a href="#conclusions">Conclusions</a></li> - <li><a href="#code">Full Code Listing</a></li> - </ol> -</li> -<li><a href="LangImpl3.html">Chapter 3</a>: Code generation to LLVM IR</li> -</ul> - -<div class="doc_author"> - <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p> -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="intro">Chapter 2 Introduction</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Welcome to Chapter 2 of the "<a href="index.html">Implementing a language -with LLVM</a>" tutorial. This chapter shows you how to use the lexer, built in -<a href="LangImpl1.html">Chapter 1</a>, to build a full <a -href="http://en.wikipedia.org/wiki/Parsing">parser</a> for -our Kaleidoscope language. 
Once we have a parser, we'll define and build an <a -href="http://en.wikipedia.org/wiki/Abstract_syntax_tree">Abstract Syntax -Tree</a> (AST).</p> - -<p>The parser we will build uses a combination of <a -href="http://en.wikipedia.org/wiki/Recursive_descent_parser">Recursive Descent -Parsing</a> and <a href= -"http://en.wikipedia.org/wiki/Operator-precedence_parser">Operator-Precedence -Parsing</a> to parse the Kaleidoscope language (the latter for -binary expressions and the former for everything else). Before we get to -parsing though, lets talk about the output of the parser: the Abstract Syntax -Tree.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="ast">The Abstract Syntax Tree (AST)</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>The AST for a program captures its behavior in such a way that it is easy for -later stages of the compiler (e.g. code generation) to interpret. We basically -want one object for each construct in the language, and the AST should closely -model the language. In Kaleidoscope, we have expressions, a prototype, and a -function object. We'll start with expressions first:</p> - -<div class="doc_code"> -<pre> -/// ExprAST - Base class for all expression nodes. -class ExprAST { -public: - virtual ~ExprAST() {} -}; - -/// NumberExprAST - Expression class for numeric literals like "1.0". -class NumberExprAST : public ExprAST { - double Val; -public: - NumberExprAST(double val) : Val(val) {} -}; -</pre> -</div> - -<p>The code above shows the definition of the base ExprAST class and one -subclass which we use for numeric literals. The important thing to note about -this code is that the NumberExprAST class captures the numeric value of the -literal as an instance variable. 
This allows later phases of the compiler to -know what the stored numeric value is.</p> - -<p>Right now we only create the AST, so there are no useful accessor methods on -them. It would be very easy to add a virtual method to pretty print the code, -for example. Here are the other expression AST node definitions that we'll use -in the basic form of the Kaleidoscope language: -</p> - -<div class="doc_code"> -<pre> -/// VariableExprAST - Expression class for referencing a variable, like "a". -class VariableExprAST : public ExprAST { - std::string Name; -public: - VariableExprAST(const std::string &name) : Name(name) {} -}; - -/// BinaryExprAST - Expression class for a binary operator. -class BinaryExprAST : public ExprAST { - char Op; - ExprAST *LHS, *RHS; -public: - BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) - : Op(op), LHS(lhs), RHS(rhs) {} -}; - -/// CallExprAST - Expression class for function calls. -class CallExprAST : public ExprAST { - std::string Callee; - std::vector<ExprAST*> Args; -public: - CallExprAST(const std::string &callee, std::vector<ExprAST*> &args) - : Callee(callee), Args(args) {} -}; -</pre> -</div> - -<p>This is all (intentionally) rather straight-forward: variables capture the -variable name, binary operators capture their opcode (e.g. '+'), and calls -capture a function name as well as a list of any argument expressions. One thing -that is nice about our AST is that it captures the language features without -talking about the syntax of the language. Note that there is no discussion about -precedence of binary operators, lexical structure, etc.</p> - -<p>For our basic language, these are all of the expression nodes we'll define. -Because it doesn't have conditional control flow, it isn't Turing-complete; -we'll fix that in a later installment. 
The two things we need next are a way -to talk about the interface to a function, and a way to talk about functions -themselves:</p> - -<div class="doc_code"> -<pre> -/// PrototypeAST - This class represents the "prototype" for a function, -/// which captures its name, and its argument names (thus implicitly the number -/// of arguments the function takes). -class PrototypeAST { - std::string Name; - std::vector<std::string> Args; -public: - PrototypeAST(const std::string &name, const std::vector<std::string> &args) - : Name(name), Args(args) {} -}; - -/// FunctionAST - This class represents a function definition itself. -class FunctionAST { - PrototypeAST *Proto; - ExprAST *Body; -public: - FunctionAST(PrototypeAST *proto, ExprAST *body) - : Proto(proto), Body(body) {} -}; -</pre> -</div> - -<p>In Kaleidoscope, functions are typed with just a count of their arguments. -Since all values are double precision floating point, the type of each argument -doesn't need to be stored anywhere. In a more aggressive and realistic -language, the "ExprAST" class would probably have a type field.</p> - -<p>With this scaffolding, we can now talk about parsing expressions and function -bodies in Kaleidoscope.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="parserbasics">Parser Basics</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Now that we have an AST to build, we need to define the parser code to build -it. 
The idea here is that we want to parse something like "x+y" (which is -returned as three tokens by the lexer) into an AST that could be generated with -calls like this:</p> - -<div class="doc_code"> -<pre> - ExprAST *X = new VariableExprAST("x"); - ExprAST *Y = new VariableExprAST("y"); - ExprAST *Result = new BinaryExprAST('+', X, Y); -</pre> -</div> - -<p>In order to do this, we'll start by defining some basic helper routines:</p> - -<div class="doc_code"> -<pre> -/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current -/// token the parser is looking at. getNextToken reads another token from the -/// lexer and updates CurTok with its results. -static int CurTok; -static int getNextToken() { - return CurTok = gettok(); -} -</pre> -</div> - -<p> -This implements a simple token buffer around the lexer. This allows -us to look one token ahead at what the lexer is returning. Every function in -our parser will assume that CurTok is the current token that needs to be -parsed.</p> - -<div class="doc_code"> -<pre> - -/// Error* - These are little helper functions for error handling. -ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;} -PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; } -FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; } -</pre> -</div> - -<p> -The <tt>Error</tt> routines are simple helper routines that our parser will use -to handle errors. The error recovery in our parser will not be the best and -is not particular user-friendly, but it will be enough for our tutorial. 
These -routines make it easier to handle errors in routines that have various return -types: they always return null.</p> - -<p>With these basic helper functions, we can implement the first -piece of our grammar: numeric literals.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="parserprimexprs">Basic Expression - Parsing</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>We start with numeric literals, because they are the simplest to process. -For each production in our grammar, we'll define a function which parses that -production. For numeric literals, we have: -</p> - -<div class="doc_code"> -<pre> -/// numberexpr ::= number -static ExprAST *ParseNumberExpr() { - ExprAST *Result = new NumberExprAST(NumVal); - getNextToken(); // consume the number - return Result; -} -</pre> -</div> - -<p>This routine is very simple: it expects to be called when the current token -is a <tt>tok_number</tt> token. It takes the current number value, creates -a <tt>NumberExprAST</tt> node, advances the lexer to the next token, and finally -returns.</p> - -<p>There are some interesting aspects to this. The most important one is that -this routine eats all of the tokens that correspond to the production and -returns the lexer buffer with the next token (which is not part of the grammar -production) ready to go. This is a fairly standard way to go for recursive -descent parsers. For a better example, the parenthesis operator is defined like -this:</p> - -<div class="doc_code"> -<pre> -/// parenexpr ::= '(' expression ')' -static ExprAST *ParseParenExpr() { - getNextToken(); // eat (. - ExprAST *V = ParseExpression(); - if (!V) return 0; - - if (CurTok != ')') - return Error("expected ')'"); - getNextToken(); // eat ). 
- return V; -} -</pre> -</div> - -<p>This function illustrates a number of interesting things about the -parser:</p> - -<p> -1) It shows how we use the Error routines. When called, this function expects -that the current token is a '(' token, but after parsing the subexpression, it -is possible that there is no ')' waiting. For example, if the user types in -"(4 x" instead of "(4)", the parser should emit an error. Because errors can -occur, the parser needs a way to indicate that they happened: in our parser, we -return null on an error.</p> - -<p>2) Another interesting aspect of this function is that it uses recursion by -calling <tt>ParseExpression</tt> (we will soon see that <tt>ParseExpression</tt> can call -<tt>ParseParenExpr</tt>). This is powerful because it allows us to handle -recursive grammars, and keeps each production very simple. Note that -parentheses do not cause construction of AST nodes themselves. While we could -do it this way, the most important role of parentheses are to guide the parser -and provide grouping. Once the parser constructs the AST, parentheses are not -needed.</p> - -<p>The next simple production is for handling variable references and function -calls:</p> - -<div class="doc_code"> -<pre> -/// identifierexpr -/// ::= identifier -/// ::= identifier '(' expression* ')' -static ExprAST *ParseIdentifierExpr() { - std::string IdName = IdentifierStr; - - getNextToken(); // eat identifier. - - if (CurTok != '(') // Simple variable ref. - return new VariableExprAST(IdName); - - // Call. - getNextToken(); // eat ( - std::vector<ExprAST*> Args; - if (CurTok != ')') { - while (1) { - ExprAST *Arg = ParseExpression(); - if (!Arg) return 0; - Args.push_back(Arg); - - if (CurTok == ')') break; - - if (CurTok != ',') - return Error("Expected ')' or ',' in argument list"); - getNextToken(); - } - } - - // Eat the ')'. 
- getNextToken(); - - return new CallExprAST(IdName, Args); -} -</pre> -</div> - -<p>This routine follows the same style as the other routines. (It expects to be -called if the current token is a <tt>tok_identifier</tt> token). It also has -recursion and error handling. One interesting aspect of this is that it uses -<em>look-ahead</em> to determine if the current identifier is a stand alone -variable reference or if it is a function call expression. It handles this by -checking to see if the token after the identifier is a '(' token, constructing -either a <tt>VariableExprAST</tt> or <tt>CallExprAST</tt> node as appropriate. -</p> - -<p>Now that we have all of our simple expression-parsing logic in place, we can -define a helper function to wrap it together into one entry point. We call this -class of expressions "primary" expressions, for reasons that will become more -clear <a href="LangImpl6.html#unary">later in the tutorial</a>. In order to -parse an arbitrary primary expression, we need to determine what sort of -expression it is:</p> - -<div class="doc_code"> -<pre> -/// primary -/// ::= identifierexpr -/// ::= numberexpr -/// ::= parenexpr -static ExprAST *ParsePrimary() { - switch (CurTok) { - default: return Error("unknown token when expecting an expression"); - case tok_identifier: return ParseIdentifierExpr(); - case tok_number: return ParseNumberExpr(); - case '(': return ParseParenExpr(); - } -} -</pre> -</div> - -<p>Now that you see the definition of this function, it is more obvious why we -can assume the state of CurTok in the various functions. This uses look-ahead -to determine which sort of expression is being inspected, and then parses it -with a function call.</p> - -<p>Now that basic expressions are handled, we need to handle binary expressions. 
-They are a bit more complex.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="parserbinops">Binary Expression - Parsing</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Binary expressions are significantly harder to parse because they are often -ambiguous. For example, when given the string "x+y*z", the parser can choose -to parse it as either "(x+y)*z" or "x+(y*z)". With common definitions from -mathematics, we expect the later parse, because "*" (multiplication) has -higher <em>precedence</em> than "+" (addition).</p> - -<p>There are many ways to handle this, but an elegant and efficient way is to -use <a href= -"http://en.wikipedia.org/wiki/Operator-precedence_parser">Operator-Precedence -Parsing</a>. This parsing technique uses the precedence of binary operators to -guide recursion. To start with, we need a table of precedences:</p> - -<div class="doc_code"> -<pre> -/// BinopPrecedence - This holds the precedence for each binary operator that is -/// defined. -static std::map<char, int> BinopPrecedence; - -/// GetTokPrecedence - Get the precedence of the pending binary operator token. -static int GetTokPrecedence() { - if (!isascii(CurTok)) - return -1; - - // Make sure it's a declared binop. - int TokPrec = BinopPrecedence[CurTok]; - if (TokPrec <= 0) return -1; - return TokPrec; -} - -int main() { - // Install standard binary operators. - // 1 is lowest precedence. - BinopPrecedence['<'] = 10; - BinopPrecedence['+'] = 20; - BinopPrecedence['-'] = 20; - BinopPrecedence['*'] = 40; // highest. - ... -} -</pre> -</div> - -<p>For the basic form of Kaleidoscope, we will only support 4 binary operators -(this can obviously be extended by you, our brave and intrepid reader). The -<tt>GetTokPrecedence</tt> function returns the precedence for the current token, -or -1 if the token is not a binary operator. 
Having a map makes it easy to add -new operators and makes it clear that the algorithm doesn't depend on the -specific operators involved, but it would be easy enough to eliminate the map -and do the comparisons in the <tt>GetTokPrecedence</tt> function. (Or just use -a fixed-size array).</p> - -<p>With the helper above defined, we can now start parsing binary expressions. -The basic idea of operator precedence parsing is to break down an expression -with potentially ambiguous binary operators into pieces. Consider, for example, -the expression "a+b+(c+d)*e*f+g". Operator precedence parsing considers this -as a stream of primary expressions separated by binary operators. As such, -it will first parse the leading primary expression "a", then it will see the -pairs [+, b] [+, (c+d)] [*, e] [*, f] and [+, g]. Note that because parentheses -are primary expressions, the binary expression parser doesn't need to worry -about nested subexpressions like (c+d) at all. -</p> - -<p> -To start, an expression is a primary expression potentially followed by a -sequence of [binop,primaryexpr] pairs:</p> - -<div class="doc_code"> -<pre> -/// expression -/// ::= primary binoprhs -/// -static ExprAST *ParseExpression() { - ExprAST *LHS = ParsePrimary(); - if (!LHS) return 0; - - return ParseBinOpRHS(0, LHS); -} -</pre> -</div> - -<p><tt>ParseBinOpRHS</tt> is the function that parses the sequence of pairs for -us. It takes a precedence and a pointer to an expression for the part that has been -parsed so far. Note that "x" is a perfectly valid expression: As such, "binoprhs" is -allowed to be empty, in which case it returns the expression that is passed into -it. In our example above, the code passes the expression for "a" into -<tt>ParseBinOpRHS</tt> and the current token is "+".</p> - -<p>The precedence value passed into <tt>ParseBinOpRHS</tt> indicates the <em> -minimal operator precedence</em> that the function is allowed to eat.
For -example, if the current pair stream is [+, x] and <tt>ParseBinOpRHS</tt> is -passed in a precedence of 40, it will not consume any tokens (because the -precedence of '+' is only 20). With this in mind, <tt>ParseBinOpRHS</tt> starts -with:</p> - -<div class="doc_code"> -<pre> -/// binoprhs -/// ::= ('+' primary)* -static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { - // If this is a binop, find its precedence. - while (1) { - int TokPrec = GetTokPrecedence(); - - // If this is a binop that binds at least as tightly as the current binop, - // consume it, otherwise we are done. - if (TokPrec < ExprPrec) - return LHS; -</pre> -</div> - -<p>This code gets the precedence of the current token and checks to see if it is -too low. Because we defined invalid tokens to have a precedence of -1, this -check implicitly knows that the pair-stream ends when the token stream runs out -of binary operators. If the token passes this check, we know that it is a binary -operator and that it will be included in this expression:</p> - -<div class="doc_code"> -<pre> - // Okay, we know this is a binop. - int BinOp = CurTok; - getNextToken(); // eat binop - - // Parse the primary expression after the binary operator. - ExprAST *RHS = ParsePrimary(); - if (!RHS) return 0; -</pre> -</div> - -<p>As such, this code eats (and remembers) the binary operator and then parses -the primary expression that follows. This builds up the whole pair, the first of -which is [+, b] for the running example.</p> - -<p>Now that we have parsed the left-hand side of an expression and one pair of the -RHS sequence, we have to decide which way the expression associates. In -particular, we could have "(a+b) binop unparsed" or "a + (b binop unparsed)".
-To determine this, we look ahead at "binop" to determine its precedence and -compare it to BinOp's precedence (which is '+' in this case):</p> - -<div class="doc_code"> -<pre> - // If BinOp binds less tightly with RHS than the operator after RHS, let - // the pending operator take RHS as its LHS. - int NextPrec = GetTokPrecedence(); - if (TokPrec < NextPrec) { -</pre> -</div> - -<p>If the precedence of the binop to the right of "RHS" is lower or equal to the -precedence of our current operator, then we know that the parentheses associate -as "(a+b) binop ...". In our example, the current operator is "+" and the next -operator is "+", we know that they have the same precedence. In this case we'll -create the AST node for "a+b", and then continue parsing:</p> - -<div class="doc_code"> -<pre> - ... if body omitted ... - } - - // Merge LHS/RHS. - LHS = new BinaryExprAST(BinOp, LHS, RHS); - } // loop around to the top of the while loop. -} -</pre> -</div> - -<p>In our example above, this will turn "a+b+" into "(a+b)" and execute the next -iteration of the loop, with "+" as the current token. The code above will eat, -remember, and parse "(c+d)" as the primary expression, which makes the -current pair equal to [+, (c+d)]. It will then evaluate the 'if' conditional above with -"*" as the binop to the right of the primary. In this case, the precedence of "*" is -higher than the precedence of "+" so the if condition will be entered.</p> - -<p>The critical question left here is "how can the if condition parse the right -hand side in full"? In particular, to build the AST correctly for our example, -it needs to get all of "(c+d)*e*f" as the RHS expression variable. The code to -do this is surprisingly simple (code from the above two blocks duplicated for -context):</p> - -<div class="doc_code"> -<pre> - // If BinOp binds less tightly with RHS than the operator after RHS, let - // the pending operator take RHS as its LHS. 
- int NextPrec = GetTokPrecedence(); - if (TokPrec < NextPrec) { - <b>RHS = ParseBinOpRHS(TokPrec+1, RHS); - if (RHS == 0) return 0;</b> - } - // Merge LHS/RHS. - LHS = new BinaryExprAST(BinOp, LHS, RHS); - } // loop around to the top of the while loop. -} -</pre> -</div> - -<p>At this point, we know that the binary operator to the RHS of our primary -has higher precedence than the binop we are currently parsing. As such, we know -that any sequence of pairs whose operators are all higher precedence than "+" -should be parsed together and returned as "RHS". To do this, we recursively -invoke the <tt>ParseBinOpRHS</tt> function specifying "TokPrec+1" as the minimum -precedence required for it to continue. In our example above, this will cause -it to return the AST node for "(c+d)*e*f" as RHS, which is then set as the RHS -of the '+' expression.</p> - -<p>Finally, on the next iteration of the while loop, the "+g" piece is parsed -and added to the AST. With this little bit of code (14 non-trivial lines), we -correctly handle fully general binary expression parsing in a very elegant way. -This was a whirlwind tour of this code, and it is somewhat subtle. I recommend -running through it with a few tough examples to see how it works. -</p> - -<p>This wraps up handling of expressions. At this point, we can point the -parser at an arbitrary token stream and build an expression from it, stopping -at the first token that is not part of the expression. Next up we need to -handle function definitions, etc.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="parsertop">Parsing the Rest</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p> -The next thing missing is handling of function prototypes. In Kaleidoscope, -these are used both for 'extern' function declarations as well as function body -definitions. 
The code to do this is straight-forward and not very interesting -(once you've survived expressions): -</p> - -<div class="doc_code"> -<pre> -/// prototype -/// ::= id '(' id* ')' -static PrototypeAST *ParsePrototype() { - if (CurTok != tok_identifier) - return ErrorP("Expected function name in prototype"); - - std::string FnName = IdentifierStr; - getNextToken(); - - if (CurTok != '(') - return ErrorP("Expected '(' in prototype"); - - // Read the list of argument names. - std::vector<std::string> ArgNames; - while (getNextToken() == tok_identifier) - ArgNames.push_back(IdentifierStr); - if (CurTok != ')') - return ErrorP("Expected ')' in prototype"); - - // success. - getNextToken(); // eat ')'. - - return new PrototypeAST(FnName, ArgNames); -} -</pre> -</div> - -<p>Given this, a function definition is very simple, just a prototype plus -an expression to implement the body:</p> - -<div class="doc_code"> -<pre> -/// definition ::= 'def' prototype expression -static FunctionAST *ParseDefinition() { - getNextToken(); // eat def. - PrototypeAST *Proto = ParsePrototype(); - if (Proto == 0) return 0; - - if (ExprAST *E = ParseExpression()) - return new FunctionAST(Proto, E); - return 0; -} -</pre> -</div> - -<p>In addition, we support 'extern' to declare functions like 'sin' and 'cos' as -well as to support forward declaration of user functions. These 'extern's are just -prototypes with no body:</p> - -<div class="doc_code"> -<pre> -/// external ::= 'extern' prototype -static PrototypeAST *ParseExtern() { - getNextToken(); // eat extern. - return ParsePrototype(); -} -</pre> -</div> - -<p>Finally, we'll also let the user type in arbitrary top-level expressions and -evaluate them on the fly. We will handle this by defining anonymous nullary -(zero argument) functions for them:</p> - -<div class="doc_code"> -<pre> -/// toplevelexpr ::= expression -static FunctionAST *ParseTopLevelExpr() { - if (ExprAST *E = ParseExpression()) { - // Make an anonymous proto. 
- PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); - return new FunctionAST(Proto, E); - } - return 0; -} -</pre> -</div> - -<p>Now that we have all the pieces, let's build a little driver that will let us -actually <em>execute</em> this code we've built!</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="driver">The Driver</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>The driver for this simply invokes all of the parsing pieces with a top-level -dispatch loop. There isn't much interesting here, so I'll just include the -top-level loop. See <a href="#code">below</a> for full code in the "Top-Level -Parsing" section.</p> - -<div class="doc_code"> -<pre> -/// top ::= definition | external | expression | ';' -static void MainLoop() { - while (1) { - fprintf(stderr, "ready> "); - switch (CurTok) { - case tok_eof: return; - case ';': getNextToken(); break; // ignore top-level semicolons. - case tok_def: HandleDefinition(); break; - case tok_extern: HandleExtern(); break; - default: HandleTopLevelExpression(); break; - } - } -} -</pre> -</div> - -<p>The most interesting part of this is that we ignore top-level semicolons. -Why is this, you ask? The basic reason is that if you type "4 + 5" at the -command line, the parser doesn't know whether that is the end of what you will type -or not. For example, on the next line you could type "def foo..." in which case -4+5 is the end of a top-level expression. Alternatively you could type "* 6", -which would continue the expression. 
Having top-level semicolons allows you to -type "4+5;", and the parser will know you are done.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="conclusions">Conclusions</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>With just under 400 lines of commented code (240 lines of non-comment, -non-blank code), we fully defined our minimal language, including a lexer, -parser, and AST builder. With this done, the executable will validate -Kaleidoscope code and tell us if it is grammatically invalid. For -example, here is a sample interaction:</p> - -<div class="doc_code"> -<pre> -$ <b>./a.out</b> -ready> <b>def foo(x y) x+foo(y, 4.0);</b> -Parsed a function definition. -ready> <b>def foo(x y) x+y y;</b> -Parsed a function definition. -Parsed a top-level expr -ready> <b>def foo(x y) x+y );</b> -Parsed a function definition. -Error: unknown token when expecting an expression -ready> <b>extern sin(a);</b> -ready> Parsed an extern -ready> <b>^D</b> -$ -</pre> -</div> - -<p>There is a lot of room for extension here. You can define new AST nodes, -extend the language in many ways, etc. In the <a href="LangImpl3.html">next -installment</a>, we will describe how to generate LLVM Intermediate -Representation (IR) from the AST.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="code">Full Code Listing</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p> -Here is the complete code listing for this and the previous chapter. -Note that it is fully self-contained: you don't need LLVM or any external -libraries at all for this. (Besides the C and C++ standard libraries, of -course.) 
To build this, just compile with:</p> - -<div class="doc_code"> -<pre> - # Compile - g++ -g -O3 toy.cpp - # Run - ./a.out -</pre> -</div> - -<p>Here is the code:</p> - -<div class="doc_code"> -<pre> -#include <cstdio> -#include <cstdlib> -#include <string> -#include <map> -#include <vector> - -//===----------------------------------------------------------------------===// -// Lexer -//===----------------------------------------------------------------------===// - -// The lexer returns tokens [0-255] if it is an unknown character, otherwise one -// of these for known things. -enum Token { - tok_eof = -1, - - // commands - tok_def = -2, tok_extern = -3, - - // primary - tok_identifier = -4, tok_number = -5 -}; - -static std::string IdentifierStr; // Filled in if tok_identifier -static double NumVal; // Filled in if tok_number - -/// gettok - Return the next token from standard input. -static int gettok() { - static int LastChar = ' '; - - // Skip any whitespace. - while (isspace(LastChar)) - LastChar = getchar(); - - if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]* - IdentifierStr = LastChar; - while (isalnum((LastChar = getchar()))) - IdentifierStr += LastChar; - - if (IdentifierStr == "def") return tok_def; - if (IdentifierStr == "extern") return tok_extern; - return tok_identifier; - } - - if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+ - std::string NumStr; - do { - NumStr += LastChar; - LastChar = getchar(); - } while (isdigit(LastChar) || LastChar == '.'); - - NumVal = strtod(NumStr.c_str(), 0); - return tok_number; - } - - if (LastChar == '#') { - // Comment until end of line. - do LastChar = getchar(); - while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); - - if (LastChar != EOF) - return gettok(); - } - - // Check for end of file. Don't eat the EOF. - if (LastChar == EOF) - return tok_eof; - - // Otherwise, just return the character as its ascii value. 
- int ThisChar = LastChar; - LastChar = getchar(); - return ThisChar; -} - -//===----------------------------------------------------------------------===// -// Abstract Syntax Tree (aka Parse Tree) -//===----------------------------------------------------------------------===// - -/// ExprAST - Base class for all expression nodes. -class ExprAST { -public: - virtual ~ExprAST() {} -}; - -/// NumberExprAST - Expression class for numeric literals like "1.0". -class NumberExprAST : public ExprAST { - double Val; -public: - NumberExprAST(double val) : Val(val) {} -}; - -/// VariableExprAST - Expression class for referencing a variable, like "a". -class VariableExprAST : public ExprAST { - std::string Name; -public: - VariableExprAST(const std::string &name) : Name(name) {} -}; - -/// BinaryExprAST - Expression class for a binary operator. -class BinaryExprAST : public ExprAST { - char Op; - ExprAST *LHS, *RHS; -public: - BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) - : Op(op), LHS(lhs), RHS(rhs) {} -}; - -/// CallExprAST - Expression class for function calls. -class CallExprAST : public ExprAST { - std::string Callee; - std::vector<ExprAST*> Args; -public: - CallExprAST(const std::string &callee, std::vector<ExprAST*> &args) - : Callee(callee), Args(args) {} -}; - -/// PrototypeAST - This class represents the "prototype" for a function, -/// which captures its name, and its argument names (thus implicitly the number -/// of arguments the function takes). -class PrototypeAST { - std::string Name; - std::vector<std::string> Args; -public: - PrototypeAST(const std::string &name, const std::vector<std::string> &args) - : Name(name), Args(args) {} - -}; - -/// FunctionAST - This class represents a function definition itself. 
-class FunctionAST { - PrototypeAST *Proto; - ExprAST *Body; -public: - FunctionAST(PrototypeAST *proto, ExprAST *body) - : Proto(proto), Body(body) {} - -}; - -//===----------------------------------------------------------------------===// -// Parser -//===----------------------------------------------------------------------===// - -/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current -/// token the parser is looking at. getNextToken reads another token from the -/// lexer and updates CurTok with its results. -static int CurTok; -static int getNextToken() { - return CurTok = gettok(); -} - -/// BinopPrecedence - This holds the precedence for each binary operator that is -/// defined. -static std::map<char, int> BinopPrecedence; - -/// GetTokPrecedence - Get the precedence of the pending binary operator token. -static int GetTokPrecedence() { - if (!isascii(CurTok)) - return -1; - - // Make sure it's a declared binop. - int TokPrec = BinopPrecedence[CurTok]; - if (TokPrec <= 0) return -1; - return TokPrec; -} - -/// Error* - These are little helper functions for error handling. -ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;} -PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; } -FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; } - -static ExprAST *ParseExpression(); - -/// identifierexpr -/// ::= identifier -/// ::= identifier '(' expression* ')' -static ExprAST *ParseIdentifierExpr() { - std::string IdName = IdentifierStr; - - getNextToken(); // eat identifier. - - if (CurTok != '(') // Simple variable ref. - return new VariableExprAST(IdName); - - // Call. - getNextToken(); // eat ( - std::vector<ExprAST*> Args; - if (CurTok != ')') { - while (1) { - ExprAST *Arg = ParseExpression(); - if (!Arg) return 0; - Args.push_back(Arg); - - if (CurTok == ')') break; - - if (CurTok != ',') - return Error("Expected ')' or ',' in argument list"); - getNextToken(); - } - } - - // Eat the ')'. 
- getNextToken(); - - return new CallExprAST(IdName, Args); -} - -/// numberexpr ::= number -static ExprAST *ParseNumberExpr() { - ExprAST *Result = new NumberExprAST(NumVal); - getNextToken(); // consume the number - return Result; -} - -/// parenexpr ::= '(' expression ')' -static ExprAST *ParseParenExpr() { - getNextToken(); // eat (. - ExprAST *V = ParseExpression(); - if (!V) return 0; - - if (CurTok != ')') - return Error("expected ')'"); - getNextToken(); // eat ). - return V; -} - -/// primary -/// ::= identifierexpr -/// ::= numberexpr -/// ::= parenexpr -static ExprAST *ParsePrimary() { - switch (CurTok) { - default: return Error("unknown token when expecting an expression"); - case tok_identifier: return ParseIdentifierExpr(); - case tok_number: return ParseNumberExpr(); - case '(': return ParseParenExpr(); - } -} - -/// binoprhs -/// ::= ('+' primary)* -static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { - // If this is a binop, find its precedence. - while (1) { - int TokPrec = GetTokPrecedence(); - - // If this is a binop that binds at least as tightly as the current binop, - // consume it, otherwise we are done. - if (TokPrec < ExprPrec) - return LHS; - - // Okay, we know this is a binop. - int BinOp = CurTok; - getNextToken(); // eat binop - - // Parse the primary expression after the binary operator. - ExprAST *RHS = ParsePrimary(); - if (!RHS) return 0; - - // If BinOp binds less tightly with RHS than the operator after RHS, let - // the pending operator take RHS as its LHS. - int NextPrec = GetTokPrecedence(); - if (TokPrec < NextPrec) { - RHS = ParseBinOpRHS(TokPrec+1, RHS); - if (RHS == 0) return 0; - } - - // Merge LHS/RHS. 
- LHS = new BinaryExprAST(BinOp, LHS, RHS); - } -} - -/// expression -/// ::= primary binoprhs -/// -static ExprAST *ParseExpression() { - ExprAST *LHS = ParsePrimary(); - if (!LHS) return 0; - - return ParseBinOpRHS(0, LHS); -} - -/// prototype -/// ::= id '(' id* ')' -static PrototypeAST *ParsePrototype() { - if (CurTok != tok_identifier) - return ErrorP("Expected function name in prototype"); - - std::string FnName = IdentifierStr; - getNextToken(); - - if (CurTok != '(') - return ErrorP("Expected '(' in prototype"); - - std::vector<std::string> ArgNames; - while (getNextToken() == tok_identifier) - ArgNames.push_back(IdentifierStr); - if (CurTok != ')') - return ErrorP("Expected ')' in prototype"); - - // success. - getNextToken(); // eat ')'. - - return new PrototypeAST(FnName, ArgNames); -} - -/// definition ::= 'def' prototype expression -static FunctionAST *ParseDefinition() { - getNextToken(); // eat def. - PrototypeAST *Proto = ParsePrototype(); - if (Proto == 0) return 0; - - if (ExprAST *E = ParseExpression()) - return new FunctionAST(Proto, E); - return 0; -} - -/// toplevelexpr ::= expression -static FunctionAST *ParseTopLevelExpr() { - if (ExprAST *E = ParseExpression()) { - // Make an anonymous proto. - PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); - return new FunctionAST(Proto, E); - } - return 0; -} - -/// external ::= 'extern' prototype -static PrototypeAST *ParseExtern() { - getNextToken(); // eat extern. - return ParsePrototype(); -} - -//===----------------------------------------------------------------------===// -// Top-Level parsing -//===----------------------------------------------------------------------===// - -static void HandleDefinition() { - if (ParseDefinition()) { - fprintf(stderr, "Parsed a function definition.\n"); - } else { - // Skip token for error recovery. 
- getNextToken(); - } -} - -static void HandleExtern() { - if (ParseExtern()) { - fprintf(stderr, "Parsed an extern\n"); - } else { - // Skip token for error recovery. - getNextToken(); - } -} - -static void HandleTopLevelExpression() { - // Evaluate a top-level expression into an anonymous function. - if (ParseTopLevelExpr()) { - fprintf(stderr, "Parsed a top-level expr\n"); - } else { - // Skip token for error recovery. - getNextToken(); - } -} - -/// top ::= definition | external | expression | ';' -static void MainLoop() { - while (1) { - fprintf(stderr, "ready> "); - switch (CurTok) { - case tok_eof: return; - case ';': getNextToken(); break; // ignore top-level semicolons. - case tok_def: HandleDefinition(); break; - case tok_extern: HandleExtern(); break; - default: HandleTopLevelExpression(); break; - } - } -} - -//===----------------------------------------------------------------------===// -// Main driver code. -//===----------------------------------------------------------------------===// - -int main() { - // Install standard binary operators. - // 1 is lowest precedence. - BinopPrecedence['<'] = 10; - BinopPrecedence['+'] = 20; - BinopPrecedence['-'] = 20; - BinopPrecedence['*'] = 40; // highest. - - // Prime the first token. - fprintf(stderr, "ready> "); - getNextToken(); - - // Run the main "interpreter loop" now. 
- MainLoop(); - - return 0; -} -</pre> -</div> -<a href="LangImpl3.html">Next: Implementing Code Generation to LLVM IR</a> -</div> - -<!-- *********************************************************************** --> -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> - - <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> - <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/tutorial/LangImpl3.html b/docs/tutorial/LangImpl3.html deleted file mode 100644 index fe28d41..0000000 --- a/docs/tutorial/LangImpl3.html +++ /dev/null @@ -1,1269 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> - -<html> -<head> - <title>Kaleidoscope: Implementing code generation to LLVM IR</title> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <meta name="author" content="Chris Lattner"> - <link rel="stylesheet" href="../llvm.css" type="text/css"> -</head> - -<body> - -<div class="doc_title">Kaleidoscope: Code generation to LLVM IR</div> - -<ul> -<li><a href="index.html">Up to Tutorial Index</a></li> -<li>Chapter 3 - <ol> - <li><a href="#intro">Chapter 3 Introduction</a></li> - <li><a href="#basics">Code Generation Setup</a></li> - <li><a href="#exprs">Expression Code Generation</a></li> - <li><a href="#funcs">Function Code Generation</a></li> - <li><a href="#driver">Driver Changes and Closing Thoughts</a></li> - <li><a href="#code">Full Code Listing</a></li> - </ol> -</li> -<li><a href="LangImpl4.html">Chapter 4</a>: Adding JIT and Optimizer -Support</li> -</ul> - -<div class="doc_author"> - <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p> -</div> - -<!-- 
*********************************************************************** --> -<div class="doc_section"><a name="intro">Chapter 3 Introduction</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Welcome to Chapter 3 of the "<a href="index.html">Implementing a language -with LLVM</a>" tutorial. This chapter shows you how to transform the <a -href="LangImpl2.html">Abstract Syntax Tree</a>, built in Chapter 2, into LLVM IR. -This will teach you a little bit about how LLVM does things, as well as -demonstrate how easy it is to use. It's much more work to build a lexer and -parser than it is to generate LLVM IR code. :) -</p> - -<p><b>Please note</b>: the code in this chapter and later requires LLVM 2.2 or -later. LLVM 2.1 and before will not work with it. Also note that you need -to use a version of this tutorial that matches your LLVM release: If you are -using an official LLVM release, use the version of the documentation included -with your release or on the <a href="http://llvm.org/releases/">llvm.org -releases page</a>.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="basics">Code Generation Setup</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p> -In order to generate LLVM IR, we want some simple setup to get started. First -we define virtual code generation (codegen) methods in each AST class:</p> - -<div class="doc_code"> -<pre> -/// ExprAST - Base class for all expression nodes. -class ExprAST { -public: - virtual ~ExprAST() {} - <b>virtual Value *Codegen() = 0;</b> -}; - -/// NumberExprAST - Expression class for numeric literals like "1.0". -class NumberExprAST : public ExprAST { - double Val; -public: - NumberExprAST(double val) : Val(val) {} - <b>virtual Value *Codegen();</b> -}; -...
-</pre> -</div> - -<p>The Codegen() method says to emit IR for that AST node along with all the things it -depends on, and they all return an LLVM Value object. -"Value" is the class used to represent a "<a -href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single -Assignment (SSA)</a> register" or "SSA value" in LLVM. The most distinct aspect -of SSA values is that their value is computed as the related instruction -executes, and it does not get a new value until (and if) the instruction -re-executes. In other words, there is no way to "change" an SSA value. For -more information, please read up on <a -href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single -Assignment</a> - the concepts are really quite natural once you grok them.</p> - -<p>Note that instead of adding virtual methods to the ExprAST class hierarchy, -it could also make sense to use a <a -href="http://en.wikipedia.org/wiki/Visitor_pattern">visitor pattern</a> or some -other way to model this. Again, this tutorial won't dwell on good software -engineering practices: for our purposes, adding a virtual method is -simplest.</p> - -<p>The -second thing we want is an "Error" method like we used for the parser, which will -be used to report errors found during code generation (for example, use of an -undeclared parameter):</p> - -<div class="doc_code"> -<pre> -Value *ErrorV(const char *Str) { Error(Str); return 0; } - -static Module *TheModule; -static IRBuilder<> Builder(getGlobalContext()); -static std::map<std::string, Value*> NamedValues; -</pre> -</div> - -<p>The static variables will be used during code generation. <tt>TheModule</tt> -is the LLVM construct that contains all of the functions and global variables in -a chunk of code. In many ways, it is the top-level structure that the LLVM IR -uses to contain code.</p> - -<p>The <tt>Builder</tt> object is a helper object that makes it easy to generate -LLVM instructions. 
Instances of the <a -href="http://llvm.org/doxygen/IRBuilder_8h-source.html"><tt>IRBuilder</tt></a> -class template keep track of the current place to insert instructions and have -methods to create new instructions.</p> - -<p>The <tt>NamedValues</tt> map keeps track of which values are defined in the -current scope and what their LLVM representation is. (In other words, it is a -symbol table for the code). In this form of Kaleidoscope, the only things that -can be referenced are function parameters. As such, function parameters will -be in this map when generating code for their function body.</p> - -<p> -With these basics in place, we can start talking about how to generate code for -each expression. Note that this assumes that the <tt>Builder</tt> has been set -up to generate code <em>into</em> something. For now, we'll assume that this -has already been done, and we'll just use it to emit code. -</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="exprs">Expression Code Generation</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Generating LLVM code for expression nodes is very straightforward: fewer -than 45 lines of commented code for all four of our expression nodes. First -we'll do numeric literals:</p> - -<div class="doc_code"> -<pre> -Value *NumberExprAST::Codegen() { - return ConstantFP::get(getGlobalContext(), APFloat(Val)); -} -</pre> -</div> - -<p>In the LLVM IR, numeric constants are represented with the -<tt>ConstantFP</tt> class, which holds the numeric value in an <tt>APFloat</tt> -internally (<tt>APFloat</tt> has the capability of holding floating point -constants of <em>A</em>rbitrary <em>P</em>recision). This code basically just -creates and returns a <tt>ConstantFP</tt>. Note that in the LLVM IR, -constants are all uniqued together and shared.
For this reason, the API -uses the "foo::get(...)" idiom instead of "new foo(..)" or "foo::Create(..)".</p> - -<div class="doc_code"> -<pre> -Value *VariableExprAST::Codegen() { - // Look this variable up in the function. - Value *V = NamedValues[Name]; - return V ? V : ErrorV("Unknown variable name"); -} -</pre> -</div> - -<p>References to variables are also quite simple using LLVM. In the simple version -of Kaleidoscope, we assume that the variable has already been emitted somewhere -and its value is available. In practice, the only values that can be in the -<tt>NamedValues</tt> map are function arguments. This -code simply checks to see that the specified name is in the map (if not, an -unknown variable is being referenced) and returns the value for it. In future -chapters, we'll add support for <a href="LangImpl5.html#for">loop induction -variables</a> in the symbol table, and for <a -href="LangImpl7.html#localvars">local variables</a>.</p> - -<div class="doc_code"> -<pre> -Value *BinaryExprAST::Codegen() { - Value *L = LHS->Codegen(); - Value *R = RHS->Codegen(); - if (L == 0 || R == 0) return 0; - - switch (Op) { - case '+': return Builder.CreateFAdd(L, R, "addtmp"); - case '-': return Builder.CreateFSub(L, R, "subtmp"); - case '*': return Builder.CreateFMul(L, R, "multmp"); - case '<': - L = Builder.CreateFCmpULT(L, R, "cmptmp"); - // Convert bool 0/1 to double 0.0 or 1.0 - return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), - "booltmp"); - default: return ErrorV("invalid binary operator"); - } -} -</pre> -</div> - -<p>Binary operators start to get more interesting. The basic idea here is that -we recursively emit code for the left-hand side of the expression, then the -right-hand side, then we compute the result of the binary expression. In this -code, we do a simple switch on the opcode to create the right LLVM instruction. -</p> - -<p>In the example above, the LLVM builder class is starting to show its value.
-IRBuilder knows where to insert the newly created instruction; all you have to -do is specify what instruction to create (e.g. with <tt>CreateFAdd</tt>), which -operands to use (<tt>L</tt> and <tt>R</tt> here) and optionally provide a name -for the generated instruction.</p> - -<p>One nice thing about LLVM is that the name is just a hint. For instance, if -the code above emits multiple "addtmp" variables, LLVM will automatically -provide each one with an increasing, unique numeric suffix. Local value names -for instructions are purely optional, but they make it much easier to read the -IR dumps.</p> - -<p><a href="../LangRef.html#instref">LLVM instructions</a> are constrained by -strict rules: for example, the left and right operands of -an <a href="../LangRef.html#i_add">add instruction</a> must have the same -type, and the result type of the add must match the operand types. Because -all values in Kaleidoscope are doubles, this makes for very simple code for add, -sub and mul.</p> - -<p>On the other hand, LLVM specifies that the <a -href="../LangRef.html#i_fcmp">fcmp instruction</a> always returns an 'i1' value -(a one-bit integer). The problem with this is that Kaleidoscope wants the value to be 0.0 or 1.0. In order to get these semantics, we combine the fcmp instruction with -a <a href="../LangRef.html#i_uitofp">uitofp instruction</a>. This instruction -converts its input integer into a floating point value by treating the input -as an unsigned value. In contrast, if we used the <a -href="../LangRef.html#i_sitofp">sitofp instruction</a>, the Kaleidoscope '<' -operator would return 0.0 and -1.0, depending on the input value.</p> - -<div class="doc_code"> -<pre> -Value *CallExprAST::Codegen() { - // Look up the name in the global module table. - Function *CalleeF = TheModule->getFunction(Callee); - if (CalleeF == 0) - return ErrorV("Unknown function referenced"); - - // If argument mismatch error.
- if (CalleeF->arg_size() != Args.size()) - return ErrorV("Incorrect # arguments passed"); - - std::vector<Value*> ArgsV; - for (unsigned i = 0, e = Args.size(); i != e; ++i) { - ArgsV.push_back(Args[i]->Codegen()); - if (ArgsV.back() == 0) return 0; - } - - return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp"); -} -</pre> -</div> - -<p>Code generation for function calls is quite straightforward with LLVM. The -code above initially does a function name lookup in the LLVM Module's symbol -table. Recall that the LLVM Module is the container that holds all of the -functions we are JIT'ing. By giving each function the same name as what the -user specifies, we can use the LLVM symbol table to resolve function names for -us.</p> - -<p>Once we have the function to call, we recursively codegen each argument that -is to be passed in, and create an LLVM <a href="../LangRef.html#i_call">call -instruction</a>. Note that LLVM uses the native C calling conventions by -default, allowing these calls to also call into standard library functions like -"sin" and "cos", with no additional effort.</p> - -<p>This wraps up our handling of the four basic expressions that we have so far -in Kaleidoscope. Feel free to go in and add some more. For example, by -browsing the <a href="../LangRef.html">LLVM language reference</a> you'll find -several other interesting instructions that are really easy to plug into our -basic framework.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="funcs">Function Code Generation</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Code generation for prototypes and functions must handle a number of -details, which make their code less beautiful than expression code -generation, but allow us to illustrate some important points.
First, let's -talk about code generation for prototypes: they are used both for function -bodies and external function declarations. The code starts with:</p> - -<div class="doc_code"> -<pre> -Function *PrototypeAST::Codegen() { - // Make the function type: double(double,double) etc. - std::vector<const Type*> Doubles(Args.size(), - Type::getDoubleTy(getGlobalContext())); - FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), - Doubles, false); - - Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule); -</pre> -</div> - -<p>This code packs a lot of power into a few lines. Note first that this -function returns a "Function*" instead of a "Value*". Because a "prototype" -really talks about the external interface for a function (not the value computed -by an expression), it makes sense for it to return the LLVM Function it -corresponds to when codegen'd.</p> - -<p>The call to <tt>FunctionType::get</tt> creates -the <tt>FunctionType</tt> that should be used for a given Prototype. Since all -function arguments in Kaleidoscope are of type double, the first line creates -a vector of "N" LLVM double types. It then uses the <tt>FunctionType::get</tt> -method to create a function type that takes "N" doubles as arguments, returns -one double as a result, and that is not vararg (the false parameter indicates -this). Note that Types in LLVM are uniqued just like Constants are, so you -don't "new" a type, you "get" it.</p> - -<p>The final line above actually creates the function that the prototype will -correspond to. This indicates the type, linkage and name to use, as well as which -module to insert into. "<a href="../LangRef.html#linkage">external linkage</a>" -means that the function may be defined outside the current module and/or that it -is callable by functions outside the module.
The Name passed in is the name the -user specified: since "<tt>TheModule</tt>" is specified, this name is registered -in <tt>TheModule</tt>'s symbol table, which is used by the function call code -above.</p> - -<div class="doc_code"> -<pre> - // If F conflicted, there was already something named 'Name'. If it has a - // body, don't allow redefinition or reextern. - if (F->getName() != Name) { - // Delete the one we just made and get the existing one. - F->eraseFromParent(); - F = TheModule->getFunction(Name); -</pre> -</div> - -<p>The Module symbol table works just like the Function symbol table when it -comes to name conflicts: if a new function is created with a name that was previously -added to the symbol table, it will get implicitly renamed when added to the -Module. The code above exploits this fact to determine if there was a previous -definition of this function.</p> - -<p>In Kaleidoscope, I choose to allow redefinitions of functions in two cases: -first, we want to allow 'extern'ing a function more than once, as long as the -prototypes for the externs match (since all arguments have the same type, we -just have to check that the number of arguments matches). Second, we want to -allow 'extern'ing a function and then defining a body for it. This is useful -when defining mutually recursive functions.</p> - -<p>In order to implement this, the code above first checks to see if there is -a collision on the name of the function. If so, it deletes the function we just -created (by calling <tt>eraseFromParent</tt>) and then calls -<tt>getFunction</tt> to get the existing function with the specified name. Note -that many APIs in LLVM have "erase" forms and "remove" forms. The "remove" form -unlinks the object from its parent (e.g. a Function from a Module) and returns -it. The "erase" form unlinks the object and then deletes it.</p> - -<div class="doc_code"> -<pre> - // If F already has a body, reject this.
- if (!F->empty()) { - ErrorF("redefinition of function"); - return 0; - } - - // If F took a different number of args, reject. - if (F->arg_size() != Args.size()) { - ErrorF("redefinition of function with different # args"); - return 0; - } - } -</pre> -</div> - -<p>To enforce the rules above, we first check to see if the pre-existing -function is "empty". In this case, empty means that it has no basic blocks in -it, which means it has no body and is only a forward -declaration. A non-empty function already has a full definition; since we don't allow anything after a full definition of the -function, the code rejects this case. If the previous reference to a function -was an 'extern', we simply verify that the number of arguments for that -definition and this one match up. If not, we emit an error.</p> - -<div class="doc_code"> -<pre> - // Set names for all arguments. - unsigned Idx = 0; - for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); - ++AI, ++Idx) { - AI->setName(Args[Idx]); - - // Add arguments to variable symbol table. - NamedValues[Args[Idx]] = AI; - } - return F; -} -</pre> -</div> - -<p>The last bit of code for prototypes loops over all of the arguments in the -function, setting the name of the LLVM Argument objects to match, and registering -the arguments in the <tt>NamedValues</tt> map for future use by the -<tt>VariableExprAST</tt> AST node. Once this is set up, it returns the Function -object to the caller. Note that we don't check for conflicting -argument names here (e.g. "extern foo(a b a)"). Doing so would be very -straightforward with the mechanics we have already used above.</p> - -<div class="doc_code"> -<pre> -Function *FunctionAST::Codegen() { - NamedValues.clear(); - - Function *TheFunction = Proto->Codegen(); - if (TheFunction == 0) - return 0; -</pre> -</div> - -<p>Code generation for function definitions starts out simply enough: we first clear out the -<tt>NamedValues</tt> map to make sure that there isn't anything in it from the -last function we compiled. This has to happen before the prototype is codegen'd, because that step adds the function's arguments to the map.
We then codegen the prototype (Proto) and verify that it is ok. Code generation of the prototype ensures that there -is an LLVM Function object that is ready to go for us.</p> - -<div class="doc_code"> -<pre> - // Create a new basic block to start insertion into. - BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); - Builder.SetInsertPoint(BB); - - if (Value *RetVal = Body->Codegen()) { -</pre> -</div> - -<p>Now we get to the point where the <tt>Builder</tt> is set up. The first -line creates a new <a href="http://en.wikipedia.org/wiki/Basic_block">basic -block</a> (named "entry"), which is inserted into <tt>TheFunction</tt>. The -second line then tells the builder that new instructions should be inserted into -the end of the new basic block. Basic blocks in LLVM are an important part -of functions that define the <a -href="http://en.wikipedia.org/wiki/Control_flow_graph">Control Flow Graph</a>. -Since we don't have any control flow, our functions will only contain one -block at this point. We'll fix this in <a href="LangImpl5.html">Chapter 5</a> :).</p> - -<div class="doc_code"> -<pre> - if (Value *RetVal = Body->Codegen()) { - // Finish off the function. - Builder.CreateRet(RetVal); - - // Validate the generated code, checking for consistency. - verifyFunction(*TheFunction); - - return TheFunction; - } -</pre> -</div> - -<p>Once the insertion point is set up, we call the <tt>Codegen()</tt> method for -the root expression of the function. If no error happens, this emits code to -compute the expression into the entry block and returns the value that was -computed. Assuming no error, we then create an LLVM <a -href="../LangRef.html#i_ret">ret instruction</a>, which completes the function. -Once the function is built, we call <tt>verifyFunction</tt>, which -is provided by LLVM.
This function does a variety of consistency checks on the -generated code, to determine if our compiler is doing everything right. Using -this is important: it can catch a lot of bugs. Once the function is finished -and validated, we return it.</p> - -<div class="doc_code"> -<pre> - // Error reading body, remove function. - TheFunction->eraseFromParent(); - return 0; -} -</pre> -</div> - -<p>The only piece left here is the handling of the error case. For simplicity, we -handle this by merely deleting the function we produced with the -<tt>eraseFromParent</tt> method. This allows the user to redefine a function -that they incorrectly typed in before: if we didn't delete it, it would live in -the symbol table, with a body, preventing future redefinition.</p> - -<p>This code does have a bug, though. Since the <tt>PrototypeAST::Codegen</tt> -can return a previously defined forward declaration, our code can actually delete -a forward declaration. There are a number of ways to fix this bug; see what you -can come up with! Here is a testcase:</p> - -<div class="doc_code"> -<pre> -extern foo(a b); # ok, defines foo. -def foo(a b) c; # error, 'c' is invalid. -def bar() foo(1, 2); # error, unknown function "foo" -</pre> -</div> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="driver">Driver Changes and -Closing Thoughts</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p> -For now, code generation to LLVM doesn't really get us much, except that we can -look at the pretty IR calls. The sample code inserts calls to Codegen into the -"<tt>HandleDefinition</tt>", "<tt>HandleExtern</tt>", etc. functions, and then -dumps out the LLVM IR. This gives a nice way to look at the LLVM IR for simple -functions.
For example: -</p> - -<div class="doc_code"> -<pre> -ready> <b>4+5</b>; -Read top-level expression: -define double @""() { -entry: - ret double 9.000000e+00 -} -</pre> -</div> - -<p>Note how the parser turns the top-level expression into anonymous functions -for us. This will be handy when we add <a href="LangImpl4.html#jit">JIT -support</a> in the next chapter. Also note that the code is very literally -transcribed: no optimizations are being performed except the simple constant -folding done by IRBuilder. We will -<a href="LangImpl4.html#trivialconstfold">add optimizations</a> explicitly in -the next chapter.</p> - -<div class="doc_code"> -<pre> -ready> <b>def foo(a b) a*a + 2*a*b + b*b;</b> -Read function definition: -define double @foo(double %a, double %b) { -entry: - %multmp = fmul double %a, %a - %multmp1 = fmul double 2.000000e+00, %a - %multmp2 = fmul double %multmp1, %b - %addtmp = fadd double %multmp, %multmp2 - %multmp3 = fmul double %b, %b - %addtmp4 = fadd double %addtmp, %multmp3 - ret double %addtmp4 -} -</pre> -</div> - -<p>This shows some simple arithmetic. Notice the striking similarity to the -LLVM builder calls that we use to create the instructions.</p> - -<div class="doc_code"> -<pre> -ready> <b>def bar(a) foo(a, 4.0) + bar(31337);</b> -Read function definition: -define double @bar(double %a) { -entry: - %calltmp = call double @foo( double %a, double 4.000000e+00 ) - %calltmp1 = call double @bar( double 3.133700e+04 ) - %addtmp = fadd double %calltmp, %calltmp1 - ret double %addtmp -} -</pre> -</div> - -<p>This shows some function calls. Note that this function will take a long -time to execute if you call it.
In the future we'll add conditional control -flow to actually make recursion useful :).</p> - -<div class="doc_code"> -<pre> -ready> <b>extern cos(x);</b> -Read extern: -declare double @cos(double) - -ready> <b>cos(1.234);</b> -Read top-level expression: -define double @""() { -entry: - %calltmp = call double @cos( double 1.234000e+00 ) - ret double %calltmp -} -</pre> -</div> - -<p>This shows an extern for the libm "cos" function, and a call to it.</p> - - -<div class="doc_code"> -<pre> -ready> <b>^D</b> -; ModuleID = 'my cool jit' - -define double @""() { -entry: - %addtmp = fadd double 4.000000e+00, 5.000000e+00 - ret double %addtmp -} - -define double @foo(double %a, double %b) { -entry: - %multmp = fmul double %a, %a - %multmp1 = fmul double 2.000000e+00, %a - %multmp2 = fmul double %multmp1, %b - %addtmp = fadd double %multmp, %multmp2 - %multmp3 = fmul double %b, %b - %addtmp4 = fadd double %addtmp, %multmp3 - ret double %addtmp4 -} - -define double @bar(double %a) { -entry: - %calltmp = call double @foo( double %a, double 4.000000e+00 ) - %calltmp1 = call double @bar( double 3.133700e+04 ) - %addtmp = fadd double %calltmp, %calltmp1 - ret double %addtmp -} - -declare double @cos(double) - -define double @""() { -entry: - %calltmp = call double @cos( double 1.234000e+00 ) - ret double %calltmp -} -</pre> -</div> - -<p>When you quit the current demo, it dumps out the IR for the entire module -generated. Here you can see the big picture with all the functions referencing -each other.</p> - -<p>This wraps up the third chapter of the Kaleidoscope tutorial. 
Up next, we'll -describe how to <a href="LangImpl4.html">add JIT codegen and optimizer -support</a> to this so we can actually start running code!</p> - -</div> - - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="code">Full Code Listing</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p> -Here is the complete code listing for our running example, enhanced with the -LLVM code generator. Because this uses the LLVM libraries, we need to link -them in. To do this, we use the <a -href="http://llvm.org/cmds/llvm-config.html">llvm-config</a> tool to inform -our makefile/command line about which options to use:</p> - -<div class="doc_code"> -<pre> - # Compile - g++ -g -O3 toy.cpp `llvm-config --cppflags --ldflags --libs core` -o toy - # Run - ./toy -</pre> -</div> - -<p>Here is the code:</p> - -<div class="doc_code"> -<pre> -// To build this: -// See example above. - -#include "llvm/DerivedTypes.h" -#include "llvm/LLVMContext.h" -#include "llvm/Module.h" -#include "llvm/Analysis/Verifier.h" -#include "llvm/Support/IRBuilder.h" -#include <cctype> -#include <cstdio> -#include <cstdlib> -#include <string> -#include <map> -#include <vector> -using namespace llvm; - -//===----------------------------------------------------------------------===// -// Lexer -//===----------------------------------------------------------------------===// - -// The lexer returns tokens [0-255] if it is an unknown character, otherwise one -// of these for known things. -enum Token { - tok_eof = -1, - - // commands - tok_def = -2, tok_extern = -3, - - // primary - tok_identifier = -4, tok_number = -5 -}; - -static std::string IdentifierStr; // Filled in if tok_identifier -static double NumVal; // Filled in if tok_number - -/// gettok - Return the next token from standard input. -static int gettok() { - static int LastChar = ' '; - - // Skip any whitespace.
- while (isspace(LastChar)) - LastChar = getchar(); - - if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]* - IdentifierStr = LastChar; - while (isalnum((LastChar = getchar()))) - IdentifierStr += LastChar; - - if (IdentifierStr == "def") return tok_def; - if (IdentifierStr == "extern") return tok_extern; - return tok_identifier; - } - - if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+ - std::string NumStr; - do { - NumStr += LastChar; - LastChar = getchar(); - } while (isdigit(LastChar) || LastChar == '.'); - - NumVal = strtod(NumStr.c_str(), 0); - return tok_number; - } - - if (LastChar == '#') { - // Comment until end of line. - do LastChar = getchar(); - while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); - - if (LastChar != EOF) - return gettok(); - } - - // Check for end of file. Don't eat the EOF. - if (LastChar == EOF) - return tok_eof; - - // Otherwise, just return the character as its ascii value. - int ThisChar = LastChar; - LastChar = getchar(); - return ThisChar; -} - -//===----------------------------------------------------------------------===// -// Abstract Syntax Tree (aka Parse Tree) -//===----------------------------------------------------------------------===// - -/// ExprAST - Base class for all expression nodes. -class ExprAST { -public: - virtual ~ExprAST() {} - virtual Value *Codegen() = 0; -}; - -/// NumberExprAST - Expression class for numeric literals like "1.0". -class NumberExprAST : public ExprAST { - double Val; -public: - NumberExprAST(double val) : Val(val) {} - virtual Value *Codegen(); -}; - -/// VariableExprAST - Expression class for referencing a variable, like "a". -class VariableExprAST : public ExprAST { - std::string Name; -public: - VariableExprAST(const std::string &name) : Name(name) {} - virtual Value *Codegen(); -}; - -/// BinaryExprAST - Expression class for a binary operator. 
-class BinaryExprAST : public ExprAST { - char Op; - ExprAST *LHS, *RHS; -public: - BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) - : Op(op), LHS(lhs), RHS(rhs) {} - virtual Value *Codegen(); -}; - -/// CallExprAST - Expression class for function calls. -class CallExprAST : public ExprAST { - std::string Callee; - std::vector<ExprAST*> Args; -public: - CallExprAST(const std::string &callee, std::vector<ExprAST*> &args) - : Callee(callee), Args(args) {} - virtual Value *Codegen(); -}; - -/// PrototypeAST - This class represents the "prototype" for a function, -/// which captures its name, and its argument names (thus implicitly the number -/// of arguments the function takes). -class PrototypeAST { - std::string Name; - std::vector<std::string> Args; -public: - PrototypeAST(const std::string &name, const std::vector<std::string> &args) - : Name(name), Args(args) {} - - Function *Codegen(); -}; - -/// FunctionAST - This class represents a function definition itself. -class FunctionAST { - PrototypeAST *Proto; - ExprAST *Body; -public: - FunctionAST(PrototypeAST *proto, ExprAST *body) - : Proto(proto), Body(body) {} - - Function *Codegen(); -}; - -//===----------------------------------------------------------------------===// -// Parser -//===----------------------------------------------------------------------===// - -/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current -/// token the parser is looking at. getNextToken reads another token from the -/// lexer and updates CurTok with its results. -static int CurTok; -static int getNextToken() { - return CurTok = gettok(); -} - -/// BinopPrecedence - This holds the precedence for each binary operator that is -/// defined. -static std::map<char, int> BinopPrecedence; - -/// GetTokPrecedence - Get the precedence of the pending binary operator token. -static int GetTokPrecedence() { - if (!isascii(CurTok)) - return -1; - - // Make sure it's a declared binop. 
- int TokPrec = BinopPrecedence[CurTok]; - if (TokPrec <= 0) return -1; - return TokPrec; -} - -/// Error* - These are little helper functions for error handling. -ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;} -PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; } -FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; } - -static ExprAST *ParseExpression(); - -/// identifierexpr -/// ::= identifier -/// ::= identifier '(' expression* ')' -static ExprAST *ParseIdentifierExpr() { - std::string IdName = IdentifierStr; - - getNextToken(); // eat identifier. - - if (CurTok != '(') // Simple variable ref. - return new VariableExprAST(IdName); - - // Call. - getNextToken(); // eat ( - std::vector<ExprAST*> Args; - if (CurTok != ')') { - while (1) { - ExprAST *Arg = ParseExpression(); - if (!Arg) return 0; - Args.push_back(Arg); - - if (CurTok == ')') break; - - if (CurTok != ',') - return Error("Expected ')' or ',' in argument list"); - getNextToken(); - } - } - - // Eat the ')'. - getNextToken(); - - return new CallExprAST(IdName, Args); -} - -/// numberexpr ::= number -static ExprAST *ParseNumberExpr() { - ExprAST *Result = new NumberExprAST(NumVal); - getNextToken(); // consume the number - return Result; -} - -/// parenexpr ::= '(' expression ')' -static ExprAST *ParseParenExpr() { - getNextToken(); // eat (. - ExprAST *V = ParseExpression(); - if (!V) return 0; - - if (CurTok != ')') - return Error("expected ')'"); - getNextToken(); // eat ). 
- return V; -} - -/// primary -/// ::= identifierexpr -/// ::= numberexpr -/// ::= parenexpr -static ExprAST *ParsePrimary() { - switch (CurTok) { - default: return Error("unknown token when expecting an expression"); - case tok_identifier: return ParseIdentifierExpr(); - case tok_number: return ParseNumberExpr(); - case '(': return ParseParenExpr(); - } -} - -/// binoprhs -/// ::= ('+' primary)* -static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { - // If this is a binop, find its precedence. - while (1) { - int TokPrec = GetTokPrecedence(); - - // If this is a binop that binds at least as tightly as the current binop, - // consume it, otherwise we are done. - if (TokPrec < ExprPrec) - return LHS; - - // Okay, we know this is a binop. - int BinOp = CurTok; - getNextToken(); // eat binop - - // Parse the primary expression after the binary operator. - ExprAST *RHS = ParsePrimary(); - if (!RHS) return 0; - - // If BinOp binds less tightly with RHS than the operator after RHS, let - // the pending operator take RHS as its LHS. - int NextPrec = GetTokPrecedence(); - if (TokPrec < NextPrec) { - RHS = ParseBinOpRHS(TokPrec+1, RHS); - if (RHS == 0) return 0; - } - - // Merge LHS/RHS. - LHS = new BinaryExprAST(BinOp, LHS, RHS); - } -} - -/// expression -/// ::= primary binoprhs -/// -static ExprAST *ParseExpression() { - ExprAST *LHS = ParsePrimary(); - if (!LHS) return 0; - - return ParseBinOpRHS(0, LHS); -} - -/// prototype -/// ::= id '(' id* ')' -static PrototypeAST *ParsePrototype() { - if (CurTok != tok_identifier) - return ErrorP("Expected function name in prototype"); - - std::string FnName = IdentifierStr; - getNextToken(); - - if (CurTok != '(') - return ErrorP("Expected '(' in prototype"); - - std::vector<std::string> ArgNames; - while (getNextToken() == tok_identifier) - ArgNames.push_back(IdentifierStr); - if (CurTok != ')') - return ErrorP("Expected ')' in prototype"); - - // success. - getNextToken(); // eat ')'. 
- - return new PrototypeAST(FnName, ArgNames); -} - -/// definition ::= 'def' prototype expression -static FunctionAST *ParseDefinition() { - getNextToken(); // eat def. - PrototypeAST *Proto = ParsePrototype(); - if (Proto == 0) return 0; - - if (ExprAST *E = ParseExpression()) - return new FunctionAST(Proto, E); - return 0; -} - -/// toplevelexpr ::= expression -static FunctionAST *ParseTopLevelExpr() { - if (ExprAST *E = ParseExpression()) { - // Make an anonymous proto. - PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); - return new FunctionAST(Proto, E); - } - return 0; -} - -/// external ::= 'extern' prototype -static PrototypeAST *ParseExtern() { - getNextToken(); // eat extern. - return ParsePrototype(); -} - -//===----------------------------------------------------------------------===// -// Code Generation -//===----------------------------------------------------------------------===// - -static Module *TheModule; -static IRBuilder<> Builder(getGlobalContext()); -static std::map<std::string, Value*> NamedValues; - -Value *ErrorV(const char *Str) { Error(Str); return 0; } - -Value *NumberExprAST::Codegen() { - return ConstantFP::get(getGlobalContext(), APFloat(Val)); -} - -Value *VariableExprAST::Codegen() { - // Look this variable up in the function. - Value *V = NamedValues[Name]; - return V ? 
V : ErrorV("Unknown variable name"); -} - -Value *BinaryExprAST::Codegen() { - Value *L = LHS->Codegen(); - Value *R = RHS->Codegen(); - if (L == 0 || R == 0) return 0; - - switch (Op) { - case '+': return Builder.CreateFAdd(L, R, "addtmp"); - case '-': return Builder.CreateFSub(L, R, "subtmp"); - case '*': return Builder.CreateFMul(L, R, "multmp"); - case '<': - L = Builder.CreateFCmpULT(L, R, "cmptmp"); - // Convert bool 0/1 to double 0.0 or 1.0 - return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), - "booltmp"); - default: return ErrorV("invalid binary operator"); - } -} - -Value *CallExprAST::Codegen() { - // Look up the name in the global module table. - Function *CalleeF = TheModule->getFunction(Callee); - if (CalleeF == 0) - return ErrorV("Unknown function referenced"); - - // If argument mismatch error. - if (CalleeF->arg_size() != Args.size()) - return ErrorV("Incorrect # arguments passed"); - - std::vector<Value*> ArgsV; - for (unsigned i = 0, e = Args.size(); i != e; ++i) { - ArgsV.push_back(Args[i]->Codegen()); - if (ArgsV.back() == 0) return 0; - } - - return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp"); -} - -Function *PrototypeAST::Codegen() { - // Make the function type: double(double,double) etc. - std::vector<const Type*> Doubles(Args.size(), - Type::getDoubleTy(getGlobalContext())); - FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), - Doubles, false); - - Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule); - - // If F conflicted, there was already something named 'Name'. If it has a - // body, don't allow redefinition or reextern. - if (F->getName() != Name) { - // Delete the one we just made and get the existing one. - F->eraseFromParent(); - F = TheModule->getFunction(Name); - - // If F already has a body, reject this. - if (!F->empty()) { - ErrorF("redefinition of function"); - return 0; - } - - // If F took a different number of args, reject.
- if (F->arg_size() != Args.size()) { - ErrorF("redefinition of function with different # args"); - return 0; - } - } - - // Set names for all arguments. - unsigned Idx = 0; - for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); - ++AI, ++Idx) { - AI->setName(Args[Idx]); - - // Add arguments to variable symbol table. - NamedValues[Args[Idx]] = AI; - } - - return F; -} - -Function *FunctionAST::Codegen() { - NamedValues.clear(); - - Function *TheFunction = Proto->Codegen(); - if (TheFunction == 0) - return 0; - - // Create a new basic block to start insertion into. - BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); - Builder.SetInsertPoint(BB); - - if (Value *RetVal = Body->Codegen()) { - // Finish off the function. - Builder.CreateRet(RetVal); - - // Validate the generated code, checking for consistency. - verifyFunction(*TheFunction); - - return TheFunction; - } - - // Error reading body, remove function. - TheFunction->eraseFromParent(); - return 0; -} - -//===----------------------------------------------------------------------===// -// Top-Level parsing and JIT Driver -//===----------------------------------------------------------------------===// - -static void HandleDefinition() { - if (FunctionAST *F = ParseDefinition()) { - if (Function *LF = F->Codegen()) { - fprintf(stderr, "Read function definition:"); - LF->dump(); - } - } else { - // Skip token for error recovery. - getNextToken(); - } -} - -static void HandleExtern() { - if (PrototypeAST *P = ParseExtern()) { - if (Function *F = P->Codegen()) { - fprintf(stderr, "Read extern: "); - F->dump(); - } - } else { - // Skip token for error recovery. - getNextToken(); - } -} - -static void HandleTopLevelExpression() { - // Evaluate a top-level expression into an anonymous function. 
- if (FunctionAST *F = ParseTopLevelExpr()) { - if (Function *LF = F->Codegen()) { - fprintf(stderr, "Read top-level expression:"); - LF->dump(); - } - } else { - // Skip token for error recovery. - getNextToken(); - } -} - -/// top ::= definition | external | expression | ';' -static void MainLoop() { - while (1) { - fprintf(stderr, "ready> "); - switch (CurTok) { - case tok_eof: return; - case ';': getNextToken(); break; // ignore top-level semicolons. - case tok_def: HandleDefinition(); break; - case tok_extern: HandleExtern(); break; - default: HandleTopLevelExpression(); break; - } - } -} - -//===----------------------------------------------------------------------===// -// "Library" functions that can be "extern'd" from user code. -//===----------------------------------------------------------------------===// - -/// putchard - putchar that takes a double and returns 0. -extern "C" -double putchard(double X) { - putchar((char)X); - return 0; -} - -//===----------------------------------------------------------------------===// -// Main driver code. -//===----------------------------------------------------------------------===// - -int main() { - LLVMContext &Context = getGlobalContext(); - - // Install standard binary operators. - // 1 is lowest precedence. - BinopPrecedence['<'] = 10; - BinopPrecedence['+'] = 20; - BinopPrecedence['-'] = 20; - BinopPrecedence['*'] = 40; // highest. - - // Prime the first token. - fprintf(stderr, "ready> "); - getNextToken(); - - // Make the module, which holds all the code. - TheModule = new Module("my cool jit", Context); - - // Run the main "interpreter loop" now. - MainLoop(); - - // Print out all of the generated code. 
- TheModule->dump(); - - return 0; -} -</pre> -</div> -<a href="LangImpl4.html">Next: Adding JIT and Optimizer Support</a> -</div> - -<!-- *********************************************************************** --> -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> - - <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> - <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/tutorial/LangImpl4.html b/docs/tutorial/LangImpl4.html deleted file mode 100644 index 230e6e5..0000000 --- a/docs/tutorial/LangImpl4.html +++ /dev/null @@ -1,1132 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> - -<html> -<head> - <title>Kaleidoscope: Adding JIT and Optimizer Support</title> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <meta name="author" content="Chris Lattner"> - <link rel="stylesheet" href="../llvm.css" type="text/css"> -</head> - -<body> - -<div class="doc_title">Kaleidoscope: Adding JIT and Optimizer Support</div> - -<ul> -<li><a href="index.html">Up to Tutorial Index</a></li> -<li>Chapter 4 - <ol> - <li><a href="#intro">Chapter 4 Introduction</a></li> - <li><a href="#trivialconstfold">Trivial Constant Folding</a></li> - <li><a href="#optimizerpasses">LLVM Optimization Passes</a></li> - <li><a href="#jit">Adding a JIT Compiler</a></li> - <li><a href="#code">Full Code Listing</a></li> - </ol> -</li> -<li><a href="LangImpl5.html">Chapter 5</a>: Extending the Language: Control -Flow</li> -</ul> - -<div class="doc_author"> - <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p> -</div> - -<!-- 
*********************************************************************** --> -<div class="doc_section"><a name="intro">Chapter 4 Introduction</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Welcome to Chapter 4 of the "<a href="index.html">Implementing a language -with LLVM</a>" tutorial. Chapters 1-3 described the implementation of a simple -language and added support for generating LLVM IR. This chapter describes -two new techniques: adding optimizer support to your language, and adding JIT -compiler support. These additions will demonstrate how to get nice, efficient code -for the Kaleidoscope language.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="trivialconstfold">Trivial Constant -Folding</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p> -Our demonstration for Chapter 3 is elegant and easy to extend. Unfortunately, -it does not produce wonderful code. The IRBuilder, however, does give us -obvious optimizations when compiling simple code:</p> - -<div class="doc_code"> -<pre> -ready> <b>def test(x) 1+2+x;</b> -Read function definition: -define double @test(double %x) { -entry: - %addtmp = fadd double 3.000000e+00, %x - ret double %addtmp -} -</pre> -</div> - -<p>This code is not a literal transcription of the AST built by parsing the -input. 
- That would be:</p> - -<div class="doc_code"> -<pre> -ready> <b>def test(x) 1+2+x;</b> -Read function definition: -define double @test(double %x) { -entry: - %addtmp = fadd double 1.000000e+00, 2.000000e+00 - %addtmp1 = fadd double %addtmp, %x - ret double %addtmp1 -} -</pre> -</div> - -<p>Constant folding, as seen above, is a very common and very -important optimization: so much so that many language implementors implement -constant folding support in their AST representation.</p> - -<p>With LLVM, you don't need this support in the AST. Since all calls to build -LLVM IR go through the LLVM IR builder, the builder itself checks to see if -there is a constant folding opportunity when you call it. If so, it just does -the constant fold and returns the constant instead of creating an instruction.</p> - -<p>Well, that was easy :). In practice, we recommend always using -<tt>IRBuilder</tt> when generating code like this. It has no -"syntactic overhead" for its use (you don't have to uglify your compiler with -constant checks everywhere) and it can dramatically reduce the amount of -LLVM IR that is generated in some cases (particularly for languages with a macro -preprocessor or that use a lot of constants).</p> - -<p>On the other hand, the <tt>IRBuilder</tt> is limited by the fact -that it does all of its analysis inline with the code as it is built. If you -take a slightly more complex example:</p> - -<div class="doc_code"> -<pre> -ready> <b>def test(x) (1+2+x)*(x+(1+2));</b> -ready> Read function definition: -define double @test(double %x) { -entry: - %addtmp = fadd double 3.000000e+00, %x - %addtmp1 = fadd double %x, 3.000000e+00 - %multmp = fmul double %addtmp, %addtmp1 - ret double %multmp -} -</pre> -</div> - -<p>In this case, the LHS and RHS of the multiplication are the same value.
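The folding behaviour described above can be modelled with a tiny builder that checks its operands at creation time. This is a hypothetical toy, not LLVM's API: the names `ToyValue`, `ToyBuilder`, `getConstant`, `getArgument` and `createFAdd` are invented for illustration, and only `double` addition is modelled.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// A toy value: either a floating-point constant or a named non-constant value.
// Hypothetical type for illustration; this is not LLVM's Value class.
struct ToyValue {
  bool IsConst;
  double Val;        // meaningful only when IsConst is true
  std::string Name;  // meaningful only when IsConst is false
};

// A toy builder that folds constants at creation time, in the spirit of
// IRBuilder. It records every instruction it actually emits.
class ToyBuilder {
  std::vector<std::unique_ptr<ToyValue>> Values;  // owns all values

  ToyValue *make(bool IsConst, double Val, const std::string &Name) {
    Values.push_back(std::unique_ptr<ToyValue>(new ToyValue{IsConst, Val, Name}));
    return Values.back().get();
  }

  static std::string operand(const ToyValue *V) {
    return V->IsConst ? std::to_string(V->Val) : "%" + V->Name;
  }

public:
  std::vector<std::string> Emitted;  // textual log of emitted instructions

  ToyValue *getConstant(double V) { return make(true, V, ""); }
  ToyValue *getArgument(const std::string &Name) { return make(false, 0.0, Name); }

  ToyValue *createFAdd(ToyValue *L, ToyValue *R, const std::string &Name) {
    assert(L && R && "operands must be non-null");
    // Both operands are constants: fold and return a constant, emit nothing.
    if (L->IsConst && R->IsConst)
      return getConstant(L->Val + R->Val);
    // Otherwise emit a real instruction and return a value naming its result.
    Emitted.push_back("%" + Name + " = fadd " + operand(L) + ", " + operand(R));
    return make(false, 0.0, Name);
  }
};
```

Building `1+2+x` with this toy folds `1+2` into the constant `3.0` and emits a single `fadd` of that constant and `%x`, mirroring the optimized output shown earlier. The same sketch also shows the limitation: each call only sees its own two operands, so it cannot notice that two separately built adds compute the same value.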
- We'd -really like to see this generate "<tt>tmp = x+3; result = tmp*tmp;</tt>" instead -of computing "<tt>x+3</tt>" twice.</p> - -<p>Unfortunately, no amount of local analysis will be able to detect and correct -this. This requires two transformations: reassociation of expressions (to -make the adds lexically identical) and Common Subexpression Elimination (CSE) -to delete the redundant add instruction. Fortunately, LLVM provides a broad -range of optimizations that you can use, in the form of "passes".</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="optimizerpasses">LLVM Optimization - Passes</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>LLVM provides many optimization passes, which do many different sorts of -things and have different tradeoffs. Unlike other systems, LLVM doesn't hold -to the mistaken notion that one set of optimizations is right for all languages -and for all situations. LLVM allows a compiler implementor to make complete -decisions about what optimizations to use, in which order, and in what -situation.</p> - -<p>As a concrete example, LLVM supports both "whole module" passes, which look -across as large a body of code as they can (often a whole file, but if run -at link time, this can be a substantial portion of the whole program), and -"per-function" passes, which just operate on a single function at a time, -without looking at other functions. For more information -on passes and how they are run, see the <a href="../WritingAnLLVMPass.html">How -to Write a Pass</a> document and the <a href="../Passes.html">List of LLVM -Passes</a>.</p> - -<p>For Kaleidoscope, we are currently generating functions on the fly, one at -a time, as the user types them in.
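To make the two-step fix concrete, the sketch below gives each "operator plus operands" combination a value number and reuses an earlier number when the same combination appears again (the CSE step). Sorting the operands of a commutative operator into a canonical order stands in, in a much simplified form, for reassociation. The `ValueNumbering` class is hypothetical and is not how LLVM's Reassociate or GVN passes are implemented.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical value-numbering table: canonicalize operand order (a crude
// stand-in for reassociation), then reuse identical expressions (CSE).
class ValueNumbering {
  // Key: (opcode, lhs operand, rhs operand) -> value number.
  std::map<std::tuple<char, std::string, std::string>, int> Table;
  int Next = 0;

public:
  int Created = 0;  // number of distinct instructions actually needed

  // Returns the value number for the expression "L Op R".
  int get(char Op, std::string L, std::string R) {
    // Commutative operators: sort the operands so that "3+x" and "x+3"
    // produce the same key. Non-commutative operators keep their order.
    if ((Op == '+' || Op == '*') && R < L)
      std::swap(L, R);
    std::tuple<char, std::string, std::string> Key(Op, L, R);
    auto It = Table.find(Key);
    if (It != Table.end())
      return It->second;  // CSE hit: reuse the earlier value
    ++Created;
    return Table[Key] = Next++;
  }
};
```

For the example above, `get('+', "3", "%x")` and `get('+', "%x", "3")` return the same number, so only one add is needed and the multiply can use it twice. Real reassociation also regroups longer chains such as `(a+b)+c`, which this sketch does not attempt.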
- We aren't shooting for the ultimate -optimization experience in this setting, but we also want to catch the easy and -quick stuff where possible. As such, we will choose to run a few per-function -optimizations as the user types the function in. If we wanted to make a "static -Kaleidoscope compiler", we would use exactly the code we have now, except that -we would defer running the optimizer until the entire file has been parsed.</p> - -<p>In order to get per-function optimizations going, we need to set up a -<a href="../WritingAnLLVMPass.html#passmanager">FunctionPassManager</a> to hold and -organize the LLVM optimizations that we want to run. Once we have that, we can -add a set of optimizations to run. The code looks like this:</p> - -<div class="doc_code"> -<pre> - FunctionPassManager OurFPM(TheModule); - - // Set up the optimizer pipeline. Start with registering info about how the - // target lays out data structures. - OurFPM.add(new TargetData(*TheExecutionEngine->getTargetData())); - // Do simple "peephole" optimizations and bit-twiddling optzns. - OurFPM.add(createInstructionCombiningPass()); - // Reassociate expressions. - OurFPM.add(createReassociatePass()); - // Eliminate Common SubExpressions. - OurFPM.add(createGVNPass()); - // Simplify the control flow graph (deleting unreachable blocks, etc). - OurFPM.add(createCFGSimplificationPass()); - - OurFPM.doInitialization(); - - // Set the global so the code gen can use this. - TheFPM = &OurFPM; - - // Run the main "interpreter loop" now. - MainLoop(); -</pre> -</div> - -<p>This code defines a <tt>FunctionPassManager</tt>, "<tt>OurFPM</tt>". It -requires a pointer to the <tt>Module</tt> to construct itself. Once it is set -up, we use a series of "add" calls to add a bunch of LLVM passes. The first -pass is basically boilerplate: it adds a pass so that later optimizations know -how the data structures in the program are laid out.
The -"<tt>TheExecutionEngine</tt>" variable is related to the JIT, which we will get -to in the next section.</p> - -<p>In this case, we choose to add 4 optimization passes. The passes we chose -here are a pretty standard set of "cleanup" optimizations that are useful for -a wide variety of code. I won't delve into what they do but, believe me, -they are a good starting place :).</p> - -<p>Once the PassManager is set up, we need to make use of it. We do this by -running it after our newly created function is constructed (in -<tt>FunctionAST::Codegen</tt>), but before it is returned to the client:</p> - -<div class="doc_code"> -<pre> - if (Value *RetVal = Body->Codegen()) { - // Finish off the function. - Builder.CreateRet(RetVal); - - // Validate the generated code, checking for consistency. - verifyFunction(*TheFunction); - - <b>// Optimize the function. - TheFPM->run(*TheFunction);</b> - - return TheFunction; - } -</pre> -</div> - -<p>As you can see, this is pretty straightforward. The -<tt>FunctionPassManager</tt> optimizes and updates the LLVM Function* in place, -improving (hopefully) its body. With this in place, we can try our test above -again:</p> - -<div class="doc_code"> -<pre> -ready> <b>def test(x) (1+2+x)*(x+(1+2));</b> -ready> Read function definition: -define double @test(double %x) { -entry: - %addtmp = fadd double %x, 3.000000e+00 - %multmp = fmul double %addtmp, %addtmp - ret double %multmp -} -</pre> -</div> - -<p>As expected, we now get our nicely optimized code, saving a floating point -add instruction from every execution of this function.</p> - -<p>LLVM provides a wide variety of optimizations that can be used in certain -circumstances. Some <a href="../Passes.html">documentation about the various -passes</a> is available, but it isn't very complete. Another good source of -ideas can come from looking at the passes that <tt>llvm-gcc</tt> or -<tt>llvm-ld</tt> run to get started. 
- The "<tt>opt</tt>" tool allows you to -experiment with passes from the command line, so you can see if they do -anything.</p> - -<p>Now that we have reasonable code coming out of our front-end, let's talk about -executing it!</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="jit">Adding a JIT Compiler</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Code that is available in LLVM IR can have a wide variety of tools -applied to it. For example, you can run optimizations on it (as we did above), -you can dump it out in textual or binary forms, you can compile the code to an -assembly file (.s) for some target, or you can JIT compile it. The nice thing -about the LLVM IR representation is that it is the "common currency" between -many different parts of the compiler. -</p> - -<p>In this section, we'll add JIT compiler support to our interpreter. The -basic idea that we want for Kaleidoscope is to have the user enter function -bodies as they do now, but immediately evaluate the top-level expressions they -type in. For example, if they type in "1 + 2;", we should evaluate and print -out 3. If they define a function, they should be able to call it from the -command line.</p> - -<p>In order to do this, we first declare and initialize the JIT. This is done -by adding a global variable and a call in <tt>main</tt>:</p> - -<div class="doc_code"> -<pre> -<b>static ExecutionEngine *TheExecutionEngine;</b> -... -int main() { - .. - <b>// Create the JIT. This takes ownership of the module. - TheExecutionEngine = EngineBuilder(TheModule).create();</b> - .. -} -</pre> -</div> - -<p>This creates an abstract "Execution Engine" which can be either a JIT -compiler or the LLVM interpreter.
- LLVM will automatically pick a JIT compiler -for you if one is available for your platform; otherwise it will fall back to -the interpreter.</p> - -<p>Once the <tt>ExecutionEngine</tt> is created, the JIT is ready to be used. -There are a variety of APIs that are useful, but the simplest one is the -"<tt>getPointerToFunction(F)</tt>" method. This method JIT compiles the -specified LLVM Function and returns a function pointer to the generated machine -code. In our case, this means that we can change the code that parses a -top-level expression to look like this:</p> - -<div class="doc_code"> -<pre> -static void HandleTopLevelExpression() { - // Evaluate a top-level expression into an anonymous function. - if (FunctionAST *F = ParseTopLevelExpr()) { - if (Function *LF = F->Codegen()) { - LF->dump(); // Dump the function for exposition purposes. - - <b>// JIT the function, returning a function pointer. - void *FPtr = TheExecutionEngine->getPointerToFunction(LF); - - // Cast it to the right type (takes no arguments, returns a double) so we - // can call it as a native function. - double (*FP)() = (double (*)())(intptr_t)FPtr; - fprintf(stderr, "Evaluated to %f\n", FP());</b> - } - } else { - // Skip token for error recovery. - getNextToken(); - } -} -</pre> -</div> - -<p>Recall that we compile top-level expressions into a self-contained LLVM -function that takes no arguments and returns the computed double. Because the -LLVM JIT compiler matches the native platform ABI, this means that you can just -cast the result pointer to a function pointer of that type and call it directly. -This means there is no difference between JIT compiled code and native machine -code that is statically linked into your application.</p> - -<p>With just these two changes, let's see how Kaleidoscope works now!</p> - -<div class="doc_code"> -<pre> -ready> <b>4+5;</b> -define double @""() { -entry: - ret double 9.000000e+00 -} - -<em>Evaluated to 9.000000</em> -</pre> -</div> - -<p>Well, this looks like it is basically working.
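The cast in <tt>HandleTopLevelExpression</tt> works because JIT'd code follows the native ABI, so an untyped <tt>void*</tt> can be turned back into an ordinary function pointer. The sketch below exercises the same cast without LLVM, assuming a statically compiled function (`AnonExpr`, hypothetical) stands in for the JIT'd top-level expression and a hypothetical `GetPointer` stands in for <tt>getPointerToFunction</tt>. Converting a function pointer to `void*` is only conditionally supported in C++, but POSIX platforms support it.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a JIT-compiled top-level expression such as "4+5;":
// takes no arguments and returns a double. Here it is ordinary native code.
static double AnonExpr() { return 4.0 + 5.0; }

// Hypothetical stand-in for getPointerToFunction: returns an untyped pointer
// to machine code. (Function-pointer to void* conversion is conditionally
// supported in C++; POSIX platforms allow it.)
static void *GetPointer() {
  return reinterpret_cast<void *>(&AnonExpr);
}

// Recovers a callable pointer using the same cast the tutorial uses,
// then calls it like any native function.
static double CallUntyped(void *FPtr) {
  assert(FPtr != nullptr && "no code to call");
  double (*FP)() = (double (*)())(std::intptr_t)FPtr;
  return FP();
}
```

`CallUntyped(GetPointer())` returns `9.0`, just as the JIT'd anonymous function for `4+5;` evaluates to 9. The point of the sketch is that nothing about the call site distinguishes JIT'd code from statically linked code once the pointer has the right type.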
- The dump of the function -shows the "no argument function that always returns double" that we synthesize -for each top-level expression that is typed in. This demonstrates very basic -functionality, but can we do more?</p> - -<div class="doc_code"> -<pre> -ready> <b>def testfunc(x y) x + y*2; </b> -Read function definition: -define double @testfunc(double %x, double %y) { -entry: - %multmp = fmul double %y, 2.000000e+00 - %addtmp = fadd double %multmp, %x - ret double %addtmp -} - -ready> <b>testfunc(4, 10);</b> -define double @""() { -entry: - %calltmp = call double @testfunc( double 4.000000e+00, double 1.000000e+01 ) - ret double %calltmp -} - -<em>Evaluated to 24.000000</em> -</pre> -</div> - -<p>This illustrates that we can now call user code, but there is something a bit -subtle going on here. Note that we only invoke the JIT on the anonymous -functions that <em>call testfunc</em>, but we never invoke it -on <em>testfunc</em> itself. What actually happened here is that the JIT -scanned for all non-JIT'd functions transitively called from the anonymous -function and compiled all of them before returning -from <tt>getPointerToFunction()</tt>.</p> - -<p>The JIT provides a number of other, more advanced interfaces for things like -freeing allocated machine code, re-JIT'ing functions to update them, etc.
-However, even with this simple code, we get some surprisingly powerful -capabilities - check this out (I removed the dump of the anonymous functions, -you should get the idea by now :) :</p> - -<div class="doc_code"> -<pre> -ready> <b>extern sin(x);</b> -Read extern: -declare double @sin(double) - -ready> <b>extern cos(x);</b> -Read extern: -declare double @cos(double) - -ready> <b>sin(1.0);</b> -<em>Evaluated to 0.841471</em> - -ready> <b>def foo(x) sin(x)*sin(x) + cos(x)*cos(x);</b> -Read function definition: -define double @foo(double %x) { -entry: - %calltmp = call double @sin( double %x ) - %multmp = fmul double %calltmp, %calltmp - %calltmp2 = call double @cos( double %x ) - %multmp4 = fmul double %calltmp2, %calltmp2 - %addtmp = fadd double %multmp, %multmp4 - ret double %addtmp -} - -ready> <b>foo(4.0);</b> -<em>Evaluated to 1.000000</em> -</pre> -</div> - -<p>Whoa, how does the JIT know about sin and cos? The answer is surprisingly -simple: in this -example, the JIT started execution of a function and got to a function call. It -realized that the function was not yet JIT compiled and invoked the standard set -of routines to resolve the function. In this case, there is no body defined -for the function, so the JIT ended up calling "<tt>dlsym("sin")</tt>" on the -Kaleidoscope process itself. -Since "<tt>sin</tt>" is defined within the JIT's address space, it simply -patches up calls in the module to call the libm version of <tt>sin</tt> -directly.</p> - -<p>The LLVM JIT provides a number of interfaces (look in the -<tt>ExecutionEngine.h</tt> file) for controlling how unknown functions get -resolved. 
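The resolution order described here (explicit mappings first, then a search of the process itself, as <tt>dlsym</tt> does) can be modelled with two lookup tables. This is a hypothetical sketch: `ToyResolver`, `addMapping` and `lookup` are invented names, not the <tt>ExecutionEngine</tt> interface, and a small hand-filled table stands in for the process's real symbol table so the example runs without <tt>dlsym</tt>.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <string>

typedef double (*UnaryFn)(double);

// Hypothetical resolver: explicit user mappings win; otherwise fall back to a
// table standing in for the symbols the process already contains (what the
// JIT would find via dlsym). Unknown names resolve to nullptr.
class ToyResolver {
  std::map<std::string, UnaryFn> Explicit;
  std::map<std::string, UnaryFn> Process;

public:
  ToyResolver() {
    // Captureless lambdas convert to plain function pointers.
    Process["sin"] = [](double X) { return std::sin(X); };
    Process["cos"] = [](double X) { return std::cos(X); };
  }

  // Establish an explicit name -> address mapping.
  void addMapping(const std::string &Name, UnaryFn Addr) { Explicit[Name] = Addr; }

  UnaryFn lookup(const std::string &Name) const {
    auto It = Explicit.find(Name);
    if (It != Explicit.end())
      return It->second;
    auto Jt = Process.find(Name);
    return Jt != Process.end() ? Jt->second : nullptr;
  }
};
```

With this sketch, `lookup("sin")(1.0)` is approximately `0.841471`, and `sin(4.0)*sin(4.0) + cos(4.0)*cos(4.0)` computed through the resolved pointers is `1.0` to within rounding, matching the `foo(4.0)` session above. Putting explicit mappings first is a design choice of the sketch that lets a client override a process symbol by name.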
It allows you to establish explicit mappings between IR objects and -addresses (useful for LLVM global variables that you want to map to static -tables, for example), allows you to dynamically decide on the fly based on the -function name, and even allows you to have the JIT compile functions lazily the -first time they're called.</p> - -<p>One interesting application of this is that we can now extend the language -by writing arbitrary C++ code to implement operations. For example, if we add: -</p> - -<div class="doc_code"> -<pre> -/// putchard - putchar that takes a double and returns 0. -extern "C" -double putchard(double X) { - putchar((char)X); - return 0; -} -</pre> -</div> - -<p>Now we can produce simple output to the console by using things like: -"<tt>extern putchard(x); putchard(120);</tt>", which prints a lowercase 'x' on -the console (120 is the ASCII code for 'x'). Similar code could be used to -implement file I/O, console input, and many other capabilities in -Kaleidoscope.</p> - -<p>This completes the JIT and optimizer chapter of the Kaleidoscope tutorial. At -this point, we can compile a non-Turing-complete programming language, optimize -and JIT compile it in a user-driven way. Next up we'll look into <a -href="LangImpl5.html">extending the language with control flow constructs</a>, -tackling some interesting LLVM IR issues along the way.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="code">Full Code Listing</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p> -Here is the complete code listing for our running example, enhanced with the -LLVM JIT and optimizer. 
- To build this example, use: -</p> - -<div class="doc_code"> -<pre> - # Compile - g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy - # Run - ./toy -</pre> -</div> - -<p> -If you are compiling this on Linux, make sure to add the "-rdynamic" option -as well. This exports the executable's own symbols (such as <tt>putchard</tt>) -to the dynamic symbol table, so that the JIT can resolve external functions -at runtime.</p> - -<p>Here is the code:</p> - -<div class="doc_code"> -<pre> -#include "llvm/DerivedTypes.h" -#include "llvm/ExecutionEngine/ExecutionEngine.h" -#include "llvm/ExecutionEngine/JIT.h" -#include "llvm/LLVMContext.h" -#include "llvm/Module.h" -#include "llvm/PassManager.h" -#include "llvm/Analysis/Verifier.h" -#include "llvm/Target/TargetData.h" -#include "llvm/Target/TargetSelect.h" -#include "llvm/Transforms/Scalar.h" -#include "llvm/Support/IRBuilder.h" -#include <cctype> -#include <cstdio> -#include <cstdlib> -#include <string> -#include <map> -#include <vector> -using namespace llvm; - -//===----------------------------------------------------------------------===// -// Lexer -//===----------------------------------------------------------------------===// - -// The lexer returns tokens [0-255] if it is an unknown character, otherwise one -// of these for known things. -enum Token { - tok_eof = -1, - - // commands - tok_def = -2, tok_extern = -3, - - // primary - tok_identifier = -4, tok_number = -5 -}; - -static std::string IdentifierStr; // Filled in if tok_identifier -static double NumVal; // Filled in if tok_number - -/// gettok - Return the next token from standard input. -static int gettok() { - static int LastChar = ' '; - - // Skip any whitespace.
- while (isspace(LastChar)) - LastChar = getchar(); - - if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]* - IdentifierStr = LastChar; - while (isalnum((LastChar = getchar()))) - IdentifierStr += LastChar; - - if (IdentifierStr == "def") return tok_def; - if (IdentifierStr == "extern") return tok_extern; - return tok_identifier; - } - - if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+ - std::string NumStr; - do { - NumStr += LastChar; - LastChar = getchar(); - } while (isdigit(LastChar) || LastChar == '.'); - - NumVal = strtod(NumStr.c_str(), 0); - return tok_number; - } - - if (LastChar == '#') { - // Comment until end of line. - do LastChar = getchar(); - while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); - - if (LastChar != EOF) - return gettok(); - } - - // Check for end of file. Don't eat the EOF. - if (LastChar == EOF) - return tok_eof; - - // Otherwise, just return the character as its ascii value. - int ThisChar = LastChar; - LastChar = getchar(); - return ThisChar; -} - -//===----------------------------------------------------------------------===// -// Abstract Syntax Tree (aka Parse Tree) -//===----------------------------------------------------------------------===// - -/// ExprAST - Base class for all expression nodes. -class ExprAST { -public: - virtual ~ExprAST() {} - virtual Value *Codegen() = 0; -}; - -/// NumberExprAST - Expression class for numeric literals like "1.0". -class NumberExprAST : public ExprAST { - double Val; -public: - NumberExprAST(double val) : Val(val) {} - virtual Value *Codegen(); -}; - -/// VariableExprAST - Expression class for referencing a variable, like "a". -class VariableExprAST : public ExprAST { - std::string Name; -public: - VariableExprAST(const std::string &name) : Name(name) {} - virtual Value *Codegen(); -}; - -/// BinaryExprAST - Expression class for a binary operator. 
-class BinaryExprAST : public ExprAST { - char Op; - ExprAST *LHS, *RHS; -public: - BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) - : Op(op), LHS(lhs), RHS(rhs) {} - virtual Value *Codegen(); -}; - -/// CallExprAST - Expression class for function calls. -class CallExprAST : public ExprAST { - std::string Callee; - std::vector<ExprAST*> Args; -public: - CallExprAST(const std::string &callee, std::vector<ExprAST*> &args) - : Callee(callee), Args(args) {} - virtual Value *Codegen(); -}; - -/// PrototypeAST - This class represents the "prototype" for a function, -/// which captures its name, and its argument names (thus implicitly the number -/// of arguments the function takes). -class PrototypeAST { - std::string Name; - std::vector<std::string> Args; -public: - PrototypeAST(const std::string &name, const std::vector<std::string> &args) - : Name(name), Args(args) {} - - Function *Codegen(); -}; - -/// FunctionAST - This class represents a function definition itself. -class FunctionAST { - PrototypeAST *Proto; - ExprAST *Body; -public: - FunctionAST(PrototypeAST *proto, ExprAST *body) - : Proto(proto), Body(body) {} - - Function *Codegen(); -}; - -//===----------------------------------------------------------------------===// -// Parser -//===----------------------------------------------------------------------===// - -/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current -/// token the parser is looking at. getNextToken reads another token from the -/// lexer and updates CurTok with its results. -static int CurTok; -static int getNextToken() { - return CurTok = gettok(); -} - -/// BinopPrecedence - This holds the precedence for each binary operator that is -/// defined. -static std::map<char, int> BinopPrecedence; - -/// GetTokPrecedence - Get the precedence of the pending binary operator token. -static int GetTokPrecedence() { - if (!isascii(CurTok)) - return -1; - - // Make sure it's a declared binop. 
- int TokPrec = BinopPrecedence[CurTok]; - if (TokPrec <= 0) return -1; - return TokPrec; -} - -/// Error* - These are little helper functions for error handling. -ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;} -PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; } -FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; } - -static ExprAST *ParseExpression(); - -/// identifierexpr -/// ::= identifier -/// ::= identifier '(' expression* ')' -static ExprAST *ParseIdentifierExpr() { - std::string IdName = IdentifierStr; - - getNextToken(); // eat identifier. - - if (CurTok != '(') // Simple variable ref. - return new VariableExprAST(IdName); - - // Call. - getNextToken(); // eat ( - std::vector<ExprAST*> Args; - if (CurTok != ')') { - while (1) { - ExprAST *Arg = ParseExpression(); - if (!Arg) return 0; - Args.push_back(Arg); - - if (CurTok == ')') break; - - if (CurTok != ',') - return Error("Expected ')' or ',' in argument list"); - getNextToken(); - } - } - - // Eat the ')'. - getNextToken(); - - return new CallExprAST(IdName, Args); -} - -/// numberexpr ::= number -static ExprAST *ParseNumberExpr() { - ExprAST *Result = new NumberExprAST(NumVal); - getNextToken(); // consume the number - return Result; -} - -/// parenexpr ::= '(' expression ')' -static ExprAST *ParseParenExpr() { - getNextToken(); // eat (. - ExprAST *V = ParseExpression(); - if (!V) return 0; - - if (CurTok != ')') - return Error("expected ')'"); - getNextToken(); // eat ). 
- return V; -} - -/// primary -/// ::= identifierexpr -/// ::= numberexpr -/// ::= parenexpr -static ExprAST *ParsePrimary() { - switch (CurTok) { - default: return Error("unknown token when expecting an expression"); - case tok_identifier: return ParseIdentifierExpr(); - case tok_number: return ParseNumberExpr(); - case '(': return ParseParenExpr(); - } -} - -/// binoprhs -/// ::= ('+' primary)* -static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { - // If this is a binop, find its precedence. - while (1) { - int TokPrec = GetTokPrecedence(); - - // If this is a binop that binds at least as tightly as the current binop, - // consume it, otherwise we are done. - if (TokPrec < ExprPrec) - return LHS; - - // Okay, we know this is a binop. - int BinOp = CurTok; - getNextToken(); // eat binop - - // Parse the primary expression after the binary operator. - ExprAST *RHS = ParsePrimary(); - if (!RHS) return 0; - - // If BinOp binds less tightly with RHS than the operator after RHS, let - // the pending operator take RHS as its LHS. - int NextPrec = GetTokPrecedence(); - if (TokPrec < NextPrec) { - RHS = ParseBinOpRHS(TokPrec+1, RHS); - if (RHS == 0) return 0; - } - - // Merge LHS/RHS. - LHS = new BinaryExprAST(BinOp, LHS, RHS); - } -} - -/// expression -/// ::= primary binoprhs -/// -static ExprAST *ParseExpression() { - ExprAST *LHS = ParsePrimary(); - if (!LHS) return 0; - - return ParseBinOpRHS(0, LHS); -} - -/// prototype -/// ::= id '(' id* ')' -static PrototypeAST *ParsePrototype() { - if (CurTok != tok_identifier) - return ErrorP("Expected function name in prototype"); - - std::string FnName = IdentifierStr; - getNextToken(); - - if (CurTok != '(') - return ErrorP("Expected '(' in prototype"); - - std::vector<std::string> ArgNames; - while (getNextToken() == tok_identifier) - ArgNames.push_back(IdentifierStr); - if (CurTok != ')') - return ErrorP("Expected ')' in prototype"); - - // success. - getNextToken(); // eat ')'. 
- - return new PrototypeAST(FnName, ArgNames); -} - -/// definition ::= 'def' prototype expression -static FunctionAST *ParseDefinition() { - getNextToken(); // eat def. - PrototypeAST *Proto = ParsePrototype(); - if (Proto == 0) return 0; - - if (ExprAST *E = ParseExpression()) - return new FunctionAST(Proto, E); - return 0; -} - -/// toplevelexpr ::= expression -static FunctionAST *ParseTopLevelExpr() { - if (ExprAST *E = ParseExpression()) { - // Make an anonymous proto. - PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); - return new FunctionAST(Proto, E); - } - return 0; -} - -/// external ::= 'extern' prototype -static PrototypeAST *ParseExtern() { - getNextToken(); // eat extern. - return ParsePrototype(); -} - -//===----------------------------------------------------------------------===// -// Code Generation -//===----------------------------------------------------------------------===// - -static Module *TheModule; -static IRBuilder<> Builder(getGlobalContext()); -static std::map<std::string, Value*> NamedValues; -static FunctionPassManager *TheFPM; - -Value *ErrorV(const char *Str) { Error(Str); return 0; } - -Value *NumberExprAST::Codegen() { - return ConstantFP::get(getGlobalContext(), APFloat(Val)); -} - -Value *VariableExprAST::Codegen() { - // Look this variable up in the function. - Value *V = NamedValues[Name]; - return V ? 
- V : ErrorV("Unknown variable name"); -} - -Value *BinaryExprAST::Codegen() { - Value *L = LHS->Codegen(); - Value *R = RHS->Codegen(); - if (L == 0 || R == 0) return 0; - - switch (Op) { - case '+': return Builder.CreateFAdd(L, R, "addtmp"); - case '-': return Builder.CreateFSub(L, R, "subtmp"); - case '*': return Builder.CreateFMul(L, R, "multmp"); - case '<': - L = Builder.CreateFCmpULT(L, R, "cmptmp"); - // Convert bool 0/1 to double 0.0 or 1.0 - return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), - "booltmp"); - default: return ErrorV("invalid binary operator"); - } -} - -Value *CallExprAST::Codegen() { - // Look up the name in the global module table. - Function *CalleeF = TheModule->getFunction(Callee); - if (CalleeF == 0) - return ErrorV("Unknown function referenced"); - - // If argument mismatch error. - if (CalleeF->arg_size() != Args.size()) - return ErrorV("Incorrect # arguments passed"); - - std::vector<Value*> ArgsV; - for (unsigned i = 0, e = Args.size(); i != e; ++i) { - ArgsV.push_back(Args[i]->Codegen()); - if (ArgsV.back() == 0) return 0; - } - - return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp"); -} - -Function *PrototypeAST::Codegen() { - // Make the function type: double(double,double) etc. - std::vector<const Type*> Doubles(Args.size(), - Type::getDoubleTy(getGlobalContext())); - FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), - Doubles, false); - - Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule); - - // If F conflicted, there was already something named 'Name'. If it has a - // body, don't allow redefinition or reextern. - if (F->getName() != Name) { - // Delete the one we just made and get the existing one. - F->eraseFromParent(); - F = TheModule->getFunction(Name); - - // If F already has a body, reject this. - if (!F->empty()) { - ErrorF("redefinition of function"); - return 0; - } - - // If F took a different number of args, reject.
- if (F->arg_size() != Args.size()) { - ErrorF("redefinition of function with different # args"); - return 0; - } - } - - // Set names for all arguments. - unsigned Idx = 0; - for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); - ++AI, ++Idx) { - AI->setName(Args[Idx]); - - // Add arguments to variable symbol table. - NamedValues[Args[Idx]] = AI; - } - - return F; -} - -Function *FunctionAST::Codegen() { - NamedValues.clear(); - - Function *TheFunction = Proto->Codegen(); - if (TheFunction == 0) - return 0; - - // Create a new basic block to start insertion into. - BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); - Builder.SetInsertPoint(BB); - - if (Value *RetVal = Body->Codegen()) { - // Finish off the function. - Builder.CreateRet(RetVal); - - // Validate the generated code, checking for consistency. - verifyFunction(*TheFunction); - - // Optimize the function. - TheFPM->run(*TheFunction); - - return TheFunction; - } - - // Error reading body, remove function. - TheFunction->eraseFromParent(); - return 0; -} - -//===----------------------------------------------------------------------===// -// Top-Level parsing and JIT Driver -//===----------------------------------------------------------------------===// - -static ExecutionEngine *TheExecutionEngine; - -static void HandleDefinition() { - if (FunctionAST *F = ParseDefinition()) { - if (Function *LF = F->Codegen()) { - fprintf(stderr, "Read function definition:"); - LF->dump(); - } - } else { - // Skip token for error recovery. - getNextToken(); - } -} - -static void HandleExtern() { - if (PrototypeAST *P = ParseExtern()) { - if (Function *F = P->Codegen()) { - fprintf(stderr, "Read extern: "); - F->dump(); - } - } else { - // Skip token for error recovery. - getNextToken(); - } -} - -static void HandleTopLevelExpression() { - // Evaluate a top-level expression into an anonymous function. 
- if (FunctionAST *F = ParseTopLevelExpr()) { - if (Function *LF = F->Codegen()) { - // JIT the function, returning a function pointer. - void *FPtr = TheExecutionEngine->getPointerToFunction(LF); - - // Cast it to the right type (takes no arguments, returns a double) so we - // can call it as a native function. - double (*FP)() = (double (*)())(intptr_t)FPtr; - fprintf(stderr, "Evaluated to %f\n", FP()); - } - } else { - // Skip token for error recovery. - getNextToken(); - } -} - -/// top ::= definition | external | expression | ';' -static void MainLoop() { - while (1) { - fprintf(stderr, "ready> "); - switch (CurTok) { - case tok_eof: return; - case ';': getNextToken(); break; // ignore top-level semicolons. - case tok_def: HandleDefinition(); break; - case tok_extern: HandleExtern(); break; - default: HandleTopLevelExpression(); break; - } - } -} - -//===----------------------------------------------------------------------===// -// "Library" functions that can be "extern'd" from user code. -//===----------------------------------------------------------------------===// - -/// putchard - putchar that takes a double and returns 0. -extern "C" -double putchard(double X) { - putchar((char)X); - return 0; -} - -//===----------------------------------------------------------------------===// -// Main driver code. -//===----------------------------------------------------------------------===// - -int main() { - InitializeNativeTarget(); - LLVMContext &Context = getGlobalContext(); - - // Install standard binary operators. - // 1 is lowest precedence. - BinopPrecedence['<'] = 10; - BinopPrecedence['+'] = 20; - BinopPrecedence['-'] = 20; - BinopPrecedence['*'] = 40; // highest. - - // Prime the first token. - fprintf(stderr, "ready> "); - getNextToken(); - - // Make the module, which holds all the code. - TheModule = new Module("my cool jit", Context); - - // Create the JIT. This takes ownership of the module. 
- std::string ErrStr; - TheExecutionEngine = EngineBuilder(TheModule).setErrorStr(&ErrStr).create(); - if (!TheExecutionEngine) { - fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str()); - exit(1); - } - - FunctionPassManager OurFPM(TheModule); - - // Set up the optimizer pipeline. Start with registering info about how the - // target lays out data structures. - OurFPM.add(new TargetData(*TheExecutionEngine->getTargetData())); - // Do simple "peephole" optimizations and bit-twiddling optzns. - OurFPM.add(createInstructionCombiningPass()); - // Reassociate expressions. - OurFPM.add(createReassociatePass()); - // Eliminate Common SubExpressions. - OurFPM.add(createGVNPass()); - // Simplify the control flow graph (deleting unreachable blocks, etc). - OurFPM.add(createCFGSimplificationPass()); - - OurFPM.doInitialization(); - - // Set the global so the code gen can use this. - TheFPM = &OurFPM; - - // Run the main "interpreter loop" now. - MainLoop(); - - TheFPM = 0; - - // Print out all of the generated code. 
- TheModule->dump(); - - return 0; -} -</pre> -</div> - -<a href="LangImpl5.html">Next: Extending the language: control flow</a> -</div> - -<!-- *********************************************************************** --> -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> - - <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> - <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/tutorial/LangImpl5-cfg.png b/docs/tutorial/LangImpl5-cfg.png Binary files differ deleted file mode 100644 index cdba92f..0000000 --- a/docs/tutorial/LangImpl5-cfg.png +++ /dev/null diff --git a/docs/tutorial/LangImpl5.html b/docs/tutorial/LangImpl5.html deleted file mode 100644 index 7136351..0000000 --- a/docs/tutorial/LangImpl5.html +++ /dev/null @@ -1,1777 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> - -<html> -<head> - <title>Kaleidoscope: Extending the Language: Control Flow</title> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <meta name="author" content="Chris Lattner"> - <link rel="stylesheet" href="../llvm.css" type="text/css"> -</head> - -<body> - -<div class="doc_title">Kaleidoscope: Extending the Language: Control Flow</div> - -<ul> -<li><a href="index.html">Up to Tutorial Index</a></li> -<li>Chapter 5 - <ol> - <li><a href="#intro">Chapter 5 Introduction</a></li> - <li><a href="#ifthen">If/Then/Else</a> - <ol> - <li><a href="#iflexer">Lexer Extensions</a></li> - <li><a href="#ifast">AST Extensions</a></li> - <li><a href="#ifparser">Parser Extensions</a></li> - <li><a href="#ifir">LLVM IR</a></li> - <li><a href="#ifcodegen">Code Generation</a></li> - </ol> - </li> - <li><a
href="#for">'for' Loop Expression</a> - <ol> - <li><a href="#forlexer">Lexer Extensions</a></li> - <li><a href="#forast">AST Extensions</a></li> - <li><a href="#forparser">Parser Extensions</a></li> - <li><a href="#forir">LLVM IR</a></li> - <li><a href="#forcodegen">Code Generation</a></li> - </ol> - </li> - <li><a href="#code">Full Code Listing</a></li> - </ol> -</li> -<li><a href="LangImpl6.html">Chapter 6</a>: Extending the Language: -User-defined Operators</li> -</ul> - -<div class="doc_author"> - <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p> -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="intro">Chapter 5 Introduction</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Welcome to Chapter 5 of the "<a href="index.html">Implementing a language -with LLVM</a>" tutorial. Parts 1-4 described the implementation of the simple -Kaleidoscope language and included support for generating LLVM IR, followed by -optimizations and a JIT compiler. Unfortunately, as presented, Kaleidoscope is -mostly useless: it has no control flow other than call and return. This means -that you can't have conditional branches in the code, significantly limiting its -power. In this episode of "build that compiler", we'll extend Kaleidoscope to -have an if/then/else expression plus a simple 'for' loop.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="ifthen">If/Then/Else</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p> -Extending Kaleidoscope to support if/then/else is quite straightforward. It -basically requires adding support for this "new" concept to the lexer, -parser, AST, and LLVM code emitter.
This example is nice because it shows how -easy it is to "grow" a language over time, incrementally extending it as new -ideas are discovered.</p> - -<p>Before we get going on "how" we add this extension, let's talk about "what" we -want. The basic idea is that we want to be able to write this sort of thing: -</p> - -<div class="doc_code"> -<pre> -def fib(x) - if x < 3 then - 1 - else - fib(x-1)+fib(x-2); -</pre> -</div> - -<p>In Kaleidoscope, every construct is an expression: there are no statements. -As such, the if/then/else expression needs to return a value like any other. -Since we're using a mostly functional form, we'll have it evaluate its -conditional, then return the 'then' or 'else' value based on how the condition -was resolved. This is very similar to the C "?:" expression.</p> - -<p>The semantics of the if/then/else expression are that it evaluates the -condition and interprets the result as a boolean: 0.0 is considered to be false and -everything else is considered to be true. -If the condition is true, the first subexpression is evaluated and returned; if -the condition is false, the second subexpression is evaluated and returned. -Since Kaleidoscope allows side-effects, this behavior is important to nail down. -</p> - -<p>Now that we know what we "want", let's break this down into its constituent -pieces.</p> - -</div> - -<!-- ======================================================================= --> -<div class="doc_subsubsection"><a name="iflexer">Lexer Extensions for -If/Then/Else</a></div> -<!-- ======================================================================= --> - - -<div class="doc_text"> - -<p>The lexer extensions are straightforward. First we add new enum values -for the relevant tokens:</p> - -<div class="doc_code"> -<pre> - // control - tok_if = -6, tok_then = -7, tok_else = -8, -</pre> -</div> - -<p>Once we have that, we recognize the new keywords in the lexer. This is pretty simple -stuff:</p> - -<div class="doc_code"> -<pre> - ...
- if (IdentifierStr == "def") return tok_def; - if (IdentifierStr == "extern") return tok_extern; - <b>if (IdentifierStr == "if") return tok_if; - if (IdentifierStr == "then") return tok_then; - if (IdentifierStr == "else") return tok_else;</b> - return tok_identifier; -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<div class="doc_subsubsection"><a name="ifast">AST Extensions for - If/Then/Else</a></div> -<!-- ======================================================================= --> - -<div class="doc_text"> - -<p>To represent the new expression we add a new AST node for it:</p> - -<div class="doc_code"> -<pre> -/// IfExprAST - Expression class for if/then/else. -class IfExprAST : public ExprAST { - ExprAST *Cond, *Then, *Else; -public: - IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else) - : Cond(cond), Then(then), Else(_else) {} - virtual Value *Codegen(); -}; -</pre> -</div> - -<p>The AST node just has pointers to the various subexpressions.</p> - -</div> - -<!-- ======================================================================= --> -<div class="doc_subsubsection"><a name="ifparser">Parser Extensions for -If/Then/Else</a></div> -<!-- ======================================================================= --> - -<div class="doc_text"> - -<p>Now that we have the relevant tokens coming from the lexer and we have the -AST node to build, our parsing logic is relatively straightforward. First we -define a new parsing function:</p> - -<div class="doc_code"> -<pre> -/// ifexpr ::= 'if' expression 'then' expression 'else' expression -static ExprAST *ParseIfExpr() { - getNextToken(); // eat the if. - - // condition. 
- ExprAST *Cond = ParseExpression(); - if (!Cond) return 0; - - if (CurTok != tok_then) - return Error("expected then"); - getNextToken(); // eat the then - - ExprAST *Then = ParseExpression(); - if (Then == 0) return 0; - - if (CurTok != tok_else) - return Error("expected else"); - - getNextToken(); // eat the else - - ExprAST *Else = ParseExpression(); - if (!Else) return 0; - - return new IfExprAST(Cond, Then, Else); -} -</pre> -</div> - -<p>Next we hook it up as a primary expression:</p> - -<div class="doc_code"> -<pre> -static ExprAST *ParsePrimary() { - switch (CurTok) { - default: return Error("unknown token when expecting an expression"); - case tok_identifier: return ParseIdentifierExpr(); - case tok_number: return ParseNumberExpr(); - case '(': return ParseParenExpr(); - <b>case tok_if: return ParseIfExpr();</b> - } -} -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<div class="doc_subsubsection"><a name="ifir">LLVM IR for If/Then/Else</a></div> -<!-- ======================================================================= --> - -<div class="doc_text"> - -<p>Now that we have it parsing and building the AST, the final piece is adding -LLVM code generation support. This is the most interesting part of the -if/then/else example, because this is where it starts to introduce new concepts. -All of the code above has been thoroughly described in previous chapters. -</p> - -<p>To motivate the code we want to produce, let's take a look at a simple -example.
Consider:</p> - -<div class="doc_code"> -<pre> -extern foo(); -extern bar(); -def baz(x) if x then foo() else bar(); -</pre> -</div> - -<p>If you disable optimizations, the code you'll (soon) get from Kaleidoscope -looks like this:</p> - -<div class="doc_code"> -<pre> -declare double @foo() - -declare double @bar() - -define double @baz(double %x) { -entry: - %ifcond = fcmp one double %x, 0.000000e+00 - br i1 %ifcond, label %then, label %else - -then: ; preds = %entry - %calltmp = call double @foo() - br label %ifcont - -else: ; preds = %entry - %calltmp1 = call double @bar() - br label %ifcont - -ifcont: ; preds = %else, %then - %iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ] - ret double %iftmp -} -</pre> -</div> - -<p>To visualize the control flow graph, you can use a nifty feature of the LLVM -'<a href="http://llvm.org/cmds/opt.html">opt</a>' tool. If you put this LLVM IR -into "t.ll" and run "<tt>llvm-as < t.ll | opt -analyze -view-cfg</tt>", <a -href="../ProgrammersManual.html#ViewGraph">a window will pop up</a> and you'll -see this graph:</p> - -<div style="text-align: center"><img src="LangImpl5-cfg.png" alt="Example CFG" width="423" -height="315"></div> - -<p>Another way to get this is to call "<tt>F->viewCFG()</tt>" or -"<tt>F->viewCFGOnly()</tt>" (where F is a "<tt>Function*</tt>") either by -inserting actual calls into the code and recompiling or by calling these in the -debugger. LLVM has many nice features for visualizing various graphs.</p> - -<p>Getting back to the generated code, it is fairly simple: the entry block -evaluates the conditional expression ("x" in our case here) and compares the -result to 0.0 with the "<tt><a href="../LangRef.html#i_fcmp">fcmp</a> one</tt>" -instruction ('one' is "Ordered and Not Equal"). 
Based on the result of this -expression, the code jumps to either the "then" or "else" block, which contain -the expressions for the true/false cases.</p> - -<p>Once the then/else blocks are finished executing, they both branch back to the -'ifcont' block to execute the code that happens after the if/then/else. In this -case the only thing left to do is to return to the caller of the function. The -question then becomes: how does the code know which expression to return?</p> - -<p>The answer to this question involves an important SSA operation: the -<a href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Phi -operation</a>. If you're not familiar with SSA, <a -href="http://en.wikipedia.org/wiki/Static_single_assignment_form">the Wikipedia -article</a> is a good introduction and there are various other introductions to -it available on your favorite search engine. The short version is that -"execution" of the Phi operation requires "remembering" which block control came -from. The Phi operation takes on the value corresponding to the input control -block. In this case, if control comes in from the "then" block, it gets the -value of "calltmp". If control comes from the "else" block, it gets the value -of "calltmp1".</p> - -<p>At this point, you are probably starting to think "Oh no! This means my -simple and elegant front-end will have to start generating SSA form in order to -use LLVM!". Fortunately, this is not the case, and we strongly advise -<em>not</em> implementing an SSA construction algorithm in your front-end -unless there is an amazingly good reason to do so.
In practice, there are two -sorts of values that float around in code written for your average imperative -programming language that might need Phi nodes:</p> - -<ol> -<li>Code that involves user variables: <tt>x = 1; x = x + 1; </tt></li> -<li>Values that are implicit in the structure of your AST, such as the Phi node -in this case.</li> -</ol> - -<p>In <a href="LangImpl7.html">Chapter 7</a> of this tutorial ("mutable -variables"), we'll talk about #1 -in depth. For now, just believe me that you don't need SSA construction to -handle this case. For #2, you have the choice of using the techniques that we will -describe for #1, or you can insert Phi nodes directly, if convenient. In this -case, it is really really easy to generate the Phi node, so we choose to do it -directly.</p> - -<p>Okay, enough of the motivation and overview; let's generate code!</p> - -</div> - -<!-- ======================================================================= --> -<div class="doc_subsubsection"><a name="ifcodegen">Code Generation for -If/Then/Else</a></div> -<!-- ======================================================================= --> - -<div class="doc_text"> - -<p>In order to generate code for this, we implement the <tt>Codegen</tt> method -for <tt>IfExprAST</tt>:</p> - -<div class="doc_code"> -<pre> -Value *IfExprAST::Codegen() { - Value *CondV = Cond->Codegen(); - if (CondV == 0) return 0; - - // Convert condition to a bool by comparing non-equal to 0.0. - CondV = Builder.CreateFCmpONE(CondV, - ConstantFP::get(getGlobalContext(), APFloat(0.0)), - "ifcond"); -</pre> -</div> - -<p>This code is straightforward and similar to what we saw before. We emit the -expression for the condition, then compare that value to zero to get a truth -value as a 1-bit (bool) value.</p> - -<div class="doc_code"> -<pre> - Function *TheFunction = Builder.GetInsertBlock()->getParent(); - - // Create blocks for the then and else cases. Insert the 'then' block at the - // end of the function.
- BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction); - BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else"); - BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont"); - - Builder.CreateCondBr(CondV, ThenBB, ElseBB); -</pre> -</div> - -<p>This code creates the basic blocks that are related to the if/then/else -statement, and correspond directly to the blocks in the example above. The -first line gets the current Function object that is being built. It -gets this by asking the builder for the current BasicBlock, and asking that -block for its "parent" (the function it is currently embedded into).</p> - -<p>Once it has that, it creates three blocks. Note that it passes "TheFunction" -into the constructor for the "then" block. This causes the constructor to -automatically insert the new block into the end of the specified function. The -other two blocks are created, but aren't yet inserted into the function.</p> - -<p>Once the blocks are created, we can emit the conditional branch that chooses -between them. Note that creating new blocks does not implicitly affect the -IRBuilder, so it is still inserting into the block that the condition -went into. Also note that it is creating a branch to the "then" block and the -"else" block, even though the "else" block isn't inserted into the function yet. -This is all ok: it is the standard way that LLVM supports forward -references.</p> - -<div class="doc_code"> -<pre> - // Emit then value. - Builder.SetInsertPoint(ThenBB); - - Value *ThenV = Then->Codegen(); - if (ThenV == 0) return 0; - - Builder.CreateBr(MergeBB); - // Codegen of 'Then' can change the current block, update ThenBB for the PHI. - ThenBB = Builder.GetInsertBlock(); -</pre> -</div> - -<p>After the conditional branch is inserted, we move the builder to start -inserting into the "then" block. Strictly speaking, this call moves the -insertion point to be at the end of the specified block. 
However, since the -"then" block is empty, it also starts out by inserting at the beginning of the -block. :)</p> - -<p>Once the insertion point is set, we recursively codegen the "then" expression -from the AST. To finish off the "then" block, we create an unconditional branch -to the merge block. One interesting (and very important) aspect of the LLVM IR -is that it <a href="../LangRef.html#functionstructure">requires all basic blocks -to be "terminated"</a> with a <a href="../LangRef.html#terminators">control flow -instruction</a> such as return or branch. This means that all control flow, -<em>including fall throughs</em>, must be made explicit in the LLVM IR. If you -violate this rule, the verifier will emit an error.</p> - -<p>The final line here is quite subtle, but is very important. The basic issue -is that when we create the Phi node in the merge block, we need to set up the -block/value pairs that indicate how the Phi will work. Importantly, the Phi -node expects to have an entry for each predecessor of the block in the CFG. Why, -then, are we getting the current block when we just set it to ThenBB 5 lines -above? The problem is that the "Then" expression may actually itself change the -block that the Builder is emitting into if, for example, it contains a nested -"if/then/else" expression. Because calling Codegen recursively could -arbitrarily change the notion of the current block, we are required to get an -up-to-date value for code that will set up the Phi node.</p> - -<div class="doc_code"> -<pre> - // Emit else block. - TheFunction->getBasicBlockList().push_back(ElseBB); - Builder.SetInsertPoint(ElseBB); - - Value *ElseV = Else->Codegen(); - if (ElseV == 0) return 0; - - Builder.CreateBr(MergeBB); - // Codegen of 'Else' can change the current block, update ElseBB for the PHI. - ElseBB = Builder.GetInsertBlock(); -</pre> -</div> - -<p>Code generation for the 'else' block is basically identical to codegen for -the 'then' block.
The only significant difference is the first line, which adds -the 'else' block to the function. Recall that the 'else' block was previously -created, but not added to the function. Now that the 'then' and 'else' blocks -are emitted, we can finish up with the merge code:</p> - -<div class="doc_code"> -<pre> - // Emit merge block. - TheFunction->getBasicBlockList().push_back(MergeBB); - Builder.SetInsertPoint(MergeBB); - PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), - "iftmp"); - - PN->addIncoming(ThenV, ThenBB); - PN->addIncoming(ElseV, ElseBB); - return PN; -} -</pre> -</div> - -<p>The first two lines here are now familiar: the first adds the "merge" block -to the Function object (it was previously floating, like the else block above). -The second line changes the insertion point so that newly created code will go -into the "merge" block. Once that is done, we need to create the PHI node and -set up the block/value pairs for the PHI.</p> - -<p>Finally, the Codegen function returns the phi node as the value computed by -the if/then/else expression. In our example above, this returned value will -feed into the code for the top-level function, which will create the return -instruction.</p> - -<p>Overall, we now have the ability to execute conditional code in -Kaleidoscope. With this extension, Kaleidoscope is a fairly complete language -that can calculate a wide variety of numeric functions. Next up we'll add -another useful expression that is familiar from non-functional languages...</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="for">'for' Loop Expression</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Now that we know how to add basic control flow constructs to the language, -we have the tools to add more powerful things.
Let's add something more -aggressive, a 'for' expression:</p> - -<div class="doc_code"> -<pre> - extern putchard(char) - def printstar(n) - for i = 1, i < n, 1.0 in - putchard(42); # ascii 42 = '*' - - # print 100 '*' characters - printstar(100); -</pre> -</div> - -<p>This expression defines a new variable ("i" in this case) which iterates from -a starting value, while the condition ("i < n" in this case) is true, -incrementing by an optional step value ("1.0" in this case). If the step value -is omitted, it defaults to 1.0. While the condition is true, the loop executes its -body expression. Because we don't have anything better to return, we'll just -define the loop as always returning 0.0. In the future when we have mutable -variables, it will get more useful.</p> - -<p>As before, let's talk about the changes that we need to Kaleidoscope to -support this.</p> - -</div> - -<!-- ======================================================================= --> -<div class="doc_subsubsection"><a name="forlexer">Lexer Extensions for -the 'for' Loop</a></div> -<!-- ======================================================================= --> - -<div class="doc_text"> - -<p>The lexer extensions are the same sort of thing as for if/then/else:</p> - -<div class="doc_code"> -<pre> - ... in enum Token ... - // control - tok_if = -6, tok_then = -7, tok_else = -8, -<b> tok_for = -9, tok_in = -10</b> - - ... in gettok ...
- if (IdentifierStr == "def") return tok_def; - if (IdentifierStr == "extern") return tok_extern; - if (IdentifierStr == "if") return tok_if; - if (IdentifierStr == "then") return tok_then; - if (IdentifierStr == "else") return tok_else; - <b>if (IdentifierStr == "for") return tok_for; - if (IdentifierStr == "in") return tok_in;</b> - return tok_identifier; -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<div class="doc_subsubsection"><a name="forast">AST Extensions for -the 'for' Loop</a></div> -<!-- ======================================================================= --> - -<div class="doc_text"> - -<p>The AST node is just as simple. It basically boils down to capturing -the variable name and the constituent expressions in the node.</p> - -<div class="doc_code"> -<pre> -/// ForExprAST - Expression class for for/in. -class ForExprAST : public ExprAST { - std::string VarName; - ExprAST *Start, *End, *Step, *Body; -public: - ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end, - ExprAST *step, ExprAST *body) - : VarName(varname), Start(start), End(end), Step(step), Body(body) {} - virtual Value *Codegen(); -}; -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<div class="doc_subsubsection"><a name="forparser">Parser Extensions for -the 'for' Loop</a></div> -<!-- ======================================================================= --> - -<div class="doc_text"> - -<p>The parser code is also fairly standard. The only interesting thing here is -handling of the optional step value. The parser code handles it by checking to -see if the second comma is present. If not, it sets the step value to null in -the AST node:</p> - -<div class="doc_code"> -<pre> -/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression -static ExprAST *ParseForExpr() { - getNextToken(); // eat the for. 
- - if (CurTok != tok_identifier) - return Error("expected identifier after for"); - - std::string IdName = IdentifierStr; - getNextToken(); // eat identifier. - - if (CurTok != '=') - return Error("expected '=' after for"); - getNextToken(); // eat '='. - - - ExprAST *Start = ParseExpression(); - if (Start == 0) return 0; - if (CurTok != ',') - return Error("expected ',' after for start value"); - getNextToken(); - - ExprAST *End = ParseExpression(); - if (End == 0) return 0; - - // The step value is optional. - ExprAST *Step = 0; - if (CurTok == ',') { - getNextToken(); - Step = ParseExpression(); - if (Step == 0) return 0; - } - - if (CurTok != tok_in) - return Error("expected 'in' after for"); - getNextToken(); // eat 'in'. - - ExprAST *Body = ParseExpression(); - if (Body == 0) return 0; - - return new ForExprAST(IdName, Start, End, Step, Body); -} -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<div class="doc_subsubsection"><a name="forir">LLVM IR for -the 'for' Loop</a></div> -<!-- ======================================================================= --> - -<div class="doc_text"> - -<p>Now we get to the good part: the LLVM IR we want to generate for this thing. 
-With the simple example above, we get this LLVM IR (note that this dump is -generated with optimizations disabled for clarity): -</p> - -<div class="doc_code"> -<pre> -declare double @putchard(double) - -define double @printstar(double %n) { -entry: - ; initial value = 1.0 (inlined into phi) - br label %loop - -loop: ; preds = %loop, %entry - %i = phi double [ 1.000000e+00, %entry ], [ %nextvar, %loop ] - ; body - %calltmp = call double @putchard( double 4.200000e+01 ) - ; increment - %nextvar = fadd double %i, 1.000000e+00 - - ; termination test - %cmptmp = fcmp ult double %i, %n - %booltmp = uitofp i1 %cmptmp to double - %loopcond = fcmp one double %booltmp, 0.000000e+00 - br i1 %loopcond, label %loop, label %afterloop - -afterloop: ; preds = %loop - ; loop always returns 0.0 - ret double 0.000000e+00 -} -</pre> -</div> - -<p>This loop contains all the same constructs we saw before: a phi node, several -expressions, and some basic blocks. Let's see how this fits together.</p> - -</div> - -<!-- ======================================================================= --> -<div class="doc_subsubsection"><a name="forcodegen">Code Generation for -the 'for' Loop</a></div> -<!-- ======================================================================= --> - -<div class="doc_text"> - -<p>The first part of Codegen is very simple: we just output the start expression -for the loop value:</p> - -<div class="doc_code"> -<pre> -Value *ForExprAST::Codegen() { - // Emit the start code first, without 'variable' in scope. - Value *StartVal = Start->Codegen(); - if (StartVal == 0) return 0; -</pre> -</div> - -<p>With this out of the way, the next step is to set up the LLVM basic block -for the start of the loop body. In the case above, the whole loop body is one -block, but remember that the body code itself could consist of multiple blocks -(e.g.
if it contains an if/then/else or a for/in expression).</p> - -<div class="doc_code"> -<pre> - // Make the new basic block for the loop header, inserting after current - // block. - Function *TheFunction = Builder.GetInsertBlock()->getParent(); - BasicBlock *PreheaderBB = Builder.GetInsertBlock(); - BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction); - - // Insert an explicit fall through from the current block to the LoopBB. - Builder.CreateBr(LoopBB); -</pre> -</div> - -<p>This code is similar to what we saw for if/then/else. Because we will need -it to create the Phi node, we remember the block that falls through into the -loop. Once we have that, we create the actual block that starts the loop and -create an unconditional branch for the fall-through between the two blocks.</p> - -<div class="doc_code"> -<pre> - // Start insertion in LoopBB. - Builder.SetInsertPoint(LoopBB); - - // Start the PHI node with an entry for Start. - PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), VarName.c_str()); - Variable->addIncoming(StartVal, PreheaderBB); -</pre> -</div> - -<p>Now that the "preheader" for the loop is set up, we switch to emitting code -for the loop body. To begin with, we move the insertion point and create the -PHI node for the loop induction variable. Since we already know the incoming -value for the starting value, we add it to the Phi node. Note that the Phi will -eventually get a second value for the backedge, but we can't set it up yet -(because it doesn't exist!).</p> - -<div class="doc_code"> -<pre> - // Within the loop, the variable is defined equal to the PHI node. If it - // shadows an existing variable, we have to restore it, so save it now. - Value *OldVal = NamedValues[VarName]; - NamedValues[VarName] = Variable; - - // Emit the body of the loop. This, like any other expr, can change the - // current BB. Note that we ignore the value computed by the body, but don't - // allow an error. 
- if (Body->Codegen() == 0) - return 0; -</pre> -</div> - -<p>Now the code starts to get more interesting. Our 'for' loop introduces a new -variable to the symbol table. This means that our symbol table can now contain -either function arguments or loop variables. To handle this, before we codegen -the body of the loop, we add the loop variable as the current value for its -name. Note that it is possible that there is a variable of the same name in the -outer scope. It would be easy to make this an error (emit an error and return -null if there is already an entry for VarName), but we choose to allow shadowing -of variables. In order to handle this correctly, we remember the Value that -we are potentially shadowing in <tt>OldVal</tt> (which will be null if there is -no shadowed variable).</p> - -<p>Once the loop variable is set into the symbol table, the code recursively -codegens the body. This allows the body to use the loop variable: any -references to it will naturally find it in the symbol table.</p> - -<div class="doc_code"> -<pre> - // Emit the step value. - Value *StepVal; - if (Step) { - StepVal = Step->Codegen(); - if (StepVal == 0) return 0; - } else { - // If not specified, use 1.0. - StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0)); - } - - Value *NextVar = Builder.CreateFAdd(Variable, StepVal, "nextvar"); -</pre> -</div> - -<p>Now that the body is emitted, we compute the next value of the iteration -variable by adding the step value, or 1.0 if it isn't present. '<tt>NextVar</tt>' -will be the value of the loop variable on the next iteration of the loop.</p> - -<div class="doc_code"> -<pre> - // Compute the end condition. - Value *EndCond = End->Codegen(); - if (EndCond == 0) return EndCond; - - // Convert condition to a bool by comparing non-equal to 0.0.
- EndCond = Builder.CreateFCmpONE(EndCond, - ConstantFP::get(getGlobalContext(), APFloat(0.0)), - "loopcond"); -</pre> -</div> - -<p>Finally, we evaluate the exit value of the loop, to determine whether the -loop should exit. This mirrors the condition evaluation for the if/then/else -statement.</p> - -<div class="doc_code"> -<pre> - // Create the "after loop" block and insert it. - BasicBlock *LoopEndBB = Builder.GetInsertBlock(); - BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction); - - // Insert the conditional branch into the end of LoopEndBB. - Builder.CreateCondBr(EndCond, LoopBB, AfterBB); - - // Any new code will be inserted in AfterBB. - Builder.SetInsertPoint(AfterBB); -</pre> -</div> - -<p>With the code for the body of the loop complete, we just need to finish up -the control flow for it. This code remembers the end block (for the phi node), then creates the block for the loop exit ("afterloop"). Based on the value of the -exit condition, it creates a conditional branch that chooses between executing -the loop again and exiting the loop. Any future code is emitted in the -"afterloop" block, so it sets the insertion position to it.</p> - -<div class="doc_code"> -<pre> - // Add a new entry to the PHI node for the backedge. - Variable->addIncoming(NextVar, LoopEndBB); - - // Restore the unshadowed variable. - if (OldVal) - NamedValues[VarName] = OldVal; - else - NamedValues.erase(VarName); - - // for expr always returns 0.0. - return Constant::getNullValue(Type::getDoubleTy(getGlobalContext())); -} -</pre> -</div> - -<p>The final code handles various cleanups: now that we have the "NextVar" -value, we can add the incoming value to the loop PHI node. After that, we -remove the loop variable from the symbol table, so that it isn't in scope after -the for loop. 
Finally, code generation of the for loop always returns 0.0, so -that is what we return from <tt>ForExprAST::Codegen</tt>.</p> - -<p>With this, we conclude the "adding control flow to Kaleidoscope" chapter of -the tutorial. In this chapter we added two control flow constructs and used them to motivate a couple of aspects of the LLVM IR that are important for front-end implementors -to know. In the next chapter of our saga, we will get a bit crazier and add -<a href="LangImpl6.html">user-defined operators</a> to our poor innocent -language.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="code">Full Code Listing</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p> -Here is the complete code listing for our running example, enhanced with the -if/then/else and for expressions. To build this example, use: -</p> - -<div class="doc_code"> -<pre> - # Compile - g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy - # Run - ./toy -</pre> -</div> - -<p>Here is the code:</p> - -<div class="doc_code"> -<pre> -#include "llvm/DerivedTypes.h" -#include "llvm/ExecutionEngine/ExecutionEngine.h" -#include "llvm/ExecutionEngine/JIT.h" -#include "llvm/LLVMContext.h" -#include "llvm/Module.h" -#include "llvm/PassManager.h" -#include "llvm/Analysis/Verifier.h" -#include "llvm/Target/TargetData.h" -#include "llvm/Target/TargetSelect.h" -#include "llvm/Transforms/Scalar.h" -#include "llvm/Support/IRBuilder.h" -#include <cstdio> -#include <string> -#include <map> -#include <vector> -using namespace llvm; - -//===----------------------------------------------------------------------===// -// Lexer -//===----------------------------------------------------------------------===// - -// The lexer returns tokens [0-255] if it is an unknown character, otherwise one -// of these for known things.
-enum Token { - tok_eof = -1, - - // commands - tok_def = -2, tok_extern = -3, - - // primary - tok_identifier = -4, tok_number = -5, - - // control - tok_if = -6, tok_then = -7, tok_else = -8, - tok_for = -9, tok_in = -10 -}; - -static std::string IdentifierStr; // Filled in if tok_identifier -static double NumVal; // Filled in if tok_number - -/// gettok - Return the next token from standard input. -static int gettok() { - static int LastChar = ' '; - - // Skip any whitespace. - while (isspace(LastChar)) - LastChar = getchar(); - - if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]* - IdentifierStr = LastChar; - while (isalnum((LastChar = getchar()))) - IdentifierStr += LastChar; - - if (IdentifierStr == "def") return tok_def; - if (IdentifierStr == "extern") return tok_extern; - if (IdentifierStr == "if") return tok_if; - if (IdentifierStr == "then") return tok_then; - if (IdentifierStr == "else") return tok_else; - if (IdentifierStr == "for") return tok_for; - if (IdentifierStr == "in") return tok_in; - return tok_identifier; - } - - if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+ - std::string NumStr; - do { - NumStr += LastChar; - LastChar = getchar(); - } while (isdigit(LastChar) || LastChar == '.'); - - NumVal = strtod(NumStr.c_str(), 0); - return tok_number; - } - - if (LastChar == '#') { - // Comment until end of line. - do LastChar = getchar(); - while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); - - if (LastChar != EOF) - return gettok(); - } - - // Check for end of file. Don't eat the EOF. - if (LastChar == EOF) - return tok_eof; - - // Otherwise, just return the character as its ascii value. 
- int ThisChar = LastChar; - LastChar = getchar(); - return ThisChar; -} - -//===----------------------------------------------------------------------===// -// Abstract Syntax Tree (aka Parse Tree) -//===----------------------------------------------------------------------===// - -/// ExprAST - Base class for all expression nodes. -class ExprAST { -public: - virtual ~ExprAST() {} - virtual Value *Codegen() = 0; -}; - -/// NumberExprAST - Expression class for numeric literals like "1.0". -class NumberExprAST : public ExprAST { - double Val; -public: - NumberExprAST(double val) : Val(val) {} - virtual Value *Codegen(); -}; - -/// VariableExprAST - Expression class for referencing a variable, like "a". -class VariableExprAST : public ExprAST { - std::string Name; -public: - VariableExprAST(const std::string &name) : Name(name) {} - virtual Value *Codegen(); -}; - -/// BinaryExprAST - Expression class for a binary operator. -class BinaryExprAST : public ExprAST { - char Op; - ExprAST *LHS, *RHS; -public: - BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) - : Op(op), LHS(lhs), RHS(rhs) {} - virtual Value *Codegen(); -}; - -/// CallExprAST - Expression class for function calls. -class CallExprAST : public ExprAST { - std::string Callee; - std::vector<ExprAST*> Args; -public: - CallExprAST(const std::string &callee, std::vector<ExprAST*> &args) - : Callee(callee), Args(args) {} - virtual Value *Codegen(); -}; - -/// IfExprAST - Expression class for if/then/else. -class IfExprAST : public ExprAST { - ExprAST *Cond, *Then, *Else; -public: - IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else) - : Cond(cond), Then(then), Else(_else) {} - virtual Value *Codegen(); -}; - -/// ForExprAST - Expression class for for/in. 
-class ForExprAST : public ExprAST { - std::string VarName; - ExprAST *Start, *End, *Step, *Body; -public: - ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end, - ExprAST *step, ExprAST *body) - : VarName(varname), Start(start), End(end), Step(step), Body(body) {} - virtual Value *Codegen(); -}; - -/// PrototypeAST - This class represents the "prototype" for a function, -/// which captures its name, and its argument names (thus implicitly the number -/// of arguments the function takes). -class PrototypeAST { - std::string Name; - std::vector<std::string> Args; -public: - PrototypeAST(const std::string &name, const std::vector<std::string> &args) - : Name(name), Args(args) {} - - Function *Codegen(); -}; - -/// FunctionAST - This class represents a function definition itself. -class FunctionAST { - PrototypeAST *Proto; - ExprAST *Body; -public: - FunctionAST(PrototypeAST *proto, ExprAST *body) - : Proto(proto), Body(body) {} - - Function *Codegen(); -}; - -//===----------------------------------------------------------------------===// -// Parser -//===----------------------------------------------------------------------===// - -/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current -/// token the parser is looking at. getNextToken reads another token from the -/// lexer and updates CurTok with its results. -static int CurTok; -static int getNextToken() { - return CurTok = gettok(); -} - -/// BinopPrecedence - This holds the precedence for each binary operator that is -/// defined. -static std::map<char, int> BinopPrecedence; - -/// GetTokPrecedence - Get the precedence of the pending binary operator token. -static int GetTokPrecedence() { - if (!isascii(CurTok)) - return -1; - - // Make sure it's a declared binop. - int TokPrec = BinopPrecedence[CurTok]; - if (TokPrec <= 0) return -1; - return TokPrec; -} - -/// Error* - These are little helper functions for error handling. 
-ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;} -PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; } -FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; } - -static ExprAST *ParseExpression(); - -/// identifierexpr -/// ::= identifier -/// ::= identifier '(' expression* ')' -static ExprAST *ParseIdentifierExpr() { - std::string IdName = IdentifierStr; - - getNextToken(); // eat identifier. - - if (CurTok != '(') // Simple variable ref. - return new VariableExprAST(IdName); - - // Call. - getNextToken(); // eat ( - std::vector<ExprAST*> Args; - if (CurTok != ')') { - while (1) { - ExprAST *Arg = ParseExpression(); - if (!Arg) return 0; - Args.push_back(Arg); - - if (CurTok == ')') break; - - if (CurTok != ',') - return Error("Expected ')' or ',' in argument list"); - getNextToken(); - } - } - - // Eat the ')'. - getNextToken(); - - return new CallExprAST(IdName, Args); -} - -/// numberexpr ::= number -static ExprAST *ParseNumberExpr() { - ExprAST *Result = new NumberExprAST(NumVal); - getNextToken(); // consume the number - return Result; -} - -/// parenexpr ::= '(' expression ')' -static ExprAST *ParseParenExpr() { - getNextToken(); // eat (. - ExprAST *V = ParseExpression(); - if (!V) return 0; - - if (CurTok != ')') - return Error("expected ')'"); - getNextToken(); // eat ). - return V; -} - -/// ifexpr ::= 'if' expression 'then' expression 'else' expression -static ExprAST *ParseIfExpr() { - getNextToken(); // eat the if. - - // condition. - ExprAST *Cond = ParseExpression(); - if (!Cond) return 0; - - if (CurTok != tok_then) - return Error("expected then"); - getNextToken(); // eat the then - - ExprAST *Then = ParseExpression(); - if (Then == 0) return 0; - - if (CurTok != tok_else) - return Error("expected else"); - - getNextToken(); - - ExprAST *Else = ParseExpression(); - if (!Else) return 0; - - return new IfExprAST(Cond, Then, Else); -} - -/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 
'in' expression -static ExprAST *ParseForExpr() { - getNextToken(); // eat the for. - - if (CurTok != tok_identifier) - return Error("expected identifier after for"); - - std::string IdName = IdentifierStr; - getNextToken(); // eat identifier. - - if (CurTok != '=') - return Error("expected '=' after for"); - getNextToken(); // eat '='. - - - ExprAST *Start = ParseExpression(); - if (Start == 0) return 0; - if (CurTok != ',') - return Error("expected ',' after for start value"); - getNextToken(); - - ExprAST *End = ParseExpression(); - if (End == 0) return 0; - - // The step value is optional. - ExprAST *Step = 0; - if (CurTok == ',') { - getNextToken(); - Step = ParseExpression(); - if (Step == 0) return 0; - } - - if (CurTok != tok_in) - return Error("expected 'in' after for"); - getNextToken(); // eat 'in'. - - ExprAST *Body = ParseExpression(); - if (Body == 0) return 0; - - return new ForExprAST(IdName, Start, End, Step, Body); -} - -/// primary -/// ::= identifierexpr -/// ::= numberexpr -/// ::= parenexpr -/// ::= ifexpr -/// ::= forexpr -static ExprAST *ParsePrimary() { - switch (CurTok) { - default: return Error("unknown token when expecting an expression"); - case tok_identifier: return ParseIdentifierExpr(); - case tok_number: return ParseNumberExpr(); - case '(': return ParseParenExpr(); - case tok_if: return ParseIfExpr(); - case tok_for: return ParseForExpr(); - } -} - -/// binoprhs -/// ::= ('+' primary)* -static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { - // If this is a binop, find its precedence. - while (1) { - int TokPrec = GetTokPrecedence(); - - // If this is a binop that binds at least as tightly as the current binop, - // consume it, otherwise we are done. - if (TokPrec < ExprPrec) - return LHS; - - // Okay, we know this is a binop. - int BinOp = CurTok; - getNextToken(); // eat binop - - // Parse the primary expression after the binary operator. 
- ExprAST *RHS = ParsePrimary(); - if (!RHS) return 0; - - // If BinOp binds less tightly with RHS than the operator after RHS, let - // the pending operator take RHS as its LHS. - int NextPrec = GetTokPrecedence(); - if (TokPrec < NextPrec) { - RHS = ParseBinOpRHS(TokPrec+1, RHS); - if (RHS == 0) return 0; - } - - // Merge LHS/RHS. - LHS = new BinaryExprAST(BinOp, LHS, RHS); - } -} - -/// expression -/// ::= primary binoprhs -/// -static ExprAST *ParseExpression() { - ExprAST *LHS = ParsePrimary(); - if (!LHS) return 0; - - return ParseBinOpRHS(0, LHS); -} - -/// prototype -/// ::= id '(' id* ')' -static PrototypeAST *ParsePrototype() { - if (CurTok != tok_identifier) - return ErrorP("Expected function name in prototype"); - - std::string FnName = IdentifierStr; - getNextToken(); - - if (CurTok != '(') - return ErrorP("Expected '(' in prototype"); - - std::vector<std::string> ArgNames; - while (getNextToken() == tok_identifier) - ArgNames.push_back(IdentifierStr); - if (CurTok != ')') - return ErrorP("Expected ')' in prototype"); - - // success. - getNextToken(); // eat ')'. - - return new PrototypeAST(FnName, ArgNames); -} - -/// definition ::= 'def' prototype expression -static FunctionAST *ParseDefinition() { - getNextToken(); // eat def. - PrototypeAST *Proto = ParsePrototype(); - if (Proto == 0) return 0; - - if (ExprAST *E = ParseExpression()) - return new FunctionAST(Proto, E); - return 0; -} - -/// toplevelexpr ::= expression -static FunctionAST *ParseTopLevelExpr() { - if (ExprAST *E = ParseExpression()) { - // Make an anonymous proto. - PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); - return new FunctionAST(Proto, E); - } - return 0; -} - -/// external ::= 'extern' prototype -static PrototypeAST *ParseExtern() { - getNextToken(); // eat extern. 
- return ParsePrototype(); -} - -//===----------------------------------------------------------------------===// -// Code Generation -//===----------------------------------------------------------------------===// - -static Module *TheModule; -static IRBuilder<> Builder(getGlobalContext()); -static std::map<std::string, Value*> NamedValues; -static FunctionPassManager *TheFPM; - -Value *ErrorV(const char *Str) { Error(Str); return 0; } - -Value *NumberExprAST::Codegen() { - return ConstantFP::get(getGlobalContext(), APFloat(Val)); -} - -Value *VariableExprAST::Codegen() { - // Look this variable up in the function. - Value *V = NamedValues[Name]; - return V ? V : ErrorV("Unknown variable name"); -} - -Value *BinaryExprAST::Codegen() { - Value *L = LHS->Codegen(); - Value *R = RHS->Codegen(); - if (L == 0 || R == 0) return 0; - - switch (Op) { - case '+': return Builder.CreateAdd(L, R, "addtmp"); - case '-': return Builder.CreateSub(L, R, "subtmp"); - case '*': return Builder.CreateMul(L, R, "multmp"); - case '<': - L = Builder.CreateFCmpULT(L, R, "cmptmp"); - // Convert bool 0/1 to double 0.0 or 1.0 - return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), - "booltmp"); - default: return ErrorV("invalid binary operator"); - } -} - -Value *CallExprAST::Codegen() { - // Look up the name in the global module table. - Function *CalleeF = TheModule->getFunction(Callee); - if (CalleeF == 0) - return ErrorV("Unknown function referenced"); - - // If argument mismatch error. 
- if (CalleeF->arg_size() != Args.size()) - return ErrorV("Incorrect # arguments passed"); - - std::vector<Value*> ArgsV; - for (unsigned i = 0, e = Args.size(); i != e; ++i) { - ArgsV.push_back(Args[i]->Codegen()); - if (ArgsV.back() == 0) return 0; - } - - return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp"); -} - -Value *IfExprAST::Codegen() { - Value *CondV = Cond->Codegen(); - if (CondV == 0) return 0; - - // Convert condition to a bool by comparing equal to 0.0. - CondV = Builder.CreateFCmpONE(CondV, - ConstantFP::get(getGlobalContext(), APFloat(0.0)), - "ifcond"); - - Function *TheFunction = Builder.GetInsertBlock()->getParent(); - - // Create blocks for the then and else cases. Insert the 'then' block at the - // end of the function. - BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction); - BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else"); - BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont"); - - Builder.CreateCondBr(CondV, ThenBB, ElseBB); - - // Emit then value. - Builder.SetInsertPoint(ThenBB); - - Value *ThenV = Then->Codegen(); - if (ThenV == 0) return 0; - - Builder.CreateBr(MergeBB); - // Codegen of 'Then' can change the current block, update ThenBB for the PHI. - ThenBB = Builder.GetInsertBlock(); - - // Emit else block. - TheFunction->getBasicBlockList().push_back(ElseBB); - Builder.SetInsertPoint(ElseBB); - - Value *ElseV = Else->Codegen(); - if (ElseV == 0) return 0; - - Builder.CreateBr(MergeBB); - // Codegen of 'Else' can change the current block, update ElseBB for the PHI. - ElseBB = Builder.GetInsertBlock(); - - // Emit merge block. - TheFunction->getBasicBlockList().push_back(MergeBB); - Builder.SetInsertPoint(MergeBB); - PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), - "iftmp"); - - PN->addIncoming(ThenV, ThenBB); - PN->addIncoming(ElseV, ElseBB); - return PN; -} - -Value *ForExprAST::Codegen() { - // Output this as: - // ... 
- // start = startexpr - // goto loop - // loop: - // variable = phi [start, loopheader], [nextvariable, loopend] - // ... - // bodyexpr - // ... - // loopend: - // step = stepexpr - // nextvariable = variable + step - // endcond = endexpr - // br endcond, loop, endloop - // outloop: - - // Emit the start code first, without 'variable' in scope. - Value *StartVal = Start->Codegen(); - if (StartVal == 0) return 0; - - // Make the new basic block for the loop header, inserting after current - // block. - Function *TheFunction = Builder.GetInsertBlock()->getParent(); - BasicBlock *PreheaderBB = Builder.GetInsertBlock(); - BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction); - - // Insert an explicit fall through from the current block to the LoopBB. - Builder.CreateBr(LoopBB); - - // Start insertion in LoopBB. - Builder.SetInsertPoint(LoopBB); - - // Start the PHI node with an entry for Start. - PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), VarName.c_str()); - Variable->addIncoming(StartVal, PreheaderBB); - - // Within the loop, the variable is defined equal to the PHI node. If it - // shadows an existing variable, we have to restore it, so save it now. - Value *OldVal = NamedValues[VarName]; - NamedValues[VarName] = Variable; - - // Emit the body of the loop. This, like any other expr, can change the - // current BB. Note that we ignore the value computed by the body, but don't - // allow an error. - if (Body->Codegen() == 0) - return 0; - - // Emit the step value. - Value *StepVal; - if (Step) { - StepVal = Step->Codegen(); - if (StepVal == 0) return 0; - } else { - // If not specified, use 1.0. - StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0)); - } - - Value *NextVar = Builder.CreateAdd(Variable, StepVal, "nextvar"); - - // Compute the end condition. - Value *EndCond = End->Codegen(); - if (EndCond == 0) return EndCond; - - // Convert condition to a bool by comparing equal to 0.0. 
- EndCond = Builder.CreateFCmpONE(EndCond, - ConstantFP::get(getGlobalContext(), APFloat(0.0)), - "loopcond"); - - // Create the "after loop" block and insert it. - BasicBlock *LoopEndBB = Builder.GetInsertBlock(); - BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction); - - // Insert the conditional branch into the end of LoopEndBB. - Builder.CreateCondBr(EndCond, LoopBB, AfterBB); - - // Any new code will be inserted in AfterBB. - Builder.SetInsertPoint(AfterBB); - - // Add a new entry to the PHI node for the backedge. - Variable->addIncoming(NextVar, LoopEndBB); - - // Restore the unshadowed variable. - if (OldVal) - NamedValues[VarName] = OldVal; - else - NamedValues.erase(VarName); - - - // for expr always returns 0.0. - return Constant::getNullValue(Type::getDoubleTy(getGlobalContext())); -} - -Function *PrototypeAST::Codegen() { - // Make the function type: double(double,double) etc. - std::vector<const Type*> Doubles(Args.size(), - Type::getDoubleTy(getGlobalContext())); - FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), - Doubles, false); - - Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule); - - // If F conflicted, there was already something named 'Name'. If it has a - // body, don't allow redefinition or reextern. - if (F->getName() != Name) { - // Delete the one we just made and get the existing one. - F->eraseFromParent(); - F = TheModule->getFunction(Name); - - // If F already has a body, reject this. - if (!F->empty()) { - ErrorF("redefinition of function"); - return 0; - } - - // If F took a different number of args, reject. - if (F->arg_size() != Args.size()) { - ErrorF("redefinition of function with different # args"); - return 0; - } - } - - // Set names for all arguments. - unsigned Idx = 0; - for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); - ++AI, ++Idx) { - AI->setName(Args[Idx]); - - // Add arguments to variable symbol table. 
- NamedValues[Args[Idx]] = AI; - } - - return F; -} - -Function *FunctionAST::Codegen() { - NamedValues.clear(); - - Function *TheFunction = Proto->Codegen(); - if (TheFunction == 0) - return 0; - - // Create a new basic block to start insertion into. - BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); - Builder.SetInsertPoint(BB); - - if (Value *RetVal = Body->Codegen()) { - // Finish off the function. - Builder.CreateRet(RetVal); - - // Validate the generated code, checking for consistency. - verifyFunction(*TheFunction); - - // Optimize the function. - TheFPM->run(*TheFunction); - - return TheFunction; - } - - // Error reading body, remove function. - TheFunction->eraseFromParent(); - return 0; -} - -//===----------------------------------------------------------------------===// -// Top-Level parsing and JIT Driver -//===----------------------------------------------------------------------===// - -static ExecutionEngine *TheExecutionEngine; - -static void HandleDefinition() { - if (FunctionAST *F = ParseDefinition()) { - if (Function *LF = F->Codegen()) { - fprintf(stderr, "Read function definition:"); - LF->dump(); - } - } else { - // Skip token for error recovery. - getNextToken(); - } -} - -static void HandleExtern() { - if (PrototypeAST *P = ParseExtern()) { - if (Function *F = P->Codegen()) { - fprintf(stderr, "Read extern: "); - F->dump(); - } - } else { - // Skip token for error recovery. - getNextToken(); - } -} - -static void HandleTopLevelExpression() { - // Evaluate a top-level expression into an anonymous function. - if (FunctionAST *F = ParseTopLevelExpr()) { - if (Function *LF = F->Codegen()) { - // JIT the function, returning a function pointer. - void *FPtr = TheExecutionEngine->getPointerToFunction(LF); - - // Cast it to the right type (takes no arguments, returns a double) so we - // can call it as a native function. 
- double (*FP)() = (double (*)())(intptr_t)FPtr; - fprintf(stderr, "Evaluated to %f\n", FP()); - } - } else { - // Skip token for error recovery. - getNextToken(); - } -} - -/// top ::= definition | external | expression | ';' -static void MainLoop() { - while (1) { - fprintf(stderr, "ready> "); - switch (CurTok) { - case tok_eof: return; - case ';': getNextToken(); break; // ignore top-level semicolons. - case tok_def: HandleDefinition(); break; - case tok_extern: HandleExtern(); break; - default: HandleTopLevelExpression(); break; - } - } -} - -//===----------------------------------------------------------------------===// -// "Library" functions that can be "extern'd" from user code. -//===----------------------------------------------------------------------===// - -/// putchard - putchar that takes a double and returns 0. -extern "C" -double putchard(double X) { - putchar((char)X); - return 0; -} - -//===----------------------------------------------------------------------===// -// Main driver code. -//===----------------------------------------------------------------------===// - -int main() { - InitializeNativeTarget(); - LLVMContext &Context = getGlobalContext(); - - // Install standard binary operators. - // 1 is lowest precedence. - BinopPrecedence['<'] = 10; - BinopPrecedence['+'] = 20; - BinopPrecedence['-'] = 20; - BinopPrecedence['*'] = 40; // highest. - - // Prime the first token. - fprintf(stderr, "ready> "); - getNextToken(); - - // Make the module, which holds all the code. - TheModule = new Module("my cool jit", Context); - - // Create the JIT. This takes ownership of the module. - std::string ErrStr; - TheExecutionEngine = EngineBuilder(TheModule).setErrorStr(&ErrStr).create(); - if (!TheExecutionEngine) { - fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str()); - exit(1); - } - - FunctionPassManager OurFPM(TheModule); - - // Set up the optimizer pipeline. 
Start with registering info about how the - // target lays out data structures. - OurFPM.add(new TargetData(*TheExecutionEngine->getTargetData())); - // Do simple "peephole" optimizations and bit-twiddling optzns. - OurFPM.add(createInstructionCombiningPass()); - // Reassociate expressions. - OurFPM.add(createReassociatePass()); - // Eliminate Common SubExpressions. - OurFPM.add(createGVNPass()); - // Simplify the control flow graph (deleting unreachable blocks, etc). - OurFPM.add(createCFGSimplificationPass()); - - OurFPM.doInitialization(); - - // Set the global so the code gen can use this. - TheFPM = &OurFPM; - - // Run the main "interpreter loop" now. - MainLoop(); - - TheFPM = 0; - - // Print out all of the generated code. - TheModule->dump(); - - return 0; -} -</pre> -</div> - -<a href="LangImpl6.html">Next: Extending the language: user-defined operators</a> -</div> - -<!-- *********************************************************************** --> -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> - - <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> - <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/tutorial/LangImpl6.html b/docs/tutorial/LangImpl6.html deleted file mode 100644 index 5fae906..0000000 --- a/docs/tutorial/LangImpl6.html +++ /dev/null @@ -1,1814 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> - -<html> -<head> - <title>Kaleidoscope: Extending the Language: User-defined Operators</title> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <meta name="author" content="Chris Lattner"> - <link rel="stylesheet" href="../llvm.css" type="text/css"> 
-</head> - -<body> - -<div class="doc_title">Kaleidoscope: Extending the Language: User-defined Operators</div> - -<ul> -<li><a href="index.html">Up to Tutorial Index</a></li> -<li>Chapter 6 - <ol> - <li><a href="#intro">Chapter 6 Introduction</a></li> - <li><a href="#idea">User-defined Operators: the Idea</a></li> - <li><a href="#binary">User-defined Binary Operators</a></li> - <li><a href="#unary">User-defined Unary Operators</a></li> - <li><a href="#example">Kicking the Tires</a></li> - <li><a href="#code">Full Code Listing</a></li> - </ol> -</li> -<li><a href="LangImpl7.html">Chapter 7</a>: Extending the Language: Mutable -Variables / SSA Construction</li> -</ul> - -<div class="doc_author"> - <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p> -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="intro">Chapter 6 Introduction</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Welcome to Chapter 6 of the "<a href="index.html">Implementing a language -with LLVM</a>" tutorial. At this point in our tutorial, we now have a fully -functional language that is fairly minimal, but also useful. There -is still one big problem with it, however. Our language doesn't have many -useful operators (like division, logical negation, or even any comparisons -besides less-than).</p> - -<p>This chapter of the tutorial takes a wild digression into adding user-defined -operators to the simple and beautiful Kaleidoscope language. This digression now gives -us a simple and ugly language in some ways, but also a powerful one at the same time. -One of the great things about creating your own language is that you get to -decide what is good or bad. 
In this tutorial we'll assume that it is okay to -use this as a way to show some interesting parsing techniques.</p> - -<p>At the end of this tutorial, we'll run through an example Kaleidoscope -application that <a href="#example">renders the Mandelbrot set</a>. This gives -an example of what you can build with Kaleidoscope and its feature set.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="idea">User-defined Operators: the Idea</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p> -The "operator overloading" that we will add to Kaleidoscope is more general than -in languages like C++. In C++, you are only allowed to redefine existing -operators: you can't programmatically change the grammar, introduce new -operators, change precedence levels, etc. In this chapter, we will add this -capability to Kaleidoscope, which will let the user round out the set of -operators that are supported.</p> - -<p>The point of going into user-defined operators in a tutorial like this is to -show the power and flexibility of using a hand-written parser. Thus far, the parser -we have been implementing uses recursive descent for most parts of the grammar and -operator precedence parsing for the expressions. See <a -href="LangImpl2.html">Chapter 2</a> for details. Without using operator -precedence parsing, it would be very difficult to allow the programmer to -introduce new operators into the grammar: the grammar is dynamically extensible -as the JIT runs.</p> - -<p>The two specific features we'll add are programmable unary operators (right -now, Kaleidoscope has no unary operators at all) as well as binary operators. -An example of this is:</p> - -<div class="doc_code"> -<pre> -# Logical unary not. -def unary!(v) - if v then - 0 - else - 1; - -# Define > with the same precedence as <.
-def binary> 10 (LHS RHS) - RHS < LHS; - -# Binary "logical or" (note that it does not "short circuit") -def binary| 5 (LHS RHS) - if LHS then - 1 - else if RHS then - 1 - else - 0; - -# Define = with slightly lower precedence than relationals. -def binary= 9 (LHS RHS) - !(LHS < RHS | LHS > RHS); -</pre> -</div> - -<p>Many languages aspire to being able to implement their standard runtime -library in the language itself. In Kaleidoscope, we can implement significant -parts of the language in the library!</p> - -<p>We will break the implementation of these features into two parts: -implementing support for user-defined binary operators and adding unary -operators.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="binary">User-defined Binary Operators</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Adding support for user-defined binary operators is pretty simple with our -current framework. We'll first add support for the unary/binary keywords:</p> - -<div class="doc_code"> -<pre> -enum Token { - ... - <b>// operators - tok_binary = -11, tok_unary = -12</b> -}; -... -static int gettok() { -... - if (IdentifierStr == "for") return tok_for; - if (IdentifierStr == "in") return tok_in; - <b>if (IdentifierStr == "binary") return tok_binary; - if (IdentifierStr == "unary") return tok_unary;</b> - return tok_identifier; -</pre> -</div> - -<p>This just adds lexer support for the unary and binary keywords, like we -did in <a href="LangImpl5.html#iflexer">previous chapters</a>. One nice thing -about our current AST is that we represent binary operators with full generality -by using their ASCII code as the opcode.
For our extended operators, we'll use this -same representation, so we don't need any new AST or parser support.</p> - -<p>On the other hand, we have to be able to represent the definitions of these -new operators, in the "def binary| 5" part of the function definition. In our -grammar so far, the "name" for the function definition is parsed as the -"prototype" production and into the <tt>PrototypeAST</tt> AST node. To -represent our new user-defined operators as prototypes, we have to extend -the <tt>PrototypeAST</tt> AST node like this:</p> - -<div class="doc_code"> -<pre> -/// PrototypeAST - This class represents the "prototype" for a function, -/// which captures its argument names as well as if it is an operator. -class PrototypeAST { - std::string Name; - std::vector<std::string> Args; - <b>bool isOperator; - unsigned Precedence; // Precedence if a binary op.</b> -public: - PrototypeAST(const std::string &name, const std::vector<std::string> &args, - <b>bool isoperator = false, unsigned prec = 0</b>) - : Name(name), Args(args), <b>isOperator(isoperator), Precedence(prec)</b> {} - - <b>bool isUnaryOp() const { return isOperator && Args.size() == 1; } - bool isBinaryOp() const { return isOperator && Args.size() == 2; } - - char getOperatorName() const { - assert(isUnaryOp() || isBinaryOp()); - return Name[Name.size()-1]; - } - - unsigned getBinaryPrecedence() const { return Precedence; }</b> - - Function *Codegen(); -}; -</pre> -</div> - -<p>Basically, in addition to knowing a name for the prototype, we now keep track -of whether it was an operator, and if it was, what precedence level the operator -is at. The precedence is only used for binary operators (as you'll see below, -it just doesn't apply for unary operators). Now that we have a way to represent -the prototype for a user-defined operator, we need to parse it:</p> - -<div class="doc_code"> -<pre> -/// prototype -/// ::= id '(' id* ')' -<b>/// ::= binary LETTER number? 
(id, id)</b> -static PrototypeAST *ParsePrototype() { - std::string FnName; - - <b>unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary. - unsigned BinaryPrecedence = 30;</b> - - switch (CurTok) { - default: - return ErrorP("Expected function name in prototype"); - case tok_identifier: - FnName = IdentifierStr; - Kind = 0; - getNextToken(); - break; - <b>case tok_binary: - getNextToken(); - if (!isascii(CurTok)) - return ErrorP("Expected binary operator"); - FnName = "binary"; - FnName += (char)CurTok; - Kind = 2; - getNextToken(); - - // Read the precedence if present. - if (CurTok == tok_number) { - if (NumVal < 1 || NumVal > 100) - return ErrorP("Invalid precedence: must be 1..100"); - BinaryPrecedence = (unsigned)NumVal; - getNextToken(); - } - break;</b> - } - - if (CurTok != '(') - return ErrorP("Expected '(' in prototype"); - - std::vector<std::string> ArgNames; - while (getNextToken() == tok_identifier) - ArgNames.push_back(IdentifierStr); - if (CurTok != ')') - return ErrorP("Expected ')' in prototype"); - - // success. - getNextToken(); // eat ')'. - - <b>// Verify right number of names for operator. - if (Kind && ArgNames.size() != Kind) - return ErrorP("Invalid number of operands for operator"); - - return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence);</b> -} -</pre> -</div> - -<p>This is all fairly straightforward parsing code, and we have already seen -a lot of similar code in the past. One interesting part about the code above is -the couple of lines that set up <tt>FnName</tt> for binary operators. This builds names -like "binary@" for a newly defined "@" operator. This takes advantage of the -fact that symbol names in the LLVM symbol table are allowed to have any character in -them, including embedded nul characters.</p> - -<p>The next interesting thing to add is codegen support for these binary operators.
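The name-based dispatch that the codegen below relies on can be illustrated without LLVM at all. What follows is a minimal C++ sketch under stated assumptions, not the tutorial's implementation: a `std::map` keyed by mangled names such as "binary|" is a hypothetical stand-in for the LLVM module symbol table, a second map stands in for `BinopPrecedence`, and the helper names `defineBinaryOp` and `applyBinary` are invented for illustration. The seeded precedences for `<`, `+`, `-` and `*` are assumed to match the builtin values used elsewhere in the tutorial.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the module symbol table: maps a mangled name
// such as "binary|" to the callable that implements it.
using BinaryFn = std::function<double(double, double)>;
static std::map<std::string, BinaryFn> SymbolTable;

// Hypothetical stand-in for BinopPrecedence, seeded with the builtin
// operators (values assumed to match the tutorial's driver).
static std::map<char, int> Precedence = {
    {'<', 10}, {'+', 20}, {'-', 20}, {'*', 40}};

// Roughly what "def binaryX P (LHS RHS) ..." amounts to: register the body
// under the name "binary" + X, then install X's precedence so that the
// operator-precedence parser will accept X as a binary operator.
void defineBinaryOp(char op, int prec, BinaryFn body) {
  SymbolTable[std::string("binary") + op] = std::move(body);
  Precedence[op] = prec;
}

// Mirrors the shape of BinaryExprAST::Codegen: builtins are handled inline,
// anything else falls through to a lookup of "binary" + op and a call.
double applyBinary(char op, double l, double r) {
  switch (op) {
  case '+': return l + r;
  case '-': return l - r;
  case '*': return l * r;
  case '<': return l < r ? 1.0 : 0.0;
  default: break;
  }
  auto it = SymbolTable.find(std::string("binary") + op);
  assert(it != SymbolTable.end() && "binary operator not found!");
  return it->second(l, r);
}
```

For example, registering a non-short-circuiting "or" with `defineBinaryOp('|', 5, ...)` makes `applyBinary('|', 0, 1)` return 1.0. The point matches the prose: because the operator character is part of the function's name, a user-defined operator needs no new AST node, only a symbol lookup.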
-Given our current structure, this is a simple addition of a default case for our -existing binary operator node:</p> - -<div class="doc_code"> -<pre> -Value *BinaryExprAST::Codegen() { - Value *L = LHS->Codegen(); - Value *R = RHS->Codegen(); - if (L == 0 || R == 0) return 0; - - switch (Op) { - case '+': return Builder.CreateAdd(L, R, "addtmp"); - case '-': return Builder.CreateSub(L, R, "subtmp"); - case '*': return Builder.CreateMul(L, R, "multmp"); - case '<': - L = Builder.CreateFCmpULT(L, R, "cmptmp"); - // Convert bool 0/1 to double 0.0 or 1.0 - return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), - "booltmp"); - <b>default: break;</b> - } - - <b>// If it wasn't a builtin binary operator, it must be a user-defined one. Emit - // a call to it. - Function *F = TheModule->getFunction(std::string("binary")+Op); - assert(F && "binary operator not found!"); - - Value *Ops[] = { L, R }; - return Builder.CreateCall(F, Ops, Ops+2, "binop");</b> -} - -</pre> -</div> - -<p>As you can see above, the new code is actually really simple. It just does -a lookup for the appropriate operator in the symbol table and generates a -function call to it. Since user-defined operators are just built as normal -functions (because the "prototype" boils down to a function with the right -name), everything falls into place.</p> - -<p>The final piece of code we are missing is a bit of top-level magic:</p> - -<div class="doc_code"> -<pre> -Function *FunctionAST::Codegen() { - NamedValues.clear(); - - Function *TheFunction = Proto->Codegen(); - if (TheFunction == 0) - return 0; - - <b>// If this is an operator, install it. - if (Proto->isBinaryOp()) - BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence();</b> - - // Create a new basic block to start insertion into. 
- BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); - Builder.SetInsertPoint(BB); - - if (Value *RetVal = Body->Codegen()) { - ...
-</pre> -</div> - -<p>Before generating code for a function that is a user-defined operator, we -register it in the precedence table. This allows the binary operator parsing -logic we already have in place to handle it. Since we are working on a fully-general operator precedence parser, this is all we need to do to "extend the grammar".</p> - -<p>Now we have useful user-defined binary operators. This builds a lot -on the previous framework we built for other operators. Adding unary operators -is a bit more challenging, because we don't have any framework for it yet; let's -see what it takes.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="unary">User-defined Unary Operators</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Since we don't currently support unary operators in the Kaleidoscope -language, we'll need to add everything to support them. Above, we added simple -support for the 'unary' keyword to the lexer. In addition to that, we need an -AST node:</p> - -<div class="doc_code"> -<pre> -/// UnaryExprAST - Expression class for a unary operator. -class UnaryExprAST : public ExprAST { - char Opcode; - ExprAST *Operand; -public: - UnaryExprAST(char opcode, ExprAST *operand) - : Opcode(opcode), Operand(operand) {} - virtual Value *Codegen(); -}; -</pre> -</div> - -<p>This AST node is very simple and obvious by now. It directly mirrors the -binary operator AST node, except that it only has one child. With this, we -need to add the parsing logic. Parsing a unary operator is pretty simple: we'll -add a new function to do it:</p> - -<div class="doc_code"> -<pre> -/// unary -/// ::= primary -/// ::= '!' unary -static ExprAST *ParseUnary() { - // If the current token is not an operator, it must be a primary expr.
- if (!isascii(CurTok) || CurTok == '(' || CurTok == ',') - return ParsePrimary(); - - // If this is a unary operator, read it. - int Opc = CurTok; - getNextToken(); - if (ExprAST *Operand = ParseUnary()) - return new UnaryExprAST(Opc, Operand); - return 0; -} -</pre> -</div> - -<p>The grammar we add is pretty straightforward here. If we see a unary -operator when parsing a primary expression, we eat the operator as a prefix and -parse the remaining piece as another unary expression. This allows us to handle -multiple unary operators (e.g. "!!x"). Note that unary operators can't have -ambiguous parses like binary operators can, so there is no need for precedence -information.</p> - -<p>The problem with this function is that we need to call ParseUnary from somewhere. -To do this, we change previous callers of ParsePrimary to call ParseUnary -instead:</p> - -<div class="doc_code"> -<pre> -/// binoprhs -/// ::= ('+' unary)* -static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { - ... - <b>// Parse the unary expression after the binary operator. - ExprAST *RHS = ParseUnary(); - if (!RHS) return 0;</b> - ... -} -/// expression -/// ::= unary binoprhs -/// -static ExprAST *ParseExpression() { - <b>ExprAST *LHS = ParseUnary();</b> - if (!LHS) return 0; - - return ParseBinOpRHS(0, LHS); -} -</pre> -</div> - -<p>With these two simple changes, we are now able to parse unary operators and build the -AST for them. Next up, we need to extend the prototype parser to handle -the unary operator prototype. We extend the binary operator code above -with:</p> - -<div class="doc_code"> -<pre> -/// prototype -/// ::= id '(' id* ')' -/// ::= binary LETTER number? (id, id) -<b>/// ::= unary LETTER (id)</b> -static PrototypeAST *ParsePrototype() { - std::string FnName; - - unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
- unsigned BinaryPrecedence = 30; - - switch (CurTok) { - default: - return ErrorP("Expected function name in prototype"); - case tok_identifier: - FnName = IdentifierStr; - Kind = 0; - getNextToken(); - break; - <b>case tok_unary: - getNextToken(); - if (!isascii(CurTok)) - return ErrorP("Expected unary operator"); - FnName = "unary"; - FnName += (char)CurTok; - Kind = 1; - getNextToken(); - break;</b> - case tok_binary: - ... -</pre> -</div> - -<p>As with binary operators, we name unary operators with a name that includes -the operator character. This assists us at code generation time. Speaking of which, -the final piece we need to add is codegen support for unary operators. It looks -like this:</p> - -<div class="doc_code"> -<pre> -Value *UnaryExprAST::Codegen() { - Value *OperandV = Operand->Codegen(); - if (OperandV == 0) return 0; - - Function *F = TheModule->getFunction(std::string("unary")+Opcode); - if (F == 0) - return ErrorV("Unknown unary operator"); - - return Builder.CreateCall(F, OperandV, "unop"); -} -</pre> -</div> - -<p>This code is similar to, but simpler than, the code for binary operators. It -is simpler primarily because it doesn't need to handle any predefined operators. -</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="example">Kicking the Tires</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>It is somewhat hard to believe, but with a few simple extensions we've -covered in the last chapters, we have grown a real-ish language. With this, we -can do a lot of interesting things, including I/O, math, and a bunch of other -things.
For example, we can now add a nice sequencing operator (printd is -defined to print out the specified value and a newline):</p> - -<div class="doc_code"> -<pre> -ready> <b>extern printd(x);</b> -Read extern: declare double @printd(double) -ready> <b>def binary : 1 (x y) 0; # Low-precedence operator that ignores operands.</b> -.. -ready> <b>printd(123) : printd(456) : printd(789);</b> -123.000000 -456.000000 -789.000000 -Evaluated to 0.000000 -</pre> -</div> - -<p>We can also define a bunch of other "primitive" operations, such as:</p> - -<div class="doc_code"> -<pre> -# Logical unary not. -def unary!(v) - if v then - 0 - else - 1; - -# Unary negate. -def unary-(v) - 0-v; - -# Define > with the same precedence as <. -def binary> 10 (LHS RHS) - RHS < LHS; - -# Binary logical or, which does not short circuit. -def binary| 5 (LHS RHS) - if LHS then - 1 - else if RHS then - 1 - else - 0; - -# Binary logical and, which does not short circuit. -def binary& 6 (LHS RHS) - if !LHS then - 0 - else - !!RHS; - -# Define = with slightly lower precedence than relationals. -def binary = 9 (LHS RHS) - !(LHS < RHS | LHS > RHS); - -</pre> -</div> - - -<p>Given the previous if/then/else support, we can also define interesting -functions for I/O. For example, the following prints out a character whose -"density" reflects the value passed in: the lower the value, the denser the -character:</p> - -<div class="doc_code"> -<pre> -ready> -<b> -extern putchard(char) -def printdensity(d) - if d > 8 then - putchard(32) # ' ' - else if d > 4 then - putchard(46) # '.' - else if d > 2 then - putchard(43) # '+' - else - putchard(42); # '*'</b> -... -ready> <b>printdensity(1): printdensity(2): printdensity(3) : - printdensity(4): printdensity(5): printdensity(9): putchard(10);</b> -**++. -Evaluated to 0.000000 -</pre> -</div> - -<p>Based on these simple primitive operations, we can start to define more -interesting things.
For example, here's a little function that computes the -number of iterations it takes for a point in the complex plane to -escape:</p> - -<div class="doc_code"> -<pre> -# determine whether the specific location diverges. -# Solve for z = z^2 + c in the complex plane. -def mandleconverger(real imag iters creal cimag) - if iters > 255 | (real*real + imag*imag > 4) then - iters - else - mandleconverger(real*real - imag*imag + creal, - 2*real*imag + cimag, - iters+1, creal, cimag); - -# return the number of iterations required for the iteration to escape -def mandleconverge(real imag) - mandleconverger(real, imag, 0, real, imag); -</pre> -</div> - -<p>This "z = z<sup>2</sup> + c" function is a beautiful little creature that is the basis -for computation of the <a -href="http://en.wikipedia.org/wiki/Mandelbrot_set">Mandelbrot Set</a>. Our -<tt>mandleconverge</tt> function returns the number of iterations that it takes -for a complex orbit to escape, saturating at 256. This is not a very useful -function by itself, but if you plot its value over a two-dimensional plane, -you can see the Mandelbrot set. Given that we are limited to using putchard -here, our amazing graphical output is limited, but we can whip together -something using the density plotter above:</p> - -<div class="doc_code"> -<pre> -# compute and plot the Mandelbrot set with the specified 2-dimensional range -# info. -def mandelhelp(xmin xmax xstep ymin ymax ystep) - for y = ymin, y < ymax, ystep in ( - (for x = xmin, x < xmax, xstep in - printdensity(mandleconverge(x,y))) - : putchard(10) - ) - -# mandel - This is a convenient helper function for plotting the Mandelbrot set -# from the specified position with the specified magnification. -def mandel(realstart imagstart realmag imagmag) - mandelhelp(realstart, realstart+realmag*78, realmag, - imagstart, imagstart+imagmag*40, imagmag); -</pre> -</div> - -<p>Given this, we can try plotting out the Mandelbrot set!
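As a cross-check of the escape-time logic before running it, the Kaleidoscope `mandleconverger`, `mandleconverge` and `printdensity` definitions above can be transcribed into plain C++. This is an illustrative sketch, not part of the tutorial's code: the recursion is rewritten as a loop, and `densityChar` is a hypothetical helper that returns the character `printdensity` would print instead of calling `putchard`.

```cpp
#include <cassert>

// Transcription of mandleconverger: iterate z = z^2 + c, stopping once the
// iteration count exceeds 255 or |z|^2 exceeds 4, and return the count.
int mandleconverger(double real, double imag, int iters,
                    double creal, double cimag) {
  assert(iters >= 0 && "iteration count must start non-negative");
  while (!(iters > 255 || real * real + imag * imag > 4)) {
    // Both components are computed from the old z, as in the recursive call.
    double nextReal = real * real - imag * imag + creal;
    double nextImag = 2 * real * imag + cimag;
    real = nextReal;
    imag = nextImag;
    ++iters;
  }
  return iters;
}

// Transcription of mandleconverge: start the orbit at z = c.
int mandleconverge(double real, double imag) {
  return mandleconverger(real, imag, 0, real, imag);
}

// Same thresholds as printdensity: the lower the value, the denser the
// character.
char densityChar(double d) {
  if (d > 8) return ' ';
  if (d > 4) return '.';
  if (d > 2) return '+';
  return '*';
}
```

Points inside the set never escape, so `mandleconverge(0, 0)` returns 256 and prints as a space, while `mandleconverge(-2.3, -1.3)` (the top-left corner of the first plot) escapes immediately, returns 0 and prints as `*`.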
Lets try it out:</p> - -<div class="doc_code"> -<pre> -ready> <b>mandel(-2.3, -1.3, 0.05, 0.07);</b> -*******************************+++++++++++************************************* -*************************+++++++++++++++++++++++******************************* -**********************+++++++++++++++++++++++++++++**************************** -*******************+++++++++++++++++++++.. ...++++++++************************* -*****************++++++++++++++++++++++.... ...+++++++++*********************** -***************+++++++++++++++++++++++..... ...+++++++++********************* -**************+++++++++++++++++++++++.... ....+++++++++******************** -*************++++++++++++++++++++++...... .....++++++++******************* -************+++++++++++++++++++++....... .......+++++++****************** -***********+++++++++++++++++++.... ... .+++++++***************** -**********+++++++++++++++++....... .+++++++**************** -*********++++++++++++++........... ...+++++++*************** -********++++++++++++............ ...++++++++************** -********++++++++++... .......... .++++++++************** -*******+++++++++..... .+++++++++************* -*******++++++++...... ..+++++++++************* -*******++++++....... ..+++++++++************* -*******+++++...... ..+++++++++************* -*******.... .... ...+++++++++************* -*******.... . ...+++++++++************* -*******+++++...... ...+++++++++************* -*******++++++....... ..+++++++++************* -*******++++++++...... .+++++++++************* -*******+++++++++..... ..+++++++++************* -********++++++++++... .......... .++++++++************** -********++++++++++++............ ...++++++++************** -*********++++++++++++++.......... ...+++++++*************** -**********++++++++++++++++........ .+++++++**************** -**********++++++++++++++++++++.... ... ..+++++++**************** -***********++++++++++++++++++++++....... 
.......++++++++***************** -************+++++++++++++++++++++++...... ......++++++++****************** -**************+++++++++++++++++++++++.... ....++++++++******************** -***************+++++++++++++++++++++++..... ...+++++++++********************* -*****************++++++++++++++++++++++.... ...++++++++*********************** -*******************+++++++++++++++++++++......++++++++************************* -*********************++++++++++++++++++++++.++++++++*************************** -*************************+++++++++++++++++++++++******************************* -******************************+++++++++++++************************************ -******************************************************************************* -******************************************************************************* -******************************************************************************* -Evaluated to 0.000000 -ready> <b>mandel(-2, -1, 0.02, 0.04);</b> -**************************+++++++++++++++++++++++++++++++++++++++++++++++++++++ -***********************++++++++++++++++++++++++++++++++++++++++++++++++++++++++ -*********************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++. -*******************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++... -*****************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++..... -***************++++++++++++++++++++++++++++++++++++++++++++++++++++++++........ -**************++++++++++++++++++++++++++++++++++++++++++++++++++++++........... -************+++++++++++++++++++++++++++++++++++++++++++++++++++++.............. -***********++++++++++++++++++++++++++++++++++++++++++++++++++........ . -**********++++++++++++++++++++++++++++++++++++++++++++++............. -********+++++++++++++++++++++++++++++++++++++++++++.................. -*******+++++++++++++++++++++++++++++++++++++++....................... -******+++++++++++++++++++++++++++++++++++........................... 
-*****++++++++++++++++++++++++++++++++............................ -*****++++++++++++++++++++++++++++............................... -****++++++++++++++++++++++++++...... ......................... -***++++++++++++++++++++++++......... ...... ........... -***++++++++++++++++++++++............ -**+++++++++++++++++++++.............. -**+++++++++++++++++++................ -*++++++++++++++++++................. -*++++++++++++++++............ ... -*++++++++++++++.............. -*+++....++++................ -*.......... ........... -* -*.......... ........... -*+++....++++................ -*++++++++++++++.............. -*++++++++++++++++............ ... -*++++++++++++++++++................. -**+++++++++++++++++++................ -**+++++++++++++++++++++.............. -***++++++++++++++++++++++............ -***++++++++++++++++++++++++......... ...... ........... -****++++++++++++++++++++++++++...... ......................... -*****++++++++++++++++++++++++++++............................... -*****++++++++++++++++++++++++++++++++............................ -******+++++++++++++++++++++++++++++++++++........................... -*******+++++++++++++++++++++++++++++++++++++++....................... -********+++++++++++++++++++++++++++++++++++++++++++.................. 
-Evaluated to 0.000000 -ready> <b>mandel(-0.9, -1.4, 0.02, 0.03);</b> -******************************************************************************* -******************************************************************************* -******************************************************************************* -**********+++++++++++++++++++++************************************************ -*+++++++++++++++++++++++++++++++++++++++*************************************** -+++++++++++++++++++++++++++++++++++++++++++++********************************** -++++++++++++++++++++++++++++++++++++++++++++++++++***************************** -++++++++++++++++++++++++++++++++++++++++++++++++++++++************************* -+++++++++++++++++++++++++++++++++++++++++++++++++++++++++********************** -+++++++++++++++++++++++++++++++++.........++++++++++++++++++******************* -+++++++++++++++++++++++++++++++.... ......+++++++++++++++++++**************** -+++++++++++++++++++++++++++++....... ........+++++++++++++++++++************** -++++++++++++++++++++++++++++........ ........++++++++++++++++++++************ -+++++++++++++++++++++++++++......... .. ...+++++++++++++++++++++********** -++++++++++++++++++++++++++........... ....++++++++++++++++++++++******** -++++++++++++++++++++++++............. .......++++++++++++++++++++++****** -+++++++++++++++++++++++............. ........+++++++++++++++++++++++**** -++++++++++++++++++++++........... ..........++++++++++++++++++++++*** -++++++++++++++++++++........... .........++++++++++++++++++++++* -++++++++++++++++++............ ...........++++++++++++++++++++ -++++++++++++++++............... .............++++++++++++++++++ -++++++++++++++................. ...............++++++++++++++++ -++++++++++++.................. .................++++++++++++++ -+++++++++.................. .................+++++++++++++ -++++++........ . ......... ..++++++++++++ -++............ ...... ....++++++++++ -.............. ...++++++++++ -.............. 
....+++++++++ -.............. .....++++++++ -............. ......++++++++ -........... .......++++++++ -......... ........+++++++ -......... ........+++++++ -......... ....+++++++ -........ ...+++++++ -....... ...+++++++ - ....+++++++ - .....+++++++ - ....+++++++ - ....+++++++ - ....+++++++ -Evaluated to 0.000000 -ready> <b>^D</b> -</pre> -</div> - -<p>At this point, you may be starting to realize that Kaleidoscope is a real -and powerful language. It may not be self-similar :), but it can be used to -plot things that are!</p> - -<p>With this, we conclude the "adding user-defined operators" chapter of the -tutorial. We have successfully augmented our language, adding the ability to extend the -language in the library, and we have shown how this can be used to build a simple but -interesting end-user application in Kaleidoscope. At this point, Kaleidoscope -can build a variety of applications that are functional and can call functions -with side-effects, but it can't actually define and mutate a variable itself. -</p> - -<p>Variable mutation is an important feature of many -languages, yet it is not at all obvious how to <a href="LangImpl7.html">add -support for mutable variables</a> without having to add an "SSA construction" -phase to your front-end. In the next chapter, we will describe how you can -add variable mutation without building SSA in your front-end.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="code">Full Code Listing</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p> -Here is the complete code listing for our running example, enhanced with -support for user-defined operators.
To build this example, use: -</p> - -<div class="doc_code"> -<pre> - # Compile - g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy - # Run - ./toy -</pre> -</div> - -<p>Here is the code:</p> - -<div class="doc_code"> -<pre> -#include "llvm/DerivedTypes.h" -#include "llvm/ExecutionEngine/ExecutionEngine.h" -#include "llvm/ExecutionEngine/JIT.h" -#include "llvm/LLVMContext.h" -#include "llvm/Module.h" -#include "llvm/PassManager.h" -#include "llvm/Analysis/Verifier.h" -#include "llvm/Target/TargetData.h" -#include "llvm/Target/TargetSelect.h" -#include "llvm/Transforms/Scalar.h" -#include "llvm/Support/IRBuilder.h" -#include <cstdio> -#include <string> -#include <map> -#include <vector> -using namespace llvm; - -//===----------------------------------------------------------------------===// -// Lexer -//===----------------------------------------------------------------------===// - -// The lexer returns tokens [0-255] if it is an unknown character, otherwise one -// of these for known things. -enum Token { - tok_eof = -1, - - // commands - tok_def = -2, tok_extern = -3, - - // primary - tok_identifier = -4, tok_number = -5, - - // control - tok_if = -6, tok_then = -7, tok_else = -8, - tok_for = -9, tok_in = -10, - - // operators - tok_binary = -11, tok_unary = -12 -}; - -static std::string IdentifierStr; // Filled in if tok_identifier -static double NumVal; // Filled in if tok_number - -/// gettok - Return the next token from standard input. -static int gettok() { - static int LastChar = ' '; - - // Skip any whitespace. 
- while (isspace(LastChar)) - LastChar = getchar(); - - if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]* - IdentifierStr = LastChar; - while (isalnum((LastChar = getchar()))) - IdentifierStr += LastChar; - - if (IdentifierStr == "def") return tok_def; - if (IdentifierStr == "extern") return tok_extern; - if (IdentifierStr == "if") return tok_if; - if (IdentifierStr == "then") return tok_then; - if (IdentifierStr == "else") return tok_else; - if (IdentifierStr == "for") return tok_for; - if (IdentifierStr == "in") return tok_in; - if (IdentifierStr == "binary") return tok_binary; - if (IdentifierStr == "unary") return tok_unary; - return tok_identifier; - } - - if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+ - std::string NumStr; - do { - NumStr += LastChar; - LastChar = getchar(); - } while (isdigit(LastChar) || LastChar == '.'); - - NumVal = strtod(NumStr.c_str(), 0); - return tok_number; - } - - if (LastChar == '#') { - // Comment until end of line. - do LastChar = getchar(); - while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); - - if (LastChar != EOF) - return gettok(); - } - - // Check for end of file. Don't eat the EOF. - if (LastChar == EOF) - return tok_eof; - - // Otherwise, just return the character as its ascii value. - int ThisChar = LastChar; - LastChar = getchar(); - return ThisChar; -} - -//===----------------------------------------------------------------------===// -// Abstract Syntax Tree (aka Parse Tree) -//===----------------------------------------------------------------------===// - -/// ExprAST - Base class for all expression nodes. -class ExprAST { -public: - virtual ~ExprAST() {} - virtual Value *Codegen() = 0; -}; - -/// NumberExprAST - Expression class for numeric literals like "1.0". -class NumberExprAST : public ExprAST { - double Val; -public: - NumberExprAST(double val) : Val(val) {} - virtual Value *Codegen(); -}; - -/// VariableExprAST - Expression class for referencing a variable, like "a". 
-class VariableExprAST : public ExprAST { - std::string Name; -public: - VariableExprAST(const std::string &name) : Name(name) {} - virtual Value *Codegen(); -}; - -/// UnaryExprAST - Expression class for a unary operator. -class UnaryExprAST : public ExprAST { - char Opcode; - ExprAST *Operand; -public: - UnaryExprAST(char opcode, ExprAST *operand) - : Opcode(opcode), Operand(operand) {} - virtual Value *Codegen(); -}; - -/// BinaryExprAST - Expression class for a binary operator. -class BinaryExprAST : public ExprAST { - char Op; - ExprAST *LHS, *RHS; -public: - BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) - : Op(op), LHS(lhs), RHS(rhs) {} - virtual Value *Codegen(); -}; - -/// CallExprAST - Expression class for function calls. -class CallExprAST : public ExprAST { - std::string Callee; - std::vector<ExprAST*> Args; -public: - CallExprAST(const std::string &callee, std::vector<ExprAST*> &args) - : Callee(callee), Args(args) {} - virtual Value *Codegen(); -}; - -/// IfExprAST - Expression class for if/then/else. -class IfExprAST : public ExprAST { - ExprAST *Cond, *Then, *Else; -public: - IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else) - : Cond(cond), Then(then), Else(_else) {} - virtual Value *Codegen(); -}; - -/// ForExprAST - Expression class for for/in. -class ForExprAST : public ExprAST { - std::string VarName; - ExprAST *Start, *End, *Step, *Body; -public: - ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end, - ExprAST *step, ExprAST *body) - : VarName(varname), Start(start), End(end), Step(step), Body(body) {} - virtual Value *Codegen(); -}; - -/// PrototypeAST - This class represents the "prototype" for a function, -/// which captures its name, and its argument names (thus implicitly the number -/// of arguments the function takes), as well as if it is an operator. -class PrototypeAST { - std::string Name; - std::vector<std::string> Args; - bool isOperator; - unsigned Precedence; // Precedence if a binary op. 
-public: - PrototypeAST(const std::string &name, const std::vector<std::string> &args, - bool isoperator = false, unsigned prec = 0) - : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {} - - bool isUnaryOp() const { return isOperator && Args.size() == 1; } - bool isBinaryOp() const { return isOperator && Args.size() == 2; } - - char getOperatorName() const { - assert(isUnaryOp() || isBinaryOp()); - return Name[Name.size()-1]; - } - - unsigned getBinaryPrecedence() const { return Precedence; } - - Function *Codegen(); -}; - -/// FunctionAST - This class represents a function definition itself. -class FunctionAST { - PrototypeAST *Proto; - ExprAST *Body; -public: - FunctionAST(PrototypeAST *proto, ExprAST *body) - : Proto(proto), Body(body) {} - - Function *Codegen(); -}; - -//===----------------------------------------------------------------------===// -// Parser -//===----------------------------------------------------------------------===// - -/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current -/// token the parser is looking at. getNextToken reads another token from the -/// lexer and updates CurTok with its results. -static int CurTok; -static int getNextToken() { - return CurTok = gettok(); -} - -/// BinopPrecedence - This holds the precedence for each binary operator that is -/// defined. -static std::map<char, int> BinopPrecedence; - -/// GetTokPrecedence - Get the precedence of the pending binary operator token. -static int GetTokPrecedence() { - if (!isascii(CurTok)) - return -1; - - // Make sure it's a declared binop. - int TokPrec = BinopPrecedence[CurTok]; - if (TokPrec <= 0) return -1; - return TokPrec; -} - -/// Error* - These are little helper functions for error handling. 
-ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
-PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
-FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
-
-static ExprAST *ParseExpression();
-
-/// identifierexpr
-///   ::= identifier
-///   ::= identifier '(' expression* ')'
-static ExprAST *ParseIdentifierExpr() {
-  std::string IdName = IdentifierStr;
-
-  getNextToken(); // eat identifier.
-
-  if (CurTok != '(') // Simple variable ref.
-    return new VariableExprAST(IdName);
-
-  // Call.
-  getNextToken(); // eat (
-  std::vector<ExprAST*> Args;
-  if (CurTok != ')') {
-    while (1) {
-      ExprAST *Arg = ParseExpression();
-      if (!Arg) return 0;
-      Args.push_back(Arg);
-
-      if (CurTok == ')') break;
-
-      if (CurTok != ',')
-        return Error("Expected ')' or ',' in argument list");
-      getNextToken();
-    }
-  }
-
-  // Eat the ')'.
-  getNextToken();
-
-  return new CallExprAST(IdName, Args);
-}
-
-/// numberexpr ::= number
-static ExprAST *ParseNumberExpr() {
-  ExprAST *Result = new NumberExprAST(NumVal);
-  getNextToken(); // consume the number
-  return Result;
-}
-
-/// parenexpr ::= '(' expression ')'
-static ExprAST *ParseParenExpr() {
-  getNextToken(); // eat (.
-  ExprAST *V = ParseExpression();
-  if (!V) return 0;
-
-  if (CurTok != ')')
-    return Error("expected ')'");
-  getNextToken(); // eat ).
-  return V;
-}
-
-/// ifexpr ::= 'if' expression 'then' expression 'else' expression
-static ExprAST *ParseIfExpr() {
-  getNextToken(); // eat the if.
-
-  // condition.
-  ExprAST *Cond = ParseExpression();
-  if (!Cond) return 0;
-
-  if (CurTok != tok_then)
-    return Error("expected then");
-  getNextToken(); // eat the then
-
-  ExprAST *Then = ParseExpression();
-  if (Then == 0) return 0;
-
-  if (CurTok != tok_else)
-    return Error("expected else");
-
-  getNextToken();
-
-  ExprAST *Else = ParseExpression();
-  if (!Else) return 0;
-
-  return new IfExprAST(Cond, Then, Else);
-}
-
-/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
-static ExprAST *ParseForExpr() {
-  getNextToken(); // eat the for.
-
-  if (CurTok != tok_identifier)
-    return Error("expected identifier after for");
-
-  std::string IdName = IdentifierStr;
-  getNextToken(); // eat identifier.
-
-  if (CurTok != '=')
-    return Error("expected '=' after for");
-  getNextToken(); // eat '='.
-
-
-  ExprAST *Start = ParseExpression();
-  if (Start == 0) return 0;
-  if (CurTok != ',')
-    return Error("expected ',' after for start value");
-  getNextToken();
-
-  ExprAST *End = ParseExpression();
-  if (End == 0) return 0;
-
-  // The step value is optional.
-  ExprAST *Step = 0;
-  if (CurTok == ',') {
-    getNextToken();
-    Step = ParseExpression();
-    if (Step == 0) return 0;
-  }
-
-  if (CurTok != tok_in)
-    return Error("expected 'in' after for");
-  getNextToken(); // eat 'in'.
-
-  ExprAST *Body = ParseExpression();
-  if (Body == 0) return 0;
-
-  return new ForExprAST(IdName, Start, End, Step, Body);
-}
-
-/// primary
-///   ::= identifierexpr
-///   ::= numberexpr
-///   ::= parenexpr
-///   ::= ifexpr
-///   ::= forexpr
-static ExprAST *ParsePrimary() {
-  switch (CurTok) {
-  default: return Error("unknown token when expecting an expression");
-  case tok_identifier: return ParseIdentifierExpr();
-  case tok_number: return ParseNumberExpr();
-  case '(': return ParseParenExpr();
-  case tok_if: return ParseIfExpr();
-  case tok_for: return ParseForExpr();
-  }
-}
-
-/// unary
-///   ::= primary
-///   ::= '!' unary
-static ExprAST *ParseUnary() {
-  // If the current token is not an operator, it must be a primary expr.
-  if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
-    return ParsePrimary();
-
-  // If this is a unary operator, read it.
-  int Opc = CurTok;
-  getNextToken();
-  if (ExprAST *Operand = ParseUnary())
-    return new UnaryExprAST(Opc, Operand);
-  return 0;
-}
-
-/// binoprhs
-///   ::= ('+' unary)*
-static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
-  // If this is a binop, find its precedence.
-  while (1) {
-    int TokPrec = GetTokPrecedence();
-
-    // If this is a binop that binds at least as tightly as the current binop,
-    // consume it, otherwise we are done.
-    if (TokPrec < ExprPrec)
-      return LHS;
-
-    // Okay, we know this is a binop.
-    int BinOp = CurTok;
-    getNextToken(); // eat binop
-
-    // Parse the unary expression after the binary operator.
-    ExprAST *RHS = ParseUnary();
-    if (!RHS) return 0;
-
-    // If BinOp binds less tightly with RHS than the operator after RHS, let
-    // the pending operator take RHS as its LHS.
-    int NextPrec = GetTokPrecedence();
-    if (TokPrec < NextPrec) {
-      RHS = ParseBinOpRHS(TokPrec+1, RHS);
-      if (RHS == 0) return 0;
-    }
-
-    // Merge LHS/RHS.
-    LHS = new BinaryExprAST(BinOp, LHS, RHS);
-  }
-}
-
-/// expression
-///   ::= unary binoprhs
-///
-static ExprAST *ParseExpression() {
-  ExprAST *LHS = ParseUnary();
-  if (!LHS) return 0;
-
-  return ParseBinOpRHS(0, LHS);
-}
-
-/// prototype
-///   ::= id '(' id* ')'
-///   ::= binary LETTER number? (id, id)
-///   ::= unary LETTER (id)
-static PrototypeAST *ParsePrototype() {
-  std::string FnName;
-
-  unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
-  unsigned BinaryPrecedence = 30;
-
-  switch (CurTok) {
-  default:
-    return ErrorP("Expected function name in prototype");
-  case tok_identifier:
-    FnName = IdentifierStr;
-    Kind = 0;
-    getNextToken();
-    break;
-  case tok_unary:
-    getNextToken();
-    if (!isascii(CurTok))
-      return ErrorP("Expected unary operator");
-    FnName = "unary";
-    FnName += (char)CurTok;
-    Kind = 1;
-    getNextToken();
-    break;
-  case tok_binary:
-    getNextToken();
-    if (!isascii(CurTok))
-      return ErrorP("Expected binary operator");
-    FnName = "binary";
-    FnName += (char)CurTok;
-    Kind = 2;
-    getNextToken();
-
-    // Read the precedence if present.
-    if (CurTok == tok_number) {
-      if (NumVal < 1 || NumVal > 100)
-        return ErrorP("Invalid precedecnce: must be 1..100");
-      BinaryPrecedence = (unsigned)NumVal;
-      getNextToken();
-    }
-    break;
-  }
-
-  if (CurTok != '(')
-    return ErrorP("Expected '(' in prototype");
-
-  std::vector<std::string> ArgNames;
-  while (getNextToken() == tok_identifier)
-    ArgNames.push_back(IdentifierStr);
-  if (CurTok != ')')
-    return ErrorP("Expected ')' in prototype");
-
-  // success.
-  getNextToken(); // eat ')'.
-
-  // Verify right number of names for operator.
-  if (Kind && ArgNames.size() != Kind)
-    return ErrorP("Invalid number of operands for operator");
-
-  return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence);
-}
-
-/// definition ::= 'def' prototype expression
-static FunctionAST *ParseDefinition() {
-  getNextToken(); // eat def.
-  PrototypeAST *Proto = ParsePrototype();
-  if (Proto == 0) return 0;
-
-  if (ExprAST *E = ParseExpression())
-    return new FunctionAST(Proto, E);
-  return 0;
-}
-
-/// toplevelexpr ::= expression
-static FunctionAST *ParseTopLevelExpr() {
-  if (ExprAST *E = ParseExpression()) {
-    // Make an anonymous proto.
-    PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
-    return new FunctionAST(Proto, E);
-  }
-  return 0;
-}
-
-/// external ::= 'extern' prototype
-static PrototypeAST *ParseExtern() {
-  getNextToken(); // eat extern.
-  return ParsePrototype();
-}
-
-//===----------------------------------------------------------------------===//
-// Code Generation
-//===----------------------------------------------------------------------===//
-
-static Module *TheModule;
-static IRBuilder<> Builder(getGlobalContext());
-static std::map<std::string, Value*> NamedValues;
-static FunctionPassManager *TheFPM;
-
-Value *ErrorV(const char *Str) { Error(Str); return 0; }
-
-Value *NumberExprAST::Codegen() {
-  return ConstantFP::get(getGlobalContext(), APFloat(Val));
-}
-
-Value *VariableExprAST::Codegen() {
-  // Look this variable up in the function.
-  Value *V = NamedValues[Name];
-  return V ? V : ErrorV("Unknown variable name");
-}
-
-Value *UnaryExprAST::Codegen() {
-  Value *OperandV = Operand->Codegen();
-  if (OperandV == 0) return 0;
-
-  Function *F = TheModule->getFunction(std::string("unary")+Opcode);
-  if (F == 0)
-    return ErrorV("Unknown unary operator");
-
-  return Builder.CreateCall(F, OperandV, "unop");
-}
-
-Value *BinaryExprAST::Codegen() {
-  Value *L = LHS->Codegen();
-  Value *R = RHS->Codegen();
-  if (L == 0 || R == 0) return 0;
-
-  switch (Op) {
-  case '+': return Builder.CreateAdd(L, R, "addtmp");
-  case '-': return Builder.CreateSub(L, R, "subtmp");
-  case '*': return Builder.CreateMul(L, R, "multmp");
-  case '<':
-    L = Builder.CreateFCmpULT(L, R, "cmptmp");
-    // Convert bool 0/1 to double 0.0 or 1.0
-    return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
-                                "booltmp");
-  default: break;
-  }
-
-  // If it wasn't a builtin binary operator, it must be a user defined one. Emit
-  // a call to it.
-  Function *F = TheModule->getFunction(std::string("binary")+Op);
-  assert(F && "binary operator not found!");
-
-  Value *Ops[] = { L, R };
-  return Builder.CreateCall(F, Ops, Ops+2, "binop");
-}
-
-Value *CallExprAST::Codegen() {
-  // Look up the name in the global module table.
-  Function *CalleeF = TheModule->getFunction(Callee);
-  if (CalleeF == 0)
-    return ErrorV("Unknown function referenced");
-
-  // If argument mismatch error.
-  if (CalleeF->arg_size() != Args.size())
-    return ErrorV("Incorrect # arguments passed");
-
-  std::vector<Value*> ArgsV;
-  for (unsigned i = 0, e = Args.size(); i != e; ++i) {
-    ArgsV.push_back(Args[i]->Codegen());
-    if (ArgsV.back() == 0) return 0;
-  }
-
-  return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
-}
-
-Value *IfExprAST::Codegen() {
-  Value *CondV = Cond->Codegen();
-  if (CondV == 0) return 0;
-
-  // Convert condition to a bool by comparing equal to 0.0.
-  CondV = Builder.CreateFCmpONE(CondV,
-                                ConstantFP::get(getGlobalContext(), APFloat(0.0)),
-                                "ifcond");
-
-  Function *TheFunction = Builder.GetInsertBlock()->getParent();
-
-  // Create blocks for the then and else cases. Insert the 'then' block at the
-  // end of the function.
-  BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction);
-  BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else");
-  BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont");
-
-  Builder.CreateCondBr(CondV, ThenBB, ElseBB);
-
-  // Emit then value.
-  Builder.SetInsertPoint(ThenBB);
-
-  Value *ThenV = Then->Codegen();
-  if (ThenV == 0) return 0;
-
-  Builder.CreateBr(MergeBB);
-  // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
-  ThenBB = Builder.GetInsertBlock();
-
-  // Emit else block.
-  TheFunction->getBasicBlockList().push_back(ElseBB);
-  Builder.SetInsertPoint(ElseBB);
-
-  Value *ElseV = Else->Codegen();
-  if (ElseV == 0) return 0;
-
-  Builder.CreateBr(MergeBB);
-  // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
-  ElseBB = Builder.GetInsertBlock();
-
-  // Emit merge block.
-  TheFunction->getBasicBlockList().push_back(MergeBB);
-  Builder.SetInsertPoint(MergeBB);
-  PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()),
-                                  "iftmp");
-
-  PN->addIncoming(ThenV, ThenBB);
-  PN->addIncoming(ElseV, ElseBB);
-  return PN;
-}
-
-Value *ForExprAST::Codegen() {
-  // Output this as:
-  //   ...
-  //   start = startexpr
-  //   goto loop
-  // loop:
-  //   variable = phi [start, loopheader], [nextvariable, loopend]
-  //   ...
-  //   bodyexpr
-  //   ...
-  // loopend:
-  //   step = stepexpr
-  //   nextvariable = variable + step
-  //   endcond = endexpr
-  //   br endcond, loop, endloop
-  // outloop:
-
-  // Emit the start code first, without 'variable' in scope.
-  Value *StartVal = Start->Codegen();
-  if (StartVal == 0) return 0;
-
-  // Make the new basic block for the loop header, inserting after current
-  // block.
-  Function *TheFunction = Builder.GetInsertBlock()->getParent();
-  BasicBlock *PreheaderBB = Builder.GetInsertBlock();
-  BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction);
-
-  // Insert an explicit fall through from the current block to the LoopBB.
-  Builder.CreateBr(LoopBB);
-
-  // Start insertion in LoopBB.
-  Builder.SetInsertPoint(LoopBB);
-
-  // Start the PHI node with an entry for Start.
-  PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), VarName.c_str());
-  Variable->addIncoming(StartVal, PreheaderBB);
-
-  // Within the loop, the variable is defined equal to the PHI node. If it
-  // shadows an existing variable, we have to restore it, so save it now.
-  Value *OldVal = NamedValues[VarName];
-  NamedValues[VarName] = Variable;
-
-  // Emit the body of the loop. This, like any other expr, can change the
-  // current BB. Note that we ignore the value computed by the body, but don't
-  // allow an error.
-  if (Body->Codegen() == 0)
-    return 0;
-
-  // Emit the step value.
-  Value *StepVal;
-  if (Step) {
-    StepVal = Step->Codegen();
-    if (StepVal == 0) return 0;
-  } else {
-    // If not specified, use 1.0.
-    StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0));
-  }
-
-  Value *NextVar = Builder.CreateAdd(Variable, StepVal, "nextvar");
-
-  // Compute the end condition.
-  Value *EndCond = End->Codegen();
-  if (EndCond == 0) return EndCond;
-
-  // Convert condition to a bool by comparing equal to 0.0.
-  EndCond = Builder.CreateFCmpONE(EndCond,
-                                  ConstantFP::get(getGlobalContext(), APFloat(0.0)),
-                                  "loopcond");
-
-  // Create the "after loop" block and insert it.
-  BasicBlock *LoopEndBB = Builder.GetInsertBlock();
-  BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction);
-
-  // Insert the conditional branch into the end of LoopEndBB.
-  Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
-
-  // Any new code will be inserted in AfterBB.
-  Builder.SetInsertPoint(AfterBB);
-
-  // Add a new entry to the PHI node for the backedge.
-  Variable->addIncoming(NextVar, LoopEndBB);
-
-  // Restore the unshadowed variable.
-  if (OldVal)
-    NamedValues[VarName] = OldVal;
-  else
-    NamedValues.erase(VarName);
-
-
-  // for expr always returns 0.0.
-  return Constant::getNullValue(Type::getDoubleTy(getGlobalContext()));
-}
-
-Function *PrototypeAST::Codegen() {
-  // Make the function type: double(double,double) etc.
-  std::vector<const Type*> Doubles(Args.size(),
-                                   Type::getDoubleTy(getGlobalContext()));
-  FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()),
-                                       Doubles, false);
-
-  Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule);
-
-  // If F conflicted, there was already something named 'Name'. If it has a
-  // body, don't allow redefinition or reextern.
-  if (F->getName() != Name) {
-    // Delete the one we just made and get the existing one.
-    F->eraseFromParent();
-    F = TheModule->getFunction(Name);
-
-    // If F already has a body, reject this.
-    if (!F->empty()) {
-      ErrorF("redefinition of function");
-      return 0;
-    }
-
-    // If F took a different number of args, reject.
-    if (F->arg_size() != Args.size()) {
-      ErrorF("redefinition of function with different # args");
-      return 0;
-    }
-  }
-
-  // Set names for all arguments.
-  unsigned Idx = 0;
-  for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size();
-       ++AI, ++Idx) {
-    AI->setName(Args[Idx]);
-
-    // Add arguments to variable symbol table.
-    NamedValues[Args[Idx]] = AI;
-  }
-
-  return F;
-}
-
-Function *FunctionAST::Codegen() {
-  NamedValues.clear();
-
-  Function *TheFunction = Proto->Codegen();
-  if (TheFunction == 0)
-    return 0;
-
-  // If this is an operator, install it.
-  if (Proto->isBinaryOp())
-    BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence();
-
-  // Create a new basic block to start insertion into.
-  BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
-  Builder.SetInsertPoint(BB);
-
-  if (Value *RetVal = Body->Codegen()) {
-    // Finish off the function.
-    Builder.CreateRet(RetVal);
-
-    // Validate the generated code, checking for consistency.
-    verifyFunction(*TheFunction);
-
-    // Optimize the function.
-    TheFPM->run(*TheFunction);
-
-    return TheFunction;
-  }
-
-  // Error reading body, remove function.
-  TheFunction->eraseFromParent();
-
-  if (Proto->isBinaryOp())
-    BinopPrecedence.erase(Proto->getOperatorName());
-  return 0;
-}
-
-//===----------------------------------------------------------------------===//
-// Top-Level parsing and JIT Driver
-//===----------------------------------------------------------------------===//
-
-static ExecutionEngine *TheExecutionEngine;
-
-static void HandleDefinition() {
-  if (FunctionAST *F = ParseDefinition()) {
-    if (Function *LF = F->Codegen()) {
-      fprintf(stderr, "Read function definition:");
-      LF->dump();
-    }
-  } else {
-    // Skip token for error recovery.
-    getNextToken();
-  }
-}
-
-static void HandleExtern() {
-  if (PrototypeAST *P = ParseExtern()) {
-    if (Function *F = P->Codegen()) {
-      fprintf(stderr, "Read extern: ");
-      F->dump();
-    }
-  } else {
-    // Skip token for error recovery.
-    getNextToken();
-  }
-}
-
-static void HandleTopLevelExpression() {
-  // Evaluate a top-level expression into an anonymous function.
-  if (FunctionAST *F = ParseTopLevelExpr()) {
-    if (Function *LF = F->Codegen()) {
-      // JIT the function, returning a function pointer.
-      void *FPtr = TheExecutionEngine->getPointerToFunction(LF);
-
-      // Cast it to the right type (takes no arguments, returns a double) so we
-      // can call it as a native function.
-      double (*FP)() = (double (*)())(intptr_t)FPtr;
-      fprintf(stderr, "Evaluated to %f\n", FP());
-    }
-  } else {
-    // Skip token for error recovery.
-    getNextToken();
-  }
-}
-
-/// top ::= definition | external | expression | ';'
-static void MainLoop() {
-  while (1) {
-    fprintf(stderr, "ready> ");
-    switch (CurTok) {
-    case tok_eof: return;
-    case ';': getNextToken(); break; // ignore top-level semicolons.
-    case tok_def: HandleDefinition(); break;
-    case tok_extern: HandleExtern(); break;
-    default: HandleTopLevelExpression(); break;
-    }
-  }
-}
-
-//===----------------------------------------------------------------------===//
-// "Library" functions that can be "extern'd" from user code.
-//===----------------------------------------------------------------------===//
-
-/// putchard - putchar that takes a double and returns 0.
-extern "C"
-double putchard(double X) {
-  putchar((char)X);
-  return 0;
-}
-
-/// printd - printf that takes a double prints it as "%f\n", returning 0.
-extern "C"
-double printd(double X) {
-  printf("%f\n", X);
-  return 0;
-}
-
-//===----------------------------------------------------------------------===//
-// Main driver code.
-//===----------------------------------------------------------------------===//
-
-int main() {
-  InitializeNativeTarget();
-  LLVMContext &Context = getGlobalContext();
-
-  // Install standard binary operators.
-  // 1 is lowest precedence.
-  BinopPrecedence['<'] = 10;
-  BinopPrecedence['+'] = 20;
-  BinopPrecedence['-'] = 20;
-  BinopPrecedence['*'] = 40; // highest.
-
-  // Prime the first token.
-  fprintf(stderr, "ready> ");
-  getNextToken();
-
-  // Make the module, which holds all the code.
-  TheModule = new Module("my cool jit", Context);
-
-  // Create the JIT. This takes ownership of the module.
-  std::string ErrStr;
-  TheExecutionEngine = EngineBuilder(TheModule).setErrorStr(&ErrStr).create();
-  if (!TheExecutionEngine) {
-    fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str());
-    exit(1);
-  }
-
-  FunctionPassManager OurFPM(TheModule);
-
-  // Set up the optimizer pipeline. Start with registering info about how the
-  // target lays out data structures.
-  OurFPM.add(new TargetData(*TheExecutionEngine->getTargetData()));
-  // Do simple "peephole" optimizations and bit-twiddling optzns.
-  OurFPM.add(createInstructionCombiningPass());
-  // Reassociate expressions.
-  OurFPM.add(createReassociatePass());
-  // Eliminate Common SubExpressions.
-  OurFPM.add(createGVNPass());
-  // Simplify the control flow graph (deleting unreachable blocks, etc).
-  OurFPM.add(createCFGSimplificationPass());
-
-  OurFPM.doInitialization();
-
-  // Set the global so the code gen can use this.
-  TheFPM = &OurFPM;
-
-  // Run the main "interpreter loop" now.
-  MainLoop();
-
-  TheFPM = 0;
-
-  // Print out all of the generated code.
-  TheModule->dump();
-
-  return 0;
-}
-</pre>
-</div>
-
-<a href="LangImpl7.html">Next: Extending the language: mutable variables / SSA construction</a>
-</div>
-
-<!-- *********************************************************************** -->
-<hr>
-<address>
-  <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
-  src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
-  <a href="http://validator.w3.org/check/referer"><img
-  src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
-
-  <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
-  <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
-  Last modified: $Date$
-</address>
-</body>
-</html>
diff --git a/docs/tutorial/LangImpl7.html b/docs/tutorial/LangImpl7.html
deleted file mode 100644
index 0b46ba5..0000000
--- a/docs/tutorial/LangImpl7.html
+++ /dev/null
@@ -1,2164 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
-          "http://www.w3.org/TR/html4/strict.dtd">
-
-<html>
-<head>
-  <title>Kaleidoscope: Extending the Language: Mutable Variables / SSA
-         construction</title>
-  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
-  <meta name="author" content="Chris Lattner">
-  <link rel="stylesheet" href="../llvm.css" type="text/css">
-</head>
-
-<body>
-
-<div class="doc_title">Kaleidoscope: Extending the Language: Mutable Variables</div>
-
-<ul>
-<li><a href="index.html">Up to Tutorial Index</a></li>
-<li>Chapter 7
-  <ol>
-    <li><a href="#intro">Chapter 7 Introduction</a></li>
-    <li><a href="#why">Why is this a hard problem?</a></li>
-    <li><a href="#memory">Memory in LLVM</a></li>
-    <li><a href="#kalvars">Mutable Variables in Kaleidoscope</a></li>
-    <li><a href="#adjustments">Adjusting Existing Variables for
-     Mutation</a></li>
-    <li><a href="#assignment">New Assignment Operator</a></li>
-    <li><a href="#localvars">User-defined Local Variables</a></li>
-    <li><a href="#code">Full Code Listing</a></li>
-  </ol>
-</li>
-<li><a href="LangImpl8.html">Chapter 8</a>: Conclusion and other useful LLVM
-    tidbits</li>
-</ul>
-
-<div class="doc_author">
-  <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="intro">Chapter 7 Introduction</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Welcome to Chapter 7 of the "<a href="index.html">Implementing a language
-with LLVM</a>" tutorial. In chapters 1 through 6, we've built a very
-respectable, albeit simple, <a
-href="http://en.wikipedia.org/wiki/Functional_programming">functional
-programming language</a>. In our journey, we learned some parsing techniques,
-how to build and represent an AST, how to build LLVM IR, and how to optimize
-the resultant code as well as JIT compile it.</p>
-
-<p>While Kaleidoscope is interesting as a functional language, the fact that it
-is functional makes it "too easy" to generate LLVM IR for it. In particular, a
-functional language makes it very easy to build LLVM IR directly in <a
-href="http://en.wikipedia.org/wiki/Static_single_assignment_form">SSA form</a>.
-Since LLVM requires that the input code be in SSA form, this is a very nice
-property and it is often unclear to newcomers how to generate code for an
-imperative language with mutable variables.</p>
-
-<p>The short (and happy) summary of this chapter is that there is no need for
-your front-end to build SSA form: LLVM provides highly tuned and well tested
-support for this, though the way it works is a bit unexpected for some.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="why">Why is this a hard problem?</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>
-To understand why mutable variables cause complexities in SSA construction,
-consider this extremely simple C example:
-</p>
-
-<div class="doc_code">
-<pre>
-int G, H;
-int test(_Bool Condition) {
-  int X;
-  if (Condition)
-    X = G;
-  else
-    X = H;
-  return X;
-}
-</pre>
-</div>
-
-<p>In this case, we have the variable "X", whose value depends on the path
-executed in the program. Because there are two different possible values for X
-before the return instruction, a PHI node is inserted to merge the two values.
-The LLVM IR that we want for this example looks like this:</p>
-
-<div class="doc_code">
-<pre>
-@G = weak global i32 0 ; type of @G is i32*
-@H = weak global i32 0 ; type of @H is i32*
-
-define i32 @test(i1 %Condition) {
-entry:
-  br i1 %Condition, label %cond_true, label %cond_false
-
-cond_true:
-  %X.0 = load i32* @G
-  br label %cond_next
-
-cond_false:
-  %X.1 = load i32* @H
-  br label %cond_next
-
-cond_next:
-  %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
-  ret i32 %X.2
-}
-</pre>
-</div>
-
-<p>In this example, the loads from the G and H global variables are explicit in
-the LLVM IR, and they live in the then/else branches of the if statement
-(cond_true/cond_false). In order to merge the incoming values, the X.2 phi node
-in the cond_next block selects the right value to use based on where control
-flow is coming from: if control flow comes from the cond_false block, X.2 gets
-the value of X.1. Alternatively, if control flow comes from cond_true, it gets
-the value of X.0. The intent of this chapter is not to explain the details of
-SSA form. For more information, see one of the many <a
-href="http://en.wikipedia.org/wiki/Static_single_assignment_form">online
-references</a>.</p>
-
-<p>The question for this article is "who places the phi nodes when lowering
-assignments to mutable variables?". The issue here is that LLVM
-<em>requires</em> that its IR be in SSA form: there is no "non-ssa" mode for it.
-However, SSA construction requires non-trivial algorithms and data structures,
-so it is inconvenient and wasteful for every front-end to have to reproduce this
-logic.</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="memory">Memory in LLVM</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>The 'trick' here is that while LLVM does require all register values to be
-in SSA form, it does not require (or permit) memory objects to be in SSA form.
-In the example above, note that the loads from G and H are direct accesses to
-G and H: they are not renamed or versioned. This differs from some other
-compiler systems, which do try to version memory objects. In LLVM, instead of
-encoding dataflow analysis of memory into the LLVM IR, it is handled with <a
-href="../WritingAnLLVMPass.html">Analysis Passes</a> which are computed on
-demand.</p>
-
-<p>
-With this in mind, the high-level idea is that we want to make a stack variable
-(which lives in memory, because it is on the stack) for each mutable object in
-a function. To take advantage of this trick, we need to talk about how LLVM
-represents stack variables.
-</p>
-
-<p>In LLVM, all memory accesses are explicit with load/store instructions, and
-it is carefully designed not to have (or need) an "address-of" operator. Notice
-how the type of the @G/@H global variables is actually "i32*" even though the
-variable is defined as "i32". What this means is that @G defines <em>space</em>
-for an i32 in the global data area, but its <em>name</em> actually refers to the
-address for that space. Stack variables work the same way, except that instead of
-being declared with global variable definitions, they are declared with the
-<a href="../LangRef.html#i_alloca">LLVM alloca instruction</a>:</p>
-
-<div class="doc_code">
-<pre>
-define i32 @example() {
-entry:
-  %X = alloca i32 ; type of %X is i32*.
-  ...
-  %tmp = load i32* %X ; load the stack value %X from the stack.
-  %tmp2 = add i32 %tmp, 1 ; increment it
-  store i32 %tmp2, i32* %X ; store it back
-  ...
-</pre>
-</div>
-
-<p>This code shows an example of how you can declare and manipulate a stack
-variable in the LLVM IR. Stack memory allocated with the alloca instruction is
-fully general: you can pass the address of the stack slot to functions, you can
-store it in other variables, etc. In our example above, we could rewrite the
-example to use the alloca technique to avoid using a PHI node:</p>
-
-<div class="doc_code">
-<pre>
-@G = weak global i32 0 ; type of @G is i32*
-@H = weak global i32 0 ; type of @H is i32*
-
-define i32 @test(i1 %Condition) {
-entry:
-  %X = alloca i32 ; type of %X is i32*.
-  br i1 %Condition, label %cond_true, label %cond_false
-
-cond_true:
-  %X.0 = load i32* @G
-  store i32 %X.0, i32* %X ; Update X
-  br label %cond_next
-
-cond_false:
-  %X.1 = load i32* @H
-  store i32 %X.1, i32* %X ; Update X
-  br label %cond_next
-
-cond_next:
-  %X.2 = load i32* %X ; Read X
-  ret i32 %X.2
-}
-</pre>
-</div>
-
-<p>With this, we have discovered a way to handle arbitrary mutable variables
-without the need to create Phi nodes at all:</p>
-
-<ol>
-<li>Each mutable variable becomes a stack allocation.</li>
-<li>Each read of the variable becomes a load from the stack.</li>
-<li>Each update of the variable becomes a store to the stack.</li>
-<li>Taking the address of a variable just uses the stack address directly.</li>
-</ol>
-
-<p>While this solution has solved our immediate problem, it introduced another
-one: we have now apparently introduced a lot of stack traffic for very simple
-and common operations, a major performance problem. Fortunately for us, the
-LLVM optimizer has a highly-tuned optimization pass named "mem2reg" that handles
-this case, promoting allocas like this into SSA registers, inserting Phi nodes
-as appropriate. If you run this example through the pass, for example, you'll
-get:</p>
-
-<div class="doc_code">
-<pre>
-$ <b>llvm-as < example.ll | opt -mem2reg | llvm-dis</b>
-@G = weak global i32 0
-@H = weak global i32 0
-
-define i32 @test(i1 %Condition) {
-entry:
-  br i1 %Condition, label %cond_true, label %cond_false
-
-cond_true:
-  %X.0 = load i32* @G
-  br label %cond_next
-
-cond_false:
-  %X.1 = load i32* @H
-  br label %cond_next
-
-cond_next:
-  %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
-  ret i32 %X.01
-}
-</pre>
-</div>
-
-<p>The mem2reg pass implements the standard "iterated dominance frontier"
-algorithm for constructing SSA form and has a number of optimizations that speed
-up (very common) degenerate cases. The mem2reg optimization pass is the answer to dealing
-with mutable variables, and we highly recommend that you depend on it. Note that
-mem2reg only works on variables in certain circumstances:</p>
-
-<ol>
-<li>mem2reg is alloca-driven: it looks for allocas and if it can handle them, it
-promotes them. It does not apply to global variables or heap allocations.</li>
-
-<li>mem2reg only looks for alloca instructions in the entry block of the
-function. Being in the entry block guarantees that the alloca is only executed
-once, which makes analysis simpler.</li>
-
-<li>mem2reg only promotes allocas whose uses are direct loads and stores. If
-the address of the stack object is passed to a function, or if any funny pointer
-arithmetic is involved, the alloca will not be promoted.</li>
-
-<li>mem2reg only works on allocas of <a
-href="../LangRef.html#t_classifications">first class</a>
-values (such as pointers, scalars and vectors), and only if the array size
-of the allocation is 1 (or missing in the .ll file). mem2reg is not capable of
-promoting structs or arrays to registers. Note that the "scalarrepl" pass is
-more powerful and can promote structs, "unions", and arrays in many cases.</li>
-
-</ol>
-
-<p>
-All of these properties are easy to satisfy for most imperative languages, and
-we'll illustrate it below with Kaleidoscope. The final question you may be
-asking is: should I bother with this nonsense for my front-end? Wouldn't it be
-better if I just did SSA construction directly, avoiding use of the mem2reg
-optimization pass? In short, we strongly recommend that you use this technique
-for building SSA form, unless there is an extremely good reason not to. Using
-this technique is:</p>
-
-<ul>
-<li>Proven and well tested: llvm-gcc and clang both use this technique for local
-mutable variables. As such, the most common clients of LLVM are using this to
-handle a bulk of their variables. You can be sure that bugs are found fast and
-fixed early.</li>
-
-<li>Extremely Fast: mem2reg has a number of special cases that make it fast in
-common cases as well as fully general. For example, it has fast-paths for
-variables that are only used in a single block, variables that only have one
-assignment point, good heuristics to avoid insertion of unneeded phi nodes, etc.
-</li>
-
-<li>Needed for debug info generation: <a href="../SourceLevelDebugging.html">
-Debug information in LLVM</a> relies on having the address of the variable
-exposed so that debug info can be attached to it. This technique dovetails
-very naturally with this style of debug info.</li>
-</ul>
-
-<p>If nothing else, this makes it much easier to get your front-end up and
-running, and is very simple to implement. Lets extend Kaleidoscope with mutable
-variables now!
-</p>
-
-</div>
-
-<!-- *********************************************************************** -->
-<div class="doc_section"><a name="kalvars">Mutable Variables in
-Kaleidoscope</a></div>
-<!-- *********************************************************************** -->
-
-<div class="doc_text">
-
-<p>Now that we know the sort of problem we want to tackle, lets see what this
-looks like in the context of our little Kaleidoscope language. We're going to
-add two features:</p>
-
-<ol>
-<li>The ability to mutate variables with the '=' operator.</li>
-<li>The ability to define new variables.</li>
-</ol>
-
-<p>While the first item is really what this is about, we only have variables
-for incoming arguments as well as for induction variables, and redefining those only
-goes so far :). Also, the ability to define new variables is a
-useful thing regardless of whether you will be mutating them. Here's a
-motivating example that shows how we could use these:</p>
-
-<div class="doc_code">
-<pre>
-# Define ':' for sequencing: as a low-precedence operator that ignores operands
-# and just returns the RHS.
-def binary : 1 (x y) y; - -# Recursive fib, we could do this before. -def fib(x) - if (x < 3) then - 1 - else - fib(x-1)+fib(x-2); - -# Iterative fib. -def fibi(x) - <b>var a = 1, b = 1, c in</b> - (for i = 3, i < x in - <b>c = a + b</b> : - <b>a = b</b> : - <b>b = c</b>) : - b; - -# Call it. -fibi(10); -</pre> -</div> - -<p> -In order to mutate variables, we have to change our existing variables to use -the "alloca trick". Once we have that, we'll add our new operator, then extend -Kaleidoscope to support new variable definitions. -</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="adjustments">Adjusting Existing Variables for -Mutation</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p> -The symbol table in Kaleidoscope is managed at code generation time by the -'<tt>NamedValues</tt>' map. This map currently keeps track of the LLVM "Value*" -that holds the double value for the named variable. In order to support -mutation, we need to change this slightly, so that <tt>NamedValues</tt> holds -the <em>memory location</em> of the variable in question. Note that this -change is a refactoring: it changes the structure of the code, but does not -(by itself) change the behavior of the compiler. All of these changes are -isolated in the Kaleidoscope code generator.</p> - -<p> -At this point in Kaleidoscope's development, it only supports variables for two -things: incoming arguments to functions and the induction variable of 'for' -loops. For consistency, we'll allow mutation of these variables in addition to -other user-defined variables. This means that these will both need memory -locations. -</p> - -<p>To start our transformation of Kaleidoscope, we'll change the NamedValues -map so that it maps to AllocaInst* instead of Value*.
Once we do this, the C++ -compiler will tell us what parts of the code we need to update:</p> - -<div class="doc_code"> -<pre> -static std::map<std::string, AllocaInst*> NamedValues; -</pre> -</div> - -<p>Also, since we will need to create these allocas, we'll use a helper -function that ensures that the allocas are created in the entry block of the -function:</p> - -<div class="doc_code"> -<pre> -/// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of -/// the function. This is used for mutable variables etc. -static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction, - const std::string &VarName) { - IRBuilder<> TmpB(&TheFunction->getEntryBlock(), - TheFunction->getEntryBlock().begin()); - return TmpB.CreateAlloca(Type::getDoubleTy(getGlobalContext()), 0, - VarName.c_str()); -} -</pre> -</div> - -<p>This funny-looking code creates an IRBuilder object that is pointing at -the first instruction (.begin()) of the entry block. It then creates an alloca -with the expected name and returns it. Because all values in Kaleidoscope are -doubles, there is no need to pass in a type to use.</p> - -<p>With this in place, the first functionality change we want to make is to -variable references. In our new scheme, variables live on the stack, so code -generating a reference to them actually needs to produce a load from the stack -slot:</p> - -<div class="doc_code"> -<pre> -Value *VariableExprAST::Codegen() { - // Look this variable up in the function. - Value *V = NamedValues[Name]; - if (V == 0) return ErrorV("Unknown variable name"); - - <b>// Load the value. - return Builder.CreateLoad(V, Name.c_str());</b> -} -</pre> -</div> - -<p>As you can see, this is pretty straightforward. Now we need to update the -things that define the variables to set up the alloca.
We'll start with -<tt>ForExprAST::Codegen</tt> (see the <a href="#code">full code listing</a> for -the unabridged code):</p> - -<div class="doc_code"> -<pre> - Function *TheFunction = Builder.GetInsertBlock()->getParent(); - - <b>// Create an alloca for the variable in the entry block. - AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);</b> - - // Emit the start code first, without 'variable' in scope. - Value *StartVal = Start->Codegen(); - if (StartVal == 0) return 0; - - <b>// Store the value into the alloca. - Builder.CreateStore(StartVal, Alloca);</b> - ... - - // Compute the end condition. - Value *EndCond = End->Codegen(); - if (EndCond == 0) return EndCond; - - <b>// Reload, increment, and restore the alloca. This handles the case where - // the body of the loop mutates the variable. - Value *CurVar = Builder.CreateLoad(Alloca); - Value *NextVar = Builder.CreateAdd(CurVar, StepVal, "nextvar"); - Builder.CreateStore(NextVar, Alloca);</b> - ... -</pre> -</div> - -<p>This code is virtually identical to the code <a -href="LangImpl5.html#forcodegen">before we allowed mutable variables</a>. The -big difference is that we no longer have to construct a PHI node, and we use -load/store to access the variable as needed.</p> - -<p>To support mutable argument variables, we need to also make allocas for them. -The code for this is also pretty simple:</p> - -<div class="doc_code"> -<pre> -/// CreateArgumentAllocas - Create an alloca for each argument and register the -/// argument in the symbol table so that references to it will succeed. -void PrototypeAST::CreateArgumentAllocas(Function *F) { - Function::arg_iterator AI = F->arg_begin(); - for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) { - // Create an alloca for this variable. - AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]); - - // Store the initial value into the alloca. - Builder.CreateStore(AI, Alloca); - - // Add arguments to variable symbol table. 
- NamedValues[Args[Idx]] = Alloca; - } -} -</pre> -</div> - -<p>For each argument, we make an alloca, store the input value to the function -into the alloca, and register the alloca as the memory location for the -argument. This method gets invoked by <tt>FunctionAST::Codegen</tt> right after -it sets up the entry block for the function.</p> - -<p>The final missing piece is adding the mem2reg pass, which allows us to get -good codegen once again:</p> - -<div class="doc_code"> -<pre> - // Set up the optimizer pipeline. Start with registering info about how the - // target lays out data structures. - OurFPM.add(new TargetData(*TheExecutionEngine->getTargetData())); - <b>// Promote allocas to registers. - OurFPM.add(createPromoteMemoryToRegisterPass());</b> - // Do simple "peephole" optimizations and bit-twiddling optzns. - OurFPM.add(createInstructionCombiningPass()); - // Reassociate expressions. - OurFPM.add(createReassociatePass()); -</pre> -</div> - -<p>It is interesting to see what the code looks like before and after the -mem2reg optimization runs. For example, this is the before/after code for our -recursive fib function. 
Before the optimization:</p> - -<div class="doc_code"> -<pre> -define double @fib(double %x) { -entry: - <b>%x1 = alloca double - store double %x, double* %x1 - %x2 = load double* %x1</b> - %cmptmp = fcmp ult double %x2, 3.000000e+00 - %booltmp = uitofp i1 %cmptmp to double - %ifcond = fcmp one double %booltmp, 0.000000e+00 - br i1 %ifcond, label %then, label %else - -then: ; preds = %entry - br label %ifcont - -else: ; preds = %entry - <b>%x3 = load double* %x1</b> - %subtmp = fsub double %x3, 1.000000e+00 - %calltmp = call double @fib( double %subtmp ) - <b>%x4 = load double* %x1</b> - %subtmp5 = fsub double %x4, 2.000000e+00 - %calltmp6 = call double @fib( double %subtmp5 ) - %addtmp = fadd double %calltmp, %calltmp6 - br label %ifcont - -ifcont: ; preds = %else, %then - %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ] - ret double %iftmp -} -</pre> -</div> - -<p>Here there is only one variable (x, the input argument) but you can still -see the extremely simple-minded code generation strategy we are using. In the -entry block, an alloca is created, and the initial input value is stored into -it. Each reference to the variable does a reload from the stack. Also, note -that we didn't modify the if/then/else expression, so it still inserts a PHI -node. 
While we could make an alloca for it, it is actually easier to create a -PHI node for it, so we still just make the PHI.</p> - -<p>Here is the code after the mem2reg pass runs:</p> - -<div class="doc_code"> -<pre> -define double @fib(double %x) { -entry: - %cmptmp = fcmp ult double <b>%x</b>, 3.000000e+00 - %booltmp = uitofp i1 %cmptmp to double - %ifcond = fcmp one double %booltmp, 0.000000e+00 - br i1 %ifcond, label %then, label %else - -then: - br label %ifcont - -else: - %subtmp = fsub double <b>%x</b>, 1.000000e+00 - %calltmp = call double @fib( double %subtmp ) - %subtmp5 = fsub double <b>%x</b>, 2.000000e+00 - %calltmp6 = call double @fib( double %subtmp5 ) - %addtmp = fadd double %calltmp, %calltmp6 - br label %ifcont - -ifcont: ; preds = %else, %then - %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ] - ret double %iftmp -} -</pre> -</div> - -<p>This is a trivial case for mem2reg, since there are no redefinitions of the -variable. The point of showing this is to calm your tension about inserting -such blatant inefficiencies :).</p> - -<p>After the rest of the optimizers run, we get:</p> - -<div class="doc_code"> -<pre> -define double @fib(double %x) { -entry: - %cmptmp = fcmp ult double %x, 3.000000e+00 - %booltmp = uitofp i1 %cmptmp to double - %ifcond = fcmp ueq double %booltmp, 0.000000e+00 - br i1 %ifcond, label %else, label %ifcont - -else: - %subtmp = fsub double %x, 1.000000e+00 - %calltmp = call double @fib( double %subtmp ) - %subtmp5 = fsub double %x, 2.000000e+00 - %calltmp6 = call double @fib( double %subtmp5 ) - %addtmp = fadd double %calltmp, %calltmp6 - ret double %addtmp - -ifcont: - ret double 1.000000e+00 -} -</pre> -</div> - -<p>Here we see that the simplifycfg pass decided to clone the return instruction -into the end of the 'else' block.
This allowed it to eliminate some branches -and the PHI node.</p> - -<p>Now that all symbol table references are updated to use stack variables, -we'll add the assignment operator.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="assignment">New Assignment Operator</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>With our current framework, adding a new assignment operator is really -simple. We will parse it just like any other binary operator, but handle it -internally (instead of allowing the user to define it). The first step is to -set a precedence:</p> - -<div class="doc_code"> -<pre> - int main() { - // Install standard binary operators. - // 1 is lowest precedence. - <b>BinopPrecedence['='] = 2;</b> - BinopPrecedence['<'] = 10; - BinopPrecedence['+'] = 20; - BinopPrecedence['-'] = 20; -</pre> -</div> - -<p>Now that the parser knows the precedence of the binary operator, it takes -care of all the parsing and AST generation. We just need to implement codegen -for the assignment operator. This looks like:</p> - -<div class="doc_code"> -<pre> -Value *BinaryExprAST::Codegen() { - // Special case '=' because we don't want to emit the LHS as an expression. - if (Op == '=') { - // Assignment requires the LHS to be an identifier. - VariableExprAST *LHSE = dynamic_cast<VariableExprAST*>(LHS); - if (!LHSE) - return ErrorV("destination of '=' must be a variable"); -</pre> -</div> - -<p>Unlike the rest of the binary operators, our assignment operator doesn't -follow the "emit LHS, emit RHS, do computation" model. As such, it is handled -as a special case before the other binary operators are handled. The other -strange thing is that it requires the LHS to be a variable. It is invalid to -have "(x+1) = expr" - only things like "x = expr" are allowed. -</p> - -<div class="doc_code"> -<pre> - // Codegen the RHS. 
- Value *Val = RHS->Codegen(); - if (Val == 0) return 0; - - // Look up the name. - Value *Variable = NamedValues[LHSE->getName()]; - if (Variable == 0) return ErrorV("Unknown variable name"); - - Builder.CreateStore(Val, Variable); - return Val; - } - ... -</pre> -</div> - -<p>Once we have the variable, codegen'ing the assignment is straightforward: -we emit the RHS of the assignment, create a store, and return the computed -value. Returning a value allows for chained assignments like "X = (Y = Z)".</p> - -<p>Now that we have an assignment operator, we can mutate loop variables and -arguments. For example, we can now run code like this:</p> - -<div class="doc_code"> -<pre> -# Function to print a double. -extern printd(x); - -# Define ':' for sequencing: as a low-precedence operator that ignores operands -# and just returns the RHS. -def binary : 1 (x y) y; - -def test(x) - printd(x) : - x = 4 : - printd(x); - -test(123); -</pre> -</div> - -<p>When run, this example prints "123" and then "4", showing that we did -actually mutate the value! Okay, we have now officially implemented our goal: -getting this to work requires SSA construction in the general case. However, -to be really useful, we want the ability to define our own local variables; let's -add this next! -</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="localvars">User-defined Local -Variables</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Adding var/in is just like any of the other extensions we made to -Kaleidoscope: we extend the lexer, the parser, the AST and the code generator. -The first step for adding our new 'var/in' construct is to extend the lexer. -As before, this is pretty trivial; the code looks like this:</p> - -<div class="doc_code"> -<pre> -enum Token { - ... - <b>// var definition - tok_var = -13</b> -... -} -...
-static int gettok() { -... - if (IdentifierStr == "in") return tok_in; - if (IdentifierStr == "binary") return tok_binary; - if (IdentifierStr == "unary") return tok_unary; - <b>if (IdentifierStr == "var") return tok_var;</b> - return tok_identifier; -... -</pre> -</div> - -<p>The next step is to define the AST node that we will construct. For var/in, -it looks like this:</p> - -<div class="doc_code"> -<pre> -/// VarExprAST - Expression class for var/in -class VarExprAST : public ExprAST { - std::vector<std::pair<std::string, ExprAST*> > VarNames; - ExprAST *Body; -public: - VarExprAST(const std::vector<std::pair<std::string, ExprAST*> > &varnames, - ExprAST *body) - : VarNames(varnames), Body(body) {} - - virtual Value *Codegen(); -}; -</pre> -</div> - -<p>var/in allows a list of names to be defined all at once, and each name can -optionally have an initializer value. As such, we capture this information in -the VarNames vector. Also, var/in has a body, this body is allowed to access -the variables defined by the var/in.</p> - -<p>With this in place, we can define the parser pieces. The first thing we do is add -it as a primary expression:</p> - -<div class="doc_code"> -<pre> -/// primary -/// ::= identifierexpr -/// ::= numberexpr -/// ::= parenexpr -/// ::= ifexpr -/// ::= forexpr -<b>/// ::= varexpr</b> -static ExprAST *ParsePrimary() { - switch (CurTok) { - default: return Error("unknown token when expecting an expression"); - case tok_identifier: return ParseIdentifierExpr(); - case tok_number: return ParseNumberExpr(); - case '(': return ParseParenExpr(); - case tok_if: return ParseIfExpr(); - case tok_for: return ParseForExpr(); - <b>case tok_var: return ParseVarExpr();</b> - } -} -</pre> -</div> - -<p>Next we define ParseVarExpr:</p> - -<div class="doc_code"> -<pre> -/// varexpr ::= 'var' identifier ('=' expression)? -// (',' identifier ('=' expression)?)* 'in' expression -static ExprAST *ParseVarExpr() { - getNextToken(); // eat the var. 
- - std::vector<std::pair<std::string, ExprAST*> > VarNames; - - // At least one variable name is required. - if (CurTok != tok_identifier) - return Error("expected identifier after var"); -</pre> -</div> - -<p>The first part of this code parses the list of identifier/expr pairs into the -local <tt>VarNames</tt> vector. - -<div class="doc_code"> -<pre> - while (1) { - std::string Name = IdentifierStr; - getNextToken(); // eat identifier. - - // Read the optional initializer. - ExprAST *Init = 0; - if (CurTok == '=') { - getNextToken(); // eat the '='. - - Init = ParseExpression(); - if (Init == 0) return 0; - } - - VarNames.push_back(std::make_pair(Name, Init)); - - // End of var list, exit loop. - if (CurTok != ',') break; - getNextToken(); // eat the ','. - - if (CurTok != tok_identifier) - return Error("expected identifier list after var"); - } -</pre> -</div> - -<p>Once all the variables are parsed, we then parse the body and create the -AST node:</p> - -<div class="doc_code"> -<pre> - // At this point, we have to have 'in'. - if (CurTok != tok_in) - return Error("expected 'in' keyword after 'var'"); - getNextToken(); // eat 'in'. - - ExprAST *Body = ParseExpression(); - if (Body == 0) return 0; - - return new VarExprAST(VarNames, Body); -} -</pre> -</div> - -<p>Now that we can parse and represent the code, we need to support emission of -LLVM IR for it. This code starts out with:</p> - -<div class="doc_code"> -<pre> -Value *VarExprAST::Codegen() { - std::vector<AllocaInst *> OldBindings; - - Function *TheFunction = Builder.GetInsertBlock()->getParent(); - - // Register all variables and emit their initializer. - for (unsigned i = 0, e = VarNames.size(); i != e; ++i) { - const std::string &VarName = VarNames[i].first; - ExprAST *Init = VarNames[i].second; -</pre> -</div> - -<p>Basically it loops over all the variables, installing them one at a time. 
-For each variable we put into the symbol table, we remember the previous value -that we replace in OldBindings.</p> - -<div class="doc_code"> -<pre> - // Emit the initializer before adding the variable to scope, this prevents - // the initializer from referencing the variable itself, and permits stuff - // like this: - // var a = 1 in - // var a = a in ... # refers to outer 'a'. - Value *InitVal; - if (Init) { - InitVal = Init->Codegen(); - if (InitVal == 0) return 0; - } else { // If not specified, use 0.0. - InitVal = ConstantFP::get(getGlobalContext(), APFloat(0.0)); - } - - AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName); - Builder.CreateStore(InitVal, Alloca); - - // Remember the old variable binding so that we can restore the binding when - // we unrecurse. - OldBindings.push_back(NamedValues[VarName]); - - // Remember this binding. - NamedValues[VarName] = Alloca; - } -</pre> -</div> - -<p>There are more comments here than code. The basic idea is that we emit the -initializer, create the alloca, then update the symbol table to point to it. -Once all the variables are installed in the symbol table, we evaluate the body -of the var/in expression:</p> - -<div class="doc_code"> -<pre> - // Codegen the body, now that all vars are in scope. - Value *BodyVal = Body->Codegen(); - if (BodyVal == 0) return 0; -</pre> -</div> - -<p>Finally, before returning, we restore the previous variable bindings:</p> - -<div class="doc_code"> -<pre> - // Pop all our variables from scope. - for (unsigned i = 0, e = VarNames.size(); i != e; ++i) - NamedValues[VarNames[i].first] = OldBindings[i]; - - // Return the body computation. - return BodyVal; -} -</pre> -</div> - -<p>The end result of all of this is that we get properly scoped variable -definitions, and we even (trivially) allow mutation of them :).</p> - -<p>With this, we completed what we set out to do. Our nice iterative fib -example from the intro compiles and runs just fine. 
The mem2reg pass optimizes -all of our stack variables into SSA registers, inserting PHI nodes where needed, -and our front-end remains simple: no "iterated dominance frontier" computation -anywhere in sight.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="code">Full Code Listing</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p> -Here is the complete code listing for our running example, enhanced with mutable -variables and var/in support. To build this example, use: -</p> - -<div class="doc_code"> -<pre> - # Compile - g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy - # Run - ./toy -</pre> -</div> - -<p>Here is the code:</p> - -<div class="doc_code"> -<pre> -#include "llvm/DerivedTypes.h" -#include "llvm/ExecutionEngine/ExecutionEngine.h" -#include "llvm/ExecutionEngine/JIT.h" -#include "llvm/LLVMContext.h" -#include "llvm/Module.h" -#include "llvm/PassManager.h" -#include "llvm/Analysis/Verifier.h" -#include "llvm/Target/TargetData.h" -#include "llvm/Target/TargetSelect.h" -#include "llvm/Transforms/Scalar.h" -#include "llvm/Support/IRBuilder.h" -#include <cstdio> -#include <string> -#include <map> -#include <vector> -using namespace llvm; - -//===----------------------------------------------------------------------===// -// Lexer -//===----------------------------------------------------------------------===// - -// The lexer returns tokens [0-255] if it is an unknown character, otherwise one -// of these for known things. 
-enum Token { - tok_eof = -1, - - // commands - tok_def = -2, tok_extern = -3, - - // primary - tok_identifier = -4, tok_number = -5, - - // control - tok_if = -6, tok_then = -7, tok_else = -8, - tok_for = -9, tok_in = -10, - - // operators - tok_binary = -11, tok_unary = -12, - - // var definition - tok_var = -13 -}; - -static std::string IdentifierStr; // Filled in if tok_identifier -static double NumVal; // Filled in if tok_number - -/// gettok - Return the next token from standard input. -static int gettok() { - static int LastChar = ' '; - - // Skip any whitespace. - while (isspace(LastChar)) - LastChar = getchar(); - - if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]* - IdentifierStr = LastChar; - while (isalnum((LastChar = getchar()))) - IdentifierStr += LastChar; - - if (IdentifierStr == "def") return tok_def; - if (IdentifierStr == "extern") return tok_extern; - if (IdentifierStr == "if") return tok_if; - if (IdentifierStr == "then") return tok_then; - if (IdentifierStr == "else") return tok_else; - if (IdentifierStr == "for") return tok_for; - if (IdentifierStr == "in") return tok_in; - if (IdentifierStr == "binary") return tok_binary; - if (IdentifierStr == "unary") return tok_unary; - if (IdentifierStr == "var") return tok_var; - return tok_identifier; - } - - if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+ - std::string NumStr; - do { - NumStr += LastChar; - LastChar = getchar(); - } while (isdigit(LastChar) || LastChar == '.'); - - NumVal = strtod(NumStr.c_str(), 0); - return tok_number; - } - - if (LastChar == '#') { - // Comment until end of line. - do LastChar = getchar(); - while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); - - if (LastChar != EOF) - return gettok(); - } - - // Check for end of file. Don't eat the EOF. - if (LastChar == EOF) - return tok_eof; - - // Otherwise, just return the character as its ascii value. 
- int ThisChar = LastChar; - LastChar = getchar(); - return ThisChar; -} - -//===----------------------------------------------------------------------===// -// Abstract Syntax Tree (aka Parse Tree) -//===----------------------------------------------------------------------===// - -/// ExprAST - Base class for all expression nodes. -class ExprAST { -public: - virtual ~ExprAST() {} - virtual Value *Codegen() = 0; -}; - -/// NumberExprAST - Expression class for numeric literals like "1.0". -class NumberExprAST : public ExprAST { - double Val; -public: - NumberExprAST(double val) : Val(val) {} - virtual Value *Codegen(); -}; - -/// VariableExprAST - Expression class for referencing a variable, like "a". -class VariableExprAST : public ExprAST { - std::string Name; -public: - VariableExprAST(const std::string &name) : Name(name) {} - const std::string &getName() const { return Name; } - virtual Value *Codegen(); -}; - -/// UnaryExprAST - Expression class for a unary operator. -class UnaryExprAST : public ExprAST { - char Opcode; - ExprAST *Operand; -public: - UnaryExprAST(char opcode, ExprAST *operand) - : Opcode(opcode), Operand(operand) {} - virtual Value *Codegen(); -}; - -/// BinaryExprAST - Expression class for a binary operator. -class BinaryExprAST : public ExprAST { - char Op; - ExprAST *LHS, *RHS; -public: - BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) - : Op(op), LHS(lhs), RHS(rhs) {} - virtual Value *Codegen(); -}; - -/// CallExprAST - Expression class for function calls. -class CallExprAST : public ExprAST { - std::string Callee; - std::vector<ExprAST*> Args; -public: - CallExprAST(const std::string &callee, std::vector<ExprAST*> &args) - : Callee(callee), Args(args) {} - virtual Value *Codegen(); -}; - -/// IfExprAST - Expression class for if/then/else. 
-class IfExprAST : public ExprAST { - ExprAST *Cond, *Then, *Else; -public: - IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else) - : Cond(cond), Then(then), Else(_else) {} - virtual Value *Codegen(); -}; - -/// ForExprAST - Expression class for for/in. -class ForExprAST : public ExprAST { - std::string VarName; - ExprAST *Start, *End, *Step, *Body; -public: - ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end, - ExprAST *step, ExprAST *body) - : VarName(varname), Start(start), End(end), Step(step), Body(body) {} - virtual Value *Codegen(); -}; - -/// VarExprAST - Expression class for var/in -class VarExprAST : public ExprAST { - std::vector<std::pair<std::string, ExprAST*> > VarNames; - ExprAST *Body; -public: - VarExprAST(const std::vector<std::pair<std::string, ExprAST*> > &varnames, - ExprAST *body) - : VarNames(varnames), Body(body) {} - - virtual Value *Codegen(); -}; - -/// PrototypeAST - This class represents the "prototype" for a function, -/// which captures its name, and its argument names (thus implicitly the number -/// of arguments the function takes), as well as if it is an operator. -class PrototypeAST { - std::string Name; - std::vector<std::string> Args; - bool isOperator; - unsigned Precedence; // Precedence if a binary op. -public: - PrototypeAST(const std::string &name, const std::vector<std::string> &args, - bool isoperator = false, unsigned prec = 0) - : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {} - - bool isUnaryOp() const { return isOperator && Args.size() == 1; } - bool isBinaryOp() const { return isOperator && Args.size() == 2; } - - char getOperatorName() const { - assert(isUnaryOp() || isBinaryOp()); - return Name[Name.size()-1]; - } - - unsigned getBinaryPrecedence() const { return Precedence; } - - Function *Codegen(); - - void CreateArgumentAllocas(Function *F); -}; - -/// FunctionAST - This class represents a function definition itself. 
-class FunctionAST { - PrototypeAST *Proto; - ExprAST *Body; -public: - FunctionAST(PrototypeAST *proto, ExprAST *body) - : Proto(proto), Body(body) {} - - Function *Codegen(); -}; - -//===----------------------------------------------------------------------===// -// Parser -//===----------------------------------------------------------------------===// - -/// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current -/// token the parser is looking at. getNextToken reads another token from the -/// lexer and updates CurTok with its results. -static int CurTok; -static int getNextToken() { - return CurTok = gettok(); -} - -/// BinopPrecedence - This holds the precedence for each binary operator that is -/// defined. -static std::map<char, int> BinopPrecedence; - -/// GetTokPrecedence - Get the precedence of the pending binary operator token. -static int GetTokPrecedence() { - if (!isascii(CurTok)) - return -1; - - // Make sure it's a declared binop. - int TokPrec = BinopPrecedence[CurTok]; - if (TokPrec <= 0) return -1; - return TokPrec; -} - -/// Error* - These are little helper functions for error handling. -ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;} -PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; } -FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; } - -static ExprAST *ParseExpression(); - -/// identifierexpr -/// ::= identifier -/// ::= identifier '(' expression* ')' -static ExprAST *ParseIdentifierExpr() { - std::string IdName = IdentifierStr; - - getNextToken(); // eat identifier. - - if (CurTok != '(') // Simple variable ref. - return new VariableExprAST(IdName); - - // Call. 
- getNextToken(); // eat ( - std::vector<ExprAST*> Args; - if (CurTok != ')') { - while (1) { - ExprAST *Arg = ParseExpression(); - if (!Arg) return 0; - Args.push_back(Arg); - - if (CurTok == ')') break; - - if (CurTok != ',') - return Error("Expected ')' or ',' in argument list"); - getNextToken(); - } - } - - // Eat the ')'. - getNextToken(); - - return new CallExprAST(IdName, Args); -} - -/// numberexpr ::= number -static ExprAST *ParseNumberExpr() { - ExprAST *Result = new NumberExprAST(NumVal); - getNextToken(); // consume the number - return Result; -} - -/// parenexpr ::= '(' expression ')' -static ExprAST *ParseParenExpr() { - getNextToken(); // eat (. - ExprAST *V = ParseExpression(); - if (!V) return 0; - - if (CurTok != ')') - return Error("expected ')'"); - getNextToken(); // eat ). - return V; -} - -/// ifexpr ::= 'if' expression 'then' expression 'else' expression -static ExprAST *ParseIfExpr() { - getNextToken(); // eat the if. - - // condition. - ExprAST *Cond = ParseExpression(); - if (!Cond) return 0; - - if (CurTok != tok_then) - return Error("expected then"); - getNextToken(); // eat the then - - ExprAST *Then = ParseExpression(); - if (Then == 0) return 0; - - if (CurTok != tok_else) - return Error("expected else"); - - getNextToken(); - - ExprAST *Else = ParseExpression(); - if (!Else) return 0; - - return new IfExprAST(Cond, Then, Else); -} - -/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression -static ExprAST *ParseForExpr() { - getNextToken(); // eat the for. - - if (CurTok != tok_identifier) - return Error("expected identifier after for"); - - std::string IdName = IdentifierStr; - getNextToken(); // eat identifier. - - if (CurTok != '=') - return Error("expected '=' after for"); - getNextToken(); // eat '='. 
- - - ExprAST *Start = ParseExpression(); - if (Start == 0) return 0; - if (CurTok != ',') - return Error("expected ',' after for start value"); - getNextToken(); - - ExprAST *End = ParseExpression(); - if (End == 0) return 0; - - // The step value is optional. - ExprAST *Step = 0; - if (CurTok == ',') { - getNextToken(); - Step = ParseExpression(); - if (Step == 0) return 0; - } - - if (CurTok != tok_in) - return Error("expected 'in' after for"); - getNextToken(); // eat 'in'. - - ExprAST *Body = ParseExpression(); - if (Body == 0) return 0; - - return new ForExprAST(IdName, Start, End, Step, Body); -} - -/// varexpr ::= 'var' identifier ('=' expression)? -// (',' identifier ('=' expression)?)* 'in' expression -static ExprAST *ParseVarExpr() { - getNextToken(); // eat the var. - - std::vector<std::pair<std::string, ExprAST*> > VarNames; - - // At least one variable name is required. - if (CurTok != tok_identifier) - return Error("expected identifier after var"); - - while (1) { - std::string Name = IdentifierStr; - getNextToken(); // eat identifier. - - // Read the optional initializer. - ExprAST *Init = 0; - if (CurTok == '=') { - getNextToken(); // eat the '='. - - Init = ParseExpression(); - if (Init == 0) return 0; - } - - VarNames.push_back(std::make_pair(Name, Init)); - - // End of var list, exit loop. - if (CurTok != ',') break; - getNextToken(); // eat the ','. - - if (CurTok != tok_identifier) - return Error("expected identifier list after var"); - } - - // At this point, we have to have 'in'. - if (CurTok != tok_in) - return Error("expected 'in' keyword after 'var'"); - getNextToken(); // eat 'in'. 
- - ExprAST *Body = ParseExpression(); - if (Body == 0) return 0; - - return new VarExprAST(VarNames, Body); -} - -/// primary -/// ::= identifierexpr -/// ::= numberexpr -/// ::= parenexpr -/// ::= ifexpr -/// ::= forexpr -/// ::= varexpr -static ExprAST *ParsePrimary() { - switch (CurTok) { - default: return Error("unknown token when expecting an expression"); - case tok_identifier: return ParseIdentifierExpr(); - case tok_number: return ParseNumberExpr(); - case '(': return ParseParenExpr(); - case tok_if: return ParseIfExpr(); - case tok_for: return ParseForExpr(); - case tok_var: return ParseVarExpr(); - } -} - -/// unary -/// ::= primary -/// ::= '!' unary -static ExprAST *ParseUnary() { - // If the current token is not an operator, it must be a primary expr. - if (!isascii(CurTok) || CurTok == '(' || CurTok == ',') - return ParsePrimary(); - - // If this is a unary operator, read it. - int Opc = CurTok; - getNextToken(); - if (ExprAST *Operand = ParseUnary()) - return new UnaryExprAST(Opc, Operand); - return 0; -} - -/// binoprhs -/// ::= ('+' unary)* -static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { - // If this is a binop, find its precedence. - while (1) { - int TokPrec = GetTokPrecedence(); - - // If this is a binop that binds at least as tightly as the current binop, - // consume it, otherwise we are done. - if (TokPrec < ExprPrec) - return LHS; - - // Okay, we know this is a binop. - int BinOp = CurTok; - getNextToken(); // eat binop - - // Parse the unary expression after the binary operator. - ExprAST *RHS = ParseUnary(); - if (!RHS) return 0; - - // If BinOp binds less tightly with RHS than the operator after RHS, let - // the pending operator take RHS as its LHS. - int NextPrec = GetTokPrecedence(); - if (TokPrec < NextPrec) { - RHS = ParseBinOpRHS(TokPrec+1, RHS); - if (RHS == 0) return 0; - } - - // Merge LHS/RHS. 
- LHS = new BinaryExprAST(BinOp, LHS, RHS); - } -} - -/// expression -/// ::= unary binoprhs -/// -static ExprAST *ParseExpression() { - ExprAST *LHS = ParseUnary(); - if (!LHS) return 0; - - return ParseBinOpRHS(0, LHS); -} - -/// prototype -/// ::= id '(' id* ')' -/// ::= binary LETTER number? (id, id) -/// ::= unary LETTER (id) -static PrototypeAST *ParsePrototype() { - std::string FnName; - - unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary. - unsigned BinaryPrecedence = 30; - - switch (CurTok) { - default: - return ErrorP("Expected function name in prototype"); - case tok_identifier: - FnName = IdentifierStr; - Kind = 0; - getNextToken(); - break; - case tok_unary: - getNextToken(); - if (!isascii(CurTok)) - return ErrorP("Expected unary operator"); - FnName = "unary"; - FnName += (char)CurTok; - Kind = 1; - getNextToken(); - break; - case tok_binary: - getNextToken(); - if (!isascii(CurTok)) - return ErrorP("Expected binary operator"); - FnName = "binary"; - FnName += (char)CurTok; - Kind = 2; - getNextToken(); - - // Read the precedence if present. - if (CurTok == tok_number) { - if (NumVal < 1 || NumVal > 100) - return ErrorP("Invalid precedence: must be 1..100"); - BinaryPrecedence = (unsigned)NumVal; - getNextToken(); - } - break; - } - - if (CurTok != '(') - return ErrorP("Expected '(' in prototype"); - - std::vector<std::string> ArgNames; - while (getNextToken() == tok_identifier) - ArgNames.push_back(IdentifierStr); - if (CurTok != ')') - return ErrorP("Expected ')' in prototype"); - - // success. - getNextToken(); // eat ')'. - - // Verify right number of names for operator. - if (Kind && ArgNames.size() != Kind) - return ErrorP("Invalid number of operands for operator"); - - return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence); -} - -/// definition ::= 'def' prototype expression -static FunctionAST *ParseDefinition() { - getNextToken(); // eat def.
- PrototypeAST *Proto = ParsePrototype(); - if (Proto == 0) return 0; - - if (ExprAST *E = ParseExpression()) - return new FunctionAST(Proto, E); - return 0; -} - -/// toplevelexpr ::= expression -static FunctionAST *ParseTopLevelExpr() { - if (ExprAST *E = ParseExpression()) { - // Make an anonymous proto. - PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); - return new FunctionAST(Proto, E); - } - return 0; -} - -/// external ::= 'extern' prototype -static PrototypeAST *ParseExtern() { - getNextToken(); // eat extern. - return ParsePrototype(); -} - -//===----------------------------------------------------------------------===// -// Code Generation -//===----------------------------------------------------------------------===// - -static Module *TheModule; -static IRBuilder<> Builder(getGlobalContext()); -static std::map<std::string, AllocaInst*> NamedValues; -static FunctionPassManager *TheFPM; - -Value *ErrorV(const char *Str) { Error(Str); return 0; } - -/// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of -/// the function. This is used for mutable variables etc. -static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction, - const std::string &VarName) { - IRBuilder<> TmpB(&TheFunction->getEntryBlock(), - TheFunction->getEntryBlock().begin()); - return TmpB.CreateAlloca(Type::getDoubleTy(getGlobalContext()), 0, - VarName.c_str()); -} - -Value *NumberExprAST::Codegen() { - return ConstantFP::get(getGlobalContext(), APFloat(Val)); -} - -Value *VariableExprAST::Codegen() { - // Look this variable up in the function. - Value *V = NamedValues[Name]; - if (V == 0) return ErrorV("Unknown variable name"); - - // Load the value. 
- return Builder.CreateLoad(V, Name.c_str()); -} - -Value *UnaryExprAST::Codegen() { - Value *OperandV = Operand->Codegen(); - if (OperandV == 0) return 0; - - Function *F = TheModule->getFunction(std::string("unary")+Opcode); - if (F == 0) - return ErrorV("Unknown unary operator"); - - return Builder.CreateCall(F, OperandV, "unop"); -} - -Value *BinaryExprAST::Codegen() { - // Special case '=' because we don't want to emit the LHS as an expression. - if (Op == '=') { - // Assignment requires the LHS to be an identifier. - VariableExprAST *LHSE = dynamic_cast<VariableExprAST*>(LHS); - if (!LHSE) - return ErrorV("destination of '=' must be a variable"); - // Codegen the RHS. - Value *Val = RHS->Codegen(); - if (Val == 0) return 0; - - // Look up the name. - Value *Variable = NamedValues[LHSE->getName()]; - if (Variable == 0) return ErrorV("Unknown variable name"); - - Builder.CreateStore(Val, Variable); - return Val; - } - - Value *L = LHS->Codegen(); - Value *R = RHS->Codegen(); - if (L == 0 || R == 0) return 0; - - switch (Op) { - case '+': return Builder.CreateAdd(L, R, "addtmp"); - case '-': return Builder.CreateSub(L, R, "subtmp"); - case '*': return Builder.CreateMul(L, R, "multmp"); - case '<': - L = Builder.CreateFCmpULT(L, R, "cmptmp"); - // Convert bool 0/1 to double 0.0 or 1.0 - return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), - "booltmp"); - default: break; - } - - // If it wasn't a builtin binary operator, it must be a user defined one. Emit - // a call to it. - Function *F = TheModule->getFunction(std::string("binary")+Op); - assert(F && "binary operator not found!"); - - Value *Ops[] = { L, R }; - return Builder.CreateCall(F, Ops, Ops+2, "binop"); -} - -Value *CallExprAST::Codegen() { - // Look up the name in the global module table. - Function *CalleeF = TheModule->getFunction(Callee); - if (CalleeF == 0) - return ErrorV("Unknown function referenced"); - - // If argument mismatch error. 
- if (CalleeF->arg_size() != Args.size()) - return ErrorV("Incorrect # arguments passed"); - - std::vector<Value*> ArgsV; - for (unsigned i = 0, e = Args.size(); i != e; ++i) { - ArgsV.push_back(Args[i]->Codegen()); - if (ArgsV.back() == 0) return 0; - } - - return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp"); -} - -Value *IfExprAST::Codegen() { - Value *CondV = Cond->Codegen(); - if (CondV == 0) return 0; - - // Convert condition to a bool by comparing equal to 0.0. - CondV = Builder.CreateFCmpONE(CondV, - ConstantFP::get(getGlobalContext(), APFloat(0.0)), - "ifcond"); - - Function *TheFunction = Builder.GetInsertBlock()->getParent(); - - // Create blocks for the then and else cases. Insert the 'then' block at the - // end of the function. - BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction); - BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else"); - BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont"); - - Builder.CreateCondBr(CondV, ThenBB, ElseBB); - - // Emit then value. - Builder.SetInsertPoint(ThenBB); - - Value *ThenV = Then->Codegen(); - if (ThenV == 0) return 0; - - Builder.CreateBr(MergeBB); - // Codegen of 'Then' can change the current block, update ThenBB for the PHI. - ThenBB = Builder.GetInsertBlock(); - - // Emit else block. - TheFunction->getBasicBlockList().push_back(ElseBB); - Builder.SetInsertPoint(ElseBB); - - Value *ElseV = Else->Codegen(); - if (ElseV == 0) return 0; - - Builder.CreateBr(MergeBB); - // Codegen of 'Else' can change the current block, update ElseBB for the PHI. - ElseBB = Builder.GetInsertBlock(); - - // Emit merge block. 
- TheFunction->getBasicBlockList().push_back(MergeBB); - Builder.SetInsertPoint(MergeBB); - PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), - "iftmp"); - - PN->addIncoming(ThenV, ThenBB); - PN->addIncoming(ElseV, ElseBB); - return PN; -} - -Value *ForExprAST::Codegen() { - // Output this as: - // var = alloca double - // ... - // start = startexpr - // store start -> var - // goto loop - // loop: - // ... - // bodyexpr - // ... - // loopend: - // step = stepexpr - // endcond = endexpr - // - // curvar = load var - // nextvar = curvar + step - // store nextvar -> var - // br endcond, loop, endloop - // outloop: - - Function *TheFunction = Builder.GetInsertBlock()->getParent(); - - // Create an alloca for the variable in the entry block. - AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName); - - // Emit the start code first, without 'variable' in scope. - Value *StartVal = Start->Codegen(); - if (StartVal == 0) return 0; - - // Store the value into the alloca. - Builder.CreateStore(StartVal, Alloca); - - // Make the new basic block for the loop header, inserting after current - // block. - BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction); - - // Insert an explicit fall through from the current block to the LoopBB. - Builder.CreateBr(LoopBB); - - // Start insertion in LoopBB. - Builder.SetInsertPoint(LoopBB); - - // Within the loop, the variable is defined equal to the PHI node. If it - // shadows an existing variable, we have to restore it, so save it now. - AllocaInst *OldVal = NamedValues[VarName]; - NamedValues[VarName] = Alloca; - - // Emit the body of the loop. This, like any other expr, can change the - // current BB. Note that we ignore the value computed by the body, but don't - // allow an error. - if (Body->Codegen() == 0) - return 0; - - // Emit the step value. 
- Value *StepVal; - if (Step) { - StepVal = Step->Codegen(); - if (StepVal == 0) return 0; - } else { - // If not specified, use 1.0. - StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0)); - } - - // Compute the end condition. - Value *EndCond = End->Codegen(); - if (EndCond == 0) return EndCond; - - // Reload, increment, and restore the alloca. This handles the case where - // the body of the loop mutates the variable. - Value *CurVar = Builder.CreateLoad(Alloca, VarName.c_str()); - Value *NextVar = Builder.CreateAdd(CurVar, StepVal, "nextvar"); - Builder.CreateStore(NextVar, Alloca); - - // Convert condition to a bool by comparing equal to 0.0. - EndCond = Builder.CreateFCmpONE(EndCond, - ConstantFP::get(getGlobalContext(), APFloat(0.0)), - "loopcond"); - - // Create the "after loop" block and insert it. - BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction); - - // Insert the conditional branch into the end of LoopEndBB. - Builder.CreateCondBr(EndCond, LoopBB, AfterBB); - - // Any new code will be inserted in AfterBB. - Builder.SetInsertPoint(AfterBB); - - // Restore the unshadowed variable. - if (OldVal) - NamedValues[VarName] = OldVal; - else - NamedValues.erase(VarName); - - - // for expr always returns 0.0. - return Constant::getNullValue(Type::getDoubleTy(getGlobalContext())); -} - -Value *VarExprAST::Codegen() { - std::vector<AllocaInst *> OldBindings; - - Function *TheFunction = Builder.GetInsertBlock()->getParent(); - - // Register all variables and emit their initializer. - for (unsigned i = 0, e = VarNames.size(); i != e; ++i) { - const std::string &VarName = VarNames[i].first; - ExprAST *Init = VarNames[i].second; - - // Emit the initializer before adding the variable to scope, this prevents - // the initializer from referencing the variable itself, and permits stuff - // like this: - // var a = 1 in - // var a = a in ... # refers to outer 'a'. 
- Value *InitVal; - if (Init) { - InitVal = Init->Codegen(); - if (InitVal == 0) return 0; - } else { // If not specified, use 0.0. - InitVal = ConstantFP::get(getGlobalContext(), APFloat(0.0)); - } - - AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName); - Builder.CreateStore(InitVal, Alloca); - - // Remember the old variable binding so that we can restore the binding when - // we unrecurse. - OldBindings.push_back(NamedValues[VarName]); - - // Remember this binding. - NamedValues[VarName] = Alloca; - } - - // Codegen the body, now that all vars are in scope. - Value *BodyVal = Body->Codegen(); - if (BodyVal == 0) return 0; - - // Pop all our variables from scope. - for (unsigned i = 0, e = VarNames.size(); i != e; ++i) - NamedValues[VarNames[i].first] = OldBindings[i]; - - // Return the body computation. - return BodyVal; -} - -Function *PrototypeAST::Codegen() { - // Make the function type: double(double,double) etc. - std::vector<const Type*> Doubles(Args.size(), - Type::getDoubleTy(getGlobalContext())); - FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), - Doubles, false); - - Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule); - - // If F conflicted, there was already something named 'Name'. If it has a - // body, don't allow redefinition or reextern. - if (F->getName() != Name) { - // Delete the one we just made and get the existing one. - F->eraseFromParent(); - F = TheModule->getFunction(Name); - - // If F already has a body, reject this. - if (!F->empty()) { - ErrorF("redefinition of function"); - return 0; - } - - // If F took a different number of args, reject. - if (F->arg_size() != Args.size()) { - ErrorF("redefinition of function with different # args"); - return 0; - } - } - - // Set names for all arguments. 
- unsigned Idx = 0; - for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); - ++AI, ++Idx) - AI->setName(Args[Idx]); - - return F; -} - -/// CreateArgumentAllocas - Create an alloca for each argument and register the -/// argument in the symbol table so that references to it will succeed. -void PrototypeAST::CreateArgumentAllocas(Function *F) { - Function::arg_iterator AI = F->arg_begin(); - for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) { - // Create an alloca for this variable. - AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]); - - // Store the initial value into the alloca. - Builder.CreateStore(AI, Alloca); - - // Add arguments to variable symbol table. - NamedValues[Args[Idx]] = Alloca; - } -} - -Function *FunctionAST::Codegen() { - NamedValues.clear(); - - Function *TheFunction = Proto->Codegen(); - if (TheFunction == 0) - return 0; - - // If this is an operator, install it. - if (Proto->isBinaryOp()) - BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence(); - - // Create a new basic block to start insertion into. - BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); - Builder.SetInsertPoint(BB); - - // Add all arguments to the symbol table and create their allocas. - Proto->CreateArgumentAllocas(TheFunction); - - if (Value *RetVal = Body->Codegen()) { - // Finish off the function. - Builder.CreateRet(RetVal); - - // Validate the generated code, checking for consistency. - verifyFunction(*TheFunction); - - // Optimize the function. - TheFPM->run(*TheFunction); - - return TheFunction; - } - - // Error reading body, remove function. 
- TheFunction->eraseFromParent(); - - if (Proto->isBinaryOp()) - BinopPrecedence.erase(Proto->getOperatorName()); - return 0; -} - -//===----------------------------------------------------------------------===// -// Top-Level parsing and JIT Driver -//===----------------------------------------------------------------------===// - -static ExecutionEngine *TheExecutionEngine; - -static void HandleDefinition() { - if (FunctionAST *F = ParseDefinition()) { - if (Function *LF = F->Codegen()) { - fprintf(stderr, "Read function definition:"); - LF->dump(); - } - } else { - // Skip token for error recovery. - getNextToken(); - } -} - -static void HandleExtern() { - if (PrototypeAST *P = ParseExtern()) { - if (Function *F = P->Codegen()) { - fprintf(stderr, "Read extern: "); - F->dump(); - } - } else { - // Skip token for error recovery. - getNextToken(); - } -} - -static void HandleTopLevelExpression() { - // Evaluate a top-level expression into an anonymous function. - if (FunctionAST *F = ParseTopLevelExpr()) { - if (Function *LF = F->Codegen()) { - // JIT the function, returning a function pointer. - void *FPtr = TheExecutionEngine->getPointerToFunction(LF); - - // Cast it to the right type (takes no arguments, returns a double) so we - // can call it as a native function. - double (*FP)() = (double (*)())(intptr_t)FPtr; - fprintf(stderr, "Evaluated to %f\n", FP()); - } - } else { - // Skip token for error recovery. - getNextToken(); - } -} - -/// top ::= definition | external | expression | ';' -static void MainLoop() { - while (1) { - fprintf(stderr, "ready> "); - switch (CurTok) { - case tok_eof: return; - case ';': getNextToken(); break; // ignore top-level semicolons. - case tok_def: HandleDefinition(); break; - case tok_extern: HandleExtern(); break; - default: HandleTopLevelExpression(); break; - } - } -} - -//===----------------------------------------------------------------------===// -// "Library" functions that can be "extern'd" from user code. 
-//===----------------------------------------------------------------------===// - -/// putchard - putchar that takes a double and returns 0. -extern "C" -double putchard(double X) { - putchar((char)X); - return 0; -} - -/// printd - printf that takes a double prints it as "%f\n", returning 0. -extern "C" -double printd(double X) { - printf("%f\n", X); - return 0; -} - -//===----------------------------------------------------------------------===// -// Main driver code. -//===----------------------------------------------------------------------===// - -int main() { - InitializeNativeTarget(); - LLVMContext &Context = getGlobalContext(); - - // Install standard binary operators. - // 1 is lowest precedence. - BinopPrecedence['='] = 2; - BinopPrecedence['<'] = 10; - BinopPrecedence['+'] = 20; - BinopPrecedence['-'] = 20; - BinopPrecedence['*'] = 40; // highest. - - // Prime the first token. - fprintf(stderr, "ready> "); - getNextToken(); - - // Make the module, which holds all the code. - TheModule = new Module("my cool jit", Context); - - // Create the JIT. This takes ownership of the module. - std::string ErrStr; - TheExecutionEngine = EngineBuilder(TheModule).setErrorStr(&ErrStr).create(); - if (!TheExecutionEngine) { - fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str()); - exit(1); - } - - FunctionPassManager OurFPM(TheModule); - - // Set up the optimizer pipeline. Start with registering info about how the - // target lays out data structures. - OurFPM.add(new TargetData(*TheExecutionEngine->getTargetData())); - // Promote allocas to registers. - OurFPM.add(createPromoteMemoryToRegisterPass()); - // Do simple "peephole" optimizations and bit-twiddling optzns. - OurFPM.add(createInstructionCombiningPass()); - // Reassociate expressions. - OurFPM.add(createReassociatePass()); - // Eliminate Common SubExpressions. - OurFPM.add(createGVNPass()); - // Simplify the control flow graph (deleting unreachable blocks, etc). 
- OurFPM.add(createCFGSimplificationPass()); - - OurFPM.doInitialization(); - - // Set the global so the code gen can use this. - TheFPM = &OurFPM; - - // Run the main "interpreter loop" now. - MainLoop(); - - TheFPM = 0; - - // Print out all of the generated code. - TheModule->dump(); - - return 0; -} -</pre> -</div> - -<a href="LangImpl8.html">Next: Conclusion and other useful LLVM tidbits</a> -</div> - -<!-- *********************************************************************** --> -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> - - <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> - <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/tutorial/LangImpl8.html b/docs/tutorial/LangImpl8.html deleted file mode 100644 index 64a6200..0000000 --- a/docs/tutorial/LangImpl8.html +++ /dev/null @@ -1,365 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> - -<html> -<head> - <title>Kaleidoscope: Conclusion and other useful LLVM tidbits</title> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <meta name="author" content="Chris Lattner"> - <link rel="stylesheet" href="../llvm.css" type="text/css"> -</head> - -<body> - -<div class="doc_title">Kaleidoscope: Conclusion and other useful LLVM - tidbits</div> - -<ul> -<li><a href="index.html">Up to Tutorial Index</a></li> -<li>Chapter 8 - <ol> - <li><a href="#conclusion">Tutorial Conclusion</a></li> - <li><a href="#llvmirproperties">Properties of LLVM IR</a> - <ul> - <li><a href="#targetindep">Target Independence</a></li> - <li><a href="#safety">Safety Guarantees</a></li> - <li><a href="#langspecific">Language-Specific 
Optimizations</a></li> - </ul> - </li> - <li><a href="#tipsandtricks">Tips and Tricks</a> - <ul> - <li><a href="#offsetofsizeof">Implementing portable - offsetof/sizeof</a></li> - <li><a href="#gcstack">Garbage Collected Stack Frames</a></li> - </ul> - </li> - </ol> -</li> -</ul> - - -<div class="doc_author"> - <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p> -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="conclusion">Tutorial Conclusion</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Welcome to the final chapter of the "<a href="index.html">Implementing a -language with LLVM</a>" tutorial. In the course of this tutorial, we have grown -our little Kaleidoscope language from being a useless toy to being a -semi-interesting (but probably still useless) toy. :)</p> - -<p>It is interesting to see how far we've come, and how little code it has -taken. We built the entire lexer, parser, AST, code generator, and an -interactive run-loop (with a JIT!) by hand in under 700 lines of -(non-comment/non-blank) code.</p> - -<p>Our little language supports a couple of interesting features: it supports -user-defined binary and unary operators, it uses JIT compilation for immediate -evaluation, and it supports a few control flow constructs with SSA construction. -</p> - -<p>Part of the idea of this tutorial was to show you how easy and fun it can be -to define, build, and play with languages. Building a compiler need not be a -scary or mystical process! Now that you've seen some of the basics, I strongly -encourage you to take the code and hack on it. 
For example, try adding:</p> - -<ul> -<li><b>global variables</b> - While global variables have questionable value in -modern software engineering, they are often useful when putting together quick -little hacks like the Kaleidoscope compiler itself.
Fortunately, our current -setup makes it very easy to add global variables: just have value lookup check -to see if an unresolved variable is in the global variable symbol table before -rejecting it. To create a new global variable, make an instance of the LLVM -<tt>GlobalVariable</tt> class.</li> - -<li><b>typed variables</b> - Kaleidoscope currently only supports variables of -type double. This gives the language a very nice elegance, because only -supporting one type means that you never have to specify types. Different -languages have different ways of handling this. The easiest way is to require -the user to specify types for every variable definition, and record the type -of the variable in the symbol table along with its Value*.</li> - -<li><b>arrays, structs, vectors, etc</b> - Once you add types, you can start -extending the type system in all sorts of interesting ways. Simple arrays are -very easy and are quite useful for many different applications. Adding them is -mostly an exercise in learning how the LLVM <a -href="../LangRef.html#i_getelementptr">getelementptr</a> instruction works: it -is so nifty/unconventional, it <a -href="../GetElementPtr.html">has its own FAQ</a>! If you add support -for recursive types (e.g. linked lists), make sure to read the <a -href="../ProgrammersManual.html#TypeResolve">section in the LLVM -Programmer's Manual</a> that describes how to construct them.</li> - -<li><b>standard runtime</b> - Our current language allows the user to access -arbitrary external functions, and we use it for things like "printd" and -"putchard". As you extend the language to add higher-level constructs, often -these constructs make the most sense if they are lowered to calls into a -language-supplied runtime. 
For example, if you add hash tables to the language, -it would probably make sense to add the routines to a runtime, instead of -inlining them all the way.</li> - -<li><b>memory management</b> - Currently we can only access the stack in -Kaleidoscope. It would also be useful to be able to allocate heap memory, -either with calls to the standard libc malloc/free interface or with a garbage -collector. If you would like to use garbage collection, note that LLVM fully -supports <a href="../GarbageCollection.html">Accurate Garbage Collection</a> -including algorithms that move objects and need to scan/update the stack.</li> - -<li><b>debugger support</b> - LLVM supports generation of <a -href="../SourceLevelDebugging.html">DWARF Debug info</a> which is understood by -common debuggers like GDB. Adding support for debug info is fairly -straightforward. The best way to understand it is to compile some C/C++ code -with "<tt>llvm-gcc -g -O0</tt>" and take a look at what it produces.</li> - -<li><b>exception handling support</b> - LLVM supports generation of <a -href="../ExceptionHandling.html">zero cost exceptions</a> which interoperate -with code compiled in other languages. You could also generate code by -implicitly making every function return an error value and checking it. You -could also make explicit use of setjmp/longjmp. There are many different ways -to go here.</li> - -<li><b>object orientation, generics, database access, complex numbers, -geometric programming, ...</b> - Really, there is -no end of crazy features that you can add to the language.</li> - -<li><b>unusual domains</b> - We've been talking about applying LLVM to a domain -that many people are interested in: building a compiler for a specific language. -However, there are many other domains that can use compiler technology that are -not typically considered.
For example, LLVM has been used to implement OpenGL -graphics acceleration, translate C++ code to ActionScript, and many other -cute and clever things. Maybe you will be the first to JIT compile a regular -expression interpreter into native code with LLVM?</li> - -</ul> - -<p> -Have fun - try doing something crazy and unusual. Building a language the way -everyone else always has is much less fun than trying something a little crazy -or off the wall and seeing how it turns out. If you get stuck or want to talk -about it, feel free to email the <a -href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">llvmdev mailing -list</a>: it has lots of people who are interested in languages and are often -willing to help out. -</p> - -<p>Before we end this tutorial, I want to talk about some "tips and tricks" for generating -LLVM IR. These are some of the more subtle things that may not be obvious, but -are very useful if you want to take advantage of LLVM's capabilities.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="llvmirproperties">Properties of the LLVM -IR</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>We have a couple of common questions about code in the LLVM IR form - let's just -get these out of the way right now, shall we?</p> - -</div> - -<!-- ======================================================================= --> -<div class="doc_subsubsection"><a name="targetindep">Target -Independence</a></div> -<!-- ======================================================================= --> - -<div class="doc_text"> - -<p>Kaleidoscope is an example of a "portable language": any program written in -Kaleidoscope will work the same way on any target that it runs on. Many other -languages have this property, e.g.
Lisp, Java, Haskell, JavaScript, Python, etc. -(note that while these languages are portable, not all their libraries are).</p> - -<p>One nice aspect of LLVM is that it is often capable of preserving target -independence in the IR: you can take the LLVM IR for a Kaleidoscope-compiled -program and run it on any target that LLVM supports, even emitting C code and -compiling that on targets that LLVM doesn't support natively. You can trivially -tell that the Kaleidoscope compiler generates target-independent code because it -never queries for any target-specific information when generating code.</p> - -<p>The fact that LLVM provides a compact, target-independent representation for -code gets a lot of people excited. Unfortunately, these people are usually -thinking about C or a language from the C family when they are asking questions -about language portability. I say "unfortunately", because there is really no -way to make (fully general) C code portable, other than shipping the source code -around (and of course, C source code is not actually portable in general -either - ever port a really old application from 32- to 64-bits?).</p> - -<p>The problem with C (again, in its full generality) is that it is heavily -laden with target-specific assumptions. As one simple example, the preprocessor -often destructively removes target-independence from the code when it processes -the input text:</p> - -<div class="doc_code"> -<pre> -#ifdef __i386__ - int X = 1; -#else - int X = 42; -#endif -</pre> -</div> - -<p>While it is possible to engineer more and more complex solutions to problems -like this, it cannot be solved in full generality in a way that is better than shipping -the actual source code.</p> - -<p>That said, there are interesting subsets of C that can be made portable.
If -you are willing to fix primitive types to a fixed size (say int = 32-bits, -and long = 64-bits), don't care about ABI compatibility with existing binaries, -and are willing to give up some other minor features, you can have portable -code. This can make sense for specialized domains such as an -in-kernel language.</p> - -</div> - -<!-- ======================================================================= --> -<div class="doc_subsubsection"><a name="safety">Safety Guarantees</a></div> -<!-- ======================================================================= --> - -<div class="doc_text"> - -<p>Many of the languages above are also "safe" languages: it is impossible for -a program written in Java to corrupt its address space and crash the process -(assuming the JVM has no bugs). -Safety is an interesting property that requires a combination of language -design, runtime support, and often operating system support.</p> - -<p>It is certainly possible to implement a safe language in LLVM, but LLVM IR -does not itself guarantee safety. The LLVM IR allows unsafe pointer casts, -use after free bugs, buffer over-runs, and a variety of other problems. Safety -needs to be implemented as a layer on top of LLVM and, conveniently, several -groups have investigated this. Ask on the <a -href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">llvmdev mailing -list</a> if you are interested in more details.</p> - -</div> - -<!-- ======================================================================= --> -<div class="doc_subsubsection"><a name="langspecific">Language-Specific -Optimizations</a></div> -<!-- ======================================================================= --> - -<div class="doc_text"> - -<p>One thing about LLVM that turns off many people is that it does not solve all -the world's problems in one system (sorry 'world hunger', someone else will have -to solve you some other day). 
One specific complaint is that people perceive -LLVM as being incapable of performing high-level language-specific optimization: -LLVM "loses too much information".</p> - -<p>Unfortunately, this is really not the place to give you a full and unified -version of "Chris Lattner's theory of compiler design". Instead, I'll make a -few observations:</p> - -<p>First, you're right that LLVM does lose information. For example, as of this -writing, there is no way to distinguish in the LLVM IR whether an SSA-value came -from a C "int" or a C "long" on an ILP32 machine (other than debug info). Both -get compiled down to an 'i32' value and the information about what it came from -is lost. The more general issue here is that the LLVM type system uses -"structural equivalence" instead of "name equivalence". Another place this -surprises people is if you have two types in a high-level language that have the -same structure (e.g. two different structs that have a single int field): these -types will compile down into a single LLVM type and it will be impossible to -tell what it came from.</p> - -<p>Second, while LLVM does lose information, LLVM is not a fixed target: we -continue to enhance and improve it in many different ways. In addition to -adding new features (LLVM did not always support exceptions or debug info), we -also extend the IR to capture important information for optimization (e.g. -whether an argument is sign or zero extended, information about pointers -aliasing, etc.). Many of the enhancements are user-driven: people want LLVM to -include some specific feature, so they go ahead and extend it.</p> - -<p>Third, it is <em>possible and easy</em> to add language-specific -optimizations, and you have a number of choices in how to do it. As one trivial -example, it is easy to add language-specific optimization passes that -"know" things about code compiled for a language.
In the case of the C family, -there is an optimization pass that "knows" about the standard C library -functions. If you call "exit(0)" in main(), it knows that it is safe to -optimize that into "return 0;" because C specifies what the 'exit' -function does.</p> - -<p>In addition to simple library knowledge, it is possible to embed a variety of -other language-specific information into the LLVM IR. If you have a specific -need and run into a wall, please bring the topic up on the llvmdev list. At the -very worst, you can always treat LLVM as if it were a "dumb code generator" and -implement the high-level optimizations you desire in your front-end, on the -language-specific AST. -</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="tipsandtricks">Tips and Tricks</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>There is a variety of useful tips and tricks that you come to know after -working on/with LLVM that aren't obvious at first glance. Instead of letting -everyone rediscover them, this section talks about some of these issues.</p> - -</div> - -<!-- ======================================================================= --> -<div class="doc_subsubsection"><a name="offsetofsizeof">Implementing portable -offsetof/sizeof</a></div> -<!-- ======================================================================= --> - -<div class="doc_text"> - -<p>One interesting thing that comes up, if you are trying to keep the code -generated by your compiler "target independent", is that you often need to know -the size of some LLVM type or the offset of some field in an llvm structure. -For example, you might need to pass the size of a type into a function that -allocates memory.</p> - -<p>Unfortunately, this can vary widely across targets: for example the width of -a pointer is trivially target-specific. 
However, there is a <a -href="http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt">clever -way to use the getelementptr instruction</a> that allows you to compute this -in a portable way.</p> - -</div> - -<!-- ======================================================================= --> -<div class="doc_subsubsection"><a name="gcstack">Garbage Collected -Stack Frames</a></div> -<!-- ======================================================================= --> - -<div class="doc_text"> - -<p>Some languages want to explicitly manage their stack frames, often so that -they are garbage collected or to allow easy implementation of closures. There -are often better ways to implement these features than explicit stack frames, -but <a -href="http://nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt">LLVM -does support them,</a> if you want. It requires your front-end to convert the -code into <a -href="http://en.wikipedia.org/wiki/Continuation-passing_style">Continuation -Passing Style</a> and the use of tail calls (which LLVM also supports).</p> - -</div> - -<!-- *********************************************************************** --> -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> - - <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> - <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/tutorial/Makefile b/docs/tutorial/Makefile deleted file mode 100644 index 9082ad4..0000000 --- a/docs/tutorial/Makefile +++ /dev/null @@ -1,28 +0,0 @@ -##===- docs/tutorial/Makefile ------------------------------*- Makefile -*-===## -# -# The LLVM Compiler Infrastructure -# -# This file is distributed under the University of 
Illinois Open Source -# License. See LICENSE.TXT for details. -# -##===----------------------------------------------------------------------===## - -LEVEL := ../.. -include $(LEVEL)/Makefile.common - -HTML := $(wildcard $(PROJ_SRC_DIR)/*.html) -EXTRA_DIST := $(HTML) index.html -HTML_DIR := $(DESTDIR)$(PROJ_docsdir)/html/tutorial - -install-local:: $(HTML) - $(Echo) Installing HTML Tutorial Documentation - $(Verb) $(MKDIR) $(HTML_DIR) - $(Verb) $(DataInstall) $(HTML) $(HTML_DIR) - $(Verb) $(DataInstall) $(PROJ_SRC_DIR)/index.html $(HTML_DIR) - -uninstall-local:: - $(Echo) Uninstalling Tutorial Documentation - $(Verb) $(RM) -rf $(HTML_DIR) - -printvars:: - $(Echo) "HTML : " '$(HTML)' diff --git a/docs/tutorial/OCamlLangImpl1.html b/docs/tutorial/OCamlLangImpl1.html deleted file mode 100644 index 98c1124..0000000 --- a/docs/tutorial/OCamlLangImpl1.html +++ /dev/null @@ -1,365 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> - -<html> -<head> - <title>Kaleidoscope: Tutorial Introduction and the Lexer</title> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <meta name="author" content="Chris Lattner"> - <meta name="author" content="Erick Tryzelaar"> - <link rel="stylesheet" href="../llvm.css" type="text/css"> -</head> - -<body> - -<div class="doc_title">Kaleidoscope: Tutorial Introduction and the Lexer</div> - -<ul> -<li><a href="index.html">Up to Tutorial Index</a></li> -<li>Chapter 1 - <ol> - <li><a href="#intro">Tutorial Introduction</a></li> - <li><a href="#language">The Basic Language</a></li> - <li><a href="#lexer">The Lexer</a></li> - </ol> -</li> -<li><a href="OCamlLangImpl2.html">Chapter 2</a>: Implementing a Parser and -AST</li> -</ul> - -<div class="doc_author"> - <p> - Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a> - and <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a> - </p> -</div> - -<!-- 
*********************************************************************** --> -<div class="doc_section"><a name="intro">Tutorial Introduction</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Welcome to the "Implementing a language with LLVM" tutorial. This tutorial -runs through the implementation of a simple language, showing how fun and -easy it can be. This tutorial will get you up and started as well as help to -build a framework you can extend to other languages. The code in this tutorial -can also be used as a playground to hack on other LLVM specific things. -</p> - -<p> -The goal of this tutorial is to progressively unveil our language, describing -how it is built up over time. This will let us cover a fairly broad range of -language design and LLVM-specific usage issues, showing and explaining the code -for it all along the way, without overwhelming you with tons of details up -front.</p> - -<p>It is useful to point out ahead of time that this tutorial is really about -teaching compiler techniques and LLVM specifically, <em>not</em> about teaching -modern and sane software engineering principles. In practice, this means that -we'll take a number of shortcuts to simplify the exposition. For example, the -code leaks memory, uses global variables all over the place, doesn't use nice -design patterns like <a -href="http://en.wikipedia.org/wiki/Visitor_pattern">visitors</a>, etc... but it -is very simple. If you dig in and use the code as a basis for future projects, -fixing these deficiencies shouldn't be hard.</p> - -<p>I've tried to put this tutorial together in a way that makes chapters easy to -skip over if you are already familiar with or are uninterested in the various -pieces. 
The structure of the tutorial is: -</p> - -<ul> -<li><b><a href="#language">Chapter #1</a>: Introduction to the Kaleidoscope -language, and the definition of its Lexer</b> - This shows where we are going -and the basic functionality we want it to have. In order to make this -tutorial maximally understandable and hackable, we choose to implement -everything in Objective Caml instead of using lexer and parser generators. -LLVM obviously works just fine with such tools; feel free to use one if you -prefer.</li> -<li><b><a href="OCamlLangImpl2.html">Chapter #2</a>: Implementing a Parser and -AST</b> - With the lexer in place, we can talk about parsing techniques and -basic AST construction. This tutorial describes recursive descent parsing and -operator precedence parsing. Nothing in Chapters 1 or 2 is LLVM-specific; -the code doesn't even link in LLVM at this point. :)</li> -<li><b><a href="OCamlLangImpl3.html">Chapter #3</a>: Code generation to LLVM -IR</b> - With the AST ready, we can show off how easy generation of LLVM IR -really is.</li> -<li><b><a href="OCamlLangImpl4.html">Chapter #4</a>: Adding JIT and Optimizer -Support</b> - Because a lot of people are interested in using LLVM as a JIT, -we'll dive right into it and show you the 3 lines it takes to add JIT support. -LLVM is also useful in many other ways, but this is one simple and "sexy" way -to show off its power. :)</li> -<li><b><a href="OCamlLangImpl5.html">Chapter #5</a>: Extending the Language: -Control Flow</b> - With the language up and running, we show how to extend it -with control flow operations (if/then/else and a 'for' loop).
This gives us a -chance to talk about simple SSA construction and control flow.</li> -<li><b><a href="OCamlLangImpl6.html">Chapter #6</a>: Extending the Language: -User-defined Operators</b> - This is a silly but fun chapter that talks about -extending the language to let the user program define their own arbitrary -unary and binary operators (with assignable precedence!). This lets us build a -significant piece of the "language" as library routines.</li> -<li><b><a href="OCamlLangImpl7.html">Chapter #7</a>: Extending the Language: -Mutable Variables</b> - This chapter talks about adding user-defined local -variables along with an assignment operator. The interesting part about this -is how easy it is to construct SSA form in LLVM: no, LLVM does -<em>not</em> require your front-end to construct SSA form!</li> -<li><b><a href="OCamlLangImpl8.html">Chapter #8</a>: Conclusion and other -useful LLVM tidbits</b> - This chapter wraps up the series by talking about -potential ways to extend the language, but also includes a bunch of pointers to -info about "special topics" like adding garbage collection support, exceptions, -debugging, support for "spaghetti stacks", and a bunch of other tips and -tricks.</li> - -</ul> - -<p>By the end of the tutorial, we'll have written a bit less than 700 -non-comment, non-blank lines of code. With this small amount of code, we'll -have built up a very reasonable compiler for a non-trivial language including -a hand-written lexer, parser, and AST, as well as code generation support with a JIT -compiler. While other systems may have interesting "hello world" tutorials, -I think the breadth of this tutorial is a great testament to the strengths of -LLVM and why you should consider it if you're interested in language or compiler -design.</p> - -<p>A note about this tutorial: we expect you to extend the language and play -with it on your own.
Take the code and go crazy hacking away at it; compilers -don't need to be scary creatures - it can be a lot of fun to play with -languages!</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="language">The Basic Language</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>This tutorial will be illustrated with a toy language that we'll call -"<a href="http://en.wikipedia.org/wiki/Kaleidoscope">Kaleidoscope</a>" (derived -from the Greek words meaning "beautiful", "form", and "view"). -Kaleidoscope is a procedural language that allows you to define functions, use -conditionals, math, etc. Over the course of the tutorial, we'll extend -Kaleidoscope to support the if/then/else construct, a for loop, user-defined -operators, JIT compilation with a simple command line interface, etc.</p> - -<p>Because we want to keep things simple, the only datatype in Kaleidoscope is a -64-bit floating point type (aka 'float' in O'Caml parlance). As such, all -values are implicitly double precision and the language doesn't require type -declarations. This gives the language a very nice and simple syntax. For -example, the following code computes <a -href="http://en.wikipedia.org/wiki/Fibonacci_number">Fibonacci numbers:</a></p> - -<div class="doc_code"> -<pre> -# Compute the x'th fibonacci number. -def fib(x) - if x < 3 then - 1 - else - fib(x-1)+fib(x-2) - -# This expression will compute the 40th number. -fib(40) -</pre> -</div> - -<p>We also allow Kaleidoscope to call into standard library functions (the LLVM -JIT makes this completely trivial). This means that you can use the 'extern' -keyword to declare a function before you use it (this is also useful for mutually -recursive functions).
For example:</p> - -<div class="doc_code"> -<pre> -extern sin(arg); -extern cos(arg); -extern atan2(arg1 arg2); - -atan2(sin(.4), cos(42)) -</pre> -</div> - -<p>A more interesting example is included in Chapter 6, where we write a little -Kaleidoscope application that <a href="OCamlLangImpl6.html#example">displays -a Mandelbrot Set</a> at various levels of magnification.</p> - -<p>Let's dive into the implementation of this language!</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="lexer">The Lexer</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>When it comes to implementing a language, the first thing needed is -the ability to process a text file and recognize what it says. The traditional -way to do this is to use a "<a -href="http://en.wikipedia.org/wiki/Lexical_analysis">lexer</a>" (aka 'scanner') -to break the input up into "tokens". Each token returned by the lexer includes -a token code and potentially some metadata (e.g. the numeric value of a number). -First, we define the possibilities: -</p> - -<div class="doc_code"> -<pre> -(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of - * these others for known things. *) -type token = - (* commands *) - | Def | Extern - - (* primary *) - | Ident of string | Number of float - - (* unknown *) - | Kwd of char -</pre> -</div> - -<p>Each token returned by our lexer will be one of the token variant values. -An unknown character like '+' will be returned as <tt>Token.Kwd '+'</tt>. If -the current token is an identifier, the value will be <tt>Token.Ident s</tt>. If -the current token is a numeric literal (like 1.0), the value will be -<tt>Token.Number 1.0</tt>. -</p> - -<p>The actual implementation of the lexer is a collection of functions driven -by a function named <tt>Lexer.lex</tt>.
The <tt>Lexer.lex</tt> function is -called to return the next token from standard input. We will use -<a href="http://caml.inria.fr/pub/docs/manual-camlp4/index.html">Camlp4</a> -to simplify the tokenization of the standard input. Its definition starts -as:</p> - -<div class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Lexer - *===----------------------------------------------------------------------===*) - -let rec lex = parser - (* Skip any whitespace. *) - | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream -</pre> -</div> - -<p> -<tt>Lexer.lex</tt> works by recursing over a <tt>char Stream.t</tt> to read -characters one at a time from the standard input. It eats them as it recognizes -them and stores them in a <tt>Token.token</tt> variant. The first thing that -it has to do is ignore whitespace between tokens. This is accomplished with the -recursive call above.</p> - -<p>The next thing <tt>Lexer.lex</tt> needs to do is recognize identifiers and -specific keywords like "def". Kaleidoscope does this with a pattern match -and a helper function.</p> - -<div class="doc_code"> -<pre> - (* identifier: [a-zA-Z][a-zA-Z0-9]* *) - | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_ident buffer stream - -... - -and lex_ident buffer = parser - | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] -> - Buffer.add_char buffer c; - lex_ident buffer stream - | [< stream=lex >] -> - match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | id -> [< 'Token.Ident id; stream >] -</pre> -</div> - -<p>Numeric values are similar:</p> - -<div class="doc_code"> -<pre> - (* number: [0-9.]+ *) - | [< ' ('0' .. '9' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_number buffer stream - -... - -and lex_number buffer = parser - | [< ' ('0' ..
'9' | '.' as c); stream >] -> - Buffer.add_char buffer c; - lex_number buffer stream - | [< stream=lex >] -> - [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] -</pre> -</div> - -<p>This is all pretty straightforward code for processing input. When reading -a numeric value from input, we use the OCaml <tt>float_of_string</tt> function -to convert it to a numeric value that we store in <tt>Token.Number</tt>. Note -that this isn't doing sufficient error checking: it will raise <tt>Failure</tt> -if given a string like "1.23.45.67". Feel free to extend it :). Next we handle -comments: -</p> - -<div class="doc_code"> -<pre> - (* Comment until end of line. *) - | [< ' ('#'); stream >] -> - lex_comment stream - -... - -and lex_comment = parser - | [< ' ('\n'); stream=lex >] -> stream - | [< 'c; e=lex_comment >] -> e - | [< >] -> [< >] -</pre> -</div> - -<p>We handle comments by skipping to the end of the line and then returning the -next token. Finally, if the input doesn't match one of the above cases, it is -either an operator character like '+' or the end of the file. These are handled -with this code:</p> - -<div class="doc_code"> -<pre> - (* Otherwise, just return the character as its ascii value. *) - | [< 'c; stream >] -> - [< 'Token.Kwd c; lex stream >] - - (* end of stream. *) - | [< >] -> [< >] -</pre> -</div> - -<p>With this, we have the complete lexer for the basic Kaleidoscope language -(the <a href="OCamlLangImpl2.html#code">full code listing</a> for the Lexer is -available in the <a href="OCamlLangImpl2.html">next chapter</a> of the -tutorial). Next we'll <a href="OCamlLangImpl2.html">build a simple parser that -uses this to build an Abstract Syntax Tree</a>. When we have that, we'll -include a driver so that you can use the lexer and parser together.
-</p> - -<a href="OCamlLangImpl2.html">Next: Implementing a Parser and AST</a> -</div> - -<!-- *********************************************************************** --> -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> - - <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> - <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a><br> - <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/tutorial/OCamlLangImpl2.html b/docs/tutorial/OCamlLangImpl2.html deleted file mode 100644 index 6665109..0000000 --- a/docs/tutorial/OCamlLangImpl2.html +++ /dev/null @@ -1,1045 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> - -<html> -<head> - <title>Kaleidoscope: Implementing a Parser and AST</title> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <meta name="author" content="Chris Lattner"> - <meta name="author" content="Erick Tryzelaar"> - <link rel="stylesheet" href="../llvm.css" type="text/css"> -</head> - -<body> - -<div class="doc_title">Kaleidoscope: Implementing a Parser and AST</div> - -<ul> -<li><a href="index.html">Up to Tutorial Index</a></li> -<li>Chapter 2 - <ol> - <li><a href="#intro">Chapter 2 Introduction</a></li> - <li><a href="#ast">The Abstract Syntax Tree (AST)</a></li> - <li><a href="#parserbasics">Parser Basics</a></li> - <li><a href="#parserprimexprs">Basic Expression Parsing</a></li> - <li><a href="#parserbinops">Binary Expression Parsing</a></li> - <li><a href="#parsertop">Parsing the Rest</a></li> - <li><a href="#driver">The Driver</a></li> - <li><a href="#conclusions">Conclusions</a></li> - <li><a href="#code">Full Code Listing</a></li> 
- </ol> -</li> -<li><a href="OCamlLangImpl3.html">Chapter 3</a>: Code generation to LLVM IR</li> -</ul> - -<div class="doc_author"> - <p> - Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a> - and <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a> - </p> -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="intro">Chapter 2 Introduction</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Welcome to Chapter 2 of the "<a href="index.html">Implementing a language -with LLVM in Objective Caml</a>" tutorial. This chapter shows you how to use -the lexer, built in <a href="OCamlLangImpl1.html">Chapter 1</a>, to build a -full <a href="http://en.wikipedia.org/wiki/Parsing">parser</a> for our -Kaleidoscope language. Once we have a parser, we'll define and build an <a -href="http://en.wikipedia.org/wiki/Abstract_syntax_tree">Abstract Syntax -Tree</a> (AST).</p> - -<p>The parser we will build uses a combination of <a -href="http://en.wikipedia.org/wiki/Recursive_descent_parser">Recursive Descent -Parsing</a> and <a href= -"http://en.wikipedia.org/wiki/Operator-precedence_parser">Operator-Precedence -Parsing</a> to parse the Kaleidoscope language (the latter for -binary expressions and the former for everything else). Before we get to -parsing though, lets talk about the output of the parser: the Abstract Syntax -Tree.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="ast">The Abstract Syntax Tree (AST)</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>The AST for a program captures its behavior in such a way that it is easy for -later stages of the compiler (e.g. code generation) to interpret. 
We basically -want one object for each construct in the language, and the AST should closely -model the language. In Kaleidoscope, we have expressions, a prototype, and a -function object. We'll start with expressions first:</p> - -<div class="doc_code"> -<pre> -(* expr - Base type for all expression nodes. *) -type expr = - (* variant for numeric literals like "1.0". *) - | Number of float -</pre> -</div> - -<p>The code above shows the definition of the base <tt>expr</tt> type and one -variant which we use for numeric literals. The important thing to note about -this code is that the Number variant captures the numeric value of the -literal as its constructor argument. This allows later phases of the compiler to -know what the stored numeric value is.</p> - -<p>Right now we only create the AST, so there are no useful functions that operate on -them. It would be very easy to add a function to pretty-print the code, -for example. Here are the other expression AST node definitions that we'll use -in the basic form of the Kaleidoscope language: -</p> - -<div class="doc_code"> -<pre> - (* variant for referencing a variable, like "a". *) - | Variable of string - - (* variant for a binary operator. *) - | Binary of char * expr * expr - - (* variant for function calls. *) - | Call of string * expr array -</pre> -</div> - -<p>This is all (intentionally) rather straightforward: variables capture the -variable name, binary operators capture their opcode (e.g. '+'), and calls -capture a function name as well as an array of any argument expressions. One thing -that is nice about our AST is that it captures the language features without -talking about the syntax of the language. Note that there is no discussion about -precedence of binary operators, lexical structure, etc.</p> - -<p>For our basic language, these are all of the expression nodes we'll define. -Because it doesn't have conditional control flow, it isn't Turing-complete; -we'll fix that in a later installment.
The two things we need next are a way -to talk about the interface to a function, and a way to talk about functions -themselves:</p> - -<div class="doc_code"> -<pre> -(* proto - This type represents the "prototype" for a function, which captures - * its name, and its argument names (thus implicitly the number of arguments the - * function takes). *) -type proto = Prototype of string * string array - -(* func - This type represents a function definition itself. *) -type func = Function of proto * expr -</pre> -</div> - -<p>In Kaleidoscope, functions are typed with just a count of their arguments. -Since all values are double precision floating point, the type of each argument -doesn't need to be stored anywhere. In a more aggressive and realistic -language, the "expr" variants would probably have a type field.</p> - -<p>With this scaffolding, we can now talk about parsing expressions and function -bodies in Kaleidoscope.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="parserbasics">Parser Basics</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Now that we have an AST to build, we need to define the parser code to build -it. The idea here is that we want to parse something like "x+y" (which is -returned as three tokens by the lexer) into an AST that could be generated with -calls like this:</p> - -<div class="doc_code"> -<pre> - let x = Variable "x" in - let y = Variable "y" in - let result = Binary ('+', x, y) in - ... -</pre> -</div> - -<p> -The error handling routines make use of the builtin <tt>Stream.Failure</tt> and -<tt>Stream.Error</tt>s. <tt>Stream.Failure</tt> is raised when the parser is -unable to find any matching token in the first position of a pattern. -<tt>Stream.Error</tt> is raised when the first token matches, but the rest do -not. 
The error recovery in our parser will not be the best and is not -particularly user-friendly, but it will be enough for our tutorial. These -exceptions make it easier to handle errors in routines that have various return -types.</p> - -<p>With these basic types and exceptions, we can implement the first -piece of our grammar: numeric literals.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="parserprimexprs">Basic Expression - Parsing</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>We start with numeric literals, because they are the simplest to process. -For each production in our grammar, we'll define a function which parses that -production. We call this class of expressions "primary" expressions, for -reasons that will become clearer <a href="OCamlLangImpl6.html#unary"> -later in the tutorial</a>. In order to parse an arbitrary primary expression, -we need to determine what sort of expression it is. For numeric literals, we -have:</p> - -<div class="doc_code"> -<pre> -(* primary - * ::= identifier - * ::= numberexpr - * ::= parenexpr *) -let rec parse_primary = parser - (* numberexpr ::= number *) - | [< 'Token.Number n >] -> Ast.Number n -</pre> -</div> - -<p>This routine is very simple: it expects to be called when the current token -is a <tt>Token.Number</tt> token. It takes the current number value, creates -an <tt>Ast.Number</tt> node, advances the lexer to the next token, and finally -returns.</p> - -<p>There are some interesting aspects to this. The most important one is that -this routine eats all of the tokens that correspond to the production and -returns the lexer buffer with the next token (which is not part of the grammar -production) ready to go. This is a fairly standard way to go for recursive -descent parsers.
For a better example, the parenthesis operator is defined like -this:</p> - -<div class="doc_code"> -<pre> - (* parenexpr ::= '(' expression ')' *) - | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e -</pre> -</div> - -<p>This function illustrates a number of interesting things about the -parser:</p> - -<p> -1) It shows how we use the <tt>Stream.Error</tt> exception. When called, this -function expects that the current token is a '(' token, but after parsing the -subexpression, it is possible that there is no ')' waiting. For example, if -the user types in "(4 x" instead of "(4)", the parser should emit an error. -Because errors can occur, the parser needs a way to indicate that they -happened. In our parser, we use the camlp4 shortcut syntax <tt>token ?? "parse -error"</tt>, where if the token before the <tt>??</tt> does not match, then -<tt>Stream.Error "parse error"</tt> will be raised.</p> - -<p>2) Another interesting aspect of this function is that it uses recursion by -calling <tt>Parser.parse_expr</tt> (we will soon see that -<tt>Parser.parse_expr</tt> can call <tt>Parser.parse_primary</tt>). This is -powerful because it allows us to handle recursive grammars, and keeps each -production very simple. Note that parentheses do not cause construction of AST -nodes themselves. While we could do it this way, the most important role of -parentheses is to guide the parser and provide grouping.
Once the parser -constructs the AST, parentheses are not needed.</p> - -<p>The next simple production is for handling variable references and function -calls:</p> - -<div class="doc_code"> -<pre> - (* identifierexpr - * ::= identifier - * ::= identifier '(' argumentexpr ')' *) - | [< 'Token.Ident id; stream >] -> - let rec parse_args accumulator = parser - | [< e=parse_expr; stream >] -> - begin parser - | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e - | [< >] -> e :: accumulator - end stream - | [< >] -> accumulator - in - let rec parse_ident id = parser - (* Call. *) - | [< 'Token.Kwd '('; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')'">] -> - Ast.Call (id, Array.of_list (List.rev args)) - - (* Simple variable ref. *) - | [< >] -> Ast.Variable id - in - parse_ident id stream -</pre> -</div> - -<p>This routine follows the same style as the other routines. (It expects to be -called if the current token is a <tt>Token.Ident</tt> token). It also has -recursion and error handling. One interesting aspect of this is that it uses -<em>look-ahead</em> to determine if the current identifier is a stand alone -variable reference or if it is a function call expression. It handles this by -checking to see if the token after the identifier is a '(' token, constructing -either a <tt>Ast.Variable</tt> or <tt>Ast.Call</tt> node as appropriate. -</p> - -<p>We finish up by raising an exception if we received a token we didn't -expect:</p> - -<div class="doc_code"> -<pre> - | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") -</pre> -</div> - -<p>Now that basic expressions are handled, we need to handle binary expressions. 
-They are a bit more complex.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="parserbinops">Binary Expression - Parsing</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Binary expressions are significantly harder to parse because they are often -ambiguous. For example, when given the string "x+y*z", the parser can choose -to parse it as either "(x+y)*z" or "x+(y*z)". With common definitions from -mathematics, we expect the latter parse, because "*" (multiplication) has -higher <em>precedence</em> than "+" (addition).</p> - -<p>There are many ways to handle this, but an elegant and efficient way is to -use <a href= -"http://en.wikipedia.org/wiki/Operator-precedence_parser">Operator-Precedence -Parsing</a>. This parsing technique uses the precedence of binary operators to -guide recursion. To start with, we need a table of precedences:</p> - -<div class="doc_code"> -<pre> -(* binop_precedence - This holds the precedence for each binary operator that is - * defined *) -let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 - -(* precedence - Get the precedence of the pending binary operator token. *) -let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 - -... - -let main () = - (* Install standard binary operators. - * 1 is the lowest precedence. *) - Hashtbl.add Parser.binop_precedence '<' 10; - Hashtbl.add Parser.binop_precedence '+' 20; - Hashtbl.add Parser.binop_precedence '-' 20; - Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) - ... -</pre> -</div> - -<p>For the basic form of Kaleidoscope, we will only support 4 binary operators -(this can obviously be extended by you, our brave and intrepid reader). The -<tt>Parser.precedence</tt> function returns the precedence for the current -token, or -1 if the token is not a binary operator.
Having a <tt>Hashtbl.t</tt> -makes it easy to add new operators and makes it clear that the algorithm doesn't -depend on the specific operators involved, but it would be easy enough to -eliminate the <tt>Hashtbl.t</tt> and do the comparisons in the -<tt>Parser.precedence</tt> function. (Or just use a fixed-size array.)</p> - -<p>With the helper above defined, we can now start parsing binary expressions. -The basic idea of operator precedence parsing is to break down an expression -with potentially ambiguous binary operators into pieces. Consider, for example, -the expression "a+b+(c+d)*e*f+g". Operator precedence parsing considers this -as a stream of primary expressions separated by binary operators. As such, -it will first parse the leading primary expression "a", then it will see the -pairs [+, b] [+, (c+d)] [*, e] [*, f] and [+, g]. Note that because parentheses -are primary expressions, the binary expression parser doesn't need to worry -about nested subexpressions like (c+d) at all. -</p> - -<p> -To start, an expression is a primary expression potentially followed by a -sequence of [binop,primaryexpr] pairs:</p> - -<div class="doc_code"> -<pre> -(* expression - * ::= primary binoprhs *) -and parse_expr = parser - | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream -</pre> -</div> - -<p><tt>Parser.parse_bin_rhs</tt> is the function that parses the sequence of -pairs for us. It takes a precedence and the expression for the part -that has been parsed so far. Note that "x" is a perfectly valid expression; as -such, "binoprhs" is allowed to be empty, in which case it returns the expression -that is passed into it. In our example above, the code passes the expression for -"a" into <tt>Parser.parse_bin_rhs</tt> and the current token is "+".</p> - -<p>The precedence value passed into <tt>Parser.parse_bin_rhs</tt> indicates the -<em>minimal operator precedence</em> that the function is allowed to eat.
For -example, if the current pair stream is [+, x] and <tt>Parser.parse_bin_rhs</tt> -is passed in a precedence of 40, it will not consume any tokens (because the -precedence of '+' is only 20). With this in mind, <tt>Parser.parse_bin_rhs</tt> -starts with:</p> - -<div class="doc_code"> -<pre> -(* binoprhs - * ::= ('+' primary)* *) -and parse_bin_rhs expr_prec lhs stream = - match Stream.peek stream with - (* If this is a binop, find its precedence. *) - | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> - let token_prec = precedence c in - - (* If this is a binop that binds at least as tightly as the current binop, - * consume it, otherwise we are done. *) - if token_prec < expr_prec then lhs else begin -</pre> -</div> - -<p>This code gets the precedence of the current token and checks to see if it is -too low. Tokens that are not in <tt>binop_precedence</tt> fail the <tt>when</tt> -guard (and <tt>Parser.precedence</tt> would give them -1 anyway), so the pair-stream -ends when the token stream runs out of binary operators. If the precedence is high -enough, we know that the token is a binary operator and that it will be included -in this expression:</p> - -<div class="doc_code"> -<pre> - (* Eat the binop. *) - Stream.junk stream; - - (* Parse the primary expression after the binary operator. *) - let rhs = parse_primary stream in - - (* Okay, we know this is a binop. *) - let rhs = - match Stream.peek stream with - | Some (Token.Kwd c2) -> -</pre> -</div> - -<p>As such, this code eats (and remembers) the binary operator and then parses -the primary expression that follows. This builds up the whole pair, the first of -which is [+, b] for the running example.</p> - -<p>Now that we have parsed the left-hand side of an expression and one pair of the -RHS sequence, we have to decide which way the expression associates. In -particular, we could have "(a+b) binop unparsed" or "a + (b binop unparsed)".
-To determine this, we look ahead at "binop" to determine its precedence and -compare it to BinOp's precedence (which is '+' in this case):</p> - -<div class="doc_code"> -<pre> - (* If BinOp binds less tightly with rhs than the operator after - * rhs, let the pending operator take rhs as its lhs. *) - let next_prec = precedence c2 in - if token_prec < next_prec -</pre> -</div> - -<p>If the precedence of the binop to the right of "RHS" is lower than or equal to the -precedence of our current operator, then we know that the parentheses associate -as "(a+b) binop ...". In our example, the current operator is "+" and the next -operator is "+", so we know that they have the same precedence. In this case we'll -create the AST node for "a+b", and then continue parsing:</p> - -<div class="doc_code"> -<pre> - ... if body omitted ... - in - - (* Merge lhs/rhs. *) - let lhs = Ast.Binary (c, lhs, rhs) in - parse_bin_rhs expr_prec lhs stream - end -</pre> -</div> - -<p>In our example above, this will turn "a+b+" into "(a+b)" and make the next -recursive call to <tt>Parser.parse_bin_rhs</tt>, with "+" as the current token. The code above will eat, -remember, and parse "(c+d)" as the primary expression, which makes the -current pair equal to [+, (c+d)]. It will then evaluate the 'if' conditional above with -"*" as the binop to the right of the primary. In this case, the precedence of "*" is -higher than the precedence of "+" so the if condition will be entered.</p> - -<p>The critical question left here is "how can the if condition parse the right-hand -side in full?" In particular, to build the AST correctly for our example, -it needs to get all of "(c+d)*e*f" as the RHS expression variable. The code to -do this is surprisingly simple (code from the above two blocks duplicated for -context):</p> - -<div class="doc_code"> -<pre> - match Stream.peek stream with - | Some (Token.Kwd c2) -> - (* If BinOp binds less tightly with rhs than the operator after - * rhs, let the pending operator take rhs as its lhs.
*) - if token_prec < precedence c2 - then <b>parse_bin_rhs (token_prec + 1) rhs stream</b> - else rhs - | _ -> rhs - in - - (* Merge lhs/rhs. *) - let lhs = Ast.Binary (c, lhs, rhs) in - parse_bin_rhs expr_prec lhs stream - end -</pre> -</div> - -<p>At this point, we know that the binary operator to the RHS of our primary -has higher precedence than the binop we are currently parsing. As such, we know -that any sequence of pairs whose operators are all higher precedence than "+" -should be parsed together and returned as "RHS". To do this, we recursively -invoke the <tt>Parser.parse_bin_rhs</tt> function specifying "token_prec+1" as -the minimum precedence required for it to continue. In our example above, this -will cause it to return the AST node for "(c+d)*e*f" as RHS, which is then set -as the RHS of the '+' expression.</p> - -<p>Finally, on the next recursive call to <tt>Parser.parse_bin_rhs</tt>, the "+g" piece is parsed -and added to the AST. With this little bit of code (14 non-trivial lines), we -correctly handle fully general binary expression parsing in a very elegant way. -This was a whirlwind tour of this code, and it is somewhat subtle. I recommend -running through it with a few tough examples to see how it works. -</p> - -<p>This wraps up handling of expressions. At this point, we can point the -parser at an arbitrary token stream and build an expression from it, stopping -at the first token that is not part of the expression. Next up we need to -handle function definitions, etc.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="parsertop">Parsing the Rest</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p> -The next thing missing is handling of function prototypes. In Kaleidoscope, -these are used both for 'extern' function declarations as well as function body -definitions.
The code to do this is straight-forward and not very interesting -(once you've survived expressions): -</p> - -<div class="doc_code"> -<pre> -(* prototype - * ::= id '(' id* ')' *) -let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - (* success. *) - Ast.Prototype (id, Array.of_list (List.rev args)) - - | [< >] -> - raise (Stream.Error "expected function name in prototype") -</pre> -</div> - -<p>Given this, a function definition is very simple, just a prototype plus -an expression to implement the body:</p> - -<div class="doc_code"> -<pre> -(* definition ::= 'def' prototype expression *) -let parse_definition = parser - | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> - Ast.Function (p, e) -</pre> -</div> - -<p>In addition, we support 'extern' to declare functions like 'sin' and 'cos' as -well as to support forward declaration of user functions. These 'extern's are just -prototypes with no body:</p> - -<div class="doc_code"> -<pre> -(* external ::= 'extern' prototype *) -let parse_extern = parser - | [< 'Token.Extern; e=parse_prototype >] -> e -</pre> -</div> - -<p>Finally, we'll also let the user type in arbitrary top-level expressions and -evaluate them on the fly. We will handle this by defining anonymous nullary -(zero argument) functions for them:</p> - -<div class="doc_code"> -<pre> -(* toplevelexpr ::= expression *) -let parse_toplevel = parser - | [< e=parse_expr >] -> - (* Make an anonymous proto. 
*) - Ast.Function (Ast.Prototype ("", [||]), e) -</pre> -</div> - -<p>Now that we have all the pieces, let's build a little driver that will let us -actually <em>execute</em> this code we've built!</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="driver">The Driver</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>The driver for this simply invokes all of the parsing pieces with a top-level -dispatch loop. There isn't much interesting here, so I'll just include the -top-level loop. See <a href="#code">below</a> for full code in the "Top-Level -Parsing" section.</p> - -<div class="doc_code"> -<pre> -(* top ::= definition | external | expression | ';' *) -let rec main_loop stream = - match Stream.peek stream with - | None -> () - - (* ignore top-level semicolons. *) - | Some (Token.Kwd ';') -> - Stream.junk stream; - main_loop stream - - | Some token -> - begin - try match token with - | Token.Def -> - ignore(Parser.parse_definition stream); - print_endline "parsed a function definition."; - | Token.Extern -> - ignore(Parser.parse_extern stream); - print_endline "parsed an extern."; - | _ -> - (* Evaluate a top-level expression into an anonymous function. *) - ignore(Parser.parse_toplevel stream); - print_endline "parsed a top-level expr"; - with Stream.Error s -> - (* Skip token for error recovery. *) - Stream.junk stream; - print_endline s; - end; - print_string "ready> "; flush stdout; - main_loop stream -</pre> -</div> - -<p>The most interesting part of this is that we ignore top-level semicolons. -Why is this, you ask? The basic reason is that if you type "4 + 5" at the -command line, the parser doesn't know whether that is the end of what you will type -or not. For example, on the next line you could type "def foo..." in which case -4+5 is the end of a top-level expression. 
Alternatively you could type "* 6", -which would continue the expression. Having top-level semicolons allows you to -type "4+5;", and the parser will know you are done.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="conclusions">Conclusions</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>With just under 300 lines of commented code (240 lines of non-comment, -non-blank code), we fully defined our minimal language, including a lexer, -parser, and AST builder. With this done, the executable will validate -Kaleidoscope code and tell us if it is grammatically invalid. For -example, here is a sample interaction:</p> - -<div class="doc_code"> -<pre> -$ <b>./toy.byte</b> -ready> <b>def foo(x y) x+foo(y, 4.0);</b> -Parsed a function definition. -ready> <b>def foo(x y) x+y y;</b> -Parsed a function definition. -Parsed a top-level expr -ready> <b>def foo(x y) x+y );</b> -Parsed a function definition. -Error: unknown token when expecting an expression -ready> <b>extern sin(a);</b> -ready> Parsed an extern -ready> <b>^D</b> -$ -</pre> -</div> - -<p>There is a lot of room for extension here. You can define new AST nodes, -extend the language in many ways, etc. In the <a href="OCamlLangImpl3.html"> -next installment</a>, we will describe how to generate LLVM Intermediate -Representation (IR) from the AST.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="code">Full Code Listing</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p> -Here is the complete code listing for this and the previous chapter. -Note that it is fully self-contained: you don't need LLVM or any external -libraries at all for this. (Besides the ocaml standard libraries, of -course.) 
To build this, just compile with:</p> - -<div class="doc_code"> -<pre> -# Compile -ocamlbuild toy.byte -# Run -./toy.byte -</pre> -</div> - -<p>Here is the code:</p> - -<dl> -<dt>_tags:</dt> -<dd class="doc_code"> -<pre> -<{lexer,parser}.ml>: use_camlp4, pp(camlp4of) -</pre> -</dd> - -<dt>token.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Lexer Tokens - *===----------------------------------------------------------------------===*) - -(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of - * these others for known things. *) -type token = - (* commands *) - | Def | Extern - - (* primary *) - | Ident of string | Number of float - - (* unknown *) - | Kwd of char -</pre> -</dd> - -<dt>lexer.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Lexer - *===----------------------------------------------------------------------===*) - -let rec lex = parser - (* Skip any whitespace. *) - | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream - - (* identifier: [a-zA-Z][a-zA-Z0-9] *) - | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_ident buffer stream - - (* number: [0-9.]+ *) - | [< ' ('0' .. '9' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_number buffer stream - - (* Comment until end of line. *) - | [< ' ('#'); stream >] -> - lex_comment stream - - (* Otherwise, just return the character as its ascii value. *) - | [< 'c; stream >] -> - [< 'Token.Kwd c; lex stream >] - - (* end of stream. *) - | [< >] -> [< >] - -and lex_number buffer = parser - | [< ' ('0' .. '9' | '.' 
as c); stream >] -> - Buffer.add_char buffer c; - lex_number buffer stream - | [< stream=lex >] -> - [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] - -and lex_ident buffer = parser - | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] -> - Buffer.add_char buffer c; - lex_ident buffer stream - | [< stream=lex >] -> - match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | id -> [< 'Token.Ident id; stream >] - -and lex_comment = parser - | [< ' ('\n'); stream=lex >] -> stream - | [< 'c; e=lex_comment >] -> e - | [< >] -> [< >] -</pre> -</dd> - -<dt>ast.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Abstract Syntax Tree (aka Parse Tree) - *===----------------------------------------------------------------------===*) - -(* expr - Base type for all expression nodes. *) -type expr = - (* variant for numeric literals like "1.0". *) - | Number of float - - (* variant for referencing a variable, like "a". *) - | Variable of string - - (* variant for a binary operator. *) - | Binary of char * expr * expr - - (* variant for function calls. *) - | Call of string * expr array - -(* proto - This type represents the "prototype" for a function, which captures - * its name, and its argument names (thus implicitly the number of arguments the - * function takes). *) -type proto = Prototype of string * string array - -(* func - This type represents a function definition itself. 
*) -type func = Function of proto * expr -</pre> -</dd> - -<dt>parser.ml:</dt> -<dd class="doc_code"> -<pre> -(*===---------------------------------------------------------------------=== - * Parser - *===---------------------------------------------------------------------===*) - -(* binop_precedence - This holds the precedence for each binary operator that is - * defined *) -let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 - -(* precedence - Get the precedence of the pending binary operator token. *) -let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 - -(* primary - * ::= identifier - * ::= numberexpr - * ::= parenexpr *) -let rec parse_primary = parser - (* numberexpr ::= number *) - | [< 'Token.Number n >] -> Ast.Number n - - (* parenexpr ::= '(' expression ')' *) - | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e - - (* identifierexpr - * ::= identifier - * ::= identifier '(' argumentexpr ')' *) - | [< 'Token.Ident id; stream >] -> - let rec parse_args accumulator = parser - | [< e=parse_expr; stream >] -> - begin parser - | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e - | [< >] -> e :: accumulator - end stream - | [< >] -> accumulator - in - let rec parse_ident id = parser - (* Call. *) - | [< 'Token.Kwd '('; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')'">] -> - Ast.Call (id, Array.of_list (List.rev args)) - - (* Simple variable ref. *) - | [< >] -> Ast.Variable id - in - parse_ident id stream - - | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") - -(* binoprhs - * ::= ('+' primary)* *) -and parse_bin_rhs expr_prec lhs stream = - match Stream.peek stream with - (* If this is a binop, find its precedence. *) - | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> - let token_prec = precedence c in - - (* If this is a binop that binds at least as tightly as the current binop, - * consume it, otherwise we are done. 
*) - if token_prec < expr_prec then lhs else begin - (* Eat the binop. *) - Stream.junk stream; - - (* Parse the primary expression after the binary operator. *) - let rhs = parse_primary stream in - - (* Okay, we know this is a binop. *) - let rhs = - match Stream.peek stream with - | Some (Token.Kwd c2) -> - (* If BinOp binds less tightly with rhs than the operator after - * rhs, let the pending operator take rhs as its lhs. *) - let next_prec = precedence c2 in - if token_prec < next_prec - then parse_bin_rhs (token_prec + 1) rhs stream - else rhs - | _ -> rhs - in - - (* Merge lhs/rhs. *) - let lhs = Ast.Binary (c, lhs, rhs) in - parse_bin_rhs expr_prec lhs stream - end - | _ -> lhs - -(* expression - * ::= primary binoprhs *) -and parse_expr = parser - | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream - -(* prototype - * ::= id '(' id* ')' *) -let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - (* success. *) - Ast.Prototype (id, Array.of_list (List.rev args)) - - | [< >] -> - raise (Stream.Error "expected function name in prototype") - -(* definition ::= 'def' prototype expression *) -let parse_definition = parser - | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> - Ast.Function (p, e) - -(* toplevelexpr ::= expression *) -let parse_toplevel = parser - | [< e=parse_expr >] -> - (* Make an anonymous proto. 
*) - Ast.Function (Ast.Prototype ("", [||]), e) - -(* external ::= 'extern' prototype *) -let parse_extern = parser - | [< 'Token.Extern; e=parse_prototype >] -> e -</pre> -</dd> - -<dt>toplevel.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Top-Level parsing and JIT Driver - *===----------------------------------------------------------------------===*) - -(* top ::= definition | external | expression | ';' *) -let rec main_loop stream = - match Stream.peek stream with - | None -> () - - (* ignore top-level semicolons. *) - | Some (Token.Kwd ';') -> - Stream.junk stream; - main_loop stream - - | Some token -> - begin - try match token with - | Token.Def -> - ignore(Parser.parse_definition stream); - print_endline "parsed a function definition."; - | Token.Extern -> - ignore(Parser.parse_extern stream); - print_endline "parsed an extern."; - | _ -> - (* Evaluate a top-level expression into an anonymous function. *) - ignore(Parser.parse_toplevel stream); - print_endline "parsed a top-level expr"; - with Stream.Error s -> - (* Skip token for error recovery. *) - Stream.junk stream; - print_endline s; - end; - print_string "ready> "; flush stdout; - main_loop stream -</pre> -</dd> - -<dt>toy.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Main driver code. - *===----------------------------------------------------------------------===*) - -let main () = - (* Install standard binary operators. - * 1 is the lowest precedence. *) - Hashtbl.add Parser.binop_precedence '<' 10; - Hashtbl.add Parser.binop_precedence '+' 20; - Hashtbl.add Parser.binop_precedence '-' 20; - Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) - - (* Prime the first token. *) - print_string "ready> "; flush stdout; - let stream = Lexer.lex (Stream.of_channel stdin) in - - (* Run the main "interpreter loop" now. 
*) - Toplevel.main_loop stream; -;; - -main () -</pre> -</dd> -</dl> - -<a href="OCamlLangImpl3.html">Next: Implementing Code Generation to LLVM IR</a> -</div> - -<!-- *********************************************************************** --> -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> - - <a href="mailto:sabre@nondot.org">Chris Lattner</a> - <a href="mailto:erickt@users.sourceforge.net">Erick Tryzelaar</a><br> - <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/tutorial/OCamlLangImpl3.html b/docs/tutorial/OCamlLangImpl3.html deleted file mode 100644 index febd7f5..0000000 --- a/docs/tutorial/OCamlLangImpl3.html +++ /dev/null @@ -1,1093 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> - -<html> -<head> - <title>Kaleidoscope: Implementing code generation to LLVM IR</title> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <meta name="author" content="Chris Lattner"> - <meta name="author" content="Erick Tryzelaar"> - <link rel="stylesheet" href="../llvm.css" type="text/css"> -</head> - -<body> - -<div class="doc_title">Kaleidoscope: Code generation to LLVM IR</div> - -<ul> -<li><a href="index.html">Up to Tutorial Index</a></li> -<li>Chapter 3 - <ol> - <li><a href="#intro">Chapter 3 Introduction</a></li> - <li><a href="#basics">Code Generation Setup</a></li> - <li><a href="#exprs">Expression Code Generation</a></li> - <li><a href="#funcs">Function Code Generation</a></li> - <li><a href="#driver">Driver Changes and Closing Thoughts</a></li> - <li><a href="#code">Full Code Listing</a></li> - </ol> -</li> -<li><a href="OCamlLangImpl4.html">Chapter 4</a>: Adding JIT and 
Optimizer -Support</li> -</ul> - -<div class="doc_author"> - <p> - Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a> - and <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a> - </p> -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="intro">Chapter 3 Introduction</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Welcome to Chapter 3 of the "<a href="index.html">Implementing a language -with LLVM</a>" tutorial. This chapter shows you how to transform the <a -href="OCamlLangImpl2.html">Abstract Syntax Tree</a>, built in Chapter 2, into -LLVM IR. This will teach you a little bit about how LLVM does things, as well -as demonstrate how easy it is to use. It's much more work to build a lexer and -parser than it is to generate LLVM IR code. :) -</p> - -<p><b>Please note</b>: the code in this chapter and later requires LLVM 2.3 or -LLVM SVN to work. LLVM 2.2 and before will not work with it.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="basics">Code Generation Setup</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p> -In order to generate LLVM IR, we want some simple setup to get started. First -we define a code generation (codegen) function that pattern-matches on each kind of AST node:</p> - -<div class="doc_code"> -<pre> -let rec codegen_expr = function - | Ast.Number n -> ... - | Ast.Variable name -> ... -</pre> -</div> - -<p>The <tt>Codegen.codegen_expr</tt> function says to emit IR for that AST node -along with all the things it depends on, and it returns an LLVM Value -object.
"Value" is the class used to represent a "<a -href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single -Assignment (SSA)</a> register" or "SSA value" in LLVM. The most distinctive aspect -of SSA values is that their value is computed as the related instruction -executes, and it does not get a new value until (and if) the instruction -re-executes. In other words, there is no way to "change" an SSA value. For -more information, please read up on <a -href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single -Assignment</a>; the concepts are really quite natural once you grok them.</p> - -<p>The -second thing we want is an "Error" exception like we used for the parser, which -will be used to report errors found during code generation (for example, use of -an undeclared parameter):</p> - -<div class="doc_code"> -<pre> -exception Error of string - -let context = global_context () -let the_module = create_module context "my cool jit" -let builder = builder context -let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 -let double_type = double_type context -</pre> -</div> - -<p>These module-level values will be used during code generation. -<tt>Codegen.the_module</tt> is the LLVM construct that contains all of the -functions and global variables in a chunk of code. In many ways, it is the -top-level structure that the LLVM IR uses to contain code.</p> - -<p>The <tt>Codegen.builder</tt> object is a helper object that makes it easy to -generate LLVM instructions. Instances of the <a -href="http://llvm.org/doxygen/IRBuilder_8h-source.html"><tt>IRBuilder</tt></a> -class keep track of the current place to insert instructions and have methods to -create new instructions.</p> - -<p>The <tt>Codegen.named_values</tt> map keeps track of which values are defined -in the current scope and what their LLVM representation is.
In this form of Kaleidoscope, the only things -that can be referenced are function parameters. As such, function parameters -will be in this map when generating code for their function body.</p> - -<p> -With these basics in place, we can start talking about how to generate code for -each expression. Note that this assumes that the <tt>Codegen.builder</tt> has -been set up to generate code <em>into</em> something. For now, we'll assume -that this has already been done, and we'll just use it to emit code.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="exprs">Expression Code Generation</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Generating LLVM code for expression nodes is very straightforward: fewer -than 30 lines of commented code for all four of our expression nodes. First -we'll do numeric literals:</p> - -<div class="doc_code"> -<pre> - | Ast.Number n -> const_float double_type n -</pre> -</div> - -<p>In the LLVM IR, numeric constants are represented with the -<tt>ConstantFP</tt> class, which holds the numeric value in an <tt>APFloat</tt> -internally (<tt>APFloat</tt> has the capability of holding floating point -constants of <em>A</em>rbitrary <em>P</em>recision). This code basically just -creates and returns a <tt>ConstantFP</tt>. Note that in the LLVM IR, -constants are all uniqued together and shared. For this reason, the underlying C++ API -uses the "foo::get(..)" idiom instead of "new foo(..)" or "foo::Create(..)".</p> - -<div class="doc_code"> -<pre> - | Ast.Variable name -> - (try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name")) -</pre> -</div> - -<p>References to variables are also quite simple using LLVM. In the simple -version of Kaleidoscope, we assume that the variable has already been emitted -somewhere and its value is available.
In practice, the only values that can be -in the <tt>Codegen.named_values</tt> map are function arguments. This code -simply checks to see that the specified name is in the map (if not, an unknown -variable is being referenced) and returns the value for it. In future chapters, -we'll add support for <a href="LangImpl5.html#for">loop induction variables</a> -in the symbol table, and for <a href="LangImpl7.html#localvars">local -variables</a>.</p> - -<div class="doc_code"> -<pre> - | Ast.Binary (op, lhs, rhs) -> - let lhs_val = codegen_expr lhs in - let rhs_val = codegen_expr rhs in - begin - match op with - | '+' -> build_add lhs_val rhs_val "addtmp" builder - | '-' -> build_sub lhs_val rhs_val "subtmp" builder - | '*' -> build_mul lhs_val rhs_val "multmp" builder - | '<' -> - (* Convert bool 0/1 to double 0.0 or 1.0 *) - let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in - build_uitofp i double_type "booltmp" builder - | _ -> raise (Error "invalid binary operator") - end -</pre> -</div> - -<p>Binary operators start to get more interesting. The basic idea here is that -we recursively emit code for the left-hand side of the expression, then the -right-hand side, then we compute the result of the binary expression. In this -code, we match on the operator character to create the right LLVM instruction. -</p> - -<p>In the example above, the LLVM builder is starting to show its value. -IRBuilder knows where to insert the newly created instruction; all you have to -do is specify what instruction to create (e.g. with <tt>Llvm.build_add</tt>), -which operands to use (<tt>lhs_val</tt> and <tt>rhs_val</tt> here) and optionally -provide a name for the generated instruction.</p> - -<p>One nice thing about LLVM is that the name is just a hint. For instance, if -the code above emits multiple "addtmp" variables, LLVM will automatically -provide each one with an increasing, unique numeric suffix.
Local value names -for instructions are purely optional, but they make it much easier to read the -IR dumps.</p> - -<p><a href="../LangRef.html#instref">LLVM instructions</a> are constrained by -strict rules: for example, the left and right operands of -an <a href="../LangRef.html#i_add">add instruction</a> must have the same -type, and the result type of the add must match the operand types. Because -all values in Kaleidoscope are doubles, this makes for very simple code for add, -sub and mul.</p> - -<p>On the other hand, LLVM specifies that the <a -href="../LangRef.html#i_fcmp">fcmp instruction</a> always returns an 'i1' value -(a one-bit integer). The problem is that Kaleidoscope wants the value to be 0.0 or 1.0. In order to get these semantics, we combine the fcmp instruction with -a <a href="../LangRef.html#i_uitofp">uitofp instruction</a>. This instruction -converts its input integer into a floating point value by treating the input -as an unsigned value. In contrast, if we used the <a -href="../LangRef.html#i_sitofp">sitofp instruction</a>, the Kaleidoscope '<' -operator would return 0.0 and -1.0, depending on the input value.</p> - -<div class="doc_code"> -<pre> - | Ast.Call (callee, args) -> - (* Look up the name in the module table. *) - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown function referenced") - in - let params = params callee in - - (* If argument mismatch error. *) - if Array.length params == Array.length args then () else - raise (Error "incorrect # arguments passed"); - let args = Array.map codegen_expr args in - build_call callee args "calltmp" builder -</pre> -</div> - -<p>Code generation for function calls is quite straightforward with LLVM. The -code above initially does a function name lookup in the LLVM Module's symbol -table. Recall that the LLVM Module is the container that holds all of the -functions we are JIT'ing.
By giving each function the same name the -user specifies, we can use the LLVM symbol table to resolve function names for -us.</p> - -<p>Once we have the function to call, we recursively codegen each argument that -is to be passed in, and create an LLVM <a href="../LangRef.html#i_call">call -instruction</a>. Note that LLVM uses the native C calling conventions by -default, allowing these calls to also call into standard library functions like -"sin" and "cos", with no additional effort.</p> - -<p>This wraps up our handling of the four basic expressions that we have so far -in Kaleidoscope. Feel free to go in and add some more. For example, by -browsing the <a href="../LangRef.html">LLVM language reference</a> you'll find -several other interesting instructions that are really easy to plug into our -basic framework.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="funcs">Function Code Generation</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Code generation for prototypes and functions must handle a number of -details, which make their code less beautiful than expression code -generation, but allow us to illustrate some important points. First, let's -talk about code generation for prototypes: they are used both for function -bodies and external function declarations. The code starts with:</p> - -<div class="doc_code"> -<pre> -let codegen_proto = function - | Ast.Prototype (name, args) -> - (* Make the function type: double(double,double) etc. *) - let doubles = Array.make (Array.length args) double_type in - let ft = function_type double_type doubles in - let f = - match lookup_function name the_module with -</pre> -</div> - -<p>This code packs a lot of power into a few lines.
Note first that this -function returns a "Function*" instead of a "Value*" (although at the moment -both are modeled by <tt>llvalue</tt> in OCaml). Because a "prototype" -really talks about the external interface for a function (not the value computed -by an expression), it makes sense for it to return the LLVM Function it -corresponds to when codegen'd.</p> - -<p>The call to <tt>Llvm.function_type</tt> creates the <tt>Llvm.lltype</tt> -that should be used for a given Prototype. Since all function arguments in -Kaleidoscope are of type double, the first line creates an array of "N" LLVM -double types. It then uses the <tt>Llvm.function_type</tt> function to create a -function type that takes "N" doubles as arguments, returns one double as a -result, and that is not vararg (vararg function types are created with -<tt>Llvm.var_arg_function_type</tt> instead). Note that types in LLVM are uniqued just -like <tt>Constant</tt>s are, so you don't "new" a type, you "get" it.</p> - -<p>The final line above checks whether a function with this name already exists in -<tt>Codegen.the_module</tt>. If not, we will create it.</p> - -<div class="doc_code"> -<pre> - | None -> declare_function name ft the_module -</pre> -</div> - -<p>This indicates the type and name to use, as well as which module to insert -into. By default we assume a function has -<tt>Llvm.Linkage.External</tt> linkage. "<a href="../LangRef.html#linkage">External -linkage</a>" means that the function may be defined outside the current module -and/or that it is callable by functions outside the module.
The "<tt>name</tt>" -passed in is the name the user specified: this name is registered in -<tt>Codegen.the_module</tt>'s symbol table, which is used by the function call -code above.</p> - -<p>In Kaleidoscope, I choose to allow redefinitions of functions in two cases: -first, we want to allow 'extern'ing a function more than once, as long as the -prototypes for the externs match (since all arguments have the same type, we -just have to check that the number of arguments matches). Second, we want to -allow 'extern'ing a function and then defining a body for it. This is useful -when defining mutually recursive functions.</p> - -<div class="doc_code"> -<pre> - (* If 'f' conflicted, there was already something named 'name'. If it - * has a body, don't allow redefinition or reextern. *) - | Some f -> - (* If 'f' already has a body, reject this. *) - if Array.length (basic_blocks f) == 0 then () else - raise (Error "redefinition of function"); - - (* If 'f' took a different number of arguments, reject. *) - if Array.length (params f) == Array.length args then () else - raise (Error "redefinition of function with different # args"); - f - in -</pre> -</div> - -<p>To implement the logic above, we first check whether the pre-existing -function is "empty". In this case, empty means that it has no basic blocks in -it, which means it has no body. If it has a body, it is a full definition; since we don't allow anything after a full definition of the -function, the code rejects this case. If it has no body, the previous reference to the function -was an 'extern' (a forward declaration), so we simply verify that the number of arguments for that -definition and this one match up. If not, we emit an error.</p> - -<div class="doc_code"> -<pre> - (* Set names for all arguments.
*) - Array.iteri (fun i a -> - let n = args.(i) in - set_value_name n a; - Hashtbl.add named_values n a; - ) (params f); - f -</pre> -</div> - -<p>The last bit of code for prototypes loops over all of the arguments in the -function, setting the name of the LLVM Argument objects to match, and registering -the arguments in the <tt>Codegen.named_values</tt> map for future use by the -<tt>Ast.Variable</tt> variant. Once this is set up, it returns the Function -object to the caller. Note that we don't check for conflicting -argument names here (e.g. "extern foo(a b a)"). Doing so would be very -straightforward with the mechanics we have already used above.</p> - -<div class="doc_code"> -<pre> -let codegen_func = function - | Ast.Function (proto, body) -> - Hashtbl.clear named_values; - let the_function = codegen_proto proto in -</pre> -</div> - -<p>Code generation for function definitions starts out simply enough: we first -clear out the <tt>Codegen.named_values</tt> map to make sure that there isn't anything in it -from the last function we compiled, and then codegen the prototype (<tt>proto</tt>), which raises an error if it is invalid. Clearing the map must come first, because -<tt>codegen_proto</tt> registers the new function's arguments in it. Code generation of the prototype ensures -that there is an LLVM Function object that is ready to go for us.</p> - -<div class="doc_code"> -<pre> - (* Create a new basic block to start insertion into. *) - let bb = append_block context "entry" the_function in - position_at_end bb builder; - - try - let ret_val = codegen_expr body in -</pre> -</div> - -<p>Now we get to the point where the <tt>Codegen.builder</tt> is set up. The -first line creates a new -<a href="http://en.wikipedia.org/wiki/Basic_block">basic block</a> (named -"entry"), which is inserted into <tt>the_function</tt>. The second line then -tells the builder that new instructions should be inserted at the end of the -new basic block. Basic blocks in LLVM are an important part of functions that -define the <a -href="http://en.wikipedia.org/wiki/Control_flow_graph">Control Flow Graph</a>.
-Since we don't have any control flow, our functions will only contain one -block at this point. We'll fix this in <a href="OCamlLangImpl5.html">Chapter -5</a> :).</p> - -<div class="doc_code"> -<pre> - let ret_val = codegen_expr body in - - (* Finish off the function. *) - let _ = build_ret ret_val builder in - - (* Validate the generated code, checking for consistency. *) - Llvm_analysis.assert_valid_function the_function; - - the_function -</pre> -</div> - -<p>Once the insertion point is set up, we call the <tt>Codegen.codegen_expr</tt> -function for the root expression of the function. If no error happens, this emits -code to compute the expression into the entry block and returns the value that -was computed. Assuming no error, we then create an LLVM <a -href="../LangRef.html#i_ret">ret instruction</a>, which completes the function. -Once the function is built, we call -<tt>Llvm_analysis.assert_valid_function</tt>, which is provided by LLVM. This -function does a variety of consistency checks on the generated code, to -determine if our compiler is doing everything right. Using this is important: -it can catch a lot of bugs. Once the function is finished and validated, we -return it.</p> - -<div class="doc_code"> -<pre> - with e -> - delete_function the_function; - raise e -</pre> -</div> - -<p>The only piece left here is handling of the error case. For simplicity, we -handle this by merely deleting the function we produced with the -<tt>Llvm.delete_function</tt> function. This allows the user to redefine a -function that they incorrectly typed in before: if we didn't delete it, it -would live in the symbol table, with a body, preventing future redefinition.</p> - -<p>This code does have a bug, though. Since <tt>Codegen.codegen_proto</tt> -can return a previously defined forward declaration, our code can actually delete -a forward declaration. There are a number of ways to fix this bug; see what you -can come up with!
Here is a testcase:</p> - -<div class="doc_code"> -<pre> -extern foo(a b); # ok, declares foo. -def foo(a b) c; # error, 'c' is invalid. -def bar() foo(1, 2); # error, unknown function "foo" -</pre> -</div> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="driver">Driver Changes and -Closing Thoughts</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p> -For now, code generation to LLVM doesn't really get us much, except that we can -look at the pretty IR. The sample code inserts calls to Codegen into the -"<tt>Toplevel.main_loop</tt>", and then dumps out the LLVM IR. This gives a -nice way to look at the LLVM IR for simple functions. For example: -</p> - -<div class="doc_code"> -<pre> -ready> <b>4+5</b>; -Read top-level expression: -define double @""() { -entry: - %addtmp = fadd double 4.000000e+00, 5.000000e+00 - ret double %addtmp -} -</pre> -</div> - -<p>Note how the parser turns the top-level expression into anonymous functions -for us. This will be handy when we add <a href="OCamlLangImpl4.html#jit">JIT -support</a> in the next chapter. Also note that the code is very literally -transcribed: no optimizations are being performed. We will -<a href="OCamlLangImpl4.html#trivialconstfold">add optimizations</a> explicitly -in the next chapter.</p> - -<div class="doc_code"> -<pre> -ready> <b>def foo(a b) a*a + 2*a*b + b*b;</b> -Read function definition: -define double @foo(double %a, double %b) { -entry: - %multmp = fmul double %a, %a - %multmp1 = fmul double 2.000000e+00, %a - %multmp2 = fmul double %multmp1, %b - %addtmp = fadd double %multmp, %multmp2 - %multmp3 = fmul double %b, %b - %addtmp4 = fadd double %addtmp, %multmp3 - ret double %addtmp4 -} -</pre> -</div> - -<p>This shows some simple arithmetic.
Notice the striking similarity to the -LLVM builder calls that we use to create the instructions.</p> - -<div class="doc_code"> -<pre> -ready> <b>def bar(a) foo(a, 4.0) + bar(31337);</b> -Read function definition: -define double @bar(double %a) { -entry: - %calltmp = call double @foo( double %a, double 4.000000e+00 ) - %calltmp1 = call double @bar( double 3.133700e+04 ) - %addtmp = fadd double %calltmp, %calltmp1 - ret double %addtmp -} -</pre> -</div> - -<p>This shows some function calls. Note that this function will take a long -time to execute if you call it. In the future we'll add conditional control -flow to actually make recursion useful :).</p> - -<div class="doc_code"> -<pre> -ready> <b>extern cos(x);</b> -Read extern: -declare double @cos(double) - -ready> <b>cos(1.234);</b> -Read top-level expression: -define double @""() { -entry: - %calltmp = call double @cos( double 1.234000e+00 ) - ret double %calltmp -} -</pre> -</div> - -<p>This shows an extern for the libm "cos" function, and a call to it.</p> - - -<div class="doc_code"> -<pre> -ready> <b>^D</b> -; ModuleID = 'my cool jit' - -define double @""() { -entry: - %addtmp = fadd double 4.000000e+00, 5.000000e+00 - ret double %addtmp -} - -define double @foo(double %a, double %b) { -entry: - %multmp = fmul double %a, %a - %multmp1 = fmul double 2.000000e+00, %a - %multmp2 = fmul double %multmp1, %b - %addtmp = fadd double %multmp, %multmp2 - %multmp3 = fmul double %b, %b - %addtmp4 = fadd double %addtmp, %multmp3 - ret double %addtmp4 -} - -define double @bar(double %a) { -entry: - %calltmp = call double @foo( double %a, double 4.000000e+00 ) - %calltmp1 = call double @bar( double 3.133700e+04 ) - %addtmp = fadd double %calltmp, %calltmp1 - ret double %addtmp -} - -declare double @cos(double) - -define double @""() { -entry: - %calltmp = call double @cos( double 1.234000e+00 ) - ret double %calltmp -} -</pre> -</div> - -<p>When you quit the current demo, it dumps out the IR for the entire module 
-generated. Here you can see the big picture with all the functions referencing -each other.</p> - -<p>This wraps up the third chapter of the Kaleidoscope tutorial. Up next, we'll -describe how to <a href="OCamlLangImpl4.html">add JIT codegen and optimizer -support</a> to this so we can actually start running code!</p> - -</div> - - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="code">Full Code Listing</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p> -Here is the complete code listing for our running example, enhanced with the -LLVM code generator. Because this uses the LLVM libraries, we need to link -them in. To do this, we use the <a -href="http://llvm.org/cmds/llvm-config.html">llvm-config</a> tool to inform -our makefile/command line about which options to use:</p> - -<div class="doc_code"> -<pre> -# Compile -ocamlbuild toy.byte -# Run -./toy.byte -</pre> -</div> - -<p>Here is the code:</p> - -<dl> -<dt>_tags:</dt> -<dd class="doc_code"> -<pre> -<{lexer,parser}.ml>: use_camlp4, pp(camlp4of) -<*.{byte,native}>: g++, use_llvm, use_llvm_analysis -</pre> -</dd> - -<dt>myocamlbuild.ml:</dt> -<dd class="doc_code"> -<pre> -open Ocamlbuild_plugin;; - -ocaml_lib ~extern:true "llvm";; -ocaml_lib ~extern:true "llvm_analysis";; - -flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"]);; -</pre> -</dd> - -<dt>token.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Lexer Tokens - *===----------------------------------------------------------------------===*) - -(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of - * these others for known things. 
*) -type token = - (* commands *) - | Def | Extern - - (* primary *) - | Ident of string | Number of float - - (* unknown *) - | Kwd of char -</pre> -</dd> - -<dt>lexer.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Lexer - *===----------------------------------------------------------------------===*) - -let rec lex = parser - (* Skip any whitespace. *) - | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream - - (* identifier: [a-zA-Z][a-zA-Z0-9] *) - | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_ident buffer stream - - (* number: [0-9.]+ *) - | [< ' ('0' .. '9' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_number buffer stream - - (* Comment until end of line. *) - | [< ' ('#'); stream >] -> - lex_comment stream - - (* Otherwise, just return the character as its ascii value. *) - | [< 'c; stream >] -> - [< 'Token.Kwd c; lex stream >] - - (* end of stream. *) - | [< >] -> [< >] - -and lex_number buffer = parser - | [< ' ('0' .. '9' | '.' as c); stream >] -> - Buffer.add_char buffer c; - lex_number buffer stream - | [< stream=lex >] -> - [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] - -and lex_ident buffer = parser - | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. 
'9' as c); stream >] -> - Buffer.add_char buffer c; - lex_ident buffer stream - | [< stream=lex >] -> - match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | id -> [< 'Token.Ident id; stream >] - -and lex_comment = parser - | [< ' ('\n'); stream=lex >] -> stream - | [< 'c; e=lex_comment >] -> e - | [< >] -> [< >] -</pre> -</dd> - -<dt>ast.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Abstract Syntax Tree (aka Parse Tree) - *===----------------------------------------------------------------------===*) - -(* expr - Base type for all expression nodes. *) -type expr = - (* variant for numeric literals like "1.0". *) - | Number of float - - (* variant for referencing a variable, like "a". *) - | Variable of string - - (* variant for a binary operator. *) - | Binary of char * expr * expr - - (* variant for function calls. *) - | Call of string * expr array - -(* proto - This type represents the "prototype" for a function, which captures - * its name, and its argument names (thus implicitly the number of arguments the - * function takes). *) -type proto = Prototype of string * string array - -(* func - This type represents a function definition itself. *) -type func = Function of proto * expr -</pre> -</dd> - -<dt>parser.ml:</dt> -<dd class="doc_code"> -<pre> -(*===---------------------------------------------------------------------=== - * Parser - *===---------------------------------------------------------------------===*) - -(* binop_precedence - This holds the precedence for each binary operator that is - * defined *) -let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 - -(* precedence - Get the precedence of the pending binary operator token. 
*) -let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 - -(* primary - * ::= identifier - * ::= numberexpr - * ::= parenexpr *) -let rec parse_primary = parser - (* numberexpr ::= number *) - | [< 'Token.Number n >] -> Ast.Number n - - (* parenexpr ::= '(' expression ')' *) - | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e - - (* identifierexpr - * ::= identifier - * ::= identifier '(' argumentexpr ')' *) - | [< 'Token.Ident id; stream >] -> - let rec parse_args accumulator = parser - | [< e=parse_expr; stream >] -> - begin parser - | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e - | [< >] -> e :: accumulator - end stream - | [< >] -> accumulator - in - let rec parse_ident id = parser - (* Call. *) - | [< 'Token.Kwd '('; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')'">] -> - Ast.Call (id, Array.of_list (List.rev args)) - - (* Simple variable ref. *) - | [< >] -> Ast.Variable id - in - parse_ident id stream - - | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") - -(* binoprhs - * ::= ('+' primary)* *) -and parse_bin_rhs expr_prec lhs stream = - match Stream.peek stream with - (* If this is a binop, find its precedence. *) - | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> - let token_prec = precedence c in - - (* If this is a binop that binds at least as tightly as the current binop, - * consume it, otherwise we are done. *) - if token_prec < expr_prec then lhs else begin - (* Eat the binop. *) - Stream.junk stream; - - (* Parse the primary expression after the binary operator. *) - let rhs = parse_primary stream in - - (* Okay, we know this is a binop. *) - let rhs = - match Stream.peek stream with - | Some (Token.Kwd c2) -> - (* If BinOp binds less tightly with rhs than the operator after - * rhs, let the pending operator take rhs as its lhs. 
*) - let next_prec = precedence c2 in - if token_prec < next_prec - then parse_bin_rhs (token_prec + 1) rhs stream - else rhs - | _ -> rhs - in - - (* Merge lhs/rhs. *) - let lhs = Ast.Binary (c, lhs, rhs) in - parse_bin_rhs expr_prec lhs stream - end - | _ -> lhs - -(* expression - * ::= primary binoprhs *) -and parse_expr = parser - | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream - -(* prototype - * ::= id '(' id* ')' *) -let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - (* success. *) - Ast.Prototype (id, Array.of_list (List.rev args)) - - | [< >] -> - raise (Stream.Error "expected function name in prototype") - -(* definition ::= 'def' prototype expression *) -let parse_definition = parser - | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> - Ast.Function (p, e) - -(* toplevelexpr ::= expression *) -let parse_toplevel = parser - | [< e=parse_expr >] -> - (* Make an anonymous proto. 
*) - Ast.Function (Ast.Prototype ("", [||]), e) - -(* external ::= 'extern' prototype *) -let parse_extern = parser - | [< 'Token.Extern; e=parse_prototype >] -> e -</pre> -</dd> - -<dt>codegen.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Code Generation - *===----------------------------------------------------------------------===*) - -open Llvm - -exception Error of string - -let context = global_context () -let the_module = create_module context "my cool jit" -let builder = builder context -let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 -let double_type = double_type context - -let rec codegen_expr = function - | Ast.Number n -> const_float double_type n - | Ast.Variable name -> - (try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name")) - | Ast.Binary (op, lhs, rhs) -> - let lhs_val = codegen_expr lhs in - let rhs_val = codegen_expr rhs in - begin - match op with - | '+' -> build_add lhs_val rhs_val "addtmp" builder - | '-' -> build_sub lhs_val rhs_val "subtmp" builder - | '*' -> build_mul lhs_val rhs_val "multmp" builder - | '<' -> - (* Convert bool 0/1 to double 0.0 or 1.0 *) - let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in - build_uitofp i double_type "booltmp" builder - | _ -> raise (Error "invalid binary operator") - end - | Ast.Call (callee, args) -> - (* Look up the name in the module table. *) - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown function referenced") - in - let params = params callee in - - (* If argument mismatch error. 
*) - if Array.length params == Array.length args then () else - raise (Error "incorrect # arguments passed"); - let args = Array.map codegen_expr args in - build_call callee args "calltmp" builder - -let codegen_proto = function - | Ast.Prototype (name, args) -> - (* Make the function type: double(double,double) etc. *) - let doubles = Array.make (Array.length args) double_type in - let ft = function_type double_type doubles in - let f = - match lookup_function name the_module with - | None -> declare_function name ft the_module - - (* If 'f' conflicted, there was already something named 'name'. If it - * has a body, don't allow redefinition or reextern. *) - | Some f -> - (* If 'f' already has a body, reject this. *) - if block_begin f <> At_end f then - raise (Error "redefinition of function"); - - (* If 'f' took a different number of arguments, reject. *) - if element_type (type_of f) <> ft then - raise (Error "redefinition of function with different # args"); - f - in - - (* Set names for all arguments. *) - Array.iteri (fun i a -> - let n = args.(i) in - set_value_name n a; - Hashtbl.add named_values n a; - ) (params f); - f - -let codegen_func = function - | Ast.Function (proto, body) -> - Hashtbl.clear named_values; - let the_function = codegen_proto proto in - - (* Create a new basic block to start insertion into. *) - let bb = append_block context "entry" the_function in - position_at_end bb builder; - - try - let ret_val = codegen_expr body in - - (* Finish off the function. *) - let _ = build_ret ret_val builder in - - (* Validate the generated code, checking for consistency. 
*) - Llvm_analysis.assert_valid_function the_function; - - the_function - with e -> - delete_function the_function; - raise e -</pre> -</dd> - -<dt>toplevel.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Top-Level parsing and JIT Driver - *===----------------------------------------------------------------------===*) - -open Llvm - -(* top ::= definition | external | expression | ';' *) -let rec main_loop stream = - match Stream.peek stream with - | None -> () - - (* ignore top-level semicolons. *) - | Some (Token.Kwd ';') -> - Stream.junk stream; - main_loop stream - - | Some token -> - begin - try match token with - | Token.Def -> - let e = Parser.parse_definition stream in - print_endline "parsed a function definition."; - dump_value (Codegen.codegen_func e); - | Token.Extern -> - let e = Parser.parse_extern stream in - print_endline "parsed an extern."; - dump_value (Codegen.codegen_proto e); - | _ -> - (* Evaluate a top-level expression into an anonymous function. *) - let e = Parser.parse_toplevel stream in - print_endline "parsed a top-level expr"; - dump_value (Codegen.codegen_func e); - with Stream.Error s | Codegen.Error s -> - (* Skip token for error recovery. *) - Stream.junk stream; - print_endline s; - end; - print_string "ready> "; flush stdout; - main_loop stream -</pre> -</dd> - -<dt>toy.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Main driver code. - *===----------------------------------------------------------------------===*) - -open Llvm - -let main () = - (* Install standard binary operators. - * 1 is the lowest precedence. *) - Hashtbl.add Parser.binop_precedence '<' 10; - Hashtbl.add Parser.binop_precedence '+' 20; - Hashtbl.add Parser.binop_precedence '-' 20; - Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) - - (* Prime the first token. 
*) - print_string "ready> "; flush stdout; - let stream = Lexer.lex (Stream.of_channel stdin) in - - (* Run the main "interpreter loop" now. *) - Toplevel.main_loop stream; - - (* Print out all the generated code. *) - dump_module Codegen.the_module -;; - -main () -</pre> -</dd> -</dl> - -<a href="OCamlLangImpl4.html">Next: Adding JIT and Optimizer Support</a> -</div> - -<!-- *********************************************************************** --> -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> - - <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> - <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a><br> - <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/tutorial/OCamlLangImpl4.html b/docs/tutorial/OCamlLangImpl4.html deleted file mode 100644 index 116c618..0000000 --- a/docs/tutorial/OCamlLangImpl4.html +++ /dev/null @@ -1,1029 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> - -<html> -<head> - <title>Kaleidoscope: Adding JIT and Optimizer Support</title> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <meta name="author" content="Chris Lattner"> - <meta name="author" content="Erick Tryzelaar"> - <link rel="stylesheet" href="../llvm.css" type="text/css"> -</head> - -<body> - -<div class="doc_title">Kaleidoscope: Adding JIT and Optimizer Support</div> - -<ul> -<li><a href="index.html">Up to Tutorial Index</a></li> -<li>Chapter 4 - <ol> - <li><a href="#intro">Chapter 4 Introduction</a></li> - <li><a href="#trivialconstfold">Trivial Constant Folding</a></li> - <li><a href="#optimizerpasses">LLVM Optimization Passes</a></li> - <li><a 
href="#jit">Adding a JIT Compiler</a></li> - <li><a href="#code">Full Code Listing</a></li> - </ol> -</li> -<li><a href="OCamlLangImpl5.html">Chapter 5</a>: Extending the Language: Control -Flow</li> -</ul> - -<div class="doc_author"> - <p> - Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a> - and <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a> - </p> -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="intro">Chapter 4 Introduction</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Welcome to Chapter 4 of the "<a href="index.html">Implementing a language -with LLVM</a>" tutorial. Chapters 1-3 described the implementation of a simple -language and added support for generating LLVM IR. This chapter describes -two new techniques: adding optimizer support to your language, and adding JIT -compiler support. These additions will demonstrate how to get nice, efficient code -for the Kaleidoscope language.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="trivialconstfold">Trivial Constant -Folding</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p><b>Note:</b> the default <tt>IRBuilder</tt> now always includes the constant -folding optimizations below.</p> - -<p> -Our demonstration for Chapter 3 is elegant and easy to extend. Unfortunately, -it does not produce wonderful code.
For example, when compiling simple code, -we don't get obvious optimizations:</p> - -<div class="doc_code"> -<pre> -ready> <b>def test(x) 1+2+x;</b> -Read function definition: -define double @test(double %x) { -entry: - %addtmp = fadd double 1.000000e+00, 2.000000e+00 - %addtmp1 = fadd double %addtmp, %x - ret double %addtmp1 -} -</pre> -</div> - -<p>This code is a very, very literal transcription of the AST built by parsing -the input. As such, this transcription lacks optimizations like constant folding -(we'd like to get "<tt>add x, 3.0</tt>" in the example above) as well as other -more important optimizations. Constant folding, in particular, is a very common -and very important optimization: so much so that many language implementors -implement constant folding support in their AST representation.</p> - -<p>With LLVM, you don't need this support in the AST. Since all calls to build -LLVM IR go through the LLVM builder, it would be nice if the builder itself -checked to see if there was a constant folding opportunity when you call it. -If so, it could just do the constant fold and return the constant instead of -creating an instruction. This is exactly what the <tt>LLVMFoldingBuilder</tt> -class does.</p> - -<p>All we did was switch from <tt>LLVMBuilder</tt> to -<tt>LLVMFoldingBuilder</tt>. Though we change no other code, we now have all of our -instructions implicitly constant folded without us having to do anything -about it. For example, the input above now compiles to:</p> - -<div class="doc_code"> -<pre> -ready> <b>def test(x) 1+2+x;</b> -Read function definition: -define double @test(double %x) { -entry: - %addtmp = fadd double 3.000000e+00, %x - ret double %addtmp -} -</pre> -</div> - -<p>Well, that was easy :). In practice, we recommend always using -<tt>LLVMFoldingBuilder</tt> when generating code like this.
It has no -"syntactic overhead" for its use (you don't have to uglify your compiler with -constant checks everywhere) and it can dramatically reduce the amount of -LLVM IR that is generated in some cases (particularly for languages with a macro -preprocessor or that use a lot of constants).</p> - -<p>On the other hand, the <tt>LLVMFoldingBuilder</tt> is limited by the fact -that it does all of its analysis inline with the code as it is built. Consider a -slightly more complex example:</p> - -<div class="doc_code"> -<pre> -ready> <b>def test(x) (1+2+x)*(x+(1+2));</b> -ready> Read function definition: -define double @test(double %x) { -entry: - %addtmp = fadd double 3.000000e+00, %x - %addtmp1 = fadd double %x, 3.000000e+00 - %multmp = fmul double %addtmp, %addtmp1 - ret double %multmp -} -</pre> -</div> - -<p>In this case, the LHS and RHS of the multiplication are the same value. We'd -really like to see this generate "<tt>tmp = x+3; result = tmp*tmp;</tt>" instead -of computing "<tt>x+3</tt>" twice.</p> - -<p>Unfortunately, no amount of local analysis will be able to detect and correct -this. This requires two transformations: reassociation of expressions (to -make the adds lexically identical) and Common Subexpression Elimination (CSE) -to delete the redundant add instruction. Fortunately, LLVM provides a broad -range of optimizations that you can use, in the form of "passes".</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="optimizerpasses">LLVM Optimization - Passes</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>LLVM provides many optimization passes, which do many different sorts of -things and have different tradeoffs. Unlike other systems, LLVM doesn't hold -to the mistaken notion that one set of optimizations is right for all languages -and for all situations.
LLVM allows a compiler implementor to make complete -decisions about what optimizations to use, in which order, and in what -situation.</p> - -<p>As a concrete example, LLVM supports "whole module" passes, which look -across as large a body of code as they can (often a whole file, but if run -at link time, this can be a substantial portion of the whole program). It also -supports and includes "per-function" passes, which just operate on a single -function at a time, without looking at other functions. For more information -on passes and how they are run, see the <a href="../WritingAnLLVMPass.html">How -to Write a Pass</a> document and the <a href="../Passes.html">List of LLVM -Passes</a>.</p> - -<p>For Kaleidoscope, we are currently generating functions on the fly, one at -a time, as the user types them in. We aren't shooting for the ultimate -optimization experience in this setting, but we also want to catch the easy and -quick stuff where possible. As such, we will choose to run a few per-function -optimizations as the user types the function in. If we wanted to make a "static -Kaleidoscope compiler", we would use exactly the code we have now, except that -we would defer running the optimizer until the entire file has been parsed.</p> - -<p>In order to get per-function optimizations going, we need to set up a -<a href="../WritingAnLLVMPass.html#passmanager">Llvm.PassManager</a> to hold and -organize the LLVM optimizations that we want to run. Once we have that, we can -add a set of optimizations to run. The code looks like this:</p> - -<div class="doc_code"> -<pre> - (* Create the JIT. *) - let the_execution_engine = ExecutionEngine.create Codegen.the_module in - let the_fpm = PassManager.create_function Codegen.the_module in - - (* Set up the optimizer pipeline. Start with registering info about how the - * target lays out data structures.
*) - TargetData.add (ExecutionEngine.target_data the_execution_engine) the_fpm; - - (* Do simple "peephole" optimizations and bit-twiddling optzn. *) - add_instruction_combination the_fpm; - - (* reassociate expressions. *) - add_reassociation the_fpm; - - (* Eliminate Common SubExpressions. *) - add_gvn the_fpm; - - (* Simplify the control flow graph (deleting unreachable blocks, etc). *) - add_cfg_simplification the_fpm; - - ignore (PassManager.initialize the_fpm); - - (* Run the main "interpreter loop" now. *) - Toplevel.main_loop the_fpm the_execution_engine stream; -</pre> -</div> - -<p>The meat of the matter here is the definition of "<tt>the_fpm</tt>". It -requires a reference to <tt>the_module</tt> to construct itself. Once it is -set up, we use a series of "add" calls to add a bunch of LLVM passes. The -first pass is basically boilerplate: it adds a pass so that later optimizations -know how the data structures in the program are laid out. The -"<tt>the_execution_engine</tt>" variable is related to the JIT, which we will -get to in the next section.</p> - -<p>In this case, we choose to add 4 optimization passes. The passes we chose -here are a pretty standard set of "cleanup" optimizations that are useful for -a wide variety of code. I won't delve into what they do, but, believe me, -they are a good starting place :).</p> - -<p>Once the <tt>Llvm.PassManager</tt> is set up, we need to make use of it. -We do this by running it after our newly created function is constructed (in -<tt>Codegen.codegen_func</tt>), but before it is returned to the client:</p> - -<div class="doc_code"> -<pre> -let codegen_func the_fpm = function - ... - try - let ret_val = codegen_expr body in - - (* Finish off the function. *) - let _ = build_ret ret_val builder in - - (* Validate the generated code, checking for consistency. *) - Llvm_analysis.assert_valid_function the_function; - - (* Optimize the function.
*) - let _ = PassManager.run_function the_function the_fpm in - - the_function -</pre> -</div> - -<p>As you can see, this is pretty straightforward. <tt>the_fpm</tt> -optimizes and updates the LLVM function value in place, improving (hopefully) its -body. With this in place, we can try our test above again:</p> - -<div class="doc_code"> -<pre> -ready> <b>def test(x) (1+2+x)*(x+(1+2));</b> -ready> Read function definition: -define double @test(double %x) { -entry: - %addtmp = fadd double %x, 3.000000e+00 - %multmp = fmul double %addtmp, %addtmp - ret double %multmp -} -</pre> -</div> - -<p>As expected, we now get our nicely optimized code, saving a floating point -add instruction from every execution of this function.</p> - -<p>LLVM provides a wide variety of optimizations that can be used in certain -circumstances. Some <a href="../Passes.html">documentation about the various -passes</a> is available, but it isn't very complete. To get started, another good -source of ideas is the set of passes that <tt>llvm-gcc</tt> or -<tt>llvm-ld</tt> run. The "<tt>opt</tt>" tool allows you to -experiment with passes from the command line, so you can see if they do -anything.</p> - -<p>Now that we have reasonable code coming out of our front-end, let's talk about -executing it!</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="jit">Adding a JIT Compiler</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Code that is available in LLVM IR can have a wide variety of tools -applied to it. For example, you can run optimizations on it (as we did above), -you can dump it out in textual or binary forms, you can compile the code to an -assembly file (.s) for some target, or you can JIT compile it.
The nice thing -about the LLVM IR representation is that it is the "common currency" between -many different parts of the compiler. -</p> - -<p>In this section, we'll add JIT compiler support to our interpreter. The -basic idea that we want for Kaleidoscope is to have the user enter function -bodies as they do now, but immediately evaluate the top-level expressions they -type in. For example, if they type in "1 + 2;", we should evaluate and print -out 3. If they define a function, they should be able to call it from the -command line.</p> - -<p>In order to do this, we first declare and initialize the JIT. This is done -by adding a <tt>let</tt> binding in <tt>main</tt>:</p> - -<div class="doc_code"> -<pre> -... -let main () = - ... - <b>(* Create the JIT. *) - let the_execution_engine = ExecutionEngine.create Codegen.the_module in</b> - ... -</pre> -</div> - -<p>This creates an abstract "Execution Engine" which can be either a JIT -compiler or the LLVM interpreter. LLVM will automatically pick a JIT compiler -for you if one is available for your platform; otherwise, it will fall back to -the interpreter.</p> - -<p>Once the <tt>Llvm_executionengine.ExecutionEngine.t</tt> is created, the JIT -is ready to be used. There are a variety of APIs that are useful, but the -simplest one is the "<tt>Llvm_executionengine.ExecutionEngine.run_function</tt>" -function. This function JIT compiles the specified LLVM function, runs it with -the given array of arguments, and returns the result as a <tt>GenericValue.t</tt>. -In our case, this means that we -can change the code that parses a top-level expression to look like this:</p> - -<div class="doc_code"> -<pre> - (* Evaluate a top-level expression into an anonymous function. *) - let e = Parser.parse_toplevel stream in - print_endline "parsed a top-level expr"; - let the_function = Codegen.codegen_func the_fpm e in - dump_value the_function; - - (* JIT the function and run it, returning the result.
*) - let result = ExecutionEngine.run_function the_function [||] - the_execution_engine in - - print_string "Evaluated to "; - print_float (GenericValue.as_float Codegen.double_type result); - print_newline (); -</pre> -</div> - -<p>Recall that we compile top-level expressions into a self-contained LLVM -function that takes no arguments and returns the computed double. Because the -LLVM JIT compiler matches the native platform ABI, the generated machine code can -be called just like any other native function: <tt>run_function</tt> makes the call -for us and returns the double wrapped in a <tt>GenericValue.t</tt>, which we unwrap -with <tt>GenericValue.as_float</tt>. -This means there is no difference between JIT compiled code and native machine -code that is statically linked into your application.</p> - -<p>With just these two changes, let's see how Kaleidoscope works now!</p> - -<div class="doc_code"> -<pre> -ready> <b>4+5;</b> -define double @""() { -entry: - ret double 9.000000e+00 -} - -<em>Evaluated to 9.000000</em> -</pre> -</div> - -<p>Well, this looks like it is basically working. The dump of the function -shows the "no argument function that always returns double" that we synthesize -for each top-level expression that is typed in. This demonstrates very basic -functionality, but can we do more?</p> - -<div class="doc_code"> -<pre> -ready> <b>def testfunc(x y) x + y*2; </b> -Read function definition: -define double @testfunc(double %x, double %y) { -entry: - %multmp = fmul double %y, 2.000000e+00 - %addtmp = fadd double %multmp, %x - ret double %addtmp -} - -ready> <b>testfunc(4, 10);</b> -define double @""() { -entry: - %calltmp = call double @testfunc( double 4.000000e+00, double 1.000000e+01 ) - ret double %calltmp -} - -<em>Evaluated to 24.000000</em> -</pre> -</div> - -<p>This illustrates that we can now call user code, but there is something a bit -subtle going on here. Note that we only invoke the JIT on the anonymous -functions that <em>call testfunc</em>, but we never invoke it -on <em>testfunc</em> itself.
What actually happened here is that the JIT -scanned for all non-JIT'd functions transitively called from the anonymous -function and compiled all of them before returning -from <tt>run_function</tt>.</p> - -<p>The JIT provides a number of other more advanced interfaces for things like -freeing allocated machine code, rejit'ing functions to update them, etc. -However, even with this simple code, we get some surprisingly powerful -capabilities. Check this out (I removed the dump of the anonymous functions, -since you should get the idea by now :):</p> - -<div class="doc_code"> -<pre> -ready> <b>extern sin(x);</b> -Read extern: -declare double @sin(double) - -ready> <b>extern cos(x);</b> -Read extern: -declare double @cos(double) - -ready> <b>sin(1.0);</b> -<em>Evaluated to 0.841471</em> - -ready> <b>def foo(x) sin(x)*sin(x) + cos(x)*cos(x);</b> -Read function definition: -define double @foo(double %x) { -entry: - %calltmp = call double @sin( double %x ) - %multmp = fmul double %calltmp, %calltmp - %calltmp2 = call double @cos( double %x ) - %multmp4 = fmul double %calltmp2, %calltmp2 - %addtmp = fadd double %multmp, %multmp4 - ret double %addtmp -} - -ready> <b>foo(4.0);</b> -<em>Evaluated to 1.000000</em> -</pre> -</div> - -<p>Whoa, how does the JIT know about sin and cos? The answer is surprisingly -simple: in this example, the JIT started execution of a function and got to a -function call. It realized that the function was not yet JIT compiled and -invoked the standard set of routines to resolve the function. In this case, -there is no body defined for the function, so the JIT ended up calling -"<tt>dlsym("sin")</tt>" on the Kaleidoscope process itself.
Since -"<tt>sin</tt>" is defined within the JIT's address space, it simply patches up -calls in the module to call the libm version of <tt>sin</tt> directly.</p> - -<p>The LLVM JIT provides a number of interfaces (look in the -<tt>llvm_executionengine.mli</tt> file) for controlling how unknown functions -get resolved. It allows you to establish explicit mappings between IR objects -and addresses (useful for LLVM global variables that you want to map to static -tables, for example), allows you to decide on the fly based on the -function name, and even allows you to have the JIT compile functions lazily the -first time they're called.</p> - -<p>One interesting application of this is that we can now extend the language -by writing arbitrary C code to implement operations. For example, if we add: -</p> - -<div class="doc_code"> -<pre> -/* putchard - putchar that takes a double and returns 0. */ -extern double putchard(double X) { - putchar((char)X); - return 0; -} -</pre> -</div> - -<p>Now we can produce simple output to the console by using things like: -"<tt>extern putchard(x); putchard(120);</tt>", which prints a lowercase 'x' on -the console (120 is the ASCII code for 'x'). Similar code could be used to -implement file I/O, console input, and many other capabilities in -Kaleidoscope.</p> - -<p>This completes the JIT and optimizer chapter of the Kaleidoscope tutorial. At -this point, we can compile a non-Turing-complete programming language, optimize -and JIT compile it in a user-driven way.
Next up we'll look into <a -href="OCamlLangImpl5.html">extending the language with control flow -constructs</a>, tackling some interesting LLVM IR issues along the way.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="code">Full Code Listing</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p> -Here is the complete code listing for our running example, enhanced with the -LLVM JIT and optimizer. To build this example, use: -</p> - -<div class="doc_code"> -<pre> -# Compile -ocamlbuild toy.byte -# Run -./toy.byte -</pre> -</div> - -<p>Here is the code:</p> - -<dl> -<dt>_tags:</dt> -<dd class="doc_code"> -<pre> -<{lexer,parser}.ml>: use_camlp4, pp(camlp4of) -<*.{byte,native}>: g++, use_llvm, use_llvm_analysis -<*.{byte,native}>: use_llvm_executionengine, use_llvm_target -<*.{byte,native}>: use_llvm_scalar_opts, use_bindings -</pre> -</dd> - -<dt>myocamlbuild.ml:</dt> -<dd class="doc_code"> -<pre> -open Ocamlbuild_plugin;; - -ocaml_lib ~extern:true "llvm";; -ocaml_lib ~extern:true "llvm_analysis";; -ocaml_lib ~extern:true "llvm_executionengine";; -ocaml_lib ~extern:true "llvm_target";; -ocaml_lib ~extern:true "llvm_scalar_opts";; - -flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"]);; -dep ["link"; "ocaml"; "use_bindings"] ["bindings.o"];; -</pre> -</dd> - -<dt>token.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Lexer Tokens - *===----------------------------------------------------------------------===*) - -(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of - * these others for known things. 
*) -type token = - (* commands *) - | Def | Extern - - (* primary *) - | Ident of string | Number of float - - (* unknown *) - | Kwd of char -</pre> -</dd> - -<dt>lexer.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Lexer - *===----------------------------------------------------------------------===*) - -let rec lex = parser - (* Skip any whitespace. *) - | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream - - (* identifier: [a-zA-Z][a-zA-Z0-9]* *) - | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_ident buffer stream - - (* number: [0-9.]+ *) - | [< ' ('0' .. '9' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_number buffer stream - - (* Comment until end of line. *) - | [< ' ('#'); stream >] -> - lex_comment stream - - (* Otherwise, just return the character as its ascii value. *) - | [< 'c; stream >] -> - [< 'Token.Kwd c; lex stream >] - - (* end of stream. *) - | [< >] -> [< >] - -and lex_number buffer = parser - | [< ' ('0' .. '9' | '.' as c); stream >] -> - Buffer.add_char buffer c; - lex_number buffer stream - | [< stream=lex >] -> - [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] - -and lex_ident buffer = parser - | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' ..
'9' as c); stream >] -> - Buffer.add_char buffer c; - lex_ident buffer stream - | [< stream=lex >] -> - match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | id -> [< 'Token.Ident id; stream >] - -and lex_comment = parser - | [< ' ('\n'); stream=lex >] -> stream - | [< 'c; e=lex_comment >] -> e - | [< >] -> [< >] -</pre> -</dd> - -<dt>ast.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Abstract Syntax Tree (aka Parse Tree) - *===----------------------------------------------------------------------===*) - -(* expr - Base type for all expression nodes. *) -type expr = - (* variant for numeric literals like "1.0". *) - | Number of float - - (* variant for referencing a variable, like "a". *) - | Variable of string - - (* variant for a binary operator. *) - | Binary of char * expr * expr - - (* variant for function calls. *) - | Call of string * expr array - -(* proto - This type represents the "prototype" for a function, which captures - * its name, and its argument names (thus implicitly the number of arguments the - * function takes). *) -type proto = Prototype of string * string array - -(* func - This type represents a function definition itself. *) -type func = Function of proto * expr -</pre> -</dd> - -<dt>parser.ml:</dt> -<dd class="doc_code"> -<pre> -(*===---------------------------------------------------------------------=== - * Parser - *===---------------------------------------------------------------------===*) - -(* binop_precedence - This holds the precedence for each binary operator that is - * defined *) -let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 - -(* precedence - Get the precedence of the pending binary operator token. 
*) -let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 - -(* primary - * ::= identifier - * ::= numberexpr - * ::= parenexpr *) -let rec parse_primary = parser - (* numberexpr ::= number *) - | [< 'Token.Number n >] -> Ast.Number n - - (* parenexpr ::= '(' expression ')' *) - | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e - - (* identifierexpr - * ::= identifier - * ::= identifier '(' argumentexpr ')' *) - | [< 'Token.Ident id; stream >] -> - let rec parse_args accumulator = parser - | [< e=parse_expr; stream >] -> - begin parser - | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e - | [< >] -> e :: accumulator - end stream - | [< >] -> accumulator - in - let rec parse_ident id = parser - (* Call. *) - | [< 'Token.Kwd '('; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')'">] -> - Ast.Call (id, Array.of_list (List.rev args)) - - (* Simple variable ref. *) - | [< >] -> Ast.Variable id - in - parse_ident id stream - - | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") - -(* binoprhs - * ::= ('+' primary)* *) -and parse_bin_rhs expr_prec lhs stream = - match Stream.peek stream with - (* If this is a binop, find its precedence. *) - | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> - let token_prec = precedence c in - - (* If this is a binop that binds at least as tightly as the current binop, - * consume it, otherwise we are done. *) - if token_prec < expr_prec then lhs else begin - (* Eat the binop. *) - Stream.junk stream; - - (* Parse the primary expression after the binary operator. *) - let rhs = parse_primary stream in - - (* Okay, we know this is a binop. *) - let rhs = - match Stream.peek stream with - | Some (Token.Kwd c2) -> - (* If BinOp binds less tightly with rhs than the operator after - * rhs, let the pending operator take rhs as its lhs. 
*) - let next_prec = precedence c2 in - if token_prec < next_prec - then parse_bin_rhs (token_prec + 1) rhs stream - else rhs - | _ -> rhs - in - - (* Merge lhs/rhs. *) - let lhs = Ast.Binary (c, lhs, rhs) in - parse_bin_rhs expr_prec lhs stream - end - | _ -> lhs - -(* expression - * ::= primary binoprhs *) -and parse_expr = parser - | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream - -(* prototype - * ::= id '(' id* ')' *) -let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - (* success. *) - Ast.Prototype (id, Array.of_list (List.rev args)) - - | [< >] -> - raise (Stream.Error "expected function name in prototype") - -(* definition ::= 'def' prototype expression *) -let parse_definition = parser - | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> - Ast.Function (p, e) - -(* toplevelexpr ::= expression *) -let parse_toplevel = parser - | [< e=parse_expr >] -> - (* Make an anonymous proto. 
*) - Ast.Function (Ast.Prototype ("", [||]), e) - -(* external ::= 'extern' prototype *) -let parse_extern = parser - | [< 'Token.Extern; e=parse_prototype >] -> e -</pre> -</dd> - -<dt>codegen.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Code Generation - *===----------------------------------------------------------------------===*) - -open Llvm - -exception Error of string - -let context = global_context () -let the_module = create_module context "my cool jit" -let builder = builder context -let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 -let double_type = double_type context - -let rec codegen_expr = function - | Ast.Number n -> const_float double_type n - | Ast.Variable name -> - (try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name")) - | Ast.Binary (op, lhs, rhs) -> - let lhs_val = codegen_expr lhs in - let rhs_val = codegen_expr rhs in - begin - match op with - | '+' -> build_add lhs_val rhs_val "addtmp" builder - | '-' -> build_sub lhs_val rhs_val "subtmp" builder - | '*' -> build_mul lhs_val rhs_val "multmp" builder - | '<' -> - (* Convert bool 0/1 to double 0.0 or 1.0 *) - let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in - build_uitofp i double_type "booltmp" builder - | _ -> raise (Error "invalid binary operator") - end - | Ast.Call (callee, args) -> - (* Look up the name in the module table. *) - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown function referenced") - in - let params = params callee in - - (* If argument mismatch error. 
*) - if Array.length params == Array.length args then () else - raise (Error "incorrect # arguments passed"); - let args = Array.map codegen_expr args in - build_call callee args "calltmp" builder - -let codegen_proto = function - | Ast.Prototype (name, args) -> - (* Make the function type: double(double,double) etc. *) - let doubles = Array.make (Array.length args) double_type in - let ft = function_type double_type doubles in - let f = - match lookup_function name the_module with - | None -> declare_function name ft the_module - - (* If 'f' conflicted, there was already something named 'name'. If it - * has a body, don't allow redefinition or reextern. *) - | Some f -> - (* If 'f' already has a body, reject this. *) - if block_begin f <> At_end f then - raise (Error "redefinition of function"); - - (* If 'f' took a different number of arguments, reject. *) - if element_type (type_of f) <> ft then - raise (Error "redefinition of function with different # args"); - f - in - - (* Set names for all arguments. *) - Array.iteri (fun i a -> - let n = args.(i) in - set_value_name n a; - Hashtbl.add named_values n a; - ) (params f); - f - -let codegen_func the_fpm = function - | Ast.Function (proto, body) -> - Hashtbl.clear named_values; - let the_function = codegen_proto proto in - - (* Create a new basic block to start insertion into. *) - let bb = append_block context "entry" the_function in - position_at_end bb builder; - - try - let ret_val = codegen_expr body in - - (* Finish off the function. *) - let _ = build_ret ret_val builder in - - (* Validate the generated code, checking for consistency. *) - Llvm_analysis.assert_valid_function the_function; - - (* Optimize the function. 
*) - let _ = PassManager.run_function the_function the_fpm in - - the_function - with e -> - delete_function the_function; - raise e -</pre> -</dd> - -<dt>toplevel.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Top-Level parsing and JIT Driver - *===----------------------------------------------------------------------===*) - -open Llvm -open Llvm_executionengine - -(* top ::= definition | external | expression | ';' *) -let rec main_loop the_fpm the_execution_engine stream = - match Stream.peek stream with - | None -> () - - (* ignore top-level semicolons. *) - | Some (Token.Kwd ';') -> - Stream.junk stream; - main_loop the_fpm the_execution_engine stream - - | Some token -> - begin - try match token with - | Token.Def -> - let e = Parser.parse_definition stream in - print_endline "parsed a function definition."; - dump_value (Codegen.codegen_func the_fpm e); - | Token.Extern -> - let e = Parser.parse_extern stream in - print_endline "parsed an extern."; - dump_value (Codegen.codegen_proto e); - | _ -> - (* Evaluate a top-level expression into an anonymous function. *) - let e = Parser.parse_toplevel stream in - print_endline "parsed a top-level expr"; - let the_function = Codegen.codegen_func the_fpm e in - dump_value the_function; - - (* JIT the function and run it, returning the result. *) - let result = ExecutionEngine.run_function the_function [||] - the_execution_engine in - - print_string "Evaluated to "; - print_float (GenericValue.as_float Codegen.double_type result); - print_newline (); - with Stream.Error s | Codegen.Error s -> - (* Skip token for error recovery. *) - Stream.junk stream; - print_endline s; - end; - print_string "ready> "; flush stdout; - main_loop the_fpm the_execution_engine stream -</pre> -</dd> - -<dt>toy.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Main driver code.
- *===----------------------------------------------------------------------===*) - -open Llvm -open Llvm_executionengine -open Llvm_target -open Llvm_scalar_opts - -let main () = - ignore (initialize_native_target ()); - - (* Install standard binary operators. - * 1 is the lowest precedence. *) - Hashtbl.add Parser.binop_precedence '<' 10; - Hashtbl.add Parser.binop_precedence '+' 20; - Hashtbl.add Parser.binop_precedence '-' 20; - Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) - - (* Prime the first token. *) - print_string "ready> "; flush stdout; - let stream = Lexer.lex (Stream.of_channel stdin) in - - (* Create the JIT. *) - let the_execution_engine = ExecutionEngine.create Codegen.the_module in - let the_fpm = PassManager.create_function Codegen.the_module in - - (* Set up the optimizer pipeline. Start with registering info about how the - * target lays out data structures. *) - TargetData.add (ExecutionEngine.target_data the_execution_engine) the_fpm; - - (* Do simple "peephole" optimizations and bit-twiddling optzn. *) - add_instruction_combination the_fpm; - - (* reassociate expressions. *) - add_reassociation the_fpm; - - (* Eliminate Common SubExpressions. *) - add_gvn the_fpm; - - (* Simplify the control flow graph (deleting unreachable blocks, etc). *) - add_cfg_simplification the_fpm; - - ignore (PassManager.initialize the_fpm); - - (* Run the main "interpreter loop" now. *) - Toplevel.main_loop the_fpm the_execution_engine stream; - - (* Print out all the generated code. *) - dump_module Codegen.the_module -;; - -main () -</pre> -</dd> - -<dt>bindings.c</dt> -<dd class="doc_code"> -<pre> -#include <stdio.h> - -/* putchard - putchar that takes a double and returns 0. 
*/ -extern double putchard(double X) { - putchar((char)X); - return 0; -} -</pre> -</dd> -</dl> - -<a href="OCamlLangImpl5.html">Next: Extending the language: control flow</a> -</div> - -<!-- *********************************************************************** --> -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> - - <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> - <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a><br> - <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/tutorial/OCamlLangImpl5.html b/docs/tutorial/OCamlLangImpl5.html deleted file mode 100644 index 131d5b2..0000000 --- a/docs/tutorial/OCamlLangImpl5.html +++ /dev/null @@ -1,1569 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> - -<html> -<head> - <title>Kaleidoscope: Extending the Language: Control Flow</title> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <meta name="author" content="Chris Lattner"> - <meta name="author" content="Erick Tryzelaar"> - <link rel="stylesheet" href="../llvm.css" type="text/css"> -</head> - -<body> - -<div class="doc_title">Kaleidoscope: Extending the Language: Control Flow</div> - -<ul> -<li><a href="index.html">Up to Tutorial Index</a></li> -<li>Chapter 5 - <ol> - <li><a href="#intro">Chapter 5 Introduction</a></li> - <li><a href="#ifthen">If/Then/Else</a> - <ol> - <li><a href="#iflexer">Lexer Extensions</a></li> - <li><a href="#ifast">AST Extensions</a></li> - <li><a href="#ifparser">Parser Extensions</a></li> - <li><a href="#ifir">LLVM IR</a></li> - <li><a href="#ifcodegen">Code Generation</a></li> - </ol> - </li> - <li><a 
href="#for">'for' Loop Expression</a> - <ol> - <li><a href="#forlexer">Lexer Extensions</a></li> - <li><a href="#forast">AST Extensions</a></li> - <li><a href="#forparser">Parser Extensions</a></li> - <li><a href="#forir">LLVM IR</a></li> - <li><a href="#forcodegen">Code Generation</a></li> - </ol> - </li> - <li><a href="#code">Full Code Listing</a></li> - </ol> -</li> -<li><a href="OCamlLangImpl6.html">Chapter 6</a>: Extending the Language: -User-defined Operators</li> -</ul> - -<div class="doc_author"> - <p> - Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a> - and <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a> - </p> -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="intro">Chapter 5 Introduction</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Welcome to Chapter 5 of the "<a href="index.html">Implementing a language -with LLVM</a>" tutorial. Parts 1-4 described the implementation of the simple -Kaleidoscope language and included support for generating LLVM IR, followed by -optimizations and a JIT compiler. Unfortunately, as presented, Kaleidoscope is -mostly useless: it has no control flow other than call and return. This means -that you can't have conditional branches in the code, significantly limiting its -power. In this episode of "build that compiler", we'll extend Kaleidoscope to -have an if/then/else expression plus a simple 'for' loop.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="ifthen">If/Then/Else</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p> -Extending Kaleidoscope to support if/then/else is quite straightforward. 
It -basically requires adding support for this "new" concept to the lexer, -parser, AST, and LLVM code emitter. This example is nice, because it shows how -easy it is to "grow" a language over time, incrementally extending it as new -ideas are discovered.</p> - -<p>Before we get going on "how" we add this extension, let's talk about "what" we -want. The basic idea is that we want to be able to write this sort of thing: -</p> - -<div class="doc_code"> -<pre> -def fib(x) - if x < 3 then - 1 - else - fib(x-1)+fib(x-2); -</pre> -</div> - -<p>In Kaleidoscope, every construct is an expression: there are no statements. -As such, the if/then/else expression needs to return a value like any other. -Since we're using a mostly functional form, we'll have it evaluate its -conditional, then return the 'then' or 'else' value based on how the condition -was resolved. This is very similar to the C "?:" expression.</p> - -<p>The semantics of the if/then/else expression are that it evaluates the -condition to a boolean equality value: 0.0 is considered to be false and -everything else is considered to be true. -If the condition is true, the first subexpression is evaluated and returned; if -the condition is false, the second subexpression is evaluated and returned. -Since Kaleidoscope allows side-effects, this behavior is important to nail down. -</p> - -<p>Now that we know what we "want", let's break this down into its constituent -pieces.</p> - -</div> - -<!-- ======================================================================= --> -<div class="doc_subsubsection"><a name="iflexer">Lexer Extensions for -If/Then/Else</a></div> -<!-- ======================================================================= --> - - -<div class="doc_text"> - -<p>The lexer extensions are straightforward.
First we add new variants -for the relevant tokens:</p> - -<div class="doc_code"> -<pre> - (* control *) - | If | Then | Else | For | In -</pre> -</div> - -<p>Once we have that, we recognize the new keywords in the lexer. This is pretty simple -stuff:</p> - -<div class="doc_code"> -<pre> - ... - match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | "if" -> [< 'Token.If; stream >] - | "then" -> [< 'Token.Then; stream >] - | "else" -> [< 'Token.Else; stream >] - | "for" -> [< 'Token.For; stream >] - | "in" -> [< 'Token.In; stream >] - | id -> [< 'Token.Ident id; stream >] -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<div class="doc_subsubsection"><a name="ifast">AST Extensions for - If/Then/Else</a></div> -<!-- ======================================================================= --> - -<div class="doc_text"> - -<p>To represent the new expression we add a new AST variant for it:</p> - -<div class="doc_code"> -<pre> -type expr = - ... - (* variant for if/then/else. *) - | If of expr * expr * expr -</pre> -</div> - -<p>The AST variant just holds the three subexpressions: the condition, the 'then' value, and the 'else' value.</p> - -</div> - -<!-- ======================================================================= --> -<div class="doc_subsubsection"><a name="ifparser">Parser Extensions for -If/Then/Else</a></div> -<!-- ======================================================================= --> - -<div class="doc_text"> - -<p>Now that we have the relevant tokens coming from the lexer and we have the -AST node to build, our parsing logic is relatively straightforward. First we -define a new parsing function:</p> - -<div class="doc_code"> -<pre> -let rec parse_primary = parser - ... - (* ifexpr ::= 'if' expr 'then' expr 'else' expr *) - | [< 'Token.If; c=parse_expr; - 'Token.Then ?? "expected 'then'"; t=parse_expr; - 'Token.Else ??
"expected 'else'"; e=parse_expr >] -> - Ast.If (c, t, e) -</pre> -</div> - -<p>Next we hook it up as a primary expression:</p> - -<div class="doc_code"> -<pre> -let rec parse_primary = parser - ... - (* ifexpr ::= 'if' expr 'then' expr 'else' expr *) - | [< 'Token.If; c=parse_expr; - 'Token.Then ?? "expected 'then'"; t=parse_expr; - 'Token.Else ?? "expected 'else'"; e=parse_expr >] -> - Ast.If (c, t, e) -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<div class="doc_subsubsection"><a name="ifir">LLVM IR for If/Then/Else</a></div> -<!-- ======================================================================= --> - -<div class="doc_text"> - -<p>Now that we have it parsing and building the AST, the final piece is adding -LLVM code generation support. This is the most interesting part of the -if/then/else example, because this is where it starts to introduce new concepts. -All of the code above has been thoroughly described in previous chapters. -</p> - -<p>To motivate the code we want to produce, lets take a look at a simple -example. Consider:</p> - -<div class="doc_code"> -<pre> -extern foo(); -extern bar(); -def baz(x) if x then foo() else bar(); -</pre> -</div> - -<p>If you disable optimizations, the code you'll (soon) get from Kaleidoscope -looks like this:</p> - -<div class="doc_code"> -<pre> -declare double @foo() - -declare double @bar() - -define double @baz(double %x) { -entry: - %ifcond = fcmp one double %x, 0.000000e+00 - br i1 %ifcond, label %then, label %else - -then: ; preds = %entry - %calltmp = call double @foo() - br label %ifcont - -else: ; preds = %entry - %calltmp1 = call double @bar() - br label %ifcont - -ifcont: ; preds = %else, %then - %iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ] - ret double %iftmp -} -</pre> -</div> - -<p>To visualize the control flow graph, you can use a nifty feature of the LLVM -'<a href="http://llvm.org/cmds/opt.html">opt</a>' tool. 
If you put this LLVM IR -into "t.ll" and run "<tt>llvm-as < t.ll | opt -analyze -view-cfg</tt>", <a -href="../ProgrammersManual.html#ViewGraph">a window will pop up</a> and you'll -see this graph:</p> - -<div style="text-align: center"><img src="LangImpl5-cfg.png" alt="Example CFG" width="423" -height="315"></div> - -<p>Another way to get this is to call "<tt>Llvm_analysis.view_function_cfg -f</tt>" or "<tt>Llvm_analysis.view_function_cfg_only f</tt>" (where <tt>f</tt> -is a "<tt>Function</tt>") either by inserting actual calls into the code and -recompiling or by calling these in the debugger. LLVM has many nice features -for visualizing various graphs.</p> - -<p>Getting back to the generated code, it is fairly simple: the entry block -evaluates the conditional expression ("x" in our case here) and compares the -result to 0.0 with the "<tt><a href="../LangRef.html#i_fcmp">fcmp</a> one</tt>" -instruction ('one' is "Ordered and Not Equal"). Based on the result of this -expression, the code jumps to either the "then" or "else" blocks, which contain -the expressions for the true/false cases.</p> - -<p>Once the then/else blocks are finished executing, they both branch back to the -'ifcont' block to execute the code that happens after the if/then/else. In this -case the only thing left to do is to return to the caller of the function. The -question then becomes: how does the code know which expression to return?</p> - -<p>The answer to this question involves an important SSA operation: the -<a href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Phi -operation</a>. If you're not familiar with SSA, <a -href="http://en.wikipedia.org/wiki/Static_single_assignment_form">the wikipedia -article</a> is a good introduction and there are various other introductions to -it available on your favorite search engine. The short version is that -"execution" of the Phi operation requires "remembering" which block control came -from. 
The Phi operation takes on the value corresponding to the input control -block. In this case, if control comes in from the "then" block, it gets the -value of "calltmp". If control comes from the "else" block, it gets the value -of "calltmp1".</p> - -<p>At this point, you are probably starting to think "Oh no! This means my -simple and elegant front-end will have to start generating SSA form in order to -use LLVM!". Fortunately, this is not the case, and we strongly advise -<em>not</em> implementing an SSA construction algorithm in your front-end -unless there is an amazingly good reason to do so. In practice, there are two -sorts of values that float around in code written for your average imperative -programming language that might need Phi nodes:</p> - -<ol> -<li>Code that involves user variables: <tt>x = 1; x = x + 1; </tt></li> -<li>Values that are implicit in the structure of your AST, such as the Phi node -in this case.</li> -</ol> - -<p>In <a href="OCamlLangImpl7.html">Chapter 7</a> of this tutorial ("mutable -variables"), we'll talk about #1 -in depth. For now, just take our word for it that you don't need SSA construction to -handle this case. For #2, you have the choice of using the techniques that we will -describe for #1, or you can insert Phi nodes directly, if convenient. In this -case, it is very easy to generate the Phi node, so we choose to do it -directly.</p> - -<p>Okay, enough of the motivation and overview; let's generate code!</p> - -</div> - -<!-- ======================================================================= --> -<div class="doc_subsubsection"><a name="ifcodegen">Code Generation for -If/Then/Else</a></div> -<!-- ======================================================================= --> - -<div class="doc_text"> - -<p>In order to generate code for this, we add a case for <tt>Ast.If</tt> -to <tt>codegen_expr</tt>:</p> - -<div class="doc_code"> -<pre> -let rec codegen_expr = function - ...
- | Ast.If (cond, then_, else_) -> - let cond = codegen_expr cond in - - (* Convert condition to a bool by comparing non-equal to 0.0 *) - let zero = const_float double_type 0.0 in - let cond_val = build_fcmp Fcmp.One cond zero "ifcond" builder in -</pre> -</div> - -<p>This code is straightforward and similar to what we saw before. We emit the -expression for the condition, then compare that value to zero to get a truth -value as a 1-bit (bool) value.</p> - -<div class="doc_code"> -<pre> - (* Grab the first block so that we might later add the conditional branch - * to it at the end of the function. *) - let start_bb = insertion_block builder in - let the_function = block_parent start_bb in - - let then_bb = append_block context "then" the_function in - position_at_end then_bb builder; -</pre> -</div> - -<p> -Unlike the <a href="LangImpl5.html">C++ tutorial</a>, we have to build -our basic blocks bottom up since we can't have dangling BasicBlocks. We start -off by saving a pointer to the first block (which might not be the entry -block), which we'll need to build a conditional branch later. We do this by -asking the <tt>builder</tt> for the current BasicBlock. The next line -gets the current Function object that is being built by asking -<tt>start_bb</tt> for its "parent" (the function it is currently embedded -into).</p> - -<p>Once it has that, it creates the "then" block with <tt>append_block</tt>, -which automatically appends it to the function's list of blocks.</p> - -<div class="doc_code"> -<pre> - (* Emit 'then' value. *) - position_at_end then_bb builder; - let then_val = codegen_expr then_ in - - (* Codegen of 'then' can change the current block, update then_bb for the - * phi. We create a new name because one is used for the phi node, and the - * other is used for the conditional branch. *) - let new_then_bb = insertion_block builder in -</pre> -</div> - -<p>We move the builder to start inserting into the "then" block.
Strictly -speaking, this call moves the insertion point to be at the end of the specified -block. However, since the "then" block is empty, it also starts out by -inserting at the beginning of the block. :)</p> - -<p>Once the insertion point is set, we recursively codegen the "then" expression -from the AST.</p> - -<p>The final line here is quite subtle, but is very important. The basic issue -is that when we create the Phi node in the merge block, we need to set up the -block/value pairs that indicate how the Phi will work. Importantly, the Phi -node expects to have an entry for each predecessor of the block in the CFG. Why, -then, are we getting the current block when we just set it to <tt>then_bb</tt> a few lines -above? The problem is that the "then" expression may actually itself change the -block that the builder is emitting into if, for example, it contains a nested -"if/then/else" expression. Because calling <tt>codegen_expr</tt> recursively could -arbitrarily change the notion of the current block, we are required to get an -up-to-date value for code that will set up the Phi node.</p> - -<div class="doc_code"> -<pre> - (* Emit 'else' value. *) - let else_bb = append_block context "else" the_function in - position_at_end else_bb builder; - let else_val = codegen_expr else_ in - - (* Codegen of 'else' can change the current block, update else_bb for the - * phi. *) - let new_else_bb = insertion_block builder in -</pre> -</div> - -<p>Code generation for the 'else' block is basically identical to codegen for -the 'then' block.</p> - -<div class="doc_code"> -<pre> - (* Emit merge block. *) - let merge_bb = append_block context "ifcont" the_function in - position_at_end merge_bb builder; - let incoming = [(then_val, new_then_bb); (else_val, new_else_bb)] in - let phi = build_phi incoming "iftmp" builder in -</pre> -</div> - -<p>The first two lines here are now familiar: the first adds the "merge" block -to the Function object.
The second line changes the insertion point so that -newly created code will go into the "merge" block. Once that is done, we need -to create the PHI node and set up the block/value pairs for the PHI.</p> - -<div class="doc_code"> -<pre> - (* Return to the start block to add the conditional branch. *) - position_at_end start_bb builder; - ignore (build_cond_br cond_val then_bb else_bb builder); -</pre> -</div> - -<p>Once the blocks are created, we can emit the conditional branch that chooses -between them. Since the builder has been moved into the other blocks in the -meantime, we first reposition it at the end of the block that the condition -went into. This is why we needed to save the "start" block.</p> - -<div class="doc_code"> -<pre> - (* Set an unconditional branch at the end of the 'then' block and the - * 'else' block to the 'merge' block. *) - position_at_end new_then_bb builder; ignore (build_br merge_bb builder); - position_at_end new_else_bb builder; ignore (build_br merge_bb builder); - - (* Finally, set the builder to the end of the merge block. *) - position_at_end merge_bb builder; - - phi -</pre> -</div> - -<p>To finish off the blocks, we add an unconditional branch to the merge block -at the end of both the 'then' and 'else' blocks. One interesting (and very important) aspect of the LLVM IR -is that it <a href="../LangRef.html#functionstructure">requires all basic blocks -to be "terminated"</a> with a <a href="../LangRef.html#terminators">control flow -instruction</a> such as return or branch. This means that all control flow, -<em>including fall throughs</em>, must be made explicit in the LLVM IR. If you -violate this rule, the verifier will emit an error.</p> - -<p>Finally, <tt>codegen_expr</tt> returns the phi node as the value computed by -the if/then/else expression. In our example above, this returned value will -feed into the code for the top-level function, which will create the return -instruction.</p> - -<p>Overall, we now have the ability to execute conditional code in -Kaleidoscope.
With this extension, Kaleidoscope is a fairly complete language -that can calculate a wide variety of numeric functions. Next up we'll add -another useful expression that is familiar from non-functional languages...</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="for">'for' Loop Expression</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Now that we know how to add basic control flow constructs to the language, -we have the tools to add more powerful things. Let's add something more -aggressive, a 'for' expression:</p> - -<div class="doc_code"> -<pre> - extern putchard(char); - def printstar(n) - for i = 1, i < n, 1.0 in - putchard(42); # ascii 42 = '*' - - # print 100 '*' characters - printstar(100); -</pre> -</div> - -<p>This expression defines a new variable ("i" in this case) which iterates from -a starting value, while the condition ("i < n" in this case) is true, -incrementing by an optional step value ("1.0" in this case). If the step value -is omitted, it defaults to 1.0. While the condition is true, the loop executes its -body expression. Because we don't have anything better to return, we'll just -define the loop as always returning 0.0. In the future, when we have mutable -variables, it will become more useful.</p> - -<p>As before, let's talk about the changes that we need to Kaleidoscope to -support this.</p> - -</div> - -<!-- ======================================================================= --> -<div class="doc_subsubsection"><a name="forlexer">Lexer Extensions for -the 'for' Loop</a></div> -<!-- ======================================================================= --> - -<div class="doc_text"> - -<p>The lexer extensions are the same sort of thing as for if/then/else:</p> - -<div class="doc_code"> -<pre> - ... in Token.token ... - (* control *) - | If | Then | Else - <b>| For | In</b> - - ...
in Lexer.lex_ident... - match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | "if" -> [< 'Token.If; stream >] - | "then" -> [< 'Token.Then; stream >] - | "else" -> [< 'Token.Else; stream >] - <b>| "for" -> [< 'Token.For; stream >] - | "in" -> [< 'Token.In; stream >]</b> - | id -> [< 'Token.Ident id; stream >] -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<div class="doc_subsubsection"><a name="forast">AST Extensions for -the 'for' Loop</a></div> -<!-- ======================================================================= --> - -<div class="doc_text"> - -<p>The AST variant is just as simple. It basically boils down to capturing -the variable name and the constituent expressions (start, end condition, optional step, and body) in the node.</p> - -<div class="doc_code"> -<pre> -type expr = - ... - (* variant for for/in. *) - | For of string * expr * expr * expr option * expr -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<div class="doc_subsubsection"><a name="forparser">Parser Extensions for -the 'for' Loop</a></div> -<!-- ======================================================================= --> - -<div class="doc_text"> - -<p>The parser code is also fairly standard. The only interesting thing here is -handling of the optional step value. The parser code handles it by checking to -see if the second comma is present. If not, it sets the step value to <tt>None</tt> in -the AST node:</p> - -<div class="doc_code"> -<pre> -let rec parse_primary = parser - ... - (* forexpr - ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression *) - | [< 'Token.For; - 'Token.Ident id ?? "expected identifier after for"; - 'Token.Kwd '=' ?? "expected '=' after for"; - stream >] -> - begin parser - | [< - start=parse_expr; - 'Token.Kwd ',' ??
"expected ',' after for"; - end_=parse_expr; - stream >] -> - let step = - begin parser - | [< 'Token.Kwd ','; step=parse_expr >] -> Some step - | [< >] -> None - end stream - in - begin parser - | [< 'Token.In; body=parse_expr >] -> - Ast.For (id, start, end_, step, body) - | [< >] -> - raise (Stream.Error "expected 'in' after for") - end stream - | [< >] -> - raise (Stream.Error "expected '=' after for") - end stream -</pre> -</div> - -</div> - -<!-- ======================================================================= --> -<div class="doc_subsubsection"><a name="forir">LLVM IR for -the 'for' Loop</a></div> -<!-- ======================================================================= --> - -<div class="doc_text"> - -<p>Now we get to the good part: the LLVM IR we want to generate for this thing. -With the simple example above, we get this LLVM IR (note that this dump is -generated with optimizations disabled for clarity): -</p> - -<div class="doc_code"> -<pre> -declare double @putchard(double) - -define double @printstar(double %n) { -entry: - ; initial value = 1.0 (inlined into phi) - br label %loop - -loop: ; preds = %loop, %entry - %i = phi double [ 1.000000e+00, %entry ], [ %nextvar, %loop ] - ; body - %calltmp = call double @putchard( double 4.200000e+01 ) - ; increment - %nextvar = fadd double %i, 1.000000e+00 - - ; termination test - %cmptmp = fcmp ult double %i, %n - %booltmp = uitofp i1 %cmptmp to double - %loopcond = fcmp one double %booltmp, 0.000000e+00 - br i1 %loopcond, label %loop, label %afterloop - -afterloop: ; preds = %loop - ; loop always returns 0.0 - ret double 0.000000e+00 -} -</pre> -</div> - -<p>This loop contains all the same constructs we saw before: a phi node, several -expressions, and some basic blocks. 
Lets see how this fits together.</p> - -</div> - -<!-- ======================================================================= --> -<div class="doc_subsubsection"><a name="forcodegen">Code Generation for -the 'for' Loop</a></div> -<!-- ======================================================================= --> - -<div class="doc_text"> - -<p>The first part of Codegen is very simple: we just output the start expression -for the loop value:</p> - -<div class="doc_code"> -<pre> -let rec codegen_expr = function - ... - | Ast.For (var_name, start, end_, step, body) -> - (* Emit the start code first, without 'variable' in scope. *) - let start_val = codegen_expr start in -</pre> -</div> - -<p>With this out of the way, the next step is to set up the LLVM basic block -for the start of the loop body. In the case above, the whole loop body is one -block, but remember that the body code itself could consist of multiple blocks -(e.g. if it contains an if/then/else or a for/in expression).</p> - -<div class="doc_code"> -<pre> - (* Make the new basic block for the loop header, inserting after current - * block. *) - let preheader_bb = insertion_block builder in - let the_function = block_parent preheader_bb in - let loop_bb = append_block context "loop" the_function in - - (* Insert an explicit fall through from the current block to the - * loop_bb. *) - ignore (build_br loop_bb builder); -</pre> -</div> - -<p>This code is similar to what we saw for if/then/else. Because we will need -it to create the Phi node, we remember the block that falls through into the -loop. Once we have that, we create the actual block that starts the loop and -create an unconditional branch for the fall-through between the two blocks.</p> - -<div class="doc_code"> -<pre> - (* Start insertion in loop_bb. *) - position_at_end loop_bb builder; - - (* Start the PHI node with an entry for start. 
*) - let variable = build_phi [(start_val, preheader_bb)] var_name builder in -</pre> -</div> - -<p>Now that the "preheader" for the loop is set up, we switch to emitting code -for the loop body. To begin with, we move the insertion point and create the -PHI node for the loop induction variable. Since we already know the incoming -value for the start of the loop, we add it to the Phi node. Note that the Phi will -eventually get a second value for the backedge, but we can't set it up yet -(because it doesn't exist!).</p> - -<div class="doc_code"> -<pre> - (* Within the loop, the variable is defined equal to the PHI node. If it - * shadows an existing variable, we have to restore it, so save it - * now. *) - let old_val = - try Some (Hashtbl.find named_values var_name) with Not_found -> None - in - Hashtbl.add named_values var_name variable; - - (* Emit the body of the loop. This, like any other expr, can change the - * current BB. Note that we ignore the value computed by the body, but - * don't allow an error *) - ignore (codegen_expr body); -</pre> -</div> - -<p>Now the code starts to get more interesting. Our 'for' loop introduces a new -variable to the symbol table. This means that our symbol table can now contain -either function arguments or loop variables. To handle this, before we codegen -the body of the loop, we add the loop variable as the current value for its -name. Note that it is possible that there is a variable of the same name in the -outer scope. It would be easy to make this an error (raise an error -if there is already an entry for <tt>var_name</tt>) but we choose to allow shadowing -of variables. In order to handle this correctly, we remember the Value that -we are potentially shadowing in <tt>old_val</tt> (which will be <tt>None</tt> if there is -no shadowed variable).</p> - -<p>Once the loop variable is set into the symbol table, the code recursively -codegens the body.
This allows the body to use the loop variable: any -references to it will naturally find it in the symbol table.</p> - -<div class="doc_code"> -<pre> - (* Emit the step value. *) - let step_val = - match step with - | Some step -> codegen_expr step - (* If not specified, use 1.0. *) - | None -> const_float double_type 1.0 - in - - let next_var = build_add variable step_val "nextvar" builder in -</pre> -</div> - -<p>Now that the body is emitted, we compute the next value of the iteration -variable by adding the step value, or 1.0 if it isn't present. -'<tt>next_var</tt>' will be the value of the loop variable on the next iteration -of the loop.</p> - -<div class="doc_code"> -<pre> - (* Compute the end condition. *) - let end_cond = codegen_expr end_ in - - (* Convert condition to a bool by comparing non-equal to 0.0. *) - let zero = const_float double_type 0.0 in - let end_cond = build_fcmp Fcmp.One end_cond zero "loopcond" builder in -</pre> -</div> - -<p>Next, we evaluate the end condition of the loop, to determine whether the -loop should exit. This mirrors the condition evaluation for the if/then/else -expression.</p> - -<div class="doc_code"> -<pre> - (* Create the "after loop" block and insert it. *) - let loop_end_bb = insertion_block builder in - let after_bb = append_block context "afterloop" the_function in - - (* Insert the conditional branch into the end of loop_end_bb. *) - ignore (build_cond_br end_cond loop_bb after_bb builder); - - (* Any new code will be inserted in after_bb. *) - position_at_end after_bb builder; -</pre> -</div> - -<p>With the code for the body of the loop complete, we just need to finish up -the control flow for it. This code remembers the end block (for the phi node), then creates the block for the loop exit ("afterloop"). Based on the value of the -exit condition, it creates a conditional branch that chooses between executing -the loop again and exiting the loop.
Any future code is emitted in the -"afterloop" block, so it sets the insertion position to it.</p> - -<div class="doc_code"> -<pre> - (* Add a new entry to the PHI node for the backedge. *) - add_incoming (next_var, loop_end_bb) variable; - - (* Restore the unshadowed variable. *) - begin match old_val with - | Some old_val -> Hashtbl.add named_values var_name old_val - | None -> () - end; - - (* for expr always returns 0.0. *) - const_null double_type -</pre> -</div> - -<p>The final code handles various cleanups: now that we have the -"<tt>next_var</tt>" value, we can add the incoming value to the loop PHI node. -After that, if the loop variable shadowed an outer variable, we rebind that outer -value in the symbol table. Finally, code generation of the for loop always -returns 0.0, so that is what we return from <tt>Codegen.codegen_expr</tt>.</p> - -<p>With this, we conclude the "adding control flow to Kaleidoscope" chapter of -the tutorial. In this chapter we added two control flow constructs, and used -them to motivate a couple of aspects of the LLVM IR that are important for -front-end implementors to know. In the next chapter of our saga, we will get -a bit crazier and add <a href="OCamlLangImpl6.html">user-defined operators</a> -to our poor innocent language.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="code">Full Code Listing</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p> -Here is the complete code listing for our running example, enhanced with the -if/then/else and for expressions.
To build this example, use: -</p> - -<div class="doc_code"> -<pre> -# Compile -ocamlbuild toy.byte -# Run -./toy.byte -</pre> -</div> - -<p>Here is the code:</p> - -<dl> -<dt>_tags:</dt> -<dd class="doc_code"> -<pre> -<{lexer,parser}.ml>: use_camlp4, pp(camlp4of) -<*.{byte,native}>: g++, use_llvm, use_llvm_analysis -<*.{byte,native}>: use_llvm_executionengine, use_llvm_target -<*.{byte,native}>: use_llvm_scalar_opts, use_bindings -</pre> -</dd> - -<dt>myocamlbuild.ml:</dt> -<dd class="doc_code"> -<pre> -open Ocamlbuild_plugin;; - -ocaml_lib ~extern:true "llvm";; -ocaml_lib ~extern:true "llvm_analysis";; -ocaml_lib ~extern:true "llvm_executionengine";; -ocaml_lib ~extern:true "llvm_target";; -ocaml_lib ~extern:true "llvm_scalar_opts";; - -flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"]);; -dep ["link"; "ocaml"; "use_bindings"] ["bindings.o"];; -</pre> -</dd> - -<dt>token.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Lexer Tokens - *===----------------------------------------------------------------------===*) - -(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of - * these others for known things. *) -type token = - (* commands *) - | Def | Extern - - (* primary *) - | Ident of string | Number of float - - (* unknown *) - | Kwd of char - - (* control *) - | If | Then | Else - | For | In -</pre> -</dd> - -<dt>lexer.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Lexer - *===----------------------------------------------------------------------===*) - -let rec lex = parser - (* Skip any whitespace. *) - | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream - - (* identifier: [a-zA-Z][a-zA-Z0-9] *) - | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_ident buffer stream - - (* number: [0-9.]+ *) - | [< ' ('0' .. 
'9' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_number buffer stream - - (* Comment until end of line. *) - | [< ' ('#'); stream >] -> - lex_comment stream - - (* Otherwise, just return the character as its ascii value. *) - | [< 'c; stream >] -> - [< 'Token.Kwd c; lex stream >] - - (* end of stream. *) - | [< >] -> [< >] - -and lex_number buffer = parser - | [< ' ('0' .. '9' | '.' as c); stream >] -> - Buffer.add_char buffer c; - lex_number buffer stream - | [< stream=lex >] -> - [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] - -and lex_ident buffer = parser - | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] -> - Buffer.add_char buffer c; - lex_ident buffer stream - | [< stream=lex >] -> - match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | "if" -> [< 'Token.If; stream >] - | "then" -> [< 'Token.Then; stream >] - | "else" -> [< 'Token.Else; stream >] - | "for" -> [< 'Token.For; stream >] - | "in" -> [< 'Token.In; stream >] - | id -> [< 'Token.Ident id; stream >] - -and lex_comment = parser - | [< ' ('\n'); stream=lex >] -> stream - | [< 'c; e=lex_comment >] -> e - | [< >] -> [< >] -</pre> -</dd> - -<dt>ast.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Abstract Syntax Tree (aka Parse Tree) - *===----------------------------------------------------------------------===*) - -(* expr - Base type for all expression nodes. *) -type expr = - (* variant for numeric literals like "1.0". *) - | Number of float - - (* variant for referencing a variable, like "a". *) - | Variable of string - - (* variant for a binary operator. *) - | Binary of char * expr * expr - - (* variant for function calls. *) - | Call of string * expr array - - (* variant for if/then/else. *) - | If of expr * expr * expr - - (* variant for for/in. 
*) - | For of string * expr * expr * expr option * expr - -(* proto - This type represents the "prototype" for a function, which captures - * its name, and its argument names (thus implicitly the number of arguments the - * function takes). *) -type proto = Prototype of string * string array - -(* func - This type represents a function definition itself. *) -type func = Function of proto * expr -</pre> -</dd> - -<dt>parser.ml:</dt> -<dd class="doc_code"> -<pre> -(*===---------------------------------------------------------------------=== - * Parser - *===---------------------------------------------------------------------===*) - -(* binop_precedence - This holds the precedence for each binary operator that is - * defined *) -let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 - -(* precedence - Get the precedence of the pending binary operator token. *) -let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 - -(* primary - * ::= identifier - * ::= numberexpr - * ::= parenexpr - * ::= ifexpr - * ::= forexpr *) -let rec parse_primary = parser - (* numberexpr ::= number *) - | [< 'Token.Number n >] -> Ast.Number n - - (* parenexpr ::= '(' expression ')' *) - | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e - - (* identifierexpr - * ::= identifier - * ::= identifier '(' argumentexpr ')' *) - | [< 'Token.Ident id; stream >] -> - let rec parse_args accumulator = parser - | [< e=parse_expr; stream >] -> - begin parser - | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e - | [< >] -> e :: accumulator - end stream - | [< >] -> accumulator - in - let rec parse_ident id = parser - (* Call. *) - | [< 'Token.Kwd '('; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')'">] -> - Ast.Call (id, Array.of_list (List.rev args)) - - (* Simple variable ref. 
*) - | [< >] -> Ast.Variable id - in - parse_ident id stream - - (* ifexpr ::= 'if' expr 'then' expr 'else' expr *) - | [< 'Token.If; c=parse_expr; - 'Token.Then ?? "expected 'then'"; t=parse_expr; - 'Token.Else ?? "expected 'else'"; e=parse_expr >] -> - Ast.If (c, t, e) - - (* forexpr - ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression *) - | [< 'Token.For; - 'Token.Ident id ?? "expected identifier after for"; - 'Token.Kwd '=' ?? "expected '=' after for"; - stream >] -> - begin parser - | [< - start=parse_expr; - 'Token.Kwd ',' ?? "expected ',' after for"; - end_=parse_expr; - stream >] -> - let step = - begin parser - | [< 'Token.Kwd ','; step=parse_expr >] -> Some step - | [< >] -> None - end stream - in - begin parser - | [< 'Token.In; body=parse_expr >] -> - Ast.For (id, start, end_, step, body) - | [< >] -> - raise (Stream.Error "expected 'in' after for") - end stream - | [< >] -> - raise (Stream.Error "expected '=' after for") - end stream - - | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") - -(* binoprhs - * ::= ('+' primary)* *) -and parse_bin_rhs expr_prec lhs stream = - match Stream.peek stream with - (* If this is a binop, find its precedence. *) - | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> - let token_prec = precedence c in - - (* If this is a binop that binds at least as tightly as the current binop, - * consume it, otherwise we are done. *) - if token_prec < expr_prec then lhs else begin - (* Eat the binop. *) - Stream.junk stream; - - (* Parse the primary expression after the binary operator. *) - let rhs = parse_primary stream in - - (* Okay, we know this is a binop. *) - let rhs = - match Stream.peek stream with - | Some (Token.Kwd c2) -> - (* If BinOp binds less tightly with rhs than the operator after - * rhs, let the pending operator take rhs as its lhs. 
*) - let next_prec = precedence c2 in - if token_prec < next_prec - then parse_bin_rhs (token_prec + 1) rhs stream - else rhs - | _ -> rhs - in - - (* Merge lhs/rhs. *) - let lhs = Ast.Binary (c, lhs, rhs) in - parse_bin_rhs expr_prec lhs stream - end - | _ -> lhs - -(* expression - * ::= primary binoprhs *) -and parse_expr = parser - | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream - -(* prototype - * ::= id '(' id* ')' *) -let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - (* success. *) - Ast.Prototype (id, Array.of_list (List.rev args)) - - | [< >] -> - raise (Stream.Error "expected function name in prototype") - -(* definition ::= 'def' prototype expression *) -let parse_definition = parser - | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> - Ast.Function (p, e) - -(* toplevelexpr ::= expression *) -let parse_toplevel = parser - | [< e=parse_expr >] -> - (* Make an anonymous proto. 
*) - Ast.Function (Ast.Prototype ("", [||]), e) - -(* external ::= 'extern' prototype *) -let parse_extern = parser - | [< 'Token.Extern; e=parse_prototype >] -> e -</pre> -</dd> - -<dt>codegen.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Code Generation - *===----------------------------------------------------------------------===*) - -open Llvm - -exception Error of string - -let context = global_context () -let the_module = create_module context "my cool jit" -let builder = builder context -let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 -let double_type = double_type context - -let rec codegen_expr = function - | Ast.Number n -> const_float double_type n - | Ast.Variable name -> - (try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name")) - | Ast.Binary (op, lhs, rhs) -> - let lhs_val = codegen_expr lhs in - let rhs_val = codegen_expr rhs in - begin - match op with - | '+' -> build_add lhs_val rhs_val "addtmp" builder - | '-' -> build_sub lhs_val rhs_val "subtmp" builder - | '*' -> build_mul lhs_val rhs_val "multmp" builder - | '<' -> - (* Convert bool 0/1 to double 0.0 or 1.0 *) - let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in - build_uitofp i double_type "booltmp" builder - | _ -> raise (Error "invalid binary operator") - end - | Ast.Call (callee, args) -> - (* Look up the name in the module table. *) - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown function referenced") - in - let params = params callee in - - (* If argument mismatch error. 
*) - if Array.length params == Array.length args then () else - raise (Error "incorrect # arguments passed"); - let args = Array.map codegen_expr args in - build_call callee args "calltmp" builder - | Ast.If (cond, then_, else_) -> - let cond = codegen_expr cond in - - (* Convert condition to a bool by comparing non-equal to 0.0 *) - let zero = const_float double_type 0.0 in - let cond_val = build_fcmp Fcmp.One cond zero "ifcond" builder in - - (* Grab the first block so that we might later add the conditional branch - * to it at the end of the function. *) - let start_bb = insertion_block builder in - let the_function = block_parent start_bb in - - let then_bb = append_block context "then" the_function in - - (* Emit 'then' value. *) - position_at_end then_bb builder; - let then_val = codegen_expr then_ in - - (* Codegen of 'then' can change the current block, update then_bb for the - * phi. We create a new name because one is used for the phi node, and the - * other is used for the conditional branch. *) - let new_then_bb = insertion_block builder in - - (* Emit 'else' value. *) - let else_bb = append_block context "else" the_function in - position_at_end else_bb builder; - let else_val = codegen_expr else_ in - - (* Codegen of 'else' can change the current block, update else_bb for the - * phi. *) - let new_else_bb = insertion_block builder in - - (* Emit merge block. *) - let merge_bb = append_block context "ifcont" the_function in - position_at_end merge_bb builder; - let incoming = [(then_val, new_then_bb); (else_val, new_else_bb)] in - let phi = build_phi incoming "iftmp" builder in - - (* Return to the start block to add the conditional branch. *) - position_at_end start_bb builder; - ignore (build_cond_br cond_val then_bb else_bb builder); - - (* Set an unconditional branch at the end of the 'then' block and the - * 'else' block to the 'merge' block.
*) - position_at_end new_then_bb builder; ignore (build_br merge_bb builder); - position_at_end new_else_bb builder; ignore (build_br merge_bb builder); - - (* Finally, set the builder to the end of the merge block. *) - position_at_end merge_bb builder; - - phi - | Ast.For (var_name, start, end_, step, body) -> - (* Emit the start code first, without 'variable' in scope. *) - let start_val = codegen_expr start in - - (* Make the new basic block for the loop header, inserting after current - * block. *) - let preheader_bb = insertion_block builder in - let the_function = block_parent preheader_bb in - let loop_bb = append_block context "loop" the_function in - - (* Insert an explicit fall through from the current block to the - * loop_bb. *) - ignore (build_br loop_bb builder); - - (* Start insertion in loop_bb. *) - position_at_end loop_bb builder; - - (* Start the PHI node with an entry for start. *) - let variable = build_phi [(start_val, preheader_bb)] var_name builder in - - (* Within the loop, the variable is defined equal to the PHI node. If it - * shadows an existing variable, we have to restore it, so save it - * now. *) - let old_val = - try Some (Hashtbl.find named_values var_name) with Not_found -> None - in - Hashtbl.add named_values var_name variable; - - (* Emit the body of the loop. This, like any other expr, can change the - * current BB. Note that we ignore the value computed by the body, but - * don't allow an error. *) - ignore (codegen_expr body); - - (* Emit the step value. *) - let step_val = - match step with - | Some step -> codegen_expr step - (* If not specified, use 1.0. *) - | None -> const_float double_type 1.0 - in - - let next_var = build_add variable step_val "nextvar" builder in - - (* Compute the end condition. *) - let end_cond = codegen_expr end_ in - - (* Convert condition to a bool by comparing non-equal to 0.0.
*) - let zero = const_float double_type 0.0 in - let end_cond = build_fcmp Fcmp.One end_cond zero "loopcond" builder in - - (* Create the "after loop" block and insert it. *) - let loop_end_bb = insertion_block builder in - let after_bb = append_block context "afterloop" the_function in - - (* Insert the conditional branch into the end of loop_end_bb. *) - ignore (build_cond_br end_cond loop_bb after_bb builder); - - (* Any new code will be inserted in after_bb. *) - position_at_end after_bb builder; - - (* Add a new entry to the PHI node for the backedge. *) - add_incoming (next_var, loop_end_bb) variable; - - (* Restore the unshadowed variable. *) - begin match old_val with - | Some old_val -> Hashtbl.add named_values var_name old_val - | None -> () - end; - - (* for expr always returns 0.0. *) - const_null double_type - -let codegen_proto = function - | Ast.Prototype (name, args) -> - (* Make the function type: double(double,double) etc. *) - let doubles = Array.make (Array.length args) double_type in - let ft = function_type double_type doubles in - let f = - match lookup_function name the_module with - | None -> declare_function name ft the_module - - (* If 'f' conflicted, there was already something named 'name'. If it - * has a body, don't allow redefinition or reextern. *) - | Some f -> - (* If 'f' already has a body, reject this. *) - if block_begin f <> At_end f then - raise (Error "redefinition of function"); - - (* If 'f' took a different number of arguments, reject. *) - if element_type (type_of f) <> ft then - raise (Error "redefinition of function with different # args"); - f - in - - (* Set names for all arguments. *) - Array.iteri (fun i a -> - let n = args.(i) in - set_value_name n a; - Hashtbl.add named_values n a; - ) (params f); - f - -let codegen_func the_fpm = function - | Ast.Function (proto, body) -> - Hashtbl.clear named_values; - let the_function = codegen_proto proto in - - (* Create a new basic block to start insertion into. 
*) - let bb = append_block context "entry" the_function in - position_at_end bb builder; - - try - let ret_val = codegen_expr body in - - (* Finish off the function. *) - let _ = build_ret ret_val builder in - - (* Validate the generated code, checking for consistency. *) - Llvm_analysis.assert_valid_function the_function; - - (* Optimize the function. *) - let _ = PassManager.run_function the_function the_fpm in - - the_function - with e -> - delete_function the_function; - raise e -</pre> -</dd> - -<dt>toplevel.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Top-Level parsing and JIT Driver - *===----------------------------------------------------------------------===*) - -open Llvm -open Llvm_executionengine - -(* top ::= definition | external | expression | ';' *) -let rec main_loop the_fpm the_execution_engine stream = - match Stream.peek stream with - | None -> () - - (* ignore top-level semicolons. *) - | Some (Token.Kwd ';') -> - Stream.junk stream; - main_loop the_fpm the_execution_engine stream - - | Some token -> - begin - try match token with - | Token.Def -> - let e = Parser.parse_definition stream in - print_endline "parsed a function definition."; - dump_value (Codegen.codegen_func the_fpm e); - | Token.Extern -> - let e = Parser.parse_extern stream in - print_endline "parsed an extern."; - dump_value (Codegen.codegen_proto e); - | _ -> - (* Evaluate a top-level expression into an anonymous function. *) - let e = Parser.parse_toplevel stream in - print_endline "parsed a top-level expr"; - let the_function = Codegen.codegen_func the_fpm e in - dump_value the_function; - - (* JIT the function, returning a function pointer. 
*) - let result = ExecutionEngine.run_function the_function [||] - the_execution_engine in - - print_string "Evaluated to "; - print_float (GenericValue.as_float Codegen.double_type result); - print_newline (); - with Stream.Error s | Codegen.Error s -> - (* Skip token for error recovery. *) - Stream.junk stream; - print_endline s; - end; - print_string "ready> "; flush stdout; - main_loop the_fpm the_execution_engine stream -</pre> -</dd> - -<dt>toy.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Main driver code. - *===----------------------------------------------------------------------===*) - -open Llvm -open Llvm_executionengine -open Llvm_target -open Llvm_scalar_opts - -let main () = - ignore (initialize_native_target ()); - - (* Install standard binary operators. - * 1 is the lowest precedence. *) - Hashtbl.add Parser.binop_precedence '<' 10; - Hashtbl.add Parser.binop_precedence '+' 20; - Hashtbl.add Parser.binop_precedence '-' 20; - Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) - - (* Prime the first token. *) - print_string "ready> "; flush stdout; - let stream = Lexer.lex (Stream.of_channel stdin) in - - (* Create the JIT. *) - let the_execution_engine = ExecutionEngine.create Codegen.the_module in - let the_fpm = PassManager.create_function Codegen.the_module in - - (* Set up the optimizer pipeline. Start with registering info about how the - * target lays out data structures. *) - TargetData.add (ExecutionEngine.target_data the_execution_engine) the_fpm; - - (* Do simple "peephole" optimizations and bit-twiddling optzn. *) - add_instruction_combination the_fpm; - - (* reassociate expressions. *) - add_reassociation the_fpm; - - (* Eliminate Common SubExpressions. *) - add_gvn the_fpm; - - (* Simplify the control flow graph (deleting unreachable blocks, etc). 
*) - add_cfg_simplification the_fpm; - - ignore (PassManager.initialize the_fpm); - - (* Run the main "interpreter loop" now. *) - Toplevel.main_loop the_fpm the_execution_engine stream; - - (* Print out all the generated code. *) - dump_module Codegen.the_module -;; - -main () -</pre> -</dd> - -<dt>bindings.c</dt> -<dd class="doc_code"> -<pre> -#include <stdio.h> - -/* putchard - putchar that takes a double and returns 0. */ -extern double putchard(double X) { - putchar((char)X); - return 0; -} -</pre> -</dd> -</dl> - -<a href="OCamlLangImpl6.html">Next: Extending the language: user-defined -operators</a> -</div> - -<!-- *********************************************************************** --> -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> - - <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> - <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a><br> - <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/tutorial/OCamlLangImpl6.html b/docs/tutorial/OCamlLangImpl6.html deleted file mode 100644 index b444fff..0000000 --- a/docs/tutorial/OCamlLangImpl6.html +++ /dev/null @@ -1,1574 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> - -<html> -<head> - <title>Kaleidoscope: Extending the Language: User-defined Operators</title> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <meta name="author" content="Chris Lattner"> - <meta name="author" content="Erick Tryzelaar"> - <link rel="stylesheet" href="../llvm.css" type="text/css"> -</head> - -<body> - -<div class="doc_title">Kaleidoscope: Extending the Language: User-defined Operators</div> - -<ul> 
-<li><a href="index.html">Up to Tutorial Index</a></li> -<li>Chapter 6 - <ol> - <li><a href="#intro">Chapter 6 Introduction</a></li> - <li><a href="#idea">User-defined Operators: the Idea</a></li> - <li><a href="#binary">User-defined Binary Operators</a></li> - <li><a href="#unary">User-defined Unary Operators</a></li> - <li><a href="#example">Kicking the Tires</a></li> - <li><a href="#code">Full Code Listing</a></li> - </ol> -</li> -<li><a href="OCamlLangImpl7.html">Chapter 7</a>: Extending the Language: Mutable -Variables / SSA Construction</li> -</ul> - -<div class="doc_author"> - <p> - Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a> - and <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a> - </p> -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="intro">Chapter 6 Introduction</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Welcome to Chapter 6 of the "<a href="index.html">Implementing a language -with LLVM</a>" tutorial. At this point in our tutorial, we now have a fully -functional language that is fairly minimal, but also useful. There -is still one big problem with it, however. Our language doesn't have many -useful operators (like division, logical negation, or even any comparisons -besides less-than).</p> - -<p>This chapter of the tutorial takes a wild digression into adding user-defined -operators to the simple and beautiful Kaleidoscope language. This digression now -gives us a simple and ugly language in some ways, but also a powerful one at the -same time. One of the great things about creating your own language is that you -get to decide what is good or bad. 
In this tutorial we'll assume that it is -okay to use this as a way to show some interesting parsing techniques.</p> - -<p>At the end of this tutorial, we'll run through an example Kaleidoscope -application that <a href="#example">renders the Mandelbrot set</a>. This gives -an example of what you can build with Kaleidoscope and its feature set.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="idea">User-defined Operators: the Idea</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p> -The "operator overloading" that we will add to Kaleidoscope is more general than in -languages like C++. In C++, you are only allowed to redefine existing -operators: you can't programmatically change the grammar, introduce new -operators, change precedence levels, etc. In this chapter, we will add this -capability to Kaleidoscope, which will let the user round out the set of -operators that are supported.</p> - -<p>The point of going into user-defined operators in a tutorial like this is to -show the power and flexibility of using a hand-written parser. Thus far, the parser -we have been implementing uses recursive descent for most parts of the grammar and -operator precedence parsing for the expressions. See <a -href="OCamlLangImpl2.html">Chapter 2</a> for details. Without using operator -precedence parsing, it would be very difficult to allow the programmer to -introduce new operators into the grammar: the grammar is dynamically extensible -as the JIT runs.</p> - -<p>The two specific features we'll add are programmable unary operators (right -now, Kaleidoscope has no unary operators at all) as well as binary operators. -An example of this is:</p> - -<div class="doc_code"> -<pre> -# Logical unary not. -def unary!(v) - if v then - 0 - else - 1; - -# Define > with the same precedence as <.
-def binary> 10 (LHS RHS) - RHS < LHS; - -# Binary "logical or", (note that it does not "short circuit") -def binary| 5 (LHS RHS) - if LHS then - 1 - else if RHS then - 1 - else - 0; - -# Define = with slightly lower precedence than relationals. -def binary= 9 (LHS RHS) - !(LHS < RHS | LHS > RHS); -</pre> -</div> - -<p>Many languages aspire to being able to implement their standard runtime -library in the language itself. In Kaleidoscope, we can implement significant -parts of the language in the library!</p> - -<p>We will break down implementation of these features into two parts: -implementing support for user-defined binary operators and adding unary -operators.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="binary">User-defined Binary Operators</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Adding support for user-defined binary operators is pretty simple with our -current framework. We'll first add support for the unary/binary keywords:</p> - -<div class="doc_code"> -<pre> -type token = - ... - <b>(* operators *) - | Binary | Unary</b> - -... - -and lex_ident buffer = parser - ... - | "for" -> [< 'Token.For; stream >] - | "in" -> [< 'Token.In; stream >] - <b>| "binary" -> [< 'Token.Binary; stream >] - | "unary" -> [< 'Token.Unary; stream >]</b> -</pre> -</div> - -<p>This just adds lexer support for the unary and binary keywords, like we -did in <a href="OCamlLangImpl5.html#iflexer">previous chapters</a>. One nice -thing about our current AST is that we represent binary operators in full -generality by using their ASCII code as the opcode. For our extended
For our extended -operators, we'll use this same representation, so we don't need any new AST or -parser support.</p> - -<p>On the other hand, we have to be able to represent the definitions of these -new operators, in the "def binary| 5" part of the function definition. In our -grammar so far, the "name" for the function definition is parsed as the -"prototype" production and into the <tt>Ast.Prototype</tt> AST node. To -represent our new user-defined operators as prototypes, we have to extend -the <tt>Ast.Prototype</tt> AST node like this:</p> - -<div class="doc_code"> -<pre> -(* proto - This type represents the "prototype" for a function, which captures - * its name, and its argument names (thus implicitly the number of arguments the - * function takes). *) -type proto = - | Prototype of string * string array - <b>| BinOpPrototype of string * string array * int</b> -</pre> -</div> - -<p>Basically, in addition to knowing a name for the prototype, we now keep track -of whether it was an operator, and if it was, what precedence level the operator -is at. The precedence is only used for binary operators (as you'll see below, -it just doesn't apply for unary operators). Now that we have a way to represent -the prototype for a user-defined operator, we need to parse it:</p> - -<div class="doc_code"> -<pre> -(* prototype - * ::= id '(' id* ')' - <b>* ::= binary LETTER number? (id, id) - * ::= unary LETTER number? (id) *)</b> -let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - let parse_operator = parser - | [< 'Token.Unary >] -> "unary", 1 - | [< 'Token.Binary >] -> "binary", 2 - in - let parse_binary_precedence = parser - | [< 'Token.Number n >] -> int_of_float n - | [< >] -> 30 - in - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - (* success. 
*) - Ast.Prototype (id, Array.of_list (List.rev args)) - <b>| [< (prefix, kind)=parse_operator; - 'Token.Kwd op ?? "expected an operator"; - (* Read the precedence if present. *) - binary_precedence=parse_binary_precedence; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - let name = prefix ^ (String.make 1 op) in - let args = Array.of_list (List.rev args) in - - (* Verify right number of arguments for operator. *) - if Array.length args != kind - then raise (Stream.Error "invalid number of operands for operator") - else - if kind == 1 then - Ast.Prototype (name, args) - else - Ast.BinOpPrototype (name, args, binary_precedence)</b> - | [< >] -> - raise (Stream.Error "expected function name in prototype") -</pre> -</div> - -<p>This is all fairly straightforward parsing code, and we have already seen -a lot of similar code in the past. One interesting part about the code above is -the couple of lines that set up <tt>name</tt> for binary operators. This builds -names like "binary@" for a newly defined "@" operator. This then takes -advantage of the fact that symbol names in the LLVM symbol table are allowed to -have any character in them, including embedded nul characters.</p> - -<p>The next interesting thing to add is codegen support for these binary -operators. Given our current structure, this is a simple addition of a default -case for our existing binary operator node:</p> - -<div class="doc_code"> -<pre> -let rec codegen_expr = function - ...
- | Ast.Binary (op, lhs, rhs) -> - let lhs_val = codegen_expr lhs in - let rhs_val = codegen_expr rhs in - begin - match op with - | '+' -> build_add lhs_val rhs_val "addtmp" builder - | '-' -> build_sub lhs_val rhs_val "subtmp" builder - | '*' -> build_mul lhs_val rhs_val "multmp" builder - | '<' -> - (* Convert bool 0/1 to double 0.0 or 1.0 *) - let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in - build_uitofp i double_type "booltmp" builder - <b>| _ -> - (* If it wasn't a builtin binary operator, it must be a user defined - * one. Emit a call to it. *) - let callee = "binary" ^ (String.make 1 op) in - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "binary operator not found!") - in - build_call callee [|lhs_val; rhs_val|] "binop" builder</b> - end -</pre> -</div> - -<p>As you can see above, the new code is actually really simple. It just does -a lookup for the appropriate operator in the symbol table and generates a -function call to it. Since user-defined operators are just built as normal -functions (because the "prototype" boils down to a function with the right -name), everything falls into place.</p> - -<p>The final piece of code we are missing is a bit of top-level magic:</p> - -<div class="doc_code"> -<pre> -let codegen_func the_fpm = function - | Ast.Function (proto, body) -> - Hashtbl.clear named_values; - let the_function = codegen_proto proto in - - <b>(* If this is an operator, install it. *) - begin match proto with - | Ast.BinOpPrototype (name, args, prec) -> - let op = name.[String.length name - 1] in - Hashtbl.add Parser.binop_precedence op prec; - | _ -> () - end;</b> - - (* Create a new basic block to start insertion into. *) - let bb = append_block context "entry" the_function in - position_at_end bb builder; - ... -</pre> -</div> - -<p>Basically, before codegening a function, if it is a user-defined operator, we -register it in the precedence table.
This allows the binary operator parsing -logic we already have in place to handle it. Since we are working on a -fully-general operator precedence parser, this is all we need to do to "extend -the grammar".</p> - -<p>Now we have useful user-defined binary operators. This builds a lot -on the previous framework we built for other operators. Adding unary operators -is a bit more challenging, because we don't have any framework for it yet; let's -see what it takes.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="unary">User-defined Unary Operators</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Since we don't currently support unary operators in the Kaleidoscope -language, we'll need to add everything to support them. Above, we added simple -support for the 'unary' keyword to the lexer. In addition to that, we need an -AST node:</p> - -<div class="doc_code"> -<pre> -type expr = - ... - (* variant for a unary operator. *) - | Unary of char * expr - ... -</pre> -</div> - -<p>This AST node is very simple and obvious by now. It directly mirrors the -binary operator AST node, except that it only has one child. With this, we -need to add the parsing logic. Parsing a unary operator is pretty simple: we'll -add a new function to do it:</p> - -<div class="doc_code"> -<pre> -(* unary - * ::= primary - * ::= '!' unary *) -and parse_unary = parser - (* If this is a unary operator, read it. *) - | [< 'Token.Kwd op when op != '(' && op != ')'; operand=parse_unary >] -> - Ast.Unary (op, operand) - - (* If the current token is not an operator, it must be a primary expr. *) - | [< stream >] -> parse_primary stream -</pre> -</div> - -<p>The grammar we add is pretty straightforward here. If we see a unary
If we see a unary -operator when parsing a primary operator, we eat the operator as a prefix and -parse the remaining piece as another unary operator. This allows us to handle -multiple unary operators (e.g. "!!x"). Note that unary operators can't have -ambiguous parses like binary operators can, so there is no need for precedence -information.</p> - -<p>The problem with this function, is that we need to call ParseUnary from -somewhere. To do this, we change previous callers of ParsePrimary to call -<tt>parse_unary</tt> instead:</p> - -<div class="doc_code"> -<pre> -(* binoprhs - * ::= ('+' primary)* *) -and parse_bin_rhs expr_prec lhs stream = - ... - <b>(* Parse the unary expression after the binary operator. *) - let rhs = parse_unary stream in</b> - ... - -... - -(* expression - * ::= primary binoprhs *) -and parse_expr = parser - | [< lhs=<b>parse_unary</b>; stream >] -> parse_bin_rhs 0 lhs stream -</pre> -</div> - -<p>With these two simple changes, we are now able to parse unary operators and build the -AST for them. Next up, we need to add parser support for prototypes, to parse -the unary operator prototype. We extend the binary operator code above -with:</p> - -<div class="doc_code"> -<pre> -(* prototype - * ::= id '(' id* ')' - * ::= binary LETTER number? (id, id) - <b>* ::= unary LETTER number? (id)</b> *) -let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - <b>let parse_operator = parser - | [< 'Token.Unary >] -> "unary", 1 - | [< 'Token.Binary >] -> "binary", 2 - in</b> - let parse_binary_precedence = parser - | [< 'Token.Number n >] -> int_of_float n - | [< >] -> 30 - in - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - (* success. 
*) - Ast.Prototype (id, Array.of_list (List.rev args)) - <b>| [< (prefix, kind)=parse_operator; - 'Token.Kwd op ?? "expected an operator"; - (* Read the precedence if present. *) - binary_precedence=parse_binary_precedence; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - let name = prefix ^ (String.make 1 op) in - let args = Array.of_list (List.rev args) in - - (* Verify right number of arguments for operator. *) - if Array.length args != kind - then raise (Stream.Error "invalid number of operands for operator") - else - if kind == 1 then - Ast.Prototype (name, args) - else - Ast.BinOpPrototype (name, args, binary_precedence)</b> - | [< >] -> - raise (Stream.Error "expected function name in prototype") -</pre> -</div> - -<p>As with binary operators, we name unary operators with a name that includes -the operator character. This assists us at code generation time. Speaking of which, -the final piece we need to add is codegen support for unary operators. It looks -like this:</p> - -<div class="doc_code"> -<pre> -let rec codegen_expr = function - ... - | Ast.Unary (op, operand) -> - let operand = codegen_expr operand in - let callee = "unary" ^ (String.make 1 op) in - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown unary operator") - in - build_call callee [|operand|] "unop" builder -</pre> -</div> - -<p>This code is similar to, but simpler than, the code for binary operators. It -is simpler primarily because it doesn't need to handle any predefined operators.
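The following sketch isolates the prefix-operator rule `unary ::= primary | op unary`. It is a minimal, self-contained illustration under stated assumptions, not the tutorial's code. The `token` and `expr` types here are hypothetical simplifications of the `Token` and `Ast` modules, and the sketch parses a plain token list with pattern matching instead of the camlp4 stream parsers used in parser.ml.

```ocaml
(* Hypothetical, simplified token and AST types (illustration only). *)
type token = Op of char | Ident of string | LParen | RParen

type expr =
  | Variable of string
  | Unary of char * expr

(* unary ::= primary | op unary
 * Returns the parsed expression together with the unconsumed tokens. *)
let rec parse_unary tokens =
  match tokens with
  | Op c :: rest ->
      (* Eat the prefix operator, then parse its operand as another unary. *)
      let operand, rest = parse_unary rest in
      (Unary (c, operand), rest)
  | _ -> parse_primary tokens

(* primary ::= identifier | '(' unary ')' *)
and parse_primary tokens =
  match tokens with
  | Ident id :: rest -> (Variable id, rest)
  | LParen :: rest ->
      (match parse_unary rest with
       | e, RParen :: rest' -> (e, rest')
       | _ -> failwith "expected ')'")
  | _ -> failwith "unknown token when expecting an expression"

(* Example: "!!x" parses as two nested Unary nodes and consumes every token. *)
let () =
  let e, rest = parse_unary [Op '!'; Op '!'; Ident "x"] in
  assert (e = Unary ('!', Unary ('!', Variable "x")));
  assert (rest = [])
```

Each prefix operator is handled by a recursive call, so nested operators such as `!!x` nest naturally and no precedence table is involved.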
-</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="example">Kicking the Tires</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>It is somewhat hard to believe, but with a few simple extensions we've -covered in the last chapters, we have grown a real-ish language. With this, we -can do a lot of interesting things, including I/O, math, and a bunch of other -things. For example, we can now add a nice sequencing operator (printd is -defined to print out the specified value and a newline):</p> - -<div class="doc_code"> -<pre> -ready> <b>extern printd(x);</b> -Read extern: declare double @printd(double) -ready> <b>def binary : 1 (x y) 0; # Low-precedence operator that ignores operands.</b> -.. -ready> <b>printd(123) : printd(456) : printd(789);</b> -123.000000 -456.000000 -789.000000 -Evaluated to 0.000000 -</pre> -</div> - -<p>We can also define a bunch of other "primitive" operations, such as:</p> - -<div class="doc_code"> -<pre> -# Logical unary not. -def unary!(v) - if v then - 0 - else - 1; - -# Unary negate. -def unary-(v) - 0-v; - -# Define > with the same precedence as <. -def binary> 10 (LHS RHS) - RHS < LHS; - -# Binary logical or, which does not short circuit. -def binary| 5 (LHS RHS) - if LHS then - 1 - else if RHS then - 1 - else - 0; - -# Binary logical and, which does not short circuit. -def binary& 6 (LHS RHS) - if !LHS then - 0 - else - !!RHS; - -# Define = with slightly lower precedence than relationals. -def binary = 9 (LHS RHS) - !(LHS < RHS | LHS > RHS); - -</pre> -</div> - - -<p>Given the previous if/then/else support, we can also define interesting -functions for I/O.
For example, the following prints out a character whose -"density" reflects the value passed in: the lower the value, the denser the -character:</p> - -<div class="doc_code"> -<pre> -ready> -<b> -extern putchard(char) -def printdensity(d) - if d > 8 then - putchard(32) # ' ' - else if d > 4 then - putchard(46) # '.' - else if d > 2 then - putchard(43) # '+' - else - putchard(42); # '*'</b> -... -ready> <b>printdensity(1): printdensity(2): printdensity(3) : - printdensity(4): printdensity(5): printdensity(9): putchard(10);</b> -**++. -Evaluated to 0.000000 -</pre> -</div> - -<p>Based on these simple primitive operations, we can start to define more -interesting things. For example, here's a little function that computes the -number of iterations it takes for a point's orbit in the complex plane to -escape:</p> - -<div class="doc_code"> -<pre> -# determine whether the specific location diverges. -# Solve for z = z^2 + c in the complex plane. -def mandleconverger(real imag iters creal cimag) - if iters > 255 | (real*real + imag*imag > 4) then - iters - else - mandleconverger(real*real - imag*imag + creal, - 2*real*imag + cimag, - iters+1, creal, cimag); - -# return the number of iterations required for the iteration to escape -def mandleconverge(real imag) - mandleconverger(real, imag, 0, real, imag); -</pre> -</div> - -<p>This "z = z<sup>2</sup> + c" function is a beautiful little creature that is the basis -for computation of the <a -href="http://en.wikipedia.org/wiki/Mandelbrot_set">Mandelbrot Set</a>. Our -<tt>mandleconverge</tt> function returns the number of iterations that it takes -for a complex orbit to escape, saturating at 256. This is not a very useful -function by itself, but if you plot its value over a two-dimensional plane, -you can see the Mandelbrot set.
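For comparison, the escape-time loop behind `mandleconverger` can be written directly in OCaml. This is a minimal sketch, not part of the tutorial's code, and it rests on a few assumptions. The names `mandel_converger`, `mandel_converge`, `density_char` and `print_row` are hypothetical. Iteration counts are `int` rather than Kaleidoscope's doubles. `density_char` reuses the thresholds from `printdensity`.

```ocaml
(* Escape-time iteration for z = z^2 + c, following mandleconverger.
 * (real, imag) is the current z; (creal, cimag) is the constant c.
 * Returns the count at which |z|^2 first exceeds 4, or 256 if the
 * orbit has not escaped by the time the count passes 255. *)
let rec mandel_converger real imag iters creal cimag =
  if iters > 255 || real *. real +. imag *. imag > 4.0 then iters
  else
    mandel_converger
      (real *. real -. imag *. imag +. creal)
      (2.0 *. real *. imag +. cimag)
      (iters + 1) creal cimag

(* Start the orbit at z = c with a count of zero, as mandleconverge does. *)
let mandel_converge real imag = mandel_converger real imag 0 real imag

(* Same thresholds as printdensity: higher counts give lighter characters. *)
let density_char d =
  if d > 8 then ' ' else if d > 4 then '.' else if d > 2 then '+' else '*'

(* Print one row of the plot for a fixed imaginary coordinate y. *)
let print_row ~xmin ~xmax ~xstep ~y =
  let rec go x =
    if x < xmax then begin
      print_char (density_char (mandel_converge x y));
      go (x +. xstep)
    end
  in
  go xmin;
  print_newline ()
```

For example, `print_row ~xmin:(-2.3) ~xmax:1.6 ~xstep:0.05 ~y:0.0` prints a single line along the real axis. Points inside the set, such as the origin, reach the 256 cap and print as spaces.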
Given that we are limited to using putchard -here, our amazing graphical output is limited, but we can whip together -something using the density plotter above:</p> - -<div class="doc_code"> -<pre> -# compute and plot the Mandelbrot set with the specified 2 dimensional range -# info. -def mandelhelp(xmin xmax xstep ymin ymax ystep) - for y = ymin, y < ymax, ystep in ( - (for x = xmin, x < xmax, xstep in - printdensity(mandleconverge(x,y))) - : putchard(10) - ) - -# mandel - This is a convenient helper function for plotting the Mandelbrot set -# from the specified position with the specified magnification. -def mandel(realstart imagstart realmag imagmag) - mandelhelp(realstart, realstart+realmag*78, realmag, - imagstart, imagstart+imagmag*40, imagmag); -</pre> -</div> - -<p>Given this, we can try plotting the Mandelbrot set! Let's try it out:</p> - -<div class="doc_code"> -<pre> -ready> <b>mandel(-2.3, -1.3, 0.05, 0.07);</b> -*******************************+++++++++++************************************* -*************************+++++++++++++++++++++++******************************* -**********************+++++++++++++++++++++++++++++**************************** -*******************+++++++++++++++++++++.. ...++++++++************************* -*****************++++++++++++++++++++++.... ...+++++++++*********************** -***************+++++++++++++++++++++++..... ...+++++++++********************* -**************+++++++++++++++++++++++.... ....+++++++++******************** -*************++++++++++++++++++++++...... .....++++++++******************* -************+++++++++++++++++++++....... .......+++++++****************** -***********+++++++++++++++++++.... ... .+++++++***************** -**********+++++++++++++++++....... .+++++++**************** -*********++++++++++++++........... ...+++++++*************** -********++++++++++++............ ...++++++++************** -********++++++++++... .......... .++++++++************** -*******+++++++++.....
.+++++++++************* -*******++++++++...... ..+++++++++************* -*******++++++....... ..+++++++++************* -*******+++++...... ..+++++++++************* -*******.... .... ...+++++++++************* -*******.... . ...+++++++++************* -*******+++++...... ...+++++++++************* -*******++++++....... ..+++++++++************* -*******++++++++...... .+++++++++************* -*******+++++++++..... ..+++++++++************* -********++++++++++... .......... .++++++++************** -********++++++++++++............ ...++++++++************** -*********++++++++++++++.......... ...+++++++*************** -**********++++++++++++++++........ .+++++++**************** -**********++++++++++++++++++++.... ... ..+++++++**************** -***********++++++++++++++++++++++....... .......++++++++***************** -************+++++++++++++++++++++++...... ......++++++++****************** -**************+++++++++++++++++++++++.... ....++++++++******************** -***************+++++++++++++++++++++++..... ...+++++++++********************* -*****************++++++++++++++++++++++.... 
...++++++++*********************** -*******************+++++++++++++++++++++......++++++++************************* -*********************++++++++++++++++++++++.++++++++*************************** -*************************+++++++++++++++++++++++******************************* -******************************+++++++++++++************************************ -******************************************************************************* -******************************************************************************* -******************************************************************************* -Evaluated to 0.000000 -ready> <b>mandel(-2, -1, 0.02, 0.04);</b> -**************************+++++++++++++++++++++++++++++++++++++++++++++++++++++ -***********************++++++++++++++++++++++++++++++++++++++++++++++++++++++++ -*********************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++. -*******************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++... -*****************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++..... -***************++++++++++++++++++++++++++++++++++++++++++++++++++++++++........ -**************++++++++++++++++++++++++++++++++++++++++++++++++++++++........... -************+++++++++++++++++++++++++++++++++++++++++++++++++++++.............. -***********++++++++++++++++++++++++++++++++++++++++++++++++++........ . -**********++++++++++++++++++++++++++++++++++++++++++++++............. -********+++++++++++++++++++++++++++++++++++++++++++.................. -*******+++++++++++++++++++++++++++++++++++++++....................... -******+++++++++++++++++++++++++++++++++++........................... -*****++++++++++++++++++++++++++++++++............................ -*****++++++++++++++++++++++++++++............................... -****++++++++++++++++++++++++++...... ......................... -***++++++++++++++++++++++++......... ...... ........... -***++++++++++++++++++++++............ 
-**+++++++++++++++++++++.............. -**+++++++++++++++++++................ -*++++++++++++++++++................. -*++++++++++++++++............ ... -*++++++++++++++.............. -*+++....++++................ -*.......... ........... -* -*.......... ........... -*+++....++++................ -*++++++++++++++.............. -*++++++++++++++++............ ... -*++++++++++++++++++................. -**+++++++++++++++++++................ -**+++++++++++++++++++++.............. -***++++++++++++++++++++++............ -***++++++++++++++++++++++++......... ...... ........... -****++++++++++++++++++++++++++...... ......................... -*****++++++++++++++++++++++++++++............................... -*****++++++++++++++++++++++++++++++++............................ -******+++++++++++++++++++++++++++++++++++........................... -*******+++++++++++++++++++++++++++++++++++++++....................... -********+++++++++++++++++++++++++++++++++++++++++++.................. -Evaluated to 0.000000 -ready> <b>mandel(-0.9, -1.4, 0.02, 0.03);</b> -******************************************************************************* -******************************************************************************* -******************************************************************************* -**********+++++++++++++++++++++************************************************ -*+++++++++++++++++++++++++++++++++++++++*************************************** -+++++++++++++++++++++++++++++++++++++++++++++********************************** -++++++++++++++++++++++++++++++++++++++++++++++++++***************************** -++++++++++++++++++++++++++++++++++++++++++++++++++++++************************* -+++++++++++++++++++++++++++++++++++++++++++++++++++++++++********************** -+++++++++++++++++++++++++++++++++.........++++++++++++++++++******************* -+++++++++++++++++++++++++++++++.... ......+++++++++++++++++++**************** -+++++++++++++++++++++++++++++....... 
........+++++++++++++++++++************** -++++++++++++++++++++++++++++........ ........++++++++++++++++++++************ -+++++++++++++++++++++++++++......... .. ...+++++++++++++++++++++********** -++++++++++++++++++++++++++........... ....++++++++++++++++++++++******** -++++++++++++++++++++++++............. .......++++++++++++++++++++++****** -+++++++++++++++++++++++............. ........+++++++++++++++++++++++**** -++++++++++++++++++++++........... ..........++++++++++++++++++++++*** -++++++++++++++++++++........... .........++++++++++++++++++++++* -++++++++++++++++++............ ...........++++++++++++++++++++ -++++++++++++++++............... .............++++++++++++++++++ -++++++++++++++................. ...............++++++++++++++++ -++++++++++++.................. .................++++++++++++++ -+++++++++.................. .................+++++++++++++ -++++++........ . ......... ..++++++++++++ -++............ ...... ....++++++++++ -.............. ...++++++++++ -.............. ....+++++++++ -.............. .....++++++++ -............. ......++++++++ -........... .......++++++++ -......... ........+++++++ -......... ........+++++++ -......... ....+++++++ -........ ...+++++++ -....... ...+++++++ - ....+++++++ - .....+++++++ - ....+++++++ - ....+++++++ - ....+++++++ -Evaluated to 0.000000 -ready> <b>^D</b> -</pre> -</div> - -<p>At this point, you may be starting to realize that Kaleidoscope is a real -and powerful language. It may not be self-similar :), but it can be used to -plot things that are!</p> - -<p>With this, we conclude the "adding user-defined operators" chapter of the -tutorial. We have successfully augmented our language, adding the ability to -extend the language in the library, and we have shown how this can be used to -build a simple but interesting end-user application in Kaleidoscope. 
At this -point, Kaleidoscope can build a variety of applications that are functional and -can call functions with side-effects, but it can't actually define and mutate a -variable itself.</p> - -<p>Strangely enough, variable mutation is an important feature of some -languages, and it is not at all obvious how to <a href="OCamlLangImpl7.html">add -support for mutable variables</a> without having to add an "SSA construction" -phase to your front-end. In the next chapter, we will describe how you can -add variable mutation without building SSA in your front-end.</p> - -</div> - - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="code">Full Code Listing</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p> -Here is the complete code listing for our running example, enhanced with -support for user-defined operators. To build this example, use: -</p> - -<div class="doc_code"> -<pre> -# Compile -ocamlbuild toy.byte -# Run -./toy.byte -</pre> -</div> - -<p>Here is the code:</p> - -<dl> -<dt>_tags:</dt> -<dd class="doc_code"> -<pre> -<{lexer,parser}.ml>: use_camlp4, pp(camlp4of) -<*.{byte,native}>: g++, use_llvm, use_llvm_analysis -<*.{byte,native}>: use_llvm_executionengine, use_llvm_target -<*.{byte,native}>: use_llvm_scalar_opts, use_bindings -</pre> -</dd> - -<dt>myocamlbuild.ml:</dt> -<dd class="doc_code"> -<pre> -open Ocamlbuild_plugin;; - -ocaml_lib ~extern:true "llvm";; -ocaml_lib ~extern:true "llvm_analysis";; -ocaml_lib ~extern:true "llvm_executionengine";; -ocaml_lib ~extern:true "llvm_target";; -ocaml_lib ~extern:true "llvm_scalar_opts";; - -flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"; A"-cclib"; A"-rdynamic"]);; -dep ["link"; "ocaml"; "use_bindings"] ["bindings.o"];; -</pre> -</dd> - -<dt>token.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Lexer
Tokens - *===----------------------------------------------------------------------===*) - -(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of - * these others for known things. *) -type token = - (* commands *) - | Def | Extern - - (* primary *) - | Ident of string | Number of float - - (* unknown *) - | Kwd of char - - (* control *) - | If | Then | Else - | For | In - - (* operators *) - | Binary | Unary -</pre> -</dd> - -<dt>lexer.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Lexer - *===----------------------------------------------------------------------===*) - -let rec lex = parser - (* Skip any whitespace. *) - | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream - - (* identifier: [a-zA-Z][a-zA-Z0-9] *) - | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_ident buffer stream - - (* number: [0-9.]+ *) - | [< ' ('0' .. '9' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_number buffer stream - - (* Comment until end of line. *) - | [< ' ('#'); stream >] -> - lex_comment stream - - (* Otherwise, just return the character as its ascii value. *) - | [< 'c; stream >] -> - [< 'Token.Kwd c; lex stream >] - - (* end of stream. *) - | [< >] -> [< >] - -and lex_number buffer = parser - | [< ' ('0' .. '9' | '.' as c); stream >] -> - Buffer.add_char buffer c; - lex_number buffer stream - | [< stream=lex >] -> - [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] - -and lex_ident buffer = parser - | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. 
'9' as c); stream >] -> - Buffer.add_char buffer c; - lex_ident buffer stream - | [< stream=lex >] -> - match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | "if" -> [< 'Token.If; stream >] - | "then" -> [< 'Token.Then; stream >] - | "else" -> [< 'Token.Else; stream >] - | "for" -> [< 'Token.For; stream >] - | "in" -> [< 'Token.In; stream >] - | "binary" -> [< 'Token.Binary; stream >] - | "unary" -> [< 'Token.Unary; stream >] - | id -> [< 'Token.Ident id; stream >] - -and lex_comment = parser - | [< ' ('\n'); stream=lex >] -> stream - | [< 'c; e=lex_comment >] -> e - | [< >] -> [< >] -</pre> -</dd> - -<dt>ast.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Abstract Syntax Tree (aka Parse Tree) - *===----------------------------------------------------------------------===*) - -(* expr - Base type for all expression nodes. *) -type expr = - (* variant for numeric literals like "1.0". *) - | Number of float - - (* variant for referencing a variable, like "a". *) - | Variable of string - - (* variant for a unary operator. *) - | Unary of char * expr - - (* variant for a binary operator. *) - | Binary of char * expr * expr - - (* variant for function calls. *) - | Call of string * expr array - - (* variant for if/then/else. *) - | If of expr * expr * expr - - (* variant for for/in. *) - | For of string * expr * expr * expr option * expr - -(* proto - This type represents the "prototype" for a function, which captures - * its name, and its argument names (thus implicitly the number of arguments the - * function takes). *) -type proto = - | Prototype of string * string array - | BinOpPrototype of string * string array * int - -(* func - This type represents a function definition itself. 
*) -type func = Function of proto * expr -</pre> -</dd> - -<dt>parser.ml:</dt> -<dd class="doc_code"> -<pre> -(*===---------------------------------------------------------------------=== - * Parser - *===---------------------------------------------------------------------===*) - -(* binop_precedence - This holds the precedence for each binary operator that is - * defined *) -let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 - -(* precedence - Get the precedence of the pending binary operator token. *) -let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 - -(* primary - * ::= identifier - * ::= numberexpr - * ::= parenexpr - * ::= ifexpr - * ::= forexpr *) -let rec parse_primary = parser - (* numberexpr ::= number *) - | [< 'Token.Number n >] -> Ast.Number n - - (* parenexpr ::= '(' expression ')' *) - | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e - - (* identifierexpr - * ::= identifier - * ::= identifier '(' argumentexpr ')' *) - | [< 'Token.Ident id; stream >] -> - let rec parse_args accumulator = parser - | [< e=parse_expr; stream >] -> - begin parser - | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e - | [< >] -> e :: accumulator - end stream - | [< >] -> accumulator - in - let rec parse_ident id = parser - (* Call. *) - | [< 'Token.Kwd '('; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')'">] -> - Ast.Call (id, Array.of_list (List.rev args)) - - (* Simple variable ref. *) - | [< >] -> Ast.Variable id - in - parse_ident id stream - - (* ifexpr ::= 'if' expr 'then' expr 'else' expr *) - | [< 'Token.If; c=parse_expr; - 'Token.Then ?? "expected 'then'"; t=parse_expr; - 'Token.Else ?? "expected 'else'"; e=parse_expr >] -> - Ast.If (c, t, e) - - (* forexpr - ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression *) - | [< 'Token.For; - 'Token.Ident id ?? "expected identifier after for"; - 'Token.Kwd '=' ?? 
"expected '=' after for"; - stream >] -> - begin parser - | [< - start=parse_expr; - 'Token.Kwd ',' ?? "expected ',' after for"; - end_=parse_expr; - stream >] -> - let step = - begin parser - | [< 'Token.Kwd ','; step=parse_expr >] -> Some step - | [< >] -> None - end stream - in - begin parser - | [< 'Token.In; body=parse_expr >] -> - Ast.For (id, start, end_, step, body) - | [< >] -> - raise (Stream.Error "expected 'in' after for") - end stream - | [< >] -> - raise (Stream.Error "expected '=' after for") - end stream - - | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") - -(* unary - * ::= primary - * ::= '!' unary *) -and parse_unary = parser - (* If this is a unary operator, read it. *) - | [< 'Token.Kwd op when op != '(' && op != ')'; operand=parse_expr >] -> - Ast.Unary (op, operand) - - (* If the current token is not an operator, it must be a primary expr. *) - | [< stream >] -> parse_primary stream - -(* binoprhs - * ::= ('+' primary)* *) -and parse_bin_rhs expr_prec lhs stream = - match Stream.peek stream with - (* If this is a binop, find its precedence. *) - | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> - let token_prec = precedence c in - - (* If this is a binop that binds at least as tightly as the current binop, - * consume it, otherwise we are done. *) - if token_prec < expr_prec then lhs else begin - (* Eat the binop. *) - Stream.junk stream; - - (* Parse the unary expression after the binary operator. *) - let rhs = parse_unary stream in - - (* Okay, we know this is a binop. *) - let rhs = - match Stream.peek stream with - | Some (Token.Kwd c2) -> - (* If BinOp binds less tightly with rhs than the operator after - * rhs, let the pending operator take rhs as its lhs. *) - let next_prec = precedence c2 in - if token_prec < next_prec - then parse_bin_rhs (token_prec + 1) rhs stream - else rhs - | _ -> rhs - in - - (* Merge lhs/rhs. 
*) - let lhs = Ast.Binary (c, lhs, rhs) in - parse_bin_rhs expr_prec lhs stream - end - | _ -> lhs - -(* expression - * ::= primary binoprhs *) -and parse_expr = parser - | [< lhs=parse_unary; stream >] -> parse_bin_rhs 0 lhs stream - -(* prototype - * ::= id '(' id* ')' - * ::= binary LETTER number? (id, id) - * ::= unary LETTER number? (id) *) -let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - let parse_operator = parser - | [< 'Token.Unary >] -> "unary", 1 - | [< 'Token.Binary >] -> "binary", 2 - in - let parse_binary_precedence = parser - | [< 'Token.Number n >] -> int_of_float n - | [< >] -> 30 - in - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - (* success. *) - Ast.Prototype (id, Array.of_list (List.rev args)) - | [< (prefix, kind)=parse_operator; - 'Token.Kwd op ?? "expected an operator"; - (* Read the precedence if present. *) - binary_precedence=parse_binary_precedence; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - let name = prefix ^ (String.make 1 op) in - let args = Array.of_list (List.rev args) in - - (* Verify right number of arguments for operator. *) - if Array.length args != kind - then raise (Stream.Error "invalid number of operands for operator") - else - if kind == 1 then - Ast.Prototype (name, args) - else - Ast.BinOpPrototype (name, args, binary_precedence) - | [< >] -> - raise (Stream.Error "expected function name in prototype") - -(* definition ::= 'def' prototype expression *) -let parse_definition = parser - | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> - Ast.Function (p, e) - -(* toplevelexpr ::= expression *) -let parse_toplevel = parser - | [< e=parse_expr >] -> - (* Make an anonymous proto. 
*) - Ast.Function (Ast.Prototype ("", [||]), e) - -(* external ::= 'extern' prototype *) -let parse_extern = parser - | [< 'Token.Extern; e=parse_prototype >] -> e -</pre> -</dd> - -<dt>codegen.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Code Generation - *===----------------------------------------------------------------------===*) - -open Llvm - -exception Error of string - -let context = global_context () -let the_module = create_module context "my cool jit" -let builder = builder context -let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 -let double_type = double_type context - -let rec codegen_expr = function - | Ast.Number n -> const_float double_type n - | Ast.Variable name -> - (try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name")) - | Ast.Unary (op, operand) -> - let operand = codegen_expr operand in - let callee = "unary" ^ (String.make 1 op) in - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown unary operator") - in - build_call callee [|operand|] "unop" builder - | Ast.Binary (op, lhs, rhs) -> - let lhs_val = codegen_expr lhs in - let rhs_val = codegen_expr rhs in - begin - match op with - | '+' -> build_add lhs_val rhs_val "addtmp" builder - | '-' -> build_sub lhs_val rhs_val "subtmp" builder - | '*' -> build_mul lhs_val rhs_val "multmp" builder - | '<' -> - (* Convert bool 0/1 to double 0.0 or 1.0 *) - let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in - build_uitofp i double_type "booltmp" builder - | _ -> - (* If it wasn't a builtin binary operator, it must be a user defined - * one. Emit a call to it. 
let callee = "binary" ^ (String.make 1 op) in - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "binary operator not found!") - in - build_call callee [|lhs_val; rhs_val|] "binop" builder - end - | Ast.Call (callee, args) -> - (* Look up the name in the module table. *) - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown function referenced") - in - let params = params callee in - - (* Reject calls with the wrong number of arguments. *) - if Array.length params == Array.length args then () else - raise (Error "incorrect # arguments passed"); - let args = Array.map codegen_expr args in - build_call callee args "calltmp" builder - | Ast.If (cond, then_, else_) -> - let cond = codegen_expr cond in - - (* Convert condition to a bool by comparing non-equal to 0.0 *) - let zero = const_float double_type 0.0 in - let cond_val = build_fcmp Fcmp.One cond zero "ifcond" builder in - - (* Grab the first block so that we might later add the conditional branch - * to it at the end of the function. *) - let start_bb = insertion_block builder in - let the_function = block_parent start_bb in - - let then_bb = append_block context "then" the_function in - - (* Emit 'then' value. *) - position_at_end then_bb builder; - let then_val = codegen_expr then_ in - - (* Codegen of 'then' can change the current block, update then_bb for the - * phi. We create a new name because one is used for the phi node, and the - * other is used for the conditional branch. *) - let new_then_bb = insertion_block builder in - - (* Emit 'else' value. *) - let else_bb = append_block context "else" the_function in - position_at_end else_bb builder; - let else_val = codegen_expr else_ in - - (* Codegen of 'else' can change the current block, update else_bb for the - * phi. *) - let new_else_bb = insertion_block builder in - - (* Emit merge block.
let merge_bb = append_block context "ifcont" the_function in - position_at_end merge_bb builder; - let incoming = [(then_val, new_then_bb); (else_val, new_else_bb)] in - let phi = build_phi incoming "iftmp" builder in - - (* Return to the start block to add the conditional branch. *) - position_at_end start_bb builder; - ignore (build_cond_br cond_val then_bb else_bb builder); - - (* Set an unconditional branch at the end of the 'then' block and the - * 'else' block to the 'merge' block. *) - position_at_end new_then_bb builder; ignore (build_br merge_bb builder); - position_at_end new_else_bb builder; ignore (build_br merge_bb builder); - - (* Finally, set the builder to the end of the merge block. *) - position_at_end merge_bb builder; - - phi - | Ast.For (var_name, start, end_, step, body) -> - (* Emit the start code first, without 'variable' in scope. *) - let start_val = codegen_expr start in - - (* Make the new basic block for the loop header, inserting after current - * block. *) - let preheader_bb = insertion_block builder in - let the_function = block_parent preheader_bb in - let loop_bb = append_block context "loop" the_function in - - (* Insert an explicit fall through from the current block to the - * loop_bb. *) - ignore (build_br loop_bb builder); - - (* Start insertion in loop_bb. *) - position_at_end loop_bb builder; - - (* Start the PHI node with an entry for start. *) - let variable = build_phi [(start_val, preheader_bb)] var_name builder in - - (* Within the loop, the variable is defined equal to the PHI node. If it - * shadows an existing variable, we have to restore it, so save it - * now. *) - let old_val = - try Some (Hashtbl.find named_values var_name) with Not_found -> None - in - Hashtbl.add named_values var_name variable; - - (* Emit the body of the loop. This, like any other expr, can change the - * current BB.
Note that we ignore the value computed by the body, but - * don't allow an error. *) - ignore (codegen_expr body); - - (* Emit the step value. *) - let step_val = - match step with - | Some step -> codegen_expr step - (* If not specified, use 1.0. *) - | None -> const_float double_type 1.0 - in - - let next_var = build_add variable step_val "nextvar" builder in - - (* Compute the end condition. *) - let end_cond = codegen_expr end_ in - - (* Convert condition to a bool by comparing non-equal to 0.0. *) - let zero = const_float double_type 0.0 in - let end_cond = build_fcmp Fcmp.One end_cond zero "loopcond" builder in - - (* Create the "after loop" block and insert it. *) - let loop_end_bb = insertion_block builder in - let after_bb = append_block context "afterloop" the_function in - - (* Insert the conditional branch into the end of loop_end_bb. *) - ignore (build_cond_br end_cond loop_bb after_bb builder); - - (* Any new code will be inserted in after_bb. *) - position_at_end after_bb builder; - - (* Add a new entry to the PHI node for the backedge. *) - add_incoming (next_var, loop_end_bb) variable; - - (* Restore the unshadowed variable, or drop the loop variable if it - * did not shadow anything. *) - begin match old_val with - | Some old_val -> Hashtbl.replace named_values var_name old_val - | None -> Hashtbl.remove named_values var_name - end; - - (* for expr always returns 0.0. *) - const_null double_type - -let codegen_proto = function - | Ast.Prototype (name, args) | Ast.BinOpPrototype (name, args, _) -> - (* Make the function type: double(double,double) etc. *) - let doubles = Array.make (Array.length args) double_type in - let ft = function_type double_type doubles in - let f = - match lookup_function name the_module with - | None -> declare_function name ft the_module - - (* If 'f' conflicted, there was already something named 'name'. If it - * has a body, don't allow redefinition or reextern. *) - | Some f -> - (* If 'f' already has a body, reject this.
*) - if block_begin f <> At_end f then - raise (Error "redefinition of function"); - - (* If 'f' took a different number of arguments, reject. *) - if element_type (type_of f) <> ft then - raise (Error "redefinition of function with different # args"); - f - in - - (* Set names for all arguments. *) - Array.iteri (fun i a -> - let n = args.(i) in - set_value_name n a; - Hashtbl.add named_values n a; - ) (params f); - f - -let codegen_func the_fpm = function - | Ast.Function (proto, body) -> - Hashtbl.clear named_values; - let the_function = codegen_proto proto in - - (* If this is an operator, install it. *) - begin match proto with - | Ast.BinOpPrototype (name, args, prec) -> - let op = name.[String.length name - 1] in - Hashtbl.add Parser.binop_precedence op prec; - | _ -> () - end; - - (* Create a new basic block to start insertion into. *) - let bb = append_block context "entry" the_function in - position_at_end bb builder; - - try - let ret_val = codegen_expr body in - - (* Finish off the function. *) - let _ = build_ret ret_val builder in - - (* Validate the generated code, checking for consistency. *) - Llvm_analysis.assert_valid_function the_function; - - (* Optimize the function. *) - let _ = PassManager.run_function the_function the_fpm in - - the_function - with e -> - delete_function the_function; - raise e -</pre> -</dd> - -<dt>toplevel.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Top-Level parsing and JIT Driver - *===----------------------------------------------------------------------===*) - -open Llvm -open Llvm_executionengine - -(* top ::= definition | external | expression | ';' *) -let rec main_loop the_fpm the_execution_engine stream = - match Stream.peek stream with - | None -> () - - (* ignore top-level semicolons. 
*) - | Some (Token.Kwd ';') -> - Stream.junk stream; - main_loop the_fpm the_execution_engine stream - - | Some token -> - begin - try match token with - | Token.Def -> - let e = Parser.parse_definition stream in - print_endline "parsed a function definition."; - dump_value (Codegen.codegen_func the_fpm e); - | Token.Extern -> - let e = Parser.parse_extern stream in - print_endline "parsed an extern."; - dump_value (Codegen.codegen_proto e); - | _ -> - (* Evaluate a top-level expression into an anonymous function. *) - let e = Parser.parse_toplevel stream in - print_endline "parsed a top-level expr"; - let the_function = Codegen.codegen_func the_fpm e in - dump_value the_function; - - (* JIT the function, returning a function pointer. *) - let result = ExecutionEngine.run_function the_function [||] - the_execution_engine in - - print_string "Evaluated to "; - print_float (GenericValue.as_float Codegen.double_type result); - print_newline (); - with Stream.Error s | Codegen.Error s -> - (* Skip token for error recovery. *) - Stream.junk stream; - print_endline s; - end; - print_string "ready> "; flush stdout; - main_loop the_fpm the_execution_engine stream -</pre> -</dd> - -<dt>toy.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Main driver code. - *===----------------------------------------------------------------------===*) - -open Llvm -open Llvm_executionengine -open Llvm_target -open Llvm_scalar_opts - -let main () = - ignore (initialize_native_target ()); - - (* Install standard binary operators. - * 1 is the lowest precedence. *) - Hashtbl.add Parser.binop_precedence '<' 10; - Hashtbl.add Parser.binop_precedence '+' 20; - Hashtbl.add Parser.binop_precedence '-' 20; - Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) - - (* Prime the first token. *) - print_string "ready> "; flush stdout; - let stream = Lexer.lex (Stream.of_channel stdin) in - - (* Create the JIT. 
*) - let the_execution_engine = ExecutionEngine.create Codegen.the_module in - let the_fpm = PassManager.create_function Codegen.the_module in - - (* Set up the optimizer pipeline. Start with registering info about how the - * target lays out data structures. *) - TargetData.add (ExecutionEngine.target_data the_execution_engine) the_fpm; - - (* Do simple "peephole" optimizations and bit-twiddling optzn. *) - add_instruction_combination the_fpm; - - (* reassociate expressions. *) - add_reassociation the_fpm; - - (* Eliminate Common SubExpressions. *) - add_gvn the_fpm; - - (* Simplify the control flow graph (deleting unreachable blocks, etc). *) - add_cfg_simplification the_fpm; - - ignore (PassManager.initialize the_fpm); - - (* Run the main "interpreter loop" now. *) - Toplevel.main_loop the_fpm the_execution_engine stream; - - (* Print out all the generated code. *) - dump_module Codegen.the_module -;; - -main () -</pre> -</dd> - -<dt>bindings.c</dt> -<dd class="doc_code"> -<pre> -#include <stdio.h> - -/* putchard - putchar that takes a double and returns 0. */ -extern double putchard(double X) { - putchar((char)X); - return 0; -} - -/* printd - printf that takes a double prints it as "%f\n", returning 0. 
*/ -extern double printd(double X) { - printf("%f\n", X); - return 0; -} -</pre> -</dd> -</dl> - -<a href="OCamlLangImpl7.html">Next: Extending the language: mutable variables / -SSA construction</a> -</div> - -<!-- *********************************************************************** --> -<hr> -<address> - <a href="http://jigsaw.w3.org/css-validator/check/referer"><img - src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a> - <a href="http://validator.w3.org/check/referer"><img - src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a> - - <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> - <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a><br> - <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br> - Last modified: $Date$ -</address> -</body> -</html> diff --git a/docs/tutorial/OCamlLangImpl7.html b/docs/tutorial/OCamlLangImpl7.html deleted file mode 100644 index c140888..0000000 --- a/docs/tutorial/OCamlLangImpl7.html +++ /dev/null @@ -1,1907 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> - -<html> -<head> - <title>Kaleidoscope: Extending the Language: Mutable Variables / SSA - construction</title> - <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> - <meta name="author" content="Chris Lattner"> - <meta name="author" content="Erick Tryzelaar"> - <link rel="stylesheet" href="../llvm.css" type="text/css"> -</head> - -<body> - -<div class="doc_title">Kaleidoscope: Extending the Language: Mutable Variables</div> - -<ul> -<li><a href="index.html">Up to Tutorial Index</a></li> -<li>Chapter 7 - <ol> - <li><a href="#intro">Chapter 7 Introduction</a></li> - <li><a href="#why">Why is this a hard problem?</a></li> - <li><a href="#memory">Memory in LLVM</a></li> - <li><a href="#kalvars">Mutable Variables in Kaleidoscope</a></li> - <li><a href="#adjustments">Adjusting Existing Variables for - Mutation</a></li> - <li><a 
href="#assignment">New Assignment Operator</a></li> - <li><a href="#localvars">User-defined Local Variables</a></li> - <li><a href="#code">Full Code Listing</a></li> - </ol> -</li> -<li><a href="LangImpl8.html">Chapter 8</a>: Conclusion and other useful LLVM - tidbits</li> -</ul> - -<div class="doc_author"> - <p> - Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a> - and <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a> - </p> -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="intro">Chapter 7 Introduction</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Welcome to Chapter 7 of the "<a href="index.html">Implementing a language -with LLVM</a>" tutorial. In chapters 1 through 6, we've built a very -respectable, albeit simple, <a -href="http://en.wikipedia.org/wiki/Functional_programming">functional -programming language</a>. In our journey, we learned some parsing techniques, -how to build and represent an AST, how to build LLVM IR, and how to optimize -the resultant code as well as JIT compile it.</p> - -<p>While Kaleidoscope is interesting as a functional language, the fact that it -is functional makes it "too easy" to generate LLVM IR for it. In particular, a -functional language makes it very easy to build LLVM IR directly in <a -href="http://en.wikipedia.org/wiki/Static_single_assignment_form">SSA form</a>. 
-Since LLVM requires that the input code be in SSA form, this is a very nice -property and it is often unclear to newcomers how to generate code for an -imperative language with mutable variables.</p> - -<p>The short (and happy) summary of this chapter is that there is no need for -your front-end to build SSA form: LLVM provides highly tuned and well tested -support for this, though the way it works is a bit unexpected for some.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="why">Why is this a hard problem?</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p> -To understand why mutable variables cause complexities in SSA construction, -consider this extremely simple C example: -</p> - -<div class="doc_code"> -<pre> -int G, H; -int test(_Bool Condition) { - int X; - if (Condition) - X = G; - else - X = H; - return X; -} -</pre> -</div> - -<p>In this case, we have the variable "X", whose value depends on the path -executed in the program. Because there are two different possible values for X -before the return instruction, a PHI node is inserted to merge the two values. -The LLVM IR that we want for this example looks like this:</p> - -<div class="doc_code"> -<pre> -@G = weak global i32 0 ; type of @G is i32* -@H = weak global i32 0 ; type of @H is i32* - -define i32 @test(i1 %Condition) { -entry: - br i1 %Condition, label %cond_true, label %cond_false - -cond_true: - %X.0 = load i32* @G - br label %cond_next - -cond_false: - %X.1 = load i32* @H - br label %cond_next - -cond_next: - %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ] - ret i32 %X.2 -} -</pre> -</div> - -<p>In this example, the loads from the G and H global variables are explicit in -the LLVM IR, and they live in the then/else branches of the if statement -(cond_true/cond_false). 
In order to merge the incoming values, the X.2 phi node -in the cond_next block selects the right value to use based on where control -flow is coming from: if control flow comes from the cond_false block, X.2 gets -the value of X.1. Alternatively, if control flow comes from cond_true, it gets -the value of X.0. The intent of this chapter is not to explain the details of -SSA form. For more information, see one of the many <a -href="http://en.wikipedia.org/wiki/Static_single_assignment_form">online -references</a>.</p> - -<p>The question for this article is "who places the phi nodes when lowering -assignments to mutable variables?". The issue here is that LLVM -<em>requires</em> that its IR be in SSA form: there is no "non-ssa" mode for it. -However, SSA construction requires non-trivial algorithms and data structures, -so it is inconvenient and wasteful for every front-end to have to reproduce this -logic.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="memory">Memory in LLVM</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>The 'trick' here is that while LLVM does require all register values to be -in SSA form, it does not require (or permit) memory objects to be in SSA form. -In the example above, note that the loads from G and H are direct accesses to -G and H: they are not renamed or versioned. This differs from some other -compiler systems, which do try to version memory objects. In LLVM, instead of -encoding dataflow analysis of memory into the LLVM IR, it is handled with <a -href="../WritingAnLLVMPass.html">Analysis Passes</a> which are computed on -demand.</p> - -<p> -With this in mind, the high-level idea is that we want to make a stack variable -(which lives in memory, because it is on the stack) for each mutable object in -a function. 
To take advantage of this trick, we need to talk about how LLVM -represents stack variables. -</p> - -<p>In LLVM, all memory accesses are explicit with load/store instructions, and -it is carefully designed not to have (or need) an "address-of" operator. Notice -how the type of the @G/@H global variables is actually "i32*" even though the -variable is defined as "i32". What this means is that @G defines <em>space</em> -for an i32 in the global data area, but its <em>name</em> actually refers to the -address for that space. Stack variables work the same way, except that instead of -being declared with global variable definitions, they are declared with the -<a href="../LangRef.html#i_alloca">LLVM alloca instruction</a>:</p> - -<div class="doc_code"> -<pre> -define i32 @example() { -entry: - %X = alloca i32 ; type of %X is i32*. - ... - %tmp = load i32* %X ; load the stack value %X from the stack. - %tmp2 = add i32 %tmp, 1 ; increment it - store i32 %tmp2, i32* %X ; store it back - ... -</pre> -</div> - -<p>This code shows an example of how you can declare and manipulate a stack -variable in the LLVM IR. Stack memory allocated with the alloca instruction is -fully general: you can pass the address of the stack slot to functions, you can -store it in other variables, etc. In our example above, we could rewrite the -example to use the alloca technique to avoid using a PHI node:</p> - -<div class="doc_code"> -<pre> -@G = weak global i32 0 ; type of @G is i32* -@H = weak global i32 0 ; type of @H is i32* - -define i32 @test(i1 %Condition) { -entry: - %X = alloca i32 ; type of %X is i32*. 
- br i1 %Condition, label %cond_true, label %cond_false - -cond_true: - %X.0 = load i32* @G - store i32 %X.0, i32* %X ; Update X - br label %cond_next - -cond_false: - %X.1 = load i32* @H - store i32 %X.1, i32* %X ; Update X - br label %cond_next - -cond_next: - %X.2 = load i32* %X ; Read X - ret i32 %X.2 -} -</pre> -</div> - -<p>With this, we have discovered a way to handle arbitrary mutable variables -without the need to create Phi nodes at all:</p> - -<ol> -<li>Each mutable variable becomes a stack allocation.</li> -<li>Each read of the variable becomes a load from the stack.</li> -<li>Each update of the variable becomes a store to the stack.</li> -<li>Taking the address of a variable just uses the stack address directly.</li> -</ol> - -<p>While this solution has solved our immediate problem, it introduced another -one: we have now apparently introduced a lot of stack traffic for very simple -and common operations, a major performance problem. Fortunately for us, the -LLVM optimizer has a highly-tuned optimization pass named "mem2reg" that handles -this case, promoting allocas like this into SSA registers, inserting Phi nodes -as appropriate. If you run this example through the pass, for example, you'll -get:</p> - -<div class="doc_code"> -<pre> -$ <b>llvm-as < example.ll | opt -mem2reg | llvm-dis</b> -@G = weak global i32 0 -@H = weak global i32 0 - -define i32 @test(i1 %Condition) { -entry: - br i1 %Condition, label %cond_true, label %cond_false - -cond_true: - %X.0 = load i32* @G - br label %cond_next - -cond_false: - %X.1 = load i32* @H - br label %cond_next - -cond_next: - %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ] - ret i32 %X.01 -} -</pre> -</div> - -<p>The mem2reg pass implements the standard "iterated dominance frontier" -algorithm for constructing SSA form and has a number of optimizations that speed -up (very common) degenerate cases. 
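The full algorithm has to place PHI nodes across many blocks, and that is where the dominance-frontier machinery comes in. The simplest fast path, a slot used only inside one straight-line block, can be sketched in a few lines. This is a toy, not LLVM's implementation. The instruction type is a hypothetical miniature rather than LLVM IR, and the sketch assumes every load from a promoted slot follows at least one store. Each load is forwarded to the value most recently stored, after which the alloca, its stores and its loads are dead and are dropped:

```ocaml
(* A hypothetical, minimal straight-line IR (not LLVM's). *)
type operand =
  | Const of int
  | Reg of string

type instr =
  | Alloca of string                   (* %slot = alloca i32 *)
  | Store of operand * string          (* store v, %slot     *)
  | Load of string * string            (* %dst = load %slot  *)
  | Add of string * operand * operand  (* %dst = add a, b    *)
  | Ret of operand

(* Promote every alloca in a single block: forward each load to the value
   last stored into its slot, then drop the alloca, stores and loads. *)
let promote_block (block : instr list) : instr list =
  let current = Hashtbl.create 8 in (* slot -> value last stored, if any *)
  let renamed = Hashtbl.create 8 in (* loaded register -> forwarded value *)
  let subst = function
    | Reg r -> (match Hashtbl.find_opt renamed r with Some v -> v | None -> Reg r)
    | c -> c
  in
  List.filter_map
    (fun i ->
      match i with
      | Alloca s -> Hashtbl.replace current s None; None
      | Store (v, s) when Hashtbl.mem current s ->
          Hashtbl.replace current s (Some (subst v)); None
      | Load (dst, s) when Hashtbl.mem current s ->
          (match Hashtbl.find current s with
           | Some v -> Hashtbl.replace renamed dst v
           | None -> failwith "load before any store (undef in LLVM)");
          None
      | Store (v, s) -> Some (Store (subst v, s)) (* a slot we did not allocate *)
      | Load _ -> Some i
      | Add (d, a, b) -> Some (Add (d, subst a, subst b))
      | Ret v -> Some (Ret (subst v)))
    block
```

Feeding it a load/increment/store sequence like the one in the alloca example above, followed by a final load and return, leaves only the add and the return. Control flow between blocks and slots whose address escapes are deliberately out of scope here.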
The mem2reg optimization pass is the answer -to dealing with mutable variables, and we highly recommend that you depend on -it. Note that mem2reg only works on variables in certain circumstances:</p> - -<ol> -<li>mem2reg is alloca-driven: it looks for allocas and if it can handle them, it -promotes them. It does not apply to global variables or heap allocations.</li> - -<li>mem2reg only looks for alloca instructions in the entry block of the -function. Being in the entry block guarantees that the alloca is only executed -once, which makes analysis simpler.</li> - -<li>mem2reg only promotes allocas whose uses are direct loads and stores. If -the address of the stack object is passed to a function, or if any funny pointer -arithmetic is involved, the alloca will not be promoted.</li> - -<li>mem2reg only works on allocas of <a -href="../LangRef.html#t_classifications">first class</a> -values (such as pointers, scalars and vectors), and only if the array size -of the allocation is 1 (or missing in the .ll file). mem2reg is not capable of -promoting structs or arrays to registers. Note that the "scalarrepl" pass is -more powerful and can promote structs, "unions", and arrays in many cases.</li> - -</ol> - -<p> -All of these properties are easy to satisfy for most imperative languages, and -we'll illustrate it below with Kaleidoscope. The final question you may be -asking is: should I bother with this nonsense for my front-end? Wouldn't it be -better if I just did SSA construction directly, avoiding use of the mem2reg -optimization pass? In short, we strongly recommend that you use this technique -for building SSA form, unless there is an extremely good reason not to. Using -this technique is:</p> - -<ul> -<li>Proven and well tested: llvm-gcc and clang both use this technique for local -mutable variables. As such, the most common clients of LLVM are using this to -handle a bulk of their variables. 
You can be sure that bugs are found fast and -fixed early.</li> - -<li>Extremely Fast: mem2reg has a number of special cases that make it fast in -common cases as well as fully general. For example, it has fast-paths for -variables that are only used in a single block, variables that only have one -assignment point, good heuristics to avoid insertion of unneeded phi nodes, etc. -</li> - -<li>Needed for debug info generation: <a href="../SourceLevelDebugging.html"> -Debug information in LLVM</a> relies on having the address of the variable -exposed so that debug info can be attached to it. This technique dovetails -very naturally with this style of debug info.</li> -</ul> - -<p>If nothing else, this makes it much easier to get your front-end up and -running, and is very simple to implement. Let's extend Kaleidoscope with mutable -variables now! -</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="kalvars">Mutable Variables in -Kaleidoscope</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Now that we know the sort of problem we want to tackle, let's see what this -looks like in the context of our little Kaleidoscope language. We're going to -add two features:</p> - -<ol> -<li>The ability to mutate variables with the '=' operator.</li> -<li>The ability to define new variables.</li> -</ol> - -<p>While the first item is really what this is about, we only have variables -for incoming arguments as well as for induction variables, and redefining those only -goes so far :). Also, the ability to define new variables is a -useful thing regardless of whether you will be mutating them. Here's a -motivating example that shows how we could use these:</p> - -<div class="doc_code"> -<pre> -# Define ':' for sequencing: as a low-precedence operator that ignores operands -# and just returns the RHS.
-def binary : 1 (x y) y; - -# Recursive fib, we could do this before. -def fib(x) - if (x < 3) then - 1 - else - fib(x-1)+fib(x-2); - -# Iterative fib. -def fibi(x) - <b>var a = 1, b = 1, c in</b> - (for i = 3, i < x in - <b>c = a + b</b> : - <b>a = b</b> : - <b>b = c</b>) : - b; - -# Call it. -fibi(10); -</pre> -</div> - -<p> -In order to mutate variables, we have to change our existing variables to use -the "alloca trick". Once we have that, we'll add our new operator, then extend -Kaleidoscope to support new variable definitions. -</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="adjustments">Adjusting Existing Variables for -Mutation</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p> -The symbol table in Kaleidoscope is managed at code generation time by the -'<tt>named_values</tt>' map. This map currently keeps track of the LLVM -"Value*" that holds the double value for the named variable. In order to -support mutation, we need to change this slightly, so that -<tt>named_values</tt> holds the <em>memory location</em> of the variable in -question. Note that this change is a refactoring: it changes the structure of -the code, but does not (by itself) change the behavior of the compiler. All of -these changes are isolated in the Kaleidoscope code generator.</p> - -<p> -At this point in Kaleidoscope's development, it only supports variables for two -things: incoming arguments to functions and the induction variable of 'for' -loops. For consistency, we'll allow mutation of these variables in addition to -other user-defined variables. This means that these will both need memory -locations. -</p> - -<p>To start our transformation of Kaleidoscope, we'll change the -<tt>named_values</tt> map so that it maps to AllocaInst* instead of Value*.
-In C++, the compiler would then tell us which parts of the code we need to -update. In OCaml it cannot, because the types involved do not change:</p> - -<p><b>Note:</b> the OCaml bindings currently model both <tt>Value*</tt>s and -<tt>AllocaInst*</tt>s as <tt>Llvm.llvalue</tt>s, but this may change in the -future to be more type-safe.</p> - -<div class="doc_code"> -<pre> -let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 -</pre> -</div> - -<p>Also, since we will need to create these allocas, we'll use a helper -function that ensures that the allocas are created in the entry block of the -function:</p> - -<div class="doc_code"> -<pre> -(* Create an alloca instruction in the entry block of the function. This - * is used for mutable variables etc. *) -let create_entry_block_alloca the_function var_name = - let builder = builder_at (instr_begin (entry_block the_function)) in - build_alloca double_type var_name builder -</pre> -</div> - -<p>This funny-looking code creates an <tt>Llvm.llbuilder</tt> object that is -pointing at the first instruction of the entry block. It then creates an alloca -with the expected name and returns it. Because all values in Kaleidoscope are -doubles, there is no need to pass in a type to use.</p> - -<p>With this in place, the first functionality change we want to make is to -variable references. In our new scheme, variables live on the stack, so code -generating a reference to them actually needs to produce a load from the stack -slot:</p> - -<div class="doc_code"> -<pre> -let rec codegen_expr = function - ... - | Ast.Variable name -> - let v = try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name") - in - <b>(* Load the value. *) - build_load v name builder</b> -</pre> -</div> - -<p>As you can see, this is pretty straightforward. Now we need to update the -things that define the variables to set up the alloca.
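Both the for loop and, later, var/in bind a name for the duration of a body and must afterwards restore whatever that name meant before. One property of OCaml's Hashtbl helps here. Hashtbl.add does not overwrite an existing binding; it pushes a new one that hides it. Hashtbl.remove pops the most recent binding, so the older one becomes visible again. A minimal sketch follows. It uses a hypothetical with_binding helper (not part of the tutorial's code) and strings in place of allocas:

```ocaml
(* Bind [name] to [value] while [body] runs, then restore the previous
   binding (or no binding at all), even if [body] raises. *)
let with_binding (table : ('k, 'v) Hashtbl.t) (name : 'k) (value : 'v)
    (body : unit -> 'a) : 'a =
  Hashtbl.add table name value;  (* hides, but keeps, any older binding *)
  Fun.protect ~finally:(fun () -> Hashtbl.remove table name) body
```

The alternative is to save the old value and call Hashtbl.add with it again. This helper instead leaves exactly one binding per active scope in the table. It also covers the case where nothing was shadowed, because the loop variable's own binding is removed when the body finishes. Fun.protect requires OCaml 4.08 or later.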
We'll start with -<tt>codegen_expr Ast.For ...</tt> (see the <a href="#code">full code listing</a> -for the unabridged code):</p> - -<div class="doc_code"> -<pre> - | Ast.For (var_name, start, end_, step, body) -> - let the_function = block_parent (insertion_block builder) in - - (* Create an alloca for the variable in the entry block. *) - <b>let alloca = create_entry_block_alloca the_function var_name in</b> - - (* Emit the start code first, without 'variable' in scope. *) - let start_val = codegen_expr start in - - <b>(* Store the value into the alloca. *) - ignore(build_store start_val alloca builder);</b> - - ... - - (* Within the loop, the variable is defined equal to the PHI node. If it - * shadows an existing variable, we have to restore it, so save it - * now. *) - let old_val = - try Some (Hashtbl.find named_values var_name) with Not_found -> None - in - <b>Hashtbl.add named_values var_name alloca;</b> - - ... - - (* Compute the end condition. *) - let end_cond = codegen_expr end_ in - - <b>(* Reload, increment, and restore the alloca. This handles the case where - * the body of the loop mutates the variable. *) - let cur_var = build_load alloca var_name builder in - let next_var = build_add cur_var step_val "nextvar" builder in - ignore(build_store next_var alloca builder);</b> - ... -</pre> -</div> - -<p>This code is virtually identical to the code <a -href="OCamlLangImpl5.html#forcodegen">before we allowed mutable variables</a>. -The big difference is that we no longer have to construct a PHI node, and we use -load/store to access the variable as needed.</p> - -<p>To support mutable argument variables, we need to also make allocas for them. -The code for this is also pretty simple:</p> - -<div class="doc_code"> -<pre> -(* Create an alloca for each argument and register the argument in the symbol - * table so that references to it will succeed. 
*) -let create_argument_allocas the_function proto = - let args = match proto with - | Ast.Prototype (_, args) | Ast.BinOpPrototype (_, args, _) -> args - in - Array.iteri (fun i ai -> - let var_name = args.(i) in - (* Create an alloca for this variable. *) - let alloca = create_entry_block_alloca the_function var_name in - - (* Store the initial value into the alloca. *) - ignore(build_store ai alloca builder); - - (* Add arguments to variable symbol table. *) - Hashtbl.add named_values var_name alloca; - ) (params the_function) -</pre> -</div> - -<p>For each argument, we make an alloca, store the input value to the function -into the alloca, and register the alloca as the memory location for the -argument. This function is called by <tt>Codegen.codegen_func</tt> right after -it sets up the entry block for the function.</p> - -<p>The final missing piece is adding the mem2reg pass, which allows us to get -good codegen once again:</p> - -<div class="doc_code"> -<pre> -let main () = - ... - let the_fpm = PassManager.create_function Codegen.the_module in - - (* Set up the optimizer pipeline. Start with registering info about how the - * target lays out data structures. *) - TargetData.add (ExecutionEngine.target_data the_execution_engine) the_fpm; - - <b>(* Promote allocas to registers. *) - add_memory_to_register_promotion the_fpm;</b> - - (* Do simple "peephole" optimizations and bit-twiddling optzn. *) - add_instruction_combination the_fpm; - - (* reassociate expressions. *) - add_reassociation the_fpm; -</pre> -</div> - -<p>It is interesting to see what the code looks like before and after the -mem2reg optimization runs. For example, this is the before/after code for our -recursive fib function.
Before the optimization:</p> - -<div class="doc_code"> -<pre> -define double @fib(double %x) { -entry: - <b>%x1 = alloca double - store double %x, double* %x1 - %x2 = load double* %x1</b> - %cmptmp = fcmp ult double %x2, 3.000000e+00 - %booltmp = uitofp i1 %cmptmp to double - %ifcond = fcmp one double %booltmp, 0.000000e+00 - br i1 %ifcond, label %then, label %else - -then: ; preds = %entry - br label %ifcont - -else: ; preds = %entry - <b>%x3 = load double* %x1</b> - %subtmp = fsub double %x3, 1.000000e+00 - %calltmp = call double @fib( double %subtmp ) - <b>%x4 = load double* %x1</b> - %subtmp5 = fsub double %x4, 2.000000e+00 - %calltmp6 = call double @fib( double %subtmp5 ) - %addtmp = fadd double %calltmp, %calltmp6 - br label %ifcont - -ifcont: ; preds = %else, %then - %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ] - ret double %iftmp -} -</pre> -</div> - -<p>Here there is only one variable (x, the input argument) but you can still -see the extremely simple-minded code generation strategy we are using. In the -entry block, an alloca is created, and the initial input value is stored into -it. Each reference to the variable does a reload from the stack. Also, note -that we didn't modify the if/then/else expression, so it still inserts a PHI -node. 
While we could make an alloca for it, it is actually easier to create a -PHI node for it, so we still just make the PHI.</p> - -<p>Here is the code after the mem2reg pass runs:</p> - -<div class="doc_code"> -<pre> -define double @fib(double %x) { -entry: - %cmptmp = fcmp ult double <b>%x</b>, 3.000000e+00 - %booltmp = uitofp i1 %cmptmp to double - %ifcond = fcmp one double %booltmp, 0.000000e+00 - br i1 %ifcond, label %then, label %else - -then: - br label %ifcont - -else: - %subtmp = fsub double <b>%x</b>, 1.000000e+00 - %calltmp = call double @fib( double %subtmp ) - %subtmp5 = fsub double <b>%x</b>, 2.000000e+00 - %calltmp6 = call double @fib( double %subtmp5 ) - %addtmp = fadd double %calltmp, %calltmp6 - br label %ifcont - -ifcont: ; preds = %else, %then - %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ] - ret double %iftmp -} -</pre> -</div> - -<p>This is a trivial case for mem2reg, since there are no redefinitions of the -variable. The point of showing this is to ease any worries you may have about inserting -such blatant inefficiencies :).</p> - -<p>After the rest of the optimizers run, we get:</p> - -<div class="doc_code"> -<pre> -define double @fib(double %x) { -entry: - %cmptmp = fcmp ult double %x, 3.000000e+00 - %booltmp = uitofp i1 %cmptmp to double - %ifcond = fcmp ueq double %booltmp, 0.000000e+00 - br i1 %ifcond, label %else, label %ifcont - -else: - %subtmp = fsub double %x, 1.000000e+00 - %calltmp = call double @fib( double %subtmp ) - %subtmp5 = fsub double %x, 2.000000e+00 - %calltmp6 = call double @fib( double %subtmp5 ) - %addtmp = fadd double %calltmp, %calltmp6 - ret double %addtmp - -ifcont: - ret double 1.000000e+00 -} -</pre> -</div> - -<p>Here we see that the simplifycfg pass decided to clone the return instruction -into the end of the 'else' block.
This allowed it to eliminate some branches -and the PHI node.</p> - -<p>Now that all symbol table references are updated to use stack variables, -we'll add the assignment operator.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="assignment">New Assignment Operator</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>With our current framework, adding a new assignment operator is really -simple. We will parse it just like any other binary operator, but handle it -internally (instead of allowing the user to define it). The first step is to -set a precedence:</p> - -<div class="doc_code"> -<pre> -let main () = - (* Install standard binary operators. - * 1 is the lowest precedence. *) - <b>Hashtbl.add Parser.binop_precedence '=' 2;</b> - Hashtbl.add Parser.binop_precedence '<' 10; - Hashtbl.add Parser.binop_precedence '+' 20; - Hashtbl.add Parser.binop_precedence '-' 20; - ... -</pre> -</div> - -<p>Now that the parser knows the precedence of the binary operator, it takes -care of all the parsing and AST generation. We just need to implement codegen -for the assignment operator. This looks like:</p> - -<div class="doc_code"> -<pre> -let rec codegen_expr = function - ... - | Ast.Binary (op, lhs, rhs) -> - begin match op with - | '=' -> - (* Special case '=' because we don't want to emit the LHS as an - * expression. *) - let name = - match lhs with - | Ast.Variable name -> name - | _ -> raise (Error "destination of '=' must be a variable") - in -</pre> -</div> - -<p>Unlike the rest of the binary operators, our assignment operator doesn't -follow the "emit LHS, emit RHS, do computation" model. As such, it is handled -as a special case before the other binary operators are handled. The other -strange thing is that it requires the LHS to be a variable. It is invalid to -have "(x+1) = expr" - only things like "x = expr" are allowed.
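The rule just described, that the left-hand side must name a variable and is never evaluated as an expression, can be seen in a tiny tree-walking evaluator. In this sketch the expr type is a cut-down, hypothetical relative of Ast.expr, and a float ref in the symbol table stands in for each variable's alloca. The right-hand side's value is written into the variable's cell and then returned, which is what lets assignments chain:

```ocaml
(* A cut-down, hypothetical relative of Ast.expr. *)
type expr =
  | Number of float
  | Variable of string
  | Binary of char * expr * expr

exception Error of string

(* Each variable maps to a mutable cell, standing in for its alloca. *)
let named_values : (string, float ref) Hashtbl.t = Hashtbl.create 10

let rec eval = function
  | Number n -> n
  | Variable name ->
      (match Hashtbl.find_opt named_values name with
       | Some cell -> !cell                       (* "load" *)
       | None -> raise (Error "unknown variable name"))
  | Binary ('=', lhs, rhs) ->
      (* Special case: the LHS is only inspected, never evaluated. *)
      let name =
        match lhs with
        | Variable name -> name
        | _ -> raise (Error "destination of '=' must be a variable")
      in
      let v = eval rhs in
      (match Hashtbl.find_opt named_values name with
       | Some cell -> cell := v; v                (* "store", then yield v *)
       | None -> raise (Error "unknown variable name"))
  | Binary (op, lhs, rhs) ->
      let l = eval lhs in
      let r = eval rhs in
      (match op with
       | '+' -> l +. r
       | '-' -> l -. r
       | '*' -> l *. r
       | _ -> raise (Error "invalid binary operator"))
```

Suppose z holds 5.0. Evaluating x = (y = z) then leaves 5.0 in both x and y and yields 5.0. By contrast, (x+1) = 2 is rejected with the same message the codegen uses.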
 -</p> - - -<div class="doc_code"> -<pre> - (* Codegen the rhs. *) - let val_ = codegen_expr rhs in - - (* Lookup the name. *) - let variable = try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name") - in - ignore(build_store val_ variable builder); - val_ - | _ -> - ... -</pre> -</div> - -<p>Once we have the variable, codegen'ing the assignment is straightforward: -we emit the RHS of the assignment, create a store, and return the computed -value. Returning a value allows for chained assignments like "X = (Y = Z)".</p> - -<p>Now that we have an assignment operator, we can mutate loop variables and -arguments. For example, we can now run code like this:</p> - -<div class="doc_code"> -<pre> -# Function to print a double. -extern printd(x); - -# Define ':' for sequencing: as a low-precedence operator that ignores operands -# and just returns the RHS. -def binary : 1 (x y) y; - -def test(x) - printd(x) : - x = 4 : - printd(x); - -test(123); -</pre> -</div> - -<p>When run, this example prints "123" and then "4", showing that we did -actually mutate the value! Okay, we have now officially implemented our goal: -getting this to work requires SSA construction in the general case. However, -to be really useful, we want the ability to define our own local variables; let's -add this next! -</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="localvars">User-defined Local -Variables</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p>Adding var/in is just like any of the other extensions we made to -Kaleidoscope: we extend the lexer, the parser, the AST and the code generator. -The first step for adding our new 'var/in' construct is to extend the lexer. -As before, this is pretty trivial; the code looks like this:</p> - -<div class="doc_code"> -<pre> -type token = - ...
- <b>(* var definition *) - | Var</b> - -... - -and lex_ident buffer = parser - ... - | "in" -> [< 'Token.In; stream >] - | "binary" -> [< 'Token.Binary; stream >] - | "unary" -> [< 'Token.Unary; stream >] - <b>| "var" -> [< 'Token.Var; stream >]</b> - ... -</pre> -</div> - -<p>The next step is to define the AST node that we will construct. For var/in, -it looks like this:</p> - -<div class="doc_code"> -<pre> -type expr = - ... - (* variant for var/in. *) - | Var of (string * expr option) array * expr - ... -</pre> -</div> - -<p>var/in allows a list of names to be defined all at once, and each name can -optionally have an initializer value. As such, we capture this information in -an array of (name, optional initializer) pairs. Also, var/in has a body, which is allowed to access -the variables defined by the var/in.</p> - -<p>With this in place, we can define the parser pieces. The first thing we do -is add it as a primary expression:</p> - -<div class="doc_code"> -<pre> -(* primary - * ::= identifier - * ::= numberexpr - * ::= parenexpr - * ::= ifexpr - * ::= forexpr - <b>* ::= varexpr</b> *) -let rec parse_primary = parser - ... - <b>(* varexpr - * ::= 'var' identifier ('=' expression)? - * (',' identifier ('=' expression)?)* 'in' expression *) - | [< 'Token.Var; - (* At least one variable name is required. *) - 'Token.Ident id ?? "expected identifier after var"; - init=parse_var_init; - var_names=parse_var_names [(id, init)]; - (* At this point, we have to have 'in'. *) - 'Token.In ?? "expected 'in' keyword after 'var'"; - body=parse_expr >] -> - Ast.Var (Array.of_list (List.rev var_names), body)</b> - -... - -and parse_var_init = parser - (* read in the optional initializer. *) - | [< 'Token.Kwd '='; e=parse_expr >] -> Some e - | [< >] -> None - -and parse_var_names accumulator = parser - | [< 'Token.Kwd ','; - 'Token.Ident id ??
 "expected identifier list after var"; - init=parse_var_init; - e=parse_var_names ((id, init) :: accumulator) >] -> e - | [< >] -> accumulator -</pre> -</div> - -<p>Now that we can parse and represent the code, we need to support emission of -LLVM IR for it. This code starts out with:</p> - -<div class="doc_code"> -<pre> -let rec codegen_expr = function - ... - | Ast.Var (var_names, body) -> - let old_bindings = ref [] in - - let the_function = block_parent (insertion_block builder) in - - (* Register all variables and emit their initializer. *) - Array.iter (fun (var_name, init) -> -</pre> -</div> - -<p>Basically it loops over all the variables, installing them one at a time. -For each variable we put into the symbol table, we remember the previous value -that we replace in old_bindings.</p> - -<div class="doc_code"> -<pre> - (* Emit the initializer before adding the variable to scope, this - * prevents the initializer from referencing the variable itself, and - * permits stuff like this: - * var a = 1 in - * var a = a in ... # refers to outer 'a'. *) - let init_val = - match init with - | Some init -> codegen_expr init - (* If not specified, use 0.0. *) - | None -> const_float double_type 0.0 - in - - let alloca = create_entry_block_alloca the_function var_name in - ignore(build_store init_val alloca builder); - - (* Remember the old variable binding so that we can restore the binding - * when we unrecurse. *) - - begin - try - let old_value = Hashtbl.find named_values var_name in - old_bindings := (var_name, old_value) :: !old_bindings; - with Not_found -> () - end; - - (* Remember this binding. *) - Hashtbl.add named_values var_name alloca; - ) var_names; -</pre> -</div> - -<p>There are more comments here than code. The basic idea is that we emit the -initializer, create the alloca, then update the symbol table to point to it.
-Once all the variables are installed in the symbol table, we evaluate the body -of the var/in expression:</p> - -<div class="doc_code"> -<pre> - (* Codegen the body, now that all vars are in scope. *) - let body_val = codegen_expr body in -</pre> -</div> - -<p>Finally, before returning, we restore the previous variable bindings:</p> - -<div class="doc_code"> -<pre> - (* Pop all our variables from scope. *) - List.iter (fun (var_name, old_value) -> - Hashtbl.add named_values var_name old_value - ) !old_bindings; - - (* Return the body computation. *) - body_val -</pre> -</div> - -<p>The end result of all of this is that we get properly scoped variable -definitions, and we even (trivially) allow mutation of them :).</p> - -<p>With this, we completed what we set out to do. Our nice iterative fib -example from the intro compiles and runs just fine. The mem2reg pass optimizes -all of our stack variables into SSA registers, inserting PHI nodes where needed, -and our front-end remains simple: no "iterated dominance frontier" computation -anywhere in sight.</p> - -</div> - -<!-- *********************************************************************** --> -<div class="doc_section"><a name="code">Full Code Listing</a></div> -<!-- *********************************************************************** --> - -<div class="doc_text"> - -<p> -Here is the complete code listing for our running example, enhanced with mutable -variables and var/in support. 
To build this example, use: -</p> - -<div class="doc_code"> -<pre> -# Compile -ocamlbuild toy.byte -# Run -./toy.byte -</pre> -</div> - -<p>Here is the code:</p> - -<dl> -<dt>_tags:</dt> -<dd class="doc_code"> -<pre> -<{lexer,parser}.ml>: use_camlp4, pp(camlp4of) -<*.{byte,native}>: g++, use_llvm, use_llvm_analysis -<*.{byte,native}>: use_llvm_executionengine, use_llvm_target -<*.{byte,native}>: use_llvm_scalar_opts, use_bindings -</pre> -</dd> - -<dt>myocamlbuild.ml:</dt> -<dd class="doc_code"> -<pre> -open Ocamlbuild_plugin;; - -ocaml_lib ~extern:true "llvm";; -ocaml_lib ~extern:true "llvm_analysis";; -ocaml_lib ~extern:true "llvm_executionengine";; -ocaml_lib ~extern:true "llvm_target";; -ocaml_lib ~extern:true "llvm_scalar_opts";; - -flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"; A"-cclib"; A"-rdynamic"]);; -dep ["link"; "ocaml"; "use_bindings"] ["bindings.o"];; -</pre> -</dd> - -<dt>token.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Lexer Tokens - *===----------------------------------------------------------------------===*) - -(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of - * these others for known things. *) -type token = - (* commands *) - | Def | Extern - - (* primary *) - | Ident of string | Number of float - - (* unknown *) - | Kwd of char - - (* control *) - | If | Then | Else - | For | In - - (* operators *) - | Binary | Unary - - (* var definition *) - | Var -</pre> -</dd> - -<dt>lexer.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Lexer - *===----------------------------------------------------------------------===*) - -let rec lex = parser - (* Skip any whitespace. *) - | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream - - (* identifier: [a-zA-Z][a-zA-Z0-9] *) - | [< ' ('A' .. 'Z' | 'a' .. 
'z' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_ident buffer stream - - (* number: [0-9.]+ *) - | [< ' ('0' .. '9' as c); stream >] -> - let buffer = Buffer.create 1 in - Buffer.add_char buffer c; - lex_number buffer stream - - (* Comment until end of line. *) - | [< ' ('#'); stream >] -> - lex_comment stream - - (* Otherwise, just return the character as its ascii value. *) - | [< 'c; stream >] -> - [< 'Token.Kwd c; lex stream >] - - (* end of stream. *) - | [< >] -> [< >] - -and lex_number buffer = parser - | [< ' ('0' .. '9' | '.' as c); stream >] -> - Buffer.add_char buffer c; - lex_number buffer stream - | [< stream=lex >] -> - [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] - -and lex_ident buffer = parser - | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] -> - Buffer.add_char buffer c; - lex_ident buffer stream - | [< stream=lex >] -> - match Buffer.contents buffer with - | "def" -> [< 'Token.Def; stream >] - | "extern" -> [< 'Token.Extern; stream >] - | "if" -> [< 'Token.If; stream >] - | "then" -> [< 'Token.Then; stream >] - | "else" -> [< 'Token.Else; stream >] - | "for" -> [< 'Token.For; stream >] - | "in" -> [< 'Token.In; stream >] - | "binary" -> [< 'Token.Binary; stream >] - | "unary" -> [< 'Token.Unary; stream >] - | "var" -> [< 'Token.Var; stream >] - | id -> [< 'Token.Ident id; stream >] - -and lex_comment = parser - | [< ' ('\n'); stream=lex >] -> stream - | [< 'c; e=lex_comment >] -> e - | [< >] -> [< >] -</pre> -</dd> - -<dt>ast.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Abstract Syntax Tree (aka Parse Tree) - *===----------------------------------------------------------------------===*) - -(* expr - Base type for all expression nodes. *) -type expr = - (* variant for numeric literals like "1.0". *) - | Number of float - - (* variant for referencing a variable, like "a". 
*) - | Variable of string - - (* variant for a unary operator. *) - | Unary of char * expr - - (* variant for a binary operator. *) - | Binary of char * expr * expr - - (* variant for function calls. *) - | Call of string * expr array - - (* variant for if/then/else. *) - | If of expr * expr * expr - - (* variant for for/in. *) - | For of string * expr * expr * expr option * expr - - (* variant for var/in. *) - | Var of (string * expr option) array * expr - -(* proto - This type represents the "prototype" for a function, which captures - * its name, and its argument names (thus implicitly the number of arguments the - * function takes). *) -type proto = - | Prototype of string * string array - | BinOpPrototype of string * string array * int - -(* func - This type represents a function definition itself. *) -type func = Function of proto * expr -</pre> -</dd> - -<dt>parser.ml:</dt> -<dd class="doc_code"> -<pre> -(*===---------------------------------------------------------------------=== - * Parser - *===---------------------------------------------------------------------===*) - -(* binop_precedence - This holds the precedence for each binary operator that is - * defined *) -let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 - -(* precedence - Get the precedence of the pending binary operator token. *) -let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 - -(* primary - * ::= identifier - * ::= numberexpr - * ::= parenexpr - * ::= ifexpr - * ::= forexpr - * ::= varexpr *) -let rec parse_primary = parser - (* numberexpr ::= number *) - | [< 'Token.Number n >] -> Ast.Number n - - (* parenexpr ::= '(' expression ')' *) - | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? 
"expected ')'" >] -> e - - (* identifierexpr - * ::= identifier - * ::= identifier '(' argumentexpr ')' *) - | [< 'Token.Ident id; stream >] -> - let rec parse_args accumulator = parser - | [< e=parse_expr; stream >] -> - begin parser - | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e - | [< >] -> e :: accumulator - end stream - | [< >] -> accumulator - in - let rec parse_ident id = parser - (* Call. *) - | [< 'Token.Kwd '('; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')'">] -> - Ast.Call (id, Array.of_list (List.rev args)) - - (* Simple variable ref. *) - | [< >] -> Ast.Variable id - in - parse_ident id stream - - (* ifexpr ::= 'if' expr 'then' expr 'else' expr *) - | [< 'Token.If; c=parse_expr; - 'Token.Then ?? "expected 'then'"; t=parse_expr; - 'Token.Else ?? "expected 'else'"; e=parse_expr >] -> - Ast.If (c, t, e) - - (* forexpr - ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression *) - | [< 'Token.For; - 'Token.Ident id ?? "expected identifier after for"; - 'Token.Kwd '=' ?? "expected '=' after for"; - stream >] -> - begin parser - | [< - start=parse_expr; - 'Token.Kwd ',' ?? "expected ',' after for"; - end_=parse_expr; - stream >] -> - let step = - begin parser - | [< 'Token.Kwd ','; step=parse_expr >] -> Some step - | [< >] -> None - end stream - in - begin parser - | [< 'Token.In; body=parse_expr >] -> - Ast.For (id, start, end_, step, body) - | [< >] -> - raise (Stream.Error "expected 'in' after for") - end stream - | [< >] -> - raise (Stream.Error "expected '=' after for") - end stream - - (* varexpr - * ::= 'var' identifier ('=' expression? - * (',' identifier ('=' expression)?)* 'in' expression *) - | [< 'Token.Var; - (* At least one variable name is required. *) - 'Token.Ident id ?? "expected identifier after var"; - init=parse_var_init; - var_names=parse_var_names [(id, init)]; - (* At this point, we have to have 'in'. *) - 'Token.In ?? 
"expected 'in' keyword after 'var'"; - body=parse_expr >] -> - Ast.Var (Array.of_list (List.rev var_names), body) - - | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") - -(* unary - * ::= primary - * ::= '!' unary *) -and parse_unary = parser - (* If this is a unary operator, read it. *) - | [< 'Token.Kwd op when op != '(' && op != ')'; operand=parse_expr >] -> - Ast.Unary (op, operand) - - (* If the current token is not an operator, it must be a primary expr. *) - | [< stream >] -> parse_primary stream - -(* binoprhs - * ::= ('+' primary)* *) -and parse_bin_rhs expr_prec lhs stream = - match Stream.peek stream with - (* If this is a binop, find its precedence. *) - | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> - let token_prec = precedence c in - - (* If this is a binop that binds at least as tightly as the current binop, - * consume it, otherwise we are done. *) - if token_prec < expr_prec then lhs else begin - (* Eat the binop. *) - Stream.junk stream; - - (* Parse the primary expression after the binary operator. *) - let rhs = parse_unary stream in - - (* Okay, we know this is a binop. *) - let rhs = - match Stream.peek stream with - | Some (Token.Kwd c2) -> - (* If BinOp binds less tightly with rhs than the operator after - * rhs, let the pending operator take rhs as its lhs. *) - let next_prec = precedence c2 in - if token_prec < next_prec - then parse_bin_rhs (token_prec + 1) rhs stream - else rhs - | _ -> rhs - in - - (* Merge lhs/rhs. *) - let lhs = Ast.Binary (c, lhs, rhs) in - parse_bin_rhs expr_prec lhs stream - end - | _ -> lhs - -and parse_var_init = parser - (* read in the optional initializer. *) - | [< 'Token.Kwd '='; e=parse_expr >] -> Some e - | [< >] -> None - -and parse_var_names accumulator = parser - | [< 'Token.Kwd ','; - 'Token.Ident id ?? 
"expected identifier list after var"; - init=parse_var_init; - e=parse_var_names ((id, init) :: accumulator) >] -> e - | [< >] -> accumulator - -(* expression - * ::= primary binoprhs *) -and parse_expr = parser - | [< lhs=parse_unary; stream >] -> parse_bin_rhs 0 lhs stream - -(* prototype - * ::= id '(' id* ')' - * ::= binary LETTER number? (id, id) - * ::= unary LETTER number? (id) *) -let parse_prototype = - let rec parse_args accumulator = parser - | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e - | [< >] -> accumulator - in - let parse_operator = parser - | [< 'Token.Unary >] -> "unary", 1 - | [< 'Token.Binary >] -> "binary", 2 - in - let parse_binary_precedence = parser - | [< 'Token.Number n >] -> int_of_float n - | [< >] -> 30 - in - parser - | [< 'Token.Ident id; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - (* success. *) - Ast.Prototype (id, Array.of_list (List.rev args)) - | [< (prefix, kind)=parse_operator; - 'Token.Kwd op ?? "expected an operator"; - (* Read the precedence if present. *) - binary_precedence=parse_binary_precedence; - 'Token.Kwd '(' ?? "expected '(' in prototype"; - args=parse_args []; - 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> - let name = prefix ^ (String.make 1 op) in - let args = Array.of_list (List.rev args) in - - (* Verify right number of arguments for operator. 
*) - if Array.length args != kind - then raise (Stream.Error "invalid number of operands for operator") - else - if kind == 1 then - Ast.Prototype (name, args) - else - Ast.BinOpPrototype (name, args, binary_precedence) - | [< >] -> - raise (Stream.Error "expected function name in prototype") - -(* definition ::= 'def' prototype expression *) -let parse_definition = parser - | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> - Ast.Function (p, e) - -(* toplevelexpr ::= expression *) -let parse_toplevel = parser - | [< e=parse_expr >] -> - (* Make an anonymous proto. *) - Ast.Function (Ast.Prototype ("", [||]), e) - -(* external ::= 'extern' prototype *) -let parse_extern = parser - | [< 'Token.Extern; e=parse_prototype >] -> e -</pre> -</dd> - -<dt>codegen.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Code Generation - *===----------------------------------------------------------------------===*) - -open Llvm - -exception Error of string - -let context = global_context () -let the_module = create_module context "my cool jit" -let builder = builder context -let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 -let double_type = double_type context - -(* Create an alloca instruction in the entry block of the function. This - * is used for mutable variables etc. *) -let create_entry_block_alloca the_function var_name = - let builder = builder_at context (instr_begin (entry_block the_function)) in - build_alloca double_type var_name builder - -let rec codegen_expr = function - | Ast.Number n -> const_float double_type n - | Ast.Variable name -> - let v = try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name") - in - (* Load the value. 
*) - build_load v name builder - | Ast.Unary (op, operand) -> - let operand = codegen_expr operand in - let callee = "unary" ^ (String.make 1 op) in - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown unary operator") - in - build_call callee [|operand|] "unop" builder - | Ast.Binary (op, lhs, rhs) -> - begin match op with - | '=' -> - (* Special case '=' because we don't want to emit the LHS as an - * expression. *) - let name = - match lhs with - | Ast.Variable name -> name - | _ -> raise (Error "destination of '=' must be a variable") - in - - (* Codegen the rhs. *) - let val_ = codegen_expr rhs in - - (* Lookup the name. *) - let variable = try Hashtbl.find named_values name with - | Not_found -> raise (Error "unknown variable name") - in - ignore(build_store val_ variable builder); - val_ - | _ -> - let lhs_val = codegen_expr lhs in - let rhs_val = codegen_expr rhs in - begin - match op with - | '+' -> build_add lhs_val rhs_val "addtmp" builder - | '-' -> build_sub lhs_val rhs_val "subtmp" builder - | '*' -> build_mul lhs_val rhs_val "multmp" builder - | '<' -> - (* Convert bool 0/1 to double 0.0 or 1.0 *) - let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in - build_uitofp i double_type "booltmp" builder - | _ -> - (* If it wasn't a builtin binary operator, it must be a user defined - * one. Emit a call to it. *) - let callee = "binary" ^ (String.make 1 op) in - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "binary operator not found!") - in - build_call callee [|lhs_val; rhs_val|] "binop" builder - end - end - | Ast.Call (callee, args) -> - (* Look up the name in the module table. *) - let callee = - match lookup_function callee the_module with - | Some callee -> callee - | None -> raise (Error "unknown function referenced") - in - let params = params callee in - - (* If argument mismatch error. 
 *) - if Array.length params == Array.length args then () else - raise (Error "incorrect # arguments passed"); - let args = Array.map codegen_expr args in - build_call callee args "calltmp" builder - | Ast.If (cond, then_, else_) -> - let cond = codegen_expr cond in - - (* Convert condition to a bool by comparing non-equal to 0.0 *) - let zero = const_float double_type 0.0 in - let cond_val = build_fcmp Fcmp.One cond zero "ifcond" builder in - - (* Grab the first block so that we might later add the conditional branch - * to it at the end of the function. *) - let start_bb = insertion_block builder in - let the_function = block_parent start_bb in - - let then_bb = append_block context "then" the_function in - - (* Emit 'then' value. *) - position_at_end then_bb builder; - let then_val = codegen_expr then_ in - - (* Codegen of 'then' can change the current block, update then_bb for the - * phi. We create a new name because one is used for the phi node, and the - * other is used for the conditional branch. *) - let new_then_bb = insertion_block builder in - - (* Emit 'else' value. *) - let else_bb = append_block context "else" the_function in - position_at_end else_bb builder; - let else_val = codegen_expr else_ in - - (* Codegen of 'else' can change the current block, update else_bb for the - * phi. *) - let new_else_bb = insertion_block builder in - - (* Emit merge block. *) - let merge_bb = append_block context "ifcont" the_function in - position_at_end merge_bb builder; - let incoming = [(then_val, new_then_bb); (else_val, new_else_bb)] in - let phi = build_phi incoming "iftmp" builder in - - (* Return to the start block to add the conditional branch. *) - position_at_end start_bb builder; - ignore (build_cond_br cond_val then_bb else_bb builder); - - (* Set an unconditional branch at the end of the 'then' block and the - * 'else' block to the 'merge' block.
*) - position_at_end new_then_bb builder; ignore (build_br merge_bb builder); - position_at_end new_else_bb builder; ignore (build_br merge_bb builder); - - (* Finally, set the builder to the end of the merge block. *) - position_at_end merge_bb builder; - - phi - | Ast.For (var_name, start, end_, step, body) -> - (* Output this as: - * var = alloca double - * ... - * start = startexpr - * store start -> var - * goto loop - * loop: - * ... - * bodyexpr - * ... - * loopend: - * step = stepexpr - * endcond = endexpr - * - * curvar = load var - * nextvar = curvar + step - * store nextvar -> var - * br endcond, loop, endloop - * outloop: *) - - let the_function = block_parent (insertion_block builder) in - - (* Create an alloca for the variable in the entry block. *) - let alloca = create_entry_block_alloca the_function var_name in - - (* Emit the start code first, without 'variable' in scope. *) - let start_val = codegen_expr start in - - (* Store the value into the alloca. *) - ignore(build_store start_val alloca builder); - - (* Make the new basic block for the loop header, inserting after current - * block. *) - let loop_bb = append_block context "loop" the_function in - - (* Insert an explicit fall through from the current block to the - * loop_bb. *) - ignore (build_br loop_bb builder); - - (* Start insertion in loop_bb. *) - position_at_end loop_bb builder; - - (* Within the loop, the variable is defined equal to the PHI node. If it - * shadows an existing variable, we have to restore it, so save it - * now. *) - let old_val = - try Some (Hashtbl.find named_values var_name) with Not_found -> None - in - Hashtbl.add named_values var_name alloca; - - (* Emit the body of the loop. This, like any other expr, can change the - * current BB. Note that we ignore the value computed by the body, but - * don't allow an error *) - ignore (codegen_expr body); - - (* Emit the step value. 
 *) - let step_val = - match step with - | Some step -> codegen_expr step - (* If not specified, use 1.0. *) - | None -> const_float double_type 1.0 - in - - (* Compute the end condition. *) - let end_cond = codegen_expr end_ in - - (* Reload, increment, and restore the alloca. This handles the case where - * the body of the loop mutates the variable. *) - let cur_var = build_load alloca var_name builder in - let next_var = build_add cur_var step_val "nextvar" builder in - ignore(build_store next_var alloca builder); - - (* Convert condition to a bool by comparing non-equal to 0.0. *) - let zero = const_float double_type 0.0 in - let end_cond = build_fcmp Fcmp.One end_cond zero "loopcond" builder in - - (* Create the "after loop" block and insert it. *) - let after_bb = append_block context "afterloop" the_function in - - (* Insert the conditional branch into the end of the current (loop end) block. *) - ignore (build_cond_br end_cond loop_bb after_bb builder); - - (* Any new code will be inserted in after_bb. *) - position_at_end after_bb builder; - - (* Restore the unshadowed variable. *) - begin match old_val with - | Some old_val -> Hashtbl.add named_values var_name old_val - | None -> () - end; - - (* for expr always returns 0.0. *) - const_null double_type - | Ast.Var (var_names, body) -> - let old_bindings = ref [] in - - let the_function = block_parent (insertion_block builder) in - - (* Register all variables and emit their initializer. *) - Array.iter (fun (var_name, init) -> - (* Emit the initializer before adding the variable to scope, this - * prevents the initializer from referencing the variable itself, and - * permits stuff like this: - * var a = 1 in - * var a = a in ... # refers to outer 'a'. *) - let init_val = - match init with - | Some init -> codegen_expr init - (* If not specified, use 0.0.
*) - | None -> const_float double_type 0.0 - in - - let alloca = create_entry_block_alloca the_function var_name in - ignore(build_store init_val alloca builder); - - (* Remember the old variable binding so that we can restore the binding - * when we unrecurse. *) - begin - try - let old_value = Hashtbl.find named_values var_name in - old_bindings := (var_name, old_value) :: !old_bindings; - with Not_found -> () - end; - - (* Remember this binding. *) - Hashtbl.add named_values var_name alloca; - ) var_names; - - (* Codegen the body, now that all vars are in scope. *) - let body_val = codegen_expr body in - - (* Pop all our variables from scope. *) - List.iter (fun (var_name, old_value) -> - Hashtbl.add named_values var_name old_value - ) !old_bindings; - - (* Return the body computation. *) - body_val - -let codegen_proto = function - | Ast.Prototype (name, args) | Ast.BinOpPrototype (name, args, _) -> - (* Make the function type: double(double,double) etc. *) - let doubles = Array.make (Array.length args) double_type in - let ft = function_type double_type doubles in - let f = - match lookup_function name the_module with - | None -> declare_function name ft the_module - - (* If 'f' conflicted, there was already something named 'name'. If it - * has a body, don't allow redefinition or reextern. *) - | Some f -> - (* If 'f' already has a body, reject this. *) - if block_begin f <> At_end f then - raise (Error "redefinition of function"); - - (* If 'f' took a different number of arguments, reject. *) - if element_type (type_of f) <> ft then - raise (Error "redefinition of function with different # args"); - f - in - - (* Set names for all arguments. *) - Array.iteri (fun i a -> - let n = args.(i) in - set_value_name n a; - Hashtbl.add named_values n a; - ) (params f); - f - -(* Create an alloca for each argument and register the argument in the symbol - * table so that references to it will succeed. 
*) -let create_argument_allocas the_function proto = - let args = match proto with - | Ast.Prototype (_, args) | Ast.BinOpPrototype (_, args, _) -> args - in - Array.iteri (fun i ai -> - let var_name = args.(i) in - (* Create an alloca for this variable. *) - let alloca = create_entry_block_alloca the_function var_name in - - (* Store the initial value into the alloca. *) - ignore(build_store ai alloca builder); - - (* Add arguments to variable symbol table. *) - Hashtbl.add named_values var_name alloca; - ) (params the_function) - -let codegen_func the_fpm = function - | Ast.Function (proto, body) -> - Hashtbl.clear named_values; - let the_function = codegen_proto proto in - - (* If this is an operator, install it. *) - begin match proto with - | Ast.BinOpPrototype (name, args, prec) -> - let op = name.[String.length name - 1] in - Hashtbl.add Parser.binop_precedence op prec; - | _ -> () - end; - - (* Create a new basic block to start insertion into. *) - let bb = append_block context "entry" the_function in - position_at_end bb builder; - - try - (* Add all arguments to the symbol table and create their allocas. *) - create_argument_allocas the_function proto; - - let ret_val = codegen_expr body in - - (* Finish off the function. *) - let _ = build_ret ret_val builder in - - (* Validate the generated code, checking for consistency. *) - Llvm_analysis.assert_valid_function the_function; - - (* Optimize the function. 
*) - let _ = PassManager.run_function the_function the_fpm in - - the_function - with e -> - delete_function the_function; - raise e -</pre> -</dd> - -<dt>toplevel.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Top-Level parsing and JIT Driver - *===----------------------------------------------------------------------===*) - -open Llvm -open Llvm_executionengine - -(* top ::= definition | external | expression | ';' *) -let rec main_loop the_fpm the_execution_engine stream = - match Stream.peek stream with - | None -> () - - (* ignore top-level semicolons. *) - | Some (Token.Kwd ';') -> - Stream.junk stream; - main_loop the_fpm the_execution_engine stream - - | Some token -> - begin - try match token with - | Token.Def -> - let e = Parser.parse_definition stream in - print_endline "parsed a function definition."; - dump_value (Codegen.codegen_func the_fpm e); - | Token.Extern -> - let e = Parser.parse_extern stream in - print_endline "parsed an extern."; - dump_value (Codegen.codegen_proto e); - | _ -> - (* Evaluate a top-level expression into an anonymous function. *) - let e = Parser.parse_toplevel stream in - print_endline "parsed a top-level expr"; - let the_function = Codegen.codegen_func the_fpm e in - dump_value the_function; - - (* JIT the function, returning a function pointer. *) - let result = ExecutionEngine.run_function the_function [||] - the_execution_engine in - - print_string "Evaluated to "; - print_float (GenericValue.as_float Codegen.double_type result); - print_newline (); - with Stream.Error s | Codegen.Error s -> - (* Skip token for error recovery. *) - Stream.junk stream; - print_endline s; - end; - print_string "ready> "; flush stdout; - main_loop the_fpm the_execution_engine stream -</pre> -</dd> - -<dt>toy.ml:</dt> -<dd class="doc_code"> -<pre> -(*===----------------------------------------------------------------------=== - * Main driver code. 
- *===----------------------------------------------------------------------===*)
-
-open Llvm
-open Llvm_executionengine
-open Llvm_target
-open Llvm_scalar_opts
-
-let main () =
-  ignore (initialize_native_target ());
-
-  (* Install standard binary operators.
-   * 1 is the lowest precedence. *)
-  Hashtbl.add Parser.binop_precedence '=' 2;
-  Hashtbl.add Parser.binop_precedence '<' 10;
-  Hashtbl.add Parser.binop_precedence '+' 20;
-  Hashtbl.add Parser.binop_precedence '-' 20;
-  Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *)
-
-  (* Prime the first token. *)
-  print_string "ready> "; flush stdout;
-  let stream = Lexer.lex (Stream.of_channel stdin) in
-
-  (* Create the JIT. *)
-  let the_execution_engine = ExecutionEngine.create Codegen.the_module in
-  let the_fpm = PassManager.create_function Codegen.the_module in
-
-  (* Set up the optimizer pipeline. Start with registering info about how the
-   * target lays out data structures. *)
-  TargetData.add (ExecutionEngine.target_data the_execution_engine) the_fpm;
-
-  (* Promote allocas to registers. *)
-  add_memory_to_register_promotion the_fpm;
-
-  (* Do simple "peephole" optimizations and bit-twiddling optzn. *)
-  add_instruction_combination the_fpm;
-
-  (* reassociate expressions. *)
-  add_reassociation the_fpm;
-
-  (* Eliminate Common SubExpressions. *)
-  add_gvn the_fpm;
-
-  (* Simplify the control flow graph (deleting unreachable blocks, etc). *)
-  add_cfg_simplification the_fpm;
-
-  ignore (PassManager.initialize the_fpm);
-
-  (* Run the main "interpreter loop" now. *)
-  Toplevel.main_loop the_fpm the_execution_engine stream;
-
-  (* Print out all the generated code. *)
-  dump_module Codegen.the_module
-;;
-
-main ()
-</pre>
-</dd>
-
-<dt>bindings.c</dt>
-<dd class="doc_code">
-<pre>
-#include <stdio.h>
-
-/* putchard - putchar that takes a double and returns 0. */
-extern double putchard(double X) {
-  putchar((char)X);
-  return 0;
-}
-
-/* printd - printf that takes a double prints it as "%f\n", returning 0. */
-extern double printd(double X) {
-  printf("%f\n", X);
-  return 0;
-}
-</pre>
-</dd>
-</dl>
-
-<a href="LangImpl8.html">Next: Conclusion and other useful LLVM tidbits</a>
-</div>
-
-<!-- *********************************************************************** -->
-<hr>
-<address>
-  <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
-  src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
-  <a href="http://validator.w3.org/check/referer"><img
-  src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
-
-  <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
-  <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
-  <a href="mailto:idadesub@users.sourceforge.net">Erick Tryzelaar</a><br>
-  Last modified: $Date$
-</address>
-</body>
-</html>
diff --git a/docs/tutorial/index.html b/docs/tutorial/index.html
deleted file mode 100644
index 250b533..0000000
--- a/docs/tutorial/index.html
+++ /dev/null
@@ -1,48 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
-          "http://www.w3.org/TR/html4/strict.dtd">
-<html>
-<head>
-  <title>LLVM Tutorial: Table of Contents</title>
-  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
-  <meta name="author" content="Owen Anderson">
-  <meta name="description"
-  content="LLVM Tutorial: Table of Contents.">
-  <link rel="stylesheet" href="../llvm.css" type="text/css">
-</head>
-
-<body>
-
-<div class="doc_title"> LLVM Tutorial: Table of Contents </div>
-
-<ol>
-  <li>Kaleidoscope: Implementing a Language with LLVM
-  <ol>
-    <li><a href="LangImpl1.html">Tutorial Introduction and the Lexer</a></li>
-    <li><a href="LangImpl2.html">Implementing a Parser and AST</a></li>
-    <li><a href="LangImpl3.html">Implementing Code Generation to LLVM IR</a></li>
-    <li><a href="LangImpl4.html">Adding JIT and Optimizer Support</a></li>
-    <li><a href="LangImpl5.html">Extending the language: control flow</a></li>
-    <li><a href="LangImpl6.html">Extending the language: user-defined operators</a></li>
-    <li><a href="LangImpl7.html">Extending the language: mutable variables / SSA construction</a></li>
-    <li><a href="LangImpl8.html">Conclusion and other useful LLVM tidbits</a></li>
-  </ol></li>
-  <li>Kaleidoscope: Implementing a Language with LLVM in Objective Caml
-  <ol>
-    <li><a href="OCamlLangImpl1.html">Tutorial Introduction and the Lexer</a></li>
-    <li><a href="OCamlLangImpl2.html">Implementing a Parser and AST</a></li>
-    <li><a href="OCamlLangImpl3.html">Implementing Code Generation to LLVM IR</a></li>
-    <li><a href="OCamlLangImpl4.html">Adding JIT and Optimizer Support</a></li>
-    <li><a href="OCamlLangImpl5.html">Extending the language: control flow</a></li>
-    <li><a href="OCamlLangImpl6.html">Extending the language: user-defined operators</a></li>
-    <li><a href="OCamlLangImpl7.html">Extending the language: mutable variables / SSA construction</a></li>
-    <li><a href="LangImpl8.html">Conclusion and other useful LLVM tidbits</a></li>
-  </ol></li>
-  <li>Advanced Topics
-  <ol>
-    <li><a href="http://llvm.org/pubs/2004-09-22-LCPCLLVMTutorial.html">Writing
-        an Optimization for LLVM</a></li>
-  </ol></li>
-</ol>
-
-</body>
-</html>
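Two pieces of the deleted OCaml code share one table. toy.ml seeds Parser.binop_precedence with the built-in operators, and codegen_func adds an entry whenever it compiles a BinOpPrototype, taking the last character of the prototype name as the operator. The following is a minimal standalone sketch of that bookkeeping. It assumes a local Hashtbl in place of the real Parser.binop_precedence and does not use the LLVM bindings. The helper names install_defaults, install_user_binop and precedence, and the example name "binary|", are hypothetical and are not taken from the tutorial sources.

```ocaml
(* Hypothetical stand-in for Parser.binop_precedence: operator char -> precedence. *)
let binop_precedence : (char, int) Hashtbl.t = Hashtbl.create 16

(* Seed the built-in operators with the same values toy.ml installs;
   1 is the lowest precedence. *)
let install_defaults () =
  List.iter
    (fun (op, prec) -> Hashtbl.add binop_precedence op prec)
    [ ('=', 2); ('<', 10); ('+', 20); ('-', 20); ('*', 40) ]

(* Same rule as codegen_func: the operator is the last character of the
   prototype name (assumed here to look like "binary|"). *)
let install_user_binop name prec =
  let op = name.[String.length name - 1] in
  Hashtbl.add binop_precedence op prec

(* Hypothetical lookup helper: -1 marks a character that is not a binary operator. *)
let precedence c =
  try Hashtbl.find binop_precedence c with Not_found -> -1

let () =
  install_defaults ();
  install_user_binop "binary|" 5;
  Printf.printf "'|' -> %d, '*' -> %d, 'x' -> %d\n"
    (precedence '|') (precedence '*') (precedence 'x')
```

Because Hashtbl.add shadows an existing binding instead of replacing it, redefining an operator character makes Hashtbl.find return the most recent precedence.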