author     mike-m <mikem.llvm@gmail.com>  2010-05-07 00:28:04 +0000
committer  mike-m <mikem.llvm@gmail.com>  2010-05-07 00:28:04 +0000
commit     e2c3a49c8029ebd9ef530101cc24c66562e3dff5 (patch)
tree       91bf9600cc8df90cf99751a8f8bafc317cffc91e /docs/HistoricalNotes
parent     c10b5afbe8138b0fdf3af4ed3e1ddf96cf3cb4cb (diff)
download   external_llvm-e2c3a49c8029ebd9ef530101cc24c66562e3dff5.zip
           external_llvm-e2c3a49c8029ebd9ef530101cc24c66562e3dff5.tar.gz
           external_llvm-e2c3a49c8029ebd9ef530101cc24c66562e3dff5.tar.bz2
Revert r103213. It broke several sections of the live website.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@103219 91177308-0d34-0410-b5e6-96231b3b80d8
Diffstat (limited to 'docs/HistoricalNotes')
-rw-r--r--  docs/HistoricalNotes/2000-11-18-EarlyDesignIdeas.txt  74
-rw-r--r--  docs/HistoricalNotes/2000-11-18-EarlyDesignIdeasResp.txt  199
-rw-r--r--  docs/HistoricalNotes/2000-12-06-EncodingIdea.txt  30
-rw-r--r--  docs/HistoricalNotes/2000-12-06-MeetingSummary.txt  83
-rw-r--r--  docs/HistoricalNotes/2001-01-31-UniversalIRIdea.txt  39
-rw-r--r--  docs/HistoricalNotes/2001-02-06-TypeNotationDebate.txt  67
-rw-r--r--  docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp1.txt  75
-rw-r--r--  docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp2.txt  53
-rw-r--r--  docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp4.txt  89
-rw-r--r--  docs/HistoricalNotes/2001-02-09-AdveComments.txt  120
-rw-r--r--  docs/HistoricalNotes/2001-02-09-AdveCommentsResponse.txt  245
-rw-r--r--  docs/HistoricalNotes/2001-02-13-Reference-Memory.txt  39
-rw-r--r--  docs/HistoricalNotes/2001-02-13-Reference-MemoryResponse.txt  47
-rw-r--r--  docs/HistoricalNotes/2001-04-16-DynamicCompilation.txt  49
-rw-r--r--  docs/HistoricalNotes/2001-05-18-ExceptionHandling.txt  202
-rw-r--r--  docs/HistoricalNotes/2001-05-19-ExceptionResponse.txt  45
-rw-r--r--  docs/HistoricalNotes/2001-06-01-GCCOptimizations.txt  63
-rw-r--r--  docs/HistoricalNotes/2001-06-01-GCCOptimizations2.txt  71
-rw-r--r--  docs/HistoricalNotes/2001-06-20-.NET-Differences.txt  30
-rw-r--r--  docs/HistoricalNotes/2001-07-06-LoweringIRForCodeGen.txt  31
-rw-r--r--  docs/HistoricalNotes/2001-09-18-OptimizeExceptions.txt  56
-rw-r--r--  docs/HistoricalNotes/2002-05-12-InstListChange.txt  55
-rw-r--r--  docs/HistoricalNotes/2002-06-25-MegaPatchInfo.txt  72
-rw-r--r--  docs/HistoricalNotes/2003-01-23-CygwinNotes.txt  28
-rw-r--r--  docs/HistoricalNotes/2003-06-25-Reoptimizer1.txt  137
-rw-r--r--  docs/HistoricalNotes/2003-06-26-Reoptimizer2.txt  110
-rw-r--r--  docs/HistoricalNotes/2007-OriginalClangReadme.txt  178
27 files changed, 2287 insertions, 0 deletions
diff --git a/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeas.txt b/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeas.txt
new file mode 100644
index 0000000..f086181
--- /dev/null
+++ b/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeas.txt
@@ -0,0 +1,74 @@
+Date: Sat, 18 Nov 2000 09:19:35 -0600 (CST)
+From: Vikram Adve <vadve@cs.uiuc.edu>
+To: Chris Lattner <lattner@cs.uiuc.edu>
+Subject: a few thoughts
+
+I've been mulling over the virtual machine problem and I had some
+thoughts about some things for us to think about and discuss:
+
+1. We need to be clear on our goals for the VM. Do we want to emphasize
+ portability and safety like the Java VM? Or shall we focus on the
+ architecture interface first (i.e., consider the code generation and
+ processor issues), since the architecture interface question is also
+ important for portable Java-type VMs?
+
+ This is important because the audiences for these two goals are very
+ different. Architects and many compiler people care much more about
+ the second question. The Java compiler and OS community care much more
+ about the first one.
+
+ Also, while the architecture interface question is important for
+ Java-type VMs, the design constraints are very different.
+
+
+2. Design issues to consider (an initial list that we should continue
+ to modify). Note that I'm not trying to suggest actual solutions here,
+ but just various directions we can pursue:
+
+ a. A single-assignment VM, which we've both already been thinking about.
+
+ b. A strongly-typed VM. One question is do we need the types to be
+ explicitly declared or should they be inferred by the dynamic compiler?
+
+ c. How do we get more high-level information into the VM while keeping
+ to a low-level VM design?
+
+ o Explicit array references as operands? An alternative is
+ to have just an array type, and let the index computations be
+ separate 3-operand instructions.
+
+ o Explicit instructions to handle aliasing, e.g.:
+ -- an instruction to say "I speculate that these two values are not
+ aliased, but check at runtime", like speculative execution in
+ EPIC?
+ -- or an instruction to check whether two values are aliased and
+ execute different code depending on the answer, somewhat like
+ predicated code in EPIC
+
+ o (This one is a difficult but powerful idea.)
+ A "thread-id" field on every instruction that allows the static
+ compiler to generate a set of parallel threads, and then have
+ the runtime compiler and hardware do what they please with it.
+ This has very powerful uses, but thread-id on every instruction
+ is expensive in terms of instruction size and code size.
+ We would need to compactly encode it somehow.
+
+ Also, this will require some reading on at least two other
+ projects:
+ -- Multiscalar architecture from Wisconsin
+ -- Simultaneous multithreading architecture from Washington
+
+ o Or forget all this and stick to a traditional instruction set?
+
+
+BTW, on an unrelated note, after the meeting yesterday, I did remember
+that you had suggested doing instruction scheduling on SSA form instead
+of a dependence DAG earlier in the semester. When we talked about
+it yesterday, I didn't remember where the idea had come from but I
+remembered later. Just giving credit where it's due...
+
+Perhaps you can save the above as a file under RCS so you and I can
+continue to expand on this.
+
+--Vikram
+
diff --git a/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeasResp.txt b/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeasResp.txt
new file mode 100644
index 0000000..1c725f5
--- /dev/null
+++ b/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeasResp.txt
@@ -0,0 +1,199 @@
+Date: Sun, 19 Nov 2000 16:23:57 -0600 (CST)
+From: Chris Lattner <sabre@nondot.org>
+To: Vikram Adve <vadve@cs.uiuc.edu>
+Subject: Re: a few thoughts
+
+Okay... here are a few of my thoughts on this (it's good to know that we
+think so alike!):
+
+> 1. We need to be clear on our goals for the VM. Do we want to emphasize
+> portability and safety like the Java VM? Or shall we focus on the
+> architecture interface first (i.e., consider the code generation and
+> processor issues), since the architecture interface question is also
+> important for portable Java-type VMs?
+
+I foresee the architecture looking kinda like this: (which is completely
+subject to change)
+
+1. The VM code is NOT guaranteed safe in a Java sense. Doing so makes it
+ basically impossible to support C like languages. Besides that,
+ certifying a register based language as safe at run time would be a
+ pretty expensive operation to have to do. Additionally, we would like
+ to be able to statically eliminate many bounds checks in Java
+ programs... for example.
+
+2. Instead, we can do the following (eventually):
+ * Java bytecode is used as our "safe" representation (to avoid
+ reinventing something that we don't add much value to). When the
+ user chooses to execute Java bytecodes directly (ie, not
+ precompiled) the runtime compiler can do some very simple
+ transformations (JIT style) to convert it into valid input for our
+ VM. Performance is not wonderful, but it works right.
+ * The file is scheduled to be compiled (rigorously) at a later
+ time. This could be done by some background process or by a second
+ processor in the system during idle time or something...
+ * To keep things "safe" ie to enforce a sandbox on Java/foreign code,
+ we could sign the generated VM code with a host specific private
+ key. Then before the code is executed/loaded, we can check to see if
+ the trusted compiler generated the code. This would be much quicker
+ than having to validate consistency (especially if bounds checks have
+ been removed, for example)
+
+> This is important because the audiences for these two goals are very
+> different. Architects and many compiler people care much more about
+> the second question. The Java compiler and OS community care much more
+> about the first one.
+
+3. By focusing on a more low level virtual machine, we have much more room
+ for value add. The nice safe "sandbox" VM can be provided as a layer
+ on top of it. It also lets us focus on the more interesting compilers
+ related projects.
+
+> 2. Design issues to consider (an initial list that we should continue
+> to modify). Note that I'm not trying to suggest actual solutions here,
+> but just various directions we can pursue:
+
+Understood. :)
+
+> a. A single-assignment VM, which we've both already been thinking
+> about.
+
+Yup, I think that this makes a lot of sense. I am still intrigued,
+however, by the prospect of a minimally allocated VM representation... I
+think that it could have definite advantages for certain applications
+(think very small machines, like PDAs). I don't, however, think that our
+initial implementations should focus on this. :)
+
+Here are some other auxiliary goals that I think we should consider:
+
+1. Primary goal: Support a high performance dynamic compilation
+ system. This means that we have an "ideal" division of labor between
+ the runtime and static compilers. Of course, the other goals of the
+ system somewhat reduce the importance of this point (e.g., portability
+ reduces performance, but hopefully not much)
+2. Portability to different processors. Since we are most familiar with
+ x86 and solaris, I think that these two are excellent candidates when
+ we get that far...
+3. Support for all languages & styles of programming (general purpose
+ VM). This is the point that disallows java style bytecodes, where all
+ array refs are checked for bounds, etc...
+4. Support linking between different language families. For example, call
+ C functions directly from Java without using the nasty/slow/gross JNI
+ layer. This involves several subpoints:
+ A. Support for languages that require garbage collectors and integration
+ with languages that don't. As a base point, we could insist on
+ always using a conservative GC, but implement free as a no-op, for example.
+
+> b. A strongly-typed VM. One question is do we need the types to be
+> explicitly declared or should they be inferred by the dynamic
+> compiler?
+
+ B. This is kind of similar to another idea that I have: make OOP
+ constructs (virtual function tables, class hierarchies, etc) explicit
+ in the VM representation. I believe that the number of additional
+ constructs would be fairly low, but would give us lots of important
+ information... something else that would/could be important is to
+ have exceptions as first class types so that they would be handled in
+ a uniform way for the entire VM... so that C functions can call Java
+ functions for example...
+
+> c. How do we get more high-level information into the VM while keeping
+> to a low-level VM design?
+> o Explicit array references as operands? An alternative is
+> to have just an array type, and let the index computations be
+> separate 3-operand instructions.
+
+ C. In the model I was thinking of (subject to change of course), we
+ would just have an array type (distinct from the pointer
+ types). This would allow us to have arbitrarily complex index
+ expressions, while still distinguishing "load" from "Array load",
+ for example. Perhaps also, switch jump tables would be first class
+ types as well? This would allow better reasoning about the program.
+
+5. Support dynamic loading of code from various sources. Already
+ mentioned above was the example of loading java bytecodes, but we want
+ to support dynamic loading of VM code as well. This makes the job of
+ the runtime compiler much more interesting: it can do interprocedural
+ optimizations that the static compiler can't do, because it doesn't
+ have all of the required information (for example, inlining from
+ shared libraries, etc...)
+
+6. Define a set of generally useful annotations to add to the VM
+ representation. For example, a function can be analysed to see if it
+ has any side effects when run... also, the MOD/REF sets could be
+ calculated, etc... we would have to determine what is reasonable. This
+ would generally be used to make IP optimizations cheaper for the
+ runtime compiler...
+
+> o Explicit instructions to handle aliasing, e.g.s:
+> -- an instruction to say "I speculate that these two values are not
+> aliased, but check at runtime", like speculative execution in
+> EPIC?
+> -- or an instruction to check whether two values are aliased and
+> execute different code depending on the answer, somewhat like
+> predicated code in EPIC
+
+These are also very good points... if this can be determined at compile
+time. I think that an EPIC style of representation (not the instruction
+packing, just the information presented) could be a very interesting model
+to use... more later...
+
+> o (This one is a difficult but powerful idea.)
+> A "thread-id" field on every instruction that allows the static
+> compiler to generate a set of parallel threads, and then have
+> the runtime compiler and hardware do what they please with it.
+> This has very powerful uses, but thread-id on every instruction
+> is expensive in terms of instruction size and code size.
+> We would need to compactly encode it somehow.
+
+Yes yes yes! :) I think it would be *VERY* useful to include this kind
+of information (which EPIC architectures *implicitly* encode). The trend
+that we are seeing supports this greatly:
+
+1. Commodity processors are getting massive SIMD support:
+ * Intel/AMD MMX/MMX2
+ * AMD's 3Dnow!
+ * Intel's SSE/SSE2
+ * Sun's VIS
+2. SMP is becoming much more common, especially in the server space.
+3. Multiple processors on a die are right around the corner.
+
+If nothing else, not designing this in would severely limit our future
+expansion of the project...
+
+> Also, this will require some reading on at least two other
+> projects:
+> -- Multiscalar architecture from Wisconsin
+> -- Simultaneous multithreading architecture from Washington
+>
+> o Or forget all this and stick to a traditional instruction set?
+
+Heh... :) Well, from a pure research point of view, it is almost more
+attractive to go with the most extreme/different ISA possible. On one axis
+you get safety and conservatism, and on the other you get degree of
+influence that the results have. Of course the problem with pure research
+is that oftentimes there is no concrete product of the research... :)
+
+> BTW, on an unrelated note, after the meeting yesterday, I did remember
+> that you had suggested doing instruction scheduling on SSA form instead
+> of a dependence DAG earlier in the semester. When we talked about
+> it yesterday, I didn't remember where the idea had come from but I
+> remembered later. Just giving credit where it's due...
+
+:) Thanks.
+
+> Perhaps you can save the above as a file under RCS so you and I can
+> continue to expand on this.
+
+I think it makes sense to do so when we get our ideas more formalized and
+bounce it back and forth a couple of times... then I'll do a more formal
+writeup of our goals and ideas. Obviously our first implementation will
+not want to do all of the stuff that I pointed out above... but we will
+want to design the project so that we do not artificially limit ourselves
+at some point in the future...
+
+Anyways, let me know what you think about these ideas... and if they sound
+reasonable...
+
+-Chris
+
diff --git a/docs/HistoricalNotes/2000-12-06-EncodingIdea.txt b/docs/HistoricalNotes/2000-12-06-EncodingIdea.txt
new file mode 100644
index 0000000..8c45292
--- /dev/null
+++ b/docs/HistoricalNotes/2000-12-06-EncodingIdea.txt
@@ -0,0 +1,30 @@
+From: Chris Lattner [mailto:sabre@nondot.org]
+Sent: Wednesday, December 06, 2000 6:41 PM
+To: Vikram S. Adve
+Subject: Additional idea with respect to encoding
+
+Here's another idea with respect to keeping the common case instruction
+size down (less than 32 bits ideally):
+
+Instead of encoding an instruction to operate on two register numbers,
+have it operate on two negative offsets based on the current register
+number. Therefore, instead of using:
+
+r57 = add r55, r56 (r57 is the implicit dest register, of course)
+
+We could use:
+
+r57 = add -2, -1
+
+My guess is that most SSA references are to recent values (especially if
+they correspond to expressions like (x+y*z+p*q/ ...)), so the negative
+numbers would tend to stay small, even at the end of the procedure (where
+the implicit register destination number could be quite large). Of course
+the negative sign is redundant, so you would be storing small integers
+almost all of the time, and 5-6 bits worth of register number would be
+plenty for most cases...
+
+What do you think?
+
+-Chris
+
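Below is a minimal sketch of the offset encoding proposed above, illustrative
only (this is not how LLVM ended up encoding operands, and all names are
hypothetical). It rewrites the absolute operands of "r57 = add r55, r56" as
distances back from the implicit destination register, so references to
recent values stay small:

    // C++ sketch: operands as offsets below the destination register.
    #include <cstdio>

    struct Operands { int op1, op2; };  // absolute SSA register numbers

    // Encode the operands of the instruction defining register `dest`
    // as distances back from `dest`; the negative sign is implicit.
    Operands encodeRelative(int dest, Operands abs) {
      return { dest - abs.op1, dest - abs.op2 };
    }

    int main() {
      Operands rel = encodeRelative(57, {55, 56});
      std::printf("r57 = add -%d, -%d\n", rel.op1, rel.op2);  // add -2, -1
    }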
diff --git a/docs/HistoricalNotes/2000-12-06-MeetingSummary.txt b/docs/HistoricalNotes/2000-12-06-MeetingSummary.txt
new file mode 100644
index 0000000..b66e185
--- /dev/null
+++ b/docs/HistoricalNotes/2000-12-06-MeetingSummary.txt
@@ -0,0 +1,83 @@
+SUMMARY
+-------
+
+We met to discuss the LLVM instruction format and bytecode representation:
+
+ISSUES RESOLVED
+---------------
+
+1. We decided that we shall use a flat namespace to represent our
+ variables in SSA form, as opposed to having a two dimensional namespace
+ of the original variable and the SSA instance subscript.
+
+ARGUMENT AGAINST:
+ * A two dimensional namespace would be valuable when doing alias
+ analysis because the extra information can help limit the scope of
+ analysis.
+
+ARGUMENT FOR:
+ * Including this information would require all users of the LLVM
+ bytecode to parse and handle it. This would slow down the
+ common case and inflate the instruction representation with another
+ infinite variable space.
+
+REASONING:
+ * It was decided that because original variable sources could be
+ reconstructed from SSA form in linear time, it would be an
+ unjustified expense for the common case to include the extra
+ information for one optimization. Alias analysis itself is typically
+ greater than linear in asymptotic complexity, so this extra analysis
+ would not affect the runtime of the optimization in a significant
+ way. Additionally, this would be an unlikely optimization to do at
+ runtime.
+
+
+IDEAS TO CONSIDER
+-----------------
+
+1. Including dominator information in the LLVM bytecode
+ representation. This is one example of an analysis result that may be
+ packaged with the bytecodes themselves. As a conceptual implementation
+ idea, we could include an immediate dominator number for each basic block
+ in the LLVM bytecode program. Basic blocks could be numbered according
+ to the order of occurrence in the bytecode representation. (A sketch
+ follows this list.)
+
+2. Including loop header and body information. This would facilitate
+ detection of intervals and natural loops.
+
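Below is a minimal sketch of idea 1 above, illustrative only (this encoding
was never actually part of LLVM, and the CFG is hypothetical). Blocks are
numbered in order of occurrence, and the bytecode would ship one
immediate-dominator number per block, letting a consumer rebuild the
dominator tree without redoing the analysis:

    // C++ sketch: per-block immediate-dominator numbers as an annotation.
    #include <cstdio>
    #include <vector>

    int main() {
      // CFG: bb0 -> bb1, bb0 -> bb2, bb1 -> bb3, bb2 -> bb3.
      // idom[i] is the immediate dominator of bb i (entry maps to itself).
      std::vector<int> idom = {0, 0, 0, 0};
      for (int b = 1; b < (int)idom.size(); ++b)
        std::printf("bb%d is immediately dominated by bb%d\n", b, idom[b]);
    }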
+UNRESOLVED ISSUES
+-----------------
+
+1. Will oSUIF provide enough of an infrastructure to support the research
+ that we will be doing? We know that it has less than stellar
+ performance, but hope that this will be of little importance for our
+ static compiler. This could affect us if we decided to do some IP
+ research. Also we do not yet understand the level of exception support
+ currently implemented.
+
+2. Should we consider the requirements of a direct hardware implementation
+ of the LLVM when we design it? If so, several design issues should
+ have their priorities shifted. The other option is to focus on a
+ software layer interpreting the LLVM in all cases.
+
+3. Should we use some form of packetized format to improve forward
+ compatibility? For example, we could design the system to encode a
+ packet type and length field before analysis information, to allow a
+ runtime to skip information that it didn't understand in a bytecode
+ stream. The obvious benefit would be for compatibility; the drawback
+ is that it would tend to splinter the 'standard' LLVM definition.
+
+4. Should we use fixed length instructions or variable length
+ instructions? Fetching variable length instructions is expensive (for
+ either hardware or software based LLVM runtimes), but we have several
+ 'infinite' spaces that instructions operate in (SSA register numbers,
+ type spaces, or packet length [if packets were implemented]). Several
+ options were mentioned including:
+ A. Using 16 or 32 bit numbers, which would be 'big enough'
+ B. A scheme similar to how UTF-8 works, to encode infinite numbers
+ while keeping small numbers small (see the sketch below).
+ C. Use something similar to Huffman encoding, so that the most common
+ numbers are the smallest.
+
+-Chris
+
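Below is a minimal sketch of option B above, illustrative only. It is the
classic base-128 "continuation bit" scheme (later standardized as LEB128,
and similar in spirit to the VBR encoding LLVM's bitstream format eventually
used): seven payload bits per byte, with the high bit marking that more
bytes follow, so values under 128 cost a single byte:

    // C++ sketch: UTF-8-like variable-length encoding of unbounded ints.
    #include <cstdint>
    #include <vector>

    std::vector<uint8_t> encodeVarInt(uint64_t v) {
      std::vector<uint8_t> out;
      do {
        uint8_t byte = v & 0x7f;   // low 7 bits of the value
        v >>= 7;
        if (v) byte |= 0x80;       // high bit: more bytes follow
        out.push_back(byte);
      } while (v);
      return out;
    }

    uint64_t decodeVarInt(const std::vector<uint8_t> &bytes) {
      uint64_t v = 0;
      int shift = 0;
      for (uint8_t byte : bytes) {
        v |= uint64_t(byte & 0x7f) << shift;
        shift += 7;
        if (!(byte & 0x80)) break; // final byte reached
      }
      return v;
    }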
diff --git a/docs/HistoricalNotes/2001-01-31-UniversalIRIdea.txt b/docs/HistoricalNotes/2001-01-31-UniversalIRIdea.txt
new file mode 100644
index 0000000..111706a
--- /dev/null
+++ b/docs/HistoricalNotes/2001-01-31-UniversalIRIdea.txt
@@ -0,0 +1,39 @@
+Date: Wed, 31 Jan 2001 12:04:33 -0600
+From: Vikram S. Adve <vadve@cs.uiuc.edu>
+To: Chris Lattner <lattner@cs.uiuc.edu>
+Subject: another thought
+
+I have a budding idea about making LLVM a little more ambitious: a
+customizable runtime system that can be used to implement language-specific
+virtual machines for many different languages. E.g., a C vm, a C++ vm, a
+Java vm, a Lisp vm, ...
+
+The idea would be that LLVM would provide a standard set of runtime features
+(some low-level like standard assembly instructions with code generation and
+static and runtime optimization; some higher-level like type-safety and
+perhaps a garbage collection library). Each language vm would select the
+runtime features needed for that language, extending or customizing them as
+needed. Most of the machine-dependent code-generation and optimization
+features as well as low-level machine-independent optimizations (like PRE)
+could be provided by LLVM and should be sufficient for any language,
+simplifying the language compiler. (This would also help interoperability
+between languages.) Also, some or most of the higher-level
+machine-independent features like type-safety and access safety should be
+reusable by different languages, with minor extensions. The language
+compiler could then focus on language-specific analyses and optimizations.
+
+The risk is that this sounds like a universal IR -- something that the
+compiler community has tried and failed to develop for decades, and is
+universally skeptical about. No matter what we say, we won't be able to
+convince anyone that we have a universal IR that will work. We need to
+think about whether LLVM is different or if it has something novel that might
+convince people. E.g., the idea of providing a package of separable
+features that different languages select from. Also, using SSA with or
+without type-safety as the intermediate representation.
+
+One interesting starting point would be to discuss how a JVM would be
+implemented on top of LLVM a bit more. That might give us clues on how to
+structure LLVM to support one or more language VMs.
+
+--Vikram
+
diff --git a/docs/HistoricalNotes/2001-02-06-TypeNotationDebate.txt b/docs/HistoricalNotes/2001-02-06-TypeNotationDebate.txt
new file mode 100644
index 0000000..c09cf1f
--- /dev/null
+++ b/docs/HistoricalNotes/2001-02-06-TypeNotationDebate.txt
@@ -0,0 +1,67 @@
+Date: Tue, 6 Feb 2001 20:27:37 -0600 (CST)
+From: Chris Lattner <sabre@nondot.org>
+To: Vikram S. Adve <vadve@cs.uiuc.edu>
+Subject: Type notation debate...
+
+This is the way that I am currently planning on implementing types:
+
+Primitive Types:
+type ::= void|bool|sbyte|ubyte|short|ushort|int|uint|long|ulong
+
+Method:
+typelist ::= typelisth | /*empty*/
+typelisth ::= type | typelisth ',' type
+type ::= type '(' typelist ')'
+
+Arrays (without and with size):
+type ::= '[' type ']' | '[' INT ',' type ']'
+
+Pointer:
+type ::= type '*'
+
+Structure:
+type ::= '{' typelist '}'
+
+Packed:
+type ::= '<' INT ',' type '>'
+
+Simple examples:
+
+[[ %4, int ]] - array of (array of 4 (int))
+[ { int, int } ] - Array of structure
+[ < %4, int > ] - Array of 128 bit SIMD packets
+int (int, [[ %4, int ]]) - Method taking an int and a 2d array, returning int
+
+
+Okay before you comment, please look at:
+
+http://www.research.att.com/~bs/devXinterview.html
+
+Search for "In another interview, you defined the C declarator syntax as
+an experiment that failed. However, this syntactic construct has been
+around for 27 years and perhaps more; why do you consider it problematic
+(except for its cumbersome syntax)?" and read that response for me. :)
+
+Now with this syntax, his example would be represented as:
+
+[ %10, bool (int, int) * ] *
+
+vs
+
+bool (*(*)[10])(int, int)
+
+in C.
+
+Basically, my argument for this type construction system is that it is
+VERY simple to use and understand (although it IS different than C, it is
+very simple and straightforward, which C is NOT). In fact, I would assert
+that most programmers TODAY do not understand pointers to member
+functions, and have to look up an example when they have to write them.
+
+In my opinion, it is critically important to have clear and concise type
+specifications, because types are going to be all over the programs.
+
+Let me know your thoughts on this. :)
+
+-Chris
+
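Below is a minimal sketch of a reader for a subset of the grammar above,
illustrative only: it handles primitive names, sized and unsized arrays, and
postfix pointers, and prints the type in words the way the examples above
do. Integers are written bare ("4" rather than "%4") to keep the lexer
trivial, and all names are hypothetical:

    // C++ sketch: recursive descent over the array/pointer subset.
    #include <cctype>
    #include <iostream>
    #include <string>

    struct TypeParser {
      std::string s;
      size_t i = 0;

      void skip() { while (i < s.size() && std::isspace((unsigned char)s[i])) ++i; }
      bool eat(char c) { skip(); if (i < s.size() && s[i] == c) { ++i; return true; } return false; }

      // type ::= (IDENT | '[' [INT ','] type ']') '*'*
      std::string parse() {
        std::string out;
        skip();
        if (eat('[')) {
          skip();
          std::string n;
          while (i < s.size() && std::isdigit((unsigned char)s[i])) n += s[i++];
          if (!n.empty()) eat(',');  // sized form: '[' INT ',' type ']'
          std::string elt = parse();
          eat(']');
          out = n.empty() ? "array of (" + elt + ")"
                          : "array of " + n + " (" + elt + ")";
        } else {
          while (i < s.size() && std::isalpha((unsigned char)s[i])) out += s[i++];
        }
        while (eat('*')) out = "pointer to (" + out + ")";  // postfix '*'
        return out;
      }
    };

    int main() {
      TypeParser p{"[[4, int]]"};
      std::cout << p.parse() << "\n";  // array of (array of 4 (int))
    }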
diff --git a/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp1.txt b/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp1.txt
new file mode 100644
index 0000000..8bfefbf
--- /dev/null
+++ b/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp1.txt
@@ -0,0 +1,75 @@
+Date: Thu, 8 Feb 2001 08:42:04 -0600
+From: Vikram S. Adve <vadve@cs.uiuc.edu>
+To: Chris Lattner <sabre@nondot.org>
+Subject: RE: Type notation debate...
+
+Chris,
+
+> Okay before you comment, please look at:
+>
+> http://www.research.att.com/~bs/devXinterview.html
+
+I read this argument. Even before that, I was already in agreement with you
+and him that the C declarator syntax is difficult and confusing.
+
+But in fact, if you read the entire answer carefully, he came to the same
+conclusion I do: that you have to go with familiar syntax over logical
+syntax because familiarity is such a strong force:
+
+ "However, familiarity is a strong force. To compare, in English, we
+live
+more or less happily with the absurd rules for "to be" (am, are, is, been,
+was, were, ...) and all attempts to simplify are treated with contempt or
+(preferably) humor. It be a curious world and it always beed."
+
+> Basically, my argument for this type construction system is that it is
+> VERY simple to use and understand (although it IS different than C, it is
+> very simple and straightforward, which C is NOT). In fact, I would assert
+> that most programmers TODAY do not understand pointers to member
+> functions, and have to look up an example when they have to write them.
+
+Again, I don't disagree with this at all. But to some extent this
+particular problem is inherently difficult. Your syntax for the above
+example may be easier for you to read because this is the way you have been
+thinking about it. Honestly, I don't find it much easier than the C syntax.
+In either case, I would have to look up an example to write pointers to
+member functions.
+
+But pointers to member functions are nowhere near as common as arrays. And
+the old array syntax:
+ type [ int, int, ...]
+is just much more familiar and clear to people than anything new you
+introduce, no matter how logical it is. Introducing a new syntax that may
+make function pointers easier but makes arrays much more difficult seems
+very risky to me.
+
+> In my opinion, it is critically important to have clear and concise type
+> specifications, because types are going to be all over the programs.
+
+I absolutely agree. But the question is, what is more clear and concise?
+The syntax programmers are used to out of years of experience or a new
+syntax that they have never seen that has a more logical structure. I think
+the answer is the former. Sometimes, you have to give up a better idea
+because you can't overcome sociological barriers to it. Qwerty keyboards
+and Windows are two classic examples of bad technology that are difficult to
+root out.
+
+P.S. Also, while I agree that most of your syntax is more logical, there is
+one part that isn't:
+
+Arrays (without and with size):
+type ::= '[' type ']' | '[' INT ',' type ']'.
+
+The array-with-size form lists the dimensions and the type in a single list.
+That is just too confusing:
+ [10, 40, int]
+This seems to be a 3-D array where the third dimension is something strange.
+It is too confusing to have a list of 3 things, some of which are dimensions
+and one is a type. Either of the following would be better:
+
+ array [10, 40] of int
+or
+ int [10, 40]
+
+--Vikram
+
diff --git a/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp2.txt b/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp2.txt
new file mode 100644
index 0000000..6e97841
--- /dev/null
+++ b/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp2.txt
@@ -0,0 +1,53 @@
+Date: Thu, 8 Feb 2001 14:31:05 -0600 (CST)
+From: Chris Lattner <sabre@nondot.org>
+To: Vikram S. Adve <vadve@cs.uiuc.edu>
+Subject: RE: Type notation debate...
+
+> Arrays (without and with size):
+> type ::= '[' type ']' | '[' INT ',' type ']'.
+>
+> The array-with-size form lists the dimensions and the type in a single list.
+> That is just too confusing:
+
+> [10, 40, int]
+> This seems to be a 3-D array where the third dimension is something strange.
+> It is too confusing to have a list of 3 things, some of which are dimensions
+> and one is a type.
+
+The above grammar indicates that there is only one integer parameter, ie
+the upper bound. The lower bound is always implied to be zero, for
+several reasons:
+
+* As a low level VM, we want to expose addressing computations
+ explicitly. Since the lower bound must always be known in a high level
+ language statically, the language front end can do the translation
+ automatically.
+* This fits more closely with what Java needs, ie what we need in the
+ short term. Java arrays are always zero based.
+
+If a two element list is too confusing, I would recommend an alternate
+syntax of:
+
+type ::= '[' type ']' | '[' INT 'x' type ']'.
+
+For example:
+ [12 x int]
+ [12x int]
+ [ 12 x [ 4x int ]]
+
+Which is syntactically nicer, and more explicit.
+
+> Either of the following would be better:
+> array [10, 40] of int
+
+I considered this approach for arrays in general (ie array of int/ array
+of 12 int), but found that it made declarations WAY too long. Remember
+that because of the nature of LLVM, you get a lot of types strewn all over
+the program, and using the 'typedef' like facility is not a wonderful
+option, because then types aren't explicit anymore.
+
+I find this email interesting, because you contradict the previous email
+you sent, where you recommended that we stick to C syntax...
+
+-Chris
+
diff --git a/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp4.txt b/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp4.txt
new file mode 100644
index 0000000..7b90327
--- /dev/null
+++ b/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp4.txt
@@ -0,0 +1,89 @@
+> But in fact, if you read the entire answer carefully, he came to the same
+> conclusion I do: that you have to go with familiar syntax over logical
+> syntax because familiarity is such a strong force:
+> "However, familiarity is a strong force. To compare, in English, we
+live
+> more or less happily with the absurd rules for "to be" (am, are, is, been,
+> was, were, ...) and all attempts to simplify are treated with contempt or
+> (preferably) humor. It be a curious world and it always beed."
+
+Although you have to remember that his situation was considerably
+different than ours. He was in a position where he was designing a high
+level language that had to be COMPATIBLE with C. Our language is such
+that a new person would have to learn the new, different, syntax
+anyways. Making them learn about the type system does not seem like much
+of a stretch from learning the opcodes and how SSA form works, and how
+everything ties together...
+
+> > Basically, my argument for this type construction system is that it is
+> > VERY simple to use and understand (although it IS different than C, it is
+> > very simple and straightforward, which C is NOT). In fact, I would assert
+> > that most programmers TODAY do not understand pointers to member
+> > functions, and have to look up an example when they have to write them.
+
+> Again, I don't disagree with this at all. But to some extent this
+> particular problem is inherently difficult. Your syntax for the above
+> example may be easier for you to read because this is the way you have been
+> thinking about it. Honestly, I don't find it much easier than the C syntax.
+> In either case, I would have to look up an example to write pointers to
+> member functions.
+
+I would argue that because the lexical structure of the language is self
+consistent, any person who spent a significant amount of time programming
+in LLVM directly would understand how to do it without looking it up in a
+manual. The reason this does not work for C is because you rarely have to
+declare these pointers, and the syntax is inconsistent with the method
+declaration and calling syntax.
+
+> But pointers to member functions are nowhere near as common as arrays.
+
+Very true. If you're implementing an object oriented language, however,
+remember that you have to do all the pointer to member function stuff
+yourself.... so every time you invoke a virtual method one is involved
+(instead of having C++ hide it for you behind "syntactic sugar").
+
+> And the old array syntax:
+> type [ int, int, ...]
+> is just much more familiar and clear to people than anything new you
+> introduce, no matter how logical it is.
+
+Erm... excuse me but how is this the "old array syntax"? If you are
+arguing for consistency with C, you should be asking for 'type int []',
+which is significantly different from the above (besides, the above
+introduces a new operator and duplicates information
+needlessly). Basically what I am suggesting is exactly the above without
+the fluff. So instead of:
+
+ type [ int, int, ...]
+
+you use:
+
+ type [ int ]
+
+> Introducing a new syntax that may
+> make function pointers easier but makes arrays much more difficult seems
+> very risky to me.
+
+This is not about function pointers. This is about consistency in the
+type system, and consistency with the rest of the language. The point
+above does not make arrays any more difficult to use, and makes the
+structure of types much more obvious than the "c way".
+
+> > In my opinion, it is critically important to have clear and concise type
+> > specifications, because types are going to be all over the programs.
+>
+> I absolutely agree. But the question is, what is more clear and concise?
+> The syntax programmers are used to out of years of experience or a new
+> syntax that they have never seen that has a more logical structure. I think
+> the answer is the former. Sometimes, you have to give up a better idea
+> because you can't overcome sociological barriers to it. Qwerty keyboards
+> and Windows are two classic examples of bad technology that are difficult to
+> root out.
+
+Very true, but you seem to be advocating a completely different type
+system than C has, one that also doesn't offer the advantages of clear
+structure that the system I recommended does... so you seem to not have a
+problem with changing this, just with what I change it to. :)
+
+-Chris
+
diff --git a/docs/HistoricalNotes/2001-02-09-AdveComments.txt b/docs/HistoricalNotes/2001-02-09-AdveComments.txt
new file mode 100644
index 0000000..5503233
--- /dev/null
+++ b/docs/HistoricalNotes/2001-02-09-AdveComments.txt
@@ -0,0 +1,120 @@
+Ok, here are my comments and suggestions about the LLVM instruction set.
+We should discuss some now, but can discuss many of them later, when we
+revisit synchronization, type inference, and other issues.
+(We have discussed some of the comments already.)
+
+
+o We should consider eliminating the type annotation in cases where it is
+ essentially obvious from the instruction type, e.g., in br, it is obvious
+ that the first arg. should be a bool and the other args should be labels:
+
+ br bool <cond>, label <iftrue>, label <iffalse>
+
+ I think your point was that making all types explicit improves clarity
+ and readability. I agree to some extent, but it also comes at the cost
+ of verbosity. And when the types are obvious from people's experience
+ (e.g., in the br instruction), it doesn't seem to help as much.
+
+
+o On reflection, I really like your idea of having the two different switch
+ types (even though they encode implementation techniques rather than
+ semantics). It should simplify building the CFG and my guess is it could
+ enable some significant optimizations, though we should think about which.
+
+
+o In the lookup-indirect form of the switch, is there a reason not to make
+ the val-type uint? Most HLL switch statements (including Java and C++)
+ require that anyway. And it would also make the val-type uniform
+ in the two forms of the switch.
+
+ I did see the switch-on-bool examples and, while cute, we can just use
+ the branch instructions in that particular case.
+
+
+o I agree with your comment that we don't need 'neg'.
+
+
+o There's a trade-off with the cast instruction:
+ + it avoids having to define all the upcasts and downcasts that are
+ valid for the operands of each instruction (you probably have thought
+ of other benefits also)
+ - it could make the bytecode significantly larger because there could
+ be a lot of cast operations
+
+
+o Making the second arg. to 'shl' a ubyte seems good enough to me.
+ 255 positions seems adequate for several generations of machines
+ and is more compact than uint.
+
+
+o I still have some major concerns about including malloc and free in the
+ language (either as builtin functions or instructions). LLVM must be
+ able to represent code from many different languages. Languages such as
+ C, C++, Java, and Fortran 90 would not be able to use our malloc anyway
+ because each of them will want to provide a library implementation of it.
+
+ This gets even worse when code from different languages is linked
+ into a single executable (which is fairly common in large apps).
+ Having a single malloc would just not suffice, and instead would simply
+ complicate the picture further because it adds an extra variant in
+ addition to the one each language provides.
+
+ Instead, providing a default library version of malloc and free
+ (and perhaps a malloc_gc with garbage collection instead of free)
+ would make a good implementation available to anyone who wants it.
+
+ I don't recall all your arguments in favor so let's discuss this again,
+ and soon.
+
+
+o 'alloca' on the other hand sounds like a good idea, and the
+ implementation seems fairly language-independent so it doesn't have the
+ problems with malloc listed above.
+
+
+o About indirect call:
+ Your option #2 sounded good to me. I'm not sure I understand your
+ concern about an explicit 'icall' instruction?
+
+
+o A pair of important synchronization instr'ns to think about:
+ load-linked
+ store-conditional
+
+
+o Other classes of instructions that are valuable for pipeline performance:
+ conditional-move
+ predicated instructions
+
+
+o I believe tail calls are relatively easy to identify; do you know why
+ .NET has a tailcall instruction?
+
+
+o I agree that we need a static data space. Otherwise, emulating global
+ data gets unnecessarily complex.
+
+
+o About explicit parallelism:
+
+ We once talked about adding a symbolic thread-id field to each
+ instruction. (It could be optional so single-threaded codes are
+ not penalized.) This could map well to multi-threaded architectures
+ while providing easy ILP for single-threaded ones. But it is probably
+ too radical an idea to include in a base version of LLVM. Instead, it
+ could be a great topic for a separate study.
+
+ What is the semantics of the IA64 stop bit?
+
+
+
+
+o And finally, another thought about the syntax for arrays :-)
+
+ Although this syntax:
+ array <dimension-list> of <type>
+ is verbose, it will be used only in the human-readable assembly code so
+ size should not matter. I think we should consider it because I find it
+ to be the clearest syntax. It could even make arrays of function
+ pointers somewhat readable.
+
diff --git a/docs/HistoricalNotes/2001-02-09-AdveCommentsResponse.txt b/docs/HistoricalNotes/2001-02-09-AdveCommentsResponse.txt
new file mode 100644
index 0000000..5c87330
--- /dev/null
+++ b/docs/HistoricalNotes/2001-02-09-AdveCommentsResponse.txt
@@ -0,0 +1,245 @@
+From: Chris Lattner <sabre@nondot.org>
+To: "Vikram S. Adve" <vadve@cs.uiuc.edu>
+Subject: Re: LLVM Feedback
+
+I've included your feedback in the /home/vadve/lattner/llvm/docs directory
+so that it will live in CVS eventually with the rest of LLVM. I've
+significantly updated the documentation to reflect the changes you
+suggested, as specified below:
+
+> We should consider eliminating the type annotation in cases where it is
+> essentially obvious from the instruction type:
+> br bool <cond>, label <iftrue>, label <iffalse>
+> I think your point was that making all types explicit improves clarity
+> and readability. I agree to some extent, but it also comes at the
+> cost of verbosity. And when the types are obvious from people's
+> experience (e.g., in the br instruction), it doesn't seem to help as
+> much.
+
+Very true. We should discuss this more, but my reasoning is more of a
+consistency argument. There are VERY few instructions that can have all
+of the types eliminated, and doing so when available unnecesarily makes
+the language more difficult to handle. Especially when you see 'int
+%this' and 'bool %that' all over the place, I think it would be
+disorienting to see:
+
+ br %predicate, %iftrue, %iffalse
+
+for branches. Even just typing that once gives me the creeps. ;) Like I
+said, we should probably discuss this further in person...
+
+> On reflection, I really like your idea of having the two different
+> switch types (even though they encode implementation techniques rather
+> than semantics). It should simplify building the CFG and my guess is it
+> could enable some significant optimizations, though we should think
+> about which.
+
+Great. I added a note to the switch section commenting on how the VM
+should just use the instruction type as a hint, and that the
+implementation may choose alternate representations (such as predicated
+branches).
+
+> In the lookup-indirect form of the switch, is there a reason not to
+> make the val-type uint?
+
+No. This was something I was debating for a while, and didn't really feel
+strongly about either way. It is common to switch on other types in HLL's
+(for example signed int's are particularly common), but in this case, all
+that will be added is an additional 'cast' instruction. I removed that
+from the spec.
+
+> I agree with your comment that we don't need 'neg'
+
+Removed.
+
+> There's a trade-off with the cast instruction:
+> + it avoids having to define all the upcasts and downcasts that are
+> valid for the operands of each instruction (you probably have
+> thought of other benefits also)
+> - it could make the bytecode significantly larger because there could
+> be a lot of cast operations
+
+ + You NEED casts to represent things like:
+ void foo(float);
+ ...
+ int x;
+ ...
+ foo(x);
+ in a language like C. Even in a Java like language, you need upcasts
+ and some way to implement dynamic downcasts.
+ + Not all forms of instructions take every type (for example you can't
+ shift by a floating point number of bits), thus SOME programs will need
+ implicit casts.
+
+To be efficient and to avoid your '-' point above, we just have to be
+careful to specify that the instructions shall operate on all common
+types, therefore casting should be relatively uncommon. For example all
+of the arithmetic operations work on almost all data types.
+
+> Making the second arg. to 'shl' a ubyte seems good enough to me.
+> 255 positions seems adequate for several generations of machines
+
+Okay, that comment is removed.
+
+> and is more compact than uint.
+
+No, it isn't. Remember that the bytecode encoding saves value slots into
+the bytecode instructions themselves, not constant values. This is
+another case where we may introduce more cast instructions (but we will
+also reduce the number of opcode variants that must be supported by a
+virtual machine). Because most shifts are by constant values, I don't
+think that we'll have to cast many shifts. :)
+
+> I still have some major concerns about including malloc and free in the
+> language (either as builtin functions or instructions).
+
+Agreed. How about this proposal:
+
+malloc/free are either built in functions or actual opcodes. They provide
+all of the type safety that the document would indicate, blah blah
+blah. :)
+
+Now, because of all of the excellent points that you raised, an
+implementation may want to override the default malloc/free behavior of
+the program. To do this, they simply implement a "malloc" and
+"free" function. The virtual machine will then be defined to use the user
+defined malloc/free function (which return/take void*'s, not type'd
+pointers like the builtin function would) if one is available, otherwise
+fall back on a system malloc/free.
+
+Does this sound like a good compromise? It would give us all of the
+typesafety/elegance in the language while still allowing the user to do
+all the cool stuff they want to...
+
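Below is a minimal sketch of the compromise described above, illustrative
only: none of these names are real LLVM or VM APIs. The VM keeps allocator
hooks that default to the system malloc/free; if the loader finds
user-defined "malloc"/"free" functions in the program (taking and returning
untyped void*'s), it installs them instead, and every malloc/free opcode
lowers to a call through the hooks:

    // C++ sketch: user-override allocator with a system fallback.
    #include <cstdlib>

    using MallocFn = void *(*)(std::size_t);
    using FreeFn   = void (*)(void *);

    // Defaults: fall back on the system allocator.
    static MallocFn vmMalloc = std::malloc;
    static FreeFn   vmFree   = std::free;

    // Hypothetical loader hook, called when the program supplies its
    // own allocator functions.
    void vmSetAllocator(MallocFn m, FreeFn f) {
      vmMalloc = m;
      vmFree   = f;
    }

    // The VM's 'malloc'/'free' opcodes or builtins lower to these.
    void *vmAllocate(std::size_t bytes) { return vmMalloc(bytes); }
    void  vmDeallocate(void *p)         { vmFree(p); }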
+> 'alloca' on the other hand sounds like a good idea, and the
+> implementation seems fairly language-independent so it doesn't have the
+> problems with malloc listed above.
+
+Okay, once we get the above stuff figured out, I'll put it all in the
+spec.
+
+> About indirect call:
+> Your option #2 sounded good to me. I'm not sure I understand your
+> concern about an explicit 'icall' instruction?
+
+I worry too much. :) The other alternative has been removed. 'icall' is
+now up in the instruction list next to 'call'.
+
+> I believe tail calls are relatively easy to identify; do you know why
+> .NET has a tailcall instruction?
+
+Although I am just guessing, I believe it probably has to do with the fact
+that they want languages like Haskell and Lisp to be efficiently runnable
+on their VM. Of course this means that the VM MUST implement tail calls
+'correctly', or else life will suck. :) I would put this into a future
+feature bin, because it could be pretty handy...
+
+> A pair of important synchronization instr'ns to think about:
+> load-linked
+> store-conditional
+
+What is 'load-linked'? I think that (at least for now) I should add these
+to the 'possible extensions' section, because they are not immediately
+needed...
+
+> Other classes of instructions that are valuable for pipeline
+> performance:
+> conditional-move
+> predicated instructions
+
+Conditional move is effectively a special case of a predicated
+instruction... and I think that all predicated instructions can possibly
+be implemented later in LLVM. It would significantly change things, and
+it doesn't seem to be very necessary right now. It would seem to
+complicate flow control analysis a LOT in the virtual machine. I would
+tend to prefer that a predicated architecture like IA64 convert from a
+"basic block" representation to a predicated rep as part of it's dynamic
+complication phase. Also, if a basic block contains ONLY a move, then
+that can be trivally translated into a conditional move...
+
+> I agree that we need a static data space. Otherwise, emulating global
+> data gets unnecessarily complex.
+
+Definitely. Also a later item though. :)
+
+> We once talked about adding a symbolic thread-id field to each
+> ..
+> Instead, it could be a great topic for a separate study.
+
+Agreed. :)
+
+> What is the semantics of the IA64 stop bit?
+
+Basically, the IA64 writes instructions like this:
+mov ...
+add ...
+sub ...
+op xxx
+op xxx
+;;
+mov ...
+add ...
+sub ...
+op xxx
+op xxx
+;;
+
+Where the ;; delimits a group of instructions with no dependencies between
+them, which can all be executed concurrently (to the limits of the
+available functional units). The ;; gets translated into a bit set in one
+of the opcodes.
+
+The advantage of this representation is that you don't have to do some
+kind of 'thread id scheduling' pass by having to specify ahead of time how
+many threads to use, and the representation doesn't have a per instruction
+overhead...
+
+> And finally, another thought about the syntax for arrays :-)
+> Although this syntax:
+> array <dimension-list> of <type>
+> is verbose, it will be used only in the human-readable assembly code so
+> size should not matter. I think we should consider it because I find it
+> to be the clearest syntax. It could even make arrays of function
+> pointers somewhat readable.
+
+My only comment will be to give you an example of why this is a bad
+idea. :)
+
+Here is an example of using the switch statement (with my recommended
+syntax):
+
+switch uint %val, label %otherwise,
+ [%3 x {uint, label}] [ { uint %57, label %l1 },
+ { uint %20, label %l2 },
+ { uint %14, label %l3 } ]
+
+Here it is with the syntax you are proposing:
+
+switch uint %val, label %otherwise,
+ array %3 of {uint, label}
+ array of {uint, label}
+ { uint %57, label %l1 },
+ { uint %20, label %l2 },
+ { uint %14, label %l3 }
+
+Which is ambiguous and very verbose. It would be possible to specify
+constants with [] brackets as in my syntax, which would look like this:
+
+switch uint %val, label %otherwise,
+ array %3 of {uint, label} [ { uint %57, label %l1 },
+ { uint %20, label %l2 },
+ { uint %14, label %l3 } ]
+
+But then the syntax is inconsistent between type definition and constant
+definition (why do []'s enclose the constants but not the types??).
+
+Anyways, I'm sure that there is much debate still to be had over
+this... :)
+
+-Chris
+
+http://www.nondot.org/~sabre/os/
+http://www.nondot.org/MagicStats/
+http://korbit.sourceforge.net/
+
+
diff --git a/docs/HistoricalNotes/2001-02-13-Reference-Memory.txt b/docs/HistoricalNotes/2001-02-13-Reference-Memory.txt
new file mode 100644
index 0000000..2c7534d
--- /dev/null
+++ b/docs/HistoricalNotes/2001-02-13-Reference-Memory.txt
@@ -0,0 +1,39 @@
+Date: Tue, 13 Feb 2001 13:29:52 -0600 (CST)
+From: Chris Lattner <sabre@nondot.org>
+To: Vikram S. Adve <vadve@cs.uiuc.edu>
+Subject: LLVM Concerns...
+
+
+I've updated the documentation to include load, store, and allocation
+instructions (please take a look and let me know if I'm on the right
+track):
+
+file:/home/vadve/lattner/llvm/docs/LangRef.html#memoryops
+
+I have a couple of concerns I would like to bring up:
+
+1. Reference types
+ Right now, I've spec'd out the language to have a pointer type, which
+ works fine for lots of stuff... except that Java really has
+ references: constrained pointers that cannot be manipulated: added and
+ subtracted, moved, etc... Do we want to have a type like this? It
+ could be very nice for analysis (pointer always points to the start of
+ an object, etc...) and more closely matches Java semantics. The
+ pointer type would be kept for C++ like semantics. Through analysis,
+ C++ pointers could be promoted to references in the LLVM
+ representation.
+
+2. Our "implicit" memory references in assembly language:
+ After thinking about it, this model has two problems:
+ A. If you do pointer analysis and realize that two stores are
+ independent and can share the same memory source object, there is
+ no way to represent this in either the bytecode or assembly.
+ B. When parsing assembly/bytecode, we effectively have to do a full
+ SSA generation/PHI node insertion pass to build the dependencies
+ when we don't want the "pinned" representation. This is not
+ cool.
+ I'm tempted to make memory references explicit in both the assembly and
+ bytecode to get around this... what do you think?
+
+-Chris
+
diff --git a/docs/HistoricalNotes/2001-02-13-Reference-MemoryResponse.txt b/docs/HistoricalNotes/2001-02-13-Reference-MemoryResponse.txt
new file mode 100644
index 0000000..5053433
--- /dev/null
+++ b/docs/HistoricalNotes/2001-02-13-Reference-MemoryResponse.txt
@@ -0,0 +1,47 @@
+Date: Tue, 13 Feb 2001 18:25:42 -0600
+From: Vikram S. Adve <vadve@cs.uiuc.edu>
+To: Chris Lattner <sabre@nondot.org>
+Subject: RE: LLVM Concerns...
+
+> 1. Reference types
+> Right now, I've spec'd out the language to have a pointer type, which
+> works fine for lots of stuff... except that Java really has
+> references: constrained pointers that cannot be manipulated: added and
+> subtracted, moved, etc... Do we want to have a type like this? It
+> could be very nice for analysis (pointer always points to the start of
+> an object, etc...) and more closely matches Java semantics. The
+> pointer type would be kept for C++ like semantics. Through analysis,
+> C++ pointers could be promoted to references in the LLVM
+> representation.
+
+
+You're right, having references would be useful. Even for C++ the *static*
+compiler could generate references instead of pointers with fairly
+straightforward analysis. Let's include a reference type for now. But I'm
+also really concerned that LLVM is becoming big and complex and (perhaps)
+too high-level. After we get some initial performance results, we may have
+a clearer idea of what our goals should be and we should revisit this
+question then.
+
+> 2. Our "implicit" memory references in assembly language:
+> After thinking about it, this model has two problems:
+> A. If you do pointer analysis and realize that two stores are
+> independent and can share the same memory source object,
+
+not sure what you meant by "share the same memory source object"
+
+> there is
+> no way to represent this in either the bytecode or assembly.
+> B. When parsing assembly/bytecode, we effectively have to do a full
+> SSA generation/PHI node insertion pass to build the dependencies
+> when we don't want the "pinned" representation. This is not
+> cool.
+
+I understand the concern. But again, let's focus on the performance first
+and then look at the language design issues. E.g., it would be good to know
+how big the bytecode files are before expanding them further. I am pretty
+keen to explore the implications of LLVM for mobile devices. Both bytecode
+size and power consumption are important to consider there.
+
+--Vikram
+
diff --git a/docs/HistoricalNotes/2001-04-16-DynamicCompilation.txt b/docs/HistoricalNotes/2001-04-16-DynamicCompilation.txt
new file mode 100644
index 0000000..5f7843a
--- /dev/null
+++ b/docs/HistoricalNotes/2001-04-16-DynamicCompilation.txt
@@ -0,0 +1,49 @@
+By Chris:
+
+LLVM has been designed with two primary goals in mind. First we strive to
+enable the best possible division of labor between static and dynamic
+compilers, and second, we need a flexible and powerful interface
+between these two complementary stages of compilation. We feel that
+providing a solution to these two goals will yield an excellent solution
+to the performance problem faced by modern architectures and programming
+languages.
+
+A key insight into current compiler and runtime systems is that a
+compiler may fall anywhere in a "continuum of compilation" to do its
+job. On one side, scripting languages statically compile nothing and
+dynamically compile (or equivalently, interpret) everything. On the far
+other side, traditional static compilers process everything statically and
+nothing dynamically. These approaches have typically been seen as a
+tradeoff between performance and portability. On a deeper level, however,
+there are two reasons that optimal system performance may be obtained by a
+system somewhere in between these two extremes: Dynamic application
+behavior and social constraints.
+
+From a technical perspective, pure static compilation cannot ever give
+optimal performance in all cases, because applications have varying dynamic
+behavior that the static compiler cannot take into consideration. Even
+compilers that support profile guided optimization generate poor code in
+the real world, because using such optimization tunes the application
+to one particular usage pattern, whereas real programs (as opposed to
+benchmarks) often have several different usage patterns.
+
+On a social level, static compilation is a very shortsighted solution to
+the performance problem. Instruction set architectures (ISAs) continuously
+evolve, and each implementation of an ISA (a processor) must choose a set
+of tradeoffs that make sense in the market context that it is designed for.
+With every new processor introduced, the vendor faces two fundamental
+problems: First, there is a lag between when a processor is introduced
+and when compilers generate quality code for the architecture. Secondly,
+even when compilers catch up to the new architecture there is often a large
+body of legacy code that was compiled for previous generations and will
+not or cannot be upgraded. Thus a large percentage of code running on a
+processor may be compiled quite sub-optimally for the current
+characteristics of the dynamic execution environment.
+
+For these reasons, LLVM has been designed from the beginning as a long-term
+solution to these problems. Its design allows the large body of
+platform-independent, static program optimizations currently in compilers to be
+reused unchanged in their current form. It also provides important static
+type information to enable powerful dynamic and link time optimizations
+to be performed quickly and efficiently. This combination enables an
+increase in effective system performance for real world environments.
diff --git a/docs/HistoricalNotes/2001-05-18-ExceptionHandling.txt b/docs/HistoricalNotes/2001-05-18-ExceptionHandling.txt
new file mode 100644
index 0000000..b546301
--- /dev/null
+++ b/docs/HistoricalNotes/2001-05-18-ExceptionHandling.txt
@@ -0,0 +1,202 @@
+Meeting notes: Implementation idea: Exception Handling in C++/Java
+
+The 5/18/01 meeting discussed ideas for implementing exceptions in LLVM.
+We decided that the best solution requires a set of library calls provided by
+the VM, as well as an extension to the LLVM function invocation syntax.
+
+The LLVM function invocation instruction previously looked like this (ignoring
+types):
+
+ call func(arg1, arg2, arg3)
+
+The extension discussed today adds an optional "with" clause that
+associates a label with the call site. The new syntax looks like this:
+
+ call func(arg1, arg2, arg3) with funcCleanup
+
+This funcCleanup label always stays tightly associated with the call site (being
+encoded directly into the call opcode itself), and should be used whenever
+there is cleanup work that needs to be done for the current function if
+an exception is thrown by func (or if we are in a try block).
+
+To support this, the VM/Runtime provide the following simple library
+functions (all syntax in this document is very abstract):
+
+typedef struct { something } %frame;
+  The VM must export a "frame type": an opaque structure used to
+ implement different types of stack walking that may be used by various
+ language runtime libraries. We imagine that it would be typical to
+ represent a frame with a PC and frame pointer pair, although that is not
+ required.
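+
+  One plausible concrete layout (purely illustrative; clients must treat
+  the structure as opaque):
+
+    typedef struct {
+      void *PC;   /* address identifying the call site       */
+      void *FP;   /* frame pointer of the activation record  */
+    } frame;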
+
+%frame getStackCurrentFrame();
+ Get a frame object for the current function. Note that if the current
+ function was inlined into its caller, the "current" frame will belong to
+ the "caller".
+
+bool isFirstFrame(%frame f);
+ Returns true if the specified frame is the top level (first activated) frame
+ for this thread. For the main thread, this corresponds to the main()
+  function; for a spawned thread, it corresponds to the thread function.
+
+%frame getNextFrame(%frame f);
+  Return the previous frame on the stack (i.e., the caller's frame). This
+  function is undefined if f satisfies the predicate isFirstFrame(f).
+
+Label *getFrameLabel(%frame f);
+ If a label was associated with f (as discussed below), this function returns
+ it. Otherwise, it returns a null pointer.
+
+doNonLocalBranch(Label *L);
+ At this point, it is not clear whether this should be a function or
+ intrinsic. It should probably be an intrinsic in LLVM, but we'll deal with
+ this issue later.
+
+
+Here is a motivating example that illustrates how these facilities could be
+used to implement the C++ exception model:
+
+void TestFunction(...) {
+ A a; B b;
+ foo(); // Any function call may throw
+ bar();
+ C c;
+
+ try {
+ D d;
+ baz();
+ } catch (int) {
+ ...int Stuff...
+ // execution continues after the try block: the exception is consumed
+ } catch (double) {
+ ...double stuff...
+    throw; // Exception is propagated
+ }
+}
+
+This function would compile to approximately the following code (heavy
+pseudo code follows):
+
+Func:
+ %a = alloca A
+ A::A(%a) // These ctors & dtors could throw, but we ignore this
+ %b = alloca B // minor detail for this example
+ B::B(%b)
+
+  call foo() with fooCleanup // An exception in foo is propagated to fooCleanup
+  call bar() with barCleanup // An exception in bar is propagated to barCleanup
+
+ %c = alloca C
+  C::C(%c)
+  %d = alloca D
+  D::D(%d)
+  call baz() with bazCleanup // An exception in baz is propagated to bazCleanup
+ d->~D();
+EndTry: // This label corresponds to the end of the try block
+ c->~C() // These could also throw, these are also ignored
+ b->~B()
+ a->~A()
+ return
+
+Note that this is a very straightforward and literal translation: exactly
+what we want for zero-cost (when unused) exception handling. Especially on
+platforms with many registers (e.g., IA64), setjmp/longjmp-style exception
+handling is *very* impractical. Also, the "with" clauses describe the
+control flow paths explicitly so that analysis is not adversely affected.
+
+The foo/barCleanup labels are implemented as:
+
+TryCleanup: // Executed if an exception escapes the try block
+ c->~C()
+barCleanup: // Executed if an exception escapes from bar()
+ // fall through
+fooCleanup: // Executed if an exception escapes from foo()
+ b->~B()
+ a->~A()
+ Exception *E = getThreadLocalException()
+ call throw(E) // Implemented by the C++ runtime, described below
+
+This does the work one would expect. getThreadLocalException is a function
+implemented by the C++ support library. It returns the current exception
+object for the current thread. Note that we do not attempt to recycle the
+shutdown code from before, because performance of the mainline code is
+critically important. Also, obviously fooCleanup and barCleanup may be
+merged and one of them eliminated. This just shows how the code generator
+would most likely emit code.
+
+The bazCleanup label is more interesting. Because the exception may be caught
+by the try block, we must dispatch to its handler... but the handler does not
+exist on the call stack (it has no VM Call->Label mapping installed), so we
+must dispatch statically with a goto. The handler, here called bazHandler,
+thus appears as:
+
+bazHandler:
+ d->~D(); // destruct D as it goes out of scope when entering catch clauses
+ goto TryHandler
+
+In general, TryHandler is not the same as bazHandler, because multiple
+function calls could be made from the try block. In this case, trivial
+optimization could merge the two basic blocks. TryHandler is the code
+that actually determines the type of exception, based on the Exception object
+itself. For this discussion, assume that the exception object contains *at
+least*:
+
+1. A pointer to the RTTI info for the contained object
+2. A pointer to the dtor for the contained object
+3. The contained object itself
+
+Note that it is necessary to maintain #1 & #2 in the exception object itself
+because objects without virtual function tables may be thrown (as in this
+example). Assuming this, TryHandler would look something like this:
+
+TryHandler:
+ Exception *E = getThreadLocalException();
+ switch (E->RTTIType) {
+ case IntRTTIInfo:
+ ...int Stuff... // The action to perform from the catch block
+ break;
+ case DoubleRTTIInfo:
+ ...double Stuff... // The action to perform from the catch block
+ goto TryCleanup // This catch block rethrows the exception
+ break; // Redundant, eliminated by the optimizer
+ default:
+ goto TryCleanup // Exception not caught, rethrow
+ }
+
+ // Exception was consumed
+ if (E->dtor)
+ E->dtor(E->object) // Invoke the dtor on the object if it exists
+ goto EndTry // Continue mainline code...
+
+And that is all there is to it.
+
+The throw(E) function would then be implemented like this (which may be
+inlined into the caller through standard optimization):
+
+function throw(Exception *E) {
+ // Get the start of the stack trace...
+ %frame %f = call getStackCurrentFrame()
+
+ // Get the label information that corresponds to it
+ label * %L = call getFrameLabel(%f)
+ while (%L == 0 && !isFirstFrame(%f)) {
+ // Loop until a cleanup handler is found
+ %f = call getNextFrame(%f)
+ %L = call getFrameLabel(%f)
+ }
+
+ if (%L != 0) {
+ call setThreadLocalException(E) // Allow handlers access to this...
+ call doNonLocalBranch(%L)
+ }
+ // No handler found!
+ call BlowUp() // Ends up calling the terminate() method in use
+}
+
+That's a brief rundown of how C++ exception handling could be implemented in
+LLVM. Java would be very similar, except it only uses destructors to unlock
+synchronized blocks, not to destroy data. Also, it uses two stack walks: a
+nondestructive walk that builds a stack trace, then a destructive walk that
+unwinds the stack as shown here.
+
+It would be trivial to get exception interoperability between C++ and Java.
+
diff --git a/docs/HistoricalNotes/2001-05-19-ExceptionResponse.txt b/docs/HistoricalNotes/2001-05-19-ExceptionResponse.txt
new file mode 100644
index 0000000..3375365
--- /dev/null
+++ b/docs/HistoricalNotes/2001-05-19-ExceptionResponse.txt
@@ -0,0 +1,45 @@
+Date: Sat, 19 May 2001 19:09:13 -0500 (CDT)
+From: Chris Lattner <sabre@nondot.org>
+To: Vikram S. Adve <vadve@cs.uiuc.edu>
+Subject: RE: Meeting writeup
+
+> I read it through and it looks great!
+
+Thanks!
+
+> The finally clause in Java may need more thought. The code for this clause
+> is like a subroutine because it needs to be entered from many points (end of
+> try block and beginning of each catch block), and then needs to *return to
+> the place from where the code was entered*. That's why JVM has the
+> jsr/jsr_w instruction.
+
+Hrm... I guess that is an implementation decision. It can either be
+modelled as a subroutine (as Java bytecodes do), which is really
+gross... or it can be modelled as code duplication (emitted once inline,
+then once in the exception path). Because this could, at worst,
+slightly less than double the amount of code in a function (it is
+bounded), I don't think this is a big deal. One of the really nice things
+about the LLVM representation is that it still allows for runtime code
+generation for exception paths (exception paths are not compiled until
+needed). Obviously a static compiler couldn't do this though. :)
+
+In this case, only one copy of the code would be compiled... until the
+other one is needed on demand. Also this strategy fits with the "zero
+cost" exception model... the standard case is not burdened with extra
+branches or "call"s.
+
+> I suppose you could save the return address in a particular register
+> (specific to this finally block), jump to the finally block, and then at the
+> end of the finally block, jump back indirectly through this register. It
+> will complicate building the CFG but I suppose that can be handled. It is
+> also unsafe in terms of checking where control returns (which is I suppose
+> why the JVM doesn't use this).
+
+I think that a code duplication method would be cleaner, and would avoid
+the caveats that you mention. Also, it does not slow down the normal case
+with an indirect branch...
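+
+To sketch what duplication means here (an invented example, using the "with"
+syntax from the meeting writeup), a Java construct like:
+
+  try { work(); } finally { unlock(); }
+
+could be emitted with the finally body duplicated on both paths:
+
+  call work() with cleanup
+  call unlock()     // inline copy: the normal path falls through
+  ...
+cleanup:
+  call unlock()     // duplicated copy: only compiled if the exception
+  ...               // path is ever taken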
+
+Like everything, we can probably defer a final decision until later. :)
+
+-Chris
+
diff --git a/docs/HistoricalNotes/2001-06-01-GCCOptimizations.txt b/docs/HistoricalNotes/2001-06-01-GCCOptimizations.txt
new file mode 100644
index 0000000..97af16a
--- /dev/null
+++ b/docs/HistoricalNotes/2001-06-01-GCCOptimizations.txt
@@ -0,0 +1,63 @@
+Date: Fri, 1 Jun 2001 16:38:17 -0500 (CDT)
+From: Chris Lattner <sabre@nondot.org>
+To: Vikram S. Adve <vadve@cs.uiuc.edu>
+Subject: Interesting: GCC passes
+
+
+Take a look at this document (which describes the order of optimizations
+that GCC performs):
+
+http://gcc.gnu.org/onlinedocs/gcc_17.html
+
+The rundown is that after RTL generation, the following happens:
+
+ 1. [t] jump optimization (jumps to jumps, etc)
+ 2. [t] Delete unreachable code
+ 3. Compute live ranges for CSE
+ 4. [t] Jump threading (jumps to jumps with identical or inverse conditions)
+ 5. [t] CSE
+ 6. *** Conversion to SSA
+ 7. [t] SSA Based DCE
+ 8. *** Conversion to LLVM
+ 9. UnSSA
+10. GCSE
+11. LICM
+12. Strength Reduction
+13. Loop unrolling
+14. [t] CSE
+15. [t] DCE
+16. Instruction combination, register movement, scheduling... etc.
+
+I've marked optimizations with a [t] to indicate things that I believe to
+be relatively trivial to implement in LLVM itself. The time-consuming
+things to reimplement would be SSA-based PRE, strength reduction & loop
+unrolling... these would be the major things we would miss out on if we
+did LLVM creation from tree code [inlining and other high level
+optimizations are done on the tree representation].
+
+Given the lack of "strong" optimizations that would take a long time to
+reimplement, I am leaning a bit more towards creating LLVM from the tree
+code. Especially given that SGI has GPL'd their compiler, including many
+SSA based optimizations that could be adapted (besides the fact that their
+code looks MUCH nicer than GCC :)
+
+Even if we choose to do LLVM code emission from RTL, we will almost
+certainly want to move LLVM emission from step 8 down until at least CSE
+has been rerun... which causes me to wonder if the SSA generation code
+will still work (due to global variable dependencies and stuff). I assume
+that it can be made to work, but might be a little more involved than we
+would like.
+
+I'm continuing to look at the Tree -> RTL code. It is pretty gross
+because they do some of the translation a statement at a time, and some
+of it a function at a time... I'm not quite clear why and how the
+distinction is drawn, but it does not appear that there is a wonderful
+place to attach extra info.
+
+Anyways, I'm proceeding with the RTL -> LLVM conversion phase for now. We
+can talk about this more on Monday.
+
+Wouldn't it be nice if there were an obvious decision to be made? :)
+
+-Chris
+
diff --git a/docs/HistoricalNotes/2001-06-01-GCCOptimizations2.txt b/docs/HistoricalNotes/2001-06-01-GCCOptimizations2.txt
new file mode 100644
index 0000000..6c9e097
--- /dev/null
+++ b/docs/HistoricalNotes/2001-06-01-GCCOptimizations2.txt
@@ -0,0 +1,71 @@
+Date: Fri, 1 Jun 2001 17:08:44 -0500 (CDT)
+From: Chris Lattner <sabre@nondot.org>
+To: Vikram S. Adve <vadve@cs.uiuc.edu>
+Subject: RE: Interesting: GCC passes
+
+> That is very interesting. I agree that some of these could be done on LLVM
+> at link-time, but it is the extra time required that concerns me. Link-time
+> optimization is severely time-constrained.
+
+If we were to reimplement any of these optimizations, I assume that we
+could do them a translation unit at a time, just as GCC does now. This
+would lead to a pipeline like this:
+
+Static optimizations, xlation unit at a time:
+.c --GCC--> .llvm --llvmopt--> .llvm
+
+Link time optimizations:
+.llvm --llvm-ld--> .llvm --llvm-link-opt--> .llvm
+
+Of course, many optimizations could be shared between llvmopt and
+llvm-link-opt, but they wouldn't need to be shared... Thus compile time
+could be faster, because we are using a "smarter" IR (SSA based).
+
+> BTW, about SGI, "borrowing" SSA-based optimizations from one compiler and
+> putting them into another is not necessarily easier than re-doing them.
+> Optimization code is usually heavily tied in to the specific IR they use.
+
+Understood. The only reason that I brought this up is because SGI's IR is
+more similar to LLVM than it is different in many respects (SSA based,
+relatively low level, etc), and could be easily adapted. Also their
+optimizations are written in C++ and are actually somewhat
+structured... of course it would be no walk in the park, but it would be
+much less time consuming to adapt, say, SSA-PRE than to rewrite it.
+
+> But your larger point is valid that adding SSA based optimizations is
+> feasible and should be fun. (Again, link time cost is the issue.)
+
+Assuming link-time cost wasn't an issue, the question is:
+Does using GCC's backend buy us anything?
+
+> It also occurs to me that GCC is probably doing quite a bit of back-end
+> optimization (step 16 in your list). Do you have a breakdown of that?
+
+Not really. The irritating part of GCC is that it mixes it all up and
+doesn't have a clean separation of concerns. A lot of the "back end
+optimization" happens right along with other data optimizations (i.e., CSE
+of machine-specific things).
+
+As far as REAL back end optimizations go, it looks something like this:
+
+1. Instruction combination: try to make CISCy instructions, if available
+2. Register movement: try to get registers in the right places for the
+architecture to avoid register to register moves. For example, try to get
+the first argument of a function to naturally land in %o0 for SPARC.
+3. Instruction scheduling: 'nuff said :)
+4. Register class preferencing: ??
+5. Local register allocation
+6. Global register allocation
+7. Spilling
+8. Local regalloc
+9. Jump optimization
+10. Delay slot scheduling
+11. Branch shortening for CISC machines
+12. Instruction selection & peephole optimization
+13. Debug info output
+
+But none of this would be usable for LLVM anyways, unless we were using
+GCC as a static compiler.
+
+-Chris
+
diff --git a/docs/HistoricalNotes/2001-06-20-.NET-Differences.txt b/docs/HistoricalNotes/2001-06-20-.NET-Differences.txt
new file mode 100644
index 0000000..1bc2eae
--- /dev/null
+++ b/docs/HistoricalNotes/2001-06-20-.NET-Differences.txt
@@ -0,0 +1,30 @@
+Date: Wed, 20 Jun 2001 12:32:22 -0500
+From: Vikram Adve <vadve@cs.uiuc.edu>
+To: Chris Lattner <lattner@cs.uiuc.edu>
+Subject: .NET vs. our VM
+
+One significant difference between .NET CLR and our VM is that the CLR
+includes full information about classes and inheritance. In fact, I just
+sat through the paper on adding templates to .NET CLR, and the speaker
+indicated that the goal seems to be to do simple static compilation (very
+little lowering or optimization). Also, the templates implementation in CLR
+"relies on dynamic class loading and JIT compilation".
+
+This is an important difference because I think there are some significant
+advantages to having a much lower-level VM layer and doing significant
+static analysis and optimization.
+
+I also talked to the lead guy for KAI's C++ compiler (Arch Robison) and he
+said that SGI and other commercial compilers have included options to export
+their *IR* next to the object code (i.e., .il files) and use them for
+link-time code generation. In fact, he said that the .o file was nearly
+empty and was entirely generated from the .il at link-time. But he agreed
+that this limited the link-time interprocedural optimization to modules
+compiled by the same compiler, whereas our approach allows us to link and
+optimize modules from multiple different compilers. (Also, of course, they
+don't do anything for runtime optimization).
+
+All issues to bring up in Related Work.
+
+--Vikram
+
diff --git a/docs/HistoricalNotes/2001-07-06-LoweringIRForCodeGen.txt b/docs/HistoricalNotes/2001-07-06-LoweringIRForCodeGen.txt
new file mode 100644
index 0000000..3e10416
--- /dev/null
+++ b/docs/HistoricalNotes/2001-07-06-LoweringIRForCodeGen.txt
@@ -0,0 +1,31 @@
+Date: Fri, 6 Jul 2001 16:56:56 -0500
+From: Vikram S. Adve <vadve@cs.uiuc.edu>
+To: Chris Lattner <lattner@cs.uiuc.edu>
+Subject: lowering the IR
+
+BTW, I do think that we should consider lowering the IR as you said. I
+didn't get time to raise it today, but it comes up with the SPARC
+move-conditional instruction. I don't think we want to put that in the core
+VM -- it is a little too specialized. But without a corresponding
+conditional move instruction in the VM, it is pretty difficult to maintain a
+close mapping between VM and machine code. Other architectures may have
+other such instructions.
+
+What I was going to suggest was that for a particular processor, we define
+additional VM instructions that match some of the unusual opcodes on the
+processor but have VM semantics otherwise, i.e., all operands are in SSA
+form and typed. This means that we can re-generate core VM code from the
+more specialized code any time we want (so that portability is not lost).
+
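+For example (with syntax invented purely for illustration), the portable
+core-VM form of a conditional assignment:
+
+    br %cond, label %T, label %F
+  T: %x1 = %a
+     br label %M
+  F: %x2 = %b
+     br label %M
+  M: %x = phi [%x1, %T], [%x2, %F]
+
+could be rewritten into a single specialized instruction that keeps SSA
+operands and types, say:
+
+  %x = sparc.movcc %cond, %a, %b
+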
+Typically, a static compiler like gcc would generate just the core VM, which
+is relatively portable. Anyone (an offline tool, the linker, etc., or even
+the static compiler itself if it chooses) can transform that into more
+specialized target-specific VM code for a particular architecture. If the
+linker does it, it can do it after all machine-independent optimizations.
+This would be the most convenient, but not necessary.
+
+The main benefit of lowering will be that we will be able to retain a close
+mapping between VM and machine code.
+
+--Vikram
+
diff --git a/docs/HistoricalNotes/2001-09-18-OptimizeExceptions.txt b/docs/HistoricalNotes/2001-09-18-OptimizeExceptions.txt
new file mode 100644
index 0000000..9379081
--- /dev/null
+++ b/docs/HistoricalNotes/2001-09-18-OptimizeExceptions.txt
@@ -0,0 +1,56 @@
+Date: Tue, 18 Sep 2001 00:38:37 -0500 (CDT)
+From: Chris Lattner <sabre@nondot.org>
+To: Vikram S. Adve <vadve@cs.uiuc.edu>
+Subject: Idea for a simple, useful link time optimization
+
+
+In C++ programs, exceptions suck, and here's why:
+
+1. In virtually all function calls, you must assume that the function
+ throws an exception, unless it is defined as 'nothrow'. This means
+ that every function call has to have code to invoke dtors on objects
+ locally if one is thrown by the function. Most functions don't throw
+ exceptions, so this code is dead [with all the bad effects of dead
+ code, including icache pollution].
+2. Declaring a function nothrow causes catch blocks to be added to every
+   call that is not provably nothrow. This makes them very slow.
+3. Extra extraneous exception edges reduce the opportunity for code
+ motion.
+4. EH is typically implemented with large lookup tables. Ours is going to
+ be much smaller (than the "standard" way of doing it) to start with,
+ but eliminating it entirely would be nice. :)
+5. It is physically impossible to put accurate, correct exception
+   specifications on generic, templated code. But it is trivial
+ to analyze instantiations of said code.
+6. Most large C++ programs throw few exceptions. Most well designed
+ programs only throw exceptions in specific planned portions of the
+ code.
+
+Given our _planned_ model of handling exceptions, all of this would be
+pretty trivial to eliminate through some fairly simplistic interprocedural
+analysis. The DCE factor alone could probably be pretty significant. The
+extra code motion opportunities could also be exploited though...
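+
+A contrived illustration of the analysis (invented code, not part of the
+proposal itself):
+
+  void leaf(int *p) { *p = 0; }   // calls nothing: provably nothrow
+
+  void caller() {
+    Lock l;                       // some object with a destructor
+    int x;
+    leaf(&x);                     // IPA proves leaf() nothrow, so the
+  }                               // implicit EH edge that would run l's
+                                  // dtor is dead code and can be deleted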
+
+Additionally, this optimization can be implemented in a straightforward,
+conservative manner, allowing libraries, or even individual files, to be
+optimized (if there are leaf functions visible in the translation unit
+that are called).
+
+I think it's a reasonable optimization that hasn't really been addressed
+(because assembly is way too low level for this), and could have decent
+payoffs... without being an overly complex optimization.
+
+After I wrote all of that, I found this page that is talking about
+basically the same thing I just wrote, except that it is translation unit
+at a time, tree based approach:
+http://www.ocston.org/~jls/ehopt.html
+
+but it is very useful from an "expected gain" and references perspective.
+Note that their compiler is apparently unable to inline functions that use
+exceptions, so their numbers are pretty worthless... also our results
+would (hopefully) be better because ours is interprocedural...
+
+What do you think?
+
+-Chris
+
diff --git a/docs/HistoricalNotes/2002-05-12-InstListChange.txt b/docs/HistoricalNotes/2002-05-12-InstListChange.txt
new file mode 100644
index 0000000..004edb0
--- /dev/null
+++ b/docs/HistoricalNotes/2002-05-12-InstListChange.txt
@@ -0,0 +1,55 @@
+Date: Sun, 12 May 2002 17:12:53 -0500 (CDT)
+From: Chris Lattner <sabre@nondot.org>
+To: "Vikram S. Adve" <vadve@cs.uiuc.edu>
+Subject: LLVM change
+
+There is a fairly fundamental change that I would like to make to the LLVM
+infrastructure, but I'd like to know if you see any drawbacks that I
+don't...
+
+Basically right now at the basic block level, each basic block contains an
+instruction list (returned by getInstList()) that is a ValueHolder of
+instructions. To iterate over instructions, we must actually iterate over
+the instlist, and access the instructions through the instlist.
+
+To add or remove an instruction from a basic block, we need to get an
+iterator to an instruction, which, given just an Instruction*, requires a
+linear search of the basic block the instruction is contained in... just
+to insert an instruction before another instruction, or to delete an
+instruction! This complicates algorithms that should be very simple (like
+simple constant propagation), because they aren't actually sparse anymore:
+they have to traverse basic blocks to remove constant-propagated
+instructions.
+
+Additionally, adding or removing instructions to a basic block
+_invalidates all iterators_ pointing into that block, which is really
+irritating.
+
+To fix these problems (and others), I would like to make the ordering of
+the instructions be represented with a doubly linked list in the
+instructions themselves, instead of an external data structure. This is
+how many other representations do it, and frankly I can't remember why I
+originally implemented it the way I did.
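+
+Concretely, the idea is something like this (a sketch, not the final class
+layout):
+
+  class Instruction : public User {
+    Instruction *Prev, *Next;  // intrusive links stored in the instruction
+  public:
+    Instruction *getPrev();    // navigate neighbors without consulting
+    Instruction *getNext();    // the enclosing basic block
+    ...
+  };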
+
+Long term, all of the code that depends on the nasty features in the
+instruction list (which can be found by grep'ing for getInstList()) will
+be changed to do nice local transformations. In the short term, I'll
+change the representation, but preserve the interface (including
+getInstList()) so that all of the code doesn't have to change.
+
+Iteration over the instructions in a basic block remains the simple:
+for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I) ...
+
+But we will also support:
+for (Instruction *I = BB->front(); I; I = I->getNext()) ...
+
+After converting instructions over, I'll convert basic blocks and
+functions to have a similar interface.
+
+The only negative aspect of this change that I see is that it increases
+the amount of memory consumed by one pointer per instruction. Given the
+benefits, I think this is a very reasonable tradeoff.
+
+What do you think?
+
+-Chris
diff --git a/docs/HistoricalNotes/2002-06-25-MegaPatchInfo.txt b/docs/HistoricalNotes/2002-06-25-MegaPatchInfo.txt
new file mode 100644
index 0000000..2ca4611
--- /dev/null
+++ b/docs/HistoricalNotes/2002-06-25-MegaPatchInfo.txt
@@ -0,0 +1,72 @@
+Changes:
+* Change the casting code to be const correct. Now, doing this is invalid:
+ const Value *V = ...;
+ Instruction *I = dyn_cast<Instruction>(V);
+ instead, the second line should be:
+ const Instruction *I = dyn_cast<Instruction>(V);
+
+* Change the casting code to allow casting a reference value thus:
+ const Value &V = ...;
+ Instruction &I = cast<Instruction>(V);
+
+ dyn_cast does not work with references, because it must return a null pointer
+ on failure.
+
+* Fundamentally change how instructions and other values are represented.
+ Before, every llvm container was an instance of the ValueHolder template,
+ instantiated for each container type. This ValueHolder was effectively a
+ wrapper around a vector of pointers to the sub-objects.
+
+ Now, instead of having a vector to pointers of objects, the objects are
+  maintained in a doubly linked list of values (i.e., each Instruction now has
+ Next & Previous fields). The containers are now instances of ilist (intrusive
+ linked list class), which use the next and previous fields to chain them
+ together. The advantage of this implementation is that iterators can be
+ formed directly from pointers to the LLVM value, and invalidation is much
+ easier to handle.
+
+* As part of the above change, dereferencing an iterator (for example:
+ BasicBlock::iterator) now produces a reference to the underlying type (same
+ example: Instruction&) instead of a pointer to the underlying object. This
+  makes it much easier to write nested loops that iterate over things, changing
+ this:
+
+ for (Function::iterator BI = Func->begin(); BI != Func->end(); ++BI)
+ for (BasicBlock::iterator II = (*BI)->begin(); II != (*BI)->end(); ++II)
+ (*II)->dump();
+
+ into:
+
+ for (Function::iterator BI = Func->begin(); BI != Func->end(); ++BI)
+ for (BasicBlock::iterator II = BI->begin(); II != BI->end(); ++II)
+ II->dump();
+
+ which is much more natural and what users expect.
+
+* Simplification of #include's: Before, it was necessary for a .cpp file to
+ include every .h file that it used. Now things are batched a little bit more
+ to make it easier to use. Specifically, the include graph now includes these
+ edges:
+ Module.h -> Function.h, GlobalVariable.h
+ Function.h -> BasicBlock.h, Argument.h
+ BasicBlock.h -> Instruction.h
+
+ Which means that #including Function.h is usually sufficient for getting the
+ lower level #includes.
+
+* Printing out a Value* has now changed: Printing a Value* will soon print out
+ the address of the value instead of the contents of the Value. To print out
+ the contents, you must convert it to a reference with (for example)
+  'cout << *I;' instead of 'cout << I;'. This conversion is not yet complete,
+  but will be eventually. In the meantime, both forms print out the contents.
+
+* References are used much more throughout the code base. In general, if a
+ pointer is known to never be null, it is passed in as a reference instead of a
+ pointer. For example, the instruction visitor class uses references instead
+  of pointers, and Pass subclasses now all receive references to Values
+  instead of pointers, because they can never be null.
+
+* The Function class now has helper functions for accessing the Arguments list.
+  Instead of having to go through getArgumentList for simple things like
+  iterating over the arguments, the a*() methods can now be used to access them.
+
diff --git a/docs/HistoricalNotes/2003-01-23-CygwinNotes.txt b/docs/HistoricalNotes/2003-01-23-CygwinNotes.txt
new file mode 100644
index 0000000..fbe811d
--- /dev/null
+++ b/docs/HistoricalNotes/2003-01-23-CygwinNotes.txt
@@ -0,0 +1,28 @@
+Date: Mon, 20 Jan 2003 00:00:28 -0600
+From: Brian R. Gaeke <gaeke@uiuc.edu>
+Subject: windows vs. llvm
+
+If you're interested, here are some of the major problems compiling LLVM
+under Cygwin and/or Mingw.
+
+1. Cygwin doesn't have <inttypes.h> or <stdint.h>, so all the INT*_MAX
+ symbols and standard int*_t types are off in limbo somewhere. Mingw has
+ <stdint.h>, but Cygwin doesn't like it.
+
+2. Mingw doesn't have <dlfcn.h> (because Windows doesn't have it.)
+
+3. SA_SIGINFO and friends are not around; only signal() seems to work.
+
+4. Relink, aka ld -r, doesn't work (probably an ld bug); you need
+   DONT_BUILD_RELINKED. This breaks all the tools' makefiles; you just need to
+ change them to have .a's.
+
+5. There isn't a <values.h>.
+
+6. There isn't a mallinfo() (or, at least, it's documented, but it doesn't seem
+ to link).
+
+7. The version of Bison that Cygwin (and newer Linux versions) comes with
+ does not like = signs in rules. Burg's gram.yc source file uses them. I think
+ you can just take them out.
+
diff --git a/docs/HistoricalNotes/2003-06-25-Reoptimizer1.txt b/docs/HistoricalNotes/2003-06-25-Reoptimizer1.txt
new file mode 100644
index 0000000..a745784
--- /dev/null
+++ b/docs/HistoricalNotes/2003-06-25-Reoptimizer1.txt
@@ -0,0 +1,137 @@
+Wed Jun 25 15:13:51 CDT 2003
+
+First-level instrumentation
+---------------------------
+
+We use opt to do bytecode-to-bytecode instrumentation: we look at
+back-edges and insert a call to llvm_first_trigger(), which takes
+no arguments and returns no value. This instrumentation is designed to
+be easy to remove, for instance by writing a NOP over the function
+call instruction.
+
+Keep count of every call to llvm_first_trigger(), and maintain
+counters in a map indexed by return address. If the trigger count
+exceeds a threshold, we identify a hot loop and perform second-level
+instrumentation on the hot loop region (the instructions between the
+target of the back-edge and the branch that causes the back-edge). We
+do not move code across basic-block boundaries.
+
+
+Second-level instrumentation
+----------------------------
+
+We remove the first-level instrumentation by overwriting the CALL to
+llvm_first_trigger() with a NOP.
+
+The reoptimizer maintains a map between machine-code basic blocks and
+LLVM BasicBlock*s. We only keep track of paths that start at the
+first machine-code basic block of the hot loop region.
+
+How do we keep track of which edges to instrument, and which edges are
+exits from the hot region? A 3-step process:
+
+1) Do a DFS from the first machine-code basic block of the hot loop
+region and mark reachable edges.
+
+2) Do a DFS from the last machine-code basic block of the hot loop
+region IGNORING back edges, and mark the edges which are reachable in
+1) and also in 2) (i.e., must be reachable from both the start BB and
+the end BB of the hot region).
+
+3) Mark BBs which end in edges that exit the hot region; we need to
+instrument these differently.
+
+Assume that there is 1 free register. On SPARC we use %g1, which LLC
+has agreed not to use. Shift a 1 into it at the beginning. At every
+edge which corresponds to a conditional branch, we shift 0 for not
+taken and 1 for taken into the register. This uniquely numbers the paths
+through the hot region. Silently fail if we need more than 64 bits.
+
+At the end BB we call countPath and increment the counter based on %g1
+and the return address of the countPath call. We keep track of the
+number of iterations and the number of paths. We only run this
+version 30 or 40 times.
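+
+In rough C terms (an illustrative rendering only; "path" plays the role of
+%g1, and countPath's exact signature is assumed):
+
+  uint64_t path = 1;            /* shifted in when the region is entered  */
+  ...
+  path = (path << 1) | taken;   /* at each conditional-branch edge        */
+  ...
+  countPath(path);              /* at the end BB; the counter is keyed on
+                                   path and the call's return address     */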
+
+Find the BBs that total 90% or more of execution, and aggregate them
+together to form our trace. But we do not allow more than 5 paths; if
+we have more than 5 we take the ones that are executed the most. We
+verify our assumption that we picked a hot back-edge in first-level
+instrumentation, by making sure that the number of times we took an
+exit edge from the hot trace is less than 10% of the number of
+iterations.
+
+LLC has been taught to recognize llvm_first_trigger() calls and NOT
+generate saves and restores of caller-saved registers around these
+calls.
+
+
+Phase behavior
+--------------
+
+We turn off llvm_first_trigger() calls with NOPs, but this would hide
+phase behavior from us (when some funcs/traces stop being hot and
+others become hot.)
+
+We have a SIGALRM timer that counts time for us. Every time we get a
+SIGALRM we look at our priority queue of locations where we have
+removed llvm_first_trigger() calls. Each location is inserted along
+with a time when we will next turn instrumentation back on for that
+call site. If the time has arrived for a particular call site, we pop
+that off the priority queue and turn instrumentation back on for that
+call site.
+
+
+Generating traces
+-----------------
+
+When we finally generate an optimized trace we first copy the code
+into the trace cache. This leaves us with 3 copies of the code: the
+original code, the instrumented code, and the optimized trace. The
+optimized trace does not have instrumentation. The original code and
+the instrumented code are modified to have a branch to the trace
+cache, where the optimized traces are kept.
+
+We copy the code from the original to the instrumentation version
+by tracing the LLVM-to-Machine code basic block map and then copying
+each machine code basic block we think is in the hot region into the
+trace cache. Then we instrument that code. The process is similar for
+generating the final optimized trace; we copy the same basic blocks
+because we might need to put in fixup code for exit BBs.
+
+LLVM basic blocks are not typically used in the Reoptimizer except
+for the mapping information.
+
+We are restricted to using single instructions to branch between the
+original code, trace, and instrumented code. So we have to keep the
+code copies in memory near the original code (they can't be far enough
+away that a single pc-relative branch would not work.) Malloc() or
+data region space is too far away. this impacts the design of the
+trace cache.
+
+We use a dummy function that is full of a bunch of for loops which we
+overwrite with trace-cache code. The trace manager keeps track of
+whether or not we have enough space in the trace cache, etc.
+
+The trace insertion routine takes an original start address, a vector
+of machine instructions representing the trace, an index of branches and
+their corresponding absolute targets, and an index of calls and their
+corresponding absolute targets.
+
+The trace insertion routine is responsible for inserting branches from
+the beginning of the original code to the beginning of the optimized
+trace. This is because at some point the trace cache may run out of
+space and it may have to evict a trace, at which point the branch to
+the trace would also have to be removed. It uses a round-robin
+replacement policy; we have found that this is almost as good as LRU
+and better than random (especially because of problems fitting the new
+trace in.)
+
+We cannot deal with discontiguous trace cache areas. The trace cache
+is supposed to be cache-line-aligned, but it is not page-aligned.
+
+We generate instrumentation traces and optimized traces into separate
+trace caches. We keep the instrumented code around because you don't
+want to delete a trace when you still might have to return to it
+(i.e., return from a llvm_first_trigger() or countPath() call.)
+
+
diff --git a/docs/HistoricalNotes/2003-06-26-Reoptimizer2.txt b/docs/HistoricalNotes/2003-06-26-Reoptimizer2.txt
new file mode 100644
index 0000000..ec4b93f
--- /dev/null
+++ b/docs/HistoricalNotes/2003-06-26-Reoptimizer2.txt
@@ -0,0 +1,110 @@
+Thu Jun 26 14:43:04 CDT 2003
+
+Information about BinInterface
+------------------------------
+
+BinInterface takes in a set of instructions with some particular register
+allocation. It allows you to add, modify, or delete instructions,
+in SSA form (kind of like LLVM's MachineInstrs), then re-allocate
+registers. It assumes that the transformations you are doing are safe.
+It does not update the mapping information or the LLVM representation
+for the modified trace (so it would not, for instance, support
+multiple optimization passes; passes have to be aware of and update
+manually the mapping information.)
+
+The way you use it is to take the original code and provide it to
+BinInterface, do optimizations on it, and then put it in the
+trace cache.
+
+The BinInterface tries to find live-outs for traces so that it can do
+register allocation on just the trace, and stitch the trace back into
+the original code. It has to preserve the live-ins and live-outs when
+it does its register allocation. (On exits from the trace we have
+epilogues that copy live-outs back into the right registers, but
+live-ins have to be in the right registers.)
+
+
+Limitations of BinInterface
+---------------------------
+
+It does copy insertions for PHIs, which it infers from the machine
+code. The mapping info inserted by LLC is not sufficient to determine
+the PHIs.
+
+It does not handle integer or floating-point condition codes and it
+does not handle floating-point register allocation.
+
+It is not able to use lots of registers aggressively.
+
+There is a problem with alloca: we cannot find our spill space for
+spilling registers, normally allocated on the stack, if the trace
+follows an alloca(). What might be an acceptable solution would be to
+disable trace generation on functions that have variable-sized
+alloca()s. Variable-sized allocas in the trace would also probably
+screw things up.
+
+Because of the FP and alloca limitations, the BinInterface is
+completely disabled right now.
+
+
+Demo
+----
+
+This is a demo of the Ball & Larus version that does NOT use 2-level
+profiling.
+
+1. Compile program with llvm-gcc.
+2. Run opt -lowerswitch -paths -emitfuncs on the bytecode.
+   -lowerswitch   change switch statements to branches
+   -paths         Ball & Larus path-profiling algorithm
+   -emitfuncs     emit the table of functions
+3. Run llc to generate SPARC assembly code for the result of step 2.
+4. Use g++ to link the (instrumented) assembly code.
+
+We use a script to do all this:
+------------------------------------------------------------------------------
+#!/bin/sh
+llvm-gcc $1.c -o $1
+opt -lowerswitch -paths -emitfuncs $1.bc > $1.run.bc
+llc -f $1.run.bc
+LIBS=$HOME/llvm_sparc/lib/Debug
+GXX=/usr/dcs/software/evaluation/bin/g++
+$GXX -g -L $LIBS $1.run.s -o $1.run.llc \
+$LIBS/tracecache.o \
+$LIBS/mapinfo.o \
+$LIBS/trigger.o \
+$LIBS/profpaths.o \
+$LIBS/bininterface.o \
+$LIBS/support.o \
+$LIBS/vmcore.o \
+$LIBS/transformutils.o \
+$LIBS/bcreader.o \
+-lscalaropts -lscalaropts -lanalysis \
+-lmalloc -lcpc -lm -ldl
+------------------------------------------------------------------------------
+
+5. Run the resulting binary. You will see output from BinInterface
+(described below) intermixed with the output from the program.
+
+
+Output from BinInterface
+------------------------
+
+BinInterface's debugging code prints out the following stuff in order:
+
+1. Initial code provided to BinInterface with original register
+allocation.
+
+2. Section 0 is the trace prolog, consisting mainly of live-ins and
+register saves which will be restored in epilogs.
+
+3. Section 1 is the trace itself, in SSA form used by BinInterface,
+along with the PHIs that are inserted.
+PHIs are followed by the copies that implement them.
+Each branch (i.e., out of the trace) is annotated with the
+section number that represents the epilog it branches to.
+
+4. All the other sections starting with Section 2 are trace epilogs.
+Every branch from the trace has to go to some epilog.
+
+5. After the last section is the register allocation output.
diff --git a/docs/HistoricalNotes/2007-OriginalClangReadme.txt b/docs/HistoricalNotes/2007-OriginalClangReadme.txt
new file mode 100644
index 0000000..611dc9d
--- /dev/null
+++ b/docs/HistoricalNotes/2007-OriginalClangReadme.txt
@@ -0,0 +1,178 @@
+//===----------------------------------------------------------------------===//
+// C Language Family Front-end
+//===----------------------------------------------------------------------===//
+ Chris Lattner
+
+I. Introduction:
+
+ clang: noun
+ 1. A loud, resonant, metallic sound.
+ 2. The strident call of a crane or goose.
+ 3. C-language family front-end toolkit.
+
+ The world needs better compiler tools, tools which are built as libraries. This
+ design point allows reuse of the tools in new and novel ways. However, building
+ the tools as libraries isn't enough: they must have clean APIs, be as
+ decoupled from each other as possible, and be easy to modify/extend. This
+ requires clean layering, decent design, and avoiding tying the libraries to a
+ specific use. Oh yeah, did I mention that we want the resultant libraries to
+ be as fast as possible? :)
+
+ This front-end is built as a component of the LLVM toolkit that can be used
+ with the LLVM backend or independently of it. In this spirit, the API has been
+ carefully designed as the following components:
+
+ libsupport - Basic support library, reused from LLVM.
+
+ libsystem - System abstraction library, reused from LLVM.
+
+ libbasic - Diagnostics, SourceLocations, SourceBuffer abstraction,
+ file system caching for input source files. This depends on
+ libsupport and libsystem.
+
+ libast - Provides classes to represent the C AST, the C type system,
+ builtin functions, and various helpers for analyzing and
+ manipulating the AST (visitors, pretty printers, etc). This
+ library depends on libbasic.
+
+ liblex - C/C++/ObjC lexing and preprocessing, identifier hash table,
+ pragma handling, tokens, and macros. This depends on libbasic.
+
+ libparse - C (for now) parsing and local semantic analysis. This library
+ invokes coarse-grained 'Actions' provided by the client to do
+ stuff (e.g. libsema builds ASTs). This depends on liblex.
+
+ libsema - Provides a set of parser actions to build a standardized AST
+                 for programs. ASTs are 'streamed' out one top-level declaration
+ at a time, allowing clients to use decl-at-a-time processing,
+ build up entire translation units, or even build 'whole
+ program' ASTs depending on how they use the APIs. This depends
+ on libast and libparse.
+
+ librewrite - Fast, scalable rewriting of source code. This operates on
+ the raw syntactic text of source code, allowing a client
+ to insert and delete text in very large source files using
+ the same source location information embedded in ASTs. This
+ is intended to be a low-level API that is useful for
+ higher-level clients and libraries such as code refactoring.
+
+ libanalysis - Source-level dataflow analysis useful for performing analyses
+ such as computing live variables. It also includes a
+ path-sensitive "graph-reachability" engine for writing
+ analyses that reason about different possible paths of
+ execution through source code. This is currently being
+ employed to write a set of checks for finding bugs in software.
+
+ libcodegen - Lower the AST to LLVM IR for optimization & codegen. Depends
+ on libast.
+
+ clang - An example driver, client of the libraries at various levels.
+ This depends on all these libraries, and on LLVM VMCore.
+
+ This front-end has been intentionally built as a DAG of libraries, making it
+ easy to reuse individual parts or replace pieces if desired. For example, to
+ build a preprocessor, you take the Basic and Lexer libraries. If you want an
+ indexer, you take those plus the Parser library and provide some actions for
+ indexing. If you want a refactoring, static analysis, or source-to-source
+ compiler tool, it makes sense to take those plus the AST building and semantic
+ analyzer library. Finally, if you want to use this with the LLVM backend,
+ you'd take these components plus the AST to LLVM lowering code.
+
+ In the future I hope this toolkit will grow to include new and interesting
+ components, including a C++ front-end, ObjC support, and a whole lot of other
+ things.
+
+ Finally, it should be pointed out that the goal here is to build something that
+ is high-quality and industrial-strength: all the obnoxious features of the C
+ family must be correctly supported (trigraphs, preprocessor arcana, K&R-style
+ prototypes, GCC/MS extensions, etc). It cannot be used if it is not 'real'.
+
+
+II. Usage of clang driver:
+
+ * Basic Command-Line Options:
+ - Help: clang --help
+ - Standard GCC options accepted: -E, -I*, -i*, -pedantic, -std=c90, etc.
+ - To make diagnostics more gcc-like: -fno-caret-diagnostics -fno-show-column
+ - Enable metric printing: -stats
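+
+   For example, to type-check a file and print timing/metric data (an
+   illustrative invocation):
+     clang -fsyntax-only -stats test.c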
+
+ * -fsyntax-only is currently the default mode.
+
+ * -E mode works the same way as GCC.
+
+ * -Eonly mode does all preprocessing, but does not print the output,
+ useful for timing the preprocessor.
+
+ * -fsyntax-only is currently partially implemented, lacking some
+ semantic analysis (some errors and warnings are not produced).
+
+ * -parse-noop parses code without building an AST. This is useful
+ for timing the cost of the parser without including AST building
+ time.
+
+ * -parse-ast builds ASTs, but doesn't print them. This is most
+ useful for timing AST building vs -parse-noop.
+
+ * -parse-ast-print pretty prints most expression and statements nodes.
+
+ * -parse-ast-check checks that diagnostic messages that are expected
+ are reported and that those which are reported are expected.
+
+ * -dump-cfg builds ASTs and then CFGs. CFGs are then pretty-printed.
+
+ * -view-cfg builds ASTs and then CFGs. CFGs are then visualized by
+ invoking Graphviz.
+
+ For more information on getting Graphviz to work with clang/LLVM,
+ see: http://llvm.org/docs/ProgrammersManual.html#ViewGraph
+
+
+III. Current advantages over GCC:
+
+ * Column numbers are fully tracked (no 256 col limit, no GCC-style pruning).
+ * All diagnostics have column numbers, include 'caret diagnostics', and they
+ highlight regions of interesting code (e.g. the LHS and RHS of a binop).
+ * Full diagnostic customization by client (can format diagnostics however they
+ like, e.g. in an IDE or refactoring tool) through DiagnosticClient interface.
+ * Built as a framework, can be reused by multiple tools.
+ * All supported languages are linked into the same library (no cc1, cc1obj, ...).
+ * mmaps code read-only, so it does not dirty the pages like GCC does (smaller mem footprint).
+ * LLVM License, can be linked into non-GPL projects.
+ * Full diagnostic control, per diagnostic. Diagnostics are identified by ID.
+ * Significantly faster than GCC at semantic analysis, parsing, preprocessing
+ and lexing.
+ * Defers exposing platform-specific stuff to as late as possible, tracks use of
+ platform-specific features (e.g. #ifdef PPC) to allow 'portable bytecodes'.
+ * The lexer doesn't rely on the "lexer hack": it has no notion of scope and
+ does not categorize identifiers as types or variables -- this is up to the
+ parser to decide.
+
+Potential Future Features:
+
+ * Fine grained diag control within the source (#pragma enable/disable warning).
+ * Better token tracking within macros? (Token came from this line, which is
+ a macro argument instantiated here, recursively instantiated here).
+ * Fast #import with a module system.
+ * Dependency tracking: a change to a header file doesn't recompile every
+   function that textually depends on it: recompile only those functions that
+   need it.
+ This is aka 'incremental parsing'.
+
+
+IV. Missing Functionality / Improvements
+
+Lexer:
+ * Source character mapping. GCC supports ASCII and UTF-8.
+ See GCC options: -ftarget-charset and -ftarget-wide-charset.
+ * Universal character support. Experimental in GCC, enabled with
+ -fextended-identifiers.
+ * -fpreprocessed mode.
+
+Preprocessor:
+ * #assert/#unassert
+ * MSExtension: "L#param" stringizes to a wide string literal.
+ * Add support for -M*
+
+Traditional Preprocessor:
+ * Currently, we have none. :)
+