From e00906fbc222c19b7ab84a817b2be46b87484e99 Mon Sep 17 00:00:00 2001
From: Reid Spencer
Date: Thu, 10 Aug 2006 20:15:58 +0000
Subject: Answer the most frequently asked question, about GEPs. The answer is sufficiently long that I placed it in a separate file but it links from the FAQ page. More might need to be added to GetElementPtr.html to address additional confusion surrounding GEP.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@29594 91177308-0d34-0410-b5e6-96231b3b80d8
---
 docs/FAQ.html           |   2 +
 docs/GetElementPtr.html | 249 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 251 insertions(+)
 create mode 100644 docs/GetElementPtr.html

diff --git a/docs/FAQ.html b/docs/FAQ.html
index 2ab00ac..6e0600b 100644
--- a/docs/FAQ.html
+++ b/docs/FAQ.html
@@ -60,6 +60,8 @@
  • What source languages are supported?
  • What support is there for higher level source language constructs for building a compiler?
  • +
  • I don't understand the GetElementPtr + instruction. Help!
  • Using the GCC Front End
diff --git a/docs/GetElementPtr.html b/docs/GetElementPtr.html
new file mode 100644
index 0000000..13b5138
--- /dev/null
+++ b/docs/GetElementPtr.html
@@ -0,0 +1,249 @@
    + The Often Misunderstood GEP Instruction
    + The Often Misunderstood GEP Instruction +
    + +
    1. Introduction
    2. The Questions
       1. Why is the extra 0 index required?
       2. What is dereferenced by GEP?
       3. Why can you index through the first pointer but not subsequent ones?
       4. Why don't GEP x,0,0,1 and GEP x,1 alias?
       5. Why do GEP x,1,0,0 and GEP x,1 alias?
    3. Summary
    +

    Written by: Reid Spencer.

    +
    + Introduction +
    +

    This document seeks to dispel the mystery and confusion surrounding LLVM's GetElementPtr (GEP) instruction. Questions about the wily GEP instruction are probably the most frequently occurring questions once a developer gets down to coding with LLVM. Here we lay out the sources of confusion and show that the GEP instruction is really quite simple.

    +
    + The Questions +
    +

    When people are first confronted with the GEP instruction, they tend to relate it to known concepts from other programming paradigms, most notably C array indexing and field selection. However, GEP is a little different, and this leads to the following questions, all of which are answered in the sections below.

    +
    1. Why is the extra 0 index required?
    2. What is dereferenced by GEP?
    3. Why can you index through the first pointer but not subsequent ones?
    4. Why don't GEP x,0,0,1 and GEP x,1 alias?
    5. Why do GEP x,1,0,0 and GEP x,1 alias?
    +
    + Why is the extra 0 index required? +
    +

    Quick answer: there are no superfluous indices.

    +

    This question arises most often when the GEP instruction is applied to a global variable, which is always of pointer type. For example, consider this:

    +  %MyStruct = uninitialized global { float*, int }
    +  ...
    +  %idx = getelementptr { float*, int }* %MyStruct, long 0, ubyte 1
    +

    The GEP above yields an int* by indexing the int-typed field of the structure %MyStruct. When people first look at it, they wonder why the long 0 index is needed. However, a closer inspection of how globals and GEPs work reveals the need. Becoming aware of the following facts will dispel the confusion:

    +
    1. The type of %MyStruct is not { float*, int } but rather { float*, int }*. That is, %MyStruct is a pointer to a structure containing a pointer to a float and an int.
    2. Point #1 is evidenced by the type of the first operand of the GEP instruction (%MyStruct), which is { float*, int }*.
    3. The first index, long 0, is required to index through the pointer %MyStruct, selecting the structure it points to.
    4. The second index, ubyte 1, selects the second field of the structure (the int).
    +
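    The role of the leading zero can be sketched with a rough C analogy. This is only an illustration, not how LLVM implements GEP: the names MyStructTy, second_field_address and second_field_address_by_offset are hypothetical, and the C struct merely stands in for { float*, int }.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C counterpart of the LLVM type { float*, int }. */
struct MyStructTy {
    float *f; /* field 0 */
    int i;    /* field 1 */
};

/* Stands in for the global %MyStruct; using it by name yields a pointer. */
struct MyStructTy MyStruct;

/* Mirrors: getelementptr { float*, int }* %MyStruct, long 0, ubyte 1
 * [0] plays the role of "long 0" and .i plays the role of "ubyte 1". */
int *second_field_address(struct MyStructTy *base) {
    assert(base != NULL);
    return &base[0].i;
}

/* The same address written out as byte arithmetic on the base pointer. */
char *second_field_address_by_offset(struct MyStructTy *base) {
    assert(base != NULL);
    return (char *)base + 0 * sizeof(struct MyStructTy)
                        + offsetof(struct MyStructTy, i);
}
```

    Both functions only compute an address. The [0] in the first one is the C spelling of the index that steps through the base pointer, which is why it cannot be left out.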
    + What is dereferenced by GEP? +
    +

    Quick answer: nothing.

    +

    The GetElementPtr instruction dereferences nothing. That is, it doesn't access memory in any way. That's what the Load instruction is for. GEP is only involved in the computation of addresses. For example, consider this:

    +
    +  %MyVar = uninitialized global { [40 x int ]* }
    +  ...
    +  %idx = getelementptr { [40 x int]* }* %MyVar, long 0, ubyte 0, long 0, long 17
    +

    In this example, we have a global variable, %MyVar, that is a pointer to a structure containing a pointer to an array of 40 ints. The GEP instruction seems to be accessing the 18th integer of the structure's array of ints. However, this is actually an illegal GEP instruction. It won't compile. The reason is that the pointer in the structure must be dereferenced in order to index into the array of 40 ints. Since the GEP instruction never accesses memory, it cannot perform that dereference, and the instruction is illegal.

    +

    To access the 18th integer in the array, you would need to do the following:

    +
    +  %ptr = getelementptr { [40 x int]* }* %MyVar, long 0, ubyte 0
    +  %arr = load [40 x int]** %ptr
    +  %idx = getelementptr [40 x int]* %arr, long 0, long 17
    +

    In this case, we have to load the pointer in the structure with a load instruction before we can index into the array. If the example were changed to:

    +
    +  %MyVar = uninitialized global { [40 x int ] }
    +  ...
    +  %idx = getelementptr { [40 x int] }* %MyVar, long 0, ubyte 0, long 17
    +

    then everything works fine. In this case, the structure does not contain a pointer and the GEP instruction can index through the global variable pointer, into the first field of the structure and access the 18th int in the array there.
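    The difference between the two layouts can be sketched in C. This is an analogy under stated assumptions rather than LLVM's own code: int is modelled as int32_t, and the type and function names (HoldsPointer, HoldsArray, elem17_embedded, elem17_through_pointer) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C counterparts of the two structure types. */
struct HoldsPointer { int32_t (*arr)[40]; }; /* { [40 x int]* } */
struct HoldsArray   { int32_t arr[40]; };    /* { [40 x int] }  */

/* Example objects standing in for the two versions of %MyVar. */
struct HoldsArray embedded;
int32_t storage[40];
struct HoldsPointer indirect = { &storage };

/* Embedded array: pure arithmetic on the base pointer, no memory access. */
int32_t *elem17_embedded(struct HoldsArray *base) {
    assert(base != NULL);
    return (int32_t *)((char *)base + offsetof(struct HoldsArray, arr)
                                    + 17 * sizeof(int32_t));
}

/* Pointer field: the field's contents must be loaded before indexing. */
int32_t *elem17_through_pointer(struct HoldsPointer *base) {
    assert(base != NULL);
    int32_t (**field)[40] = &base->arr; /* first GEP: address of the field   */
    int32_t (*arr)[40] = *field;        /* load: the only step reading memory */
    return &(*arr)[17];                 /* second GEP: index into the array   */
}
```

    Only elem17_through_pointer reads memory, and it does so in the single line that corresponds to the load instruction. Everything else is address arithmetic.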

    +
    + Why can you index through the first pointer but not subsequent ones? +
    +

    Quick answer: Because it's already present.

    +

    Once the previous question is understood, a new question arises:

    +
    Why is it okay to index through the first pointer, but subsequent pointers won't be dereferenced?
    +

    The answer is simply that memory does not have to be accessed to perform the computation. The first operand to the GEP instruction must be a value of a pointer type. The value of the pointer is provided directly to the GEP instruction without any need for accessing memory. It must, therefore, be indexed like any other operand. Consider this example:

    +
    +  %MyVar = uninitialized global int
    +  ...
    +  %idx1 = getelementptr int* %MyVar, long 0
    +  %idx2 = getelementptr int* %MyVar, long 1
    +  %idx3 = getelementptr int* %MyVar, long 2
    +

    These GEP instructions simply compute addresses from the base address of MyVar. They compute the following (using C-like syntax, with offsets in bytes):

    +
    • idx1 = &MyVar + 0
    • idx2 = &MyVar + 4
    • idx3 = &MyVar + 8
    +

    Since the type int is known to be four bytes long, the indices 0, 1 and 2 translate into memory offsets of 0, 4, and 8, respectively. No memory is accessed to make these computations because the address of %MyVar is passed directly to the GEP instructions.

    +

    Note that the cases of %idx2 and %idx3 are a bit silly. They are computing addresses of something of unknown type (and thus potentially breaking type safety) because %MyVar is only one integer long.
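    The offset arithmetic can be written out as a minimal C sketch. It assumes int is four bytes and models it as int32_t. The helper names gep_byte_offset and gep_int_address are hypothetical, and addresses are kept as integers so that no out-of-bounds pointer is ever formed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset contributed by a single leading GEP index. */
size_t gep_byte_offset(size_t index, size_t element_size) {
    return index * element_size;
}

/* Address that "getelementptr int* %MyVar, long index" would produce,
 * given the base address as an integer. Nothing is read from memory. */
uintptr_t gep_int_address(uintptr_t base, size_t index) {
    assert(sizeof(int32_t) == 4); /* assumption: int is four bytes */
    return base + gep_byte_offset(index, sizeof(int32_t));
}
```

    With a made-up base address of 1000, indices 0, 1 and 2 give 1000, 1004 and 1008, matching the offsets 0, 4 and 8 above.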

    +
    + Why don't GEP x,0,0,1 and GEP x,1 alias? +
    +

    Quick Answer: They compute different address locations.

    +

    If you look at the first indices in these GEP instructions, you find that they are different (0 and 1); therefore, the address computation diverges at that index. Consider this example:

    +
    +  %MyVar = global { [10 x int ] }
    +  %idx1 = getelementptr { [10 x int ] }* %MyVar, long 0, ubyte 0, long 1
    +  %idx2 = getelementptr { [10 x int ] }* %MyVar, long 1
    +

    In this example, idx1 computes the address of the second integer in the array that is in the structure in %MyVar, that is, MyVar+4. The type of idx1 is int*. However, idx2 computes the address of the next structure after %MyVar. The type of idx2 is { [10 x int] }* and its value is equivalent to MyVar + 40 because it indexes past the ten 4-byte integers in MyVar. Obviously, in such a situation, the pointers don't alias.
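    The two offsets can be checked with a small C sketch. It is an analogy, not LLVM code: int is modelled as int32_t, the names TenInts, vars, offset_of_0_0_1 and offset_of_1 are hypothetical, and an array of two structures is used so that index 1 stays in bounds in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C counterpart of { [10 x int] }. */
struct TenInts { int32_t arr[10]; };

/* Two adjacent structures, so that x[1] names a real object in C. */
struct TenInts vars[2];

/* Byte offset computed by the analogue of GEP x,0,0,1. */
ptrdiff_t offset_of_0_0_1(struct TenInts *x) {
    assert(x != NULL);
    return (char *)&x[0].arr[1] - (char *)x;
}

/* Byte offset computed by the analogue of GEP x,1. */
ptrdiff_t offset_of_1(struct TenInts *x) {
    assert(x != NULL);
    return (char *)&x[1] - (char *)x;
}
```

    The first offset is one int into the first structure and the second is one whole structure past it, so the two addresses differ.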

    +
    + Why do GEP x,1,0,0 and GEP x,1 alias? +
    +

    Quick Answer: They compute the same address location.

    +

    These two GEP instructions will compute the same address because indexing through the 0th element does not change the address. However, it does change the type. Consider this example:

    +
    +  %MyVar = global { [10 x int ] }
    +  %idx1 = getelementptr { [10 x int ] }* %MyVar, long 1, ubyte 0, long 0
    +  %idx2 = getelementptr { [10 x int ] }* %MyVar, long 1
    +

    In this example, the value of %idx1 is %MyVar+40 and its type is int*. The value of %idx2 is also %MyVar+40 but its type is { [10 x int] }*.
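    A C sketch of the same situation, again only as an analogy: int is modelled as int32_t, the names Block, pair, gep_1_0_0, gep_1 and same_address are hypothetical, and two adjacent structures are used so that index 1 is in bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C counterpart of { [10 x int] }. */
struct Block { int32_t arr[10]; };

/* Two adjacent structures, so that x[1] names a real object in C. */
struct Block pair[2];

/* Analogue of GEP x,1,0,0: the result type is a pointer to int. */
int32_t *gep_1_0_0(struct Block *x) {
    assert(x != NULL);
    return &x[1].arr[0];
}

/* Analogue of GEP x,1: the result type is a pointer to the structure. */
struct Block *gep_1(struct Block *x) {
    assert(x != NULL);
    return &x[1];
}

/* Nonzero when both computations land on the same address. */
int same_address(struct Block *x) {
    return (void *)gep_1_0_0(x) == (void *)gep_1(x);
}
```

    The return types of gep_1_0_0 and gep_1 differ, but the addresses compare equal once both are viewed as untyped pointers.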

    +
    + Summary +
    +

    In summary, here are some things to always remember about the GetElementPtr instruction:

    +
    1. The GEP instruction never accesses memory; it only performs pointer computations.
    2. The first operand to the GEP instruction is always a pointer, and it must be indexed.
    3. There are no superfluous indices for the GEP instruction.
    4. Trailing zero indices are superfluous for pointer aliasing, but not for the types of the pointers.
    5. Leading zero indices are not superfluous for either pointer aliasing or the types of the pointers.
    +
    + + + +
    +
    + The LLVM Compiler Infrastructure
    + Last modified: $Date$ +