summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/gen8_draw_upload.c
Commit message (Collapse)AuthorAgeFilesLines
* i965: Eliminate brw->vs.prog_data pointer.Kenneth Graunke2016-10-051-16/+19
| | | | | | | | | | | | Just say no to: - brw->vs.base.prog_data = &brw->vs.prog_data->base.base; We'll just use the brw_stage_prog_data pointer in brw_stage_state and downcast it to brw_vs_prog_data as needed. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arcero@collabora.com>
* i965: wrap unused function in #ifndef NDEBUGTimothy Arceri2016-10-051-0/+2
| | | | | | | This function is only ever used by an assert() this fixes an unused function warning in release builds. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965: Unify vertex buffer setupTopi Pohjolainen2016-07-041-24/+17
| | | | | | | | On gen >= 8 one doesn't provide ending address but number of bytes available. This is relative to the given offset. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/draw: Use the real size for index buffersJason Ekstrand2016-05-231-1/+1
| | | | | | | Previously, we were using the size of the whole BO which may be substantially larger than the actual index buffer size. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/draw: Use the real size for vertex buffersJason Ekstrand2016-05-231-1/+1
| | | | | | | Previously, we were using the size of the BO which may be substantially larger than the actual vertex buffer size. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: passthru formats cannot be used width edge flag enabledAlejandro Piñeiro2016-05-171-0/+20
| | | | | | Add an assertion to detect this case. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Configure how to store *64*PASSTHRU vertex componentsAntia Puentes2016-05-171-0/+35
| | | | | | | | | | | | | | | | | | | | | | | | | From the Broadwell specification, structure VERTEX_ELEMENT_STATE description: "When SourceElementFormat is set to one of the *64*_PASSTHRU formats, 64-bit components are stored in the URB without any conversion. In this case, vertex elements must be written as 128 or 256 bits, with VFCOMP_STORE_0 being used to pad the output as required. E.g., if R64_PASSTHRU is used to copy a 64-bit Red component into the URB, Component 1 must be specified as VFCOMP_STORE_0 (with Components 2,3 set to VFCOMP_NOSTORE) in order to output a 128-bit vertex element, or Components 1-3 must be specified as VFCOMP_STORE_0 in order to output a 256-bit vertex element. Likewise, use of R64G64B64_PASSTHRU requires Component 3 to be specified as VFCOMP_STORE_0 in order to output a 256-bit vertex element." Uses 128-bits to write double and dvec2 vertex elements, and 256-bits for dvec3 and dvec4 vertex elements. Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Signed-off-by: Antia Puentes <apuentes@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Make all atoms to track BRW_NEW_BLORP by defaultKenneth Graunke2016-04-231-1/+4
| | | | Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com
* i965: Add support for gl_DrawIDARB and enable extensionKristian Høgsberg Kristensen2015-12-291-2/+32
| | | | | | | | | | We have to break open a new vec4 for gl_DrawIDARB. We've used up all space in the vec4 we use for SGVS and gl_DrawIDARB has to come from its own separate vertex buffer anyway. This is because we point the vb for base vertex and base instance into the draw parameter BO for indirect draw calls, but the draw id is generated by mesa in a different buffer. Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965: Add support for gl_BaseVertexARB and gl_BaseInstanceARBKristian Høgsberg Kristensen2015-12-291-14/+21
| | | | | | | We already have gl_BaseVertexARB in the .x component of the SGVS vec4 and plug gl_BaseInstanceARB into the last free component (.y). Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965: Drop #include of main/glheader.h.Matt Turner2015-11-241-1/+0
| | | | | | It's never used. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965/bdw: Fix 3DSTATE_VF_INSTANCING when the edge flag is usedNeil Roberts2015-08-221-2/+13
| | | | | | | | | | | | | | | | | | When the edge flag element is enabled then the elements are slightly reordered so that the edge flag is always the last one. This was confusing the code to upload the 3DSTATE_VF_INSTANCING state because that is uploaded with a separate loop which has an instruction for each element. The indices used in these instructions weren't taking into account the reordering so the state would be incorrect. v2: Use nr_elements instead of brw->vb.nr_enabled so that it will cope when gl_VertexID is used. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91292 Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Tested-by: Mark Janes <mark.a.janes@intel.com>
* i965: Swap the order of the vertex ID and edge flag attributesNeil Roberts2015-08-221-14/+42
| | | | | | | | | | | | | | | | | | | | The edge flag data on Gen6+ is passed through the fixed function hardware as an extra attribute. According to the PRM it must be the last valid VERTEX_ELEMENT structure. However if the vertex ID is also used then another extra element is added to source the VID. This made it so the vertex ID is in the wrong register in the vertex shader and the edge attribute is no longer in the last element. v2: Also implement for BDW+ v3 [by Ben]: Remove 10.5 tag. Too late. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84677 Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Tested-by: Ben Widawsky <ben@bwidawsk.net> Tested-by: Mark Janes <mark.a.janes@intel.com>
* i965/bdw: Fix setting the instancing state for the SGVS elementNeil Roberts2015-08-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When gl_VertexID or gl_InstanceID is used a 3DSTATE_VF_SGVS instruction is sent to create a sort of element to store the generated values. The last instruction in this chunk of code looks like it was trying to set the instancing state for the element using the 3DSTATE_VF_INSTANCING instruction. However it was sending brw->vb.nr_buffers instead of the element index. This instruction is supposed to take an element index and that is how it is used further down in the function so the previous code looks wrong. Perhaps previously the number of buffers coincidentally matched the number of enabled elements so the value was generally correct anyway. In a subsequent patch I want to change a bit how it chooses the SGVS element index so this needs to be fixed. v2 [by Ben] Remove stable 10.5 stable tag (it's too late now) Commit update as follows: The number of vertex buffers emitted is always <= the number of vertex elements. To maximize reuse (actually, to minimize relocations - according to the code comments), a vertex buffer is only emitted once, even when we setup multiple components (3DSTATE_VERTEX_ELEMENT) from that buffer. This meant that the previous code would use the wrong indexed element for these reuse cases. This patch by itself prevents hangs on BSW in the linked bug. It doesn't make the test pass, the remaining patches are needed for that. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91610 Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Tested-by: Mark Janes <mark.a.janes@intel.com> Cc: <mesa-stable@lists.freedesktop.org>
* i965: Micro-optimize brw_get_index_typeIan Romanick2015-01-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the switch-statement, GCC 4.8.3 produces a small pile of code with a branch. 00000000 <brw_get_index_type>: 000000: 8b 54 24 04 mov 0x4(%esp),%edx 000004: b8 01 00 00 00 mov $0x1,%eax 000009: 81 fa 03 14 00 00 cmp $0x1403,%edx 00000f: 74 0d je 00001e <brw_get_index_type+0x1e> 000011: 31 c0 xor %eax,%eax 000013: 81 fa 05 14 00 00 cmp $0x1405,%edx 000019: 0f 94 c0 sete %al 00001c: 01 c0 add %eax,%eax 00001e: c3 ret However, this could be two instructions. 00000000 <brw_get_index_type>: 000000: 2d 01 14 00 00 sub $0x1401,%eax 000005: d1 e8 shr %eax 000007: 90 nop 000008: 90 nop 000009: 90 nop 00000a: 90 nop 00000b: c3 ret The function was also moved to the header so that it could be inlined at the two call sites. Without this, 32-bit also needs to pull the parameter from the stack. This means there is a push, a call, a move, and a ret added to a two instruction function. The above code shows the function with __attribute__((regparm=1)), but even this adds several extra instructions. There is also an extra instruction on 64-bit to move the parameter to %eax for the subtract. On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects Gl32Batch7: 32-bit: Difference at 95.0% confidence 0.818589% +/- 0.234661% (n=40) 64-bit: Difference at 95.0% confidence 0.54554% +/- 0.354092% (n=40) Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Delete brw_state_flags::cache and related code.Kenneth Graunke2014-12-021-2/+0
| | | | | | | | | It's been merged into brw_state_flags::brw for simplicity and efficiency. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Move BRW_NEW_*_PROG_DATA flags to .brw (not .cache).Kenneth Graunke2014-12-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | I put the BRW_NEW_*_PROG_DATA flags at the beginning so that brw_state_cache.c can still continue using 1 << brw_cache_id. I also added a comment explaining the difference between BRW_NEW_*_PROG_DATA and BRW_NEW_*_PROGRAM, as it took me a long time to remember it. Non-mechanical changes: - brw_state_cache.c and brw_ff_gs.c now signal .brw, not .cache. - brw_state_upload.c - INTEL_DEBUG=state changes. - brw_context.h - bit definition merging. v2: Correct the explanation of BRW_NEW_*_PROG_DATA to mention state-based recompiles, and nix the "proper subset" claim, as it's false. (Caught by Kristian Høgsberg). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Rename CACHE_NEW_*_PROG to BRW_NEW_*_PROG_DATA.Kenneth Graunke2014-12-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we've moved a bunch of CACHE_NEW_* bits to BRW_NEW_*, the only ones that are left are legitimately related to the program cache. Yet, it seems a bit wasteful to have an entire bitfield for only 7 bits. State upload is one of the hottest paths in the driver. For each atom in the list, we call check_state() to see if it needs to be emitted. Currently, this involves comparing three separate bitfields (mesa, brw, and cache). Consolidating the brw and cache bitfields would save a small amount of CPU overhead per atom. Broadwell, for example, has 57 state atoms, so this small savings can add up. CACHE_NEW_*_PROG covers the brw_*_prog_data structures, as well as the offset into the program cache BO (prog_offset). Since most uses refer to brw_*_prog_data, I decided to use BRW_NEW_*_PROG_DATA as the name. Removing "cache" completely is a bit painful, so I decided to do it in several patches for easier review, and to separate mechanical changes from manual ones. This one simply renames things, and was made via: $ for file in *.[ch]; do sed -i -e 's/CACHE_NEW_\([A-Z_\*]*\)_PROG/BRW_NEW_\1_PROG_DATA/g' \ -e 's/BRW_NEW_WM_PROG_DATA/BRW_NEW_FS_PROG_DATA/g' $file done Note that BRW_NEW_*_PROG_DATA is still in .cache, not .brw! The next patch will remedy this flaw. It will also fix the alphabetization issues. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Acked-by: Matt Turner <mattst88@gmail.com>
* i965: Alphabetize brw_tracked_state flags and use a consistent style.Kenneth Graunke2014-11-291-2/+4
| | | | | | | | | | | | | | | | Most of the dirty flags were listed in some arbitrary order. Some used bonus parenthesis. Some put multiple flags on one line, others put one per line. Some used tabs instead of spaces...but only on some lines. This patch settles on one flag per line, in alphabetical order, using spaces instead of tabs, and sheds the unnecessary parentheses. Sorting was mostly done with vim's visual block feature and !sort, although I alphabetized short lists by hand; it was pretty manual. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/skl: Use new MOCS for SKLKristian Høgsberg2014-11-031-3/+5
| | | | | | | | | | | | On Skylake, the MOCS bits are an index into a table of 63 different, configurable cache configurations. As for previous GENs, we only care about WB and WT, which are available in the documented default set. Define SKL_MOCS_WB and SKL_MOCS_WT to the indices for those configucations and use those for the Skylake MOCS values. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965: Separate gl_InstanceID and gl_VertexID uploading.Kenneth Graunke2014-09-121-7/+15
| | | | | | | | | | | | | | | | We always uploaded them together, mostly out of laziness - both required an additional vertex element. However, gl_VertexID now also requires an additional vertex buffer for storing gl_BaseVertex; for non-indirect draws this also means uploading (a small amount of) data. This is extra overhead we don't need if the shader only uses gl_InstanceID. In particular, our clear shaders currently use gl_InstanceID for doing layered clears, but don't need gl_VertexID. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Cc: "10.3" <mesa-stable@lists.freedesktop.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Tested-by: Ian Romanick <ian.d.romanick@intel.com>
* i965: Expose gl_BaseVertex via a vertex attribute.Kenneth Graunke2014-09-101-7/+33
| | | | | | | | | | | | | | | | | | | Now that we have the data available, we need to expose it to the shaders. We can reuse the same vertex element that we use for gl_VertexID, but we need to back it by an actual vertex buffer. A hardware restriction requires that vertex attributes coming from a buffer (STORE_SRC) must come before any other types (i.e. STORE_0). So, we have to make gl_BaseVertex be the .x component of the vertex attribute. This means moving gl_VertexID to a different component. I chose to move gl_VertexID and gl_InstanceID to the .z and .w components, respectively, to make room for gl_BaseInstance in the .y component (which would also come from a buffer, and therefore be STORE_SRC). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965: Calculate start/base_vertex_location after preparing vertices.Kenneth Graunke2014-09-101-0/+1
| | | | | Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965: Drop Broadwell perf_debugs about missing MOCS that aren't missing.Kenneth Graunke2014-06-151-2/+0
| | | | | | | | | I actually added MOCS support for these things, but forgot to delete the corresponding perf_debug() warnings. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: "10.2" <mesa-stable@lists.freedesktop.org>
* i965: Add missing MOCS setup for 3DSTATE_INDEX_BUFFER on Broadwell.Kenneth Graunke2014-06-151-3/+1
| | | | | | | | Somehow I missed this when adding all of the other MOCS values. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: "10.2" <mesa-stable@lists.freedesktop.org>
* i965: Stop setting up a 1:1 "attrib" member in our vertex inputs.Eric Anholt2014-04-111-1/+1
| | | | | | | | | It's just the array index, so we can just go look at the array and see which element we are. No significant performance difference (n=140) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965: Set Broadwell MOCS values everywhere it's possible.Kenneth Graunke2014-03-251-0/+1
| | | | | | | | | | | | | | This patch introduces two pre-canned MOCS values: BDW_MOCS_WB (write-back, all caches) and BDW_MOCS_WT (write-through, all caches). We use write-through caching for render targets, and write-back for all other data. (At least on Haswell, I believe write-back LLC/eLLC didn't work for scan-out buffers, while write-through did.) No performance analysis has been done on the impact of this patch. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Eric Anholt <eric@anholt.net>
* i965: Rework vertex uploads for Broadwell.Kenneth Graunke2014-01-311-0/+250
v2: Emit a dummy 3DSTATE_VF_SGVS packet when not needed. v3: Add WARN_ONCE and perf_debugs requested by Eric Anholt. v4: Program 3DSTATE_SGVS even in the no-elements case so gl_VertexID continues working. Fix 3DSTATE_VF_INSTANCING to not use an element index to access the buffers array. Some ARB_draw_indirect prep work. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>