summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_context.h
Commit message (Collapse)AuthorAgeFilesLines
* i965/sync: Rename intel_syncobj.c -> brw_sync.cChad Versace2016-10-051-1/+1
| | | | | Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/sync: Replace 'intel' prefix with 'brw'Chad Versace2016-10-051-1/+1
| | | | | | | This is yet another patch for the great renaming begun long ago. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Only emit 1 viewport when possible.Kenneth Graunke2016-10-031-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | In core profile, we support up to 16 viewports. However, in the majority of cases, only 1 of them is actually used - we only need the others if the last shader stage prior to the rasterizer writes gl_ViewportIndex. Processing all 16 viewports adds additional CPU overhead, which hurts CPU-intensive workloads such as Glamor. This meant that switching to core profile actually penalized Glamor to an extent, which is unfortunate. This patch tracks the number of relevant viewports, switching between 1 and ctx->Const.MaxViewports if gl_ViewportIndex is written. A new BRW_NEW_VIEWPORT_COUNT flag tracks this. This could mean re-emitting viewport state when switching, but hopefully this is offset by doing 1/16th of the work in the common case. The new flag is also lighter weight than BRW_NEW_VUE_MAP_GEOM_OUT, which we were using in one case. According to Eric Anholt, x11perf -copypixwin10 performance improves by 11.5094% +/- 3.10841% (n=10) on his Skylake. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965: get rid of duplicated values from gen_device_infoLionel Landwerlin2016-09-231-17/+0
| | | | | | | | Now that we have gen_device_info mutable, we can update its values and drop all copies we had in brw_context. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Rename intelScreen to screen.Kenneth Graunke2016-09-201-2/+2
| | | | | | | | "intelScreen" is wordy and also doesn't fit our style guidelines. "screen" is shorter, which is nice, because we use it fairly often. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965: Track non-compressible sampling of renderbuffersTopi Pohjolainen2016-09-121-0/+10
| | | | | | | | | | | | | | | | | v3: - Actually set the flags when needed instead of falsely overwriting them (Jason). - Use more generic name for flag (dropped RENDERBUFFER) - Consult also shader images v4: - Consult only lossless compressd shader images v5: - Check the existence of renderbuffer before considering if it matches the given miptree Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965: Replace boolean rb surface state setup argument with flagsTopi Pohjolainen2016-09-121-1/+1
| | | | | | | | | | | And add plumbing to provide it all the way to surface state emitter. This is not used yet but will be in subsequent patches to carry additional constraints. v2 (Jason): Use uint32_t instead of int as the type Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* intel: Pull the guts of gen7_l3_state.c into a shared helperJason Ekstrand2016-09-031-2/+2
| | | | | Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* intel: s/brw_device_info/gen_device_info/Jason Ekstrand2016-09-031-3/+3
| | | | | | | | | | | | | Generated by: sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.c sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.h sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.c sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.cpp sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.h Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965: Move blorp into src/intel/blorpJason Ekstrand2016-08-291-1/+1
| | | | | | | | | At this point, blorp is completely driver agnostic and can be safely moved into its own folder. Soon, we hope to start using it for doing blits in the Vulkan driver. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965: Move the hiz_op enum to blorpJason Ekstrand2016-08-291-1/+1
| | | | | Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965/gen6: Refactor gen6_upload_urbJason Ekstrand2016-08-291-0/+3
| | | | | | | This splits it into two functions very similar to gen7_upload_urb. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965/blorp: Add a blorp_context struct and init/finish funcsJason Ekstrand2016-08-291-0/+3
| | | | | Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965: Embrace "unlimited" GTT mmap supportChris Wilson2016-08-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | From about kernel 4.9, GTT mmaps are virtually unlimited. A new parameter, I915_PARAM_MMAP_GTT_VERSION, is added to advertise the feature so query it and use it to avoid limiting tiled allocations to only fit within the mappable aperture. A couple of caveats: - fence support is still limited by stride to 262144 and the stride needs to be a multiple of tile_width (as before, and same limitation as the current 3D pipeline in hardware) - the max_gtt_map_object_size forcing untiled may be hiding a few bugs in handling of large objects, though none were spotted in piglits. See kernel commit 4cc6907501ed ("drm/i915: Add I915_PARAM_MMAP_GTT_VERSION to advertise unlimited mmaps"). v2: Include some commentary on mmap virtual space vs CPU addressable space. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* i965: Get rid of the do_lower_unnormalized_offsets passJason Ekstrand2016-07-221-1/+0
| | | | | | | | | We can do this in NIR now. No need to keep a GLSL pass lying around for it. Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: "12.0" <mesa-dev@lists.freedesktop.org>
* compiler: Rename INTERP_QUALIFIER_* to INTERP_MODE_*.Kenneth Graunke2016-07-171-2/+2
| | | | | | | | | | | | | | | | | Likewise, rename the enum type to glsl_interp_mode. Beyond the GLSL front-end, talking about "interpolation modes" seems more natural than "interpolation qualifiers" - in the IR, we're removed from how exactly the source language specifies how to interpolate an input. Also, SPIR-V calls these "decorations" rather than "qualifiers". Generated by: $ find . -regextype egrep -regex '.*\.(c|cpp|h)' -type f -exec sed -i \ -e 's/INTERP_QUALIFIER_/INTERP_MODE_/g' \ -e 's/glsl_interp_qualifier/glsl_interp_mode/g' {} \; Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Dave Airlie <airlied@redhat.com>
* i965/context: Remove some unnecessary vfuncsJason Ekstrand2016-07-151-17/+0
| | | | | | Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Chad Versace <chad.versace@intel.com>
* i965: Use ISL for emitting buffer surface statesJason Ekstrand2016-07-151-8/+0
| | | | | | Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Chad Versace <chad.versace@intel.com>
* i965: Add an isl_device to the brw_contextJason Ekstrand2016-07-151-0/+4
| | | | | Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Chad Versace <chad.versace@intel.com>
* i965/urb: Allow blorp to record current settingsTopi Pohjolainen2016-07-041-10/+2
| | | | | | | | | | | | | This makes it possible to skip urb re-configuration if the subsequent renders agree with the settings. Also allows blorp to allocate the maximun amount of vs entries available. Core upload logic already knows how to calculate this. Helps one synthetic benchmark. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* glsl/mesa: split gl_shader in twoTimothy Arceri2016-06-301-4/+4
| | | | | | | | | | | | | | | | | There are two distinctly different uses of this struct. The first is to store GL shader objects. The second is to store information about a shader stage thats been linked. The two uses actually share few fields and there is clearly confusion about their use. For example the linked shaders map one to one with a program so can simply be destroyed along with the program. However previously we were calling reference counting on the linked shaders. We were also creating linked shaders with a name even though it is always 0 and called the driver version of the _mesa_new_shader() function unnecessarily for GL shader objects. Acked-by: Iago Toral Quiroga <itoral@igalia.com>
* i965: Move contents of brw_tex.c into intel_tex_validate.c.Kenneth Graunke2016-06-241-1/+1
| | | | | | | | | | | brw_tex.c is a tiny file containing a single function. It's closely tied to the validation logic in intel_tex_validate.c, so it makes sense to put both in the same file. While we're at it, update the function to our modern style. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Fix cross-primitive scratch corruption when changing the per-thread ↵Francisco Jerez2016-06-131-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | allocation. I haven't found any mention of this in the hardware docs, but experimentally what seems to be going on is that when the per-thread scratch slot size is changed between two pipelined draw calls, shader invocations using the old and new scratch size setting may end up being executed in parallel, causing their scratch offset calculations to be based in a different partitioning of the scratch space, which can cause their thread-local scratch space to overlap leading to cross-thread scratch corruption. I've been experimenting with alternative workarounds, like emitting a PIPE_CONTROL with DC flush and CS stall between draw (or dispatch compute) calls using different per-thread scratch allocation settings, or avoiding reuse of the scratch BO if the per-thread scratch allocation doesn't exactly match the original. Both seem to be as effective as this workaround, but they have potential performance implications, while this should be basically for free. Fixes over 40 failures in our CI system with spilling forced on (including CTS, dEQP and Piglit failures) on a number of different platforms from Gen4 to Gen9. The 'glsl-max-varyings' piglit test seems to be able to reproduce this bug consistently in the vertex shader on at least Gen4, Gen8 and Gen9 with spilling forced on. Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Keep track of the per-thread scratch allocation in brw_stage_state.Francisco Jerez2016-06-131-0/+10
| | | | | | | | | | | | | | | | This will be used to find out what per-thread slot size a previously allocated scratch BO was used with in order to fix a hardware race condition without introducing additional stalls or memory allocations. Instead of calling brw_get_scratch_bo() manually from the various codegen functions, call a new helper function that keeps track of the per-thread scratch size and conditionally allocates a larger scratch BO. v2: Handle BO allocation manually instead of relying on brw_get_scratch_bo (Ken). Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Fix scratch overallocation if the original slot size was already a ↵Francisco Jerez2016-06-131-1/+1
| | | | | | | | | | | | | | power of two. The bitwise arithmetic trick used in brw_get_scratch_size() to clamp the scratch allocation to 1KB has the unintended side effect that it will cause us to allocate 2x the required amount of scratch space if the original per-thread scratch size happened to be already a power of two. Instead use the obvious MAX2 idiom to clamp the scratch allocation to the expected range. Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Enable GL_KHR_robustnessKristian Høgsberg Kristensen2016-05-251-0/+2
| | | | | | | | | | | | | | | GL_KHR_robustness adds the GL_CONTEXT_LOST error and five new entry points that we already implement. This patch adds a new dispatch table that returns GL_CONTEXT_LOST from all entry points and implements the GL_LOSE_CONTEXT_ON_RESET strategy by setting that table when we learn that we've lost the context. With the GL_CONTEXT_LOST reporting in place and dispatch for the new entry points we can turn on GL_KHR_robustness. Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
* i965: Support textures with multiple planesKristian Høgsberg Kristensen2016-05-241-1/+1
| | | | Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965: Use ISL for surface format introspectionJason Ekstrand2016-05-231-2/+0
| | | | | With this, we can delete the surface format table in brw_surface_formats.c because all of the information we need is now in ISL.
* i965/draw: Use the real size for index buffersJason Ekstrand2016-05-231-0/+1
| | | | | | | Previously, we were using the size of the whole BO which may be substantially larger than the actual index buffer size. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/draw: Use the real size for vertex buffersJason Ekstrand2016-05-231-0/+1
| | | | | | | Previously, we were using the size of the BO which may be substantially larger than the actual vertex buffer size. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/draw: Account for BaseInstance in VBO boundsJason Ekstrand2016-05-231-0/+1
| | | | | Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/draw: Stop relying on min_index == -1 for invalid index boundsJason Ekstrand2016-05-231-0/+1
| | | | | | | | | | | The vbo layer passes an index_bounds_valid flag that we should be using instead. This also fixes a bug when min_index == -1 and basevertex != 0 where we were actually comparing min_index + basevertex == -1 which was false and we were getting the wrong buffer-sizing path. Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Send the minimal number of STATE_BASE_ADDRESS packets.Kenneth Graunke2016-05-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | STATE_BASE_ADDRESS stalls the whole pipeline, and the documentation cautions us to emit it as little as possible for better performance. We recently put some hacks in BLORP to try and avoid emitting it if it was already set correctly. However, this wasn't quite minimal: if BLORP is the first operation (i.e. glClear()), then it would emit it, and subsequent draw calls would emit it again. This caused a small drop in performance in GPUTest Triangle when switching from Meta to BLORP. Unlike most packets, STATE_BASE_ADDRESS isn't influenced by GL state: it needs to be emitted once per batch, before most other commands, or whenever we change the program cache BO. It's also valid in both the 3D and compute pipelines, which makes it even more unique. This patch removes it from the atom mechanism and instead directly calls it as part of every draw, compute dispatch, or BLORP operation. We introduce a new flag indicating that STATE_BASE_ADDRESS has already been emitted this batch, and if so, skip doing it again. When we make a new program cache BO, we simply reset the flag, so the next operation will emit it again. When we flush/reset the batch, we reset the flag. This guarantees that we'll emit STATE_BASE_ADDRESS only when we have to. It's also less code than the old atom mechanism. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965: Use blorp for all clearsJason Ekstrand2016-05-141-8/+0
| | | | | | | We used to use a meta path on gen8 but we haven't since c7cf17ae758. We might as well delete the meta path since blorp works on all gens. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965: Use blorp for all stencil blitsJason Ekstrand2016-05-141-8/+0
| | | | | | | We used to use a meta path because blorp didn't support 16x MSAA. Now it does, so we don't need the meta paths anymore. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965: Use blorp for all updownsample blitsJason Ekstrand2016-05-141-8/+0
| | | | | | | We used to use a meta path because blorp didn't support 16x MSAA. Now it does, so we don't need the meta paths anymore. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965: Move brw_get_rb_for_slice to brw_meta_utilJason Ekstrand2016-05-141-5/+0
| | | | Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965: Reimplement ARB_transform_feedback2 on Haswell and later.Kenneth Graunke2016-05-091-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | My old implementation accumulated <start, end> pairs in a buffer, and eventually processed that data on the CPU. This meant flushing the batchbuffer and waiting for it to completely execute before we could map it, resulting in really long stalls. We could also run out of space in the buffer, and have to do this early. Instead, we can use Haswell's MI_MATH command to do the (end - start) subtraction, as well as the multiplication by 2 or 3 to convert from the number of primitives written to the number of vertices written. We still need to CS stall to read the counters, but otherwise everything is completely pipelined - there's no CPU<->GPU synchronization required. It also uses only 80 bytes in the buffer, no matter what. Improves performance in Manhattan on Skylake GT3e at 800x600 by 6.1086% +/- 0.954166% (n=9). At 1920x1080, improves performance by 2.82103% +/- 0.148596% (n=84). v2: Fix number of primitives -> number of vertices calculation for GL_TRIANGLES (I was multiplying by 4 instead of 3.) Caught by Jordan Justen. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965: Add a brw_load_register_reg64 helper.Kenneth Graunke2016-05-091-0/+2
| | | | | | | | | It appears that we can't do this in a single command (like we do for MI_LOAD_REGISTER_IMM) - the Skylake simulator gets rather grumpy about the command length if I try to combine them. No matter. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965: Implement ARB_query_buffer_object for HSW+Jordan Justen2016-05-041-0/+5
| | | | | | | | | | | | | | | v2: * Declare loop index variable at loop site (idr) * Make arrays of MI_MATH instructions 'static const' (idr) * Remove commented debug code (idr) * Updated comment in set_query_availability (Ken) * Replace switch with if/else in hsw_result_to_gpr0 (Ken) * Only divide GL_FRAGMENT_SHADER_INVOCATIONS_ARB by 4 on hsw and gen8 (Ken) Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965/gen6+: Add load register immediate helper functionsJordan Justen2016-05-041-0/+4
| | | | | Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/hsw+: Add support for copying a registerJordan Justen2016-05-041-0/+2
| | | | | Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/gen6+: Add support for storing immediate data into a bufferJordan Justen2016-05-041-0/+4
| | | | | Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Add brw_store_register_mem32Jordan Justen2016-05-041-0/+2
| | | | | Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Use offset instead of index in brw_store_register_mem64Jordan Justen2016-05-041-2/+2
| | | | | | | | | | This matches the byte based offset of brw_load_register_mem*. The function is also moved into intel_batchbuffer.c like brw_load_register_mem*. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Merge inst_info and opcode_desc tables.Matt Turner2016-05-031-7/+0
| | | | | | | I merged opcode_desc into inst_info (instead of the other way around) because inst_info was sorted by opcode number. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Introduce state flag for blorpTopi Pohjolainen2016-04-231-0/+2
| | | | | | | | | | | | | | | | | | | | | | | In the past, BLORP has clobbered all BRW_NEW_* state flags, to trigger re-emission of the entire 3D pipeline on the next draw. However, there are some packets BLORP simply leaves alone, so there's no need to re-emit them. Trying to reduce the set of dirty bits flagged after BLORP runs is tricky. Instead, we introduce a BRW_NEW_BLORP flag. This should be set on any atom which emits a packet that BLORP also emits. When BLORP runs, it will flag BRW_NEW_BLORP, causing those packets to get re-emitted. This also makes it easy to avoid re-emitting specific atoms - we can simply drop the BRW_NEW_BLORP flag on those. To start, we assume that all packets need to be re-emitted. This is the safest approach and closest to the existing code's behavior. Many of these are obviously not required, and can be dropped in subsequent patches. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/surface_state: Use libisl functions for image format loweringJason Ekstrand2016-04-211-2/+0
| | | | | | | This lets us delete some redundant code and keep all of the image_load_store format lowering logic in one place: libisl. Reviewed-by: Chad Versace <chad.versace@intel.com>
* i965/blorp: Re-introduce clear programsTopi Pohjolainen2016-04-211-2/+2
| | | | | | | This partially reverts 2f28a0dc23165123cf1e8b5942acad37878edd8a Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Allow texture surface state setup to be used by blorpTopi Pohjolainen2016-04-211-0/+1
| | | | | Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>