| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
Just say no to:
- brw->vs.base.prog_data = &brw->vs.prog_data->base.base;
We'll just use the brw_stage_prog_data pointer in brw_stage_state
and downcast it to brw_vs_prog_data as needed.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arcero@collabora.com>
|
|
|
|
|
|
|
|
| |
Now that we have gen_device_info mutable, we can update its values and drop
all copies we had in brw_context.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
allocation.
I haven't found any mention of this in the hardware docs, but
experimentally what seems to be going on is that when the per-thread
scratch slot size is changed between two pipelined draw calls, shader
invocations using the old and new scratch size setting may end up
being executed in parallel, causing their scratch offset calculations
to be based in a different partitioning of the scratch space, which
can cause their thread-local scratch space to overlap leading to
cross-thread scratch corruption.
I've been experimenting with alternative workarounds, like emitting a
PIPE_CONTROL with DC flush and CS stall between draw (or dispatch
compute) calls using different per-thread scratch allocation settings,
or avoiding reuse of the scratch BO if the per-thread scratch
allocation doesn't exactly match the original. Both seem to be as
effective as this workaround, but they have potential performance
implications, while this should be basically for free.
Fixes over 40 failures in our CI system with spilling forced on
(including CTS, dEQP and Piglit failures) on a number of different
platforms from Gen4 to Gen9. The 'glsl-max-varyings' piglit test
seems to be able to reproduce this bug consistently in the vertex
shader on at least Gen4, Gen8 and Gen9 with spilling forced on.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now, we only use ctx->NewDriverState.
I used this bash & sed command in the i965 directory:
for file in *.[ch] *.[ch]pp; do
sed -i -e 's/state\.dirty\.brw/ctx.NewDriverState/g' $file
done
Followed by manual changes to brw_state_upload.c.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
| |
These stopped being necessary in commit ab973403e445cd8211dba4e87e0.
v2: Update commit message with a better explanation (thanks to Eric
Anholt for doing the git archaeology).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We use IEEE mode for GLSL programs, but need to use ALT mode for ARB
programs so that 0^0 == 1. The choice is based entirely on the shader
source language.
Previously, our code to determine which mode we wanted was duplicated
in 8 different places (VS and FS for Gen4-5, Gen6, Gen7, and Gen8).
The ctx->_Shader->CurrentProgram[stage] == NULL check was confusing
as well - we use CurrentProgram (non-derived state), but _Shader
(derived state). It also relies on knowing that ARB programs don't
use gl_shader_program structures today. The compiler already makes
this assumption in a few places, but I'd rather keep that assumption
out of the state upload code.
With this patch, we select the mode at compile time, and store that
choice in prog_data. The state upload code simply uses that decision.
This eliminates a BRW_NEW_*_PROGRAM dependency in the state upload code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit c0347705 changed the Gen6-7 code to use ctx->_Shader rather than
ctx->Shader, but neglected to change the Gen4-5 or Gen8+ code.
This might fix SSO related bugs, but ALT mode is only used for ARB
programs, so if there's an actual problem, it's likely no one would
run into it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I put the BRW_NEW_*_PROG_DATA flags at the beginning so that
brw_state_cache.c can still continue using 1 << brw_cache_id.
I also added a comment explaining the difference between
BRW_NEW_*_PROG_DATA and BRW_NEW_*_PROGRAM, as it took me a long time
to remember it.
Non-mechanical changes:
- brw_state_cache.c and brw_ff_gs.c now signal .brw, not .cache.
- brw_state_upload.c - INTEL_DEBUG=state changes.
- brw_context.h - bit definition merging.
v2: Correct the explanation of BRW_NEW_*_PROG_DATA to mention
state-based recompiles, and nix the "proper subset" claim,
as it's false. (Caught by Kristian Høgsberg).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that we've moved a bunch of CACHE_NEW_* bits to BRW_NEW_*, the only
ones that are left are legitimately related to the program cache. Yet,
it seems a bit wasteful to have an entire bitfield for only 7 bits.
State upload is one of the hottest paths in the driver. For each atom
in the list, we call check_state() to see if it needs to be emitted.
Currently, this involves comparing three separate bitfields (mesa, brw,
and cache). Consolidating the brw and cache bitfields would save a
small amount of CPU overhead per atom. Broadwell, for example, has
57 state atoms, so this small savings can add up.
CACHE_NEW_*_PROG covers the brw_*_prog_data structures, as well as the
offset into the program cache BO (prog_offset). Since most uses refer
to brw_*_prog_data, I decided to use BRW_NEW_*_PROG_DATA as the name.
Removing "cache" completely is a bit painful, so I decided to do it in
several patches for easier review, and to separate mechanical changes
from manual ones. This one simply renames things, and was made via:
$ for file in *.[ch]; do
sed -i -e 's/CACHE_NEW_\([A-Z_\*]*\)_PROG/BRW_NEW_\1_PROG_DATA/g' \
-e 's/BRW_NEW_WM_PROG_DATA/BRW_NEW_FS_PROG_DATA/g' $file
done
Note that BRW_NEW_*_PROG_DATA is still in .cache, not .brw!
The next patch will remedy this flaw. It will also fix the
alphabetization issues.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This flag signifies that we've emitted a new SAMPLER_STATE table.
Given that we haven't cached those in years, CACHE_NEW_SAMPLER isn't
a great name. Putting it in the BRW_NEW_* hierarchy would make more
sense; BRW_NEW_SAMPLER_STATE_TABLE better reflects its actual purpose.
When this flag is raised, the pointer to the SAMPLER_STATE table has
changed, so we need to re-issue any packets which point to it (unit
state on Gen4-5, 3DSTATE_SAMPLER_STATE_POINTERS on Gen6, and the
per-stage variants on Gen7+).
Saves 2 * sizeof(void *) bytes per context, as we remove useless
aux_compare/aux_free function pointers.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Marking brw_stage_state::sampler_count as CACHE_NEW_SAMPLER is wrong.
The number of samplers used by each program is actually computed at
draw time (brw_try_draw_prims), based purely on the currently bound
shader programs (gl_program::SamplersUsed).
CACHE_NEW_SAMPLER means that we've emitted a new SAMPLER_STATE table.
Although this could indicate that the number of samplers has changed,
it could also simply mean that the contents of the table has changed
(i.e. we've bound different textures).
The real reason these atoms depend on CACHE_NEW_SAMPLER is because they
include a pointer to the SAMPLER_STATE table. This was not commented.
So, move the comments to the appropriate place.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On Gen4-5, unit state is specified as indirect state, rather than
commands. If any unit state changes, we upload it via brw_state_batch
and arrange for 3DSTATE_PIPELINED_POINTERS to be re-emitted, which
updates pointers to all unit state at once.
Since there's only one command and state atom (brw_psp_urb_cs) that
needs to know about this, there's no benefit to having six separate
flags. We can combine CACHE_NEW_*_UNIT into a single flag.
We also haven't cached these in a long time, so it doesn't make sense
to use the "CACHE_NEW_" prefix. Instead, use the "BRW_NEW_" prefix.
This also saves 12 * sizeof(void *) bytes of memory per context, as
we remove useless aux_compare/aux_free functions for each CACHE bit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Most of the dirty flags were listed in some arbitrary order. Some used
bonus parenthesis. Some put multiple flags on one line, others put one
per line. Some used tabs instead of spaces...but only on some lines.
This patch settles on one flag per line, in alphabetical order, using
spaces instead of tabs, and sheds the unnecessary parentheses.
Sorting was mostly done with vim's visual block feature and !sort,
although I alphabetized short lists by hand; it was pretty manual.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reverts
* "i965: Modify state upload to allow 2 different sets of state atoms."
8e27a4d2b3e4e74e9a77446bce49607433d86be3
* "i965: Modify dirty bit handling to support 2 pipelines."
373143ed9187c4d4ce1e3c486b5dd0880d18ec8b
* "i965: Create a macro for checking a dirty bit."
c5bdf9be1eca190417998d548fd140c1eca37a54
Conflicts:
src/mesa/drivers/dri/i965/brw_context.h
* "i965: Create a macro for setting all dirty bits."
6f56e1424d923fd80c84090fbf4506c9eaaffea1
Conflicts:
src/mesa/drivers/dri/i965/brw_blorp.cpp
src/mesa/drivers/dri/i965/brw_state_cache.c
src/mesa/drivers/dri/i965/brw_state_upload.c
* "i965: Create a macro for setting a dirty bit."
88e3d404dad009d8cff5124cf8acee7daeaceb64
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
|
|
|
|
|
|
|
|
| |
All shader stages have these fields, so it makes sense to store them in
the common base structure, rather than duplicating them in each.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
|
|
|
|
|
|
|
| |
This will make it easier to extend dirty bit handling to support
compute shaders.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
|
|
|
|
|
|
|
| |
I wanted to access this value from stage-generic code, so stop storing it
under two different names.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
| |
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
array.
These are replaced with
ctx->Shader.CurrentProgram[MESA_SHADER_{VERTEX,FRAGMENT,GEOMETRY}].
In patches to follow, this will allow us to replace a lot of ad-hoc
logic with a variable index into the array.
With the exception of the changes to mtypes.h, this patch was
generated entirely by the command:
find src -type f '(' -iname '*.c' -o -iname '*.cpp' ')' \
-print0 | xargs -0 sed -i \
-e 's/\.CurrentVertexProgram/.CurrentProgram[MESA_SHADER_VERTEX]/g' \
-e 's/\.CurrentGeometryProgram/.CurrentProgram[MESA_SHADER_GEOMETRY]/g' \
-e 's/\.CurrentFragmentProgram/.CurrentProgram[MESA_SHADER_FRAGMENT]/g'
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
libdrm 2.4.52 introduces a new 'uint64_t offset64' field, intended to
replace the old 'unsigned long offset' field. To preserve ABI, libdrm
continues to store the presumed offset in both locations.
On Broadwell, a 64-bit kernel may place BOs at "high" (> 4G) addresses.
However, with a 32-bit userspace, the 'unsigned long offset' field will
only be 32-bit, which is not large enough to hold this value. We need
to use a proper uint64_t (like the kernel does).
Technically, a lot of this code doesn't affect Broadwell, so we could
leave it using the old field. But it makes sense to just switch to the
new, properly typed field.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Tungsten Graphics Inc. was acquired by VMware Inc. in 2008. Leaving the
old copyright name is creating unnecessary confusion, hence this change.
This was the sed script I used:
$ cat tg2vmw.sed
# Run as:
#
# git reset --hard HEAD && find include scons src -type f -not -name 'sed*' -print0 | xargs -0 sed -i -f tg2vmw.sed
#
# Rename copyrights
s/Tungsten Gra\(ph\|hp\)ics,\? [iI]nc\.\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./g
/Copyright/s/Tungsten Graphics\(,\? [iI]nc\.\)\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./
s/TUNGSTEN GRAPHICS/VMWARE/g
# Rename emails
s/alanh@tungstengraphics.com/alanh@vmware.com/
s/jens@tungstengraphics.com/jowen@vmware.com/g
s/jrfonseca-at-tungstengraphics-dot-com/jfonseca-at-vmware-dot-com/
s/jrfonseca\?@tungstengraphics.com/jfonseca@vmware.com/g
s/keithw\?@tungstengraphics.com/keithw@vmware.com/g
s/michel@tungstengraphics.com/daenzer@vmware.com/g
s/thomas-at-tungstengraphics-dot-com/thellstom-at-vmware-dot-com/
s/zack@tungstengraphics.com/zackr@vmware.com/
# Remove dead links
s@Tungsten Graphics (http://www.tungstengraphics.com)@Tungsten Graphics@g
# C string src/gallium/state_trackers/vega/api_misc.c
s/"Tungsten Graphics, Inc"/"VMware, Inc"/
Reviewed-by: Brian Paul <brianp@vmware.com>
|
|
|
|
|
|
|
| |
Performed via:
$ for file in *; do sed -i 's/ *//g'; done
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
| |
Before the series with 3c9dc2d31b80fc73bffa1f40a91443a53229c8e2 to
dynamically assign our binding table indices, we didn't really track our
binding table count per shader, so we never filled in these fields.
Affects cairo-gl trace runtime by -2.47953% +/- 1.07281% (n=20)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
| |
This paves the way for sharing the code that will set up the vertex
and geometry shader pipeline state.
v2: Rename the base class to brw_stage_state.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Both 3DSTATE_VS and 3DSTATE_GS have a dispatch_grf_start_reg control,
which determines the register where the hardware delivers data sourced
from the URB (push constants followed by per-vertex input data).
For vertex shaders, we always set dispatch_grf_start_reg to 1, since
R1 is always the first register available for push constants in vertex
shaders.
For geometry shaders, we'll need the flexibility to set
dispatch_grf_start_reg to different values depending on the behvaiour
of the geometry shader; if it accesses gl_PrimitiveIDIn, we'll need to
set it to 2 to allow the primitive ID to be delivered to the thread in
R1.
This patch eliminates the assumption that dispatch_grf_start_reg is
always 1. In vec4_visitor, we record the regnum that was passed to
vec4_visitor::setup_uniforms() in prog_data for later use. In
vec4_generator, we consult this value when converting an abstract
UNIFORM register to a concrete hardware register. And in the code
that emits 3DSTATE_VS, we set dispatch_grf_start_reg based on the
value recorded in prog_data.
This will allow us to set dispatch_grf_start_reg to the appropriate
value when compiling geometry shaders. Vertex shaders will continue
to always use a dispatch_grf_start_reg of 1.
v2: Make dispatch_grf_start_reg "unsigned" rather than "GLuint".
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Also upload separate sampler default/texture border color entries.
At the moment, this is completely idiotic: both tables contain exactly
the same contents, so we're simply wasting batch space and CPU time.
However, soon we'll only upload data for textures actually /used/ in
a particular stage, which will usually make the VS table empty and
very likely eliminate all redundancy. This is just a stepping stone.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, we only have a single sampler state table shared among all
stages, so we just copy wm.sampler_count into vs.sampler_count.
In the future, each shader stage will have its own SAMPLER_STATE table,
at which point we'll need these separate sampler counts.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
|
|
|
|
|
|
|
|
|
|
| |
Fixes broken filter and lod selection for vertex texturing.
(txs/txf only worked properly because they ignore the sampler state
completely)
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
|
|
|
|
|
|
|
| |
Fixes isinf(), isnan() from GLSL 1.30
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
| |
Most functions no longer use intel_context, so this patch additionally
removes the local "intel" variables to avoid compiler warnings.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
|
|
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
|
|
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
| |
This will allow the generic parts to be re-used for geometry shaders.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
v2: Put urb_read_length and urb_entry_size in the generic struct.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
| |
Only the old backend used it.
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
|
|
|
|
|
|
| |
The code forces single program flow to be enabled on Ironlake, or
equivalently, disables multiple program flow. The comment was reversed.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These fields control how many entries the hardware prefetches into the
state cache, so they only impact performance, not correctness. However,
it's not clear how to use this in a way that's beneficial.
According to the documentation, kernels "using a large number" of
entries may wish to program this to zero to avoid thrashing the cache;
it's unclear how many is too many. Also, Ironlake's WM was missing this
feature entirely---the count had to be zero.
The dirty bit tracking to handle this complicates the surface state
and binding table setup; removing it should simplify things and make
future refactoring easier. So just set 0 for the number of entries
rather than trying to compute and track it.
Appears to have no impact on Nexuiz and OpenArena on Sandybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
|
|
|
|
|
|
|
| |
It is only needed in time for brw_psp_urb_cbs(), which is also an emit().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
|
|
|
|
|
|
|
|
|
| |
The inconsistency between vs_max_threads and max_vs_entries was rather
annoying. I could never seem to remember which one was reversed, which
made it harder to find quickly. "Max __ Threads" seems more natural.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In pre-GEN6, when using clip planes, both the vertex shader and the
clipper need access to the client-supplied clip planes, since the
vertex shader needs them to set the clip flags, and the clipper needs
them to determine where to insert new vertices.
With the old VS backend, we used a clever optimization to avoid
placing duplicate copies of these planes in the CURBE: we used the
same block of memory for both the clipper and vertex shader constants,
with the clip planes at the front of it, and then we instructed the
clipper to read just the initial part of this block containing the
clip planes.
This optimization was tricky, of dubious value, and not completely
working in the new VS backend, so I've removed it. Now, when using
the new VS backend, separate parts of the CURBE are used for the
clipper and the vertex shader. Note that this doesn't affect the
number of push constants available to the vertex shader, it simply
causes the CURBE to occupy a few more bytes of URB memory.
The old VS backend is unaffected. GEN6+, which does clipping entirely
in hardware, is also unaffected.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
| |
We were failing to relocate, so on the first draw run our scratch
would tend to get written to 0x0.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
| |
|
|
|
|
|
|
|
|
|
| |
I want to make brw_state_dump.c handle more than just the last
statechange, so I want to keep track of what's in the batch state. By
using AUB file numbering for most of these packets, this may be
reusable for aub dumping.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
| |
This was tricky. We were doing a use-before-initialize of
grf_reg_count, but the value usually got overwritten anyway -- when we
didn't have to do a relocation (typical), or on gen5 when we didn't
have relocations at all.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38771
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
| |
We're streaming VS state out now, not caching it.
|
|
|
|
|
|
|
|
|
|
| |
There will be a little bit of thrashing of the program cache BO as the
cache warms up, but once the application is in steady state, this
reduces relocations on gen5 and later.
On my T420 laptop, cairogl firefox-talos-gfx performance improves 2.6%
+/- 1.3% (n=6). No statistically significant performance difference
on nexuiz (n=5).
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is an awful hack and will hurt performance on Ironlake, but we're
at a loss as to what's going wrong otherwise. This is the only common
variable we've found that avoids the problem on 4 applications
(CelShading, gnome-shell, Pill Popper, and my GLSL demo), while other
variables we've tried appear to only be confounding. Neither the
specifications nor the hardware team have been able to provide any
enlightenment, despite much searching.
https://bugs.freedesktop.org/show_bug.cgi?id=29172
Tested by: Chris Lord <chris@linux.intel.com> (Pill Popper)
Tested by: Ryan Lortie <desrt@desrt.ca> (gnome-shell)
|
|
|
|
|
|
|
| |
This provides the optimizer with hints about code hotness, which we're
quite certain about for debug printouts (or, rather, while we
developers often hit the checks for debug printouts, we don't care
about performance while doing so).
|
| |
|