| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
Similar to brw_context(...), intel_texture_object(...), and so on.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arcero@collabora.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
dispatch.
The eliminate_find_live_channel optimization eliminates
FIND_LIVE_CHANNEL instructions in cases where control flow is known to
be uniform, and replaces them with 'MOV 0', which in turn unblocks
subsequent elimination of the BROADCAST instruction frequently used on
the result of FIND_LIVE_CHANNEL. This is however not correct in
per-sample fragment shader dispatch because the PSD can dispatch a
fully unlit sample under certain conditions. Disable the optimization
in that case.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: Add devinfo argument to brw_stage_has_packed_dispatch() to
implement hardware generation check.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Generated by:
sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.c
sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.h
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.c
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.cpp
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.h
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
|
|
|
|
|
|
|
| |
The first thing to go in this new library is brw_device_info.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unfortunately due to the inconsistent meaning of some surface state
structure fields, we cannot re-use the same binding table entries for
sampling from and rendering into the same set of render buffers, so we
need to allocate a separate binding table block specifically for
render target reads if the non-coherent path is in use.
The slight noise is due to the change of
brw_assign_common_binding_table_offsets to return the next available
binding table index rather than void.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some of the following changes in this series are specific to the
non-coherent path, so I need some way to tell whether the coherent or
non-coherent path is in use. The flag defaults to the value of the
gl_extensions::MESA_shader_framebuffer_fetch enable so that it can be
overridden easily on hardware that supports both framebuffer fetch
extensions in order to test the non-coherent path, like:
MESA_EXTENSION_OVERRIDE=-GL_EXT_shader_framebuffer_fetch
(Of course trying to force-enable the coherent framebuffer fetch
extension on hardware without native support won't work and lead to
assertion failures).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes several GL44-CTS.tessellation_shader (and GL45 and ES31) subcases:
- vertex_spacing
- tessellation_shader_point_mode.points_verification
- tessellation_shader_quads_tessellation.inner_tessellation_level_rounding
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the Vulkan driver, we have the generation number (a compile time
constant) but not necessarily the brw_device_info struct. I meant
to rework the function to take a generation number instead of a
brw_device_info pointer to accomodate this. But I forgot, and left
it taking a brw_device_info pointer, while making Vulkan pass the
generation number (8, 9, ...) directly. This led to crashes.
Brown paper bag fix for commit 87d062a94080373995170f51063a9649.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96504
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Skylake changes the representation of shared local memory size:
Size | 0 kB | 1 kB | 2 kB | 4 kB | 8 kB | 16 kB | 32 kB | 64 kB |
-------------------------------------------------------------------
Gen7-8 | 0 | none | none | 1 | 2 | 4 | 8 | 16 |
-------------------------------------------------------------------
Gen9+ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
The old formula would substantially underallocate the amount of space.
This fixes GPU hangs on Skylake when running with full thread counts.
v2: Fix the Vulkan driver too, use a helper function, and fix the table
in the comments and commit message.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
|
|
|
|
|
|
|
|
|
|
|
| |
This is more accurate than calling
_mesa_active_fragment_shader_has_side_effects because it looks at whether
or not the SSBOs, images, or atomic buffers are actually written rather
than just existing in the program.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The old method pushed data for each channels uvec3 data of
gl_LocalInvocationID.
The new method pushes 1 dword of data that is a 'thread local ID'
value. Based on that value, we can generate gl_LocalInvocationIndex
and gl_LocalInvocationID with some calculations.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need information about push constants in a few places for the GL
driver, and another couple places for the vulkan driver.
When we add support for uploading both a common (cross-thread) set of
push constants, combined with the previous per-thread push constant
data, things are going to get even more complicated. To simplify
things, we add push constant info into the cs prog_data struct.
The cross-thread constant support is added as of Haswell. To support
it we need to make sure all push constants with uniform values are
added to earlier registers. The register that varies per thread and
holds the thread invocation's unique local ID needs to be added last.
For now we add the code that would calculate cross-thread constatn
information for hsw+, but we force it (cross_thread_supported) off
until the other parts of the driver support it.
v4:
* Support older local ID push constant layout as well. (Jason)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
|
|
|
|
|
|
| |
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
|
|
|
|
|
|
|
|
|
|
|
|
| |
v4:
* Force thread_local_id_index to -1 for now, and have
fs_visitor::setup_cs_payload look at thread_local_id_index. This
enables us to more easily cut over from the old local ID layout to
the new layout, as suggested by Jason.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
|
|
|
|
| |
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
|
|
|
|
| |
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that we handle flipping and other gl_FragCoord transformations
via a uniform, these key fields have no users.
This patch actually eliminates the associated recompiles. The Tomb
Raider benchmark's minimum FPS increases from ~1 FPS to a reasonable
number.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
|
| |
This allows us to disable spilling for blorp shaders since blorp state
setup doesn't handle spilling. Without this, blorp fails hard if you run
with INTEL_DEBUG=spill.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Tested-by: Francisco Jerez <currojerez@riseup.net>
|
|
|
|
|
|
|
|
|
|
| |
When computing where the first non-payload GRF starts, we can't rely on
the number of attributes, as each attribute can be using 1 or 2 slots
depending on whether they are a dvec3/4 or other.
Instead, we need to use the number of slots used by the attributes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
| |
The hardware packets organize kernel pointers and GRF start by slots that
don't map directly to dispatch width. This means that all of the state
setup code has to re-arrange the data from prog_data into these slots.
This logic has been duplicated 4 times in the GL driver and one more time
in the Vulkan driver. Let's just put it all in brw_fs.cpp.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit reworks and simplifies the way we handle persample shading in
the shader key and prog_data. The previous approach had three different
key bits that had slightly different and hard-to-decern meanings while the
new bits are far more clear. This commit changes it to two easily
understood bits that communicate everything we need:
1) key->persample_interp: means that the user has requested persample
interpolation through the API. This is equivalent to having
SAMPLE_SHADING enabled and having MIN_SAMPLE_SHADING_VALUE set high
enough that you actually get multiple per-sample invocations.
2) key->multisample_fbo: means that the shader will be running on an
actual multi-sampled framebuffer.
This commit also adds a new "persample_dispatch" bit to prog_data which
indicates that the shader should be run in persample mode. This way the
state setup code doesn't have to look at the fragment program or GL state
and can just pull that data out of the prog_data.
In theory, this shuffle could mean more recompiles. However, in practice,
we were shoving enough state into the key before that we were probably
hitting a recompile on every per-sample shader anyway.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
| |
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I'm going to need a key entry meaning "we have a multisample FBO,
and multisampling is enabled" in an upcoming patch. This is basically
wm_key->compute_sample_id, except that it also checks that the SAMPLE_ID
system value is read.
The only use of wm_key->compute_sample_id is in emit_sampleid_setup(),
which is only called when handling the SAMPLE_ID system value. So we
can just eliminate the check and generalize the field.
v2: Also update the Vulkan driver.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
|
|
| |
This was only used by the old gl_SampleID calculations. The new code
doesn't need to handle 2x specially.
v2: Delete it from the Vulkan driver, too.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
| |
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The SIN and COS instructions on Intel hardware can produce values
slightly outside of the [-1.0, 1.0] range for a small set of values.
Obviously, this can break everyone's expectations about trig functions.
According to an internal presentation, the COS instruction can produce
a value up to 1.000027 for inputs in the range (0.08296, 0.09888). One
suggested workaround is to multiply by 0.99997, scaling down the
amplitude slightly. Apparently this also minimizes the error function,
reducing the maximum error from 0.00006 to about 0.00003.
When enabled, fixes 16 dEQP precision tests
dEQP-GLES31.functional.shaders.builtin_functions.precision.
{cos,sin}.{highp,mediump}_compute.{scalar,vec2,vec4,vec4}.
at the cost of making every sin and cos call more expensive (about
twice the number of cycles on recent hardware). Enabling this
option has been shown to reduce GPUTest Volplosion performance by
about 10%.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
| |
We really need to stop pulling information directly out of shaders for
state setup. For one thing, if we want any sort of an on-disk shader
cache, having all of this metadata in one place is going to be crucial.
Also, passing it all through prog_data cleans up the compiler <-> state
setup API substantially.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
| |
A future patch will want to use designated initalizers, which aren't
available in C++, but this is C.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
OpenGL's dual color blending feature was specified so that an
implementation could support both multiple render targets (MRT) and
dual source blending. Fragment shader outputs specify both "location"
(the render target number) and "index" (either color 0 or 1).
I believe DirectX only has the notion of "location" - if using dual
color blending, location 0 or 1 will specify the operands. If not,
then location means the render target index. The two features can't
be used together.
As such, some applications mistakenly try to use <loc = 0, index = 0>
and <loc = 1, index = 0> in a shader used for dual color blending with
a single render target, rather than the correct <loc = 0, index = 0>
and <loc = 0, index = 1>.
In particular, Unigine Heaven 4.0 and Valley 1.0 suffer from this bug.
Unigine is aware of the problem, and quickly developed a fix, but has
not bothered to change the download link on their website to a working
copy in over a year. People were still using the broken version and
complaining. We tried working around this by disabling dual color
blending, but that apparently hurts performance, and people were once
again unhappy.
On i965, dual source blending is achieved by using different framebuffer
write messages than normal rendering. So, we have to compile different
code for the two cases. We're not being pedantic: we actually have to
know in order to function.
Normally, dual source blending is detectable in the shader: if a shader
has an output with index = 1, then it's meant for blending, not MRT.
With the broken inputs, they're indistinguishable, so we can only tell
by looking at the current GL state.
This patch implements a new drirc workaround:
export dual_color_blend_by_location=true
which makes the i965 driver detect when OpenGL state is configured for
dual source blending, and recompile the fragment shader to use the right
messages. In that case, we allow either location = 1 or index = 1 to
specify the second source for the blending equations.
It also re-enables GL_ARB_blend_func_extended for Unigine.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92233
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
|
|
|
|
|
|
|
|
|
|
| |
We have to break open a new vec4 for gl_DrawIDARB. We've used up all
space in the vec4 we use for SGVS and gl_DrawIDARB has to come from its
own separate vertex buffer anyway. This is because we point the vb for
base vertex and base instance into the draw parameter BO for indirect
draw calls, but the draw id is generated by mesa in a different buffer.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
|
|
|
|
|
|
|
| |
We already have gl_BaseVertexARB in the .x component of the SGVS vec4
and plug gl_BaseInstanceARB into the last free component (.y).
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GL_ARB_separate_shader_objects allows the application to mix-and-match
TCS and TES programs separately. This means that the interface between
the two stages isn't known until the final SSO pipeline is in place.
This isn't a great match for our hardware: the TCS and TES have to agree
on the Patch URB entry layout. Since we store data as per-patch slots
followed by per-vertex slots, changing the number of per-patch slots can
significantly alter the layout. This can easily happen with SSO.
To handle this, we store the [Patch]OutputsWritten and [Patch]InputsRead
bitfields in the TCS/TES program keys, introducing program recompiles.
brw_upload_programs() decides the layout for both TCS and TES, and
passes it to brw_upload_tcs/tes(), which store it in the key.
When creating the NIR for a shader specialization, we override
nir->info.inputs_read (and friends) to the program key's values.
Since everything uses those, no further compiler changes are needed.
This also replaces the hack in brw_create_nir().
To avoid recompiles, brw_precompile_tes() looks to see if there's a
TCS in the linked shader. If so, it accounts for the TCS outputs,
just as brw_upload_programs() would. This eliminates all recompiles
in the non-SSO case. In the SSO case, there should only be recompiles
when using a TCS and TES that have different input/output interfaces.
Fixes Piglit's mix-and-match-tcs-tes test.
v2: Pull the brw_upload_programs code into a brw_upload_tess_programs()
helper function (requested by Jordan Justen).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
|
|
|
|
|
|
|
|
|
|
|
| |
When the application hasn't supplied a TCS, and we have to create one,
we need to know what VS outputs to copy to TES inputs.
To do this, we create a new program key field, and set it to the TES
InputsRead bitfield.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The hardware provides us no decent way of getting at the number of input
vertices in the patch topology from the tessellation control shader.
It's actually very surprising - normally this sort of information would
be available in the thread payload.
For the precompile, we guess that the number of vertices will be the
same for both the input and output patches. This usually seems to be
the case.
On Gen8+, we could pass in an extra push constant containing this value.
We may be able to do that on Haswell too. It's quite a bit trickier on
Ivybridge, however.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The TCS is the first tessellation shader stage, and the most
complicated. It has access to each of the control points in the input
patch, and computes a new output patch. There is one logical invocation
per output control point; all invocations run in parallel, and can
communicate by reading and writing output variables.
One of the main responsibilities of the TCS is to write the special
gl_TessLevelOuter[] and gl_TessLevelInner[] output variables which
control how much new geometry the hardware tessellation engine will
produce. Otherwise, it simply writes outputs that are passed along
to the TES.
We run in SIMD4x2 mode, handling two logical invocations per EU thread.
The hardware doesn't properly manage the dispatch mask for us; it always
initializes it to 0xFF. We wrap the whole program in an IF..ENDIF block
to handle an odd number of invocations, essentially falling back to
SIMD4x1 on the last thread.
v2: Update comments (requested by Jordan Justen).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The TES is essentially a post-tessellator VS, which has access to the
entire TCS output patch, and a special gl_TessCoord input. Otherwise,
they're very straightforward.
This patch implements SIMD8 tessellation evaluation shaders for Gen8+.
The tessellator can generate a lot of geometry, so operating in SIMD8
mode (8 vertices per thread) is more efficient than SIMD4x2 mode (only
2 vertices per thread). I have another patch which implements SIMD4x2
mode for older hardware (or via an environment variable override).
We currently handle all inputs via the pull model.
v2: Improve comments (suggested by Jordan Justen).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
|
|
|
|
|
|
|
| |
Based on a patch by Chris Forbes, but largely rewritten by Ken.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
|
|
|
|
|
|
| |
This is a helper function for setting up the local invocation ID
payload according to the cs_prog_data generated by the compiler. It's
intended to be available to users of libi965_compiler so move it there.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This calculates a rather conservative partitioning of the L3 cache
based on the shaders currently bound to the pipeline and whether they
use SLM, atomics, images or scratch space. The result is intended to
be fine-tuned later on based on other pipeline state.
Note that the L3 partitioning calculated for VLV in the non-SLM non-DC
case differs from the hardware defaults in that it doesn't include a
DC partition and has twice as much RO cache space -- This is an
intentional functional change that improves performance in several
bandwidth-bound benchmarks on VLV (5% significance): SynMark
OglTexFilterAniso by 14.18%, SynMark OglTexFilterTri by 7.15%, Unigine
Heaven by 4.91%, SynMark OglShMapPcf by 2.15%, GpuTest Fur by 1.83%,
SynMark OglDrvRes by 1.80%, SynMark OglVsTangent by 1.71%, and a few
other benchmarks from the Finnish system by less than 1%.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
|
|
|
|
|
|
| |
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
3DSTATE_TE has partitioning, output topology, and domain fields,
each of which has several enumerated values. We'll also need to
switch on the domain, so enums (rather than #defines) seem like a
natural fit.
I chose to put these in brw_compiler.h because they'll be stored
in struct brw_tes_prog_data, which will live there.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is needed for the FILE * type in brw_print_vue_map().
Apparently, all files that include brw_compiler.h already pick this up
via some include chain, so this isn't actually a build fix. However,
I have patches which introduce new consumers of brw_compiler.h that
fail to build because of the missing #include.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
|
|
|
|
|
|
|
|
|
| |
I was going to add scalar_tcs and scalar_tes flags, and then thought
better of it and decided to convert this to an array. Simpler.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
I've been carrying around a patch to do this for the last few months,
and it's been exceedingly useful for debugging GS and tessellation
problems. I've caught lots of bugs by inspecting the interface
expectations of two adjacent stages.
It's not that much spam, so I figure we may as well just print it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When 16x MSAA is used for sampling with texelFetch the compiler needs
to use a different instruction which passes more arguments for the MCS
data. Previously on skl+ it was unconditionally using this new
instruction. However since 16x MSAA is probably going to be pretty
rare, it is probably worthwhile to avoid using this instruction for
the other sample counts. In order to do that this patch adds a new
member to brw_sampler_prog_key_data to track when a sampler refers to
a buffer with 16 samples.
Note that this isn't done for the vec4 backend because it wouldn't
change how many registers it uses.
Acked-by: Ben Widawsky <ben@bwidawsk.net>
|
|
|
|
|
|
|
| |
I changed this from COUNT to PAD in commit 268008f98c3810b9f276df985dc93ef.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
|
|
|
|
|
|
|
| |
It doesn't modify it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2: remove useless source_stencil_to_render_target (Ken)
Squash in the actual packing function, which also got to
v2:
Move the definition of the OPCODE outside of FB_WRITE opcodes (Matt)
Reorder the regioning to be in VWH order (Matt)
Don't retype src in the backend, just assert instead (Matt)
Rename the debug prints to something better (Matt)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|