| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
| |
The maximum number of atomic buffer objects is somewhat arbitrary, we
can change it in the future easily if it turns out it's not enough...
v2: Add comments with the relevant mesa dirty bits. Fix usage of
BRW_NEW_UNIFORM_BUFFER in the GS ABO state atom.
v3: Update binding table layout diagrams.
v4: Resolve conflicts with the recent dynamic surface index assignment changes.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
On DOTA2, framerate on dota2-de1.dem in windowed mode on my laptop
improves by 7.69854% +/- 0.909163% (n=3). In a microbenchmark hitting
this code path (wall time of piglit vbo-subdata-many), runtime decreases
from 0.8 to 0.05 seconds.
v2: Use out of range start/end instead of separate bool for the active
flag (suggestion by Jordan), fix double-upload in the stalling path.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Supporting this extension turns out to simplify our code a bit over not
supporting this extension, once the glBufferSubData() synchronization code
lands.
v2: Use 16 byte alignment like we do for uniform buffers, due to unaligned
access penalties.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
|
|
|
|
|
|
|
|
|
|
|
| |
If glBufferData(), glBufferSubData(0, obj->Size), or similar happens, we
get a new drm_intel_bo for the buffer object, and thus need to re-upload
texture buffer state so we point at the new data.
Fixes the new piglit GL_ARB_texture_buffer_object/data-sync
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
|
|
|
|
|
|
|
|
|
|
|
| |
It would be nice to be able to pack our binding table so that programs
that use 1 render target don't upload an extra BRW_MAX_DRAW_BUFFERS - 1
binding table entries. To do that, we need the compiled program to have
information on where its surfaces go.
v2: Rename size to size_bytes to be more explicit.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows us to use a different surface format for gather4, which is
required for R32G32_FLOAT to work on Gen7.
V4: - Only emit alternate surface state for shaders which will actually
use it.
- Pass a simple 'for_gather' flag rather than a function pointer.
The callee can decide what w/a to apply.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This has no effect currently, because intel_finalize_mipmap_tree() always
makes mt->first_level == tObj->BaseLevel.
The change I made before to handle it
(b1080cfbdb0a084122fcd662cd27b4748c5598fd) got very close to working, but
after fixing some unrelated bugs in the series, it still left
tex-miplevel-selection producing errors when testing textureLod(). The
problem is that for explicit LODs, the sampler's LOD clamping is ignored,
and only the surface's MIP clamping is respected. So we need to use
surface mip clamping, which applies on top of the sampler's mip clamping,
so the sampler change gets backed out.
Now actually tested with a non-regressing series producing a non-zero
computed baselevel.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
| |
We know that the object's mt is equal to the firstimage's mt because it's
gone through intel_finalize_mipmap_tree(). Saves a lookup of firstimage
on pre-gen7.
v2: Merge in the warning fix that appeared later in the series (noted by
Chad)
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was an embarassingly large amount of copy and pasted code,
and it wasn't particularly simple code either. By factoring it out
into a helper function, we consolidate the complexity.
v2: Properly NULL-check bo. Caught by Eric Anholt.
v3: Do the subtraction by 1 in gen7_emit_buffer_surface_state, rather
than making callers do it. This makes the buffer_size parameter
the actual size of the buffer. Suggested by Paul Berry.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
|
|
|
|
|
|
|
| |
The value that's split into width/height/depth needs to be the size of
the buffer minus one. This makes it consistent with the constant buffer
and shader time SURFACE_STATE setup code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The code to upload the binding tables for each stage was scattered
across brw_{vs,gs,wm}_surface_state.c and brw_misc_state.c, which also
contain a lot of code to populate individual SURFACE_STATE structures.
This patch brings all the binding table upload code together, and splits
it out from the code which fills in SURFACE_STATE entries.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
|
|
|
|
|
|
|
|
|
| |
This is not quite the same: brw_upload_binding_table() also has code to
early-return if there are no entries, while the existing code did not.
The PS binding table is unlikely to be empty since it will have at least
one color buffer. If it ever is empty, early returning seems wise.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
|
|
|
|
|
|
|
|
| |
Specifying a miptree layout makes no sense for constant buffers.
This has no functional change since BRW_SURFACE_MIPMAPLAYOUT_BELOW is
just a #define for 0.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
|
|
|
|
|
|
|
|
| |
This gets the VS, GS, and PS all using the same data structure.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
| |
There is a slight functionality change. Previously we would compute a
common value for num_samplers for all stages, and populate that many
entries in each stage's surf_offset table regardless of how many
samplers each stage used. Now we only populate the number of entries
in the surf_offset table corresponding to the number of samplers
actually used by the stage.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously these functions would accept a pointer to the binding table
and an index indicating which entry in the binding table should be
updated. Now they merely take a pointer to the binding table entry to
be updated.
This will make it easier to generalize brw_texture_surfaces to support
geometry shaders.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
| |
This paves the way for sharing the code that will set up the vertex
and geometry shader pipeline state.
v2: Rename the base class to brw_stage_state.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Defines that previously referred to VS now refer to VEC4, since they
will be shared by the user-programmable vertex shader and geometry
shader stages.
Defines that previously referred to the Gen6 geometry shader stage
(which is only used for transform feedback) are now renamed to
explicitly refer to Gen6, to avoid confusion with the Gen7
user-programmable geometry shader stage.
Based on work by Eric Anholt <eric@anholt.net>.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
|
|
|
|
|
|
|
|
| |
Thanks to Ken for trawling through my neglected public branches and
finding the bug in this change (inside a megacommit) that made me abandon
this work.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Computing the minimum size was easy, and done at compile-time for no
extra overhead here. Making the binding table smaller wastes less batch
space.
Adding the CACHE_NEW_WM_PROG dirty bit isn't strictly necessary, since
other atoms depend on it and flag BRW_NEW_SURFACES. However, it's best
to add it for clarity and safety. It shouldn't add any new overhead.
v2: Use binding_table_size, rather than max_surface_index.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SURF_INDEX_DRAW() has been the identity function since the dawn of time,
and both the shader code and binding table upload code relied on that,
simply using X rather than SURF_INDEX_DRAW(X).
Even if that continues to be true, using the macro clarifies the code.
The comment about draw buffers needing to be first in order for
headerless render target writes to work turned out to be wrong; with
this change, SURF_INDEX_DRAW can be changed to arbitrary indices and
everything continues working.
The confusion was over the "Render Target Index" field in the FB write
message header. If it were a binding table index, then RT 0 would have
to be at index 0 for headerless FB writes to work. However, it's
actually an index into the blend state table, so there's no problem.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Paul Berry <stereotype441@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Presumably, this comment exists to justify the usage of
I915_GEM_DOMAIN_SAMPLER for this relocation. At one point, this was
necessary to ensure that the right flushing was done to keep caches
coherent. These days, the kernel just flushes everything, so I don't
think it matters.
Still, the comment is interesting, so leave it in place.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
| |
This makes brw_context inherit directly from gl_context; that was the
only thing left in intel_context.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
|
|
|
|
|
|
|
|
|
|
| |
Most functions no longer use intel_context, so this patch additionally
removes the local "intel" variables to avoid compiler warnings.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
|
|
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
|
|
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
|
|
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes brw_context available in every function that used
intel_context. This makes it possible to start migrating fields from
intel_context to brw_context.
Surprisingly, this actually removes some code, as functions that use
OUT_BATCH don't need to declare "intel"; they just use "brw."
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Paul Berry <stereotype441@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
brw_wm_surface_state.c has gotten rather large and unwieldy. At this
point, it consists of two separate portions:
1. Surface format code
This includes the giant table of surface formats and what features
they support on each generation, as well as the code to translate
between Mesa formats and hardware formats.
This is used across all generations.
2. Binding table (SURFACE_STATE) related code.
This is the code to generate SURFACE_STATE entries for renderbuffers,
textures, transform feedback buffers, constant buffers, and so on, as
well as the code to assemble them into binding tables.
This is only used on Gen4-6; gen7_surface_state.c has Gen7+ code.
Since the two are logically separate, and one is reused on every
generation while the other is not, it makes a lot of sense to split
them out. It should also make finding code easier.
No code is changed by this patch. I simply copied the file then deleted
portions of both.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
|
|
|
|
|
| |
We've already computed what the dimensions of the miptree are, and stored
it in the miptree.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch includes code to update the fast color clear state
appropriately when rendering occurs. The state will also need to be
updated when a fast clear or a resolve operation is performed; those
state updates will be added when the fast clear and resolve operations
are added.
v2: Create a new function, intel_miptree_used_for_rendering() to
handle updating the fast color clear state when rendering occurs.
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
|
|
|
|
|
|
| |
This makes it more consistent with intel_miptree_get_tile_offsets().
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
|
|
|
|
|
|
|
|
|
| |
Right now, the callers in i965 don't expect a nonzero page offset to
actually occur (since that's being handled elsewhere), but it seems
like a trap to leave it this way.
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Paul Berry <stereotype441@gmail.com>
|
|
|
|
|
|
|
|
|
| |
We keep having to pass the attachments around with our gl_renderbuffers
because that's the only way to find what the gl_renderbuffer actually
refers to. This is a step toward removing that (though drivers still need
the Zoffset as well).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
| |
This should have no change on driver operation, but it means that when you
wonder why some format isn't supported natively, you can just look at the
table above, instead of wondering if maybe there's an appropriate entry in
the surface formats table that is already supported.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
| |
Previously we would expand it to RGBA_FLOAT16. This format now comes out
as framebuffer incomplete, but it seems worth the memory savings if that's
what people are asking for (and GL3 does list it under "texture-only"
color formats)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
| |
This is a step on the way to removing some of our code for forcing alpha
to 1, but I want easy bisecting so I'll add groups of formats separately.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
| |
v2: move the flagging from intel_bufferobj_data to intel_bufferobj_alloc_buffer
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
|
|
|
|
|
|
| |
I'm not filling them all in, to prevent any breakage in this commit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Unfortunately the surface formats table is now splattered across multiple
chapters. All surface format enums from brw_defines.h are present, but
only support for them that is mentioned in the public specs is included
here.
v2 (from Ken): Mark R32G32B32A32_SFIXED as unsupported on Ivybridge.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
| |
v2: const-qualify ctx, and add a comment about the function (recommended
by Brian and Kenneth).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It appears that Z16 on Intel hardware is in fact slower than Z24, so
people are getting surprisingly hurt when trying to use Z16 as a
performance-versus-precision tradeoff, or when they're targeting GLES2 and
that's all you get.
GL 3.0+ have Z16 on the list of required exact format sizes, but GLES
doesn't, so choose the better-performing layout in that case. Improves
GLB 2.7 trex performance at 1920x1080 by 10.7% +/- 1.1% (n=3) on my IVB
system.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We want to load vec4s, since loading a vec4 instead of a dword is
basically no increased latency. But for variable indexed access, the
previous requirement of aligned vec4s for a sampler LD was hard to
implement.
Note that this change only affects those messages that use the surface
format, like sampler LDs, but not to the untyped data cache loads we've
used in other cases.
No significant performance difference on my GLSL demo with uniforms forced
to take the varying pull constants path (n=4).
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
| |
This puts the rounding-up logic into the function itself instead of all
the callers having to manage it. Also drop an "unused" comment in gen4,
as the stride *is* used for texbos (and will be for uniforms soon).
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 33599433c7 began setting the texture swizzle mode to XYZ1 for
RED, RG, and RGB textures in order to force alpha to 1.0 in case we
actually stored the texture as RGBA.
This had a unforseen performance implication: the shader precompile
assumes that the texture swizzle mode will be XYZW for non-shadow
sampler types. By setting it to XYZ1, this means every shader used with
a RED, RG, or RGB texture has to be recompiled. This is a very common
case.
Unfortunately, there's no way to improve the precompile, since RGBA
textures still need XYZW, and there's no way to know by looking at
the shader source what texture formats might be used.
However, we only need to smash alpha to 1.0 if the texture's memory
format actually has alpha bits. If not, the sampler already returns 1.0
for us without any special swizzling. XRGB8888, for example, is a very
common case where this occurs.
This partially fixes a performance regression since commit 33599433c7.
More work is required to fully fix it in all cases. This at least helps
Warsow.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Carl Worth <cworth@cworth.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since apps typically begin rendering with a call to glClear(), it is
likely that when brw_workaround_depthstencil_alignment() moves a
miplevel to a temporary buffer, it can avoid doing a blit, since the
contents of the miplevel are about to be erased.
This patch adds the necessary plumbing to determine when
brw_workaround_depthstencil_alignment() is being called as a
consequence of glClear(), and avoids the unnecessary blit when it is
safe to do so.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Eliminate unnecessary call to _mesa_is_depthstencil_format(). Fix
handling of depth buffer in depth/stencil format.
v3: Use correct bitfields for clear_mask. Fix handling of depth
buffer in depth/stencil format when hardware uses separate stencil.
When invalidating, make sure we still reassociate the image to the new
miptree.
Reviewed-by: Eric Anholt <eric@anholt.net>
|