path: root/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
Commit message | Author | Age | Files | Lines
...
* i965: Implement ABO surface state emission. | Francisco Jerez | 2013-10-29 | 1 | -0/+50
  The maximum number of atomic buffer objects is somewhat arbitrary, we can change it in the future easily if it turns out it's not enough...
  v2: Add comments with the relevant mesa dirty bits. Fix usage of BRW_NEW_UNIFORM_BUFFER in the GS ABO state atom.
  v3: Update binding table layout diagrams.
  v4: Resolve conflicts with the recent dynamic surface index assignment changes.
  Reviewed-by: Paul Berry <stereotype441@gmail.com>
* i965: Try to avoid stalls on the GPU when doing glBufferSubData(). | Eric Anholt | 2013-10-23 | 1 | -4/+9
  On DOTA2, framerate on dota2-de1.dem in windowed mode on my laptop improves by 7.69854% +/- 0.909163% (n=3). In a microbenchmark hitting this code path (wall time of piglit vbo-subdata-many), runtime decreases from 0.8 to 0.05 seconds.
  v2: Use out of range start/end instead of separate bool for the active flag (suggestion by Jordan), fix double-upload in the stalling path.
  Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
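The v2 note above mentions using an out-of-range start/end pair instead of a separate bool for the "active" flag. A minimal sketch of that idea follows. The struct and function names (`demo_buffer`, `demo_reset_gpu_usage`, and so on) are hypothetical and are not taken from the driver; the sketch only assumes that GPU usage is tracked as a single byte range per buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a buffer object; not the driver's real struct. */
struct demo_buffer {
   uint32_t gpu_active_start; /* first byte the GPU may still be using */
   uint32_t gpu_active_end;   /* one past the last such byte */
};

/* An out-of-range pair (start > end) encodes "no GPU usage recorded",
 * so no separate bool is needed. */
static void demo_reset_gpu_usage(struct demo_buffer *buf)
{
   buf->gpu_active_start = UINT32_MAX;
   buf->gpu_active_end = 0;
}

/* Grow the tracked range so it covers [offset, offset + size). */
static void demo_mark_gpu_usage(struct demo_buffer *buf,
                                uint32_t offset, uint32_t size)
{
   assert(size > 0);
   if (offset < buf->gpu_active_start)
      buf->gpu_active_start = offset;
   if (offset + size > buf->gpu_active_end)
      buf->gpu_active_end = offset + size;
}

/* True if a CPU write to [offset, offset + size) overlaps the range the
 * GPU may still be using, i.e. the write might have to wait. */
static bool demo_write_may_stall(const struct demo_buffer *buf,
                                 uint32_t offset, uint32_t size)
{
   return offset < buf->gpu_active_end &&
          offset + size > buf->gpu_active_start;
}
```

With the reset values, the overlap test is false for every write, so the "inactive" state falls out of the range comparison itself. How the driver actually reacts to an overlap (for example, uploading through a temporary buffer) is outside this sketch.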
* i965: Add support for GL_ARB_texture_buffer_range. | Eric Anholt | 2013-10-23 | 1 | -4/+11
  Supporting this extension turns out to simplify our code a bit over not supporting this extension, once the glBufferSubData() synchronization code lands.
  v2: Use 16 byte alignment like we do for uniform buffers, due to unaligned access penalties.
  Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
* i965: Fix texture buffer rendering after a whole buffer replacement. | Eric Anholt | 2013-10-23 | 1 | -0/+2
  If glBufferData(), glBufferSubData(0, obj->Size), or similar happens, we get a new drm_intel_bo for the buffer object, and thus need to re-upload texture buffer state so we point at the new data.
  Fixes the new piglit GL_ARB_texture_buffer_object/data-sync
  Cc: "9.2" <mesa-stable@lists.freedesktop.org>
  Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965: Make a brw_stage_prog_data for storing the SURF_INDEX information. | Eric Anholt | 2013-10-15 | 1 | -34/+31
  It would be nice to be able to pack our binding table so that programs that use 1 render target don't upload an extra BRW_MAX_DRAW_BUFFERS - 1 binding table entries. To do that, we need the compiled program to have information on where its surfaces go.
  v2: Rename size to size_bytes to be more explicit.
  Reviewed-by: Paul Berry <stereotype441@gmail.com>
* i965: Emit a second set of SURFACE_STATE for gather4 from textures. | Chris Forbes | 2013-10-03 | 1 | -6/+32
  This allows us to use a different surface format for gather4, which is required for R32G32_FLOAT to work on Gen7.
  V4:
  - Only emit alternate surface state for shaders which will actually use it.
  - Pass a simple 'for_gather' flag rather than a function pointer. The callee can decide what w/a to apply.
  Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Totally switch around how we handle nonzero baselevel-first_level. | Eric Anholt | 2013-09-30 | 1 | -2/+3
  This has no effect currently, because intel_finalize_mipmap_tree() always makes mt->first_level == tObj->BaseLevel.
  The change I made before to handle it (b1080cfbdb0a084122fcd662cd27b4748c5598fd) got very close to working, but after fixing some unrelated bugs in the series, it still left tex-miplevel-selection producing errors when testing textureLod(). The problem is that for explicit LODs, the sampler's LOD clamping is ignored, and only the surface's MIP clamping is respected. So we need to use surface mip clamping, which applies on top of the sampler's mip clamping, so the sampler change gets backed out.
  Now actually tested with a non-regressing series producing a non-zero computed baselevel.
  Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
* i965: Always look up from the object's mt when setting up texturing state. | Eric Anholt | 2013-09-30 | 1 | -3/+1
  We know that the object's mt is equal to the firstimage's mt because it's gone through intel_finalize_mipmap_tree(). Saves a lookup of firstimage on pre-gen7.
  v2: Merge in the warning fix that appeared later in the series (noted by Chad)
  Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
* i965: Refactor Gen4-6 SURFACE_STATE setup for buffer surfaces. | Kenneth Graunke | 2013-09-19 | 1 | -61/+39
  This was an embarrassingly large amount of copy and pasted code, and it wasn't particularly simple code either. By factoring it out into a helper function, we consolidate the complexity.
  v2: Properly NULL-check bo. Caught by Eric Anholt.
  v3: Do the subtraction by 1 in gen7_emit_buffer_surface_state, rather than making callers do it. This makes the buffer_size parameter the actual size of the buffer. Suggested by Paul Berry.
  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Fix off by one errors in texture buffer size calculations. | Kenneth Graunke | 2013-09-19 | 1 | -1/+1
  The value that's split into width/height/depth needs to be the size of the buffer minus one. This makes it consistent with the constant buffer and shader time SURFACE_STATE setup code.
  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Eric Anholt <eric@anholt.net>
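The "size minus one" rule in the message above can be illustrated with a small sketch. The field widths used here (7 bits of width, 13 bits of height, 7 bits of depth) are an assumption made for illustration, and `demo_buffer_dims` and `demo_split_buffer_size` are hypothetical names, not the driver's helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the three fields a buffer size is split into. */
struct demo_buffer_dims {
   uint32_t width;
   uint32_t height;
   uint32_t depth;
};

/* Split the index of the LAST entry (num_entries - 1), not the entry count,
 * across the width/height/depth fields.
 * Assumed layout for this sketch: width = bits 6:0, height = bits 19:7,
 * depth = bits 26:20 of (num_entries - 1). */
static struct demo_buffer_dims demo_split_buffer_size(uint32_t num_entries)
{
   assert(num_entries > 0);

   uint32_t last = num_entries - 1;
   struct demo_buffer_dims dims;

   dims.width = last & 0x7f;
   dims.height = (last >> 7) & 0x1fff;
   dims.depth = (last >> 20) & 0x7f;
   return dims;
}
```

Passing the raw count instead of `num_entries - 1` would describe a buffer one entry larger than it really is, which is the kind of off-by-one the commit addresses.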
* i965: Move binding table code to a new file, brw_binding_tables.c. | Kenneth Graunke | 2013-09-19 | 1 | -25/+0
  The code to upload the binding tables for each stage was scattered across brw_{vs,gs,wm}_surface_state.c and brw_misc_state.c, which also contain a lot of code to populate individual SURFACE_STATE structures.
  This patch brings all the binding table upload code together, and splits it out from the code which fills in SURFACE_STATE entries.
  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Use brw_upload_binding_table() for the pixel shader as well. | Kenneth Graunke | 2013-09-19 | 1 | -18/+5
  This is not quite the same: brw_upload_binding_table() also has code to early-return if there are no entries, while the existing code did not. The PS binding table is unlikely to be empty since it will have at least one color buffer. If it ever is empty, early returning seems wise.
  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Remove MIPLAYOUT_BELOW from Gen4-6 constant buffer surface state. | Kenneth Graunke | 2013-09-17 | 1 | -1/+0
  Specifying a miptree layout makes no sense for constant buffers. This has no functional change since BRW_SURFACE_MIPMAPLAYOUT_BELOW is just a #define for 0.
  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Paul Berry <stereotype441@gmail.com>
* i965: Use brw_stage_state for WM data as well. | Kenneth Graunke | 2013-09-13 | 1 | -20/+20
  This gets the VS, GS, and PS all using the same data structure.
  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Eric Anholt <eric@anholt.net>
  Reviewed-by: Paul Berry <stereotype441@gmail.com>
* i965/gs: add geometry shader support to brw_texture_surfaces. | Paul Berry | 2013-08-31 | 1 | -0/+7
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/gs: generalize brw_texture_surfaces in preparation for gs. | Paul Berry | 2013-08-31 | 1 | -31/+33
  There is a slight functionality change. Previously we would compute a common value for num_samplers for all stages, and populate that many entries in each stage's surf_offset table regardless of how many samplers each stage used. Now we only populate the number of entries in the surf_offset table corresponding to the number of samplers actually used by the stage.
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Modify signature to update_texture_surface functions. | Paul Berry | 2013-08-31 | 1 | -11/+9
  Previously these functions would accept a pointer to the binding table and an index indicating which entry in the binding table should be updated. Now they merely take a pointer to the binding table entry to be updated.
  This will make it easier to generalize brw_texture_surfaces to support geometry shaders.
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Move data from brw->vs into a base class if gs will also need it. | Paul Berry | 2013-08-31 | 1 | -2/+2
  This paves the way for sharing the code that will set up the vertex and geometry shader pipeline state.
  v2: Rename the base class to brw_stage_state.
  Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
* i965/gs: Update defines related to GS surface organization. | Paul Berry | 2013-08-31 | 1 | -2/+2
  Defines that previously referred to VS now refer to VEC4, since they will be shared by the user-programmable vertex shader and geometry shader stages.
  Defines that previously referred to the Gen6 geometry shader stage (which is only used for transform feedback) are now renamed to explicitly refer to Gen6, to avoid confusion with the Gen7 user-programmable geometry shader stage.
  Based on work by Eric Anholt <eric@anholt.net>.
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
* i965: Switch gen4-6 to using the sampler's base level for GL BASE_LEVEL. | Eric Anholt | 2013-08-30 | 1 | -13/+3
  Thanks to Ken for trawling through my neglected public branches and finding the bug in this change (inside a megacommit) that made me abandon this work.
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Make the FS binding table as small as possible. | Kenneth Graunke | 2013-08-19 | 1 | -6/+5
  Computing the minimum size was easy, and done at compile-time for no extra overhead here. Making the binding table smaller wastes less batch space.
  Adding the CACHE_NEW_WM_PROG dirty bit isn't strictly necessary, since other atoms depend on it and flag BRW_NEW_SURFACES. However, it's best to add it for clarity and safety. It shouldn't add any new overhead.
  v2: Use binding_table_size, rather than max_surface_index.
  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Use SURF_INDEX_DRAW() for drawbuffer binding table indices. | Kenneth Graunke | 2013-08-19 | 1 | -6/+6
  SURF_INDEX_DRAW() has been the identity function since the dawn of time, and both the shader code and binding table upload code relied on that, simply using X rather than SURF_INDEX_DRAW(X). Even if that continues to be true, using the macro clarifies the code.
  The comment about draw buffers needing to be first in order for headerless render target writes to work turned out to be wrong; with this change, SURF_INDEX_DRAW can be changed to arbitrary indices and everything continues working.
  The confusion was over the "Render Target Index" field in the FB write message header. If it were a binding table index, then RT 0 would have to be at index 0 for headerless FB writes to work. However, it's actually an index into the blend state table, so there's no problem.
  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Cc: Paul Berry <stereotype441@gmail.com>
* i965: Cite the 965 PRM for "the data cache is the sampler cache". | Kenneth Graunke | 2013-07-15 | 1 | -3/+3
  Presumably, this comment exists to justify the usage of I915_GEM_DOMAIN_SAMPLER for this relocation. At one point, this was necessary to ensure that the right flushing was done to keep caches coherent. These days, the kernel just flushes everything, so I don't think it matters. Still, the comment is interesting, so leave it in place.
  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Delete intel_context entirely. | Kenneth Graunke | 2013-07-09 | 1 | -7/+7
  This makes brw_context inherit directly from gl_context; that was the only thing left in intel_context.
  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Acked-by: Chris Forbes <chrisf@ijw.co.nz>
  Acked-by: Paul Berry <stereotype441@gmail.com>
  Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965: Move intel_context::gen and gt fields to brw_context. | Kenneth Graunke | 2013-07-09 | 1 | -14/+7
  Most functions no longer use intel_context, so this patch additionally removes the local "intel" variables to avoid compiler warnings.
  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Acked-by: Chris Forbes <chrisf@ijw.co.nz>
  Acked-by: Paul Berry <stereotype441@gmail.com>
  Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965: Move intel_context::batch to brw_context. | Kenneth Graunke | 2013-07-09 | 1 | -6/+6
  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Acked-by: Chris Forbes <chrisf@ijw.co.nz>
  Acked-by: Paul Berry <stereotype441@gmail.com>
  Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965: Move intel_context::bufmgr to brw_context. | Kenneth Graunke | 2013-07-09 | 1 | -1/+1
  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Acked-by: Chris Forbes <chrisf@ijw.co.nz>
  Acked-by: Paul Berry <stereotype441@gmail.com>
  Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965: Move intel_context::vtbl to brw_context. | Kenneth Graunke | 2013-07-09 | 1 | -25/+21
  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Acked-by: Chris Forbes <chrisf@ijw.co.nz>
  Acked-by: Paul Berry <stereotype441@gmail.com>
  Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965: Pass brw_context to functions rather than intel_context. | Kenneth Graunke | 2013-07-09 | 1 | -7/+5
  This makes brw_context available in every function that used intel_context. This makes it possible to start migrating fields from intel_context to brw_context.
  Surprisingly, this actually removes some code, as functions that use OUT_BATCH don't need to declare "intel"; they just use "brw."
  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Acked-by: Chris Forbes <chrisf@ijw.co.nz>
  Acked-by: Paul Berry <stereotype441@gmail.com>
  Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965: Split surface format code into a new file (brw_surface_formats.c). | Kenneth Graunke | 2013-06-28 | 1 | -702/+0
  brw_wm_surface_state.c has gotten rather large and unwieldy. At this point, it consists of two separate portions:
  1. Surface format code
     This includes the giant table of surface formats and what features they support on each generation, as well as the code to translate between Mesa formats and hardware formats. This is used across all generations.
  2. Binding table (SURFACE_STATE) related code.
     This is the code to generate SURFACE_STATE entries for renderbuffers, textures, transform feedback buffers, constant buffers, and so on, as well as the code to assemble them into binding tables. This is only used on Gen4-6; gen7_surface_state.c has Gen7+ code.
  Since the two are logically separate, and one is reused on every generation while the other is not, it makes a lot of sense to split them out. It should also make finding code easier.
  No code is changed by this patch. I simply copied the file then deleted portions of both.
  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Stop recomputing the miptree's size from the texture image. | Eric Anholt | 2013-06-26 | 1 | -6/+3
  We've already computed what the dimensions of the miptree are, and stored it in the miptree.
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Drop unused argument to translate_tex_format(). | Eric Anholt | 2013-06-26 | 1 | -2/+0
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/gen7+: Create an enum for keeping track of fast color clear state. | Paul Berry | 2013-06-12 | 1 | -0/+2
  This patch includes code to update the fast color clear state appropriately when rendering occurs. The state will also need to be updated when a fast clear or a resolve operation is performed; those state updates will be added when the fast clear and resolve operations are added.
  v2: Create a new function, intel_miptree_used_for_rendering() to handle updating the fast color clear state when rendering occurs.
  Reviewed-by: Eric Anholt <eric@anholt.net>
* intel: add layered parameter to update_renderbuffer_surface | Jordan Justen | 2013-06-02 | 1 | -1/+5
  Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
  Reviewed-by: Paul Berry <stereotype441@gmail.com>
  Reviewed-by: Eric Anholt <eric@anholt.net>
* intel: Rename intel_renderbuffer_tile_offsets. | Eric Anholt | 2013-05-28 | 1 | -2/+2
  This makes it more consistent with intel_miptree_get_tile_offsets().
  Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
  Acked-by: Paul Berry <stereotype441@gmail.com>
* intel: Make intel_miptree_get_tile_offsets return a page offset. | Eric Anholt | 2013-05-28 | 1 | -3/+3
  Right now, the callers in i965 don't expect a nonzero page offset to actually occur (since that's being handled elsewhere), but it seems like a trap to leave it this way.
  Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
  Acked-by: Paul Berry <stereotype441@gmail.com>
* mesa: Track the TexImage being rendered to in the gl_renderbuffer. | Eric Anholt | 2013-05-17 | 1 | -1/+1
  We keep having to pass the attachments around with our gl_renderbuffers because that's the only way to find what the gl_renderbuffer actually refers to. This is a step toward removing that (though drivers still need the Zoffset as well).
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Fill in brw_format_for_mesa_format for some non-rendering formats. | Eric Anholt | 2013-05-15 | 1 | -18/+18
  This should have no change on driver operation, but it means that when you wonder why some format isn't supported natively, you can just look at the table above, instead of wondering if maybe there's an appropriate entry in the surface formats table that is already supported.
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Use native RGB_FLOAT16 support when available. | Eric Anholt | 2013-05-15 | 1 | -1/+1
  Previously we would expand it to RGBA_FLOAT16. This format now comes out as framebuffer incomplete, but it seems worth the memory savings if that's what people are asking for (and GL3 does list it under "texture-only" color formats)
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Use the Mesa surface formats for float RGB surfaces. | Eric Anholt | 2013-05-15 | 1 | -2/+2
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Use the new XRGB UNORM formats. | Eric Anholt | 2013-05-15 | 1 | -3/+3
  This is a step on the way to removing some of our code for forcing alpha to 1, but I want easy bisecting so I'll add groups of formats separately.
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* mesa: add & use a new driver flag for UBO updates instead of _NEW_BUFFER_OBJECT | Marek Olšák | 2013-05-11 | 1 | -3/+2
  v2: move the flagging from intel_bufferobj_data to intel_bufferobj_alloc_buffer
  Reviewed-by: Brian Paul <brianp@vmware.com>
  Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Sync brw_format_for_mesa_format() table with new Mesa formats. | Eric Anholt | 2013-05-08 | 1 | -1/+31
  I'm not filling them all in, to prevent any breakage in this commit.
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Update the surface formats table from the current specs. | Eric Anholt | 2013-05-08 | 1 | -0/+65
  Unfortunately the surface formats table is now splattered across multiple chapters. All surface format enums from brw_defines.h are present, but only support for them that is mentioned in the public specs is included here.
  v2 (from Ken): Mark R32G32B32A32_SFIXED as unsupported on Ivybridge.
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* mesa: Make a Mesa core function for sRGB render encoding handling. | Eric Anholt | 2013-04-30 | 1 | -20/+7
  v2: const-qualify ctx, and add a comment about the function (recommended by Brian and Kenneth).
  Reviewed-by: Brian Paul <brianp@vmware.com>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
* i965: Disable Z16 on contexts that don't require it. | Eric Anholt | 2013-04-29 | 1 | -1/+14
  It appears that Z16 on Intel hardware is in fact slower than Z24, so people are getting surprisingly hurt when trying to use Z16 as a performance-versus-precision tradeoff, or when they're targeting GLES2 and that's all you get.
  GL 3.0+ have Z16 on the list of required exact format sizes, but GLES doesn't, so choose the better-performing layout in that case.
  Improves GLB 2.7 trex performance at 1920x1080 by 10.7% +/- 1.1% (n=3) on my IVB system.
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Make the fragment shader pull constants index by dwords, not vec4s. | Eric Anholt | 2013-04-01 | 1 | -5/+8
  We want to load vec4s, since loading a vec4 instead of a dword is basically no increased latency. But for variable indexed access, the previous requirement of aligned vec4s for a sampler LD was hard to implement.
  Note that this change only affects those messages that use the surface format, like sampler LDs, but not to the untyped data cache loads we've used in other cases.
  No significant performance difference on my GLSL demo with uniforms forced to take the varying pull constants path (n=4).
  NOTE: This is a candidate for the 9.1 branch.
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Make the constant surface interface take a normal byte size. | Eric Anholt | 2013-04-01 | 1 | -9/+7
  This puts the rounding-up logic into the function itself instead of all the callers having to manage it.
  Also drop an "unused" comment in gen4, as the stride *is* used for texbos (and will be for uniforms soon).
  NOTE: This is a candidate for the 9.1 branch.
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
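The message above describes moving the rounding-up logic out of the callers and into the function that receives a plain byte size. A minimal sketch of that interface shape follows. `demo_constant_surface_entries` and its `stride_bytes` parameter are hypothetical; the sketch assumes only that the surface is described in whole entries of a fixed stride, such as 16 bytes for a vec4 or 4 bytes for a dword.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: callers pass the real size in bytes and the entry
 * stride; the function rounds up to a whole number of entries itself,
 * so no caller has to pre-round the size. */
static uint32_t demo_constant_surface_entries(uint32_t size_bytes,
                                              uint32_t stride_bytes)
{
   assert(stride_bytes > 0);

   /* Round up: a partially filled trailing entry still counts as one. */
   return (size_bytes + stride_bytes - 1) / stride_bytes;
}
```

For example, a 17-byte constant buffer with a 16-byte stride needs two entries, while the same buffer with a 4-byte stride needs five. Keeping the rounding in one place means a change of stride only has to be made once.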
* i965: Don't use texture swizzling to force alpha to 1.0 if unnecessary. | Kenneth Graunke | 2013-03-20 | 1 | -1/+2
  Commit 33599433c7 began setting the texture swizzle mode to XYZ1 for RED, RG, and RGB textures in order to force alpha to 1.0 in case we actually stored the texture as RGBA.
  This had an unforeseen performance implication: the shader precompile assumes that the texture swizzle mode will be XYZW for non-shadow sampler types. By setting it to XYZ1, this means every shader used with a RED, RG, or RGB texture has to be recompiled. This is a very common case.
  Unfortunately, there's no way to improve the precompile, since RGBA textures still need XYZW, and there's no way to know by looking at the shader source what texture formats might be used.
  However, we only need to smash alpha to 1.0 if the texture's memory format actually has alpha bits. If not, the sampler already returns 1.0 for us without any special swizzling. XRGB8888, for example, is a very common case where this occurs.
  This partially fixes a performance regression since commit 33599433c7. More work is required to fully fix it in all cases. This at least helps Warsow.
  NOTE: This is a candidate for the 9.1 branch.
  Reviewed-by: Carl Worth <cworth@cworth.org>
  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Avoid unnecessary copy when depthstencil workaround invoked by clear. | Paul Berry | 2013-03-19 | 1 | -1/+1
  Since apps typically begin rendering with a call to glClear(), it is likely that when brw_workaround_depthstencil_alignment() moves a miplevel to a temporary buffer, it can avoid doing a blit, since the contents of the miplevel are about to be erased.
  This patch adds the necessary plumbing to determine when brw_workaround_depthstencil_alignment() is being called as a consequence of glClear(), and avoids the unnecessary blit when it is safe to do so.
  Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
  v2: Eliminate unnecessary call to _mesa_is_depthstencil_format(). Fix handling of depth buffer in depth/stencil format.
  v3: Use correct bitfields for clear_mask. Fix handling of depth buffer in depth/stencil format when hardware uses separate stencil. When invalidating, make sure we still reassociate the image to the new miptree.
  Reviewed-by: Eric Anholt <eric@anholt.net>