summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/gen6_clip_state.c
Commit message (Collapse)AuthorAgeFilesLines
* i965: Eliminate brw->wm.prog_data pointer.Kenneth Graunke2016-10-051-1/+1
| | | | | | | | | | | | Just say no to: - brw->wm.base.prog_data = &brw->wm.prog_data->base.base; We'll just use the brw_stage_prog_data pointer in brw_stage_state and downcast it to brw_wm_prog_data as needed. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arcero@collabora.com>
* i965: Eliminate brw->gs.prog_data pointer.Kenneth Graunke2016-10-051-4/+6
| | | | | | | | | | | | Just say no to: - brw->gs.base.prog_data = &brw->gs.prog_data->base.base; We'll just use the brw_stage_prog_data pointer in brw_stage_state and downcast it to brw_gs_prog_data as needed. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arcero@collabora.com>
* i965: Eliminate brw->tes.prog_data pointer.Kenneth Graunke2016-10-051-4/+4
| | | | | | | | | | | | Just say no to: - brw->tes.base.prog_data = &brw->tes.prog_data->base.base; We'll just use the brw_stage_prog_data pointer in brw_stage_state and downcast it to brw_tes_prog_data as needed. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arcero@collabora.com>
* i965: Eliminate brw->vs.prog_data pointer.Kenneth Graunke2016-10-051-1/+1
| | | | | | | | | | | | Just say no to: - brw->vs.base.prog_data = &brw->vs.prog_data->base.base; We'll just use the brw_stage_prog_data pointer in brw_stage_state and downcast it to brw_vs_prog_data as needed. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timothy Arceri <timothy.arcero@collabora.com>
* i965: Add missing BRW_NEW_VS_PROG_DATA to 3DSTATE_CLIP.Kenneth Graunke2016-10-041-0/+2
| | | | | | Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965: Only emit 1 viewport when possible.Kenneth Graunke2016-10-031-9/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | In core profile, we support up to 16 viewports. However, in the majority of cases, only 1 of them is actually used - we only need the others if the last shader stage prior to the rasterizer writes gl_ViewportIndex. Processing all 16 viewports adds additional CPU overhead, which hurts CPU-intensive workloads such as Glamor. This meant that switching to core profile actually penalized Glamor to an extent, which is unfortunate. This patch tracks the number of relevant viewports, switching between 1 and ctx->Const.MaxViewports if gl_ViewportIndex is written. A new BRW_NEW_VIEWPORT_COUNT flag tracks this. This could mean re-emitting viewport state when switching, but hopefully this is offset by doing 1/16th of the work in the common case. The new flag is also lighter weight than BRW_NEW_VUE_MAP_GEOM_OUT, which we were using in one case. According to Eric Anholt, x11perf -copypixwin10 performance improves by 11.5094% +/- 3.10841% (n=10) on his Skylake. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965: Merge gen7_clip_state atom into gen6_clip_state atom.Kenneth Graunke2016-08-311-18/+0
| | | | | | | | | | | | | | | | | The original motivation was that gen6_clip_state ignored _NEW_POLYGON as it didn't care about early culling. The only other change was that Gen6 ignored BRW_NEW_TES_PROG_DATA as it doesn't have tessellation shaders, but listening to this is harmless as it'll never be signalled. Now that we've added _NEW_POLYGON for is_drawing_lines/points, we can merge the two as the distinction is meaningless. This actually fixes a bug, though: Gen8+ was using the gen6_clip_state atom because it doesn't care about early culling, but it also needs BRW_NEW_TES_PROG_DATA, which was missing. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965: Use gs_prog_data in is_drawing_points/lines().Kenneth Graunke2016-08-311-8/+8
| | | | | | | | State upload code should use prog_data rather than poking at core Mesa shader data structures wherever possible. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965: Fix missing dirty bits related to is_drawing_points/lines.Kenneth Graunke2016-08-311-0/+5
| | | | | | | | | | calculate_attr_overrides() uses is_drawing_points(), which depends on tessellation and geometry program state, as well as polygon state. v2: Add missing _NEW_POLYGON as well. Caught by Iago Toral. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/state: Move is_drawing_lines/points to gen6_clip_state.cJason Ekstrand2016-08-191-1/+53
| | | | | Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Rename brw_wm_barycentric_interp_mode to brw_barycentric_mode.Kenneth Graunke2016-07-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | brw_wm_barycentric_interp_mode is wordy, brw_barycentric_mode is less typing and suffers from fewer line wrapping problems. The enum values themselves don't really benefit from "WM" in the name, either. Put "BARYCENTRIC" first instead of at the end and drop "WM". Generated by: for file in *.c *.cpp *.h; do sed -i \ -e 's/brw_wm_barycentric_interp_mode/brw_barycentric_mode/g' \ -e 's/BRW_WM_\([A-Z_]*\)_BARYCENTRIC/BRW_BARYCENTRIC_\1/g' \ -e 's/BRW_WM_BARYCENTRIC_INTERP_MODE_COUNT/BRW_BARYCENTRIC_MODE_COUNT/g' \ $file; done with a few whitespace changes. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965: Drop perf_debug about rasterizer discard in SOL vs. clipper.Kenneth Graunke2016-06-161-3/+5
| | | | | | | | | | | I recently experimented with performing rasterizer discard in the SOL unit instead of the clipper, and as far as I can tell, it's basically the same performance. The clipper comes directly after SOL anyway, and setting the clipper to REJECT_ALL should be pretty darn cheap. Keep the perf_debug on Sandybridge, where the GS actually does work. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Add support for GL_ARB_cull_distanceKristian Høgsberg Kristensen2016-05-131-0/+2
| | | | | Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Clamp "Maximum VP Index" to 1 when gl_ViewportIndex isn't written.Kenneth Graunke2016-05-091-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fs_visitor::emit_urb_writes skips writing the VUE header for shaders that don't write gl_PointSize, gl_Layer, or gl_ViewportIndex. This leaves their values uninitialized. Kristian's nearby comment says: "But often none of the special varyings that live there are written and in that case we can skip writing to the vue header, provided the corresponding state properly clamps the values further down the pipeline." However, we were clamping gl_ViewportIndex to [0, 15], so we would end up using a random viewport. To fix this, detect when the shader doesn't write gl_ViewportIndex, and clamp it to [0, 0]. The vec4 backend always writes zeros to the VUE header, so it doesn't suffer from this problem. With vec4-style HWord writes, we can write the header and position together in a single message. In the FS world, we would need 4 extra MOVs of 0 and a longer message, or a separate OWord write. It's likely cheaper to just clamp the value. Fixes DiRT Showdown and Bioshock Infinite, which only rendered half of the screen - the lower left of two triangles. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93054 Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Make all atoms to track BRW_NEW_BLORP by defaultKenneth Graunke2016-04-231-2/+4
| | | | Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com
* i965: Stop XY clipping point and line primitives.Kenneth Graunke2016-03-181-1/+7
| | | | | | | | | | | | | | | | | | | | | | | Wide points and lines are not supposed to be clipped by the viewport. Rather, they should be rendered, and any fragments outside of the viewport should be discarded. The traditional use case for this behavior is rendering moving wide point particles. When the center of the point approaches the viewport edge, clipping would make it pop out of view early. Fixes: - dEQP-GLES2.functional.clipping.point.wide_point_clip - dEQP-GLES3.functional.clipping.point.wide_point_clip - dEQP-GLES3.functional.clipping.point.wide_point_clip_viewport_center - dEQP-GLES3.functional.clipping.point.wide_point_clip_viewport_corner - dEQP-GLES3.functional.clipping.line.wide_line_clip_viewport_center - dEQP-GLES3.functional.clipping.line.wide_line_clip_viewport_corner Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94453 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94454 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965: Use _mesa_geometric_ functions appropriatelyKevin Rogovin2015-06-171-3/+7
| | | | | | | | | | | | | | Change references to gl_framebuffer::Width, Height, MaxNumLayers and Visual::samples to use the _mesa_geometry_ convenience functions for those places where the geometry of the gl_framebuffer is needed (in contrast to the geometry of the intersection of the attachments of the gl_framebuffer). This patch is to pave the way to enable GL_ARB_framebuffer_no_attachments on Gen7 and higher in i965. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
* i965: Implement support for ARB_clip_control.Mathias Fröhlich2015-04-051-2/+5
| | | | | | | | | | | | | Switch between the two clip space definitions already available in hardware. Update winding order dependent state according to the clip control state. This change did not introduce new piglit quick.test regressions on an Ivybridge Mobile and a GM45 Express chipset. Also it enables and passes the clip-control and clip-control-depth-precision tests on these two chipsets. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
* i965: Move BRW_NEW_*_PROG_DATA flags to .brw (not .cache).Kenneth Graunke2014-12-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | I put the BRW_NEW_*_PROG_DATA flags at the beginning so that brw_state_cache.c can still continue using 1 << brw_cache_id. I also added a comment explaining the difference between BRW_NEW_*_PROG_DATA and BRW_NEW_*_PROGRAM, as it took me a long time to remember it. Non-mechanical changes: - brw_state_cache.c and brw_ff_gs.c now signal .brw, not .cache. - brw_state_upload.c - INTEL_DEBUG=state changes. - brw_context.h - bit definition merging. v2: Correct the explanation of BRW_NEW_*_PROG_DATA to mention state-based recompiles, and nix the "proper subset" claim, as it's false. (Caught by Kristian Høgsberg). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Rename CACHE_NEW_*_PROG to BRW_NEW_*_PROG_DATA.Kenneth Graunke2014-12-021-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we've moved a bunch of CACHE_NEW_* bits to BRW_NEW_*, the only ones that are left are legitimately related to the program cache. Yet, it seems a bit wasteful to have an entire bitfield for only 7 bits. State upload is one of the hottest paths in the driver. For each atom in the list, we call check_state() to see if it needs to be emitted. Currently, this involves comparing three separate bitfields (mesa, brw, and cache). Consolidating the brw and cache bitfields would save a small amount of CPU overhead per atom. Broadwell, for example, has 57 state atoms, so this small savings can add up. CACHE_NEW_*_PROG covers the brw_*_prog_data structures, as well as the offset into the program cache BO (prog_offset). Since most uses refer to brw_*_prog_data, I decided to use BRW_NEW_*_PROG_DATA as the name. Removing "cache" completely is a bit painful, so I decided to do it in several patches for easier review, and to separate mechanical changes from manual ones. This one simply renames things, and was made via: $ for file in *.[ch]; do sed -i -e 's/CACHE_NEW_\([A-Z_\*]*\)_PROG/BRW_NEW_\1_PROG_DATA/g' \ -e 's/BRW_NEW_WM_PROG_DATA/BRW_NEW_FS_PROG_DATA/g' $file done Note that BRW_NEW_*_PROG_DATA is still in .cache, not .brw! The next patch will remedy this flaw. It will also fix the alphabetization issues. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Acked-by: Matt Turner <mattst88@gmail.com>
* i965: Alphabetize brw_tracked_state flags and use a consistent style.Kenneth Graunke2014-11-291-2/+7
| | | | | | | | | | | | | | | | Most of the dirty flags were listed in some arbitrary order. Some used bonus parenthesis. Some put multiple flags on one line, others put one per line. Some used tabs instead of spaces...but only on some lines. This patch settles on one flag per line, in alphabetical order, using spaces instead of tabs, and sheds the unnecessary parentheses. Sorting was mostly done with vim's visual block feature and !sort, although I alphabetized short lists by hand; it was pretty manual. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Disable guardband clipping in the smaller-than-viewport case.Kenneth Graunke2014-09-101-0/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | Apparently guardband clipping doesn't work like we thought: objects entirely outside fthe guardband are trivially rejected, regardless of their relation to the viewport. Normally, the guardband is larger than the viewport, so this is not a problem. However, when the viewport is larger than the guardband, this means that we would discard primitives which were wholly outside of the guardband, but still visible. We always program the guardband to 8K x 8K to enforce the restriction that the screenspace bounding box of a single triangle must be no more than 8K x 8K. So, if the viewport is larger than that, we need to disable guardband clipping. Fixes ES3 conformance tests: - framebuffer_blit_functionality_negative_height_blit - framebuffer_blit_functionality_negative_width_blit - framebuffer_blit_functionality_negative_dimensions_blit - framebuffer_blit_functionality_magnifying_blit - framebuffer_blit_functionality_multisampled_to_singlesampled_blit v2: Mention the acronym expansion for TA/TR/MC in the comments. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965: Disable clipping when rendering 3DPRIM_RECTLIST primitivesKristian Høgsberg2014-08-151-1/+7
| | | | | | | | The clipper doesn't support clipping 3DPRIM_RECTLIST primitives and must be turned off when we use them. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Acked-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/guardband: Enable for all viewport dimensions (GEN8+)Ben Widawsky2014-08-101-10/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The goal of guardband clipping is to try to avoid 3d clipping because it is an expensive operation. When guardband clipping is disabled, all geometry that intersects the viewport is sent to the FF 3d clipper. Objects which are entirely enclosed within the viewport are said to be "trivially accepted" while those entirely outside of the viewport are, "trivially rejected". When guardband clipping is turned on the above behavior is changed such that if the geometry is within the guardband, and intersects the viewport, it skips the 3d clipper. Prior to GEN8, this was problematic if the viewport was smaller than the screen as it could allow for rendering to occur outside of the viewport. That could be mitigated if the programmer specified a scissor region which was less than or equal to the viewport - but this is not required for correctness in OpenGL. In theory you could be clever with the guardband so as not to invoke this problem. We do not do this, and have no data that suggests we should bother (nor the converse data). With viewport extents in place on GEN8, it should be safe to turn on guardband clipping for all cases While here, add a comment to the code which confused me thoroughly. v2: Update grammar in commit message. Reword comments based on Ken's suggestion. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Use unreachable() instead of unconditional assert().Matt Turner2014-07-011-2/+1
| | | | Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965: Add missing newlines to a few perf_debug messages.Kenneth Graunke2014-06-151-2/+2
| | | | | | Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: "10.2" <mesa-stable@lists.freedesktop.org>
* i965: Update 3DSTATE_CLIP for Broadwell.Kenneth Graunke2014-01-311-2/+7
| | | | | | | | | | | Broadwell's winding order, polygon fill, and viewport Z test fields have moved to DWord 1 of 3DSTATE_RASTER. v2: Add a perf_debug for a future optimization and improve commit message (both suggested by Eric Anholt). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Consider all viewports before enabling guardband clippingIan Romanick2014-01-201-5/+9
| | | | | Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Set the maximum VPIndexIan Romanick2014-01-201-1/+2
| | | | | | | | | | | | | At various stages the hardware clamps the gl_ViewportIndex to these values. Setting them to zero effectively makes gl_ViewportIndex be ignored. This is acutally useful in blorp (so that we don't have to modify all of the viewport / scissor state). v2: Use INTEL_MASK to create GEN6_CLIP_MAX_VP_INDEX_MASK. Suggested by Ken. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* mesa: Convert gl_context::Viewport to gl_context::ViewportArrayCourtney Goeltzenleuchter2014-01-201-4/+4
| | | | | | | | | | | Only element 0 of the array is used anywhere at this time, so there should be no changes. v4: Split out from a single megapatch. Suggested by Ken. Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com> Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* mesa: Converty gl_viewport_attrib::X, ::Y, ::Width, and ::Height to floatCourtney Goeltzenleuchter2014-01-201-2/+2
| | | | | | | | | | v4: Split out from a single megapatch. Suggested by Ken. Also make meta's save_state::ViewportX, ::ViewportY, ::ViewportW, and ::ViewportH to match gl_viewport_attrib. Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com> Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Fix clears of layered framebuffers with mismatched layer counts.Paul Berry2014-01-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, Mesa enforced the following rule (from ARB_geometry_shader4's list of criteria for framebuffer completeness): * If any framebuffer attachment is layered, all attachments must have the same layer count. For three-dimensional textures, the layer count is the depth of the attached volume. For cube map textures, the layer count is always six. For one- and two-dimensional array textures, the layer count is simply the number of layers in the array texture. { FRAMEBUFFER_INCOMPLETE_LAYER_COUNT_ARB } However, when ARB_geometry_shader4 was adopted into GL 3.2, this rule was dropped; GL 3.2 permits different attachments to have different layer counts. This patch brings Mesa in line with GL 3.2. In order to ensure that layered clears properly clear all layers, we now have to keep track of the maximum number of layers in a layered framebuffer. Fixes the following piglit tests in spec/!OpenGL 3.2/layered-rendering: - clear-color-all-types 1d_array mipmapped - clear-color-all-types 1d_array single_level - clear-color-mismatched-layer-count - framebuffer-layer-count-mismatch Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
* mesa: Track number of layers in layered framebuffers.Paul Berry2013-11-211-1/+1
| | | | | | | | | | | | | | | | | | | | In order to properly clear layered framebuffers, we need to know how many layers they have. The easiest way to do this is to record it in the gl_framebuffer struct when we check framebuffer completeness. This patch replaces the gl_framebuffer::Layered boolean with a gl_framebuffer::NumLayers integer, which is 0 if the framebuffer is not layered, and equal to the number of layers otherwise. v2: Remove gl_framebuffer::Layered and make gl_framebuffer::NumLayers always have a defined value. Fix factor of 6 error in the number of layers in a cube map array. Cc: "10.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Combine gen6_clip_state.c and gen7_clip_state.c.Kenneth Graunke2013-11-041-2/+42
| | | | | | | | | | The changes between Gen6-7 are minimal, and can easily be solved with an extra generation check. This cuts a lot of duplicated code. It also helps prevent even more duplication for Broadwell. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Delete intel_context entirely.Kenneth Graunke2013-07-091-2/+1
| | | | | | | | | | This makes brw_context inherit directly from gl_context; that was the only thing left in intel_context. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Chris Forbes <chrisf@ijw.co.nz> Acked-by: Paul Berry <stereotype441@gmail.com> Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965: Handle rasterizer discard in the clipper rather than GS on Gen6.Kenneth Graunke2013-05-201-1/+10
| | | | | | | | | | | | | | | | This has more of a negative impact than the previous patch, as on Gen6 passing primitives through to the clipper means we actually have to make the GS thread write them to the URB. I don't see another good solution though, and rasterizer discard is not the most common of cases, so hopefully it won't be too terrible. v2: Add a perf_debug; resolve rebase conflicts on the brw dirty flags; remove the rasterizer_discard field from brw_gs_prog_key. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> [v1] Reviewed-by: Paul Berry <stereotype441@gmail.com>
* i965: Disable clipper statistics when meta operations are in progress.Kenneth Graunke2013-05-201-2/+3
| | | | | | | | | | | | | | | | | We don't currently use the clipper statistics, but we'll soon use CL_INVOCATIONS_COUNT to implement the GL_PRIMITIVES_GENERATED query. The number of primitives generated is not supposed to be altered during operations such as glGenerateMipmap. Prevents spec/EXT_transform_feedback/generatemipmap prims_generated from breaking when we start using pipeline statistics registers to implement the GL_PRIMITIVES_GENERATED query in a few commits. v2: Use the BRW_NEW_META_IN_PROGRESS flag for correct state handling. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> [v1] Reviewed-by: Paul Berry <stereotype441@gmail.com>
* i965: Disable the GB clip test when a limited viewport is set.Eric Anholt2012-11-191-2/+12
| | | | | | | | | | | | | | The theory of the guardband is that you extend the clip volume to avoid expensive clipping computation, and just let fragments outside the viewport get clipped by the drawable's bounds. But if a smaller-than-window-size viewport is set, and we don't also happen to have a scissor set, then rendering could incorrectly extend outside of the viewport when it should have been clipped to the viewport. Fixes the new piglit triangle-guardband-viewport test. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> NOTE: This is a candidate for the 9.0 branch.
* i965: Use fewer temporary variables in clip setup.Eric Anholt2012-11-191-13/+8
| | | | | | | | When you're comparing to the spec, you're trying to immediately see what numbered dword of the packet your bit ends up in. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> NOTE: This is a candidate for the 9.0 branch.
* i965/msaa: Adapt clip setup for centroid noperspective interpolation.Paul Berry2012-06-251-1/+1
| | | | | | | | | | | | | | | | | | To save time, we only instruct the clip stage of the pipeline to compute noperspective barycentric coordinates if those coordinates are needed by the fragment shader. Previously, we would determine whether the coordinates were needed by seeing whether the fragment shader used the BRW_WM_NONPERSPECTIVE_PIXEL_BARYCENTRIC interpolation mode. However, with MSAA, it's possible that the fragment shader might use BRW_WM_NONPERSPECTIVE_CENTROID_BARYCENTRIC instead. In the future, when we support ARB_sample_shading, it might use BRW_WM_NONPERSPECTIVE_SAMPLE_BARYCENTRIC. This patch modifies the upload_clip_state() functions to check for all three possible noperspective interpolation modes. Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Implement guardband clipping on Sandybridge.Kenneth Graunke2012-05-151-0/+1
| | | | | | | | | | | Improves performance in Citybench: - 320x240: 19.8008% +/- 0.937818% - 1280x480: 6.53856% +/- 0.859083% No apparent difference in OpenArena nor Xonotic. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
* i965/gen6+: Avoid recomputing whether we use noperspective.Eric Anholt2012-02-211-26/+5
| | | | | | | Improves VS state change microbenchmark performance 2.38246% +/- 1.15046% (n=20). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Rewrite the HiZ opChad Versace2012-02-071-19/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The HiZ op was implemented as a meta-op. This patch reimplements it by emitting a special HiZ batch. This fixes several known bugs, and likely a lot of undiscovered ones too. ==== Why the HiZ meta-op needed to die ==== The HiZ op was implemented as a meta-op, which caused lots of trouble. All other meta-ops occur as a result of some GL call (for example, glClear and glGenerateMipmap), but the HiZ meta-op was special. It was called in places that Mesa (in particular, the vbo and swrast modules) did not expect---and were not prepared for---state changes to occur (for example: glDraw; glCallList; within glBegin/End blocks; and within swrast_prepare_render as a result of intel_miptree_map). In an attempt to work around these unexpected state changes, I added two hooks in i965: - A hook for glDraw, located in brw_predraw_resolve_buffers (which is called in the glDraw path). This hook detected if a predraw resolve meta-op had occurred, and would hackishly repropagate some GL state if necessary. This ensured that the meta-op state changes would not intefere with the vbo module's subsequent execution of glDraw. - A hook for glBegin, implemented by brwPrepareExecBegin. This hook resolved all buffers before entering a glBegin/End block, thus preventing an infinitely recurring call to vbo_exec_FlushVertices. The vbo module calls vbo_exec_FlushVertices to flush its vertex queue in response to GL state changes. Unfortunately, these hooks were not sufficient. The meta-op state changes still interacted badly with glPopAttrib (as discovered in bug 44927) and with swrast rendering (as discovered by debugging gen6's swrast fallback for glBitmap). I expect there are more undiscovered bugs. Rather than play whack-a-mole in a minefield, the sane approach is to replace the HiZ meta-op with something safer. ==== How it was killed ==== This patch consists of several logical components: 1. Rewrite the HiZ op by replacing function gen6_resolve_slice with gen6_hiz_exec and gen7_hiz_exec. The new functions do not call a meta-op, but instead manually construct and emit a batch to "draw" the HiZ op's rectangle primitive. The new functions alter no GL state. 2. Add fields to brw_context::hiz for the new HiZ op. 3. Emit a workaround flush when toggling 3DSTATE_VS.VsFunctionEnable. 4. Kill all dead HiZ code: - the function gen6_resolve_slice - the dirty flag BRW_NEW_HIZ - the dead fields in brw_context::hiz - the state packet manipulation triggered by the now removed brw_context::hiz::op - the meta-op workaround in brw_predraw_resolve_buffers (discussed above) - the meta-op workaround brwPrepareExecBegin (discussed above) Note: This is a candidate for the 8.0 branch. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Paul Berry <stereotype441@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43327 Reported-by: xunx.fang@intel.com Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44927 Reported-by: chao.a.chen@intel.com Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
* i965/gen6: Manipulate state batches for HiZ meta-ops [v4]Chad Versace2011-11-221-1/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | A lot of the state manipulation is handled by the meta-op state setup. However, some batches need manual intervention. v2: Do not special-case the 3DSTATE_DEPTH_STENCIL.Depth_Test_Enable bit for HiZ in gen6_upload_depth_stencil(). The HiZ meta-op sets ctx->Depth.Test, just read the value from that. v3: Add a new dirty flag, BRW_STATE_HIZ, for brw_tracked_state. Flag it immediately before and after executing the HiZ operation in gen6_resolve_slice(). Add the flag to the the dirty bits for the following state packets: gen6_clip_state gen6_depth_stencil_state gen6_sf_state gen6_wm_state v4: - Add BRW_NEW_STATE_HIZ to the dirty bit table in brw_state_upload.c. This is needed for INTEL_DEBUG=state. - Align brw dirty bit for gen6_depth_stencil_state. Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
* i965/gen6+: Add support for noperspective interpolation.Paul Berry2011-10-271-1/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This required the following changes: - WM setup now makes the appropriate set of barycentric coordinates (perspective vs. noperspective) available to the fragment shader, based on whether the shader requires perspective interpolation, noperspective interpolation, both, or neither. - The fragment shader backend now uses the appropriate set of barycentric coordiantes when interpolating, based on the interpolation mode returned by ir_variable::determine_interpolation_mode(). - SF setup now uses gl_fragment_program::InterpQualifier to determine which attributes are to be flat shaded (as opposed to the old logic, which only flat shaded colors). - CLIP setup now ensures that the clipper outputs non-perspective barycentric coordinates when they are needed by the fragment shader. Fixes the remaining piglit tests of interpolation qualifiers that were failing: - interpolation-flat-*-smooth-none - interpolation-flat-other-flat-none - interpolation-noperspective-* - interpolation-smooth-gl_*Color-flat-* Reviewed-by: Eric Anholt <eric@anholt.net>
* i965 Gen6+: De-compact clip planes.Paul Berry2011-10-061-26/+2
| | | | | | | | | | | | | | | | | | | Previously, if the user enabled a non-consecutive set of clip planes (e.g. 0, 1, and 3), the driver would compact them down to a consecutive set starting at 0. This optimization was of dubious value, and complicated the implementation of gl_ClipDistance. This patch changes the driver so that with Gen6 and later chipsets, we no longer compact the clip planes. However, we still discard any clip planes beyond the highest number that is in use, so performance should not be affected for applications that use clip planes consecutively from 0. With chipsets previous to Gen6, we still compact the clip planes, since the pre-Gen6 clipper thread relies on this behavior. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>
* mesa: Create _mesa_bitcount_64() to replace i965's brw_count_bits()Paul Berry2011-10-061-1/+1
| | | | | | | | | | | | | | | | The i965 driver already had a function to count bits in a 64-bit uint (brw_count_bits()), but it was buggy (it only counted the bottom 32 bits) and it was clumsy (it had a strange and broken fallback for non-GCC-like compilers, which fortunately was never used). Since Mesa already has a _mesa_bitcount() function, it seems better to just create a _mesa_bitcount_64() function rather than special-case this in the i965 driver. This patch creates the new _mesa_bitcount_64() function and rewrites all of the old brw_count_bits() calls to refer to it. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: allow for nonconsecutive elements of gl_ClipDistance to be enabled.Paul Berry2011-09-281-2/+26
| | | | | | | | | | | | | | | | | | | | | | When using user-defined clipping planes, the i965 driver compacts the array of clipping planes so that disabled clipping planes do not appear in it--this saves precious push constant space and makes it easier to generate the pre-GEN6 clip program. As a result, when enabling clipping planes in GEN6+ hardware, we always enable clipping planes 0 through n-1 (where n is the number of clipping planes enabled), regardless of which clipping planes the user actually requested. However, we can't do this when using gl_ClipDistance, because it would be prohibitively complex to compact the gl_ClipDistance array inside the user-supplied vertex shader. So, when enabling clipping planes in GEN6+ hardware, if gl_ClipDistance is in use, we need to pass the user-supplied enable flags directly through to the hardware rather than just enabling the first n planes. Fixes Piglit test vs-clip-distance-enables. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Add _NEW_LIGHT to Gen6 clip state dirty bits.Kenneth Graunke2011-05-171-1/+2
| | | | | | | | | ctx->Light.ProvokingVertex depends on _NEW_LIGHT. Found by inspection. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
* i965: Increase Sandybridge point size clamp in the clip state.Kenneth Graunke2011-02-241-1/+1
| | | | | | | | | 255.875 matches the hardware documentation. Presumably this was a typo. NOTE: This is a candidate for the 7.10 branch, along with commit 2bfc23fb86964e4153f57f2a56248760f6066033. Reviewed-by: Eric Anholt <eric@anholt.net>