summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers
Commit message (Collapse)AuthorAgeFilesLines
* Fix a few typosZoë Blade2015-04-2717-18/+18
| | | | Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* radeonsi: set an optimal value for DB_Z_INFO.ZRANGE_PRECISIONMarek Olšák2015-04-271-7/+2
| | | | | | Required because of a VI hw bug. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
* radeonsi: remove deprecated and useless registersMarek Olšák2015-04-271-10/+0
| | | | Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
* radeonsi: remove useless includesMarek Olšák2015-04-271-3/+0
| | | | Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
* gallium/radeon: print winsys info with R600_DEBUG=infoMarek Olšák2015-04-272-0/+28
| | | | Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
* gallium/radeon: don't crash when getting out-of-bounds TEMP referencesMarek Olšák2015-04-231-0/+6
| | | | Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
* softpipe: fix stencil write to use an integer valueDave Airlie2015-04-231-1/+1
| | | | | | | | | | | This fixes a number of regressions since 61393bdcdc3b63624bf6e9730444f5e9deeedfc8 u_tile: fix stencil texturing tests under softpipe Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89960 Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
* freedreno: misc minor cleanupsRob Clark2015-04-223-9/+10
| | | | Signed-off-by: Rob Clark <robclark@freedesktop.org>
* freedreno/a4xx: (partial) gl_FragCoord.zwRob Clark2015-04-221-5/+11
| | | | | | | | | The bit to enable .z is still commented out, as it is triggering gpu hangs in 0ad. But at least gl_FragCoord.w works now, and we know what bits we are *supposed* to set for .z (with that uncommented all piglit fragcoord tests are passing). Signed-off-by: Rob Clark <robclark@freedesktop.org>
* freedreno/a4xx: primitive-restartRob Clark2015-04-221-0/+5
| | | | | | This was the missing bit to get dolphin-emu working on a4xx. Signed-off-by: Rob Clark <robclark@freedesktop.org>
* freedreno/nir: sysval fixesRob Clark2015-04-222-5/+12
| | | | Signed-off-by: Rob Clark <robclark@freedesktop.org>
* freedreno/a4xx: wire up integer texture samplingRob Clark2015-04-223-5/+44
| | | | | | | Similar to a3xx, the compiler needs to know the return type of the sam, etc, instructions. Signed-off-by: Rob Clark <robclark@freedesktop.org>
* freedreno/a4xx: formats updates/fixesRob Clark2015-04-223-32/+84
| | | | | | | Update formats table with new formats that Ilia has figured out, and fix sampling from srgb texture and integer vbo's. Signed-off-by: Rob Clark <robclark@freedesktop.org>
* freedreno: update generated headersRob Clark2015-04-226-20/+123
| | | | Signed-off-by: Rob Clark <robclark@freedesktop.org>
* android: use LOCAL_SHARED_LIBRARIES over TARGET_OUT_HEADERSEmil Velikov2015-04-226-14/+9
| | | | | | | | | ... to manage the LIBDRM*_CFLAGS. The former is the recommended approach by the Android build system developers while the latter has been depreciated for quite some time. Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org> Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
* ilo: remove unused include from Android.mkEmil Velikov2015-04-221-3/+0
| | | | | Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Chih-Wei Huang <cwhuang@linux.org.tw>
* freedreno/a3xx: enable polymode setting with non-fill modesIlia Mirkin2015-04-181-0/+4
| | | | Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* freedreno/a3xx: fix integer and 32-bit float border colorsIlia Mirkin2015-04-181-1/+30
| | | | Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* freedreno/a3xx: add support for float R/RG render targetsIlia Mirkin2015-04-181-4/+6
| | | | Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* freedreno/ir3/nir: few little fixesRob Clark2015-04-171-21/+28
| | | | | | | | | isaml needs to scale up coords based on LoD. Also fix bogus bary.f varying # when there are non-bary frag shader inputs. And use sub.s of a positive immediate rather than add.s of negative (since CP is better about figuring out that those can be collapsed into the cat2 instr). Signed-off-by: Rob Clark <robclark@freedesktop.org>
* freedreno/ir3/nir: lower if/elseRob Clark2015-04-176-8/+381
| | | | | | | For now, completely flatten if/else blocks. That will almost certainly change once we have flow control. Signed-off-by: Rob Clark <robclark@freedesktop.org>
* freedreno/a4xx: support for large shadersRob Clark2015-04-171-3/+26
| | | | Signed-off-by: Rob Clark <robclark@freedesktop.org>
* freedreno: update generated headersRob Clark2015-04-178-42/+354
| | | | Signed-off-by: Rob Clark <robclark@freedesktop.org>
* freedreno/ir3/nir: UBO supportRob Clark2015-04-172-0/+52
| | | | Signed-off-by: Rob Clark <robclark@freedesktop.org>
* freedreno/ir3: move out helperRob Clark2015-04-172-24/+23
| | | | | | We'll also want it in NIR f/e for implementing UBO support. Signed-off-by: Rob Clark <robclark@freedesktop.org>
* freedreno/a4xx: sysvals and UBOsRob Clark2015-04-173-24/+56
| | | | | | | | | | Basically just sync up the cmdstream emit parts to match the changes already done on a3xx. Also, fix scheduling for mem instructions. This is needed on a4xx, and I am a bit surprised it isn't needed for a3xx. Signed-off-by: Rob Clark <robclark@freedesktop.org>
* radeonsi: add a debug option to compile shaders when they're createdMarek Olšák2015-04-163-0/+6
| | | | Tested-by: Tom Stellard <thomas.stellard@amd.com>
* radeonsi: remove bogus r600-- tripleEmil Velikov2015-04-161-2/+0
| | | | | | | | | | As mentioned by Michel Dänzer for LLVM >= 3.6 we create the LLVMTargetMachine (with triple amdgcn--), as we setup the radeonsi context. For older LLVM or hardware (r600) the triple is always r600-- and is created at a later stage - radeon_llvm_compile() Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
* r600g/sb: Skip empty ALU clause while schedulingGlenn Kennard2015-04-161-0/+3
| | | | | | | | | Fixes assert triggered by ext_transform_feedback-intervening-read output use_gs piglit test. Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
* vc4: Don't try to use color load/stores to blit across format changes.Eric Anholt2015-04-151-0/+3
| | | | | | We could potentially support the right combination of 8888 to 565, but the important thing for now is to not mix up our orderings of 8888. Fixes fbo-copyteximage regressions.
* vc4: Don't try to use color load/stores to do depth/stencil blits.Eric Anholt2015-04-151-0/+3
| | | | | Fixes regressions in fbo-generatemipmap-formats on depth/stencil (which does blits to work around baselevel/lastlevel).
* vc4: Update the shadow texture for public textures on every draw.Eric Anholt2015-04-153-7/+20
| | | | | We don't know who else has written to it, so we'd better update it every time. This makes the gears spin in X again.
* vc4: Hook up VC4_DEBUG=perf to some useful printfs.Eric Anholt2015-04-153-1/+16
|
* radeon/llvm: Improve codegen for KILL_IFTom Stellard2015-04-141-0/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than emitting one kill instruction per component of KILL_IF's src reg, we now or the components of the src register together and use the result as a condition for just one kill instruction. shader-db stats (bonaire): 979 shaders Totals: SGPRS: 34872 -> 34848 (-0.07 %) VGPRS: 20696 -> 20676 (-0.10 %) Code Size: 749032 -> 748452 (-0.08 %) bytes LDS: 11 -> 11 (0.00 %) blocks Scratch: 12288 -> 12288 (0.00 %) bytes per wave Totals from affected shaders: SGPRS: 1184 -> 1160 (-2.03 %) VGPRS: 600 -> 580 (-3.33 %) Code Size: 13200 -> 12620 (-4.39 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave Increases: SGPRS: 2 (0.00 %) VGPRS: 0 (0.00 %) Code Size: 0 (0.00 %) LDS: 0 (0.00 %) Scratch: 0 (0.00 %) Decreases: SGPRS: 5 (0.01 %) VGPRS: 5 (0.01 %) Code Size: 25 (0.03 %) LDS: 0 (0.00 %) Scratch: 0 (0.00 %) *** BY PERCENTAGE *** Max Increase: SGPRS: 32 -> 40 (25.00 %) VGPRS: 0 -> 0 (0.00 %) Code Size: 0 -> 0 (0.00 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave Max Decrease: SGPRS: 32 -> 24 (-25.00 %) VGPRS: 16 -> 12 (-25.00 %) Code Size: 116 -> 96 (-17.24 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave *** BY UNIT *** Max Increase: SGPRS: 64 -> 72 (12.50 %) VGPRS: 0 -> 0 (0.00 %) Code Size: 0 -> 0 (0.00 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave Max Decrease: SGPRS: 32 -> 24 (-25.00 %) VGPRS: 16 -> 12 (-25.00 %) Code Size: 424 -> 356 (-16.04 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeon/llvm: Run LLVM's instruction combining passTom Stellard2015-04-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This should improve code quality in general and will help with some future changes to how we emit kill instructions. shader-db shows a few regressions, but these don't seem to be the result of deficiencies in instcombine. They're mostly caused by the scheduler making different decisions than before. shader-db stats (bonaire): 979 shaders Totals: SGPRS: 35056 -> 34872 (-0.52 %) VGPRS: 20624 -> 20696 (0.35 %) Code Size: 764372 -> 749032 (-2.01 %) bytes LDS: 11 -> 11 (0.00 %) blocks Scratch: 12288 -> 12288 (0.00 %) bytes per wave Totals from affected shaders: SGPRS: 13264 -> 13072 (-1.45 %) VGPRS: 8248 -> 8316 (0.82 %) Code Size: 486320 -> 470992 (-3.15 %) bytes LDS: 11 -> 11 (0.00 %) blocks Scratch: 11264 -> 11264 (0.00 %) bytes per wave Increases: SGPRS: 6 (0.01 %) VGPRS: 20 (0.02 %) Code Size: 14 (0.01 %) LDS: 0 (0.00 %) Scratch: 0 (0.00 %) Decreases: SGPRS: 32 (0.03 %) VGPRS: 8 (0.01 %) Code Size: 244 (0.25 %) LDS: 0 (0.00 %) Scratch: 0 (0.00 %) *** BY PERCENTAGE *** Max Increase: SGPRS: 32 -> 48 (50.00 %) VGPRS: 12 -> 20 (66.67 %) Code Size: 216 -> 224 (3.70 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave Max Decrease: SGPRS: 40 -> 32 (-20.00 %) VGPRS: 16 -> 12 (-25.00 %) Code Size: 368 -> 280 (-23.91 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave *** BY UNIT *** Max Increase: SGPRS: 32 -> 48 (50.00 %) VGPRS: 28 -> 36 (28.57 %) Code Size: 39320 -> 40132 (2.07 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave Max Decrease: SGPRS: 72 -> 64 (-11.11 %) VGPRS: 48 -> 40 (-16.67 %) Code Size: 6272 -> 5852 (-6.70 %) bytes LDS: 0 -> 0 (0.00 %) blocks Scratch: 0 -> 0 (0.00 %) bytes per wave Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* radeonsi: Add header and footer to shader stat dumpTom Stellard2015-04-141-2/+4
| | | | | | This makes it easier to parse. Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* vc4: Add a blitter path using just the render thread.Eric Anholt2015-04-131-0/+127
| | | | | This accelerates the path for generating the shadow tiled texture when asked to sample from a raster texture (typical in glamor).
* vc4: Allow submitting jobs with no bin CL in validation.Eric Anholt2015-04-134-11/+19
| | | | | | For blitting, we want to fire off an RCL-only job. This takes a bit of tweaking in our validation and the simulator support (and corresponding new code in the kernel).
* vc4: Move the blit code to a separate file.Eric Anholt2015-04-134-64/+92
| | | | | There will be other blit code showing up, and it seems like the place you'd look.
* vc4: Separate out a bit of code for submitting jobs to the kernel.Eric Anholt2015-04-134-90/+139
| | | | | | I want to be able to have multiple jobs being set up at the same time (for example, a render job to do a little fixup blit in the course of doing a render to the main FBO).
* vc4: When asked to sample from a raster texture, make a shadow tiled copy.Eric Anholt2015-04-131-2/+9
| | | | | | | | | | | So, it turns out my simulator doesn't *quite* match the hardware. And the errata about raster textures tells you most of what's wrong, but there's still stuff wrong after that. Instead, if we're asked to sample from raster, we'll just blit it to a tiled temporary. Raster textures should only be screen scanout, and word is that it's faster to copy to tiled using the tiling engine first than to texture from an entire raster texture, anyway.
* vc4: Fix off-by-one in branch target validation.Eric Anholt2015-04-131-1/+1
|
* vc4: Use NIR-level lowering for idiv.Eric Anholt2015-04-131-11/+1
| | | | This fixes the idiv tests in piglit.
* vc4: Add a bunch of type conversions.Eric Anholt2015-04-131-0/+12
| | | | | | These are required to get piglit's idiv tests working. The unsigned<->float conversions are wrong, but are good enough to get piglit's small ranges of values working.
* vc4: Use the blit interface for updating shadow textures.Eric Anholt2015-04-131-13/+31
| | | | | This lets us plug in a better blit implementation and have it impact the shadow update, too.
* vc4: Remove dead fields from vc4_surface.Eric Anholt2015-04-131-3/+0
|
* vc4: Skip sending down the clear colors if not clearing.Eric Anholt2015-04-131-5/+7
|
* vc4: Sync with kernel changes to relax BCL versus RCL validation.Eric Anholt2015-04-131-22/+3
| | | | There was no reason to tie the two packets' values together.
* vc4: Fix another space allocation mistake.Eric Anholt2015-04-131-0/+1
| | | | | | We're over-allocating our BCL in vc4_draw.c, so this never mattered. However, new RCL-only blit support might end up here without having set up any BCL contents.
* vc4: Add missed accounting for the size of the semaphore.Eric Anholt2015-04-131-0/+2
| | | | This wouldn't have mattered except in the worst case scenario RCL setup.