summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
Commit message (Collapse)AuthorAgeFilesLines
* i965/tiled_memcpy: don't unconditionally use __builtin_bswap32Jonathan Gray2016-04-211-1/+14
| | | | | | | | | | Use the defines Mesa configure sets to indicate presence of the bswap32 builtins. This lets i965 work on OpenBSD again after the changes that were made in 0a5d8d9af42fd77fce1492d55f958da97816961a. Signed-off-by: Jonathan Gray <jsg@jsg.id.au> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/tiled_memcpy: Fix rgba8_copy_16_aligned_dst() typoKristian Høgsberg Kristensen2016-04-121-4/+4
| | | | | | | | | Copy and paste error in commit eafeb8db66dae7619ff3cb039706b990d718cba7: i965/tiled_memcpy: Unroll bytes==64 case. Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/tiled_memcpy: Unroll bytes==64 case.Matt Turner2016-04-121-0/+16
| | | | Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* i965/tiled_memcpy: Provide SSE2 for RGBA8 <-> BGRA8 swizzle.Roland Scheidegger2016-04-121-3/+40
| | | | | | | | | | | | | | | The existing code uses SSSE3, and because it isn't compiled in a separate file compiled with that, it is usually not used (that, of course, could be fixed...), whereas SSE2 is always present with 64-bit builds. This should be pretty much as fast as the pshufb version, albeit those code paths aren't really used on chips without llc in any case. v2: fix andnot argument order, add comments v3: use pshuflw/hw instead of shifts (suggested by Matt Turner), cut comments v4: [mattst88] Rebase Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/tiled_memcpy: Move SSSE3 code back into inline functions.Matt Turner2016-04-121-18/+24
| | | | | | This will make adding SSE2 code a lot cleaner. Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* i965/tiled_memcpy: Optimize RGBA -> BGRA swizzle.Matt Turner2016-04-121-8/+11
| | | | | | | Replaces four byte loads and four byte stores with a load, bswap, rotate, store; or a movbe, rotate, store. Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* i965/tiled_memcopy: Get rid of the direction parameter to get_memcpyJason Ekstrand2016-04-081-2/+1
| | | | | | | | | Now that we can use the much simpler rgba8_copy function, we don't need to hand different functions out based on direction. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Chad Versace <chad.versace@intel.com>
* i965/tiled_memcpy: Rework the RGBA -> BGRA mem_copy functionsJason Ekstrand2016-04-081-76/+63
| | | | | | | | | | | | | This splits the two copy functions into three: One for unaligned copies, one for aligned sources, and one for aligned destinations. Thanks to the previous commit, we are now guaranteed that the aligned ones will *only* operate on aligned memory so they should be safe. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93962 Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Chad Versace <chad.versace@intel.com>
* i965/tiled_memcopy: Add aligned mem_copy parameters to the [de]tiling functionsJason Ekstrand2016-04-081-32/+43
| | | | | | | | | | | | | | | | | | Each of the [de]tiling functions has three mem_copy calls: 1) Left edge to tile boundary 2) Tile boundary to tile boundary in a loop 3) Tile boundary to right edge Copies 2 and 3 start at a tile edge so the pointer to tiled memory is guaranteed to be at least 16-byte aligned. Copy 1, on the other hand, starts at some arbitrary place in the tile so it doesn't have any such alignment guarantees. Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Chad Versace <chad.versace@intel.com>
* i965: Enable tiled mem_copy with sRGB-formatted resourcesNanley Chery2016-02-241-2/+6
| | | | | | | | | | | RGBA8 and BGRA8 unorm formats are compatible with the various mem_copy functions. Their sRGB counterparts are also compatible because they're also color-renderable (of importance when the specified resource is a readbuffer) and they share the same physical layout. Signed-off-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
* Revert "i965: Provide sse2 version for rgba8 <-> bgra8 swizzle"Roland Scheidegger2016-02-021-57/+12
| | | | | | | | | | This reverts commit ab30426e335116e29473faaafe8b57ec760516ee. Apparently the memory isn't quite as aligned when this gets called as it should be, causing crashes. (Albeit this looks independent from this code, should crash just as well if ssse3 is enabled when compiling without this patch.) https://bugs.freedesktop.org/show_bug.cgi?id=93962
* i965: Provide sse2 version for rgba8 <-> bgra8 swizzleRoland Scheidegger2016-02-021-12/+57
| | | | | | | | | | | | | The existing code used ssse3, and because it isn't compiled in a separate file compiled with that, it is usually not used (that, of course, could be fixed...), whereas sse2 is always present at least with 64bit builds. This should be pretty much as fast as the pshufb version, albeit those code paths aren't really used on chips without llc in any case. v2: fix andnot argument order, add comments v3: use pshuflw/hw instead of shifts (suggested by Matt Turner), cut comments Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Fix typos in licenseIan Romanick2015-09-101-2/+2
| | | | | | | | | | | | | | | | grep -lr 'sub license' | while read f; do \ sed --in-place -e 's/sub license/sublicense/' $f ;\ done grep -lr 'NON-INFRINGEMENT' | while read f; do \ sed --in-place -e 's/NON-INFRINGEMENT/NONINFRINGEMENT/' $f ;\ done As noted by Matt, both of these changes match the MIT license text found at http://opensource.org/licenses/MIT. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Acked-by: Matt Turner <mattst88@gmail.com>
* i965: Mark paths in linear <-> tiled functions as unreachable().Matt Turner2015-03-171-0/+16
| | | | | | | | | text data bss dec hex filename 9663 0 0 9663 25bf intel_tiled_memcpy.o before 8215 0 0 8215 2017 intel_tiled_memcpy.o after Reviewed-by: Carl Worth <cworth@cworth.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965: Tell intel_get_memcpy() which direction the memcpy() is going.Matt Turner2015-03-051-38/+86
| | | | | | | | | | | | | | | | | | | | | | The SSSE3 swizzling code was written for fast uploads to the GPU and assumed the destination was always 16-byte aligned. When we began using this code for fast downloads as well we didn't do anything to account for the fact that the destination pointer given by glReadPixels() or glGetTexImage() is not guaranteed to be suitably aligned. With SSSE3 enabled (at compile-time), some applications would crash when an SSE aligned-store instruction tried to store to an unaligned destination (or an assertion that the destination is aligned would trigger). To remedy this, tell intel_get_memcpy() whether we're uploading or downloading so that it can select whether to assume the destination or source is aligned, respectively. Cc: 10.5 <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89416 Tested-by: Uriy Zhuravlev <stalkerg@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965/tiled_memcpy: Support a signed linear pitchJason Ekstrand2015-01-301-15/+15
| | | | Reviewed-by: Chad Versace <chad.versace@intel.com>
* i965/tiled_memcpy: Add tiled-to-linear pathsSisinty Sasmita Patra2015-01-261-0/+272
| | | | | | | | | | | | | | | | | This commit addes tiled copy functions for coping from tiled memory to linear memory. These are very similar to the existing linear-to-tiled paths. v2: Jason Ekstrand <jason.ekstrand@intel.com> - New commit message - Various whitespace fixes - Added ptrdiff_t casts as done in commit 225a09790 v3: Jason Ekstrand <jason.ekstrand@intel.com> - Fixed a comment Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Chad Versace <chad.versace@intel.com>
* i965: Refactor tiled memcpy functions and move them into their own fileSisinty Sasmita Patra2015-01-261-0/+450
This commit refactors the tiled_memcpy code in intel_tex_subimage.c and moves it into its own file intel_tiled_memcpy files. Also, xtile_copy and ytile_copy are renamed to linear_to_xtiled and linear_to_ytiled respectively. The *_faster functions are similarly renamed. There was also a bit of logic to select between the the libc provided memcpy function and our custom memcpy that does an RGBA -> BGRA swizzle. This was moved into an intel_get_memcpy function so that rgba8_copy can live (and be inlined) in intel_tiled_memcpy.c. v2: Jason Ekstrand <jason.ekstrand@intel.com> - Better commit message - Fix up the copyright on the intel_tiled_memcpy files - Various whitespace fixes - Moved a bunch of stuff that did not need to be exposed from intel_tiled_memcpy.h to intel_tiled_memcpy.c - Added proper documentation for intel_get_memcpy - Incorperated the ptrdiff_t tweaks from commit 225a09790 v3: Jason Ekstrand <jason.ekstrand@intel.com> - Fixed a comment - Move the tile size constants into the .c file Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Chad Versace <chad.versace@intel.com>