| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
| |
| |
| |
| |
| | |
If last instruction is an CF_INST_ALU we don't need to emit an
additional CF_INST_POP for stack clean up after an IF ELSE ENDIF.
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| | |
tgsi_helper_copy is used on several occasions to copy a temporary result
into the real destination register to emulate writemasks for OP3 and
reduction operations. According to R600 ISA that's unnecessary.
This patch fixes this use for MAD, CMP and DP4.
|
| | |
|
| |
| |
| |
| |
| | |
Signed-off-by: Tilman Sauerbeck <tilman@code-monkey.de>
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
|
| |
| |
| |
| |
| |
| |
| | |
This avoid any issue when context is free and we still try to
access fence through radeon structure.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| | |
If you want to enable noop set GALLIUM_NOOP=1 as an env variable.
You need first to enable noop wrapping for your driver see change
to src/gallium/targets/dri-r600/ in this commit as an example.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
|
| |
| |
| |
| | |
Performance++.
|
| | |
|
| | |
|
| | |
|
| |
| |
| |
| | |
still a bit of work to do, the winsys gen setting is a bit of a hack.
|
| | |
|
| | |
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
https://bugs.freedesktop.org/show_bug.cgi?id=32912
The fix is to call update_derived_state before user buffer uploads.
I've also moved some code around.
Unfortunately, there are still some ZMASK-related bugs which cause
misrendering, i.e. flushing doesn't always work and glean/fbo fails.
|
| | |
|
| | |
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The motivation behind this rework is to get some speed by reducing
CPU overhead. The performance increase depends on many factors,
but it's measurable (I think it's about 10% increase in Torcs).
This commit replaces libdrm's radeon_cs_gem with our own implemention.
It's optimized specifically for r300g, but r600g could use it as well.
Reloc writes and space checking are faster and simpler than their
counterparts in libdrm (the time complexity of all the functions
is O(1) in nearly all scenarios, thanks to hashing).
(libdrm's radeon_bo_gem is still being used in the driver.)
It works like this:
cs_add_reloc(cs, buf, read_domain, write_domain) adds a new relocation and
also adds the size of 'buf' to the used_gart and used_vram winsys variables
based on the domains, which are simply or'd for the accounting purposes.
The adding is skipped if the reloc is already present in the list, but it
accounts any newly-referenced domains.
cs_validate is then called, which just checks:
used_vram/gart < vram/gart_size * 0.8
The 0.8 number allows for some memory fragmentation. If the validation
fails, the pipe driver flushes CS and tries do the validation again,
i.e. it validates only that one operation. If it fails again, it drops
the operation on the floor and prints some nasty message to stderr.
cs_write_reloc(cs, buf) just writes a reloc that has been added using
cs_add_reloc. The read_domain and write_domain parameters have been removed,
because we already specify them in cs_add_reloc.
The space checking has been tested by putting small values in vram/gart_size
variables.
|
| |
| |
| |
| |
| | |
Occurred because the code assumed that buf->domain would remain
equal to old_domain.
|
| |
| |
| |
| | |
There are actually applications that profit immensely from this.
|
| | |
|
| | |
|
| |
| |
| |
| |
| | |
... instead of the next fence to be emitted. This way we have a
chance to reclaim the storage earlier.
|
| |
| |
| |
| |
| |
| |
| | |
Invalid after be1af4394e060677b7db6bbb8e3301e38a3363da (user buffer
creation with width0 == ~0).
Signed-off-by: Marek Olšák <maraeo@gmail.com>
|
| |
| |
| |
| | |
r600_upload_const_buffer().
|
| | |
|
| | |
|
| |
| |
| |
| | |
r600_bc_add_alu_type().
|
| | |
|
| |
| |
| |
| |
| |
| |
| | |
This only uploads the [min_index, max_index] range instead of [0, userbuf size],
which greatly speeds up user buffer uploads.
This is also a prerequisite for atomizing vertex arrays in st/mesa.
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This provides an upload facility for the constant buffers since Marek's
constants in user buffers changes.
gears at least work on my evergreen now.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
| |
| |
| |
| | |
This adds support for Barts, Turks, and Caicos asics.
|
| |
| |
| |
| |
| |
| |
| | |
The draw module set new state that didn't require swtnl which caused need_swtnl to
be unset. This caused the call from to svga_update_state(svga, SVGA_STATE_SWTNL_DRAW)
from the vbuf backend to overwrite the vdecls we setup there to be overwritten with
the real buffers vdecls.
|
| |
| |
| |
| | |
https://bugs.freedesktop.org/show_bug.cgi?id=32634
|
| |
| |
| |
| |
| |
| | |
It's no longer needed because the upload buffer remains mapped while the CS
is being filled (openarena, ut2004 and others that this code was for do not
use VBOs by default).
|
| |
| |
| |
| | |
because the upload buffers are reused for subsequent draw operations.
|
| |
| |
| |
| | |
So that a state tracker can unreference them after set_vertex_buffers.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
- Added a parameter to specify a minimum offset that should be returned.
r300g needs this to better implement user buffer uploads. This weird
requirement comes from the fact that the Radeon DRM doesn't support negative
offsets.
- Added a parameter to notify a driver that the upload flush occured.
A driver may skip buffer validation if there was no flush, resulting
in a better performance.
- Added a new upload function that returns a pointer to the upload buffer
directly, so that the buffer can be filled e.g. by the translate module.
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
From the r600 ISA:
Each ALU clause can lock up to four sets of constants
into the constant cache. Each set (one cache line) is
16 128-bit constants. These are split into two groups.
Each group can be from a different constant buffer
(out of 16 buffers). Each group of two constants consists
of either [Line] and [Line+1] or [line + loop_ctr]
and [line + loop_ctr +1].
For supporting more than 64 constants, we need to
break the code into multiple ALU clauses based
on what sets of constants are needed in that clause.
Note: This is a candidate for the 7.10 branch.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
|
|\ \ |
|
| | | |
|
| | | |
|