Update stfs-writer to the latest master #44

Open
wants to merge 861 commits into
base: stfs-writer

epozzobon

As mentioned on #43, I made an attempt at updating the stfs-writer to the latest commit in the master branch.
This seems to work on the games I play, but I would appreciate some more testing.

Triang3l and others added 30 commits May 22, 2022 21:46
While the alpha of the texture data is not used at all (replaced with blue using the view swizzle), still make the shader code state the intention more explicitly if the format is decompressed for use as signed. Unsigned 1.0 is 0xFF, while signed 1.0 is 0x7F.
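
The two constants above follow from how 8-bit normalized values are encoded. A minimal sketch, assuming the usual unorm/snorm conventions (the function names are hypothetical and not taken from the Xenia source):

```python
def encode_unorm8(value: float) -> int:
    """Encode a value in [0.0, 1.0] as an 8-bit unsigned normalized byte."""
    clamped = min(max(value, 0.0), 1.0)
    return round(clamped * 255)


def encode_snorm8(value: float) -> int:
    """Encode a value in [-1.0, 1.0] as an 8-bit signed normalized byte.

    The result is returned as the raw two's complement byte.
    """
    clamped = min(max(value, -1.0), 1.0)
    return round(clamped * 127) & 0xFF


# 1.0 maps to different bit patterns depending on the signedness.
unsigned_one = encode_unorm8(1.0)
signed_one = encode_snorm8(1.0)
```

Writing 0xFF into a channel that is later read as signed would therefore not yield 1.0, which is why the intended value has to match the signedness.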
The resolution scale is now taken into account when copying from the mip tail.
Triang3l and others added 20 commits April 9, 2023 18:07
Keep the current lane active as it may be needed for derivatives.
There's no limit on the number of memory exports in a shader on the real
Xenos, and exports can be done anywhere, including in loops. Now, instead
of deferring the exports to the end of the shader and assuming that export
allocs are executed only once, Xenia flushes exports when it reaches an
alloc, when it reaches the end of the shader, or when a pixel with
outstanding exports is killed. On Xenos, memory exports are terminated by
allocs as well as by individual ALU instructions with `serialize`; the
latter case is not handled, for simplicity, since it's only truly
mandatory to flush memory exports before starting a new one.

To know which eM# registers need to be flushed to memory, the successors
of each exec potentially writing any eM# are traversed, and each reached
control flow instruction is marked with the eM# registers that might have
been written before it, until a flush point or the end of the shader is
reached.
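
The traversal described above can be sketched as a small forward dataflow pass. This is an illustration under assumptions, not the code from the commit: the control flow graph is a hypothetical dict of instruction index to successor indices, `writes` maps an exec to the eM# indices it may write, and `flush_points` holds the allocs (and any other flush locations).

```python
from collections import deque


def potentially_written_before(successors, writes, flush_points):
    """Map each control flow instruction to the eM# indices that may be
    written, and not yet flushed, by the time it is reached."""
    pending = {node: set() for node in successors}
    # Start from every exec that may write an eM# register.
    worklist = deque(node for node in successors if writes.get(node))
    while worklist:
        node = worklist.popleft()
        if node in flush_points:
            # Everything pending is flushed here, so nothing propagates.
            continue
        outgoing = pending[node] | writes.get(node, set())
        for succ in successors[node]:
            if not outgoing <= pending[succ]:
                pending[succ] |= outgoing
                worklist.append(succ)
    return pending


# Hypothetical shader: exec 0 writes eM0, then a loop (1..3) whose body
# writes eM1, then an alloc at 4 acting as the flush point, then exec 5.
successors = {0: [1], 1: [2], 2: [3], 3: [1, 4], 4: [5], 5: []}
writes = {0: {0}, 2: {1}}
flush_points = {4}
pending = potentially_written_before(successors, writes, flush_points)
```

In this example the alloc at index 4 ends up marked with both eM0 and eM1, and nothing is carried past it to index 5.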

Also, some games export to sub-32bpp formats. These are now supported via
an atomic AND that clears the bits of the dword being replaced, followed
by an atomic OR that inserts the new byte/short.
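
As a rough illustration of the AND-then-OR sequence, the sketch below computes the two operands for a byte or short store into a dword. The helper names are hypothetical, the plain `&=` and `|=` merely stand in for the atomic operations, and placing the field at `byte_offset * 8` is an assumption that ignores any endian handling the real code performs.

```python
def sub_dword_store_masks(byte_offset: int, size_bytes: int, value: int):
    """Return (and_mask, or_bits) replacing size_bytes at byte_offset in a dword."""
    if size_bytes not in (1, 2) or not 0 <= byte_offset <= 4 - size_bytes:
        raise ValueError("field must be a byte or short inside one dword")
    shift = byte_offset * 8
    field = (1 << (size_bytes * 8)) - 1
    and_mask = ~(field << shift) & 0xFFFFFFFF  # clears only the target bits
    or_bits = (value & field) << shift         # new bits, already in place
    return and_mask, or_bits


def apply_store(dword: int, byte_offset: int, size_bytes: int, value: int) -> int:
    """Non-atomic model of the two-step update."""
    and_mask, or_bits = sub_dword_store_masks(byte_offset, size_bytes, value)
    dword &= and_mask  # stands in for the atomic AND
    dword |= or_bits   # stands in for the atomic OR
    return dword
```

Because each step touches only the target bits, neighbouring bytes written by other invocations are left intact.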
There can be jumps across an exece, so the code beyond it may still be
executed.
I don't know of any title that utilizes this instruction, but I went
ahead and implemented it for completeness.

Verified the implementation with `instr__gen_vaddcuw` from xenia-project#1348. Can be
grabbed with:
```
git checkout origin/gen_tests -- src\xenia\cpu\ppc\testing\*vaddcuw.s
```
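
For reference, a scalar model of what `vaddcuw` computes, written as a plain Python sketch rather than the emitter code from this commit: each 32-bit lane of the result holds the carry-out of the unsigned addition of the corresponding lanes.

```python
WORD_MASK = 0xFFFFFFFF


def vaddcuw(a, b):
    """Per-lane carry-out of a 32-bit unsigned add: 1 on overflow, else 0."""
    return [((x & WORD_MASK) + (y & WORD_MASK)) >> 32 for x, y in zip(a, b)]


carries = vaddcuw([0xFFFFFFFF, 1, 0x80000000, 0], [1, 1, 0x80000000, 0])
```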
Other half of xenia-project#2125. I don't know of any title that utilizes this instruction, but I went ahead and implemented it for completeness.

Verified the implementation with `instr__gen_vsubcuw` from xenia-project#1348. Can be grabbed with:
```
git checkout origin/gen_tests -- src\xenia\cpu\ppc\testing\*vsubcuw.s
```
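
A matching scalar model of `vsubcuw`, again a plain Python sketch rather than the emitter code: each lane holds the carry-out of `a + ~b + 1`, which is 1 exactly when the unsigned subtraction `a - b` does not borrow.

```python
WORD_MASK = 0xFFFFFFFF


def vsubcuw(a, b):
    """Per-lane carry-out of a 32-bit unsigned subtract: 1 if a >= b, else 0."""
    return [
        ((x & WORD_MASK) + (~y & WORD_MASK) + 1) >> 32
        for x, y in zip(a, b)
    ]


no_borrow = vsubcuw([5, 3, 0, 0], [3, 5, 0, 1])
```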
AVX512 has native unsigned integer comparison instructions, removing
the need to XOR the most significant bit with a constant in memory to
use the signed comparison instructions. These instructions only write to
a k-mask register, though, and need an additional `vpmovm2*` instruction
to turn the mask register into a vector mask.

As of Icelake:
`vpcmpu*` is all L3/T1
`vpmovm2d` is L1/T0.33
`vpmovm2{b,w}` is L3/T0.33

As of Zen4:
`vpcmpu*` is all L3/T0.50
`vpmovm2*` is all L1/T0.25
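
The two approaches can be compared with a scalar model. This is a sketch with hypothetical helper names, not the emitted x64 code: the first function is the sign-flip trick needed with signed-only compares, and the other two model a compare that writes one bit per lane followed by a mask-to-vector expansion in the manner of `vpmovm2d`.

```python
WORD_MASK = 0xFFFFFFFF
SIGN_BIT = 0x80000000


def to_signed32(x: int) -> int:
    """Reinterpret a 32-bit pattern as a two's complement signed integer."""
    x &= WORD_MASK
    return x - (1 << 32) if x & SIGN_BIT else x


def unsigned_gt_via_signed(a: int, b: int) -> bool:
    """Older approach: flip the MSB of both operands, then compare as signed."""
    return to_signed32(a ^ SIGN_BIT) > to_signed32(b ^ SIGN_BIT)


def kmask_unsigned_gt(a_lanes, b_lanes) -> int:
    """Model of an unsigned compare that sets one mask bit per lane."""
    mask = 0
    for i, (a, b) in enumerate(zip(a_lanes, b_lanes)):
        if (a & WORD_MASK) > (b & WORD_MASK):
            mask |= 1 << i
    return mask


def expand_mask_to_lanes(mask: int, lane_count: int):
    """Model of the mask-to-vector step: set bits become all-ones lanes."""
    return [WORD_MASK if (mask >> i) & 1 else 0 for i in range(lane_count)]


a = [0xFFFFFFFF, 1, 0x80000000, 0]
b = [1, 0xFFFFFFFF, 0x7FFFFFFF, 0]
mask = kmask_unsigned_gt(a, b)
lanes = expand_mask_to_lanes(mask, len(a))
```

Both paths agree lane by lane; the mask-based one simply skips the XOR against a constant.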
Plus: limit it to 64 entries
Thanks to Bo98 for pointing that out
epozzobon force-pushed the stfs-writer branch 2 times, most recently from 00aba94 to 06ed9ab, on July 8, 2023 10:39