Optimize/add static math functions #354

Merged on Mar 20, 2022 (147 commits)
Commits
9f00417
Improve cpp_tireal_pi_montecarlo
runer112 Jun 1, 2021
48f64cb
Provide optimized static versions of __smuls and __smulu
runer112 Jun 1, 2021
d61333f
Touch up smulu.src
runer112 Jun 1, 2021
8db0d93
Provide optimized static versions of __imuls and __imulu
runer112 Jun 1, 2021
508fa7a
Rework doc in smulu.src and imulu.src
runer112 Jun 2, 2021
51130a7
Mark static versions of __smuls and __smulu as "fast"
runer112 Jun 2, 2021
4dc950e
Reimplement proper static versions of __smuls and __smulu
runer112 Jun 2, 2021
76980b9
Mark static versions of __imuls and __imulu as "fast"
runer112 Jun 2, 2021
76f2366
Reimplement proper static versions of __imuls and __imulu
runer112 Jun 2, 2021
370c07f
Fix some docs
runer112 Jun 2, 2021
41aaf2c
Reformat asm
runer112 Jun 2, 2021
0e7425d
This looks nicer
runer112 Jun 2, 2021
857a45c
Implement __lmuls_fast and __lmulu_fast
runer112 Jun 2, 2021
9326b14
Provide optimized static versions of __lmuls and __lmulu
runer112 Jun 2, 2021
36ebb4b
Provide optimized versions of __sand, __sor, and __sxor
runer112 Jun 3, 2021
62f6507
This order may eventually better allow for hijacking
runer112 Jun 3, 2021
becfdac
Provide optimized versions of __iand, __ior, and __ixor
runer112 Jun 3, 2021
7312406
Provide optimized versions of __land, __lor, and __lxor
runer112 Jun 3, 2021
f9e57aa
Provide optimized versions of __scmpzero, __icmpzero, __lcmpzero, and…
runer112 Jun 3, 2021
f6f95e2
Provide optimized versions of __ladd and __lladd
runer112 Jun 3, 2021
2da942f
Don't forget to propagate carry
runer112 Jun 3, 2021
d1337cd
Provide optimized versions of __lsub and __llsub
runer112 Jun 3, 2021
8a37d3f
Optimize __lladd_fast
runer112 Jun 3, 2021
673ba00
Fix stack frame bugs
runer112 Jun 3, 2021
dfa34fd
Provide optimized versions of __lland, __llor, and __llxor
runer112 Jun 3, 2021
c5ca508
Implement enough junk for my long long test program to link
runer112 Jun 4, 2021
ba8d751
Fix __llsub
runer112 Jun 4, 2021
e7d9047
Fix __lldivu_b and __llmulu_b
runer112 Jun 4, 2021
9f0220c
Fix __llcmpu and __llcmpzero
runer112 Jun 4, 2021
1217ded
Optimize __bpopcnt, __spopcnt, __ipopcnt, and __lpopcnt
runer112 Jun 4, 2021
e238918
Implement __llpopcnt
runer112 Jun 4, 2021
9deabc1
Reimagine more optimized versions of __spopcnt, __ipopcnt, __lpopcnt,…
runer112 Jun 4, 2021
10545ba
Hijack __lpopcnt from __ipopcnt
runer112 Jun 4, 2021
3a22ff5
Fix and optimize __sbitrev, __ibitrev, and __lbitrev
runer112 Jun 5, 2021
feeaf4e
Implement __llbitrev
runer112 Jun 5, 2021
c33d063
Optimize __snot, __inot, and __lnot
runer112 Jun 5, 2021
1d148f8
Implement __llnot
runer112 Jun 5, 2021
a64bcf8
Tweak __inot and __lnot
runer112 Jun 5, 2021
300a782
Optimize __sneg, __ineg, and __lneg
runer112 Jun 5, 2021
a3254dd
Fix __lneg
runer112 Jun 5, 2021
3541c34
Implement __llneg
runer112 Jun 5, 2021
43616cb
Implement suboptimal __llshl
runer112 Jun 5, 2021
892d3c2
Slightly optimize __llshl
runer112 Jun 5, 2021
c9f518f
Fix and slightly optimize __llshrs
runer112 Jun 5, 2021
ce4b81e
Implement suboptimal __llshru
runer112 Jun 5, 2021
d962415
Provide (slightly) optimized versions of __bshl, __sshl, __ishl, and …
runer112 Jun 6, 2021
5db7e50
Fix and slightly optimize __llshrs and __llshru
runer112 Jun 6, 2021
1580ad1
Provide slightly optimized versions of __bshrs, __bshru, __sshrs, __s…
runer112 Jun 6, 2021
1f79f76
Provide optimized versions of _tolower and _toupper
runer112 Jun 7, 2021
f3e66a6
Remove some leftover, unused equates
runer112 Jun 7, 2021
5f33720
Fix copy-pate oversight
runer112 Jun 7, 2021
1fd73f6
Provide optimized versions of __ladd_b
runer112 Jun 13, 2021
49a332b
Implement __lladd_b
runer112 Jun 13, 2021
153d9e9
Implement some optimized shifts by one
runer112 Jun 13, 2021
83a8159
Implement optimized add and sub 1
runer112 Jun 13, 2021
8771240
Slightly optimize __lladd_b
runer112 Jun 13, 2021
8d35d37
Implement really slow __llmuls/__llmulu
runer112 Jul 17, 2021
f3f65ee
Fix __llmuls/__llmulu
runer112 Jul 17, 2021
89d9409
Fix __llneg and __llneg_fast
runer112 Jul 18, 2021
844a7ae
Optimize _labs
runer112 Jan 23, 2022
53b4694
Fix a bug in __lldivu_b?
runer112 Jan 23, 2022
14f1ba6
Implement _llabs
runer112 Jan 23, 2022
cf69f60
Implement atrociously slow __lldivu and __llremu
runer112 Jan 23, 2022
3ccc786
Fix a bug in __lldvrmu
runer112 Jan 23, 2022
5035e09
Implement atrociously slow __lldivs
runer112 Jan 23, 2022
6f11028
Implement atrociously slow __llrems
runer112 Jan 23, 2022
6c1b60a
Add the math test program I've been using this whole time
runer112 Jan 23, 2022
87cf376
Use fixed (right) output alignment
runer112 Jan 23, 2022
5de94c6
Hackily combine unary and binary op testing
runer112 Jan 23, 2022
968d69c
Merge branch 'master' into opt-static-math
runer112 Jan 23, 2022
1dde047
Fix whereami on Windows
runer112 Jan 23, 2022
702cc3b
OUTPUT_MAP is now enabled by default
runer112 Jan 23, 2022
50d84e7
Fix linking to the OS's __snot
runer112 Jan 23, 2022
1369b76
HAS_FLASH_FUNCTIONS -> STATIC_CRT
runer112 Jan 23, 2022
c57f412
Remove an unnecessary adl assumption
runer112 Jan 23, 2022
00730df
Merge branch 'master' into opt-static-math
runer112 Jan 23, 2022
253254d
Add some EOF newlines
runer112 Jan 23, 2022
f5bfa60
Fix the calling convention used by _tolower and _toupper
runer112 Jan 23, 2022
3d1d8d9
Optimize __llneg_fast
runer112 Feb 6, 2022
7e4c905
Add math_test autotest
runer112 Feb 6, 2022
768d79d
Optimize __bdivu, __bdivs, __bremu, and __brems
runer112 Feb 6, 2022
23debde
Fix __bdvrmu
runer112 Feb 7, 2022
656b861
Rename __brem_common to the more correct __bdvrms_common
runer112 Feb 7, 2022
8526651
Fix inputs to __bremu and __brems
runer112 Feb 7, 2022
08bce2c
Fix extern reference
runer112 Feb 9, 2022
fdf226e
Fix __bdvrms_common
runer112 Feb 9, 2022
17a9971
Reorder code in __bdvrms_common
runer112 Feb 9, 2022
bcbe16b
Implement optimized __idivu, __idivs, __iremu, and __irems
runer112 Feb 9, 2022
7c8954e
Fix __idvrmu
runer112 Feb 9, 2022
d7e7f34
Add missing adl assumes
runer112 Feb 9, 2022
d696fb3
Implement optimized __ldivu and __lremu
runer112 Feb 12, 2022
94e8b05
Implement less optimal __ldivu and __ldivs
runer112 Feb 12, 2022
8c379ae
Let's make this valid C
runer112 Feb 12, 2022
1b5ead2
Fix indentation
runer112 Feb 12, 2022
2de19c7
Fix the C version of __ldvrmu
runer112 Feb 13, 2022
933fffb
Implement optimized __ldivs and __lrems
runer112 Feb 13, 2022
a7f8cca
__bdvrms_common -> __bdvrms
runer112 Feb 13, 2022
92a3c1d
Slightly optimize __bdvrms
runer112 Feb 13, 2022
3f8b26d
Share code common to __ldivs and __lrems
runer112 Feb 13, 2022
7614597
Disable OS-linked __lrems due to a bug
runer112 Feb 13, 2022
7955498
Implement __idvrms
runer112 Feb 13, 2022
857bfa3
Optimize div
runer112 Feb 16, 2022
eb2c8e0
Test div and ldiv
runer112 Feb 16, 2022
248cd02
Normalize test macro parameter order
runer112 Feb 16, 2022
c8e79e5
Slightly optimize __ldivu
runer112 Feb 16, 2022
fef3723
Optimize part of __lldivs for size
runer112 Feb 17, 2022
04c2db8
Optimize llabs slightly
runer112 Feb 17, 2022
3603dea
Optimize __lldivs slightly
runer112 Feb 17, 2022
285cd00
Fix comments in __lldvrmu
runer112 Feb 17, 2022
1414ea7
Implement lldiv and imaxdiv
runer112 Feb 17, 2022
6346a37
Test lldiv
runer112 Feb 17, 2022
b9c561a
Add *div results to autotest
runer112 Feb 17, 2022
e708b96
Fix definition of imaxdiv_t
runer112 Feb 17, 2022
11c20b8
Optimize ldiv
runer112 Feb 19, 2022
fea7ef5
Optimize ldiv slightly
runer112 Feb 19, 2022
4cdb4ea
add simple strtoll and strtoull
mateoconlechuga Mar 5, 2022
b33ebac
`atos` isn't a thing
runer112 Mar 5, 2022
6c32d24
Merge branch 'master' into opt-static-math
runer112 Mar 5, 2022
3bd439b
add source files to a section
mateoconlechuga Mar 5, 2022
d721e34
Add a missing section
runer112 Mar 5, 2022
c8de1c8
merge 'master' into 'opt-static-math'
mateoconlechuga Mar 5, 2022
3e58485
Optimize lcmps
runer112 Mar 7, 2022
c2ae8af
Optimize scmpzero, icmpzero, lcmpzero, and llcmpzero
runer112 Mar 7, 2022
a3f9ef2
Fix llcmpu
runer112 Mar 7, 2022
f8f54c2
Implement llcmps
runer112 Mar 7, 2022
1356316
add missing lcmps label
mateoconlechuga Mar 7, 2022
94052f6
Preserve AF in lland, llor, and llxor
runer112 Mar 8, 2022
210b033
Fix lldivs and llrems
runer112 Mar 8, 2022
6f3a13c
Implement atoll
runer112 Mar 8, 2022
2df25d8
Fix lldivs and llrems more
runer112 Mar 8, 2022
6beb48e
Remove duplicate data
runer112 Mar 18, 2022
11123a1
Clear any args before launching autotest
runer112 Mar 18, 2022
5817366
Fix bremu and brems
runer112 Mar 18, 2022
9564ec6
Add a missing section
runer112 Mar 18, 2022
0546bcf
Fix brems more
runer112 Mar 19, 2022
d156c4d
Probably fix llshrs_fast
runer112 Mar 19, 2022
1fd3f95
Implement llshru_fast
runer112 Mar 19, 2022
862fc4f
Fix lnot_fast
runer112 Mar 19, 2022
be8f4e4
Fix brems again (and bdivs)
runer112 Mar 19, 2022
66259f4
Fix brems, hopefully for good
runer112 Mar 19, 2022
dc5443b
Fix llshrs_fast and llshru_fast
runer112 Mar 19, 2022
0fdd2da
Optimize llshl slightly
runer112 Mar 19, 2022
1d9865b
Implement optimized sdivu, sdivs, sremu, and srems
runer112 Mar 19, 2022
77f9eeb
Fix sdivu and sdivs to preserve A
runer112 Mar 19, 2022
ffbe5bb
Fix llshrs_fast and llshru_fast for shift by zero
runer112 Mar 19, 2022
03e8e0a
Implement size-optimized bctlz, sctlz, ictlz, lctlz, and llctlz
runer112 Mar 20, 2022
0a443b2
Merge branch 'master' into opt-static-math
runer112 Mar 20, 2022
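
Many of the commits above touch bit-level helpers (`popcnt`, `bitrev`, `ctlz`) at several operand widths. As a rough aid for reading them, here is a minimal portable C++ sketch of slow reference models that an optimized routine could be compared against. None of this is code from the PR: the `ref_*` names are hypothetical, the mapping of prefixes to widths (`b` = 8, `s` = 16, `i` = 24, `l` = 32, `ll` = 64 bits) is an assumption, and so is the choice to return the operand width when counting leading zeros of zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference models, not taken from the PR.
// `bits` is the operand width (assumed to be 8, 16, 24, 32 or 64).

// Count the set bits in the low `bits` bits of x.
constexpr unsigned ref_popcnt(std::uint64_t x, unsigned bits)
{
    unsigned n = 0;
    for (unsigned i = 0; i < bits; ++i)
        n += static_cast<unsigned>((x >> i) & 1u);
    return n;
}

// Reverse the order of the low `bits` bits of x.
constexpr std::uint64_t ref_bitrev(std::uint64_t x, unsigned bits)
{
    std::uint64_t r = 0;
    for (unsigned i = 0; i < bits; ++i)
        r = (r << 1) | ((x >> i) & 1u);
    return r;
}

// Count leading zero bits inside a `bits`-wide operand.
// Assumption of this sketch: a zero operand yields `bits`.
constexpr unsigned ref_ctlz(std::uint64_t x, unsigned bits)
{
    unsigned n = 0;
    for (unsigned i = bits; i-- > 0;) {
        if ((x >> i) & 1u)
            break;
        ++n;
    }
    return n;
}

// Runtime property check: reversing twice gives back the (masked) input.
inline void check_bitrev_involution(std::uint64_t x, unsigned bits)
{
    const std::uint64_t mask = bits == 64 ? ~0ull : ((1ull << bits) - 1);
    assert(ref_bitrev(ref_bitrev(x, bits), bits) == (x & mask));
}

// Compile-time spot checks for a 24-bit operand.
static_assert(ref_popcnt(0xFFFFFF, 24) == 24, "all 24 bits set");
static_assert(ref_bitrev(0x000001, 24) == 0x800000, "bit 0 moves to bit 23");
static_assert(ref_ctlz(0x000001, 24) == 23, "only bit 0 set");
```

The loops are deliberately naive: a reference model only has to be obviously correct, since the optimized assembly is what gets shipped.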
51 changes: 27 additions & 24 deletions examples/standalone/cpp_tireal_pi_montecarlo/src/main.cpp
@@ -2,54 +2,57 @@
#include <tice.h>
#include <tireal.hpp>

#define ITER_MAX 15000

using namespace ti::literals;

static char buf[24] = "PI is about ";
#define buf_offset 12

int main(void)
{
int count = 0; /* points in the unit circle's first quadrant */

/* Clear the screen */
os_ClrHomeFull();

/* Set the random seed based off the real time clock */
srand(rtc_Time());

os_SetCursorPos(0, 0);
unsigned i = 0;
constexpr unsigned iMax = 10'000;
unsigned count = 0; /* points in the unit circle's first quadrant */

auto print = [&]()
{
const auto piApprox = ti::real(count) / i * 4;

char buf[24] = "PI is about ";
constexpr size_t bufOffset = 12;
piApprox.toCString(buf + bufOffset);

for (int i = 0; i < ITER_MAX; i++)
os_PutStrFull(buf);
os_NewLine();
};

while (++i <= iMax)
{
const ti::real x = ti::real(rand()) / RAND_MAX;
const ti::real y = ti::real(rand()) / RAND_MAX;
const ti::real z = x*x + y*y;
const auto x = ti::real(rand()) / RAND_MAX;
const auto y = ti::real(rand()) / RAND_MAX;
const auto z = x*x + y*y;
if (z <= 1)
{
count++;
}
if (i % 150 == 0) // Just to print some things along the way...
{
(ti::real(count) / ITER_MAX * 4).toCString(buf+buf_offset);
os_PutStrFull(buf);
os_NewLine();
}

if (os_GetCSC())
{
break;
}

if (i % 100 == 0)
{
print();
}
}

(ti::real(count) / ITER_MAX * 4).toCString(buf+buf_offset);
os_PutStrFull(buf);
os_NewLine();
print();

os_NewLine();
os_PutStrFull("Press any key to quit");
os_NewLine();

while (!os_GetCSC());

return 0;
}
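
One visible change in this diff is that the interim estimate is now computed as `ti::real(count) / i * 4`, dividing by the number of samples drawn so far instead of the fixed `ITER_MAX`. A minimal desktop sketch of the same running estimate follows. It is an illustration under assumptions, not the example's code: it uses `double` and `std::mt19937` in place of `ti::real` and `rand()`, and the `PiEstimator` type is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical desktop stand-in for the calculator example.
// Assumption: double replaces ti::real, std::mt19937 replaces rand().
struct PiEstimator {
    std::mt19937 rng;
    std::uniform_real_distribution<double> unit{0.0, 1.0};
    std::uint32_t samples = 0;  // points drawn so far
    std::uint32_t inside = 0;   // points inside the unit circle's first quadrant

    explicit PiEstimator(std::uint32_t seed) : rng(seed) {}

    // Draw one point in the unit square and record whether it falls
    // inside the quarter circle.
    void step()
    {
        const double x = unit(rng);
        const double y = unit(rng);
        if (x * x + y * y <= 1.0)
            ++inside;
        ++samples;
    }

    // Running estimate: divide by the samples drawn so far, so the value
    // is meaningful at any point, not only after the final iteration.
    double estimate() const
    {
        assert(samples > 0);
        return 4.0 * inside / samples;
    }
};
```

Calling `step()` in a loop and reading `estimate()` every hundred iterations mirrors the `print()` lambda in the new version of `main.cpp`.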
228 changes: 228 additions & 0 deletions examples/standalone/math_test/autotest.json
@@ -0,0 +1,228 @@
{
"transfer_files": [
"bin/DEMO.8xp"
],
"target": {
"name": "DEMO",
"isASM": true
},
"sequence": [
"key|clear",
"delay|1000",
"key|0",
"key|enter",
"action|launch",
"hashWait|not",
"key|enter",
"hashWait|neg",
"key|enter",
"hashWait|abs",
"key|enter",
"hashWait|bitrev",
"key|enter",
"hashWait|popcnt",
"key|enter",
"hashWait|and",
"key|enter",
"hashWait|or",
"key|enter",
"hashWait|xor",
"key|enter",
"hashWait|add",
"key|enter",
"hashWait|sub",
"key|enter",
"hashWait|shl",
"key|enter",
"hashWait|shru",
"key|enter",
"hashWait|shrs",
"key|enter",
"hashWait|mulu",
"key|enter",
"hashWait|divu",
"key|enter",
"hashWait|remu",
"key|enter",
"hashWait|divs",
"key|enter",
"hashWait|rems",
"key|enter",
"hashWait|div_q",
"key|enter",
"hashWait|div_r",
"key|enter",
"hashWait|done"
],
"hashes": {
"not": {
"description": "not",
"start": "vram_start",
"size": "vram_16_size",
"expected_CRCs": [
"0B3D9374"
]
},
"neg": {
"description": "neg",
"start": "vram_start",
"size": "vram_16_size",
"expected_CRCs": [
"10A84E1B"
]
},
"abs": {
"description": "abs",
"start": "vram_start",
"size": "vram_16_size",
"expected_CRCs": [
"D7807035"
]
},
"bitrev": {
"description": "bitrev",
"start": "vram_start",
"size": "vram_16_size",
"expected_CRCs": [
"BBADF82E"
]
},
"popcnt": {
"description": "popcnt",
"start": "vram_start",
"size": "vram_16_size",
"expected_CRCs": [
"62628040"
]
},
"and": {
"description": "and",
"start": "vram_start",
"size": "vram_16_size",
"expected_CRCs": [
"32DEF9D5"
]
},
"or": {
"description": "or",
"start": "vram_start",
"size": "vram_16_size",
"expected_CRCs": [
"4E247920"
]
},
"xor": {
"description": "xor",
"start": "vram_start",
"size": "vram_16_size",
"expected_CRCs": [
"257F7002"
]
},
"add": {
"description": "add",
"start": "vram_start",
"size": "vram_16_size",
"expected_CRCs": [
"CD575A4D"
]
},
"sub": {
"description": "sub",
"start": "vram_start",
"size": "vram_16_size",
"expected_CRCs": [
"EE1C5453"
]
},
"shl": {
"description": "shl",
"start": "vram_start",
"size": "vram_16_size",
"expected_CRCs": [
"FD8CA2EC"
]
},
"shru": {
"description": "shru",
"start": "vram_start",
"size": "vram_16_size",
"expected_CRCs": [
"9754CB4F"
]
},
"shrs": {
"description": "shrs",
"start": "vram_start",
"size": "vram_16_size",
"expected_CRCs": [
"AB5BE5CB"
]
},
"mulu": {
"description": "mulu",
"start": "vram_start",
"size": "vram_16_size",
"expected_CRCs": [
"C6A6FF7D"
]
},
"divu": {
"description": "divu",
"start": "vram_start",
"size": "vram_16_size",
"expected_CRCs": [
"F973C020"
]
},
"remu": {
"description": "remu",
"start": "vram_start",
"size": "vram_16_size",
"expected_CRCs": [
"5196C205"
]
},
"divs": {
"description": "divs",
"start": "vram_start",
"size": "vram_16_size",
"expected_CRCs": [
"2A13A348"
]
},
"rems": {
"description": "rems",
"start": "vram_start",
"size": "vram_16_size",
"expected_CRCs": [
"F45969EF"
]
},
"div_q": {
"description": "div_q",
"start": "vram_start",
"size": "vram_16_size",
"expected_CRCs": [
"EAFFC6D3"
]
},
"div_r": {
"description": "div_r",
"start": "vram_start",
"size": "vram_16_size",
"expected_CRCs": [
"B9786654"
]
},
"done": {
"description": "done",
"start": "vram_start",
"size": "vram_16_size",
"expected_CRCs": [
"101734A5",
"FFAF89BA"
]
}
}
}
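
Each `hashWait` step above pairs with an entry in `hashes` that names a memory region (`vram_start`, `vram_16_size`) and one or more `expected_CRCs`. The sketch below shows how such a check could work in principle. It assumes the standard reflected CRC-32 (polynomial `0xEDB88320`) and 8-digit uppercase hex formatting; whether the autotester uses exactly this variant is not stated in the source, and the `crc32_ieee` and `crc_hex` names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: bitwise CRC-32 over a byte buffer.
// Assumption: standard reflected CRC-32, initial value and final XOR 0xFFFFFFFF.
inline std::uint32_t crc32_ieee(const unsigned char* data, std::size_t len)
{
    assert(data != nullptr || len == 0);
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right; XOR in the polynomial when the dropped bit was 1.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// Format a CRC as 8 uppercase hex digits, the shape used in "expected_CRCs".
inline std::string crc_hex(std::uint32_t crc)
{
    char buf[9];
    std::snprintf(buf, sizeof buf, "%08lX", static_cast<unsigned long>(crc));
    return std::string(buf);
}
```

A test step would then pass if `crc_hex(crc32_ieee(region, size))` equals any string in that step's `expected_CRCs` list, which is why the `done` entry can carry two accepted values.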
21 changes: 21 additions & 0 deletions examples/standalone/math_test/makefile
@@ -0,0 +1,21 @@
# ----------------------------
# Makefile Options
# ----------------------------

NAME ?= DEMO
ICON ?= icon.png
DESCRIPTION ?= "CE C Toolchain Demo"
COMPRESSED ?= NO
ARCHIVED ?= NO
# STATIC_CRT ?= NO

CFLAGS ?= -Os -mllvm -inline-threshold=100 -Wall -Wextra
CXXFLAGS ?= -Os -mllvm -inline-threshold=100 -Wall -Wextra

# ----------------------------

ifndef CEDEV
$(error CEDEV environment path variable is not set)
endif

include $(CEDEV)/meta/makefile.mk
7 changes: 7 additions & 0 deletions examples/standalone/math_test/readme.md
@@ -0,0 +1,7 @@
### Math test

Exercises a bunch of math functions.

---

This demo is part of the CE C SDK Toolchain.
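
The test sequence covers `divs`, `rems`, `div_q` and `div_r`, and the commit history shows the signed 8-bit remainder being fixed several times. As a hedged illustration of what "correct" means there, the sketch below checks the properties that C and C++ require of signed division (truncation toward zero, remainder taking the sign of the dividend) exhaustively over 8-bit operands. It is not the test program from this PR; `DivResult`, `ref_divs`, `div_props_hold` and `count_failures_8bit` are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical quotient/remainder pair for this sketch.
struct DivResult {
    int quot;
    int rem;
};

// Reference signed division: C++ truncates toward zero.
inline DivResult ref_divs(int n, int d)
{
    assert(d != 0);
    return {n / d, n % d};
}

// Properties that pin down truncating division uniquely.
inline bool div_props_hold(int n, int d, int q, int r)
{
    if (q * d + r != n)
        return false;  // quotient and remainder must rebuild the dividend
    if (std::abs(r) >= std::abs(d))
        return false;  // remainder must be smaller in magnitude than the divisor
    if (r != 0 && ((r < 0) != (n < 0)))
        return false;  // a non-zero remainder takes the sign of the dividend
    return true;
}

// Run a candidate over every 8-bit dividend/divisor pair and count failures.
// The arithmetic is done in int, so -128 / -1 does not overflow here; what an
// 8-bit routine should return for that case is outside this sketch.
template <class F>
unsigned count_failures_8bit(F candidate)
{
    unsigned failures = 0;
    for (int n = -128; n <= 127; ++n) {
        for (int d = -128; d <= 127; ++d) {
            if (d == 0)
                continue;  // division by zero is undefined and skipped
            const DivResult got = candidate(n, d);
            if (!div_props_hold(n, d, got.quot, got.rem))
                ++failures;
        }
    }
    return failures;
}
```

For example, `-7 / 2` must give quotient `-3` and remainder `-1`; a floor-style result of `-4` and `1` rebuilds the dividend but fails the sign rule.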