Tf 214 #220

Open
wants to merge 140 commits into
base: master


@Ergodice Ergodice commented Nov 8, 2023

No description provided.

Arcturai and others added 30 commits February 21, 2022 15:22
dense function combines dydense/dyrelu/linearscaling/gating into one function
Add logit gating, dense_layer, stop file, make dyrelu slopes/intercepts trainable
Weight gen (simple_gen) generates attention weights from each square by compressing then doing a batched dense to 64; buckets divides the training data based on material left.
Dytalking heads at this stage dynamically generates the projection matrices for the attention weights (same for all square pairs). Fixed set_visible_devices error by initializing tensorflow first in TFProcess and making DyDense temperature an instance attribute.
DyDense layers had issue saving sublayers so the design approach of the squeeze-excite layers is used, i.e., sublayers are moved outside into a function.
Fullgen compresses the tokens and combines them to extract global information into attention weights.
Dytalking heads adds residual to matrix specifying linear transformation
Removed old modules which were not useful including yaml references, also removed legacy resnet code. Added arc's encoding with option in yaml and also added example.yaml
update Readme to describe talking heads, fullgen, and dynamic kernel methods. Removed leelalogs and configs, except for example.yaml.
Fixed typo in tfprocess and Readme, made some config stuff optional, fixed arc encoding, removed fullgen bias
Added search_loss, which is one over the prediction for the best move, and confident_accuracy, which is the accuracy for positions where there is a clear best move. Removed simple gating. Also updated README to include info on auxiliary losses.
Smolgen is a more efficient version of fullgen; also added square relu, which adds 0.5% pol acc
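
For reference, the two auxiliary metrics named in the commits above (search_loss and confident_accuracy) can be sketched in plain Python. This is an illustrative sketch only, not the code in this PR: the function signatures, the `eps` guard, and the `threshold` rule for what counts as a "clear best move" are all assumptions, since the commit message does not define them.

```python
from typing import Optional, Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest entry (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def search_loss(policy: Sequence[float], best_move: int, eps: float = 1e-9) -> float:
    """One over the predicted probability of the best move.

    `eps` is an assumed guard against division by zero.
    """
    return 1.0 / max(policy[best_move], eps)


def confident_accuracy(
    policies: Sequence[Sequence[float]],
    targets: Sequence[Sequence[float]],
    threshold: float = 0.9,  # ASSUMPTION: hypothetical cutoff for a "clear" best move
) -> Optional[float]:
    """Accuracy over positions whose target has a clear best move.

    A position counts as "clear" here when the target's top probability
    is at least `threshold`. Returns None if no position qualifies.
    """
    hits = 0
    total = 0
    for policy, target in zip(policies, targets):
        best = argmax(target)
        if target[best] < threshold:
            continue  # no clear best move, skip this position
        total += 1
        hits += int(argmax(policy) == best)
    return hits / total if total else None
```

With these definitions, a policy that puts probability 0.5 on the best move has a search_loss of 2.0, and the loss grows as the prediction for the best move shrinks.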
4 participants