Enhanced testing, added workflow caching, added lambda, fixed deployment-failing state_update.py bug #11

olsonadr · 2022-05-25T06:14:42Z

Merging feature branch "3-split-similarity-lambda-into-sub-lambdas" into main repo!

Major changes here are:

- Added virtual environment and npm caching to the Actions workflows, which drastically speeds up repeated work (cached per branch, keyed by versions and file hashes)
- Fixed an issue with state_update.py failing on long input text by truncating it to 2000 words (the max is 4096, but GitHub Actions runs out of memory; we thought optimizing that to raise this limit could be another PR)
- Expanded testing to ensure that the AWS architecture matches what is expected (the expected architecture template in tests/unit/testing_materials/expected_template.json must be modified before changing the architecture, hopefully enforcing test-based-dev)
- Renamed the lambda "similar" to "text-to-db-similar"
- Created a new lambda, "embed-to-db-similar" (using test-based-dev), that does the same as above but takes an embedding as input (potentially to be updated in an upcoming PR based on your issue API Does Not Return a Web Format #9)
- Refactored and cleaned up some code for readability, error handling, and documentation
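A minimal sketch of the kind of architecture check described above, assuming the stack synthesizes to a CloudFormation-style JSON template with a top-level "Resources" map (as expected_template.json presumably does). The helpers `resource_type_counts`, `diff_architecture`, and `check_against_expected` are hypothetical, not the actual test code; this version only compares how many resources of each type exist, so adding or removing a lambda without first updating the expected template would fail.

```python
import json
from collections import Counter
from pathlib import Path


def resource_type_counts(template: dict) -> Counter:
    """Count resources by their CloudFormation "Type" (e.g. AWS::Lambda::Function)."""
    resources = template.get("Resources", {})
    return Counter(res.get("Type", "<missing>") for res in resources.values())


def diff_architecture(actual: dict, expected: dict) -> dict:
    """Return {resource_type: (actual_count, expected_count)} for every mismatch."""
    actual_counts = resource_type_counts(actual)
    expected_counts = resource_type_counts(expected)
    return {
        rtype: (actual_counts[rtype], expected_counts[rtype])
        for rtype in sorted(set(actual_counts) | set(expected_counts))
        if actual_counts[rtype] != expected_counts[rtype]
    }


def check_against_expected(actual: dict, expected_path: Path) -> None:
    """Raise AssertionError if the synthesized template drifts from the expected one."""
    expected = json.loads(expected_path.read_text())
    mismatches = diff_architecture(actual, expected)
    assert not mismatches, (
        f"Architecture changed; update the expected template first: {mismatches}"
    )
```

A count-by-type comparison is deliberately loose: it ignores details like asset hashes that can change whenever lambda code changes, while still catching added or removed resources. A stricter test could also compare selected resource properties.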

Next steps are:

- Add 3 lambdas (I wanted to start with what we have and get things to you fast, and I'll refocus on these next two tonight or tomorrow morning):
  - generate an embedding from input text
  - get the cosine similarity between two embeddings directly
  - get the cosine similarity between two input texts directly (generating embeddings under the hood)
- Potentially switch up the embedding format in API responses
- Work to raise the truncation word limit (2000 -> 4096)
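The planned cosine-similarity lambdas would boil down to a computation like the one below. This is a sketch assuming embeddings arrive as plain lists of floats of equal length; `cosine_similarity` is a hypothetical helper name, and the text-to-text variant would first generate both embeddings and then call it.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError(f"Embedding lengths differ: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle (and thus the similarity) is undefined for a zero vector.
        raise ValueError("Cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

For example, `cosine_similarity([3, 4], [3, 4])` returns 1.0 and `cosine_similarity([1, 0], [0, 1])` returns 0.0.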

As a warning, the first test runs may take some time since there are more lambdas and more tests, and the first deployment takes a very long time (~30 mins) because it has to delete the infrastructure associated with the "similar" lambda and replace it with 2 new lambdas.

…bdas into 5-testing-suite

Co-authored-by: Nick Olson <[email protected]>

…bdas into 5-testing-suite

Co-authored-by: Nick Olson <[email protected]>

…bdas into 5-testing-suite

…ting in similar.py. Updated state.csv, and renamed old one for testing purposes.

…y, cleaned up testrun.py and formatted apitest.py

Co-authored-by: Nick Olson <[email protected]>

- GET request query strings were wrapped in multiple quotes, so we now literal_eval up to 10 times
- GET requests were being passed with { \" } in the URL instead of { " } around the text/embed parameter

Also added...
- Added descriptive exceptions for bad statusCodes in API testing
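The literal_eval fix above can be sketched roughly as follows, assuming the query-string value arrives as a Python string that may be wrapped in several layers of quotes. `unwrap_query_param` and `MAX_UNWRAP_DEPTH` are hypothetical names, not the code in this PR; the loop stops early once the value is no longer a string or no longer parses as a literal.

```python
import ast
from typing import Any

MAX_UNWRAP_DEPTH = 10  # mirrors the "up to 10 times" limit described above


def unwrap_query_param(raw: str, max_depth: int = MAX_UNWRAP_DEPTH) -> Any:
    """Peel repeated layers of quoting off a GET query-string value.

    Each successful ast.literal_eval call strips one layer of quotes; once the
    quotes are gone, one more call turns e.g. "[0.1, 0.2]" into a list.
    """
    value: Any = raw
    for _ in range(max_depth):
        if not isinstance(value, str):
            break  # already a list/number/etc., nothing left to unwrap
        try:
            value = ast.literal_eval(value)
        except (ValueError, SyntaxError, TypeError):
            # Plain text like "hello world" is not a Python literal; keep it as-is.
            break
    return value
```

One caveat with this approach: text that happens to be a valid literal (e.g. "42") is converted too, so callers may want to check the resulting type against what the parameter should hold.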

… test

…ine model max length, and thus truncate abnormally long tensors of tokens, solving an indexing error encountered while updating incident 195. Removed manual truncation in similar.py. Refactored the names of a couple constants in similar.py.
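A rough sketch of the truncation idea in the commit above, assuming the input has already been tokenized into a list of token ids and that the model's max length is known (Hugging Face tokenizers expose it as `model_max_length` and can also truncate directly via `truncation=True, max_length=...`). `truncate_token_ids` and its optional `eos_token_id` handling are hypothetical, not the code in similar.py.

```python
from typing import List, Optional


def truncate_token_ids(
    token_ids: List[int],
    model_max_length: int,
    eos_token_id: Optional[int] = None,
) -> List[int]:
    """Clip a token-id sequence so it never exceeds the model's max input length.

    If eos_token_id is given and the sequence is clipped, the final slot is
    reserved for it so the truncated input still ends with that token.
    """
    if model_max_length <= 0:
        raise ValueError("model_max_length must be positive")
    if len(token_ids) <= model_max_length:
        return list(token_ids)  # short enough already; return an unchanged copy
    if eos_token_id is None:
        return token_ids[:model_max_length]
    return token_ids[: model_max_length - 1] + [eos_token_id]
```

Sequences longer than a model's position-embedding table are a common source of index errors in transformer models, which is presumably why clipping to the max length resolved the failure seen on incident 195.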

…te-abnormally-long-tokens

Truncate abnormally long tokens

olsonadr · 2022-05-25T06:22:43Z

I'm not sure why testing failed immediately on AWS credentials here. Do you expect this?

smcgregor · 2022-05-25T06:25:08Z

Pull requests likely are not getting the environment variables since that is a security risk. Should I squash/merge this to main and we sort it out there?

olsonadr · 2022-05-25T06:28:50Z

misc_utilities/create_embedding.py

@@ -0,0 +1,154 @@
+# Imports


This file is mostly used for dev and scratch work; it generated the embeddings used in the example embedding invokes in the testing_materials folder.

olsonadr · 2022-05-25T06:29:46Z