Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check how the application works with postgresql #267

Open
PonteIneptique opened this issue Oct 5, 2021 · 26 comments
Open

Check how the application works with postgresql #267

PonteIneptique opened this issue Oct 5, 2021 · 26 comments
Assignees

Comments

@PonteIneptique
Copy link
Member

Is your feature request related to a problem? Please describe.

The current app shows limits with SQLite: it's slow, and does not benefit from a fast SQL engine when having different users doing multiple writes.

Describe the solution you'd like

  • Check that tests pass with postgresql
  • Identify or fix the issues raised by tests with postgresql
  • Document the issues :)
@FrFerry
Copy link
Contributor

FrFerry commented Mar 1, 2022

Here is what to do to load a sqlite dump in a postgresql database:

  • First, delete some inconsistencies in the dump by open it with sqlite and use these commands:
DELETE FROM word_token WHERE corpus IN (SELECT DISTINCT wt.corpus FROM word_token as wt LEFT OUTER JOIN corpus as c ON c.id = wt.corpus WHERE c.id is NULL);

DELETE FROM token_history WHERE word_token_id IN (SELECT DISTINCT th.word_token_id FROM token_history as th LEFT OUTER JOIN word_token as wt ON wt.id = th.corpus WHERE wt.id is NULL);

DELETE FROM token_history WHERE corpus IN (SELECT DISTINCT th.corpus FROM token_history as th LEFT OUTER JOIN corpus as c ON c.id = th.corpus WHERE c.id is NULL);

DELETE FROM bookmark WHERE corpus_id IN (SELECT DISTINCT b.corpus_id FROM bookmark as b LEFT OUTER JOIN corpus as c ON c.id = b.corpus_id WHERE c.id is NULL);

DELETE FROM bookmark WHERE token_id IN (SELECT DISTINCT b.token_id FROM bookmark as b LEFT OUTER JOIN word_token as wt ON wt.id = b.token_id WHERE wt.id is NULL);

DELETE FROM favorite WHERE corpus_id IN (SELECT DISTINCT f.corpus_id FROM favorite as f LEFT OUTER JOIN corpus as c ON c.id = f.corpus_id WHERE c.id is NULL);

DELETE FROM corpus_custom_dictionary WHERE corpus IN (SELECT DISTINCT ccd.corpus FROM corpus_custom_dictionary as ccd LEFT OUTER JOIN corpus as c ON c.id = ccd.corpus WHERE c.id is NULL);
  • Then, create an empty postgresql database
  • initialize it with python manage.py db-create
  • install pgloader (https://github.com/dimitri/pgloader)
  • modify the file pyrrha_pgload.txt with information about source and target databases (the file is here:
    pyrrha_pgload.txt )
    d the file with this comment)
  • run the command pgloader pyrrha_pgload.txt
  • check if there is no error. The command should print some warnings, because postgresql have more constraints than sqlite

(I added .txt in file name only because github asked it, it's only a file with a command in it)

@MrGecko MrGecko self-assigned this Mar 1, 2022
@MrGecko
Copy link
Contributor

MrGecko commented Mar 28, 2022

@FrFerry I have only one error during the pgloader moment:

2022-03-28T13:09:13.673334+02:00 ERROR Database error 22001: valeur trop longue pour le type character varying(64)
CONTEXT: COPY corpus, ligne 401, colonne name : « Extrait des Lunettes de Princes (ca 1461-1465) de Jean Meschinot, d’après le manuscript BnF, fr. ... »

Result: the corpus table is empty.

Yet I can create fresh new corpora and manage them as will, so nothing apparently broke atm.

@FrFerry
Copy link
Contributor

FrFerry commented Mar 30, 2022

@FrFerry I have only one error during the pgloader moment:

2022-03-28T13:09:13.673334+02:00 ERROR Database error 22001: valeur trop longue pour le type character varying(64)
CONTEXT: COPY corpus, ligne 401, colonne name : « Extrait des Lunettes de Princes (ca 1461-1465) de Jean Meschinot, d’après le manuscript BnF, fr. ... »

Result: the corpus table is empty.

Yet I can create fresh new corpora and manage them as will, so nothing apparently broke atm.

Well ... indeed we have more than 64 characters. These kind of restriction wasn't check by sqlite ?

In my opinion we have 2 solutions here:

  • manually change too long titles;
  • remove/change the limit.

What do you think ?

@PonteIneptique
Copy link
Member Author

We should remove the limit, or set it to a higher count, such as 254

@FrFerry
Copy link
Contributor

FrFerry commented Mar 31, 2022

We should remove the limit, or set it to a higher count, such as 254

I pushed the modification on the pull request.

@MrGecko
Copy link
Contributor

MrGecko commented Apr 4, 2022

Even though it now seems to be working, I would not merge the PR before the tests are fixed (too many failures related to sql constraints) or investigated. Might be discussed/worked in an other issue though.

@carinedengler
Copy link
Contributor

Hi, I'm taking over for @FrFerry for this ticket.

@FrFerry told me that the tests are mainly broken because of constraints that SQLite did not check but PostgreSQL does, but that these are not related to the behavior of the application. He proposed to not migrate the tests to PostgreSQL and instead to continue to use SQLite for the tests. What is your position on this?

@carinedengler
Copy link
Contributor

I also opened a new PR with @FrFerry 's commits and my own changes to continue working on this ticket, could you close #274 to avoid confusion?

@carinedengler
Copy link
Contributor

I am currently trying to install the application using PostgreSQL - however, I cannot find the information on which version we have agreed on (if any) ? I'd go for PostgreSQL 14 unless you have some restrictions on your side ?

@PonteIneptique
Copy link
Member Author

Please go for it. AFAIK, we should have no issues with PSQL 14 and I am not sure that there would be huge breaking change with 13 :)

@MrGecko
Copy link
Contributor

MrGecko commented May 3, 2022

I've done my tests with the v13.6 so far and its ok but please, proceed with v14 :)

@carinedengler
Copy link
Contributor

OK, thanks for your reply!

@carinedengler
Copy link
Contributor

Just to clarify this point, when I run the tests on #280 they fail

  • when running with SQLite
  • and when running with PostgreSQL

when you say the tests need to be fixed, do you mean the tests running using SQLite? Do you want us to port the tests as well to PostgreSQL?

@PonteIneptique
Copy link
Member Author

Hi @carinedengler, you should really check the current version of the dev branch, as it includes fixes for the current sqlite-only version of the application. Some tests were known for having a chance of failing (80% of the time), this seems to have fixed it. This also raises requirements version for security reasons.

@carinedengler
Copy link
Contributor

I checked the dev branch out this morning and rebased the postgresql branch on it, the 3 failing tests (using SQLite) I have are related to the changes made for the PostgreSQL port. The question is therefore whether you would want me to fix the tests so they can be run on SQLite after the port to PostgreSQL or whether you would want me to port the SQLite tests to PostgreSQL ?

@PonteIneptique
Copy link
Member Author

The change I am talking about seems to have been merged after your last commits ( #281 ), are you sure you are up to date ? I pinged you on the PR with the current PR fixing those tests.

@carinedengler
Copy link
Contributor

carinedengler commented May 3, 2022

I must have fetched/rebased before you committed your latest changes, I should be up to date now

  • I have less failing tests with PostgreSQL, still a few though
  • still the same failing tests for SQLite though (which are related to the PostgreSQL port)

should I correct the tests for SQLite or for PostgreSQL ? or both ?

@PonteIneptique
Copy link
Member Author

should I correct the tests for SQLite or for PostgreSQL ? or both ?

The application should run on both SQLite and PostgreSQL, as it is still used locally by some researchers.

CI Tests should also actually have a version for SQLite and one for PostgreSQL

@carinedengler
Copy link
Contributor

OK, thanks for the clarification

@carinedengler
Copy link
Contributor

when creating a new development instance with python manage.py run what should be the default DBMS used ?

@PonteIneptique
Copy link
Member Author

I think the default in dev mode should be SQLite, but we could have a new dev settings as Dev-Postgresql to facilitate work with this one

@carinedengler
Copy link
Contributor

carinedengler commented May 3, 2022

I think the default in dev mode should be SQLite, but we could have a new dev settings as Dev-Postgresql to facilitate work with this one

Could you give an example how you would use that? Since db.engine.dialect.name can be used to retrieve the DBMS currently in use, and DEV_DATABASE_URL allows for passing both SQLite and PostgreSQL accesses, it seems to me that introducing another setting is not necessary.

@PonteIneptique
Copy link
Member Author

I think I would simply have a new config object in the config.py file that could be used from with the manage CLI

@PonteIneptique
Copy link
Member Author

So I'd go with the information retrieval you talk about and the DATABASE URI parameter, yes :)

@carinedengler
Copy link
Contributor

OK, I get your idea now 👍

@carinedengler
Copy link
Contributor

the tests should now be fixed for SQLite - I won't have time before the end of the week (maybe even start next week) to have a look at the PostgreSQL tests, but introducing a switch for the tests for SQLite or PostgreSQL and fixing the PostgreSQL tests will be the next items on the list

PonteIneptique added a commit that referenced this issue Jan 9, 2024
* feat!: change the code to support postgresql instead of sqlite

* increase corpus name size

* feat: automatically create PostgreSQL database when running manage.py db-create

* fix: fix psycopg2-binary version

* feat: use switch for DBMS-specific calls

* fix: keep naming schema from SQLite development database

* fix: adapt validation to new field size

* fix: use dynamic list length instead of hard-coded value

* fix: fix incorrect number of tokens in new corpus

* feat: add separate PostgreSQL test config

* refactor: use id attribute instead of hard-coded ID

* feat: create test database if it doesn't exist

* fix: PostgreSQL does not autoincrement if fixed id is given

* fix: PostgreSQL's unique constraint message is different from SQLite's

* fix: in PostgreSQL, all fields must be part of GROUP BY clause

* ci: add PostgreSQL tests to CI

* fix: close connection to PostgreSQL database in tearDown

* fix: create database if it does not exist

---------

Co-authored-by: François Ferry <[email protected]>
Co-authored-by: Carine Dengler <[email protected]>
Co-authored-by: Thibault Clérice <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants