move postUpload logs to user's uploadDir #1318


kencho51
Contributor

@kencho51 kencho51 commented Jul 3, 2023

Pull request for issues #1292 and #1317

This is a pull request for the following functionalities:

To fix #1292, this PR moves all the logs produced by postUpload.sh from /home/centos/uploadLogs to /home/$user/uploadDir, so that $user can access the logs in their own directory.

To fix #1317, this PR will remove all logos of BGI and CNGB from the gigadb-website public pages.

How to test?

To access the postUpload logs

Spin up staging gigadb-website

% cd /gigadb-website/ops/infrastructure/envs/staging
% ../../../scripts/tf_init.sh --project gigascience/forks/kencho-gigadb-website --env staging
% terraform apply
% terraform refresh
% ../../../scripts/ansible_init.sh --env staging
% TF_KEY_NAME=private_ip ansible-playbook -i ../../inventories webapp_playbook.yml -v
% ansible-playbook -i ../../inventories bastion_playbook.yml -e "backup=latest" -v
# Complete gitlab pipeline, make sure all staging jobs have been deployed.

Create a user account in the bastion server

% cd /gigadb-website/ops/infrastructure/envs/staging
% ansible-playbook -i ../../inventories users_playbook.yml -e "newuser=lily"
% chmod 500 output/privkeys-$bastion-ip/lily
% ls -Al output/privkeys-$bastion-ip
total 8
-r-x------@ 1 kencho  staff  3357 Jul  3 14:39 lily

Execute the datasetUpload.sh on normal spreadsheet

% scp -i /path/to/envs/staging/output/privkeys-$bastion-ip/lily /path/to/100679newversion.xls lily@$bastion-ip:/home/lily/uploadDir
% ssh -i /path/to/envs/staging/output/privkeys-$bastion-ip/lily lily@$bastion-ip
[lily@ip-10-99-0-183 ~]$ ls
uploadDir
[lily@ip-10-99-0-183 ~]$ ls uploadDir/
100679newversion.xls
[lily@ip-10-99-0-183 ~]$ sudo /home/centos/datasetUpload.sh
DROP TRIGGER
DROP TRIGGER
DROP TRIGGER
CREATE TRIGGER
CREATE TRIGGER
CREATE TRIGGER
Done.
[lily@ip-10-99-0-183 ~]$ ls -l uploadDir/
-rw-r--r--. 1 lily lily 18118 Jul 21 03:16 java.log
-rw-r--r--. 1 lily lily     0 Jul 21 03:16 javac.log

Execute the datasetUpload.sh on faulty spreadsheet

# download the faulty spreadsheet from here: https://github.com/gigascience/gigadb-website/files/12097170/GigaDB_v15-DRR202202-01_102239.xls
# upload faulty spreadsheet
% scp -i  /path/to/envs/staging/output/privkeys-$bastion-ip/lily /path/to/GigaDB_v15-DRR202202-01_102239.xls lily@$bastion-ip:/home/lily/uploadDir
% ssh -i /path/to/envs/staging/output/privkeys-$bastion-ip/lily lily@$bastion-ip
[lily@ip-10-99-0-183 ~]$ ls uploadDir/
GigaDB_v15-DRR202202-01_102239.xls
[lily@ip-10-99-0-183 ~]$ sudo /home/centos/datasetUpload.sh
DROP TRIGGER
DROP TRIGGER
DROP TRIGGER
CREATE TRIGGER
CREATE TRIGGER
CREATE TRIGGER
Spreadsheet cannot not be uploaded, please check the logs!
Done.
[lily@ip-10-99-0-183 ~]$ ls -l uploadDir/
total 324
-rw-r--r--. 1 lily lily 310784 Jul 21 03:02 GigaDB_v15-DRR202202-01_102239.xls
-rw-r--r--. 1 lily lily  18132 Jul 21 03:11 java.log
-rw-r--r--. 1 lily lily      0 Jul 21 03:11 javac.log

Execute postUpload.sh

[lily@ip-10-99-0-72 ~]$ sudo /home/centos/postUpload.sh 100679
...
[License]
All files and data are distributed under the CC0 1.0 Universal (CC0 1.0) Public
Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/), unless
specifically stated otherwise, see http://gigadb.org/site/term for more details.

[Comments]

[End]

Done with creating the README file for 100679. The README file is saved in file: /home/centos/uploadLogs/readme-100679.txt

All postUpload logs have been moved to: /home/lily/uploadDir

PostUpload jobs done!

[lily@ip-10-99-0-72 ~]$ ls uploadDir/
invalid-urls-100679.txt  java.log  javac.log  readme-100679.txt  updating-file-size-100679.txt  updating-md5checksum-100679.txt
[lily@ip-10-99-0-72 ~]$ 

To check that the BGI and CNGB logos have been removed from the footer of the gigadb website pages, go to:

  1. http://gigadb.gigasciencejournal.com:9170/site/index
  2. http://gigadb.gigasciencejournal.com:9170/site/faq
  3. http://gigadb.gigasciencejournal.com:9170/dataset/100006
  4. https://jobs.gigasciencejournal.com/

To check that the BGI and CNGB logos have been removed from the footer of the gigadb job pages, go to:

  1. https://jobs.gigasciencejournal.com/
  2. https://jobs.gigasciencejournal.com/jobs/tech/senior_software_engineer.html
  3. https://jobs.gigasciencejournal.com/jobs/tech/software_engineer.html
  4. https://jobs.gigasciencejournal.com/jobs/tech/freelance.html

The PR is here: gigascience/gigascience.github.io#7.

How have functionalities been implemented?

Before the fix, the log files produced by postUpload.sh were left in /home/centos/uploadLogs after execution, a directory that $user has no permission to access.
By moving all the logs to /home/$user/uploadDir/ during the postUpload.sh run, each user can now get hold of their own postUpload logs.
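
To illustrate the idea, here is a minimal sketch of the move step. It is not the code in postUpload.sh: the function name move_post_upload_logs and its three arguments are hypothetical, and it assumes every regular file in the log directory should go to the user.

```shell
# Hypothetical helper, not the actual postUpload.sh code.
# Usage: move_post_upload_logs <log_dir> <user_upload_dir> <owner>
# Moves every regular file in <log_dir> into <user_upload_dir> and hands
# ownership to <owner> so that the user can read the logs.
move_post_upload_logs() {
    log_dir=$1
    user_upload_dir=$2
    owner=$3

    mkdir -p "$user_upload_dir" || return 1
    for log_file in "$log_dir"/*; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -f "$log_file" ] || continue
        mv "$log_file" "$user_upload_dir"/ || return 1
        chown "$owner" "$user_upload_dir/$(basename "$log_file")" || return 1
    done
    echo "All postUpload logs have been moved to: $user_upload_dir"
}
```

On the bastion this would be called with something like move_post_upload_logs /home/centos/uploadLogs /home/lily/uploadDir lily, run with enough privileges to change file ownership.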

Any issues with implementation?

Describe any problems with your implementation

Any changes to automated tests?

None.

Any changes to documentation?

None.

Any technical debt repayment?

The datasetUpload.sh script takes the xls file from uploadDir and ingests the data into the database, and only java.log and javac.log are produced in the uploadLogs dir. The mv /home/centos/uploadDir/* $uploadDir/ on line 28 therefore seems to be redundant, so it is removed.

Any improvements to CI/CD pipeline?

Describe any improvements to the Gitlab pipeline

@kencho51 kencho51 force-pushed the SprintTask-1292-update-permission-in-postupload branch from f754689 to de26c09 Compare July 4, 2023 08:47
@kencho51 kencho51 requested review from rija and pli888 July 4, 2023 09:00
@kencho51 kencho51 marked this pull request as ready for review July 4, 2023 09:00
@kencho51 kencho51 force-pushed the SprintTask-1292-update-permission-in-postupload branch from de26c09 to 89238ac Compare July 6, 2023 03:12
@pli888 pli888 added the Peter label Jul 10, 2023
Member

@pli888 pli888 left a comment

praise: Log files are moved to the $user's directory after spreadsheet upload has finished execution.
issue: In a clean staging directory when instantiating my staging environment, I got an error: An argument named create_random_password is not expected here in rds-instance.tf line 33, in module "db". This is probably caused by my environment using the latest Terraform AWS modules. The fix is to replace the out of date create_random_password variable with manage_master_user_password = false.
issue: When running sudo /home/centos/datasetUpload.sh to process xls spreadsheet, I get 3 errors in java.log:

[lily@ip-10-99-0-221 uploadDir]$ sudo /home/centos/datasetUpload.sh
[lily@ip-10-99-0-221 uploadDir]$ more java.log 
java.io.FileNotFoundException: time.txt (Permission denied)
org.postgresql.util.PSQLException: The authentication type 10 is not supported. Check that you have configured the pg_hba.conf 
file to include the client's IP address or subnet, and that it is using an authentication scheme supported by the driver.
        at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:403)
        at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:108)
        at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:66)
        at org.postgresql.jdbc2.AbstractJdbc2Connection.<init>(AbstractJdbc2Connection.java:125)
        at org.postgresql.jdbc3.AbstractJdbc3Connection.<init>(AbstractJdbc3Connection.java:30)
        at org.postgresql.jdbc3g.AbstractJdbc3gConnection.<init>(AbstractJdbc3gConnection.java:22)
        at org.postgresql.jdbc4.AbstractJdbc4Connection.<init>(AbstractJdbc4Connection.java:32)
        at org.postgresql.jdbc4.Jdbc4Connection.<init>(Jdbc4Connection.java:24)
        at org.postgresql.Driver.makeConnection(Driver.java:393)
        at org.postgresql.Driver.connect(Driver.java:267)
        at java.sql.DriverManager.getConnection(DriverManager.java:664)
        at java.sql.DriverManager.getConnection(DriverManager.java:247)
        at Database.<init>(Database.java:32)
        at Validation.<init>(Validation.java:80)
        at Main.processExcel(Main.java:121)
        at Main.main(Main.java:168)
**Begin: file 1 : GigaDBUpload_102203_GIGA-D-21-00197_hamster.xls in process...
java.io.FileNotFoundException: /tool/uploadDir/GigaDBUpload_102203_GIGA-D-21-00197_hamster.xls (Permission denied)
        at java.io.RandomAccessFile.open0(Native Method)
        at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
        at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:163)
        at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:145)
        at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:87)
        at Excel2Database.<init>(Excel2Database.java:134)
        at Main.processExcel2Database(Main.java:43)
        at Main.processExcel(Main.java:130)

issue: java.io.FileNotFoundException: /tool/uploadDir/GigaDBUpload_102203_GIGA-D-21-00197_hamster.xls (Permission denied) is probably caused by the spreadsheet xls file in /home/centos/uploadDir still belonging to lily user:

[centos@ip-10-99-0-221 ~]$ ls -lh uploadDir/
total 72K
-rwx------. 1 lily lily 72K Jul 10 06:36 GigaDBUpload_102203_GIGA-D-21-00197_hamster.xls

Could be fixed by adding chown centos:centos /home/centos/uploadDir/* after mv $uploadDir/* /home/centos/uploadDir/ in execute.sh.
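
A rough sketch of that mv-then-chown step is below. The function name stage_spreadsheets and its arguments are made up for illustration; the real execute.sh may be laid out differently.

```shell
# Hypothetical sketch of the suggested fix, not the actual execute.sh.
# Usage: stage_spreadsheets <user_upload_dir> <service_upload_dir> <owner>
# Moves each .xls file into the service account's upload directory, then
# changes its owner so the upload tool is allowed to open it.
stage_spreadsheets() {
    src_dir=$1
    dest_dir=$2
    owner=$3   # for example centos:centos

    for sheet in "$src_dir"/*.xls; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -f "$sheet" ] || continue
        mv "$sheet" "$dest_dir"/ || return 1
        chown "$owner" "$dest_dir/$(basename "$sheet")" || return 1
    done
    return 0
}
```

For the case above, the call would look like stage_spreadsheets /home/lily/uploadDir /home/centos/uploadDir centos:centos.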

issue: The org.postgresql.util.PSQLException: The authentication type 10 is not supported. Check that you have configured the pg_hba.conf error is probably caused by the ExceltoGigaDB tool requiring its lib/postgresql-9.1-901.jdbc4.jar to be replaced with a more recent jdbc driver. I managed to get spreadsheet upload working with RDS PostgreSQL 14 using postgresql-42.6.0.jar which I downloaded from https://jdbc.postgresql.org/download/ and selecting Java 8, 42.6.0 since we are using Java 8 in the excel upload tool container.
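
As a hedged sketch, swapping the driver in the tool's lib directory could look like this. The function name replace_jdbc_driver is hypothetical, the new jar is assumed to be downloaded already, and any classpath entry that names the old jar explicitly would still need updating by hand.

```shell
# Hypothetical helper for swapping the JDBC driver jar.
# Usage: replace_jdbc_driver <lib_dir> <old_jar_name> <new_jar_path>
# Copies the new driver into <lib_dir> first and only then removes the old
# one, so a missing download never leaves the tool without a driver.
replace_jdbc_driver() {
    lib_dir=$1
    old_jar=$2
    new_jar=$3

    if [ ! -f "$new_jar" ]; then
        echo "New driver not found: $new_jar" >&2
        return 1
    fi
    cp "$new_jar" "$lib_dir"/ || return 1
    # Do not delete the jar that was just copied if both share a name.
    if [ "$old_jar" != "$(basename "$new_jar")" ]; then
        rm -f "$lib_dir/$old_jar"
    fi
    return 0
}
```

Example call: replace_jdbc_driver lib postgresql-9.1-901.jdbc4.jar /path/to/postgresql-42.6.0.jar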

praise: The postUpload.sh script seems to be working fine:

[lily@ip-10-99-0-172 ~]$ sudo /home/centos/postUpload.sh 102203
[License]
All files and data are distributed under the CC0 1.0 Universal (CC0 1.0) Public
Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/), unless
specifically stated otherwise, see http://gigadb.org/site/term for more details.

[Comments]

[End]

Done with creating the README file for 102203. The README file is saved in file: /home/centos/uploadLogs/readme-102203.txt

All postUpload logs have been moved to: /home/lily/uploadDir

PostUpload jobs done!
[lily@ip-10-99-0-172 ~]$ ls uploadDir/
invalid-urls-102203.txt  javac.log          updating-file-size-102203.txt
java.log                 readme-102203.txt  updating-md5checksum-102203.txt

@pli888 pli888 removed the Peter label Jul 11, 2023
@rija
Contributor

rija commented Jul 17, 2023

Hi @kencho51

relevant email from @only1chunts:

Hi Peter, Rija, Ken,

Mary Ann and I have been looking at using the upload and post-upload scripts on beta. She uses a Mac and I use a PC (no idea if that's relevant, just letting you know in case it is).

Both scripts run for both of us, so that's a good start.

The post-upload script appears to run, but behaves differently for each of us!
I ran the scripts on dataset 102422. The post-upload script said that it made 0 changes in the database and could find 0 files on the server (the files are in the correct place on the CNGB server). It then produced a readme that it printed to the screen and said it had saved it somewhere.
I can't see that somewhere (which I think was the old issue that you've been working on).

Mary Ann ran the scripts on dataset 102420. It DID find the files on the server, it did write stuff to the database, and it printed the readme to screen and said it saved it somewhere.
Mary Ann cannot access her readme file, but oddly she can see readme_102422.txt (mine) in the Upload directory!?! whereas I still can't see that.

I think it's all permissions issues, including not being able to read stuff saved in my name on the CNGB server, but figured it might be useful information for you guys to have.

Let us know if you need more details or want us to try anything out.

Cheers
Chris


@kencho51
Contributor Author

Hi @pli888,

All the issues have been addressed.
The PR gigascience/ExceltoGigaDB#4 provides the fix for the PostgreSQL JDBC driver issue.
Please have a look again.

Contributor

@rija rija left a comment

Hi @kencho51

This is not a complete review yet, but I wanted to put this out first, before I continue tomorrow, in order to save time.

praise: code change looks good at first glance
praise: logo changes look fine
issue: change to rds-instance.tf is not complete and caused an error

$ terraform version 
Terraform v1.5.3
on darwin_arm64
+ provider registry.terraform.io/hashicorp/aws v4.18.0
+ provider registry.terraform.io/hashicorp/external v2.2.2
+ provider registry.terraform.io/hashicorp/random v3.3.1
$ ../../../scripts/tf_init.sh --project gigascience/forks/rija-gigadb-website --env staging 
You need to specify an AWS region: eu-west-3

Initializing the backend...
Initializing modules...

Initializing provider plugins...
- Reusing previous version of hashicorp/aws from the dependency lock file
- Reusing previous version of hashicorp/random from the dependency lock file
- Reusing previous version of hashicorp/external from the dependency lock file
- Using previously-installed hashicorp/aws v4.18.0
- Using previously-installed hashicorp/random v3.3.1
- Using previously-installed hashicorp/external v2.2.2

Terraform has been successfully initialized!

 $ terraform plan  
╷
│ Error: Unsupported argument
│ 
│   on ../../modules/rds-instance/rds-instance.tf line 33, in module "db":
│   33:   manage_master_user_password = false
│ 
│ An argument named "manage_master_user_password" is not expected here.

suggestion: It's better to revert the change made to rds-instance.tf.

That work is already part of @pli888's PR #1316, and if you look at that PR, you will see it's not just that line that needs to change; there are other changes to that file that need to be made together and tested properly.
I think that's not the purpose of your PR (especially as you have already taken on additional frontend changes in this PR, plus the PostgreSQL jar file).
If we have the setup necessary for #1316, and we want to test this PR without getting into terraform errors, I reckon it should be easy enough to delete and recreate the ops/infrastructure/envs/staging directory or make another checkout of the codebase.

@rija
Contributor

rija commented Jul 19, 2023

If we have the setup necessary for #1316, and we want to test this PR without getting into terraform errors, I reckon it should be easy enough to delete and recreate the ops/infrastructure/envs/staging directory or make another checkout of the codebase.

Hi @kencho51, @pli888,

For info, part of my sentence is not correct.
It's easy to create from scratch a new AWS environment with the develop branch (and this one), but the steps are:

  • Copy into ops/infrastructure/envs/(staging|live) the following directory and file from an existing working deployment (if you don't have one for your fork anymore, you can use the setup you've used to build the Upstream environments for production):
    • .terraform/
    • .terraform.lock.hcl
  • Run the command terraform init -migrate-state if asked

And that's it, the rest is as usual.
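
For anyone scripting those two copy steps, a minimal sketch follows. The function name seed_terraform_env and both directory arguments are hypothetical; terraform init -migrate-state is left as a manual step because it is only needed if Terraform asks for it.

```shell
# Hypothetical helper for seeding a fresh environment directory.
# Usage: seed_terraform_env <existing_env_dir> <new_env_dir>
# Copies .terraform/ and .terraform.lock.hcl from a working deployment.
seed_terraform_env() {
    existing_env=$1
    new_env=$2

    if [ ! -d "$existing_env/.terraform" ] || [ ! -f "$existing_env/.terraform.lock.hcl" ]; then
        echo "No usable Terraform setup in: $existing_env" >&2
        return 1
    fi
    mkdir -p "$new_env" || return 1
    # Refuse to copy into an existing .terraform, which would nest directories.
    if [ -e "$new_env/.terraform" ]; then
        echo "$new_env/.terraform already exists" >&2
        return 1
    fi
    cp -R "$existing_env/.terraform" "$new_env"/ || return 1
    cp "$existing_env/.terraform.lock.hcl" "$new_env"/ || return 1
    echo "If asked, run in $new_env: terraform init -migrate-state"
}
```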

Contributor

@rija rija left a comment

Hi @kencho51,

praise: postUpload.sh script is now correctly copying its output in the user's uploadDir directory
praise: I was able to build a brand new AWS staging environment without errors
issue: the behaviour of datasetUpload.sh has regressed.
Before, when users uploaded a spreadsheet and the process failed, they knew immediately, without having to look at the log, because the spreadsheet was still in the upload directory.
It also meant that when the spreadsheet was no longer in the upload directory, they knew the run was successful.
I've just tried with a spreadsheet I knew would fail, but I didn't see it in lily's upload directory. After checking the centos user's upload directory, I saw it there. It wasn't copied over to lily's after the run.
suggestion: please revert the change you made to execute.sh.
The line you didn't think was useful was there to make the aforementioned feature work.
Admittedly, it's not very elegant code (it very much looks like I rushed that one), and you cannot be faulted for thinking it was an error, so if you can quickly come up with a better implementation that does the same thing, please do so.
I'm attaching in a comment below the spreadsheet I used, in case it helps reproduce the issue.
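
One possible shape for a tidier implementation is sketched below. It is only an illustration: the function name return_leftover_spreadsheets and its arguments are hypothetical, and it assumes the upload tool removes a spreadsheet from the service directory once ingestion succeeds, so that anything left behind is a failed upload.

```shell
# Hypothetical sketch, assuming successfully ingested spreadsheets are
# removed from <service_upload_dir> by the upload tool.
# Usage: return_leftover_spreadsheets <service_upload_dir> <user_upload_dir> <owner>
# Moves any remaining .xls file back to the user and reports how many.
return_leftover_spreadsheets() {
    service_dir=$1
    user_dir=$2
    owner=$3
    leftover=0

    for sheet in "$service_dir"/*.xls; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -f "$sheet" ] || continue
        mv "$sheet" "$user_dir"/ || return 1
        chown "$owner" "$user_dir/$(basename "$sheet")" || return 1
        leftover=$((leftover + 1))
    done
    if [ "$leftover" -gt 0 ]; then
        echo "$leftover spreadsheet(s) could not be uploaded, please check the logs"
    fi
    return 0
}
```

Unlike a bare mv of the whole directory, this does nothing and prints nothing when no spreadsheet is left over, and it gives the file back to the user rather than leaving it owned by the service account.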

@rija
Contributor

rija commented Jul 19, 2023

Spreadsheet that would fail the datasetUpload.sh run for me:

GigaDB_v15-DRR202202-01_102239.xls

Contributor

@rija rija left a comment

Hi @kencho51

praise: On AWS deployment, the scripts are now behaving as expected whether the spreadsheet is valid or not

happy to approve

pli888 added a commit to gigascience/ExceltoGigaDB that referenced this pull request Jul 24, 2023
…m kencho51/update-jdbc-driver)

The postgresql-9.1-901.jdbc4.jar has been replaced with postgresql-42.6.0.jar in order for the spreadsheet
upload tool to work with PostgreSQL 14.

Reviewed-by: @pli888, @rija
Refs: gigascience/gigadb-website#1318
@pli888 pli888 added the Peter label Jul 24, 2023
Member

@pli888 pli888 left a comment

The org.postgresql.util.PSQLException: The authentication type 10 is not supported error has disappeared when I upload a spreadsheet on my staging server, now that the consultant's spreadsheet upload tool uses the new PostgreSQL JDBC jar. I no longer see the java.io.FileNotFoundException error either.
I approve this PR.

@pli888 pli888 removed the Peter label Jul 24, 2023
@rija rija merged commit c218904 into gigascience:develop Jul 26, 2023