Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support having 2 tasks with the same name on DB #998

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

wil93
Copy link
Member

@wil93 wil93 commented Aug 22, 2018

Currently we have three separate uniqueness constraints: (name), (name, contest_id), and (num, contest_id). The first one is not really useful and in some cases it's really annoying (current workaround is adding the contestname_ as a prefix of taskname).

Fixes #765

Before:

Indexes:
    "tasks_pkey" PRIMARY KEY, btree (id)
    "tasks_contest_id_name_key" UNIQUE CONSTRAINT, btree (contest_id, name)
    "tasks_contest_id_num_key" UNIQUE CONSTRAINT, btree (contest_id, num)
    "tasks_name_key" UNIQUE CONSTRAINT, btree (name)
    "ix_tasks_contest_id" btree (contest_id)

After:

Indexes:
    "tasks_pkey" PRIMARY KEY, btree (id)
    "tasks_contest_id_name_key" UNIQUE CONSTRAINT, btree (contest_id, name)
    "tasks_contest_id_num_key" UNIQUE CONSTRAINT, btree (contest_id, num)
    "ix_tasks_contest_id" btree (contest_id)

This change is Reviewable

@codecov
Copy link

codecov bot commented Aug 22, 2018

Codecov Report

Merging #998 into master will decrease coverage by 19.9%.
The diff coverage is 19.56%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #998       +/-   ##
==========================================
- Coverage    65.1%   45.2%   -19.91%     
==========================================
  Files         231     231               
  Lines       17746   17763       +17     
==========================================
- Hits        11554    8030     -3524     
- Misses       6192    9733     +3541
Flag Coverage Δ
#functionaltests ?
#unittests 45.2% <19.56%> (-0.84%) ⬇️
Impacted Files Coverage Δ
cms/db/task.py 72.91% <ø> (-9.73%) ⬇️
cmscontrib/RemoveTask.py 0% <0%> (ø) ⬆️
cmscontrib/AddTestcases.py 0% <0%> (ø) ⬆️
cmscontrib/AddSubmission.py 76.41% <100%> (ø) ⬆️
cmscontrib/ImportTask.py 29.76% <11.76%> (-51.64%) ⬇️
cmscontrib/AddStatement.py 32.2% <21.42%> (-43.66%) ⬇️
cmscontrib/ImportDataset.py 37.31% <25%> (-45.77%) ⬇️
cmscontrib/ImportContest.py 77.5% <55.55%> (-3.27%) ⬇️
cmscontrib/importing.py 39.44% <8.33%> (-38.77%) ⬇️
cmsranking/__init__.py 0% <0%> (-100%) ⬇️
... and 105 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d75b457...7f5590b. Read the comment docs.

@lw
Copy link
Member

lw commented Aug 23, 2018

Are you serious? Did you really not even try to fix all the things that would break because of this?

Here are just the first ones that came to my mind: AddTestcases, AddStatement, ImportTask, RemoveTask, RWS, ...

@lw
Copy link
Member

lw commented Aug 23, 2018

I searched a bit more and also ImportContest and ImportDataset would break.

@wil93
Copy link
Member Author

wil93 commented Aug 23, 2018

I assume by "will break" you mean "will print a SQL integrity error instead of a user-friendly error message, in some cases" (especially given the fact that all workflows used until now, including the test suite, are working with no problems).

Anyway, OK, I will look into adding more user-friendly error messages. It must be said though that there are some situations where you can get this kind of SQL error printouts already (if you edit tasks in the wrong way, using AWS ...) so I would tend to assume that "contest admins" should mostly be fine with such errors.

@lw
Copy link
Member

lw commented Aug 23, 2018 via email

@lw
Copy link
Member

lw commented Aug 23, 2018 via email

@wil93
Copy link
Member Author

wil93 commented Aug 23, 2018

Oh, now I see, good points.

I guess I'll replace most of "task name" requests with "task id"...would that work? These are admin tools anyway so we can assume an admin knows how to find out the ID of a task. We could eventually add some scripts to ease data browsing from the command line (cmsList (tasks|users|...)) so that to retrieve the IDs one doesn't need to use AWS or psql.

@stefano-maggiolo
Copy link
Member

I would prefer a disambiguation (at least via command line argument, possibly interactive too) to avoid breaking existing scripts.

@wil93
Copy link
Member Author

wil93 commented Aug 23, 2018

OK, I will add an optional -t <task_id> argument that overrides the positional task_name when specified, and becomes mandatory when the task name is ambiguous.

@stefano-maggiolo
Copy link
Member

[Note this might not cover all issues, in some other places you might need to do something different...]

@lw
Copy link
Member

lw commented Aug 23, 2018

I would prefer a --contest-name/-c option. We already have it, for example in AddTestcases, except that now it's only used to verify that the task with the given name belongs to that contest. So it would be nice to fix that behavior, and to extend that flag to all affected commands.

@lw
Copy link
Member

lw commented Aug 23, 2018

As for RWS, it may be enough to just fix the PS side. PS should keep the short_name field of the task set to its name, but the identifier (the last component of the URL) should now become the combination of the contest name and the task name (with some precautions to avoid ambiguities of course).

@lw
Copy link
Member

lw commented Aug 23, 2018

Also, in AWS there are some views where we list all tasks, for all contests, by name. We should now add their contests' names as well, to allow the admins to select without ambiguity.

@wil93
Copy link
Member Author

wil93 commented Aug 23, 2018

I think the id of the task must be specified in some cases. There can be 10 tasks with the same name and not tied to any contest

@wil93
Copy link
Member Author

wil93 commented Aug 23, 2018

OK, now almost everything works. Soon I will fix the rest.

(cms) wil93@xps13:~/git/cms$ cmsRemoveTask batch
2018-08-24 01:03:16,075 - INFO [<unknown>] Using configuration file /usr/local/etc/cms.conf.
2018-08-24 01:03:16,396 - ERROR [<unknown>] Task name is ambiguous, please use the -t argument to specify a valid ID:
 ID =   1 | batch (Batch) | con_test
 ID =  10 | batch (Batch) | <none>
 ID =  11 | batch (Batch) | <none>
2018-08-24 01:03:16,396 - INFO [<unknown>] Error while importing, no changes were made.

And this is when I specify an ID:

(cms) wil93@xps13:~/git/cms$ cmsRemoveTask batch -t 10
2018-08-24 01:04:25,519 - INFO [<unknown>] Using configuration file /usr/local/etc/cms.conf.
This will delete task `batch' and all related data, including submissions. Are you sure? [y/N] y
2018-08-24 01:04:28,023 - INFO [<unknown>] Task `batch' removed.

@stefano-maggiolo
Copy link
Member

Ready to review again?

@wil93
Copy link
Member Author

wil93 commented Aug 27, 2018

Well it's not finished yet but the tools should work. AWS needs some aesthetic work but it's not broken. I didn't look into RWS yet.

Copy link
Member

@lw lw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r1, 6 of 7 files at r2, 1 of 1 files at r4.
Reviewable status: 8 of 9 files reviewed, 18 unresolved discussions (waiting on @wil93)

a discussion (no related file):
The unit tests for ImportContest, ImportTask, ImportDataset and AddStatement are failing, please fix them.


a discussion (no related file):
In some places you switched some functions from doing their own logging of errors and returning True/False to them raising an exception with the error message. I prefer the latter too and think it's good to do it, but if so please do it everywhere (or nowhere) consistently and do it in a separate commit for clarity.


a discussion (no related file):
In a couple of cases you changed the behavior of the tools. That is probably what is causing some of the test failures. Please don't do that. If all tasks in the database just happen to all have distinct names, the tools should keep working exactly like they did before. If that is not the case, the tools should require the least possible information necessary to disambiguate.



cmscontrib/AddStatement.py, line 47 at r2 (raw file):

def add_statement(task_name, task_id, language_code, statement_file,

Add contest_name.

Provide a None default for both contest_name and task_id (which means moving them to the end of the signature).


cmscontrib/AddStatement.py, line 103 at r2 (raw file):

                        help="absolute/relative path of statement file")
    parser.add_argument("-t", "--task-id", action="store", type=int,
                        help="optional task ID used for disambiguation")

Also add a --contest-name, -c option.


cmscontrib/AddStatement.py, line 120 at r2 (raw file):

        return 1

    return 0 if success is True else 1

Does add_statement ever return False now? Isn't the return value in fact meaningless? This could be changed to just return 0.


cmscontrib/AddSubmission.py, line 166 at r2 (raw file):

        description="Adds a submission to a contest in CMS.")
    parser.add_argument("-c", "--contest-id", action="store", type=int,
                        help="id of contest where to add the user")

If you feel like adding --contest-name and --task-id here too it wouldn't hurt I guess.


cmscontrib/AddTestcases.py, line 50 at r2 (raw file):

def add_testcases(archive, input_template, output_template,
                  task_name, task_id, dataset_description=None,

Same here: allow contest_name.


cmscontrib/ImportContest.py, line 205 at r2 (raw file):

        tasks = session.query(Task).filter(Task.name == taskname).all()

        if self.import_tasks:

You changed the behavior of the function. In this case you import the task without any checks and later you unconditionally bind it to a contest. An exception (of the wrong type) could be raised.


cmscontrib/ImportDataset.py, line 56 at r5 (raw file):

class DatasetImporter(object):
    def __init__(self, path, description, loader_class, task_id):

Same here (contest_name).


cmscontrib/importing.py, line 68 at r2 (raw file):

def task_from_db(session, task_name, task_id):

Add contest_name too, between task_name and task_id.

Also make contest_name and task_id by default None.


cmscontrib/importing.py, line 72 at r2 (raw file):

    session (Session): SQLAlchemy session to use.
    task_name (string|None): the name of the task, or None to return None.

s/string/str/ (not your change, but as we're moving to py3 we're slowly fixing them wherever we find them)


cmscontrib/importing.py, line 77 at r2 (raw file):

    return (Task|None): None if task_name is None, or the task.
    raise (ImportDataError): if there is no task with the given name.
    raise (AmbiguousTaskName): if the task name is ambiguous.

?


cmscontrib/importing.py, line 84 at r2 (raw file):

    tasks = session.query(Task).filter(Task.name == task_name)

Also filter by contest_name.


cmscontrib/importing.py, line 88 at r2 (raw file):

        tasks = tasks.filter(Task.id == task_id)

    tasks = tasks.all()

Use .one() instead (which should be more efficient) and detect the two cases of the if below by catching the two exceptions it can raise. See
http://docs.sqlalchemy.org/en/latest/orm/query.html#sqlalchemy.orm.query.Query.one


cmscontrib/ImportTask.py, line 69 at r2 (raw file):

    def __init__(self, path, prefix, override_name, update, no_statement,
                 contest_id, task_id, loader_class):

Same here.


cmscontrib/ImportTask.py, line 142 at r2 (raw file):

        """
        if self.update:

You changed the semantics of this function. Now if a task with that name and contest is already in the DB and update is false no exception will be raised.

Please fix this.


cmscontrib/RemoveTask.py, line 50 at r2 (raw file):

def remove_task(task_name, task_id):

Same here.

Copy link
Member Author

@wil93 wil93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 8 of 9 files reviewed, 18 unresolved discussions (waiting on @lerks and @wil93)

a discussion (no related file):

Previously, lerks (Luca Wehrstedt) wrote…

The unit tests for ImportContest, ImportTask, ImportDataset and AddStatement are failing, please fix them.

Done (I think, let's see what travis says)


a discussion (no related file):

Previously, lerks (Luca Wehrstedt) wrote…

In some places you switched some functions from doing their own logging of errors and returning True/False to them raising an exception with the error message. I prefer the latter too and think it's good to do it, but if so please do it everywhere (or nowhere) consistently and do it in a separate commit for clarity.

I usually just update those that I encounter while changing other stuff, maybe a specific commit can be done for the remaining ones, after the PR...


a discussion (no related file):

Previously, lerks (Luca Wehrstedt) wrote…

In a couple of cases you changed the behavior of the tools. That is probably what is causing some of the test failures. Please don't do that. If all tasks in the database just happen to all have distinct names, the tools should keep working exactly like they did before. If that is not the case, the tools should require the least possible information necessary to disambiguate.

Most of the failures were because I forgot to add =None to the new task_id parameter. I think all behaviors (when task names are different) should be the same, unless I did some mistake.



cmscontrib/AddStatement.py, line 47 at r2 (raw file):

Previously, lerks (Luca Wehrstedt) wrote…

Add contest_name.

Provide a None default for both contest_name and task_id (which means moving them to the end of the signature).

Added default None for task_id. For contest name I'm not sure why we would need that...


cmscontrib/AddStatement.py, line 103 at r2 (raw file):

Previously, lerks (Luca Wehrstedt) wrote…

Also add a --contest-name, -c option.

Do we really need it?


cmscontrib/AddStatement.py, line 120 at r2 (raw file):

Previously, lerks (Luca Wehrstedt) wrote…

Does add_statement ever return False now? Isn't the return value in fact meaningless? This could be changed to just return 0.

Done.


cmscontrib/AddSubmission.py, line 166 at r2 (raw file):

Previously, lerks (Luca Wehrstedt) wrote…

If you feel like adding --contest-name and --task-id here too it wouldn't hurt I guess.

Mmm but if we already have contest_id then why specify also contest_name? Also task_id should never be necessary here, since contest_id and task_name are mandatory (and they uniquely identify which task we are submitting to)


cmscontrib/AddTestcases.py, line 50 at r2 (raw file):

Previously, lerks (Luca Wehrstedt) wrote…

Same here: allow contest_name.

I had removed it because it didn't look necessary. Now I added it back, I guess it could stay for backwards compatibility.


cmscontrib/ImportContest.py, line 205 at r2 (raw file):

Previously, lerks (Luca Wehrstedt) wrote…

You changed the behavior of the function. In this case you import the task without any checks and later you unconditionally bind it to a contest. An exception (of the wrong type) could be raised.

Which kind of check should we do? If -i is used, then the task can always be imported (since we don't have the name constraint anymore). Now I tried to reflect this in the testsuite.


cmscontrib/ImportDataset.py, line 56 at r5 (raw file):

Previously, lerks (Luca Wehrstedt) wrote…

Same here (contest_name).

Is it necessary?


cmscontrib/importing.py, line 68 at r2 (raw file):

Previously, lerks (Luca Wehrstedt) wrote…

Add contest_name too, between task_name and task_id.

Also make contest_name and task_id by default None.

Made task_id None by default.


cmscontrib/importing.py, line 72 at r2 (raw file):

Previously, lerks (Luca Wehrstedt) wrote…

s/string/str/ (not your change, but as we're moving to py3 we're slowly fixing them wherever we find them)

Done.


cmscontrib/importing.py, line 77 at r2 (raw file):

Previously, lerks (Luca Wehrstedt) wrote…

?

Done. I initially implemented the "disambiguation screen" as a separate exception (which extended ImportDataError) but then I refactored it as just a custom message of ImportDataError.


cmscontrib/importing.py, line 84 at r2 (raw file):

Previously, lerks (Luca Wehrstedt) wrote…

Also filter by contest_name.

Is it necessary?


cmscontrib/importing.py, line 88 at r2 (raw file):

Previously, lerks (Luca Wehrstedt) wrote…

Use .one() instead (which should be more efficient) and detect the two cases of the if below by catching the two exceptions it can raise. See
http://docs.sqlalchemy.org/en/latest/orm/query.html#sqlalchemy.orm.query.Query.one

Mmm... I just tried but the problem is, in the "MultipleResultsFound" case, I need to access the tasks list (in order to show the disambiguation screen). I suspect the only way to do that is with .all()


cmscontrib/ImportTask.py, line 69 at r2 (raw file):

Previously, lerks (Luca Wehrstedt) wrote…

Same here.

Made task_id None.


cmscontrib/ImportTask.py, line 142 at r2 (raw file):

Previously, lerks (Luca Wehrstedt) wrote…

You changed the semantics of this function. Now if a task with that name and contest is already in the DB and update is false no exception will be raised.

Please fix this.

I'm not completely sure of some cases: if I'm trying to update a task with cmsImportTask foo -c 123, do I want to find the task with name foo in contest 123 (which is unique, even though there could be many foos) and then update its content? Or do I want to find the task with name foo (assuming it's unique, otherwise I'll specify the ID as well) and update it as in "update its content AND tying it to contest 123"?

What if the task with name foo is unique and not tied to any contest yet? (a special case of the case described above). Should I tie it or just update the contests? Or maybe say "can't find task with that name in the contest"?


cmscontrib/RemoveTask.py, line 50 at r2 (raw file):

Previously, lerks (Luca Wehrstedt) wrote…

Same here.

Made task_id optional

Copy link
Member

@lw lw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 10 files reviewed, 17 unresolved discussions (waiting on @lerks)

a discussion (no related file):

Previously, wil93 (William Di Luigi) wrote…

Done (I think, let's see what travis says)

Nope. AddStatement and ImportTask are still broken. (You can run those tests yourself, offline, before pushing, by launching cmstestsuite/RunUnitTests.py)


a discussion (no related file):

Previously, wil93 (William Di Luigi) wrote…

I usually just update those that I encounter while changing other stuff, maybe a specific commit can be done for the remaining ones, after the PR...

I still want it to be done in a separate commit from all the rest of the work in this PR as otherwise it's just a mess to read.



cmscontrib/AddStatement.py, line 47 at r2 (raw file):

Previously, wil93 (William Di Luigi) wrote…

Added default None for task_id. For contest name I'm not sure why we would need that...

We don't "need" the contest name and, for that matter, we don't "need" the task name either (if we have the ID). I'm sure, however, that admins remember the names they gave to things better than the IDs that CMS gave to those same things, making it easier for them to use these commands. Also, this allows them to use the same commands (and, say, to put it in scripts) and have it work across reimports. I would actually ask you to remove the task ID if it weren't necessary to disambiguate between unbound tasks.

Copy link
Member

@stefano-maggiolo stefano-maggiolo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 10 files reviewed, 17 unresolved discussions (waiting on @lerks and @wil93)

a discussion (no related file):

Previously, lerks (Luca Wehrstedt) wrote…

Nope. AddStatement and ImportTask are still broken. (You can run those tests yourself, offline, before pushing, by launching cmstestsuite/RunUnitTests.py)

Alternatively, you can setup travis to run on your fork and push a different branch than the one you're using for this PR to have it run the tests without affecting the PR.


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Support having 2 tasks on DB with the same Task.name
3 participants