Support having 2 tasks with the same name on DB #998

wil93 · 2018-08-22T16:50:03Z

Currently we have three separate uniqueness constraints: (name), (name, contest_id), and (num, contest_id). The first one is not really useful and in some cases it's really annoying (current workaround is adding the contestname_ as a prefix of taskname).

Fixes #765

Before:

Indexes:
    "tasks_pkey" PRIMARY KEY, btree (id)
    "tasks_contest_id_name_key" UNIQUE CONSTRAINT, btree (contest_id, name)
    "tasks_contest_id_num_key" UNIQUE CONSTRAINT, btree (contest_id, num)
    "tasks_name_key" UNIQUE CONSTRAINT, btree (name)
    "ix_tasks_contest_id" btree (contest_id)

After:

Indexes:
    "tasks_pkey" PRIMARY KEY, btree (id)
    "tasks_contest_id_name_key" UNIQUE CONSTRAINT, btree (contest_id, name)
    "tasks_contest_id_num_key" UNIQUE CONSTRAINT, btree (contest_id, num)
    "ix_tasks_contest_id" btree (contest_id)

This change is

codecov · 2018-08-22T17:01:40Z

Codecov Report

Merging #998 into master will decrease coverage by 19.9%.
The diff coverage is 19.56%.

@@            Coverage Diff             @@
##           master    #998       +/-   ##
==========================================
- Coverage    65.1%   45.2%   -19.91%     
==========================================
  Files         231     231               
  Lines       17746   17763       +17     
==========================================
- Hits        11554    8030     -3524     
- Misses       6192    9733     +3541

Flag	Coverage Δ
#functionaltests	`?`
#unittests	`45.2% <19.56%> (-0.84%)`	⬇️

Impacted Files	Coverage Δ
cms/db/task.py	`72.91% <ø> (-9.73%)`	⬇️
cmscontrib/RemoveTask.py	`0% <0%> (ø)`	⬆️
cmscontrib/AddTestcases.py	`0% <0%> (ø)`	⬆️
cmscontrib/AddSubmission.py	`76.41% <100%> (ø)`	⬆️
cmscontrib/ImportTask.py	`29.76% <11.76%> (-51.64%)`	⬇️
cmscontrib/AddStatement.py	`32.2% <21.42%> (-43.66%)`	⬇️
cmscontrib/ImportDataset.py	`37.31% <25%> (-45.77%)`	⬇️
cmscontrib/ImportContest.py	`77.5% <55.55%> (-3.27%)`	⬇️
cmscontrib/importing.py	`39.44% <8.33%> (-38.77%)`	⬇️
cmsranking/__init__.py	`0% <0%> (-100%)`	⬇️
... and 105 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d75b457...7f5590b. Read the comment docs.

lw · 2018-08-23T09:25:53Z

Are you serious? Did you really not even try to fix all the things that would break because of this?

Here are just the first ones that came to my mind: AddTestcases, AddStatement, ImportTask, RemoveTask, RWS, ...

lw · 2018-08-23T09:33:46Z

I searched a bit more and also ImportContest and ImportDataset would break.

wil93 · 2018-08-23T10:00:15Z

I assume by "will break" you mean "will print a SQL integrity error instead of a user-friendly error message, in some cases" (especially given the fact that all workflows used until now, including the test suite, are working with no problems).

Anyway, OK, I will look into adding more user-friendly error messages. It must be said though that there are some situations where you can get this kind of SQL error printouts already (if you edit tasks in the wrong way, using AWS ...) so I would tend to assume that "contest admins" should mostly be fine with such errors.

lw · 2018-08-23T10:16:27Z

No, by "break" I mean it will become impossible to add a statement to a task if there is another one with the same name in another contest, or that the wrong task will be removed, or that a dataset will be added to the wrong task, or that data will be overwritten. In short, tools will not do what they're supposed to. Like, they will break.

…

On Thu, Aug 23, 2018, 19:00 William Di Luigi ***@***.***> wrote: I assume by "will break" you mean "will print a SQL integrity error instead of a user-friendly error message, in some cases" (especially given the fact that all workflows used until now, including the test suite, are working with no problems). Anyway, OK, I will look into adding more user-friendly error messages. It must be said though that there are some situations where you can get this kind of SQL error printouts already (if you edit tasks in the wrong way, using AWS ...) so I would tend to assume that "contest admins" should mostly be fine with such errors. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#998 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAHX6oG40u0cpyVt0mxK384_5pwV7zzMks5uTn01gaJpZM4WICRa> .

lw · 2018-08-23T10:25:16Z

The list I gave you earlier is just all the tools that issue SELECT queries filtering by task name without also filtering by contest. All these queries will stop working the way they're expected and need to be fixed. Since there's small differences in behavior whether a query is fetched with first(), one() or scalar() (and that now I'm not at my computer) I can't really tell you which tool breaks in which way. And then there's RWS, which won't be fun to fix.

…

On Thu, Aug 23, 2018, 19:16 Luca Wehrstedt ***@***.***> wrote: No, by "break" I mean it will become impossible to add a statement to a task if there is another one with the same name in another contest, or that the wrong task will be removed, or that a dataset will be added to the wrong task, or that data will be overwritten. In short, tools will not do what they're supposed to. Like, they will break. On Thu, Aug 23, 2018, 19:00 William Di Luigi ***@***.***> wrote: > I assume by "will break" you mean "will print a SQL integrity error > instead of a user-friendly error message, in some cases" (especially given > the fact that all workflows used until now, including the test suite, are > working with no problems). > > Anyway, OK, I will look into adding more user-friendly error messages. It > must be said though that there are some situations where you can get this > kind of SQL error printouts already (if you edit tasks in the wrong way, > using AWS ...) so I would tend to assume that "contest admins" should > mostly be fine with such errors. > > — > You are receiving this because you commented. > > > Reply to this email directly, view it on GitHub > <#998 (comment)>, or mute > the thread > <https://github.com/notifications/unsubscribe-auth/AAHX6oG40u0cpyVt0mxK384_5pwV7zzMks5uTn01gaJpZM4WICRa> > . >

wil93 · 2018-08-23T11:06:29Z

Oh, now I see, good points.

I guess I'll replace most of "task name" requests with "task id"...would that work? These are admin tools anyway so we can assume an admin knows how to find out the ID of a task. We could eventually add some scripts to ease data browsing from the command line (cmsList (tasks|users|...)) so that to retrieve the IDs one doesn't need to use AWS or psql.

stefano-maggiolo · 2018-08-23T11:39:07Z

I would prefer a disambiguation (at least via command line argument, possibly interactive too) to avoid breaking existing scripts.

wil93 · 2018-08-23T11:51:28Z

OK, I will add an optional -t <task_id> argument that overrides the positional task_name when specified, and becomes mandatory when the task name is ambiguous.

stefano-maggiolo · 2018-08-23T11:53:22Z

[Note this might not cover all issues, in some other places you might need to do something different...]

lw · 2018-08-23T12:55:42Z

I would prefer a --contest-name/-c option. We already have it, for example in AddTestcases, except that now it's only used to verify that the task with the given name belongs to that contest. So it would be nice to fix that behavior, and to extend that flag to all affected commands.

lw · 2018-08-23T13:01:51Z

As for RWS, it may be enough to just fix the PS side. PS should keep the short_name field of the task set to its name, but the identifier (the last component of the URL) should now become the combination of the contest name and the task name (with some precautions to avoid ambiguities of course).

lw · 2018-08-23T13:03:23Z

Also, in AWS there are some views where we list all tasks, for all contests, by name. We should now add their contests' names as well, to allow the admins to select without ambiguity.

wil93 · 2018-08-23T15:05:14Z

I think the id of the task must be specified in some cases. There can be 10 tasks with the same name and not tied to any contest

wil93 · 2018-08-23T23:06:35Z

OK, now almost everything works. Soon I will fix the rest.

(cms) wil93@xps13:~/git/cms$ cmsRemoveTask batch
2018-08-24 01:03:16,075 - INFO [<unknown>] Using configuration file /usr/local/etc/cms.conf.
2018-08-24 01:03:16,396 - ERROR [<unknown>] Task name is ambiguous, please use the -t argument to specify a valid ID:
 ID =   1 | batch (Batch) | con_test
 ID =  10 | batch (Batch) | <none>
 ID =  11 | batch (Batch) | <none>
2018-08-24 01:03:16,396 - INFO [<unknown>] Error while importing, no changes were made.

And this is when I specify an ID:

(cms) wil93@xps13:~/git/cms$ cmsRemoveTask batch -t 10
2018-08-24 01:04:25,519 - INFO [<unknown>] Using configuration file /usr/local/etc/cms.conf.
This will delete task `batch' and all related data, including submissions. Are you sure? [y/N] y
2018-08-24 01:04:28,023 - INFO [<unknown>] Task `batch' removed.

stefano-maggiolo · 2018-08-27T20:25:03Z

Ready to review again?

wil93 · 2018-08-27T20:31:46Z

Well it's not finished yet but the tools should work. AWS needs some aesthetic work but it's not broken. I didn't look into RWS yet.

lw

Reviewed 1 of 1 files at r1, 6 of 7 files at r2, 1 of 1 files at r4.
Reviewable status: 8 of 9 files reviewed, 18 unresolved discussions (waiting on @wil93)

a discussion (no related file):
The unit tests for ImportContest, ImportTask, ImportDataset and AddStatement are failing, please fix them.

a discussion (no related file):
In some places you switched some functions from doing their own logging of errors and returning True/False to them raising an exception with the error message. I prefer the latter too and think it's good to do it, but if so please do it everywhere (or nowhere) consistently and do it in a separate commit for clarity.

a discussion (no related file):
In a couple of cases you changed the behavior of the tools. That is probably what is causing some of the test failures. Please don't do that. If all tasks in the database just happen to all have distinct names, the tools should keep working exactly like they did before. If that is not the case, the tools should require the least possible information necessary to disambiguate.

cmscontrib/AddStatement.py, line 47 at r2 (raw file):

def add_statement(task_name, task_id, language_code, statement_file,

Add contest_name.

Provide a None default for both contest_name and task_id (which means moving them to the end of the signature).

cmscontrib/AddStatement.py, line 103 at r2 (raw file):

                        help="absolute/relative path of statement file")
    parser.add_argument("-t", "--task-id", action="store", type=int,
                        help="optional task ID used for disambiguation")

Also add a --contest-name, -c option.

cmscontrib/AddStatement.py, line 120 at r2 (raw file):

        return 1

    return 0 if success is True else 1

Does add_statement ever return False now? Isn't the return value in fact meaningless? This could be changed to just return 0.

cmscontrib/AddSubmission.py, line 166 at r2 (raw file):

        description="Adds a submission to a contest in CMS.")
    parser.add_argument("-c", "--contest-id", action="store", type=int,
                        help="id of contest where to add the user")

If you feel like adding --contest-name and --task-id here too it wouldn't hurt I guess.

cmscontrib/AddTestcases.py, line 50 at r2 (raw file):

def add_testcases(archive, input_template, output_template,
                  task_name, task_id, dataset_description=None,

Same here: allow contest_name.

cmscontrib/ImportContest.py, line 205 at r2 (raw file):

        tasks = session.query(Task).filter(Task.name == taskname).all()

        if self.import_tasks:

You changed the behavior of the function. In this case you import the task without any checks and later you unconditionally bind it to a contest. An exception (of the wrong type) could be raised.

cmscontrib/ImportDataset.py, line 56 at r5 (raw file):

class DatasetImporter(object):
    def __init__(self, path, description, loader_class, task_id):

Same here (contest_name).

cmscontrib/importing.py, line 68 at r2 (raw file):

def task_from_db(session, task_name, task_id):

Add contest_name too, between task_name and task_id.

Also make contest_name and task_id by default None.

cmscontrib/importing.py, line 72 at r2 (raw file):

    session (Session): SQLAlchemy session to use.
    task_name (string|None): the name of the task, or None to return None.

s/string/str/ (not your change, but as we're moving to py3 we're slowly fixing them wherever we find them)

cmscontrib/importing.py, line 77 at r2 (raw file):

    return (Task|None): None if task_name is None, or the task.
    raise (ImportDataError): if there is no task with the given name.
    raise (AmbiguousTaskName): if the task name is ambiguous.

?

cmscontrib/importing.py, line 84 at r2 (raw file):

    tasks = session.query(Task).filter(Task.name == task_name)

Also filter by contest_name.

cmscontrib/importing.py, line 88 at r2 (raw file):

        tasks = tasks.filter(Task.id == task_id)

    tasks = tasks.all()

Use .one() instead (which should be more efficient) and detect the two cases of the if below by catching the two exceptions it can raise. See
http://docs.sqlalchemy.org/en/latest/orm/query.html#sqlalchemy.orm.query.Query.one

cmscontrib/ImportTask.py, line 69 at r2 (raw file):

    def __init__(self, path, prefix, override_name, update, no_statement,
                 contest_id, task_id, loader_class):

Same here.

cmscontrib/ImportTask.py, line 142 at r2 (raw file):

        """
        if self.update:

You changed the semantics of this function. Now if a task with that name and contest is already in the DB and update is false no exception will be raised.

Please fix this.

cmscontrib/RemoveTask.py, line 50 at r2 (raw file):

def remove_task(task_name, task_id):

Same here.

wil93

Reviewable status: 8 of 9 files reviewed, 18 unresolved discussions (waiting on @lerks and @wil93)

a discussion (no related file):

Previously, lerks (Luca Wehrstedt) wrote…

The unit tests for ImportContest, ImportTask, ImportDataset and AddStatement are failing, please fix them.

Done (I think, let's see what travis says)

a discussion (no related file):

Previously, lerks (Luca Wehrstedt) wrote…

In some places you switched some functions from doing their own logging of errors and returning True/False to them raising an exception with the error message. I prefer the latter too and think it's good to do it, but if so please do it everywhere (or nowhere) consistently and do it in a separate commit for clarity.

I usually just update those that I encounter while changing other stuff, maybe a specific commit can be done for the remaining ones, after the PR...

a discussion (no related file):

Previously, lerks (Luca Wehrstedt) wrote…

In a couple of cases you changed the behavior of the tools. That is probably what is causing some of the test failures. Please don't do that. If all tasks in the database just happen to all have distinct names, the tools should keep working exactly like they did before. If that is not the case, the tools should require the least possible information necessary to disambiguate.

Most of the failures were because I forgot to add =None to the new task_id parameter. I think all behaviors (when task names are different) should be the same, unless I did some mistake.

cmscontrib/AddStatement.py, line 47 at r2 (raw file):

Previously, lerks (Luca Wehrstedt) wrote…

Add contest_name.

Provide a None default for both contest_name and task_id (which means moving them to the end of the signature).

Added default None for task_id. For contest name I'm not sure why we would need that...

cmscontrib/AddStatement.py, line 103 at r2 (raw file):

Previously, lerks (Luca Wehrstedt) wrote…

Also add a --contest-name, -c option.

Do we really need it?

cmscontrib/AddStatement.py, line 120 at r2 (raw file):

Previously, lerks (Luca Wehrstedt) wrote…

Does add_statement ever return False now? Isn't the return value in fact meaningless? This could be changed to just return 0.

Done.

cmscontrib/AddSubmission.py, line 166 at r2 (raw file):

Previously, lerks (Luca Wehrstedt) wrote…

If you feel like adding --contest-name and --task-id here too it wouldn't hurt I guess.

Mmm but if we already have contest_id then why specify also contest_name? Also task_id should never be necessary here, since contest_id and task_name are mandatory (and they uniquely identify which task we are submitting to)

cmscontrib/AddTestcases.py, line 50 at r2 (raw file):

Previously, lerks (Luca Wehrstedt) wrote…

Same here: allow contest_name.

I had removed it because it didn't look necessary. Now I added it back, I guess it could stay for backwards compatibility.

cmscontrib/ImportContest.py, line 205 at r2 (raw file):

Previously, lerks (Luca Wehrstedt) wrote…

You changed the behavior of the function. In this case you import the task without any checks and later you unconditionally bind it to a contest. An exception (of the wrong type) could be raised.

Which kind of check should we do? If -i is used, then the task can always be imported (since we don't have the name constraint anymore). Now I tried to reflect this in the testsuite.

cmscontrib/ImportDataset.py, line 56 at r5 (raw file):

Previously, lerks (Luca Wehrstedt) wrote…

Same here (contest_name).

Is it necessary?

cmscontrib/importing.py, line 68 at r2 (raw file):

Previously, lerks (Luca Wehrstedt) wrote…

Add contest_name too, between task_name and task_id.

Also make contest_name and task_id by default None.

Made task_id None by default.

cmscontrib/importing.py, line 72 at r2 (raw file):

Previously, lerks (Luca Wehrstedt) wrote…

s/string/str/ (not your change, but as we're moving to py3 we're slowly fixing them wherever we find them)

Done.

cmscontrib/importing.py, line 77 at r2 (raw file):

Previously, lerks (Luca Wehrstedt) wrote…

?

Done. I initially implemented the "disambiguation screen" as a separate exception (which extended ImportDataError) but then I refactored it as just a custom message of ImportDataError.

cmscontrib/importing.py, line 84 at r2 (raw file):

Previously, lerks (Luca Wehrstedt) wrote…

Also filter by contest_name.

Is it necessary?

cmscontrib/importing.py, line 88 at r2 (raw file):

Previously, lerks (Luca Wehrstedt) wrote…

Use .one() instead (which should be more efficient) and detect the two cases of the if below by catching the two exceptions it can raise. See
http://docs.sqlalchemy.org/en/latest/orm/query.html#sqlalchemy.orm.query.Query.one

Mmm... I just tried but the problem is, in the "MultipleResultsFound" case, I need to access the tasks list (in order to show the disambiguation screen). I suspect the only way to do that is with .all()

cmscontrib/ImportTask.py, line 69 at r2 (raw file):

Previously, lerks (Luca Wehrstedt) wrote…

Same here.

Made task_id None.

cmscontrib/ImportTask.py, line 142 at r2 (raw file):

Previously, lerks (Luca Wehrstedt) wrote…

You changed the semantics of this function. Now if a task with that name and contest is already in the DB and update is false no exception will be raised.

Please fix this.

I'm not completely sure of some cases: if I'm trying to update a task with cmsImportTask foo -c 123, do I want to find the task with name foo in contest 123 (which is unique, even though there could be many foos) and then update its content? Or do I want to find the task with name foo (assuming it's unique, otherwise I'll specify the ID as well) and update it as in "update its content AND tying it to contest 123"?

What if the task with name foo is unique and not tied to any contest yet? (a special case of the case described above). Should I tie it or just update the contests? Or maybe say "can't find task with that name in the contest"?

cmscontrib/RemoveTask.py, line 50 at r2 (raw file):

Previously, lerks (Luca Wehrstedt) wrote…

Same here.

Made task_id optional

lw

Reviewable status: 2 of 10 files reviewed, 17 unresolved discussions (waiting on @lerks)

a discussion (no related file):

Previously, wil93 (William Di Luigi) wrote…

Done (I think, let's see what travis says)

Nope. AddStatement and ImportTask are still broken. (You can run those tests yourself, offline, before pushing, by launching cmstestsuite/RunUnitTests.py)

a discussion (no related file):

Previously, wil93 (William Di Luigi) wrote…

I usually just update those that I encounter while changing other stuff, maybe a specific commit can be done for the remaining ones, after the PR...

I still want it to be done in a separate commit from all the rest of the work in this PR as otherwise it's just a mess to read.

cmscontrib/AddStatement.py, line 47 at r2 (raw file):

Previously, wil93 (William Di Luigi) wrote…

Added default None for task_id. For contest name I'm not sure why we would need that...

We don't "need" the contest name and, for that matter, we don't "need" the task name either (if we have the ID). I'm sure, however, that admins remember the names they gave to things better than the IDs that CMS gave to those same things, making it easier for them to use these commands. Also, this allows them to use the same commands (and, say, to put it in scripts) and have it work across reimports. I would actually ask you to remove the task ID if it weren't necessary to disambiguate between unbound tasks.

stefano-maggiolo

Reviewable status: 2 of 10 files reviewed, 17 unresolved discussions (waiting on @lerks and @wil93)

a discussion (no related file):

Previously, lerks (Luca Wehrstedt) wrote…

Nope. AddStatement and ImportTask are still broken. (You can run those tests yourself, offline, before pushing, by launching cmstestsuite/RunUnitTests.py)

Alternatively, you can setup travis to run on your fork and push a different branch than the one you're using for this PR to have it run the tests without affecting the PR.

Remove unique=True from Task.name

d283e77

Fix tools, broken by the removal of Task.name uniqueness

509258c

wil93 added 3 commits August 24, 2018 01:12

Fix more tools

1b2e04f

minor

ad6dc8f

minor-ish

8531811

lw requested changes Aug 27, 2018

View reviewed changes

fix after review

7f5590b

wil93 commented Aug 28, 2018

View reviewed changes

lw requested changes Aug 28, 2018

View reviewed changes

stefano-maggiolo reviewed Sep 14, 2018

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support having 2 tasks with the same name on DB #998

Support having 2 tasks with the same name on DB #998

wil93 commented Aug 22, 2018 •

edited by stefano-maggiolo

Loading

codecov bot commented Aug 22, 2018 •

edited

Loading

lw commented Aug 23, 2018

lw commented Aug 23, 2018

wil93 commented Aug 23, 2018

lw commented Aug 23, 2018 via email

lw commented Aug 23, 2018 via email

wil93 commented Aug 23, 2018

stefano-maggiolo commented Aug 23, 2018

wil93 commented Aug 23, 2018

stefano-maggiolo commented Aug 23, 2018

lw commented Aug 23, 2018

lw commented Aug 23, 2018

lw commented Aug 23, 2018

wil93 commented Aug 23, 2018

wil93 commented Aug 23, 2018

stefano-maggiolo commented Aug 27, 2018

wil93 commented Aug 27, 2018

lw left a comment

wil93 left a comment

lw left a comment

stefano-maggiolo left a comment

Support having 2 tasks with the same name on DB #998

Are you sure you want to change the base?

Support having 2 tasks with the same name on DB #998

Conversation

wil93 commented Aug 22, 2018 • edited by stefano-maggiolo Loading

codecov bot commented Aug 22, 2018 • edited Loading

Codecov Report

lw commented Aug 23, 2018

lw commented Aug 23, 2018

wil93 commented Aug 23, 2018

lw commented Aug 23, 2018 via email

lw commented Aug 23, 2018 via email

wil93 commented Aug 23, 2018

stefano-maggiolo commented Aug 23, 2018

wil93 commented Aug 23, 2018

stefano-maggiolo commented Aug 23, 2018

lw commented Aug 23, 2018

lw commented Aug 23, 2018

lw commented Aug 23, 2018

wil93 commented Aug 23, 2018

wil93 commented Aug 23, 2018

stefano-maggiolo commented Aug 27, 2018

wil93 commented Aug 27, 2018

lw left a comment

Choose a reason for hiding this comment

wil93 left a comment

Choose a reason for hiding this comment

lw left a comment

Choose a reason for hiding this comment

stefano-maggiolo left a comment

Choose a reason for hiding this comment

wil93 commented Aug 22, 2018 •

edited by stefano-maggiolo

Loading

codecov bot commented Aug 22, 2018 •

edited

Loading