Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change configuration format from JSON to TOML #1224

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

magula
Copy link
Contributor

@magula magula commented Feb 1, 2023

In light of #119, this would change the format of the cms.conf and cms.ranking.conf configuration files from JSON to TOML. Apart from the obvious, the changes include:

  • Transforms the _section headers of the previous JSON format into TOML sections.
    The TOML sections of cms.conf correspond to predefined dataclasses in Python, i.e. there is an object for each TOML section that contains its attributes. E.g., keep_sandbox can be accessed as config.worker.keep_sandbox, while previously it would have just been config.keep_sandbox. Some variable names also change: config.admin_cookie_duration becomes config.aws.cookie_duration, and config.cookie_duration becomes config.cws.cookie_duration.
    This should structure the attributes better and provide more consistent key names.

  • The structure of cms.ranking.conf is left untouched, the file is just translated to TOML.

  • Includes a check if the config file that would be used is in JSON format. If so, the user is warned about the change and advised to tranform the config into the new format. He is also presented with an attempt at translating the file.
    This works by injecting values taken from the JSON config into a Jinja template of the bare TOML config.
    If an unexpected key is present in the JSON config, the key-value pair is put in a section called stray. When loading a TOML config, everything in stray is stored as a field of Config, so it can still be accessed as config.[keyname].
    The automatic translation works only heuristically. The attributes are rendered in the Jinja template using tojson, so there is no guarantee the outcome will adhere to the TOML specification.
    One could use a TOML writer instead, but that would add another package dependency.

Note that these changes have not been tested thoroughly.

Any feedback is very welcome!

The converter puts unknown keys under a section 'stray'.
The config parser puts attributes from the 'stray' section at the top
level, where JSON keys used to be put.
This means that keys not considered in the new config structure should
continue working.
@codecov
Copy link

codecov bot commented Feb 1, 2023

Codecov Report

Merging #1224 (a460387) into master (f541efc) will decrease coverage by 0.20%.
The diff coverage is 67.46%.

@@            Coverage Diff             @@
##           master    #1224      +/-   ##
==========================================
- Coverage   63.59%   63.40%   -0.20%     
==========================================
  Files         233      233              
  Lines       17151    17189      +38     
==========================================
- Hits        10907    10898       -9     
- Misses       6244     6291      +47     
Flag Coverage Δ
functionaltests 47.29% <67.46%> (-0.07%) ⬇️
unittests 43.96% <61.11%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
cms/db/util.py 55.55% <ø> (+2.22%) ⬆️
cms/io/service.py 72.78% <ø> (-0.64%) ⬇️
cms/server/contest/server.py 86.36% <ø> (ø)
cms/conf.py 73.52% <66.66%> (-12.08%) ⬇️
cms/io/web_service.py 91.48% <66.66%> (+4.25%) ⬆️
cmsranking/Config.py 72.61% <80.00%> (-1.20%) ⬇️
cms/server/contest/authentication.py 100.00% <100.00%> (ø)
cms/db/base.py 85.14% <0.00%> (-3.97%) ⬇️
cms/service/workerpool.py 63.88% <0.00%> (-2.78%) ⬇️
cms/server/admin/handlers/dataset.py 29.76% <0.00%> (-2.61%) ⬇️
... and 16 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

setup.py Outdated Show resolved Hide resolved
Copy link
Member

@wil93 wil93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for taking care of this important task!

A thought about the use of sections: I get the desire of organizing keys into related groups, however maybe it would be better to keep all the keys under the same "namespace" as they were in the JSON, as to reduce the necessary changes to the absolute minimum (thus minimizing the risk of new bugs) especially given that we don't have a type system helping us to verify with certainty that all config usages are correctly migrated. Doing this would also allow us to continue supporting JSON for backwards-compatibility (I like the "heuristically generated" TOML version, it will be very helpful to users 😄 however I think that for some time can still support the JSON file when a TOML version is missing).

Also, I remember that some time ago there was some discussion about wanting to move many of the configuration options away from cms.conf and into the database: ideally, the cms.conf file would almost exclusively be used to specify the database connection string, while most of the options would be stored in the database itself (this would make it easier to change the values, e.g. "keep_sandbox" could be a simple checkbox in the AWS panel) and with this scenario in mind, I think it would make more sense to not organize options into sections.

Another thing: maybe we can update the file name to include .toml as the extension? I think there could be some value in indicating this extension (it clearly communicates that it's a TOML file so users, and most importantly their IDE / editors, will expect to find TOML syntax inside of it). For example we could change cms.conf to cms.toml, or if we really want to keep conf we could usecms-conf.toml or cms.conf.toml.

What do you think?

setup.py Outdated Show resolved Hide resolved
@magula
Copy link
Contributor Author

magula commented Feb 3, 2023

Yes, not grouping the keys into sections, thus changing way less code is definitely safer, and preserving backwards-compatibility is also a big plus. I am indecisive on this, but I think I will change it back then. :)

Would you mind if we still renamed

  • cookie_duration -> contest_cookie_duration and
  • num_proxies_used -> contest_num_proxies_used,

but to preserve compatibility used the old names as aliases when loading the config? These changes should be small enough to reasonably test manually.
Also, I would like to rename the "section comment" ScoringService to ProxyService, which seems more fitting. ScoringService only uses one of these attributes to find out whether it is worth to connect to ProxyService because the ranking is enabled.

Should we keep

  • the attributes' type annotations, and
  • the check in the config loader for each attribute key if that config attribute exists (up until now, a typo in a key would have gone unnoticed and the default value been used inadvertently)?

Naming the config file cms.toml also seems like a good idea, I will do that. I had only refrained from it to not change even more of the code. 😄

PS: I am not sure I would be happy about moving much of the configuration to the database, actually. We tend to use the same basic configuration for new setups, and then only adapt where it is necessary. This is easily done with a configuration file I can just change and copy to new servers (in fact I have just started keeping these in a git repo). Also, even with our regular setup for our selection process, we usually initialize a new database once a year.
If these things became a part of the database and the way to change them would be via the admin interface, we would have to write down what goes in there and then manually add it, or write a script to load the configuration into the database, which seems overly complicated. So my concerns are similar, I think, to what @lw said in #119.

@wil93
Copy link
Member

wil93 commented Feb 4, 2023

Would you mind if we still renamed

  • cookie_duration -> contest_cookie_duration and
  • num_proxies_used -> contest_num_proxies_used,

but to preserve compatibility used the old names as aliases when loading the config? These changes should be small enough to reasonably test manually.

Yeah that sounds reasonable (I assume these aliases would work both in the legacy JSON file and the new TOML one?)

Should we keep

  • the attributes' type annotations, and
  • the check in the config loader for each attribute key if that config attribute exists (up until now, a typo in a key would have gone unnoticed and the default value been used inadvertently)?

The type annotations are a big plus in my opinion 😄

As for the second point, one concern I have is that (if I remember correctly) sometimes for a contest we would remove keys from the configuration file since anyway they were set to the default value, thus creating a "minimal" configuration file that only ovverrides what is strictly necessary. Would this change break such "minimal" configuration files?

PS: I am not sure I would be happy about moving much of the configuration to the database, actually. We tend to use the same basic configuration for new setups, and then only adapt where it is necessary.

That's a great input, I will keep this use-case in mind. I think for now it's fine to continue using a configuration file, and if we move to DB-configuration then we should look into having a file representation anyway to support this use-case (e.g. exporting the settings to a toml file, and being able to restore them)

cms/conf.py Outdated
Comment on lines 232 to 233
legacy_paths = [os.path.join(p, "cms.conf") for p in etc_paths]
paths = [os.path.join(p, "cms.toml") for p in etc_paths]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that maybe it would make sense here to not differentiate between legacy_paths and normal paths, we could just have paths (with the legacy ones appended at the end) and simply try to read sequentially each one of them: as soon as we find a file that we manage to read we return the configuration found there. We would try to read in JSON basically only when reading in TOML raises some kind of format exception. I think the code would be simpler. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I put the legacy paths first though, in case the new sample config is copied during installation: Otherwise, this sample config would be used inadvertently and the message about the old config file would not appear. So now,

  • a warning is going to be raised as soon as one of the files can be read in JSON format before one can be read in TOML format, and
  • this JSON config will then proceed to be loaded.

Let me know if I should change this.

@magula
Copy link
Contributor Author

magula commented Feb 4, 2023

Yeah that sounds reasonable (I assume these aliases would work both in the legacy JSON file and the new TOML one?)

Yes, they will work in both.

As for the second point, one concern I have is that (if I remember correctly) sometimes for a contest we would remove keys from the configuration file since anyway they were set to the default value, thus creating a "minimal" configuration file that only ovverrides what is strictly necessary. Would this change break such "minimal" configuration files?

It would not enforce keys to be present in the configuration file, so such a minimal configuration file will still work just the same. It just warns about config keys that it does not know about, instead of loading them silently. (A warning on missing keys is also included right now, but I will remove that again when I revert the changes of the config structure.)

That's a great input, I will keep this use-case in mind. I think for now it's fine to continue using a configuration file, and if we move to DB-configuration then we should look into having a file representation anyway to support this use-case (e.g. exporting the settings to a toml file, and being able to restore them)

Sounds good, thanks!

@magula
Copy link
Contributor Author

magula commented Feb 9, 2023

I removed the section headers (or table headers, as TOML calls it) in the last commit, so the structure is as before.

However, there seems to be no (legible) way to structure core_services and other_services if not as tables, because inline tables can not contain newlines. This is a problem because the top-level table ends with the first table, so nothing after that could be top-level again. Thus I moved these to the end of the TOML file in the last commit. They are now the only parts contained in a table, and they have to be at the end of the file, which seems unelegant. Do you see a better way to do this?

magula and others added 3 commits February 9, 2023 20:43
@wil93
Copy link
Member

wil93 commented Jan 18, 2024

Now that we have pytest as test runner, adding tests should be easier so it would be nice to add a few tests, for example:

  • a test to make sure that the values from the TOML / JSON file indeed are reflected in the config object
  • a test to verify the logic of choosing TOML over JSON when available, and falling back to JSON when not available

Let me know if you would be down to add these, if not I can try to spend some time on it when I get free.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants