[Feature] Support Pure Python style Configuration File #1071

Merged
merged 107 commits into from
Jun 16, 2023

Conversation

HAOCHENYE
Collaborator

@HAOCHENYE HAOCHENYE commented Apr 12, 2023

Thanks for your contribution and we appreciate it a lot. The following instructions would make your pull request more healthy and more easily get feedback. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.

Motivation

Add Pure Python style Configuration File

Current pure text style configuration files can satisfy most of our development needs and some module aliases can greatly simplify the configuration files (e.g. ResNet can refer to mmcls.models.ResNet). However, there are also some disadvantages:

  1. In the configuration file, the type field is specified by a string, so the IDE cannot jump directly to the corresponding class definition, which makes the code harder to read and navigate.
  2. The inheritance of configuration files is also specified by a string, so the IDE cannot jump directly to the inherited file. When the inheritance structure is complex, the configuration files become hard to read and navigate.
  3. The inheritance rules are relatively implicit. Beginners find it difficult to understand how the configuration file merges variables with the same fields, and the rules give rise to special syntax such as _delete_, which raises the learning cost.
  4. It is easy for users to forget to register a module, which causes "module not found" errors.
  5. In cross-codebase inheritance (not yet covered here), the introduction of the scope makes the inheritance rules even more complicated and difficult for beginners to understand.

In summary, although pure text style configuration files can provide the same syntax rules for python, json, and yaml format configurations, when the configuration files become complex, pure text style configuration files will appear inadequate. Therefore, we provide a pure Python style configuration file, i.e., the lazy import mode, which can fully utilize Python's syntax rules to solve the above problems. At the same time, the pure Python style configuration file also supports exporting to json and yaml formats.

Basic Syntax

This section briefly describes the syntax differences between pure Python style and pure text style configuration files.

Module Construction

We use a simple example to compare pure Python style and pure text style configuration files:

  1. Registration for pure Python Style and current pure text style:

Pure Python style

# No need for registration

Pure text style

from torch.optim import SGD
from mmengine.registry import OPTIMIZERS
OPTIMIZERS.register_module(module=SGD, name='SGD')
  2. Configuration file writing for pure Python style and current pure text style:

Pure Python style

# Configuration file writing
from torch.optim import SGD
optimizer = dict(type=SGD, lr=0.1)

Pure text style

# Configuration file writing
optimizer = dict(type='SGD', lr=0.1)
  3. Module construction for pure Python style and current pure text style:

The same for pure Python style and pure text style

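import torch.nn as nn
from mmengine.config import Config
from mmengine.registry import OPTIMIZERS

cfg = Config.fromfile('optimizer.py')
model = nn.Conv2d(1, 1, 1)
# The optimizer needs the model parameters, which are only known at runtime
cfg.optimizer.params = model.parameters()
optimizer = OPTIMIZERS.build(cfg.optimizer)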

From the above example, we can see that the difference between pure Python style and pure text style configuration files is:

  1. Pure Python style configuration files do not require module registration.
  2. In pure Python style configuration files, the type field is no longer a string but directly refers to the module. Correspondingly, import syntax needs to be added in the configuration file.

Note that the OpenMMLab algorithm libraries still retain the registration process when adding modules. Users who build their own projects on MMEngine with pure Python style configuration files do not need to register modules. You may wonder whether the sample configuration file can still be called a configuration file if it cannot be parsed in an environment without torch installed. Don't worry, we will explain this part later.
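
The two differences above can be illustrated with a small, self-contained sketch. `TinyRegistry` and `build` are hypothetical stand-ins written for this example, not MMEngine's implementation, and `FakeSGD` replaces `torch.optim.SGD` so that the snippet runs without torch.

```python
class TinyRegistry:
    """Hypothetical name -> class table, standing in for a registry."""

    def __init__(self):
        self._modules = {}

    def register_module(self, module, name=None):
        self._modules[name or module.__name__] = module

    def get(self, name):
        if name not in self._modules:
            raise KeyError(f'{name} is not registered')
        return self._modules[name]


class FakeSGD:
    """Stand-in for torch.optim.SGD so the sketch has no dependencies."""

    def __init__(self, params, lr):
        self.params = list(params)
        self.lr = lr


def build(cfg, registry):
    """Build an object from a config dict whose `type` is a string or a class."""
    args = dict(cfg)  # copy so the caller's config is not mutated
    obj_type = args.pop('type')
    if isinstance(obj_type, str):
        # Pure text style: the string must be resolved through the registry.
        obj_type = registry.get(obj_type)
    # Pure Python style: `type` is already the class, so no lookup is needed.
    return obj_type(**args)


registry = TinyRegistry()

# Pure Python style works without any registration.
python_style = build(dict(type=FakeSGD, params=[1, 2], lr=0.1), registry)

# Pure text style fails until the module is registered.
try:
    build(dict(type='FakeSGD', params=[1, 2], lr=0.1), registry)
    text_style_error = None
except KeyError as err:
    text_style_error = str(err)

registry.register_module(FakeSGD)
text_style = build(dict(type='FakeSGD', params=[1, 2], lr=0.1), registry)
```

In this sketch the string form only works after `register_module` has been called, which is the failure mode described in item 4 of the motivation, while the class form never touches the registry.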

Inheritance

The inheritance syntax of pure Python style configuration files is slightly different:

Pure text style:

_base_ = ['./optimizer.py']

Pure Python style:

if '_base_':
    from .optimizer import *

Pure Python style configuration files use import syntax to achieve inheritance. The advantage of doing this is that we can directly jump to the inherited configuration file for easy reading and jumping. The variable inheritance rule (add, delete, change, and search) is completely aligned with Python syntax. For example, if I want to modify the learning rate of the optimizer in the base configuration file:

if '_base_':
    from .optimizer import *

# optimizer is a variable defined in the base configuration file
optimizer.update(
    lr=0.01,
)

Of course, if you are already accustomed to the inheritance rules of pure text style configuration files and the variable is of the dict type in the _base_ configuration file, you can also use merge syntax to achieve the same inheritance rule as pure text style configuration files:

if '_base_':
    from .optimizer import *

# optimizer is a variable defined in the base configuration file
optimizer.merge(
    _delete_=True,
    lr=0.01,
    type='SGD'
)

# The equivalent Python style writing is as follows, completely consistent with Python's import rules
# optimizer = dict(
#     lr=0.01,
#     type='SGD'
# )
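
The difference between the two ways of overriding a base variable can be sketched with plain dictionaries. `merge` below is a hypothetical, flat (non-recursive) helper that mimics the `_delete_` rule described above; it is not the `merge` method added by this PR.

```python
def merge(base, _delete_=False, **overrides):
    """Return a new dict with `overrides` applied to `base`.

    With `_delete_=True` every key inherited from `base` is dropped first,
    which mirrors the `_delete_` rule of pure text style configs.
    """
    result = {} if _delete_ else dict(base)
    result.update(overrides)
    return result


base_optimizer = dict(type='SGD', lr=0.1, momentum=0.9)

# Plain update: only `lr` changes, `momentum` is still inherited.
updated = dict(base_optimizer)
updated.update(lr=0.01)

# Merge with _delete_: inherited keys are discarded, only the new ones remain.
replaced = merge(base_optimizer, _delete_=True, lr=0.01, type='SGD')

print(updated)
print(replaced)
```

Under these assumptions, `updated` keeps the inherited `momentum` key, while `replaced` contains only `lr` and `type`, the same result as assigning a fresh dict.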

Compared with pure text style configuration files, the inheritance rule of pure Python style configuration files is completely aligned with Python's import syntax, which is easier to understand and supports jumping between configuration files. You may wonder why, since both inheritance and module imports use import syntax, an if '_base_' statement is needed for inheriting configuration files. On the one hand, it improves the readability of configuration files by making inherited configuration files more prominent. On the other hand, it is required by the rules of lazy_import, which will be explained later.

What is Lazy Import

You may find that pure Python style configuration files are organized with plain Python syntax, so it seems you would not need a configuration class at all and could simply import configuration files as Python modules. If you have that feeling, it is worth celebrating, because this is exactly the effect we want.

As mentioned earlier, parsing a configuration file requires the third-party libraries it references to be installed, which is unreasonable. For example, suppose I trained a model with MMagic and want to deploy it with the onnxruntime backend of MMDeploy. The deployment environment lacks torch, but torch is needed to parse the configuration file, so I cannot conveniently reuse the MMagic configuration file as the deployment configuration. To solve this problem, we introduced the concept of lazy_import.

The specific implementation of lazy_import is complex, so here we only briefly introduce what it does. The core idea of lazy_import is to defer the execution of the import statements in the configuration file: they are not executed while the configuration file is parsed, but only when the imported object is actually used. This avoids the dependency problem caused by import statements in the configuration file. During parsing, the code executed by the Python interpreter is equivalent to the following:

Original configuration file:

from torch.optim import SGD

optimizer = dict(type=SGD)

Code actually executed by the python interpreter through the configuration class:

lazy_obj = LazyObject('torch.optim', 'SGD')

optimizer = dict(type=lazy_obj)

As an internal type of the Config module, the LazyObject cannot be accessed directly by users. When accessing the type field, it will undergo a series of conversions to convert LazyObject into the actual torch.optim.SGD type. In this way, parsing the configuration file will not trigger the import of third-party libraries, while users can still access the types of third-party libraries normally when using the configuration file.
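
The idea can be sketched with the standard `ast` and `importlib` modules. This is a minimal illustration under stated assumptions, not the implementation in this PR: the `LazyObject` class, its `build` method, `LazyImportTransformer`, and `parse_config` are all hypothetical names defined here, only absolute `from ... import ...` statements are handled, and `surely_missing_package` is a made-up name for a library that is not installed.

```python
import ast
import importlib


class LazyObject:
    """Hypothetical placeholder that records what to import without importing it."""

    def __init__(self, module, name):
        self.module = module
        self.name = name

    def build(self):
        # The real import happens only here, when the object is needed.
        return getattr(importlib.import_module(self.module), self.name)


class LazyImportTransformer(ast.NodeTransformer):
    """Rewrite `from a import b` into `b = LazyObject('a', 'b')`."""

    def visit_ImportFrom(self, node):
        assigns = []
        for alias in node.names:
            target = ast.Name(id=alias.asname or alias.name, ctx=ast.Store())
            value = ast.Call(
                func=ast.Name(id='LazyObject', ctx=ast.Load()),
                args=[ast.Constant(node.module), ast.Constant(alias.name)],
                keywords=[])
            assigns.append(ast.Assign(targets=[target], value=value))
        return assigns


def parse_config(source):
    """Execute config source with its `from ... import ...` lines made lazy."""
    tree = LazyImportTransformer().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    namespace = {'LazyObject': LazyObject}
    exec(compile(tree, '<config>', 'exec'), namespace)
    namespace.pop('__builtins__', None)
    namespace.pop('LazyObject', None)
    return namespace


config_source = """
from collections import OrderedDict
from surely_missing_package import Optimizer

container = dict(type=OrderedDict)
optimizer = dict(type=Optimizer, lr=0.1)
"""

cfg = parse_config(config_source)
# Parsing succeeded although `surely_missing_package` cannot be imported.
lazy_type = cfg['optimizer']['type']
# Resolving a lazy object triggers the actual import.
resolved = cfg['container']['type'].build()
```

Calling `lazy_type.build()` in this sketch would raise an import error, because the missing package is only looked up at that point and never during parsing.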

To access the internal type of LazyObject, you can use the Config.to_dict interface:

from mmengine.config import Config

cfg = Config.fromfile('optimizer.py').to_dict()
print(type(cfg['optimizer']['type']))
# mmengine.config.lazy.LazyObject

At this point, the type accessed is the LazyObject type.

However, the lazy import strategy cannot be applied to the inheritance (import) of base files: the parsed configuration must include the fields defined in the base configuration file, so that import has to be executed for real. Therefore, we have added a restriction that base files must be imported inside the if '_base_' code block.
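
One way a parser could tell base imports apart from ordinary (lazy) imports is to look for the `if '_base_':` block in the syntax tree. The sketch below assumes Python 3.8+ and uses a hypothetical helper, `find_base_imports`, written for this example; it only inspects the source and does not import anything.

```python
import ast


def find_base_imports(source):
    """Return module paths imported inside `if '_base_':` blocks."""
    bases = []
    for node in ast.parse(source).body:
        is_base_block = (
            isinstance(node, ast.If)
            and isinstance(node.test, ast.Constant)
            and node.test.value == '_base_')
        if not is_base_block:
            continue
        for stmt in node.body:
            if isinstance(stmt, ast.ImportFrom):
                # `level` counts the leading dots of a relative import.
                bases.append('.' * stmt.level + (stmt.module or ''))
    return bases


source = """
from torch.optim import SGD

if '_base_':
    from .optimizer import *
    from ._base_.schedule import param_scheduler

optimizer = dict(type=SGD, lr=0.1)
"""

base_modules = find_base_imports(source)
print(base_modules)
```

Here the `torch.optim` import outside the block is ignored, so it can be handled lazily, while the two modules inside the block are the ones that would need to be loaded eagerly.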

Limitations

  1. Functions and classes cannot be defined in the configuration file.
  2. The configuration file name must comply with the naming convention of Python modules, which can only contain letters, numbers, and underscores, and cannot start with a number.
  3. When importing variables from a base configuration file, such as from ._base_.alpha import beta, alpha must be a module name, i.e., a Python file, rather than the name of a package containing __init__.py.
  4. Importing multiple modules in a single import statement, such as import torch, numpy, os, is not supported. Use separate import statements instead, such as import torch; import numpy; import os.
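
Several of these limitations can be checked statically. The following is a rough sketch with a hypothetical helper, `check_config`, that covers items 1, 2, and 4; item 3 needs access to the file system and is left out. `str.isidentifier` is used as an approximation of the naming rule.

```python
import ast
from pathlib import Path


def check_config(filename, source):
    """Return a list of human-readable violations of the limitations above."""
    problems = []
    # Item 2: the file name (without extension) must be a valid module name.
    if not Path(filename).stem.isidentifier():
        problems.append(f'file name {filename!r} is not a valid module name')
    for node in ast.walk(ast.parse(source)):
        # Item 1: no function or class definitions.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            problems.append(f'line {node.lineno}: definition of {node.name!r}')
        # Item 4: `import a, b` must be split into separate statements.
        elif isinstance(node, ast.Import) and len(node.names) > 1:
            names = ', '.join(alias.name for alias in node.names)
            problems.append(f'line {node.lineno}: split "import {names}"')
    return problems


good = check_config('optimizer.py', 'import torch\nimport numpy\n')
bad = check_config(
    '1st-config.py',
    'import torch, numpy, os\n\ndef helper():\n    pass\n')

for problem in bad:
    print(problem)
```

Because the check only parses the source, it runs even when the imported libraries are not installed.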

Modification

Please briefly describe what modification is made in this PR.

BC-breaking (Optional)

Does the modification introduce changes that break the backward-compatibility of the downstream repos?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.

Use cases (Optional)

open-mmlab/mmdetection#10366
open-mmlab/mmyolo#787
open-mmlab/mmrazor#539
open-mmlab/mmpose#2390
open-mmlab/mmpretrain#1567

Checklist

  1. Pre-commit or other linting tools are used to fix the potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit test to ensure the correctness.
  3. If the modification has potential influence on downstream projects, this PR should be tested with downstream projects, like MMDet or MMCls.
  4. The documentation has been modified accordingly, like docstring or example tutorials.

@ly015
Member

ly015 commented Jun 14, 2023

LGTM

@zhouzaida changed the title from "New config type" to "[Feature] Support Pure Python style Configuration File" on Jun 15, 2023
@zhouzaida zhouzaida merged commit 6ece63e into open-mmlab:main Jun 16, 2023

codecov bot commented Aug 23, 2024

Codecov Report

Attention: Patch coverage is 82.94011% with 94 lines in your changes missing coverage. Please review.

Please upload report for BASE (main@3715fea). Learn more about missing BASE report.

| Files | Patch % | Lines |
|---|---|---|
| mmengine/config/config.py | 80.90% | 40 Missing and 19 partials ⚠️ |
| mmengine/config/utils.py | 91.66% | 5 Missing and 3 partials ⚠️ |
| mmengine/runner/runner.py | 33.33% | 5 Missing and 3 partials ⚠️ |
| mmengine/utils/package_utils.py | 55.55% | 4 Missing and 4 partials ⚠️ |
| mmengine/config/lazy.py | 93.58% | 3 Missing and 2 partials ⚠️ |
| mmengine/utils/misc.py | 87.50% | 2 Missing and 1 partial ⚠️ |
| mmengine/registry/registry.py | 77.77% | 2 Missing ⚠️ |
| mmengine/runner/loops.py | 0.00% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1071   +/-   ##
=======================================
  Coverage        ?   77.90%           
=======================================
  Files           ?      140           
  Lines           ?    11974           
  Branches        ?     2464           
=======================================
  Hits            ?     9328           
  Misses          ?     2205           
  Partials        ?      441           
| Flag | Coverage Δ |
|---|---|
| unittests | 77.90% <82.94%> (?) |

