Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix cleanup after loading autoload definitions #597

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

mmatera
Copy link
Contributor

@mmatera mmatera commented Nov 6, 2022

This is the part of #595 that avoids that definitions in autoload modules erase the builtin definitions.

@rocky
Copy link
Member

rocky commented Nov 7, 2022

If this is going to be split from #595, which I think is good, then I'd like to understand this more deeply. (The discussion was started in #595.) I am pretty sure that on its own there is more going on there that could or should be straightened out.

@rocky rocky force-pushed the fix_definition_autoload_to_builtin branch from be0fb71 to d95f501 Compare November 12, 2022 19:03
@rocky
Copy link
Member

rocky commented Nov 13, 2022

@mmatera Although the changes here are pretty small, in trying to understand what is up here, I think we have been going down the wrong path.

In short, we should not be copying user definitions into builtin, inside Definitions() initialization. Instead that should be done once outside.

Here is the way this kind of thing often works in interpreters.

There are a system-level definitions object that are part of the base system. Initializing this is done once.

When a new session starts, instead of reading builtins, those should be copied from the system definitions. Or no is done copy is done, but system definitions are magically merged in on definition lookup.

I really wonder what the user case is for creating definitions with the parameter add_builtin set to False. I can see possibly in testing you might want that to speed testing.

Does this make sense?

@mmatera
Copy link
Contributor Author

mmatera commented Nov 13, 2022

@rocky, I can tell you why this works in the current implementation, but not if the current implementation is good.
The reason why the definitions object stores "user" definitions separated from "built-in" definitions is because by "clear" the definition of a symbol, we mean to restore it to it's builtin definition. So, if we clear Plus , we want to restore it's definition to the built-in definition. In evaluation, user definitions and built-in definitions are merged and stored in a cache of merged definitions. So, to clear a definition, the user and cached definitions are deleted.

Autoload code adds definitions by evaluating WL expressions, so the resulting definitions are stored as user definitions. But then, we want to use these definitions in equal foot with the built-in definitions (those coming from contribute) In a way that if we clean a symbol generated from an autoload module, it's definition go back that default state.
To do that, the built-in definitions dict is updated with the autoload definitions, and then the copy in the user dict is erased.

In master, the built-in definition is updated from the user dict instead from the merged version, as it is done here. This is why if you provide nee rules in an autoload module, in master, the built-in rules are lost.

@mmatera
Copy link
Contributor Author

mmatera commented Nov 13, 2022

Regarding what happens if add_builtin is ser to false in the constructor of Definitions, yes, it makes the load much faster, but leaves it in an state that is not functional to perform any real evaluation. It just allows to run some tests that do not involve any evaluation. Also, it could be useful if you want to build an empty Definitions object to be populated from another definitions object, or from definitions stored in a pickle file.

@rocky
Copy link
Member

rocky commented Nov 13, 2022

Thanks for the explanation. I kind of got that.

But doing things this was means the builtins (usually called "systems") dictionary can be recreated more than once . It only needs to be created once.

To address the issue of not having to build the (system) builtin dictionary more than once, instead hold that in a dictionary outside the Definitions object. If reading from files puts that in a "user" dictionary of a Definitions object in the special case where we want it instead in the (systems) builtin dictionary, that's okay, just move it from the Definitions object into a global dictionary that isn't part of the Definitions object. And then the initial Definitions object used to create the (system) builtin dictionary can be destroyed.

Then, when new Definitions objects are created, they access (not build) global (system) builtin dictionary. Of course, the (system) builtin dictionary, starts out empty to make the loading of the Definitions object used to create the (system) builtin dictionary work.

And by doing things this way, the copy doesn't appear inside the Definition initialization, but instead appears outside of it.

@rocky
Copy link
Member

rocky commented Nov 13, 2022

To be clear about "system" versus "builtin". In contrast to calling expression tree elements "leaves", and eval functions "apply", the using name "builtin" where "system" is common (and in fact used as the context name for just about all of the variables inside "builtin") is not wrong, just not as common as "system".

And in WMA you often find terms which deviate from the more common name for them but are still reasonable and clear. "Builtin" seems to be the term used in WMA.

@mmatera
Copy link
Contributor Author

mmatera commented Nov 13, 2022

Thanks for the explanation. I kind of got that.

But doing things this was means the builtins (usually called "systems") dictionary can be recreated more than once . It only needs to be created once.

This is not what happens in the current implementation: builtin definitions are created once, when the Definitions class is created.

Definitions stores four dictionaries with definitions:

  • builtin
  • pymathics
  • user
  • definitions_cache

The first dictionary (builtin) stores the builtin definitions, that are never deleted.
The second dictionary (user) stores definitions that are generated during the evaluations.
The third dictionary (pymathics) stores definitions from Pymathics modules

The fourth dictionary (definitions_cache) contains the merging of the definitions contained in the other three dictionaries.

When a Definitions object is loaded, the first step is to load the builtin definitions (by calling the method "contribute"). At the end of that stage, all the definitions contained in Builtin classes are loaded in that dictionary.

Then, autoload modules are evaluated, producing definitions that are stored in the user dictionary. However, these definitions should be handled on equal feet to the builtin definitions. In particular, they should not be deleted when Clear* are applied to the corresponding symbols. This is why both dicts are merged before the first "user" evaluation is run by the front end.

pymathics dict is populated/depleted when a python module is loaded (by calling LoadModule) / unloaded (by calling UnloadModule).

Each time one definition is modified in one of these dicts, the definition in definitions_cache is deleted. Then, in get_definitions, if the required symbol has an entry in definitions_cache, that definition is returned. Otherwise, if there is a definition in just one of the other dictionaries, that definition is returned. If there are definitions in more than one of the other dicts, a new definition is built by merging the definitions available and is stored in definitions_cache.

To address the issue of not having to build the (system) builtin dictionary more than once, instead hold that in a dictionary outside the Definitions object. If reading from files puts that in a "user" dictionary of a Definitions object in the special case where we want it instead in the (systems) builtin dictionary, that's okay, just move it from the Definitions object into a global dictionary that isn't part of the Definitions object. And then the initial Definitions object used to create the (system) builtin dictionary can be destroyed.

If this approach would be possible, it would not be much different than the current behaviour. We could store it as a class attribute instead of an object attribute, and then there would be just one instance. However, I guess that the current implementation was written thinking about having several instances of Definitions objects (it is, several independent evaluation contexts) which also could have different sets of autoload modules. If we assume that on the initialization, all the Definitions instances are identical, then we could keep the code almost as it is now, changing just an small number of lines.

In any case, the problem that we have in master would remain, and this PR would also be a fix in that case.

Then, when new Definitions objects are created, they access (not build) global (system) builtin dictionary. Of course, the (system) builtin dictionary, starts out empty to make the loading of the Definitions object used to create the (system) builtin dictionary work.

And by doing things this way, the copy doesn't appear inside the Definition initialization, but instead appears outside of it.

@rocky
Copy link
Member

rocky commented Nov 14, 2022

Thanks for the explanation. I kind of got that.
But doing things this was means the builtins (usually called "systems") dictionary can be recreated more than once . It only needs to be created once.

This is not what happens in the current implementation: builtin definitions are created once, when the Definitions class is created.

When a Definitions object is loaded, the first step is to load the builtin definitions (by calling the method "contribute").

The code here is considered for running every time Definitions is instantiated.
It should be done once outside of that instantiation and the system-wide unchanging builtin data can be used or copied inside definitions object creation if needed.

For now let's simplify discussion and let's assume no pymodules which means no changing pymodules between definitions instances.
Once we have the simple case down, that is, ignoring pymathics modules, then we can come back to the more complicated cases, if it is even needed.
I can't think of a good reason why several definitions objects should have different pymathics modules. Unless there is good reason, this is unnecessary generalization right now.

I also have had a number of problems with how pymathics modules work right now with respect to namespacing, but I will bring those issues up some time in the future. For now let's focus on having simple cases working in a more logical manner.

To address the issue of not having to build the (system) builtin dictionary more than once, instead hold that in a dictionary outside the Definitions object. If reading from files puts that in a "user" dictionary of a Definitions object in the special case where we want it instead in the (systems) builtin dictionary, that's okay, just move it from the Definitions object into a global dictionary that isn't part of the Definitions object. And then the initial Definitions object used to create the (system) builtin dictionary can be destroyed.

If this approach would be possible, it would not be much different than the current behaviour. We could store it as a class attribute instead of an object attribute, and then there would be just one instance. However, I guess that the current implementation was written thinking about having several instances of Definitions objects (it is, several independent evaluation contexts) which also could have different sets of autoload modules. If we assume that on the initialization, all the Definitions instances are identical, then we could keep the code almost as it is now, changing just an small number of lines.

In any case, the problem that we have in master would remain, and this PR would also be a fix in that case.

Having code to copy the definitions from the "user" space into "builtin" space on each instantiation smells wrong, because this aspect is part of the initial setup of the system, not a part of definitions object instantiation. And I note earlier in the code there is a comment about how using import instead of importlib also smells wrong.

I am more interested that things work in a more easily-explainable fashion (and without the circular dependencies) than I am that two pieces of code when run just happen to do the same thing.

@rocky rocky mentioned this pull request Nov 14, 2022
@mmatera
Copy link
Contributor Author

mmatera commented Nov 15, 2022

OK, here #620 is a proposal about how to load builtins and autoload just once. The need of migrating user definitions produced by autoload modules to the built-in is still there if we do not want to hack in some way the assignment module, or the add_rule/get_definitions methods.

@rocky
Copy link
Member

rocky commented Nov 15, 2022

OK, here #620 is a proposal about how to load builtins and autoload just once. The need of migrating user definitions produced by autoload modules to the built-in is still there if we do not want to hack in some way the assignment module, or the add_rule/get_definitions methods.

I appreciate you flexibility and willingness to listen. I must not be communicating very well here. It is not that in that very special situation where we are first initializing the system builtin definitions there isn't a copy, it is just that it is not done inside the init method and we also don't go module hunting inside the definition init method either. Those would be done in a place where systems initialization occurs. (I suspect this is inside builtin/__init__.py module which is also considered bad practice because those should be small since it is the parent of module of everything below.)

When I get a chance let me propose some code for how to do this which I believe will also reduce the fragility of circular imports because initializing a Definitions object should not be importing every other builtin module.

@mmatera
Copy link
Contributor Author

mmatera commented Nov 15, 2022

When I get a chance let me propose some code for how to do this which I believe will also reduce the fragility of circular imports because initializing a Definitions object should not be importing every other builtin module.

Sure. It would be great. However, I do not see a hurry in that reformulation. On the other hand, to continue with the MakeBoxes wl module, it would be useful to merge this as an intermediate step.

@rocky
Copy link
Member

rocky commented Nov 15, 2022

On the other hand, to continue with the MakeBoxes wl module, it would be useful to merge this as an intermediate step.

A couple of thoughts here. Is the MakeBoxes wl module fully complete? If not complete it in WL.

On a broader scale, right now we are finding that there are more fundamental problems with loading and running simple packages, even ones we write. The package loading issue feels more important.

After that is done we should be able to segregate the boxing rules that exist as class-level dictionary assignments in Python coe, put those in a Mathics .m file and read those in and fix how those are wrong. And even when it can't be done to the current Mathics implementation it can be done on WMA, to check that these are right.

Too often I find that we are rushing ahead with not-fully-thought-out work, rather than improving the base form which we can work on.

Last though, if all you want to see is a proof of concept of the MakeBoxes code in Mathics you are writing, then keep all of these quick hacks in your private fork.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants