Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose the correct namespaces for Python modules #751

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

datapythonista
Copy link

Currently, in the datafusion module there is no match between what dir() returns, and what's actually inside the module. For example:

>>> import datafusion

>>> dir(datafusion)
['ABCMeta', 'Accumulator', 'AggregateUDF', 'Alias', 'Analyze', 'Between', 'Case', 'Cast', 'Config', 'CreateMemoryTable', 'CreateView', 'DFSchema', 'DataFrame', 'Distinct', 'DropTable', 'Exists', 'Explain', 'Expr', 'Extension', 'Filter', 'GroupingSet', 'ILike', 'InList', 'InSubquery', 'IsFalse', 'IsNotFalse', 'IsNotNull', 'IsNotTrue', 'IsNotUnknown', 'IsTrue', 'IsUnknown', 'Like', 'Limit', 'List', 'Negative', 'Not', 'Partitioning', 'Placeholder', 'Projection', 'Repartition', 'RuntimeConfig', 'SQLOptions', 'ScalarSubquery', 'ScalarUDF', 'ScalarVariable', 'SessionConfig', 'SessionContext', 'SimilarTo', 'Sort', 'Subquery', 'SubqueryAlias', 'TableScan', 'TryCast', 'Window', 'WindowFrame', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', '_internal', 'abstractmethod', 'col', 'column', 'common', 'expr', 'importlib_metadata', 'lit', 'literal', 'pa', 'udaf', 'udf']

>>> datafusion.ABCMeta  # <- THIS IS A STANDARD LIBRARY CLASS, NOT PART OF DATAFUSION
<class 'abc.ABCMeta'>

>>> dir(datafusion.common)  # <- BASED ON THIS, THERE IS NOTHING IN THE MODULE EXCEPT THE MODULE ITSELF AGAIN
['__builtins__', '__cached__', '__doc__', '__file__', '__getattr__', '__loader__', '__name__', '__package__', '__spec__', 'common']

>>> datafusion.common.common
<module 'common'>

>>> datafusion.common.SqlTable  # <- BUT WHEN USING THE MODULE THERE ARE ACTUALLY CLASSES
<class 'datafusion.common.SqlTable'>

With this PR, I basically show what is part of the module, and I hide what's being displayed now as a side effect:

>>> import datafusion

>>> dir(datafusion)
['Accumulator', 'AggregateUDF', 'Alias', 'Analyze', 'Between', 'Case', 'Cast', 'Config', 'CreateMemoryTable', 'CreateView', 'DFSchema', 'DataFrame', 'Distinct', 'DropTable', 'Exists', 'Explain', 'Expr', 'Extension', 'Filter', 'GroupingSet', 'ILike', 'InList', 'InSubquery', 'IsFalse', 'IsNotFalse', 'IsNotNull', 'IsNotTrue', 'IsNotUnknown', 'IsTrue', 'IsUnknown', 'Like', 'Limit', 'Negative', 'Not', 'Partitioning', 'Placeholder', 'Projection', 'Repartition', 'RuntimeConfig', 'SQLOptions', 'ScalarSubquery', 'ScalarUDF', 'ScalarVariable', 'SessionConfig', 'SessionContext', 'SimilarTo', 'Sort', 'Subquery', 'SubqueryAlias', 'TableScan', 'TryCast', 'Window', 'WindowFrame', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', '_internal', 'col', 'column', 'common', 'expr', 'lit', 'literal', 'udaf', 'udf']

>>> dir(datafusion.common)
['DFSchema', 'DataType', 'DataTypeMap', 'PythonType', 'SqlFunction', 'SqlSchema', 'SqlStatistics', 'SqlTable', 'SqlType', 'SqlView', '__builtins__', '__cached__', '__dir__', '__doc__', '__file__', '__getattr__', '__loader__', '__name__', '__package__', '__spec__']

Personally, instead of exposing just the _internal module from PyO3, I would create all these modules and submodules directly from PyO3. I'd use _datafusion as the main module, but instead of having a common.py file, the _datafusion.common submodule would be created directly from PyO3. So, all this magic of __getattr__ and __dir__ is not needed. @andygrove is there a reason why this wasn't implemented like this?

Also, I don't think it's a good practice to have everything in the main namespace. I think it's fine that for examples users have to use datafusion.expr.IsTrue instead of datafusion.IsTrue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant