Doc/cross reference #791

Merged
merged 9 commits into from
Aug 6, 2024
Changes from 8 commits
2 changes: 2 additions & 0 deletions docs/.gitignore
@@ -0,0 +1,2 @@
pokemon.csv
yellow_trip_data.parquet
3 changes: 2 additions & 1 deletion docs/requirements.txt
@@ -22,4 +22,5 @@ maturin
jinja2
ipython
pandas
pickleshare
pickleshare
sphinx-autoapi
31 changes: 0 additions & 31 deletions docs/source/api.rst

This file was deleted.

27 changes: 0 additions & 27 deletions docs/source/api/dataframe.rst

This file was deleted.

29 changes: 0 additions & 29 deletions docs/source/api/execution_context.rst

This file was deleted.

27 changes: 0 additions & 27 deletions docs/source/api/expression.rst

This file was deleted.

27 changes: 0 additions & 27 deletions docs/source/api/functions.rst

This file was deleted.

27 changes: 0 additions & 27 deletions docs/source/api/object_store.rst

This file was deleted.

56 changes: 26 additions & 30 deletions docs/source/conf.py
@@ -46,15 +46,11 @@
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.autosummary",
    "sphinx.ext.doctest",
    "sphinx.ext.ifconfig",
    "sphinx.ext.mathjax",
    "sphinx.ext.viewcode",
    "sphinx.ext.napoleon",
    "myst_parser",
    "IPython.sphinxext.ipython_directive",
    "autoapi.extension",
]

source_suffix = {
@@ -70,33 +66,35 @@
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = []

# Show members for classes in .. autosummary
autodoc_default_options = {
    "members": None,
    "undoc-members": None,
    "show-inheritance": None,
    "inherited-members": None,
}

autosummary_generate = True

autoapi_dirs = ["../../python"]
autoapi_ignore = ["*tests*"]
autoapi_member_order = "groupwise"
suppress_warnings = ["autoapi.python_import_resolution"]
autoapi_python_class_content = "both"

def autodoc_skip_member(app, what, name, obj, skip, options):
    exclude_functions = "__init__"
    exclude_classes = ("Expr", "DataFrame")

    class_name = ""
    if hasattr(obj, "__qualname__"):
        if obj.__qualname__ is not None:
            class_name = obj.__qualname__.split(".")[0]
def autoapi_skip_member_fn(app, what, name, obj, skip, options):
    skip_contents = [
        # Re-exports
        ("class", "datafusion.DataFrame"),
        ("class", "datafusion.SessionContext"),
        ("module", "datafusion.common"),
        # Deprecated
        ("class", "datafusion.substrait.serde"),
        ("class", "datafusion.substrait.plan"),
        ("class", "datafusion.substrait.producer"),
        ("class", "datafusion.substrait.consumer"),
        ("method", "datafusion.context.SessionContext.tables"),
        ("method", "datafusion.dataframe.DataFrame.unnest_column"),
    ]
    if (what, name) in skip_contents:
        skip = True

    should_exclude = name in exclude_functions and class_name in exclude_classes
    return skip

    return True if should_exclude else None


def setup(app):
    app.connect("autodoc-skip-member", autodoc_skip_member)
def setup(sphinx):
    sphinx.connect("autoapi-skip-member", autoapi_skip_member_fn)


# -- Options for HTML output -------------------------------------------------
@@ -106,9 +104,7 @@ def setup(app):
#
html_theme = "pydata_sphinx_theme"

html_theme_options = {
    "use_edit_page_button": True,
}
html_theme_options = {"use_edit_page_button": False, "show_toc_level": 2}

html_context = {
"github_user": "apache",
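Reviewer note: the new `autoapi_skip_member_fn` hook in `docs/source/conf.py` reduces to a lookup of `(what, name)` pairs. The following is a minimal standalone sketch of that logic, assuming the behaviour shown in the diff. The names `should_skip` and `SKIP_CONTENTS` are illustrative and not part of the PR, the list is a shortened subset, and no Sphinx installation is needed to run it.

```python
# Illustrative subset of the (kind, fully qualified name) pairs from the diff.
SKIP_CONTENTS = [
    ("class", "datafusion.DataFrame"),  # re-export
    ("module", "datafusion.common"),  # re-export
    ("method", "datafusion.context.SessionContext.tables"),  # deprecated
]


def should_skip(what: str, name: str, skip: bool) -> bool:
    """Return True when the member should be omitted from the generated docs.

    Mirrors the hook in the diff: a listed (what, name) pair forces a skip,
    otherwise the incoming default decision is passed through unchanged.
    """
    if (what, name) in SKIP_CONTENTS:
        return True
    return skip


if __name__ == "__main__":
    # The re-exported alias is skipped; the canonical class is kept.
    print(should_skip("class", "datafusion.DataFrame", False))
    print(should_skip("class", "datafusion.dataframe.DataFrame", False))
```

Because the match is on the exact fully qualified name, the re-export `datafusion.DataFrame` is dropped while `datafusion.dataframe.DataFrame` stays documented, which appears to be the intent of the "Re-exports" entries.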
2 changes: 0 additions & 2 deletions docs/source/index.rst
@@ -104,5 +104,3 @@ Example
:hidden:
:maxdepth: 1
:caption: API

api
14 changes: 8 additions & 6 deletions docs/source/user-guide/basics.rst
@@ -15,6 +15,8 @@
.. specific language governing permissions and limitations
.. under the License.

.. _user_guide_concepts:

Concepts
========

@@ -52,7 +54,7 @@ The first statement group:
    # create a context
    ctx = datafusion.SessionContext()

creates a :code:`SessionContext`, that is, the main interface for executing queries with DataFusion. It maintains the state
creates a :py:class:`~datafusion.context.SessionContext`, that is, the main interface for executing queries with DataFusion. It maintains the state
of the connection between a user and an instance of the DataFusion engine. Additionally it provides the following functionality:

- Create a DataFrame from a CSV or Parquet data source.
@@ -72,9 +74,9 @@ The second statement group creates a :code:`DataFrame`,
    df = ctx.create_dataframe([[batch]])

A DataFrame refers to a (logical) set of rows that share the same column names, similar to a `Pandas DataFrame <https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html>`_.
DataFrames are typically created by calling a method on :code:`SessionContext`, such as :code:`read_csv`, and can then be modified by
calling the transformation methods, such as :meth:`.DataFrame.filter`, :meth:`.DataFrame.select`, :meth:`.DataFrame.aggregate`,
and :meth:`.DataFrame.limit` to build up a query definition.
DataFrames are typically created by calling a method on :py:class:`~datafusion.context.SessionContext`, such as :code:`read_csv`, and can then be modified by
calling the transformation methods, such as :py:func:`~datafusion.dataframe.DataFrame.filter`, :py:func:`~datafusion.dataframe.DataFrame.select`, :py:func:`~datafusion.dataframe.DataFrame.aggregate`,
and :py:func:`~datafusion.dataframe.DataFrame.limit` to build up a query definition.

The third statement uses :code:`Expressions` to build up a query definition.

@@ -85,5 +87,5 @@ The third statement uses :code:`Expressions` to build up a query definition.
        col("a") - col("b"),
    )

Finally the :code:`collect` method converts the logical plan represented by the DataFrame into a physical plan and execute it,
collecting all results into a list of `RecordBatch <https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatch.html>`_.
Finally the :py:func:`~datafusion.dataframe.DataFrame.collect` method converts the logical plan represented by the DataFrame into a physical plan and execute it,
collecting all results into a list of `RecordBatch <https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatch.html>`_.
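Reviewer note: the rewritten roles in this file rely on the `~` prefix, which in Sphinx's Python domain shortens the rendered link text to the last dotted component while the target stays fully qualified. The sketch below approximates that title rule in plain Python so the effect of the change can be checked without a docs build. `link_text` and `ROLE_PATTERN` are hypothetical helper names, explicit titles of the form `title <target>` are not handled, and the parentheses Sphinx may append for function and method roles are deliberately not modelled.

```python
import re

# Matches a role such as :py:func:`~pkg.mod.Class.method` or :meth:`.Class.method`.
ROLE_PATTERN = re.compile(r":(?P<role>[\w:]+):`(?P<target>[^`]+)`")


def link_text(role_markup: str) -> str:
    """Approximate the link text shown for a Python cross-reference role."""
    match = ROLE_PATTERN.fullmatch(role_markup)
    if match is None:
        raise ValueError(f"not a role: {role_markup!r}")
    # A leading dot only affects target lookup, not the displayed title.
    title = match.group("target").lstrip(".")
    if title.startswith("~"):
        # A leading tilde keeps only the last dotted component.
        return title[1:].rsplit(".", 1)[-1]
    return title


if __name__ == "__main__":
    print(link_text(":meth:`.DataFrame.filter`"))
    print(link_text(":py:func:`~datafusion.dataframe.DataFrame.filter`"))
```

Under this approximation the old role renders as `DataFrame.filter` and the new one as `filter`, so the visible text gets shorter even though the target becomes more specific.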
2 changes: 1 addition & 1 deletion docs/source/user-guide/common-operations/aggregations.rst
@@ -19,7 +19,7 @@ Aggregation
============

An aggregate or aggregation is a function where the values of multiple rows are processed together to form a single summary value.
For performing an aggregation, DataFusion provides the :meth:`.DataFrame.aggregate`
For performing an aggregation, DataFusion provides the :py:func:`~datafusion.dataframe.DataFrame.aggregate`

.. ipython:: python

8 changes: 4 additions & 4 deletions docs/source/user-guide/common-operations/basic-info.rst
@@ -34,26 +34,26 @@ In this section, you will learn how to display essential details of DataFrames u
    })
    df

Use :meth:`.DataFrame.limit` to view the top rows of the frame:
Use :py:func:`~datafusion.dataframe.DataFrame.limit` to view the top rows of the frame:

.. ipython:: python

    df.limit(2)

Display the columns of the DataFrame using :meth:`.DataFrame.schema`:
Display the columns of the DataFrame using :py:func:`~datafusion.dataframe.DataFrame.schema`:

.. ipython:: python

    df.schema()

The method :meth:`.DataFrame.to_pandas` uses pyarrow to convert to pandas DataFrame, by collecting the batches,
The method :py:func:`~datafusion.dataframe.DataFrame.to_pandas` uses pyarrow to convert to pandas DataFrame, by collecting the batches,
passing them to an Arrow table, and then converting them to a pandas DataFrame.

.. ipython:: python

    df.to_pandas()

:meth:`.DataFrame.describe` shows a quick statistic summary of your data:
:py:func:`~datafusion.dataframe.DataFrame.describe` shows a quick statistic summary of your data:

.. ipython:: python

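Reviewer note: the changes in this file follow one mechanical pattern, replacing `:meth:` roles that use a leading-dot suffix lookup with fully qualified `:py:func:` roles. A rewrite of that shape could be scripted roughly as follows. This is only a sketch of the pattern visible in the diff, not the method the author used, and `qualify_dataframe_roles` and `OLD_ROLE` are hypothetical names.

```python
import re

# Old style seen in the diff: :meth:`.DataFrame.<name>`
OLD_ROLE = re.compile(r":meth:`\.DataFrame\.(?P<method>\w+)`")


def qualify_dataframe_roles(text: str) -> str:
    """Rewrite old-style DataFrame method roles to the fully qualified form."""
    return OLD_ROLE.sub(
        r":py:func:`~datafusion.dataframe.DataFrame.\g<method>`", text
    )


if __name__ == "__main__":
    before = "Use :meth:`.DataFrame.limit` to view the top rows of the frame:"
    print(qualify_dataframe_roles(before))
```

Text without a matching role passes through unchanged, so the helper is safe to run over whole files.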
12 changes: 7 additions & 5 deletions docs/source/user-guide/common-operations/expressions.rst
@@ -15,6 +15,8 @@
.. specific language governing permissions and limitations
.. under the License.

.. _expressions:

Expressions
===========

@@ -26,16 +28,16 @@ concept shared across most compilers and databases.
Column
------

The first expression most new users will interact with is the Column, which is created by calling :func:`col`.
This expression represents a column within a DataFrame. The function :func:`col` takes as in input a string
The first expression most new users will interact with is the Column, which is created by calling :py:func:`~datafusion.col`.
This expression represents a column within a DataFrame. The function :py:func:`~datafusion.col` takes as in input a string
and returns an expression as it's output.

Literal
-------

Literal expressions represent a single value. These are helpful in a wide range of operations where
a specific, known value is of interest. You can create a literal expression using the function :func:`lit`.
The type of the object passed to the :func:`lit` function will be used to convert it to a known data type.
a specific, known value is of interest. You can create a literal expression using the function :py:func:`~datafusion.lit`.
The type of the object passed to the :py:func:`~datafusion.lit` function will be used to convert it to a known data type.

In the following example we create expressions for the column named `color` and the literal scalar string `red`.
The resultant variable `red_units` is itself also an expression.
@@ -62,7 +64,7 @@ Functions
---------

As mentioned before, most functions in DataFusion return an expression at their output. This allows us to create
a wide variety of expressions built up from other expressions. For example, :func:`.alias` is a function that takes
a wide variety of expressions built up from other expressions. For example, :py:func:`~datafusion.expr.Expr.alias` is a function that takes
as it input a single expression and returns an expression in which the name of the expression has changed.

The following example shows a series of expressions that are built up from functions operating on expressions.