TidierOrg · drizk1 · Sep 29, 2024 · Sep 26, 2024 · Sep 26, 2024 · Sep 26, 2024
diff --git a/README.md b/README.md
@@ -44,9 +44,10 @@ TidierDB.jl currently supports the following top-level macros:
 | **Helper Functions**             | `across`, `desc`, `if_else`, `case_when`, `n`, `starts_with`, `ends_with`, `contains`, `as_float`, `as_integer`, `as_string`, `is_missing`, `missing_if`, `replace_missing` |
 | **TidierStrings.jl Functions** | `str_detect`, `str_replace`, `str_replace_all`, `str_remove_all`, `str_remove`                                                                                               |
 | **TidierDates.jl Functions**   | `year`, `month`, `day`, `hour`, `min`, `second`, `floor_date`, `difftime`                                                                                                   |
-| **Aggregate Functions**          | `mean`, `minimum`, `maximum`, `std`, `sum`, `cumsum`, `cor`, `cov`, `var`, 
+| **Aggregate Functions**          | `mean`, `minimum`, `maximum`, `std`, `sum`, `cumsum`, `cor`, `cov`, `var`, all SQL aggregate
 
-`@summarize` supports any SQL aggregate function in addition to the list above. Simply write the function as written in SQL syntax and it will work.                                                                                                    |
+`@summarize` supports any SQL aggregate function in addition to the list above. Simply write the function as written in SQL syntax and it will work.
+`@mutate` supports nearly all functions built into SQL as well.                                                                                                    
 
 When using the DuckDB backend, if `db_table` recieves a file path (`.parquet`, `.json`, `.csv`, `iceberg` or `delta`), it does not copy it into memory. This allows for queries on files too big for memory. `db_table` also supports S3 bucket locations via DuckDB.
 
@@ -67,7 +68,9 @@ import TidierDB as DB
 db = DB.connect(DB.duckdb());
 path_or_name = "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv"
 
-@chain DB.db_table(db, path_or_name) begin
+mtcars = DB.db_table(db, path_or_name);
+
+@chain DB.t(mtcars) begin
     DB.@filter(!starts_with(model, "M"))
     DB.@group_by(cyl)
     DB.@summarize(mpg = mean(mpg))
@@ -97,7 +100,7 @@ end
 We cannot do this using TidierDB. However, we can call `@pivot_longer()` from TidierData *after* the result of the query has been instantiated as a DataFrame, like this: 
 
 ```julia
-@chain DB.db_table(db, path_or_name) begin
+@chain DB.t(mtcars) begin
     DB.@filter(!starts_with(model, "M"))
     DB.@group_by(cyl)
     DB.@summarize(mpg = mean(mpg))
@@ -136,7 +139,7 @@ end
 We can replace `DB.collect()` with `DB.@show_query` to reveal the underlying SQL query being generated by TidierDB. To handle complex queries, TidierDB makes heavy use of Common Table Expressions (CTE), which are a useful tool to organize long queries.
 
 ```julia
-@chain DB.db_table(db, path_or_name) begin
+@chain DB.t(mtcars) begin
     DB.@filter(!starts_with(model, "M"))
     DB.@group_by(cyl)
     DB.@summarize(mpg = mean(mpg))
@@ -176,7 +179,7 @@ SELECT *
 ## TidierDB is already quite fully-featured, supporting advanced TidierData functions like `across()` for multi-column selection.
 
 ```julia
-@chain DB.db_table(db, path_or_name) begin
+@chain DB.t(mtcars) begin
     DB.@group_by(cyl)
     DB.@summarize(across((starts_with("a"), ends_with("s")), (mean, sum)))
     DB.@collect

diff --git a/docs/examples/UserGuide/from_queryex.jl b/docs/examples/UserGuide/from_queryex.jl
@@ -1,4 +1,4 @@
-# While using TidierDB, you may need to generate part of a query and reuse it multiple times. `from_query()` enables a query portion to be reused multiple times as shown below.
+# While using TidierDB, you may need to generate part of a query and reuse it multiple times. `from_query()` or `t()` enable a query portion to be reused multiple times as shown below.
 
 # ```julia
 # import TidierDB as DB

diff --git a/docs/examples/UserGuide/getting_started.jl b/docs/examples/UserGuide/getting_started.jl
@@ -53,10 +53,10 @@
 # If you are working with a backend where compute cost is important, it will be important to minimize using `db_table` as this will requery for metadata each time. 
 # Compute costs are relevant to backends such as AWS, databricks and Snowflake. 
 
-# To do this, save the results of `db_table` and use them with `from_query`. Using `from_query` pulls the relevant information (metadata, con, etc) from the mutable SQLquery struct, allowing you to repeatedly query and collect the table without requerying for the metadata each time
+# To do this, save the results of `db_table` and use them with `t`. Using `t` pulls the relevant information (metadata, con, etc) from the mutable SQLquery struct, allowing you to repeatedly query and collect the table without requerying for the metadata each time
 # ```julia
 # table = DB.db_table(con, "path")
-# @chain DB.from_query(table) begin
+# @chain DB.t(table) begin
 #     ## data wrangling here 
 # end 
 # ```

diff --git a/docs/examples/UserGuide/ibis_comp.jl b/docs/examples/UserGuide/ibis_comp.jl
@@ -17,8 +17,6 @@
 # ```julia
 # using TidierDB
 # db = connect(duckdb())
-# # This next line is optional, but it will let us avoid writing `db_table` or `from_query` for each query
-# t(table) = from_query(table)
 # ```
 # Of note, TidierDB does not yet have an "interactive mode" so each example result will be collected.
 

diff --git a/docs/src/index.md b/docs/src/index.md
@@ -39,9 +39,10 @@ TidierDB.jl currently supports:
 | **Helper Functions**             | `across`, `desc`, `if_else`, `case_when`, `n`, `starts_with`, `ends_with`, `contains`, `as_float`, `as_integer`, `as_string`, `is_missing`, `missing_if`, `replace_missing` |
 | **TidierStrings.jl Functions** | `str_detect`, `str_replace`, `str_replace_all`, `str_remove_all`, `str_remove`                                                                                               |
 | **TidierDates.jl Functions**   | `year`, `month`, `day`, `hour`, `min`, `second`, `floor_date`, `difftime`                                                                                                   |
-| **Aggregate Functions**          | `mean`, `minimum`, `maximum`, `std`, `sum`, `cumsum`, `cor`, `cov`, `var`, 
+| **Aggregate Functions**          | `mean`, `minimum`, `maximum`, `std`, `sum`, `cumsum`, `cor`, `cov`, `var`, all SQL aggregate
 
-`@summarize` supports any SQL aggregate function in addition to the list above. Simply write the function as written in SQL syntax and it will work.                                                                                                    |
+`@summarize` supports any SQL aggregate function in addition to the list above. Simply write the function as written in SQL syntax and it will work.
+`@mutate` supports nearly all functions built into SQL as well.                                                                                                          
 
 
 When using the DuckDB backend, if `db_table` recieves a file path ( `.parquet`, `.json`, `.csv`, `iceberg` or `delta`), it does not copy it into memory. This allows for queries on files too big for memory. `db_table` also supports S3 bucket locations via DuckDB.

diff --git a/src/TBD_macros.jl b/src/TBD_macros.jl
@@ -552,92 +552,6 @@ macro rename(sqlquery, renamings...)
     end
 end
 
-"""
-$docstring_window_order
-"""
-macro window_order(sqlquery, order_by_expr...)
-    order_by_expr = parse_interpolation2.(order_by_expr)
-
-    return quote
-        sq = $(esc(sqlquery))
-        if isa(sq, SQLQuery)
-            # Convert order_by_expr to SQL order by string
-            order_by_sql = join([expr_to_sql(expr, sq) for expr in $(esc(order_by_expr))], ", ")
-
-            # Update the orderBy field of the SQLQuery instance
-            sq.window_order = order_by_sql
-
-            # If this is the first operation after an aggregation, wrap current state in a CTE
-            if sq.post_aggregation
-                sq.post_aggregation = false
-                cte_name = "cte_" * string(sq.cte_count + 1)
-                cte_sql = "SELECT * FROM " * sq.from
-
-                if !isempty(sq.where)
-                    cte_sql *= " WHERE " * sq.where
-                end
-
-                new_cte = CTE(name=cte_name, select=cte_sql, from=sq.from)
-                push!(sq.ctes, new_cte)
-                sq.cte_count += 1
-
-                # Reset the from to reference the new CTE
-                sq.from = cte_name
-            end
-
-            # Note: Actual window functions would be applied in subsequent @mutate calls or similar,
-            # potentially using the orderBy set here for their OVER() clauses.
-        else
-            error("Expected sqlquery to be an instance of SQLQuery")
-        end
-        sq
-    end
-end
-
-"""
-$docstring_window_frame
-"""
-macro window_frame(sqlquery, frame_start::Int, frame_end::Int)
-    return quote
-        sq = $(esc(sqlquery))
-        if isa(sq, SQLQuery)
-            # Validate frame_start and frame_end
-            if $frame_end < $frame_start
-                error("frame_end must be greater than or equal to frame_start")
-            end
-
-            # Calculate absolute values for frame_start and frame_end
-            abs_frame_start = abs($frame_start)
-            abs_frame_end = abs($frame_end)
-
-            # Determine the direction and clause for frame_start
-            frame_start_clause = if $frame_start < 0
-                string(abs_frame_start, " PRECEDING")
-            else
-                string(abs_frame_start, " FOLLOWING")
-            end
-
-            # Determine the direction and clause for frame_end
-            frame_end_clause = if $frame_end < 0
-                string(abs_frame_end, " PRECEDING")
-            else
-                string(abs_frame_end, " FOLLOWING")
-            end
-
-            # Construct the window frame clause
-            frame_clause = string("ROWS BETWEEN ", frame_start_clause, " AND ", frame_end_clause)
-
-            # Update the windowFrame field of the SQLQuery instance
-            sq.windowFrame = frame_clause
-        else
-            error("Expected sqlquery to be an instance of SQLQuery")
-        end
-        sq
-    end
-end
-
-
-
 
 macro show_query(sqlquery)
     return quote

diff --git a/src/TidierDB.jl b/src/TidierDB.jl
@@ -18,7 +18,8 @@ using GZip
  @distinct, @left_join, @right_join, @inner_join, @count, @window_order, @window_frame, @show_query, @collect, @slice_max, 
  @slice_min, @slice_sample, @rename, copy_to, duckdb_open, duckdb_connect, @semi_join, @full_join, 
  @anti_join, connect, from_query, @interpolate, add_interp_parameter!, update_con,  @head, 
- clickhouse, duckdb, sqlite, mysql, mssql, postgres, athena, snowflake, gbq, oracle, databricks, SQLQuery, show_tables, t
+ clickhouse, duckdb, sqlite, mysql, mssql, postgres, athena, snowflake, gbq, oracle, databricks, SQLQuery, show_tables,
+ t
 
  abstract type SQLBackend end
 
@@ -58,7 +59,7 @@ include("parsing_oracle.jl")
 include("parsing_databricks.jl")
 include("joins_sq.jl")
 include("slices_sq.jl")
-
+include("windows.jl")
 
 
 
@@ -413,7 +414,7 @@ end
 function connect(::duckdb, token::String)
     if token == "md:" 
         return DBInterface.connect(DuckDB.DB, "md:")
-    elseif endswith(token, ".duckdb")
+    elseif endswith(token, ".duckdb") || endswith(token, ".duck.db")
         return DuckDB.DB(token)
     else
         return DBInterface.connect(DuckDB.DB, "md:$token")

diff --git a/src/docstrings.jl b/src/docstrings.jl
@@ -20,7 +20,9 @@ julia> db = connect(duckdb());
 
 julia> copy_to(db, df, "df_mem");
 
-julia> @chain db_table(db, :df_mem) begin
+julia> df_mem = db_table(db, :df_mem);
+
+julia> @chain t(df_mem) begin
          @select(groups:percent)
          @collect
        end
@@ -39,7 +41,7 @@ julia> @chain db_table(db, :df_mem) begin
    9 │ bb          4      0.9
   10 │ aa          5      1.0
 
-julia> @chain db_table(db, :df_mem) begin
+julia> @chain t(df_mem) begin
          @select(contains("e"))
          @collect
        end
@@ -925,20 +927,30 @@ julia> df = DataFrame(id = [string('A' + i ÷ 26, 'A' + i % 26) for i in 0:9],
 julia> db = connect(duckdb());
 
 julia> copy_to(db, df, "df_mem");
+
+julia> @chain db_table(db, :df_mem) begin
+        @group_by groups
+        @window_frame(3)
+        @window_order(desc(percent))
+        @mutate(avg = mean(value))
+       #@show_query 
+       end;
 ```
 """
 
 const docstring_window_frame = 
 """
-    @window_frame(sql_query, frame_start::Int, frame_end::Int)
+    @window_frame(sql_query, args...)
 
 Define the window frame for window functions in a SQL query, specifying the range of rows to include in the calculation relative to the current row.
 
 # Arguments
-sql_query: The SQL query to operate on, expected to be an instance of SQLQuery.
-- `frame_start`: The starting point of the window frame. A positive value indicates the start after the current row (FOLLOWING), a negative value indicates before the current row (PRECEDING), and 0 indicates the current row.
-- `frame_end`: The ending point of the window frame. A positive value indicates the end after the current row (FOLLOWING), a negative value indicates before the current row (PRECEDING), and 0 indicates the current row.
-
+- `sqlquery::SQLQuery`: The SQLQuery instance to which the window frame will be applied.
+- `args...`: A variable number of arguments specifying the frame boundaries. These can be:
+    - `from`: The starting point of the frame. Can be a positive or negative integer, 0 or empty. When empty, it will use UNBOUNDED
+    - `to`: The ending point of the frame. Can be a positive or negative integer,  0 or empty. When empty, it will use UNBOUNDED
+    - if only one integer is provided without specifying `to` or `from` it will default to from, and to will be UNBOUNDED.
+    - if no arguments are given, both will be UNBOUNDED
 # Examples 
 ```jldoctest
 julia> df = DataFrame(id = [string('A' + i ÷ 26, 'A' + i % 26) for i in 0:9], 
@@ -949,6 +961,36 @@ julia> df = DataFrame(id = [string('A' + i ÷ 26, 'A' + i % 26) for i in 0:9],
 julia> db = connect(duckdb());
 
 julia> copy_to(db, df, "df_mem");
+
+julia> df_mem = db_table(db, :df_mem);
+
+julia> @chain t(df_mem) begin
+        @group_by groups
+        @window_frame(3)
+        @mutate(avg = mean(percent))
+        #@show_query
+       end;
+
+julia> @chain t(df_mem) begin
+        @group_by groups
+        @window_frame(-3, 3)
+        @mutate(avg = mean(percent))
+        #@show_query
+       end;
+
+julia> @chain t(df_mem) begin
+        @group_by groups
+        @window_frame(to = -3)
+        @mutate(avg = mean(percent))
+        #@show_query
+       end;
+
+julia> @chain t(df_mem) begin
+        @group_by groups
+        @window_frame()
+        @mutate(avg = mean(percent))
+        #@show_query
+       end;
 ```
 """