SQL++ to NRAEnv Translation #56

joshuaauerbachwatson · 2017-07-31T22:38:30Z

This is a sub-issue of issue #10.

I am using it to record semantic issues that come up in writing the sqlpp_to_nraenv_top function and its subfunctions.

The text was updated successfully, but these errors were encountered:

joshuaauerbachwatson · 2017-07-31T23:43:15Z

We current define sqlpp as list sqlpp_expr but this is actually counterproductive. It is modeled on what we did in SQL, where there are additional statement types besides queries ("create view" and "drop view"). Note that in SQLtoNRAenv there is the expectation that there can be multiple statements, but (generally) only one query statement. If there is more than one, only the last affects the result. The sexp translation for SQL also converts 'with' clauses into separate 'create view' statements.

Note that there is no create view statement in SQL++ but there is a much richer repertoire of 'with' and 'let' clauses that can bind variables to collections (or to anything, actually) with varied scopes.

Issue #56

joshuaauerbachwatson · 2017-08-02T15:44:28Z

I've simplified by redefining sqlpp as sqlpp_expr rather than a list. This has been drilled up through the front-end logic. Still no actual contents in the NRAenv translator.

joshuaauerbachwatson · 2017-08-15T22:53:21Z

Some of the more obvious clauses in the SQLPPtoNRAEnv are now populated. The biggest issue (select) is still pending. Also, even in the case of simpler expressions there are many open issues, indicated as TODOs in the code.

The following is not prioritized, just listing the issues.

SQL++ has two distinct values for unknown data: null (generally occurring when a JSON object has an explicit null value) and unknown (generally occurring when a field of a JSON object is not present but is referenced). In our data model, we can represent null but we don't have a distinct value for unknown. Also, we don't contemplate that dunit can take on any type, the way null and unknown can in SQL++. For object references in the past we have used optional types to model things that can be null. Nothing precludes using sum types more generally to model things that can have two different kinds of absence as well as being present with a value. However, this will complicate translation in a fairly pervasive way.
SQL++ has both bags and (ordered) lists but our data model has only bags. We actually use Coq lists to implement bags and I think in some places we cheat and take advantage of order. But, to be correct, we really need two distinct first-class collection kinds.
SQL++ has a first-class exponentiation operator. We have no equivalent base operator but we do have exponentiation in our floating point model part. This is probably not a serious issue but we do have to assume that floating point is present in order to translate SQL++ and we have to do whatever is needed to coerce to and from floating point in order to implement this operator.
The like operator in SQL++ is a binary operator between expressions; that is, the pattern does not have to be a literal but can be computed. Our internal like operator is unary and takes the pattern as a (string) parameter. We probably need a binary like operator either in the base or as a foreign operator.
SQL++ has a (JSON) object constructor in which both field names and field values are expressions. Our internal record type (the obvious translation target) requires static field names.

More generally, we probably won't fix all of these mismatches right away or even ever and so we need a clear statement of what features of SQL++ we don't support. Ideally, these usages will be filtered out with diagnostics in the front-end so the user doesn't get too confused.

Issue #56

joshuaauerbachwatson · 2017-08-27T14:20:16Z

We now have translations for many expressions. Still not implemented but planned:

quantified expressions (deferred until after query is done, since a lot of the mechanism of 'select where' can be reused)
query (the biggest part, about to be tackled)
exponent operator
indexing a list via a position
'index any'

I distinguish normal indexing from 'index any' because normal indexing will require solving the lists vs bags issue but 'index any' could be implemented without doing that. The semantics are a random selection from a collection (so order doesn't actually matter). On the other hand, 'index any' requires getting some form of randomness into Coq. I think it would be adequate to have a (not very good) pseudo-random function, which could be implemented in Coq, with the seed being a single global input to the computation that is then passed around and updated. A pain threading it through but otherwise straightforward.

Also, among things that are already translated, the following restrictions:

the second operator of 'like' and all field names in object constructions must be string literals. This is screened in the front end and the restriction is reflected in the Coq AST for SQL++
bag constructors and array constructors are translated identically, producing what are bags in our model (related to the absence of indexing)
equal and fuzzy equal are both translated as equal (I have not found a precise semantic definition of what fuzzy equal even means)
null, missing, and unknown are all translated as null and we do not use sum types to make comparisons type-correct, so (e.g.) a number might be compared with null directly
we are assuming variable references can simply be dot operators on ID, which means that all the 'let' and 'with' expressions in queries must concat bindings to ID and we are just assuming the scoping will all work (and are not using the environment as part of translation)

I would say the two largest issues requiring contributions from others are

distinguishing bags from lists
distinguishing null from missing and deciding how pervasively to use sum types in the translation

If the bag vs list distinction remains unaddressed for a while, I might consider ruling out normal indexing in the front end; I don't think we can rule out the 'array' constructor because it is probably used a lot rather than the bag constructor and often user's don't really care about the order. I think we just need to document what semantics we support.

If we are unable to address null vs missing and use of sum types for a while, we might also impose a restriction that rules out all forms of missing data (including null) but that will be perceived as a huge restriction eliminating some of the advantages of SQL++ for semi-structured JSON data.

joshuaauerbachwatson · 2017-09-06T18:06:31Z

Previous enumerations of progress and issues have left out the interesting case of function calls. The design I'm assuming is the following:

Function names are actually a fixed set governed by an "opt-in" list. Note that all user defined functions are illegal.
Legal function names and also correct number of arguments are known to the Java frontend and checked there so illegal function calls do not reach the Coq AST
In the Coq AST, this set is an enumeration (so function names are no longer strings). Pattern matching should be tighter and we are also guaranteed that there are the correct number of arguments (more subtle errors, such as type errors in the arguments, can still cause late failures).

I am partly along implementing this design. Functions that initially populate this set are

those specified directly in the SQL++ spec (the unrestricted aggregation functions of SQL++ such as coll_avg and array_avg and the SQL-92 compatible restricted aggregation functions such as avg which are provided as sugaring macros in SQL++).
those need to implement times and dates such as get_year
others used in test cases that seem like we ought to support them like substring

jeromesimeon · 2017-09-08T17:17:40Z

It would be nice to have a well-written mechanized semantics for SQL++. This would go hand-in-hand with a proof of semantic preservation for this translation (See #74).

Issue #56

jeromesimeon · 2017-09-25T18:50:14Z

I've just noticed the following Coq warning:

File "./coq/Translation/SQLPPtoNRAEnv.v", line 40, characters 0-6785:
Warning: Not a fully mutually defined
fixpoint
(sqlpp_to_nraenv depends on sqlpp_select_statement_to_nraenv but not
conversely).
Well-foundedness check may fail unexpectedly.
 [non-full-mutual,fixpoints]

It indicates that sqlpp_select_statement_to_nraenv could be a separate function. This may be a temporary thing, due to the current state of the translation, but I thought I would record it.

joshuaauerbachwatson · 2017-09-25T22:07:28Z

Commenting on the previous comment from @jeromesimeon: the sqlpp_select_statement_to_nraenv function is currently a highly incomplete stub but it is intended ultimately to be mutually recursive with sqlpp_to_nraenv (it will need to be). So, yes, it is a temporary thing. If the warning message is judged to be annoying, I could comment out the function entirely and stub out the call to it.

joshuaauerbachwatson self-assigned this Jul 31, 2017

joshuaauerbachwatson added a commit that referenced this issue Aug 2, 2017

The top level production for SQL++ is now a query, not a list.

2510c89

Issue #56

joshuaauerbachwatson added a commit that referenced this issue Aug 15, 2017

Incremental progress on issue #56 (SQLPP->NRAEnv).

66588e8

joshuaauerbachwatson added a commit that referenced this issue Aug 21, 2017

Incremental progress on SQLPPtoNRAenv (issue #56)

6f4148e

joshuaauerbachwatson added a commit that referenced this issue Aug 23, 2017

Remove dangling functions and add comment after Avi's improvements

1d6e077

Issue #56

jeromesimeon added the feature label Aug 26, 2017

joshuaauerbachwatson added a commit that referenced this issue Aug 27, 2017

Incremental progress on SQLPP->NRAEnv. See issue #56

f824647

joshuaauerbachwatson added a commit that referenced this issue Aug 27, 2017

Finish implementing string literal restrictions. See issue #56

53c5035

joshuaauerbachwatson added a commit that referenced this issue Sep 11, 2017

Make function names an enumeration, not strings. Enforce in front-end.

6e34ad5

Issue #56

jeromesimeon pushed a commit that referenced this issue Sep 12, 2017

Make function names an enumeration, not strings. Enforce in front-end.

583c434

Issue #56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SQL++ to NRAEnv Translation #56

SQL++ to NRAEnv Translation #56

joshuaauerbachwatson commented Jul 31, 2017

joshuaauerbachwatson commented Jul 31, 2017

joshuaauerbachwatson commented Aug 2, 2017

joshuaauerbachwatson commented Aug 15, 2017

joshuaauerbachwatson commented Aug 27, 2017

joshuaauerbachwatson commented Sep 6, 2017

jeromesimeon commented Sep 8, 2017 •

edited

Loading

jeromesimeon commented Sep 25, 2017

joshuaauerbachwatson commented Sep 25, 2017

SQL++ to NRAEnv Translation #56

SQL++ to NRAEnv Translation #56

Comments

joshuaauerbachwatson commented Jul 31, 2017

joshuaauerbachwatson commented Jul 31, 2017

joshuaauerbachwatson commented Aug 2, 2017

joshuaauerbachwatson commented Aug 15, 2017

joshuaauerbachwatson commented Aug 27, 2017

joshuaauerbachwatson commented Sep 6, 2017

jeromesimeon commented Sep 8, 2017 • edited Loading

jeromesimeon commented Sep 25, 2017

joshuaauerbachwatson commented Sep 25, 2017

jeromesimeon commented Sep 8, 2017 •

edited

Loading