Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding QueryType to AccessControlContext #23663

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

RohanSidhu
Copy link

@RohanSidhu RohanSidhu commented Sep 16, 2024

Description

Adding a QueryType field into the AccessControlContext object to allow for this information to be available when making access control related decisions.

Motivation and Context

The addition of QueryType allows for downstream function calls to access this context which can be used to alter decision making if needed.

Impact

Minor performance enhancement as rebuilding Session is no longer needed. If additional context needs to be added into Session, we can utilize SessionBuilder to add additional/edit material.

Test Plan

Tested code change by running sample queries and followed the call stack to ensure queryType was present.
Screenshot 2024-09-16 at 13 10 19
Also verified the context was present in the AccessControlContext through unit tests under AbstractTestQueries.

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

== NO RELEASE NOTE ==

Copy link

linux-foundation-easycla bot commented Sep 16, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

Copy link
Contributor

@ghelmling ghelmling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add test coverage for DispatchManager.createQuery(Internal) to validate that the session is getting populated correctly? I think you could plug in a stubbed out DispatchQueryFactory to validate the Session instance passed to createDispatchQuery().

presto-main/src/main/java/com/facebook/presto/Session.java Outdated Show resolved Hide resolved
@@ -303,6 +306,11 @@ private <C> void createQueryInternal(QueryId queryId, String slug, int retryCoun
Optional.ofNullable(sessionContext.getSchema()),
sessionContext.getIdentity().getPrincipal().map(Principal::getName)));

session = sessionBuilder.build();
if (sessionContext.getTransactionId().isPresent()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a difference here in that previously, SessionSupplier.createSession() was calling beginTransaction up at line 285. I don't know if that makes a difference.

Copy link
Author

@RohanSidhu RohanSidhu Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the tasks that happen within beginTransactionId and I don't believe there is any dependency between what happens in there and the work done to create preparedQuery. This implies that the order does not matter on if we prepare the query or if we begin the transaction. From testing, I also found no impact in performance or functionality.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we starting a transaction here ?

Copy link
Author

@RohanSidhu RohanSidhu Sep 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The transaction was never started after I refactored the code to use SessionBuilder instead of Session. In the original construction of Session, the transaction was started once Session was built. Since we need queryType in the session, I moved the transaction piece out so we could utilize SessionBuilder to add the necessary field. Since the transaction hasn't started, I moved it out and placed it as early as I could while including the addition of queryType.

Copy link
Contributor

@ghelmling ghelmling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed another bit in DispatchManager where we should consolidate the Session creation. Sorry for the add-on, but would be good to clean up some of the duplicate creation that existed.

@@ -131,7 +133,8 @@ public Session(
Map<SqlFunctionId, SqlInvokedFunction> sessionFunctions,
Optional<Tracer> tracer,
WarningCollector warningCollector,
RuntimeStats runtimeStats)
RuntimeStats runtimeStats,
Optional<QueryType> queryType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this optional ? can a session have this empty ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe there is an instance where queryType can be empty. If my understanding is correct, all queries that make it to DispatchManager will have some enum of QueryType. However, as a precaution, I wrapped QueryType in an optional.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so then lets have it as NOT optional. And fix any issues that my arise.

Copy link
Author

@RohanSidhu RohanSidhu Sep 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After making the changes, I realized that there are some areas where I put an empty optional instead of null.
Example
Since the session ctor requires all fields to be non-null, I believe it may be the design appropriate choice to keep the queryType as an Optional.

Copy link
Contributor

@NikhilCollooru NikhilCollooru Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we not adding QueryType to SessionRepresentation as well ? Then we need NOT send Optional.empty() or null right ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After making the change in SessionRepresentation and other places, there is another issue if we don't use an Optional. queryType is not defined when we build session. We pass in session in a few places before we even have queryType. So if we want to maintain the non-null rule in session ctor, I believe using an Optional is the best bet.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a little confusing but I am more than happy to discuss the thought process!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NikhilCollooru the challenge here is the Session usage in DispatchManager#createQueryInternal:

  1. Session is instantiated and passed to createAnalyzerOptions()
  2. Same session is passed to getAnalyzerType()
  3. The SQL string is only parsed on the following line in QueryPreparer#prepareQuery

So currently session is being used before we know what the QueryType actually is. Now the Session usage in (1) and (2) is only to extract system property values, so we could try to separate that concern from the Session representation itself, pulling out SystemPropertyManager and the Map<String, String> of system properties. But that looks like a big refactoring as the use of Session as an interface to system properties seems pretty deeply embedded.

Any thoughts on how we can better clean this up? The current use of Session in DispatchManager#createQueryInternal is a bit wonky to begin with as Session is first created from SessionSupplier (not the builder for some reason), then recreated by SessionPropertyDefaults#newSessionQithDefaultProperties. I don't really understand why the SessionPropertyDefaults part happens after the analyzer and query preparer setup.

@@ -303,6 +306,11 @@ private <C> void createQueryInternal(QueryId queryId, String slug, int retryCoun
Optional.ofNullable(sessionContext.getSchema()),
sessionContext.getIdentity().getPrincipal().map(Principal::getName)));

session = sessionBuilder.build();
if (sessionContext.getTransactionId().isPresent()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we starting a transaction here ?

query = preparedQuery.getFormattedQuery().orElse(query);

// select resource group
Optional<QueryType> queryType = preparedQuery.getQueryType();
sessionBuilder.setQueryType(queryType);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why make all these changes instead when you can just do
session = Session.builder(session).setQueryType(queryType).build();

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because of this line. My change makes it so we still have access to the SessionBuilder object before we begin the transaction. After we start the transaction, we can't use the ctor that passes in Session. I could use the other ctor but would need to refill in all the Session fields.

@NikhilCollooru
Copy link
Contributor

I took a closer look and the changes look okay to me.

  1. please squash all commits into one commit
  2. Please click "Please click here to be authorized" to get CLA authorized
  3. Please build a custom presto version and run verifier tests to confirm the changes run fine for a bigger set of queries

@RohanSidhu
Copy link
Author

I have completed the verifier tests and did not find any significant failures. I ran over 10000 queries and only 12 have failed. Reasons for their failures are environment errors (i.e. com.facebook.presto.ExceededMemoryLimitException). If there is anything else that I needed or proof for the test, please let me know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants