consider non-SQL MPP #11

Open
maxheld83 opened this issue Apr 6, 2021 · 0 comments

Contributor

maxheld83 commented Apr 6, 2021

Aside from SQL, we sometimes have more involved analyses, which we would typically run in R.
For example, we might already have a complicated regex for extracting license info coded in R.
(This is not a great example, because it could perhaps be done with just SQL and custom functions in BigQuery, but the point stands.)

For these expensive, non-SQL analyses we need a massively parallel processing (MPP) solution, ideally tightly integrated with our data warehouse.

We might have several MPP needs:

  • "native" Spark (without any additional R packages)
  • distributed R with spark_apply() (though this does not use containers, so dependency management may get iffy again)
  • MPP with custom containers (such as muggle)
    • Spark on Kubernetes? (sounds very complicated)
    • just poll a plumber API?
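
To make the options above a bit more concrete, here is a minimal sketch of the partition, apply, combine pattern that something like spark_apply() follows: split the rows into partitions, run the same function on each partition independently, then stitch the results back together. It is written in Python with a local thread pool purely for illustration; it is not Spark and not our R code. The license patterns and all function names (`classify_license`, `apply_partition`, `distributed_apply`) are hypothetical.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical license patterns, made up for this sketch.
LICENSE_PATTERNS = {
    "cc-by": re.compile(r"creativecommons\.org/licenses/by/\d\.\d", re.IGNORECASE),
    "cc-by-nc": re.compile(r"creativecommons\.org/licenses/by-nc/\d\.\d", re.IGNORECASE),
}


def classify_license(url):
    """Return the label of the first pattern that matches, else 'other'."""
    for label, pattern in LICENSE_PATTERNS.items():
        if pattern.search(url):
            return label
    return "other"


def apply_partition(rows):
    """The per-partition function: it only ever sees its own rows."""
    return [(row, classify_license(row)) for row in rows]


def partition(rows, n_partitions):
    """Split rows into at most n_partitions contiguous chunks."""
    size = max(1, -(-len(rows) // n_partitions))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def distributed_apply(rows, n_partitions=2):
    """Run apply_partition on every chunk in parallel and combine the results."""
    parts = partition(rows, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        # map() yields results in the same order as the input partitions.
        results = list(pool.map(apply_partition, parts))
    return [pair for part in results for pair in part]


if __name__ == "__main__":
    urls = [
        "https://creativecommons.org/licenses/by/4.0/",
        "https://creativecommons.org/licenses/by-nc/4.0/",
        "https://example.org/terms",
    ]
    for url, label in distributed_apply(urls):
        print(label, url)
```

The per-partition function has to be self-contained: whatever it needs (here the regex patterns) must be available wherever it runs. That is the dependency management worry with spark_apply(), and the reason the container-based options might be attractive.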