
[Tracker] Findings of demos #15

Open
sbernauer opened this issue Oct 3, 2022 · 0 comments

Comments

@sbernauer sbernauer changed the title [Tracker] Findings of nifi-kafka-druid-water-level-data demo [Tracker] Findings of demos Oct 12, 2022
bors bot referenced this issue in stackabletech/stackablectl Nov 3, 2022
## Description

Needs a larger k8s cluster! I use IONOS k8s with 9 nodes, each with 4 cores (8 threads), 20 GB RAM, and a 30 GB HDD.
Maybe we can also offer a smaller variant later on.

Otherwise business as usual. From the feature branch, run `stackablectl --additional-stacks-file stacks/stacks-v1.yaml --additional-releases-file releases.yaml --additional-demos-file demos/demos-v1.yaml demo install data-warehouse-iceberg-trino-spark`

I'm not happy with some parts, but I think an iterative approach is best:
* Shared bikes are currently not streamed into Kafka (it's a one-time job instead)
* Some high-volume real-time data source would be great. Currently we use the water levels and duplicate them to get higher volumes.
* Some sort of upsert or deletion use case would be great, but probably not on the large datasets, for the sake of our wallet ^^
* Better dashboards. The current ones were thrown together quickly.
* I would like to partition the water_level measurements by day but ran into apache/iceberg#5625. There might be a way around it by using a dedicated Spark context for compaction, but we can easily adopt the partitioning once the issue is fixed. Sorting during rewrites cost performance during compaction and did not provide real benefits in my initial measurements, so it is disabled for now.
* As always, I tracked my findings in https://github.com/stackabletech/stackablectl/issues/128
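For reference, the day-partitioning mentioned above uses Iceberg's `days(...)` partition transform in Spark SQL DDL. A rough sketch of what that could look like (the table and column names here are assumptions, not the demo's actual schema):

```sql
-- Hypothetical table/column names, only to illustrate the partition transform.
-- days(ts) is Iceberg's built-in day partition transform.
CREATE TABLE warehouse.water_level_measurements (
    station_uuid STRING,
    ts           TIMESTAMP,
    value        DOUBLE
) USING iceberg
PARTITIONED BY (days(ts));
```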

To get to the Spark UI, run `kubectl port-forward $(kubectl get pod -o name | grep 'spark-ingest-into-warehouse-.*-driver') 4040`
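The grep in that command selects only the driver pod. With a made-up pod listing (the pod names below are illustrative, not real), the pattern behaves like this:

```shell
# Made-up pod names, only to demonstrate the pattern used above:
# the regex matches the driver pod but not the executor pod.
printf 'pod/spark-ingest-into-warehouse-abc123-driver\npod/spark-ingest-into-warehouse-abc123-exec-1\n' \
  | grep 'spark-ingest-into-warehouse-.*-driver'
```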
@fhennig fhennig transferred this issue from stackabletech/stackablectl Feb 1, 2024