# Added deploy with modal #1805

Open · wants to merge 30 commits into `devel` from `docs/how-to-deploy-using-modal`

## Conversation

dat-a-man (Collaborator)

Description

Added deploy with modal

@dat-a-man self-assigned this on Sep 13, 2024
netlify bot commented Sep 13, 2024

Deploy Preview for dlt-hub-docs canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | e5d9a30 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/dlt-hub-docs/deploys/6708a44ecb306a000807e990 |

@dat-a-man added the `documentation` label (Improvements or additions to documentation) on Sep 13, 2024

### Capturing deletes

One limitation of our simple approach above is that it does not capture updates or deletions of data. This isn’t a hard requirement yet for our use cases, but it appears that `dlt` does have a [Postgres CDC replication feature](https://dlthub.com/docs/dlt-ecosystem/verified-sources/pg_replication) that we are considering.
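
For a sense of what that feature looks like, here is a hedged sketch based on the linked pg_replication docs; the slot and publication names are placeholders, and the exact signatures should be checked against the verified-source template scaffolded by `dlt init pg_replication <destination>`:

```py
import dlt
from pg_replication import replication_resource  # module from the scaffolded template

pipeline = dlt.pipeline(
    pipeline_name="pg_cdc",
    destination="snowflake",
    dataset_name="replica",
)

# Streams inserts, updates, AND deletes from a Postgres replication
# slot/publication (both assumed to exist already, e.g. created via the
# template's init_replication helper)
changes = replication_resource(slot_name="dlt_slot", pub_name="dlt_pub")
pipeline.run(changes)
```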
burnash (Collaborator):
Please use relative links for the pages in the docs. E.g. ./dlt-ecosystem/...

dat-a-man (Collaborator, author):
Thanks @burnash. Updated the link. There's one thing, though: the doc is not showing in the GitHub deploy preview here, but when using "npm" locally it shows fine.

@dat-a-man force-pushed the docs/how-to-deploy-using-modal branch 2 times, most recently from 0333c54 to 8a49dce on September 16, 2024 08:27
@dat-a-man assigned adrianbr and unassigned adrianbr on Sep 16, 2024
burnash (Collaborator) left a comment:
Very good content, @dat-a-man. I've added some suggestions to improve the style.


## Introduction to Modal

[Modal](https://modal.com/blog/analytics-stack) is a serverless platform designed for developers. It allows you to run and deploy code in the cloud without managing infrastructure.
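
As a minimal illustration of that model (the app and function names here are arbitrary), a Python function can be decorated and executed remotely:

```py
import modal

app = modal.App("hello-modal")

@app.function()
def square(x: int) -> int:
    # Runs in a Modal container in the cloud, not on your machine
    return x * x

@app.local_entrypoint()
def main() -> None:
    # Invoked with `modal run hello.py`; .remote() ships the call to Modal
    print(square.remote(7))
```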
burnash (Collaborator):
I think the link from Modal should go to https://modal.com/. I can see that the blog post is already linked from another section below.

dat-a-man (Collaborator, author) commented Sep 16, 2024:
Yes, thanks! Corrected.


## Building Data Pipelines with `dlt`

**`dlt`** is an open-source Python library that allows you to declaratively load data sources into well-structured tables or datasets. It does this through automatic schema inference and evolution. The library simplifies building data pipelines by providing functionality to support the entire extract and load process.
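
For instance, a minimal sketch (names assumed) where dlt infers the schema from plain Python data and loads it into DuckDB:

```py
import dlt

pipeline = dlt.pipeline(
    pipeline_name="quickstart",
    destination="duckdb",
    dataset_name="mydata",
)

data = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
# dlt infers column names and types, creates the table, and evolves it on later runs
info = pipeline.run(data, table_name="users")
print(info)
```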
burnash (Collaborator):
Suggested change
**`dlt`** is an open-source Python library that allows you to declaratively load data sources into well-structured tables or datasets. It does this through automatic schema inference and evolution. The library simplifies building data pipelines by providing functionality to support the entire extract and load process.
dlt is an open-source Python library that allows you to declaratively load data sources into well-structured tables or datasets. It does this through automatic schema inference and evolution. The library simplifies building data pipelines by providing functionality to support the entire extract and load process.

Let's tone down the formatting here


**`dlt`** is an open-source Python library that allows you to declaratively load data sources into well-structured tables or datasets. It does this through automatic schema inference and evolution. The library simplifies building data pipelines by providing functionality to support the entire extract and load process.

### How does `dlt` integrate with Modal for pipeline orchestration?
burnash (Collaborator):
Suggested change
### How does `dlt` integrate with Modal for pipeline orchestration?
### How does dlt integrate with Modal for pipeline orchestration?

Throughout the docs, please use plain "dlt" (no backticks) when referring to dlt as a project. Use backticks only when referring to dlt as code (e.g., `dlt` the Python module in a script, or `dlt` the command in a command-line context).

dat-a-man (Collaborator, author):
Done, thanks!


To know more, please refer to [Modal's documentation](https://modal.com/docs).

## Building Data Pipelines with `dlt`
burnash (Collaborator):
Suggested change
## Building Data Pipelines with `dlt`
## Building data pipelines with dlt
Throughout the docs, please use sentence-case capitalization.

dat-a-man (Collaborator, author):
Noted, thanks!


### How does `dlt` integrate with Modal for pipeline orchestration?

To illustrate setting up a pipeline in Modal, we’ll be using the following example: [Building a cost-effective analytics stack with Modal, dlt, and dbt.](https://modal.com/blog/analytics-stack)
burnash (Collaborator):
Suggested change
To illustrate setting up a pipeline in Modal, we'll be using the following example: [Building a cost-effective analytics stack with Modal, dlt, and dbt.](https://modal.com/blog/analytics-stack)
As an example of how to set up a pipeline in Modal, we'll use the [building a cost-effective analytics stack with Modal, dlt, and dbt.](https://modal.com/blog/analytics-stack) case study.

dat-a-man (Collaborator, author):
Done, thanks!


Here’s our `dlt` setup copying data from our Postgres read replica into Snowflake:

1. Run the `dlt` SQL database setup to initialize their `sql_database_pipeline.py` template:
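
(For reference, that setup step is the `dlt init sql_database snowflake` CLI command, with the destination assumed here; it scaffolds `sql_database_pipeline.py` in the working directory.)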
burnash (Collaborator) commented Sep 18, 2024:
Suggested change
1. Run the `dlt` SQL database setup to initialize their `sql_database_pipeline.py` template:
1. Run the `dlt init` CLI command to initialize the SQL database source and set up the `sql_database_pipeline.py` template:

dat-a-man (Collaborator, author):
updated

burnash (Collaborator):
Thank you! However, I don't see these changes on GitHub. Is there a chance you haven't pushed the updates to GitHub?


## How to run dlt on Modal

Here’s our `dlt` setup copying data from our Postgres read replica into Snowflake:
burnash (Collaborator):
Suggested change
Here’s our `dlt` setup copying data from our Postgres read replica into Snowflake:
Here’s a dlt project setup to copy data from our Postgres read replica into Snowflake:

dat-a-man (Collaborator, author):
updated


As an example of how to set up a pipeline in Modal, we'll use the [Building a cost-effective analytics stack with Modal, dlt, and dbt](https://modal.com/blog/analytics-stack) case study.

The example demonstrates automating a workflow to load data from Postgres to Snowflake using `dlt`.
burnash (Collaborator):
Suggested change
The example demonstrates automating a workflow to load data from Postgres to Snowflake using `dlt`.
The example demonstrates automating a workflow to load data from Postgres to Snowflake using dlt.

dat-a-man (Collaborator, author):
done.

burnash (Collaborator) left a comment:
Hi @dat-a-man, thanks for the updates. Please see my review comments.


Here’s our `dlt` setup copying data from our Postgres read replica into Snowflake:

1. Run the `dlt` SQL database setup to initialize their `sql_database_pipeline.py` template:
burnash (Collaborator):
It's also not clear what we do with `sql_database_pipeline.py`. Are we discarding it, or adding the code below to it?

dat-a-man (Collaborator, author):
updated

```py
app = modal.App("dlt-postgres-pipeline", image=image)
```

3. Wrap the provided `load_table_from_database` with the Modal Function decorator, Modal Secrets containing your database credentials, and a daily cron schedule
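
A hedged sketch of what that wrapping might look like; the image setup mirrors the snippet above, the secret names and schedule are assumptions, and `load_table_from_database` is the function from `sql_database_pipeline.py`:

```py
import modal

# Container image with the pipeline's dependencies (package list assumed)
image = modal.Image.debian_slim().pip_install("dlt[snowflake]")
app = modal.App("dlt-postgres-pipeline", image=image)

@app.function(
    secrets=[
        modal.Secret.from_name("postgres-secrets"),  # assumed secret names
        modal.Secret.from_name("snowflake-secrets"),
    ],
    schedule=modal.Cron("0 6 * * *"),  # daily cron schedule (time assumed)
)
def load_table_from_database(table: str) -> None:
    ...  # the dlt pipeline code from step 4 goes here
```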
burnash (Collaborator):
If we take `load_table_from_database` from `sql_database_pipeline.py`, we should note that; otherwise it may be unclear.

dat-a-man (Collaborator, author):
added the context

```py
pass
```

4. Write your `dlt` pipeline:
burnash (Collaborator):
Where should the user put the code from this section? Does it still go to `sql_database_pipeline.py`?

dat-a-man (Collaborator, author):
It goes to `sql_database_pipeline.py`.

4. Write your `dlt` pipeline:
```py
# Modal Secrets are loaded as environment variables, which are used here to create the SQLAlchemy connection string
pg_url = f'postgresql://{os.environ["PGUSER"]}:{os.environ["PGPASSWORD"]}@localhost:{os.environ["PGPORT"]}/{os.environ["PGDATABASE"]}'
```
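
A hedged sketch of how the rest of the pipeline might continue from that connection string (table and pipeline names are placeholders, not the case study's actual values):

```py
import os

import dlt
from dlt.sources.sql_database import sql_database

pg_url = f'postgresql://{os.environ["PGUSER"]}:{os.environ["PGPASSWORD"]}@localhost:{os.environ["PGPORT"]}/{os.environ["PGDATABASE"]}'

# Select the tables to replicate from the read replica (names assumed)
source = sql_database(credentials=pg_url, table_names=["users"])

pipeline = dlt.pipeline(
    pipeline_name="postgres_to_snowflake",
    destination="snowflake",
    dataset_name="raw",
)
print(pipeline.run(source, write_disposition="merge"))
```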
burnash (Collaborator):
Use the dlt-native way to configure the connection with environment variables: https://dlthub.com/docs/devel/general-usage/credentials/setup#environment-variables. That should eliminate the need for manual connection-string construction and the use of ConnectionStringCredentials.
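
A hedged sketch of that dlt-native style; the placeholder value stands in for a Modal Secret mounted as an environment variable:

```py
import os

from dlt.sources.sql_database import sql_database

# With SOURCES__SQL_DATABASE__CREDENTIALS set in the environment,
# dlt resolves the credentials itself: no manual string construction,
# no ConnectionStringCredentials.
os.environ.setdefault(
    "SOURCES__SQL_DATABASE__CREDENTIALS",
    "postgresql://user:password@host:5432/db",  # placeholder for the mounted secret
)

source = sql_database()  # credentials picked up from the environment
```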

dat-a-man (Collaborator, author):
I added a note about this in step 3; I tested it, too, and it worked for source creds.

kning:
hey original author here :). are you saying it's better practice to define the sql connection string as a single env variable and then reassign the env variable in the pipeline? e.g.

  1. Set a Modal secret like POSTGRES_CREDENTIAL_STRING = 'postgresql://sdfsd:sdlfkj' (this gets mounted as an env variable)
  2. In the pipeline, call os.environ["TASK_SOURCES__SQL_DATABASE__CREDENTIALS"] = os.environ["POSTGRES_CREDENTIAL_STRING"]?

AstrakhantsevaAA (Contributor) commented Oct 3, 2024:
hey @kning! I would say it's a matter of taste: if you prefer a connection string, use it; if not, don't. dlt supports both. In this example, I think, Anton wants to reduce the amount of code and unnecessary manipulations. For example, in this case you can avoid this:

`credentials = ConnectionStringCredentials(pg_url)`

and

`destination=dlt.destinations.snowflake(snowflake_url),`

```py
info = pipeline.run(source_1, write_disposition="merge")
print(info)
```

burnash (Collaborator):
Looks like the next step is missing: how does this code end up on Modal? How are runs triggered?

dat-a-man (Collaborator, author):
added step 5

kning:
This runs the pipeline once, but it might be worth adding that you need to run `modal deploy` to actually schedule the pipeline.
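
(For example, with the file name assumed: `modal run sql_database_pipeline.py` executes the pipeline once, while `modal deploy sql_database_pipeline.py` deploys the app so the cron schedule takes effect.)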

kning commented Sep 30, 2024:

the more i think about it actually, maybe it makes sense to write a really pared-down example for this space that is runnable end-to-end for the user (e.g. using duckdb), and link out to our blog post for a "real-world example". happy to help contribute a pared-down example

kning commented Sep 30, 2024:

here's a simpler gist that should just work if you run `modal run dlt_example.py`, and will deploy a daily scheduled job with `modal deploy dlt_example.py`.

i think this section will have better engagement if the user can simply copy-paste a script and it works immediately; we can adapt this to your docs style and perhaps just link out to the original blog post as a more detailed, real-world example of dlt (i also need to update that one to be compatible with 1.1.0).

lmk what you think! also happy to chat, i know i've shared a lot of info here haha.

https://gist.github.com/kning/6a2af9e08ebaad0e486968f98c1939be
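
For a sense of the shape of such a pared-down example, here is a hedged sketch along the same lines (not the gist itself; names and schedule are assumptions):

```py
import dlt
import modal

image = modal.Image.debian_slim().pip_install("dlt[duckdb]")
app = modal.App("dlt-example", image=image)

@app.function(schedule=modal.Cron("0 6 * * *"))  # schedule applies once deployed
def load() -> None:
    pipeline = dlt.pipeline(pipeline_name="example", destination="duckdb", dataset_name="demo")
    # Load a trivial in-memory dataset so the script runs end-to-end with no external services
    print(pipeline.run([{"id": 1}, {"id": 2}], table_name="items"))

# `modal run dlt_example.py` runs load() once; `modal deploy dlt_example.py`
# keeps it running daily on Modal.
```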

AstrakhantsevaAA (Contributor):
@kning hey! Thanks for your thoughts here, your idea with testing is great! We actually practice this: you can find here some getting-started snippets that we test on every CI/CD run. We can also add your gist to our testing process; we just need to understand what we call a successful run. We could run `modal run dlt_example.py` on every CI/CD run and stop it immediately if it ran without errors. Is that enough, or should it be deployed as well?

kning commented Oct 3, 2024:

running `modal run dlt_example.py` should be sufficient, but you'd also need to set up a Modal account and set the `MODAL_TOKEN_ID` and `MODAL_TOKEN_SECRET` variables in your CI environment.

also checked out the snippets, are those ever surfaced in the docs? i guess i'd expect them to be synced with the snippets on this page, but it looks different.

AstrakhantsevaAA (Contributor):
@kning

> MODAL_TOKEN_ID and MODAL_TOKEN_SECRET variables in your CI environment.

It shouldn't be a problem.

> i guess i'd expect it to be synced with the snippets on this page but it looks different.

We changed our docs significantly recently; the getting started page was removed and replaced with an intro. You can find a relevant example here: doc and snippets.

kning commented Oct 3, 2024:

i see. how do you think we should move forward then with the modal snippet? ideally i'd like to see a "deploy with modal" page that explains how to create a modal account, plus the runnable code snippet (which should also be run regularly somehow to ensure that it's correct), and finally a link to the blog post for a "real-world" example. but i guess from what i understand, the code on the docs page and the CI/CD snippets are managed separately?

AstrakhantsevaAA (Contributor):
@dat-a-man will do that: he will create a snippet file with the example and use tags to ingest this snippet into the doc page; here we will run the modal command to test it.

kning commented Oct 9, 2024:

amazing, looks way cleaner now, thanks!

Makefile (outdated)
@@ -65,6 +65,7 @@ lint-and-test-snippets:
poetry run mypy --config-file mypy.ini docs/website docs/examples docs/tools --exclude docs/tools/lint_setup --exclude docs/website/docs_processed
poetry run flake8 --max-line-length=200 docs/website docs/examples docs/tools
cd docs/website/docs && poetry run pytest --ignore=node_modules
modal run docs/walkthroughs/deploy-a-pipeline/deploy-with-modal-snippets.py
Contributor:

Suggested change
modal run docs/walkthroughs/deploy-a-pipeline/deploy-with-modal-snippets.py
modal run docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-modal-snippets.py
