Docs: Glue Spark Pipeline Lineage
ulixius9 committed Oct 17, 2024
1 parent c6586d3 commit e318316
Showing 6 changed files with 73 additions and 0 deletions.
@@ -343,3 +343,40 @@ spark.openmetadata.transport.timeout 30
```

Once all these steps are complete, start or restart your compute instance. You are then ready to extract lineage from Spark to OpenMetadata.


## Using Spark Agent with Glue

Follow the steps below to use the OpenMetadata Spark Agent with AWS Glue.

### 1. Specify the OpenMetadata Spark Agent JAR URL

1. Upload the OpenMetadata Spark Agent JAR to S3.
2. Open the Glue job and, in the Job details tab, navigate to Advanced properties → Libraries → Dependent JARs path.
3. Add the S3 URL of the OpenMetadata Spark Agent JAR to the Dependent JARs path.

{% image
src="/images/v1.5/connectors/spark/glue-job-jar.png"
alt="Glue Job Configure Jar"
caption="Glue Job Configure Jar"
/%}
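
If the job is configured from a script rather than the console, the same setting can be expressed as a job argument. The sketch below assumes that the Dependent JARs path field corresponds to the Glue `--extra-jars` job argument (a comma-separated list of S3 URLs); the helper name, bucket, and key are hypothetical and only illustrate the shape of the value.

```python
from urllib.parse import urlparse


def extra_jars_argument(jar_s3_urls):
    """Build the job argument that carries the dependent JARs path.

    Assumption: the console field "Dependent JARs path" is stored as the
    `--extra-jars` job argument, a comma-separated list of S3 URLs.
    """
    for url in jar_s3_urls:
        parsed = urlparse(url)
        # Each entry must be a full S3 object URL that points at a .jar file.
        if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.endswith(".jar"):
            raise ValueError(f"not an S3 JAR URL: {url!r}")
    return {"--extra-jars": ",".join(jar_s3_urls)}


# Hypothetical bucket and key, for illustration only.
jar_args = extra_jars_argument(["s3://my-bucket/jars/openmetadata-spark-agent.jar"])
print(jar_args)
```

The returned dictionary could be merged into the job's default arguments by whatever tooling manages the job; the validation only catches obviously malformed URLs and does not check that the object exists.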


### 2. Add Spark configuration in Job Parameters

In the same Job details tab, add the following properties under Job parameters:

1. Add the `--conf` property with the following value. Customize this configuration as described in the documentation above.

```
spark.extraListeners=org.openmetadata.spark.agent.OpenMetadataSparkListener --conf spark.openmetadata.transport.hostPort=https://your-org.host:port --conf spark.openmetadata.transport.type=openmetadata --conf spark.openmetadata.transport.jwtToken=<jwt-token> --conf spark.openmetadata.transport.pipelineServiceName=glue_spark_pipeline_service --conf spark.openmetadata.transport.pipelineName=glue_pipeline_name --conf spark.openmetadata.transport.timeout=30
```

2. Add the `--user-jars-first` parameter and set its value to `true`.

{% image
src="/images/v1.5/connectors/spark/glue-job-params.png"
alt="Glue Job Configure Params"
caption="Glue Job Configure Params"
/%}
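
The `--conf` value above packs several Spark settings into one string: the parameter key supplies the first `--conf`, so only the second and later settings carry an explicit `--conf` prefix. A minimal sketch of how that string could be assembled from individual settings follows; `glue_conf_value` is a hypothetical helper, and the host, token, and pipeline names are the placeholders from the example above.

```python
def glue_conf_value(settings):
    """Join Spark settings into one value for the Glue `--conf` job parameter.

    The parameter key itself acts as the first `--conf`, so the separator is
    only inserted between settings, never before the first one.
    """
    pairs = [f"{key}={value}" for key, value in settings.items()]
    return " --conf ".join(pairs)


# Placeholder values copied from the example above; replace them with your own.
settings = {
    "spark.extraListeners": "org.openmetadata.spark.agent.OpenMetadataSparkListener",
    "spark.openmetadata.transport.hostPort": "https://your-org.host:port",
    "spark.openmetadata.transport.type": "openmetadata",
    "spark.openmetadata.transport.jwtToken": "<jwt-token>",
    "spark.openmetadata.transport.pipelineServiceName": "glue_spark_pipeline_service",
    "spark.openmetadata.transport.pipelineName": "glue_pipeline_name",
    "spark.openmetadata.transport.timeout": 30,
}

# The two job parameters described in this section, as key/value pairs.
job_parameters = {
    "--conf": glue_conf_value(settings),
    "--user-jars-first": "true",
}
print(job_parameters["--conf"])
```

Building the string from a dictionary keeps each setting on its own line in source control, which makes it easier to review than the single long value pasted into the console.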

@@ -343,3 +343,39 @@ spark.openmetadata.transport.timeout 30
```

Once all these steps are complete, start or restart your compute instance. You are then ready to extract lineage from Spark to OpenMetadata.


## Using Spark Agent with Glue

Follow the steps below to use the OpenMetadata Spark Agent with AWS Glue.

### 1. Specify the OpenMetadata Spark Agent JAR URL

1. Upload the OpenMetadata Spark Agent JAR to S3.
2. Open the Glue job and, in the Job details tab, navigate to Advanced properties → Libraries → Dependent JARs path.
3. Add the S3 URL of the OpenMetadata Spark Agent JAR to the Dependent JARs path.

{% image
src="/images/v1.6/connectors/spark/glue-job-jar.png"
alt="Glue Job Configure Jar"
caption="Glue Job Configure Jar"
/%}


### 2. Add Spark configuration in Job Parameters

In the same Job details tab, add the following properties under Job parameters:

1. Add the `--conf` property with the following value. Customize this configuration as described in the documentation above.

```
spark.extraListeners=org.openmetadata.spark.agent.OpenMetadataSparkListener --conf spark.openmetadata.transport.hostPort=https://your-org.host:port --conf spark.openmetadata.transport.type=openmetadata --conf spark.openmetadata.transport.jwtToken=<jwt-token> --conf spark.openmetadata.transport.pipelineServiceName=glue_spark_pipeline_service --conf spark.openmetadata.transport.pipelineName=glue_pipeline_name --conf spark.openmetadata.transport.timeout=30
```

2. Add the `--user-jars-first` parameter and set its value to `true`.

{% image
src="/images/v1.6/connectors/spark/glue-job-params.png"
alt="Glue Job Configure Params"
caption="Glue Job Configure Params"
/%}
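
Because the whole configuration is pasted as a single line, a dropped or mistyped setting is easy to miss. The following sketch splits a `--conf` value back into individual settings and reports which expected keys are absent. The helper names are hypothetical, and the list of expected keys is simply taken from the example above; it is an assumption, not a statement of which settings the agent strictly requires.

```python
# Keys that appear in the example value above (assumed, not authoritative).
EXPECTED_KEYS = (
    "spark.extraListeners",
    "spark.openmetadata.transport.hostPort",
    "spark.openmetadata.transport.type",
    "spark.openmetadata.transport.jwtToken",
    "spark.openmetadata.transport.pipelineServiceName",
    "spark.openmetadata.transport.pipelineName",
    "spark.openmetadata.transport.timeout",
)


def parse_glue_conf_value(value):
    """Split a chained `--conf` value into a {key: value} dictionary."""
    parsed = {}
    for pair in value.split(" --conf "):
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, val = pair.strip().partition("=")
        if not key or not sep:
            raise ValueError(f"malformed setting: {pair!r}")
        parsed[key] = val
    return parsed


def missing_keys(value, expected=EXPECTED_KEYS):
    """Return the expected keys that are not present in the value, sorted."""
    return sorted(set(expected) - set(parse_glue_conf_value(value)))


# A deliberately incomplete value, for illustration only.
conf_value = (
    "spark.extraListeners=org.openmetadata.spark.agent.OpenMetadataSparkListener"
    " --conf spark.openmetadata.transport.type=openmetadata"
)
print(missing_keys(conf_value))
```

Running a check like this before saving the job is optional; it only verifies that keys are present and well formed, not that the host, token, or pipeline names are valid.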