Merge pull request #36 from kukushking/feat/sagemaker-studio
sagemaker studio module
srinivasreddych authored Aug 2, 2023
2 parents a3df053 + 3169b9a commit 06262f3
Showing 25 changed files with 1,406 additions and 1 deletion.
23 changes: 22 additions & 1 deletion .github/workflows/main.yml
@@ -219,4 +219,25 @@ jobs:
- name: Static checks and linting (mypy, flake8, black, isort)
run: scripts/validate.sh --language python --path modules/database/neptune
- name: Pytest
run: cd modules/database/neptune && pytest
run: cd modules/database/neptune && pytest

  modules-ml-sagemaker-studio:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: [3.9]
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v1
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install Requirements
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements-dev.txt
          pip install -r modules/ml/sagemaker-studio/requirements.txt
      - name: Static checks and linting (mypy, flake8, black, isort)
        run: scripts/validate.sh --language python --path modules/ml/sagemaker-studio/
      - name: Pytest
        run: cd modules/ml/sagemaker-studio/ && pytest
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -11,6 +11,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### **Added**

- added `sagemaker-studio` module with unit-tests
- enforced TLS version 1.2, node-node encryption and encryption at rest on OS module
- added `emr-serverless` module with unit-tests
- added workflow entries to all IDF modules
2 changes: 2 additions & 0 deletions manifests/local-isolated/deployment.yaml
@@ -13,6 +13,8 @@ groups:
    path: manifests/local-isolated/replicator-modules.yaml
  - name: compute
    path: manifests/local-isolated/compute-modules.yaml
  - name: ml
    path: manifests/local-isolated/ml-modules.yaml
targetAccountMappings:
  - alias: primary
    accountId:
27 changes: 27 additions & 0 deletions manifests/local-isolated/ml-modules.yaml
@@ -0,0 +1,27 @@
name: sagemaker-studio
path: modules/ml/sagemaker-studio
parameters:
  - name: vpc_id
    valueFrom:
      moduleMetadata:
        group: networking
        name: basic-networking
        key: VpcId
  - name: subnet_ids
    valueFrom:
      moduleMetadata:
        group: networking
        name: basic-networking
        key: PrivateSubnetIds
  - name: data_science_users
    value:
      - ds-user-1
  - name: lead_data_science_users
    value:
      - lead-ds-user-1
  - name: server_lifecycle_name
    value: studio-auto-shutdown
  - name: studio_bucket_name
    value: mlops-*
  - name: retain_efs
    value: 'False'
127 changes: 127 additions & 0 deletions modules/ml/sagemaker-studio/README.md
@@ -0,0 +1,127 @@
# SageMaker Studio Infrastructure

This module contains the resources that are required to deploy the SageMaker Studio infrastructure. It defines the setup for Amazon SageMaker Studio Domain and creates SageMaker Studio User Profiles for Data Scientists and Lead Data Scientists.

**NOTE** To use this repository effectively, you need a good understanding of AWS networking services, AWS CloudFormation, and AWS CDK.

- [SageMaker Studio Infrastructure](#sagemaker-studio-infrastructure)
- [SageMaker Studio Stack](#sagemaker-studio-stack)
- [Inputs and outputs:](#inputs-and-outputs)
- [Required inputs:](#required-inputs)
- [Optional Inputs:](#optional-inputs)
- [Outputs (module metadata):](#outputs-module-metadata)
- [Example Output:](#example-output)
- [Getting Started](#getting-started)
- [Prerequisites](#prerequisites)
- [Module Structure](#module-structure)
- [Troubleshooting](#troubleshooting)

### SageMaker Studio Stack

This stack handles the deployment of the following resources:

1. A SageMaker Studio Domain.
2. IAM roles that are linked to SageMaker Studio user profiles. User profile creation is managed by the manifest file `manifests/shared-infra/mlops-modules.yaml`: add a new entry to one of the lists to create a new user. Each user is linked to a role according to the group you add them to (`data_science_users` or `lead_data_science_users`).

```
- name: data_science_users
  value:
    - data-scientist
- name: lead_data_science_users
  value:
    - lead-data-scientist
```

3. Default SageMaker Project Templates are also enabled in the account and target region using a custom resource. The custom resource uses a Lambda function, `functions/sm_studio/enable_sm_projects`, to make the necessary SDK calls to both AWS Service Catalog and Amazon SageMaker.
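
The portfolio lookup performed by the custom resource can be sketched as follows. This is a minimal illustration rather than the module's code: `find_sagemaker_portfolio_id` is a hypothetical helper, and it assumes the response shape of Service Catalog's `list_accepted_portfolio_shares` (a `PortfolioDetails` list whose entries carry `Id` and `ProviderName`). It raises when no SageMaker-provided portfolio is found, so the problem surfaces before the role association call instead of passing an empty portfolio ID.

```python
from typing import Any, Dict, List


def find_sagemaker_portfolio_id(portfolios: List[Dict[str, Any]]) -> str:
    """Return the Id of the first portfolio provided by Amazon SageMaker."""
    for portfolio in portfolios:
        if portfolio.get("ProviderName") == "Amazon SageMaker":
            return str(portfolio["Id"])
    raise LookupError("No accepted portfolio share provided by Amazon SageMaker")


# Example data shaped like the "PortfolioDetails" list (values are made up).
sample = [
    {"Id": "port-aaaa", "ProviderName": "Some Other Provider"},
    {"Id": "port-bbbb", "ProviderName": "Amazon SageMaker"},
]
print(find_sagemaker_portfolio_id(sample))
```

In a real handler, the list passed in would come from `response["PortfolioDetails"]`.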

## Inputs and outputs:
### Required inputs:
- `vpc_id`
- `subnet_ids`
### Optional Inputs:
- `studio_domain_name`
- `studio_bucket_name`
- `custom_kernel_app_config_name` - custom kernel app config name
- `custom_kernel_image_name` - custom kernel image name
- `data_science_users` - a list of data science user names to create
- `lead_data_science_users` - a list of lead data science user names to create
- `retain_efs` - True | False -- if set to True, the EFS volume will persist after domain deletion. Default is True
- `enable_custom_sagemaker_projects` - True | False -- if set to True, custom sagemaker projects will be enabled for the data science and lead data science users. Default is False
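
Parameters reach the CDK app as environment variables named `SEEDFARMER_PARAMETER_<NAME>`, and every value arrives as a string. `app.py` decodes list parameters with `json.loads`. Flags need the same care, because `bool("False")` is `True` in Python. The sketch below shows one way to read both kinds of value; `param_list` and `param_bool` are hypothetical helpers, not part of this module, and the accepted spellings of true and false are an assumption.

```python
import json
from typing import List, Mapping

PREFIX = "SEEDFARMER_PARAMETER_"


def param_list(env: Mapping[str, str], name: str) -> List[str]:
    """Decode a JSON-encoded list parameter; a missing parameter yields []."""
    return list(json.loads(env.get(PREFIX + name, "[]")))


def param_bool(env: Mapping[str, str], name: str, default: bool = False) -> bool:
    """Interpret true/false strings explicitly instead of calling bool()."""
    raw = env.get(PREFIX + name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in {"true", "1", "yes"}:
        return True
    if value in {"false", "0", "no", ""}:
        return False
    raise ValueError(f"{name} must be True or False, got {raw!r}")


# A stand-in for os.environ with made-up values.
env = {
    "SEEDFARMER_PARAMETER_DATA_SCIENCE_USERS": '["ds-user-1"]',
    "SEEDFARMER_PARAMETER_RETAIN_EFS": "False",
}
print(param_list(env, "DATA_SCIENCE_USERS"))
print(param_bool(env, "RETAIN_EFS", default=True))
print(param_bool(env, "ENABLE_CUSTOM_SAGEMAKER_PROJECTS"))
```

Passing the environment in as a mapping keeps the helpers easy to unit test; in the app itself the argument would be `os.environ`.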

### Outputs (module metadata):
- `StudioDomainName` - the name of the domain created by SageMaker Studio
- `StudioDomainId` - the ID of the domain created by SageMaker Studio
- `StudioBucketName` - the bucket (or prefix) that SageMaker Studio is given access to
- `StudioDomainEFSId` - the EFS file system created by SageMaker Studio
- `DataScientistRoleArn`
- `LeadDataScientistRoleArn`
- `SageMakerExecutionRoleArn`

### Example Output:
```json
{
"DataScientistRoleArn": "arn:aws:iam::XXXXXXXXXXXX:role/idf-mlops-sagemaker-sage-smrolesdatascientistrole-DYPIVQ6NUSP9",
"LeadDataScientistRoleArn": "arn:aws:iam::XXXXXXXXXXXX:role/idf-mlops-sagemaker-sage-smrolesleaddatascientist-V1YL0FQONH62",
"SageMakerExecutionRoleArn": "arn:aws:iam::XXXXXXXXXXXX:role/idf-mlops-sagemaker-sage-smrolessagemakerstudioro-F6HGOUX0JGTI",
"StudioBucketName": "idf-*",
"StudioDomainEFSId": "fs-0a550ea71ecac4978",
"StudioDomainId": "d-flfqmvy84hfq",
"StudioDomainName": "idf-mlops-sagemaker-sagemaker-sagemaker-studio-studio-domain"
}
```
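
The module's `deployspec.yaml` reads `StudioDomainId` and `StudioDomainEFSId` back out of this metadata with `jq`. For readers who want to do the same from Python, a minimal sketch follows; `read_metadata` and `REQUIRED_KEYS` are hypothetical names, and the input is a trimmed copy of the example above.

```python
import json
from typing import Dict

# Keys the destroy phase relies on (taken from the example output above).
REQUIRED_KEYS = ("StudioDomainId", "StudioDomainEFSId")


def read_metadata(raw: str) -> Dict[str, str]:
    """Parse module metadata and check that the keys needed for cleanup exist."""
    metadata = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if key not in metadata]
    if missing:
        raise KeyError(f"metadata is missing: {', '.join(missing)}")
    return metadata


raw = '{"StudioDomainId": "d-flfqmvy84hfq", "StudioDomainEFSId": "fs-0a550ea71ecac4978"}'
metadata = read_metadata(raw)
print(metadata["StudioDomainId"], metadata["StudioDomainEFSId"])
```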

## Getting Started

### Prerequisites

This is an AWS CDK project written in Python 3.8. You need the following on your workstation before you can deploy it. A Linux OS is preferred, so that all CLI commands run as written and path issues are avoided.

* [Node.js](https://nodejs.org/)
* [Python3.8](https://www.python.org/downloads/release/python-380/) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html)
* [AWS CDK v2](https://aws.amazon.com/cdk/)
* [AWS CLI](https://aws.amazon.com/cli/)
* [Docker](https://docs.docker.com/desktop/)

### Module Structure

```
├── functions <--- lambda functions and layers
│ └── sm_studio <--- sagemaker studio stack related lambda function
│ └── enable_sm_projects <--- lambda function to enable sagemaker projects on the account and links the IAM roles of the domain users (used as a custom resource)
├── helper constructs <--- helper CDK constructs
│ └── sm_roles.py <--- helper construct containing IAM roles for sagemaker studio users
├── scripts <--- helper scripts
│ └── check_lcc_state.sh <--- script to check if sagemaker studio lifecycle config needs an update
│ └── delete-domains.py <--- python helper script to delete sagemaker domains
│ └── delete_efs.py <--- python helper script to delete efs mounts
│ └── on-jupyter-server-start.sh <--- script that installs the idle notebook auto-checker jupyter server extension
├── tests <--- module unit tests
├── app.py <--- cdk application entrypoint
├── coverage.ini <--- test coverage tool parameters file
├── deployspec.yaml <--- file that defines deployment instructions
├── modulestack.yaml <--- cloudformation stack that contains permissions needed to deploy the module
├── pyproject.toml <--- build system requirements and settings file
├── README.md <--- module documentation markdown file
├── requirements.txt <--- cdk packages used in the stacks (must be installed)
├── stack.py <--- stack to create sagemaker studio domain along with related IAM roles and the domain users
├── update-domain-input.template.json <--- json template to update sagemaker domain lifecycle configs
```
## Troubleshooting


* **Resource being used by another resource**

This error is harder to track down. You need to trace where the resource you want to delete is still in use, then sever that dependency before running the destroy command again.

**NOTE** Follow the CloudFormation error messages and debug from there: they identify the resource causing the error and, in some cases, describe what needs to happen to resolve it.


* **CDK version X instead of Y**

This error appears after a new CDK release. Run `npm install -g aws-cdk` again to update your CDK to the latest version, then rerun the deployment step for each account in which your stacks are deployed.

* **`cdk synth` not running**

One of the following usually solves the problem:

* Docker is having an issue, so restart your Docker daemon
* Refresh your AWS CLI credentials
80 changes: 80 additions & 0 deletions modules/ml/sagemaker-studio/app.py
@@ -0,0 +1,80 @@
import json
import os
from typing import cast

import aws_cdk
from aws_cdk import CfnOutput

from stack import SagemakerStudioStack

project_name = os.getenv("SEEDFARMER_PROJECT_NAME", "")
deployment_name = os.getenv("SEEDFARMER_DEPLOYMENT_NAME", "")
module_name = os.getenv("SEEDFARMER_MODULE_NAME", "")
app_prefix = f"{project_name}-{deployment_name}-{module_name}"

DEFAULT_STUDIO_DOMAIN_NAME = f"{app_prefix}-studio-domain"
DEFAULT_STUDIO_BUCKET_NAME = f"{app_prefix}-bucket"
DEFAULT_CUSTOM_KERNEL_APP_CONFIG_NAME = None
DEFAULT_CUSTOM_KERNEL_IMAGE_NAME = None
DEFAULT_ENABLE_CUSTOM_SAGEMAKER_PROJECTS = False


def _param(name: str) -> str:
    return f"SEEDFARMER_PARAMETER_{name}"


vpc_id = os.getenv(_param("VPC_ID"))
subnet_ids = json.loads(os.getenv(_param("SUBNET_IDS"), "[]"))
studio_domain_name = os.getenv(_param("STUDIO_DOMAIN_NAME"), DEFAULT_STUDIO_DOMAIN_NAME)
studio_bucket_name = os.getenv(_param("STUDIO_BUCKET_NAME"), DEFAULT_STUDIO_BUCKET_NAME)
app_image_config_name = os.getenv(_param("CUSTOM_KERNEL_APP_CONFIG_NAME"), DEFAULT_CUSTOM_KERNEL_APP_CONFIG_NAME)
image_name = os.getenv(_param("CUSTOM_KERNEL_IMAGE_NAME"), DEFAULT_CUSTOM_KERNEL_IMAGE_NAME)
enable_custom_sagemaker_projects = bool(
    os.getenv(_param("ENABLE_CUSTOM_SAGEMAKER_PROJECTS"), DEFAULT_ENABLE_CUSTOM_SAGEMAKER_PROJECTS)
)

environment = aws_cdk.Environment(
    account=os.environ["CDK_DEFAULT_ACCOUNT"],
    region=os.environ["CDK_DEFAULT_REGION"],
)

data_science_users = json.loads(os.getenv(_param("DATA_SCIENCE_USERS"), "[]"))
lead_data_science_users = json.loads(os.getenv(_param("LEAD_DATA_SCIENCE_USERS"), "[]"))

app = aws_cdk.App()
stack = SagemakerStudioStack(
    app,
    app_prefix,
    project_name=project_name,
    deployment_name=deployment_name,
    module_name=module_name,
    vpc_id=cast(str, vpc_id),
    subnet_ids=subnet_ids,
    studio_domain_name=studio_domain_name,
    studio_bucket_name=studio_bucket_name,
    data_science_users=data_science_users,
    lead_data_science_users=lead_data_science_users,
    env=environment,
    app_image_config_name=cast(str, app_image_config_name),
    image_name=cast(str, image_name),
    enable_custom_sagemaker_projects=enable_custom_sagemaker_projects,
)


CfnOutput(
    scope=stack,
    id="metadata",
    value=stack.to_json_string(
        {
            "StudioDomainName": stack.studio_domain.domain_name,
            "StudioDomainEFSId": stack.studio_domain.attr_home_efs_file_system_id,
            "StudioDomainId": stack.studio_domain.attr_domain_id,
            "StudioBucketName": studio_bucket_name,
            "DataScientistRoleArn": stack.sm_roles.data_scientist_role.role_arn,
            "LeadDataScientistRoleArn": stack.sm_roles.lead_data_scientist_role.role_arn,
            "SageMakerExecutionRoleArn": stack.sm_roles.sagemaker_studio_role.role_arn,
        }
    ),
)

app.synth()
3 changes: 3 additions & 0 deletions modules/ml/sagemaker-studio/coverage.ini
@@ -0,0 +1,3 @@
[run]
omit =
    tests/*
46 changes: 46 additions & 0 deletions modules/ml/sagemaker-studio/deployspec.yaml
@@ -0,0 +1,46 @@
publishGenericEnvVariables: True
deploy:
  phases:
    install:
      commands:
        - npm install -g [email protected]
        - pip install -r requirements.txt
        - apt-get install gettext-base
    build:
      commands:
        - LCC_CONTENT=`openssl base64 -A -in scripts/on-jupyter-server-start.sh`
        - export LCC_CONTENT=$LCC_CONTENT
        - aws sagemaker create-studio-lifecycle-config --studio-lifecycle-config-name $SEEDFARMER_PARAMETER_SERVER_LIFECYCLE_NAME --studio-lifecycle-config-content $LCC_CONTENT --studio-lifecycle-config-app-type JupyterServer || true
        - export LCC_ARN=$(aws sagemaker describe-studio-lifecycle-config --studio-lifecycle-config-name $SEEDFARMER_PARAMETER_SERVER_LIFECYCLE_NAME | jq -r ."StudioLifecycleConfigArn")
        - echo $LCC_ARN
        - ./scripts/check_lcc_state.sh
        - cdk deploy --require-approval never --progress events --app "python app.py" --outputs-file ./cdk-exports.json
        - cat cdk-exports.json
        # Export metadata
        - seedfarmer metadata convert -f cdk-exports.json || true
        - export SEEDFARMER_MODULE_METADATA=$(cat SEEDFARMER_MODULE_METADATA)
        - export DOMAIN_ID=$(echo ${SEEDFARMER_MODULE_METADATA} | jq -r ".StudioDomainId")
        - echo $DOMAIN_ID
        # Update SageMaker domain lifecycle config
        - envsubst < "update-domain-input.template.json" > "update-domain-input.json"
        - aws sagemaker update-domain --cli-input-json file://update-domain-input.json
destroy:
  phases:
    install:
      commands:
        - npm install -g [email protected]
        - pip install -r requirements.txt
    build:
      commands:
        - cdk destroy --force --app "python app.py"
        - export EFS_ID=$(echo ${SEEDFARMER_MODULE_METADATA} | jq -r ".StudioDomainEFSId")
        - export DOMAIN_ID=$(echo ${SEEDFARMER_MODULE_METADATA} | jq -r ".StudioDomainId")
        - RETAIN_EFS=$(echo $SEEDFARMER_PARAMETER_RETAIN_EFS | tr '[:lower:]' '[:upper:]')
        - echo $RETAIN_EFS
        - echo $EFS_ID
        - echo $DOMAIN_ID
        - >
          if [[ $RETAIN_EFS == "FALSE" ]]; then
            echo "DELETING EFS"
            python scripts/delete_efs.py ${EFS_ID} ${DOMAIN_ID} || true
          fi;
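
The `LCC_CONTENT` step in the deploy phase base64-encodes the lifecycle script onto a single line (`openssl base64 -A`). For anyone scripting the same step in Python, a rough equivalent is sketched below; `encode_lifecycle_script` is a hypothetical helper, not part of this module, and the script bytes are a made-up stand-in for `scripts/on-jupyter-server-start.sh`.

```python
import base64


def encode_lifecycle_script(script: bytes) -> str:
    """Base64-encode script bytes as one line, like `openssl base64 -A`."""
    return base64.b64encode(script).decode("ascii")


encoded = encode_lifecycle_script(b"#!/bin/bash\necho hello\n")
print(encoded)
```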
@@ -0,0 +1,52 @@
import boto3
import cfnresponse
from botocore.exceptions import ClientError

sm_client = boto3.client("sagemaker")
sc_client = boto3.client("servicecatalog")


def handler(event, context):
    try:
        if "RequestType" in event and event["RequestType"] in {"Create", "Update"}:
            properties = event["ResourceProperties"]
            roles = properties.get("ExecutionRoles", [])

            for role in roles:
                enable_sm_projects(role)

        cfnresponse.send(event, context, cfnresponse.SUCCESS, {}, "")
    except ClientError as exception:
        print(exception)
        cfnresponse.send(
            event,
            context,
            cfnresponse.FAILED,
            {},
            physicalResourceId=event.get("PhysicalResourceId"),
        )


def enable_sm_projects(studio_role_arn):
    # enable Project on account level (accepts portfolio share)
    response = sm_client.enable_sagemaker_servicecatalog_portfolio()

    print(response)

    # associate studio role with portfolio
    response = sc_client.list_accepted_portfolio_shares()

    print(response)

    portfolio_id = ""

    for portfolio in response["PortfolioDetails"]:
        if portfolio["ProviderName"] == "Amazon SageMaker":
            portfolio_id = portfolio["Id"]
            break

    response = sc_client.associate_principal_with_portfolio(
        PortfolioId=portfolio_id, PrincipalARN=studio_role_arn, PrincipalType="IAM"
    )

    print(response)
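
One thing to watch in this handler: only `ClientError` is caught, so any other exception (for example a `KeyError` on a missing `ResourceProperties`) leaves CloudFormation without a response until the custom resource times out. A minimal sketch of a wrapper that always reports an outcome is shown below. `respond_always`, `work`, and `send` are hypothetical names, with `send` standing in for `cfnresponse.send` so the example runs without AWS.

```python
from typing import Any, Callable, Dict, List, Tuple

SUCCESS = "SUCCESS"
FAILED = "FAILED"


def respond_always(
    event: Dict[str, Any],
    work: Callable[[Dict[str, Any]], None],
    send: Callable[[Dict[str, Any], str], None],
) -> None:
    """Run `work` and report exactly one status, whatever exception it raises."""
    try:
        if event.get("RequestType") in {"Create", "Update"}:
            work(event)
        status = SUCCESS
    except Exception as error:  # broad on purpose: CloudFormation must get a reply
        print(f"custom resource failed: {error!r}")
        status = FAILED
    send(event, status)


sent: List[Tuple[str, str]] = []


def fake_send(event: Dict[str, Any], status: str) -> None:
    # Records what would have been sent to CloudFormation.
    sent.append((event["RequestType"], status))


def failing_work(event: Dict[str, Any]) -> None:
    raise KeyError("ResourceProperties")


respond_always({"RequestType": "Create"}, failing_work, fake_send)
respond_always({"RequestType": "Delete"}, failing_work, fake_send)
print(sent)
```

The `Delete` request skips `work` entirely and still reports success, matching the original handler's behaviour for request types other than `Create` and `Update`.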
@@ -0,0 +1,2 @@
cfnresponse
urllib3<2 # Lock to version before breaking change to urllib
Empty file.