Merge pull request #36 from kukushking/feat/sagemaker-studio
sagemaker studio module
Showing 25 changed files with 1,406 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
name: sagemaker-studio | ||
path: modules/ml/sagemaker-studio | ||
parameters: | ||
  - name: vpc_id | ||
    valueFrom: | ||
      moduleMetadata: | ||
        group: networking | ||
        name: basic-networking | ||
        key: VpcId | ||
  - name: subnet_ids | ||
    valueFrom: | ||
      moduleMetadata: | ||
        group: networking | ||
        name: basic-networking | ||
        key: PrivateSubnetIds | ||
  - name: data_science_users | ||
    value: | ||
      - ds-user-1 | ||
  - name: lead_data_science_users | ||
    value: | ||
      - lead-ds-user-1 | ||
  - name: server_lifecycle_name | ||
    value: studio-auto-shutdown | ||
  - name: studio_bucket_name | ||
    value: mlops-* | ||
  - name: retain_efs | ||
    value: 'False' |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,127 @@ | ||
# SageMaker Studio Infrastructure | ||
|
||
This module contains the resources that are required to deploy the SageMaker Studio infrastructure. It defines the setup for Amazon SageMaker Studio Domain and creates SageMaker Studio User Profiles for Data Scientists and Lead Data Scientists. | ||
|
||
**NOTE** To use this repository effectively, you need a good understanding of AWS networking services, AWS CloudFormation and AWS CDK. | ||
- [SageMaker Studio Infrastructure](#sagemaker-studio-infrastructure) | ||
- [SageMaker Studio Stack](#sagemaker-studio-stack) | ||
- [Inputs and outputs:](#inputs-and-outputs) | ||
- [Required inputs:](#required-inputs) | ||
- [Optional Inputs:](#optional-inputs) | ||
- [Outputs (module metadata):](#outputs-module-metadata) | ||
- [Example Output:](#example-output) | ||
- [Getting Started](#getting-started) | ||
- [Prerequisites](#prerequisites) | ||
- [Module Structure](#module-structure) | ||
- [Troubleshooting](#troubleshooting) | ||
|
||
### SageMaker Studio Stack | ||
|
||
This stack handles the deployment of the following resources: | ||
|
||
1. The SageMaker Studio Domain. | ||
2. IAM roles that are linked to SageMaker Studio user profiles. User profile creation is managed by the manifest file `manifests/shared-infra/mlops-modules.yaml`: add a new entry to one of the lists to create a new user. The user is linked to a role depending on the group you add them to (`data_science_users` or `lead_data_science_users`). | ||
|
||
``` | ||
- name: data_science_users | ||
  value: | ||
    - data-scientist | ||
- name: lead_data_science_users | ||
  value: | ||
    - lead-data-scientist | ||
``` | ||
|
||
3. Default SageMaker Project Templates are also enabled for the account in the target Region using a custom resource. The custom resource uses a Lambda function, `functions/sm_studio/enable_sm_projects`, to make the necessary SDK calls to both AWS Service Catalog and Amazon SageMaker. | ||
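
The group-based role assignment described in item 2 can be sketched as follows. This is a minimal illustration only: the role labels `DATA_SCIENTIST` and `LEAD_DATA_SCIENTIST` and the function `assign_roles` are hypothetical names, and the overlap check is an extra safeguard of this sketch, not something the module is known to do.

```python
from typing import Dict, List

# Hypothetical role labels; the real IAM roles are created by the module's stack code.
DATA_SCIENTIST = "data_scientist"
LEAD_DATA_SCIENTIST = "lead_data_scientist"


def assign_roles(data_science_users: List[str], lead_data_science_users: List[str]) -> Dict[str, str]:
    """Map each user name to a role label based on the group it is listed in."""
    # Assumption of this sketch: a user should belong to exactly one group.
    overlap = set(data_science_users) & set(lead_data_science_users)
    if overlap:
        raise ValueError(f"Users listed in both groups: {sorted(overlap)}")
    roles = {user: DATA_SCIENTIST for user in data_science_users}
    roles.update({user: LEAD_DATA_SCIENTIST for user in lead_data_science_users})
    return roles


print(assign_roles(["data-scientist"], ["lead-data-scientist"]))
```

Adding a name to either list in the manifest would then correspond to one more entry in the returned mapping.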
|
||
## Inputs and outputs: | ||
### Required inputs: | ||
- `vpc_id` | ||
- `subnet_ids` | ||
### Optional Inputs: | ||
- `studio_domain_name` | ||
- `studio_bucket_name` | ||
- `app_image_config_name` - custom kernel app config name | ||
- `image_name` - custom kernel image name | ||
- `data_science_users` - a list of data science user names to create | ||
- `lead_data_science_users` - a list of lead data science user names to create | ||
- `retain_efs` - True | False -- if set to True, the EFS volume will persist after domain deletion. Default is True | ||
- `enable_custom_sagemaker_projects` - True | False -- if set to True, custom SageMaker projects will be enabled for the data science and lead data science users. Default is False | ||
|
||
### Outputs (module metadata): | ||
- `StudioDomainName` - the name of the domain created by SageMaker Studio | ||
- `StudioDomainId` - the ID of the domain created by SageMaker Studio | ||
- `StudioBucketName` - the bucket (or prefix) that SageMaker Studio is given access to | ||
- `StudioDomainEFSId` - the EFS file system created by SageMaker Studio | ||
- `DataScientistRoleArn` | ||
- `LeadDataScientistRoleArn` | ||
- `SageMakerExecutionRoleArn` | ||
|
||
### Example Output: | ||
```json | ||
{ | ||
  "DataScientistRoleArn": "arn:aws:iam::XXXXXXXXXXXX:role/idf-mlops-sagemaker-sage-smrolesdatascientistrole-DYPIVQ6NUSP9", | ||
  "LeadDataScientistRoleArn": "arn:aws:iam::XXXXXXXXXXXX:role/idf-mlops-sagemaker-sage-smrolesleaddatascientist-V1YL0FQONH62", | ||
  "SageMakerExecutionRoleArn": "arn:aws:iam::XXXXXXXXXXXX:role/idf-mlops-sagemaker-sage-smrolessagemakerstudioro-F6HGOUX0JGTI", | ||
  "StudioBucketName": "idf-*", | ||
  "StudioDomainEFSId": "fs-0a550ea71ecac4978", | ||
  "StudioDomainId": "d-flfqmvy84hfq", | ||
  "StudioDomainName": "idf-mlops-sagemaker-sagemaker-sagemaker-studio-studio-domain" | ||
} | ||
``` | ||
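
A downstream script can read individual keys from this metadata. The sketch below shows one way to do that in Python, using a shortened copy of the example output; the helper name `read_metadata_key` is hypothetical and not part of the module.

```python
import json

# Shortened copy of the example output above.
metadata_json = '{"StudioDomainId": "d-flfqmvy84hfq", "StudioDomainEFSId": "fs-0a550ea71ecac4978"}'


def read_metadata_key(raw: str, key: str) -> str:
    """Return one key from the module metadata JSON, failing loudly if it is absent."""
    metadata = json.loads(raw)
    if key not in metadata:
        raise KeyError(f"{key} not found in module metadata")
    return metadata[key]


domain_id = read_metadata_key(metadata_json, "StudioDomainId")
print(domain_id)
```

Failing on a missing key, rather than returning an empty string, avoids passing a blank domain ID to later commands.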
|
||
## Getting Started | ||
|
||
### Prerequisites | ||
|
||
This is an AWS CDK project written in Python 3.8. You need the following on your workstation before you can deploy it. A Linux OS is preferred, so that all CLI commands run as written and path issues are avoided. | ||
|
||
* [Node.js](https://nodejs.org/) | ||
* [Python3.8](https://www.python.org/downloads/release/python-380/) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html) | ||
* [AWS CDK v2](https://aws.amazon.com/cdk/) | ||
* [AWS CLI](https://aws.amazon.com/cli/) | ||
* [Docker](https://docs.docker.com/desktop/) | ||
|
||
### Module Structure | ||
|
||
``` | ||
├── functions <--- lambda functions and layers | ||
│ └── sm_studio <--- sagemaker studio stack related lambda function | ||
│ └── enable_sm_projects <--- lambda function that enables sagemaker projects on the account and links the IAM roles of the domain users (used as a custom resource) | ||
├── helper constructs <--- helper CDK constructs | ||
│ └── sm_roles.py <--- helper construct containing IAM roles for sagemaker studio users | ||
├── scripts <--- helper scripts | ||
│ └── check_lcc_state.sh <--- script to check if sagemaker studio lifecycle config needs an update | ||
│ └── delete-domains.py <--- python helper script to delete sagemaker domains | ||
│ └── delete_efs.py <--- python helper script to delete efs mounts | ||
│ └── on-jupyter-server-start.sh <--- script that installs the idle notebook auto-checker jupyter server extension | ||
├── tests <--- module unit tests | ||
├── app.py <--- cdk application entrypoint | ||
├── coverage.ini <--- test coverage tool parameters file | ||
├── deployspec.yaml <--- file that defines deployment instructions | ||
├── modulestack.yaml <--- cloudformation stack that contains permissions needed to deploy the module | ||
├── pyproject.toml <--- build system requirements and settings file | ||
├── README.md <--- module documentation markdown file | ||
├── requirements.txt <--- cdk packages used in the stacks (must be installed) | ||
├── stack.py <--- stack to create sagemaker studio domain along with related IAM roles and the domain users | ||
├── update-domain-input.template.json <--- json template to update sagemaker domain lifecycle configs | ||
``` | ||
## Troubleshooting | ||
|
||
|
||
* **Resource being used by another resource** | ||
|
||
This error is harder to track down. Trace where the resource you want to delete is still being used, and sever that dependency before running the destroy command again. | ||
|
||
**NOTE** Follow the CloudFormation error messages and debug from there: they include details about which resource is causing the error and, on some occasions, information about what needs to happen to resolve it. | ||
|
||
|
||
* **CDK version X instead of Y** | ||
|
||
This error is caused by a new CDK release. Run `npm install -g aws-cdk` again to update CDK to the latest version, then rerun the deployment step for each account where your stacks are deployed. | ||
|
||
* **`cdk synth`** **not running** | ||
|
||
One of the following would solve the problem: | ||
|
||
* Docker is having an issue, so restart your Docker daemon | ||
* Refresh your AWS CLI credentials |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,80 @@ | ||
import json | ||
import os | ||
from typing import cast | ||
|
||
import aws_cdk | ||
from aws_cdk import CfnOutput | ||
|
||
from stack import SagemakerStudioStack | ||
|
||
project_name = os.getenv("SEEDFARMER_PROJECT_NAME", "") | ||
deployment_name = os.getenv("SEEDFARMER_DEPLOYMENT_NAME", "") | ||
module_name = os.getenv("SEEDFARMER_MODULE_NAME", "") | ||
app_prefix = f"{project_name}-{deployment_name}-{module_name}" | ||
|
||
DEFAULT_STUDIO_DOMAIN_NAME = f"{app_prefix}-studio-domain" | ||
DEFAULT_STUDIO_BUCKET_NAME = f"{app_prefix}-bucket" | ||
DEFAULT_CUSTOM_KERNEL_APP_CONFIG_NAME = None | ||
DEFAULT_CUSTOM_KERNEL_IMAGE_NAME = None | ||
DEFAULT_ENABLE_CUSTOM_SAGEMAKER_PROJECTS = False | ||
|
||
|
||
def _param(name: str) -> str: | ||
    return f"SEEDFARMER_PARAMETER_{name}" | ||
|
||
|
||
vpc_id = os.getenv(_param("VPC_ID")) | ||
subnet_ids = json.loads(os.getenv(_param("SUBNET_IDS"), "[]")) | ||
studio_domain_name = os.getenv(_param("STUDIO_DOMAIN_NAME"), DEFAULT_STUDIO_DOMAIN_NAME) | ||
studio_bucket_name = os.getenv(_param("STUDIO_BUCKET_NAME"), DEFAULT_STUDIO_BUCKET_NAME) | ||
app_image_config_name = os.getenv(_param("CUSTOM_KERNEL_APP_CONFIG_NAME"), DEFAULT_CUSTOM_KERNEL_APP_CONFIG_NAME) | ||
image_name = os.getenv(_param("CUSTOM_KERNEL_IMAGE_NAME"), DEFAULT_CUSTOM_KERNEL_IMAGE_NAME) | ||
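# Environment variables are strings, so bool("False") would be True; compare the text instead. | ||
enable_custom_sagemaker_projects = ( | ||
    os.getenv(_param("ENABLE_CUSTOM_SAGEMAKER_PROJECTS"), str(DEFAULT_ENABLE_CUSTOM_SAGEMAKER_PROJECTS)).lower() | ||
    == "true" | ||
) | ||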
|
||
environment = aws_cdk.Environment( | ||
    account=os.environ["CDK_DEFAULT_ACCOUNT"], | ||
    region=os.environ["CDK_DEFAULT_REGION"], | ||
) | ||
|
||
data_science_users = json.loads(os.getenv(_param("DATA_SCIENCE_USERS"), "[]")) | ||
lead_data_science_users = json.loads(os.getenv(_param("LEAD_DATA_SCIENCE_USERS"), "[]")) | ||
|
||
app = aws_cdk.App() | ||
stack = SagemakerStudioStack( | ||
    app, | ||
    app_prefix, | ||
    project_name=project_name, | ||
    deployment_name=deployment_name, | ||
    module_name=module_name, | ||
    vpc_id=cast(str, vpc_id), | ||
    subnet_ids=subnet_ids, | ||
    studio_domain_name=studio_domain_name, | ||
    studio_bucket_name=studio_bucket_name, | ||
    data_science_users=data_science_users, | ||
    lead_data_science_users=lead_data_science_users, | ||
    env=environment, | ||
    app_image_config_name=cast(str, app_image_config_name), | ||
    image_name=cast(str, image_name), | ||
    enable_custom_sagemaker_projects=enable_custom_sagemaker_projects, | ||
) | ||
|
||
|
||
CfnOutput( | ||
    scope=stack, | ||
    id="metadata", | ||
    value=stack.to_json_string( | ||
        { | ||
            "StudioDomainName": stack.studio_domain.domain_name, | ||
            "StudioDomainEFSId": stack.studio_domain.attr_home_efs_file_system_id, | ||
            "StudioDomainId": stack.studio_domain.attr_domain_id, | ||
            "StudioBucketName": studio_bucket_name, | ||
            "DataScientistRoleArn": stack.sm_roles.data_scientist_role.role_arn, | ||
            "LeadDataScientistRoleArn": stack.sm_roles.lead_data_scientist_role.role_arn, | ||
            "SageMakerExecutionRoleArn": stack.sm_roles.sagemaker_studio_role.role_arn, | ||
        } | ||
    ), | ||
) | ||
|
||
app.synth() |
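
Because environment variables are always strings, boolean parameters such as `ENABLE_CUSTOM_SAGEMAKER_PROJECTS` need explicit parsing: `bool("False")` evaluates to `True` in Python. The helper below is a minimal sketch of one way to do this. The name `env_flag` and the set of accepted truthy spellings are assumptions of this example, not part of the module.

```python
import os
from typing import Mapping


def env_flag(name: str, default: bool = False, environ: Mapping[str, str] = os.environ) -> bool:
    """Parse a boolean-like environment variable, falling back to `default` when unset."""
    value = environ.get(name)
    if value is None:
        return default
    # Assumed truthy spellings; anything else (including "False") is treated as False.
    return value.strip().lower() in {"true", "1", "yes"}


# A mapping is passed explicitly here so the example does not depend on the real environment.
print(env_flag("SEEDFARMER_PARAMETER_RETAIN_EFS", environ={"SEEDFARMER_PARAMETER_RETAIN_EFS": "False"}))
```

Passing the mapping as an argument keeps the helper easy to unit test without modifying `os.environ`.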
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
[run] | ||
omit = | ||
tests/* |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
publishGenericEnvVariables: True | ||
deploy: | ||
  phases: | ||
    install: | ||
      commands: | ||
      - npm install -g [email protected] | ||
      - pip install -r requirements.txt | ||
      - apt-get install gettext-base | ||
    build: | ||
      commands: | ||
      - LCC_CONTENT=`openssl base64 -A -in scripts/on-jupyter-server-start.sh` | ||
      - export LCC_CONTENT=$LCC_CONTENT | ||
      - aws sagemaker create-studio-lifecycle-config --studio-lifecycle-config-name $SEEDFARMER_PARAMETER_SERVER_LIFECYCLE_NAME --studio-lifecycle-config-content $LCC_CONTENT --studio-lifecycle-config-app-type JupyterServer || true | ||
      - export LCC_ARN=$(aws sagemaker describe-studio-lifecycle-config --studio-lifecycle-config-name $SEEDFARMER_PARAMETER_SERVER_LIFECYCLE_NAME | jq -r ."StudioLifecycleConfigArn") | ||
      - echo $LCC_ARN | ||
      - ./scripts/check_lcc_state.sh | ||
      - cdk deploy --require-approval never --progress events --app "python app.py" --outputs-file ./cdk-exports.json | ||
      - cat cdk-exports.json | ||
      # Export metadata | ||
      - seedfarmer metadata convert -f cdk-exports.json || true | ||
      - export SEEDFARMER_MODULE_METADATA=$(cat SEEDFARMER_MODULE_METADATA) | ||
      - export DOMAIN_ID=$(echo ${SEEDFARMER_MODULE_METADATA} | jq -r ".StudioDomainId") | ||
      - echo $DOMAIN_ID | ||
      # Update SageMaker domain lifecycle config | ||
      - envsubst < "update-domain-input.template.json" > "update-domain-input.json" | ||
      - aws sagemaker update-domain --cli-input-json file://update-domain-input.json | ||
destroy: | ||
  phases: | ||
    install: | ||
      commands: | ||
      - npm install -g [email protected] | ||
      - pip install -r requirements.txt | ||
    build: | ||
      commands: | ||
      - cdk destroy --force --app "python app.py" | ||
      - export EFS_ID=$(echo ${SEEDFARMER_MODULE_METADATA} | jq -r ".StudioDomainEFSId") | ||
      - export DOMAIN_ID=$(echo ${SEEDFARMER_MODULE_METADATA} | jq -r ".StudioDomainId") | ||
      - RETAIN_EFS=$(echo $SEEDFARMER_PARAMETER_RETAIN_EFS | tr '[:lower:]' '[:upper:]') | ||
      - echo $RETAIN_EFS | ||
      - echo $EFS_ID | ||
      - echo $DOMAIN_ID | ||
      - > | ||
        if [[ $RETAIN_EFS == "FALSE" ]]; then | ||
            echo "DELETING EFS" | ||
            python scripts/delete_efs.py ${EFS_ID} ${DOMAIN_ID} || true | ||
        fi; |
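
The build phase base64-encodes the lifecycle script with `openssl base64 -A`, which emits the encoding on a single line. For readers who want to reproduce or inspect that value outside the shell, the same encoding can be sketched in Python. The function name `encode_lifecycle_script` and the sample script text are illustrative assumptions, not part of the module.

```python
import base64


def encode_lifecycle_script(script_text: str) -> str:
    """Base64-encode script text on a single line, similar to `openssl base64 -A`."""
    return base64.b64encode(script_text.encode("utf-8")).decode("ascii")


# Placeholder content; the real script is scripts/on-jupyter-server-start.sh.
sample_script = "#!/bin/bash\necho hello\n"
encoded = encode_lifecycle_script(sample_script)
print(encoded)

# Decoding returns the original text, which is a quick way to check an encoded value.
decoded = base64.b64decode(encoded).decode("utf-8")
```

The single-line form matters because the encoded value is passed as one command-line argument.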
52 changes: 52 additions & 0 deletions
modules/ml/sagemaker-studio/functions/sm_studio/enable_sm_projects/index.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,52 @@ | ||
import boto3 | ||
import cfnresponse | ||
from botocore.exceptions import ClientError | ||
|
||
sm_client = boto3.client("sagemaker") | ||
sc_client = boto3.client("servicecatalog") | ||
|
||
|
||
def handler(event, context): | ||
    try: | ||
        if "RequestType" in event and event["RequestType"] in {"Create", "Update"}: | ||
            properties = event["ResourceProperties"] | ||
            roles = properties.get("ExecutionRoles", []) | ||
|
||
            for role in roles: | ||
                enable_sm_projects(role) | ||
|
||
        cfnresponse.send(event, context, cfnresponse.SUCCESS, {}, "") | ||
    except ClientError as exception: | ||
        print(exception) | ||
        cfnresponse.send( | ||
            event, | ||
            context, | ||
            cfnresponse.FAILED, | ||
            {}, | ||
            physicalResourceId=event.get("PhysicalResourceId"), | ||
        ) | ||
|
||
|
||
def enable_sm_projects(studio_role_arn): | ||
    # enable Project on account level (accepts portfolio share) | ||
    response = sm_client.enable_sagemaker_servicecatalog_portfolio() | ||
|
||
    print(response) | ||
|
||
    # associate studio role with portfolio | ||
    response = sc_client.list_accepted_portfolio_shares() | ||
|
||
    print(response) | ||
|
||
    portfolio_id = "" | ||
|
||
    for portfolio in response["PortfolioDetails"]: | ||
        if portfolio["ProviderName"] == "Amazon SageMaker": | ||
            portfolio_id = portfolio["Id"] | ||
            break | ||
|
||
    response = sc_client.associate_principal_with_portfolio( | ||
        PortfolioId=portfolio_id, PrincipalARN=studio_role_arn, PrincipalType="IAM" | ||
    ) | ||
|
||
    print(response) |
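
In the function above, `portfolio_id` stays an empty string when no portfolio from the "Amazon SageMaker" provider is found, and that empty value is still passed to `associate_principal_with_portfolio`. A more defensive lookup could be factored out as sketched below. The helper name `find_portfolio_id` and the choice of `LookupError` are assumptions of this example, not the commit's code; the sketch operates on the `PortfolioDetails` list only, so it runs without AWS access.

```python
from typing import Any, Dict, List


def find_portfolio_id(portfolio_details: List[Dict[str, Any]], provider_name: str = "Amazon SageMaker") -> str:
    """Return the Id of the first portfolio from `provider_name`, or raise if none exists."""
    for portfolio in portfolio_details:
        if portfolio.get("ProviderName") == provider_name:
            return portfolio["Id"]
    raise LookupError(f"No accepted portfolio share found for provider {provider_name!r}")


# Illustrative data shaped like the "PortfolioDetails" list; the Ids are made up.
sample_details = [
    {"Id": "port-aaaa", "ProviderName": "Some Other Provider"},
    {"Id": "port-bbbb", "ProviderName": "Amazon SageMaker"},
]
print(find_portfolio_id(sample_details))
```

Raising early gives a clearer message than a failed API call with an empty `PortfolioId`. The caller would still need to report the failure to CloudFormation.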
2 changes: 2 additions & 0 deletions
modules/ml/sagemaker-studio/functions/sm_studio/enable_sm_projects/requirements.txt
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
cfnresponse | ||
urllib3<2 # Lock to version before breaking change to urllib3 |
Empty file.