This repository has been archived by the owner on Apr 5, 2018. It is now read-only.

This commit changed 32 files, with 390 additions and 82 deletions.
```diff
@@ -3,7 +3,7 @@ dist: trusty
 language: node_js
 node_js:
 - 'stable'
-- '4.2.1'
+- '7.9'
 services:
 - postgresql
 jdk:
```
|
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
# AWS Batch

Amazon Web Services Batch provides a simple way of running containers and simple commands on AWS without having to closely manage the underlying EC2 infrastructure (although knowledge of that infrastructure will always be useful). While AWS Batch does not understand CWL the way a full workflow engine does, it offers one of the simplest ways to run a large number of Dockstore tools at scale. Additionally, it lets you run tools and manage resources almost entirely from a GUI.

For this tutorial, we assume that you have run through the AWS Batch [Getting Started](https://docs.aws.amazon.com/batch/latest/userguide/Batch_GetStarted.html) tutorial; we mainly focus on what you need to consider when running Dockstore tools, while providing a brief overview of the process.

Additionally, keep in mind that if you know CWL and/or do not need the Dockstore command line to do file provisioning, you can decompose the tool's underlying command-line invocation and use that as the command for your jobs, gaining a bit of performance. This tutorial instead focuses on using cwltool and the Dockstore command line to provide an experience more akin to running Dockstore or cwltool [on the command-line](/docs/launch#dockstore-cli) out of the box.
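To make the later GUI steps concrete: the batch wrapper image essentially assembles a Dockstore CLI launch from its two arguments (a tool entry and a parameter JSON URL). The sketch below is illustrative only; the variable names are assumptions and it prints the command rather than executing it, since the real wrapper script may differ.

```shell
# Illustrative sketch (not the wrapper's actual code): how a /test.sh-style
# wrapper could map its two arguments onto a Dockstore CLI launch.
ENTRY="quay.io/briandoconnor/dockstore-tool-md5sum:1.0.3"
PARAMS_URL="https://raw.githubusercontent.com/dockstore/batch_wrapper/master/aws/md5sum.s3.json"

# A real run would fetch the parameter file and launch, roughly:
#   wget -q -O params.json "$PARAMS_URL"
#   dockstore tool launch --entry "$ENTRY" --json params.json
# Here we only assemble and print the command so the sketch is self-contained.
CMD="dockstore tool launch --entry $ENTRY --json params.json"
echo "$CMD"
```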
1. Unfortunately, the most difficult step comes first: determine how much disk space your tool needs to run, which can vary wildly from tool to tool. For the tools in this tutorial, we went with 100 GB of space for the root disk and 100 GB for the Docker volume to run our sample data, up from the defaults of 8 GB and 22 GB respectively. Next, you will need to create an image (AMI) with this setup. Here you have a couple of options:
    1. Follow [Creating a Compute Resource AMI](https://docs.aws.amazon.com/batch/latest/userguide/create-batch-ami.html) from scratch.
    2. Or launch the default [ECS-Optimized AMI](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-optimized_AMI_launch_latest.html), follow these instructions to [expand the EBS volume](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-expand-volume.html#console-modify), and then [notify Linux](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-expand-volume.html#recognize-expanded-volume-linux) about the increased volume size before creating an AMI. Be careful to delete the touch file mentioned in the first tutorial. In our testing, we went with this second option, although both should work.
1. Create your compute environment. Start with a managed environment and specify the instance role that you set up in the previous step. You may also want to specify a specific instance type if you want to ensure that only one tool/workflow runs on one VM at a time to conserve disk space. ![Configure compute environment](images/aws-batch-2.png)
1. When you created your compute environment, you picked or created an [IAM role](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/java-dg-roles.html) for your instances (ecsInstanceRole in the screenshot). If you want either your input data or output data to live on S3, add that policy to the role. ![Configure IAM role for ecsInstanceRole](images/aws-batch-1.png) Effectively, this allows programs running on your VMs access to S3 buckets to read input files and write output files. You can also use a read-only policy if you only need to read input files from S3, or create a new policy with access to only specific buckets.
1. Create a job queue. There's not much to add here.
1. Create your job definition.
    1. For your image, specify `quay.io/dockstore/batch_wrapper:1.0` or the latest tagged version [here](https://quay.io/repository/dockstore/batch_wrapper). This wrapper provides cwltool and the Dockstore CLI as well as some trivial glue and demo code. ![Job definition](images/aws-batch-3.png)
    2. Specify a number of CPUs and an amount of memory appropriate for your job. Our understanding is that this will not actually kill jobs that float above the threshold, but it will control how many jobs can be stacked on your instances.
    3. Specify volumes and mount points, referring to the following image. `/datastore` is mounted to provide access for file provisioning. `/var/run/docker.sock` is provided to allow cwltool to launch your desired Docker container using the Docker daemon. ![Docker mounts](images/aws-batch-4.png)
1. Create your job. Here you will specify the tool that you wish to run and the parameters that it will take.
    1. For a quick test, you can try the command `/test.sh quay.io/briandoconnor/dockstore-tool-md5sum:1.0.3 https://raw.githubusercontent.com/dockstore/batch_wrapper/master/aws/md5sum.s3.json` after modifying md5sum.s3.json to point to your S3 bucket rather than dockstore.temp and uploading it somewhere accessible. This runs a quick md5sum tool that copies the result to an S3 bucket (credentials are provided via the IAM role) in just a few minutes. ![Job definition](images/aws-batch-6.png)
    2. For more realistic jobs, you can try the [PCAWG project](http://icgc.org/working-pancancer-data-aws) BWA and Delly workflows, which use the commands `/test.sh quay.io/pancancer/pcawg-bwa-mem-workflow:2.6.8_1.2 https://raw.githubusercontent.com/dockstore/batch_wrapper/master/aws/bwa.s3.json` (approximately seven hours) and `/test.sh quay.io/pancancer/pcawg_delly_workflow:2.0.1-cwl1.0 https://raw.githubusercontent.com/dockstore/batch_wrapper/master/aws/delly.local.json` (approximately six hours) respectively. In the first case, modify the S3 bucket for your environment; in the second case, the results are saved to the local VM's `/tmp` directory and will vanish after the VM is terminated.
1. Submit your job, wait for the results to show up in your S3 bucket, and celebrate. You've run jobs on AWS Batch! ![Job definition](images/aws-batch-hurray.png)
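As a point of reference for the IAM step above, the S3 access attached to the instance role (ecsInstanceRole above) can be as small as the following. This is a hypothetical minimal policy, not one from the tutorial: `your-bucket` is a placeholder, and you may prefer AWS's managed read-only or full-access S3 policies instead.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject"],
      "Resource": [
        "arn:aws:s3:::your-bucket",
        "arn:aws:s3:::your-bucket/*"
      ]
    }
  ]
}
```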
# Azure Batch

[Azure Batch](https://azure.microsoft.com/en-us/services/batch/) provides a simple way of running containers and simple commands on Azure without having to closely manage the underlying VM infrastructure (although knowledge of that infrastructure will always be useful). While Azure Batch does not understand CWL the way a full workflow engine does, it provides a very simple way to run a large number of Dockstore tools at scale.

Azure Batch also provides a client-side tool called [Batch Shipyard](https://github.com/Azure/batch-shipyard), which offers a number of features including a simple command-line interface for submitting batch jobs.

Of course, keep in mind that if you know CWL and/or do not need the Dockstore command line to do file provisioning, you can decompose the tool's underlying command-line invocation and use that as the command for your jobs, gaining a bit of performance. This tutorial instead focuses on using cwltool and the Dockstore command line to provide an experience more akin to running Dockstore or cwltool [on the command-line](/docs/launch#dockstore-cli) out of the box.
1. Run through Batch Shipyard's [Linux Installation Guide](https://github.com/Azure/batch-shipyard/blob/master/docs/01-batch-shipyard-installation.md#step-2a-linux-run-the-installsh-script) and then the [Quickstart](https://github.com/Azure/batch-shipyard/blob/master/docs/02-batch-shipyard-quickstart.md) guide with one of the sample tools such as Torch-CPU.
1. With the shipyard CLI set up, get the md5sum sample recipes from GitHub:
    ```
    $ git clone https://github.com/dockstore/batch_wrapper.git
    $ cd batch_wrapper/azure/
    ```
1. Fill out your `config.json`, `credentials.json`, and `jobs.json` in `config.dockstore.md5sum`. If you have trouble finding your access keys, take a look at this [article](https://docs.microsoft.com/en-us/azure/batch/batch-account-create-portal#view-batch-account-properties). Note that in `jobs.json` we use AWS keys to provision and save the final output files. You will also need to modify the parameter JSON file `md5sum.s3.json` to reflect the location of your S3 bucket.
1. Create a compute pool. Note that this pool is not set up to automatically resize, and you may need to pick a larger VM size for a larger dataset.
    ```
    $ ./shipyard pool add --configdir config.dockstore.md5sum
    ```
1. Submit the job and watch the output (this should take roughly a minute if the pool already exists):
    ```
    $ ./shipyard jobs add --configdir config.dockstore.md5sum --tail stdout.txt
    2017-05-24 14:19:21.543 INFO - Adding job dockstorejob to pool dockstore
    2017-05-24 14:19:21.989 INFO - uploading file /tmp/tmp7lgz7_j7 as 'shipyardtaskrf-dockstorejob/dockertask-00012.shipyard.envlist'
    2017-05-24 14:19:22.027 DEBUG - submitting 1 tasks (0 -> 0) to job dockstorejob
    2017-05-24 14:19:22.090 INFO - submitted all 1 tasks to job dockstorejob
    2017-05-24 14:19:22.090 DEBUG - attempting to stream file stdout.txt from job=dockstorejob task=dockertask-00012
    Creating directories for run of Dockstore launcher at: ./datastore//launcher-e849c691-cc47-4bfa-a443-b8830794ae0a
    Provisioning your input files to your local machine
    Downloading: #input_file from https://raw.githubusercontent.com/briandoconnor/dockstore-tool-md5sum/master/md5sum.input into directory: /mnt/batch/tasks/workitems/dockstorejob/job-1/dockertask-00012/wd/./datastore/launcher-e849c691-cc47-4bfa-a443-b8830794ae0a/inputs/ce735ade-8c46-4736-a7d8-2fc0cb7d2e87
    [##################################################] 100%
    Calling out to cwltool to run your tool
    ...
    Final process status is success
    Saving copy of cwltool stdout to: /mnt/batch/tasks/workitems/dockstorejob/job-1/dockertask-00012/wd/./datastore/launcher-e849c691-cc47-4bfa-a443-b8830794ae0a/outputs/cwltool.stdout.txt
    Saving copy of cwltool stderr to: /mnt/batch/tasks/workitems/dockstorejob/job-1/dockertask-00012/wd/./datastore/launcher-e849c691-cc47-4bfa-a443-b8830794ae0a/outputs/cwltool.stderr.txt
    Provisioning your output files to their final destinations
    Uploading: #output_file from /mnt/batch/tasks/workitems/dockstorejob/job-1/dockertask-00012/wd/./datastore/launcher-e849c691-cc47-4bfa-a443-b8830794ae0a/outputs/md5sum.txt to : s3://dockstore.temp/md5sum.txt
    Calling on plugin io.dockstore.provision.S3Plugin$S3Provision to provision to s3://dockstore.temp/md5sum.txt
    [##################################################] 100%
    ```
1. You can repeat the process with `config.dockstore.bwa`, a more realistic bioinformatics workflow from the [PCAWG project](http://icgc.org/working-pancancer-data-aws) that takes roughly seven hours.
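For context, the md5sum demo tool's actual work is tiny. Run locally without any of the batch machinery, it boils down to something like the following sketch; the input contents here are made up, and the real tool runs inside a CWL-wrapped container rather than directly on your shell.

```shell
# Local equivalent of the md5sum demo tool's core step (illustrative only;
# the real tool runs inside the quay.io/briandoconnor/dockstore-tool-md5sum
# container under cwltool, with file provisioning handled by the launcher).
printf 'hello batch\n' > md5sum.input                 # stand-in for the provisioned input
md5sum md5sum.input | awk '{print $1}' > md5sum.txt   # write just the checksum
cat md5sum.txt
```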