Warning: This setup process is highly complex and has a lot of moving pieces.
We all used our student AWS developer accounts to test and deploy it. If you want to deploy the CloudFormation stack, you'll need to create an account with Serverless and configure your AWS credentials.
To run Aerocene locally, you'll need DynamoDB installed: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBLocal.html
To deploy the Aerocene CloudFormation stack to AWS, you'll need to install Docker, which requires creating a free account.
If you don't have a virtual environment tool installed, install virtualenv (for example, with pip install virtualenv).
Create a virtual environment with Python 3
virtualenv aerocene --python=python3;
cd aerocene;
source bin/activate;
git clone <repo_url> aerocene;
cd aerocene;
(Your current directory should now be aerocene/aerocene/.)
You'll need both the Python requirements and the JavaScript requirements for Serverless. The Python requirements can be installed via pip, and the JavaScript requirements via npm.
Install the serverless framework globally (to be able to use the CLI)
npm install -g serverless
Log in to Serverless and configure your AWS credentials
sls login
Install python requirements
pip install -r requirements.txt
Install javascript dependencies
npm install
Install the DynamoDB plugin for Serverless
sls dynamodb install
Open settings.py and set
DEBUG = True
Run
sls dynamodb start --migrate
to start a local DynamoDB server.
Open a new terminal and run
sls wsgi serve
to run the Lambda server. (If you get a 'SSL: CERTIFICATE_VERIFY_FAILED' error, try running 'pip install certifi' and then '/Applications/Python\ 3.6/Install\ Certificates.command' in your terminal. That path assumes a python.org install of Python 3.6 on macOS.)
Open a browser to http://localhost:5000. You should receive a successful response.
Navigate to http://localhost:5000/scrape_instagram. You should receive a JSON response containing Instagram data.
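The two browser checks above can also be scripted. The following is a minimal sketch, assuming the server started by sls wsgi serve is listening on http://localhost:5000 and that a "successful response" means HTTP 200. The names check_root, check_scrape, and the fetch parameter are hypothetical and are not part of the Aerocene repository.

```python
import json
from urllib.request import urlopen

# Local address used in the steps above.
BASE_URL = "http://localhost:5000"


def check_root(fetch=urlopen, base_url=BASE_URL):
    """Return True if the root URL answers with HTTP 200 (assumed success code)."""
    with fetch(base_url) as response:
        return response.status == 200


def check_scrape(fetch=urlopen, base_url=BASE_URL):
    """Fetch /scrape_instagram and return the parsed JSON body.

    Raises ValueError (json.JSONDecodeError) if the body is not valid JSON.
    """
    with fetch(base_url + "/scrape_instagram") as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling check_root() and check_scrape() with no arguments sends real requests to the local server, so both the DynamoDB and wsgi terminals need to be running first. The fetch parameter exists only so the functions can be exercised without a live server.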
Trial 0 scrapes Instagram by sending requests from your machine. Start with 10 pages of size 10.
time python trial0.py <pages> <page_size>
Trial 1 (in development mode) scrapes Instagram by sending requests through the locally running Aerocene.
time python trial1.py <pages> <page_size>
Trial 2 tests the CloudFormation. It requires a production build, which we'll do later.
Trial 3 tests the adversarial server. You need to have the server and DynamoDB running locally for it to work:
sls dynamodb start --migrate
Separate terminal:
sls wsgi serve
Separate terminal:
time python trial3.py <endpoint>
Trial 4 simulates an IP-rotation strategy against the adversarial server.
time python trial4.py <endpoint>
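To illustrate the general idea behind an IP-rotation trial, here is a toy simulation. It is not the code in trial4.py and makes no claim about how the adversarial server actually behaves. It assumes a server that blocks each address after a fixed number of requests, and a client that cycles round-robin through a pool of addresses. RateLimitedServer and count_successes are hypothetical names.

```python
from collections import Counter
from itertools import cycle


class RateLimitedServer:
    """Toy stand-in for an adversarial server (assumption): an address is
    refused once it has sent more than `limit` requests."""

    def __init__(self, limit):
        self.limit = limit
        self.seen = Counter()

    def handle(self, ip):
        """Record one request from `ip`; return True if it is still allowed."""
        self.seen[ip] += 1
        return self.seen[ip] <= self.limit


def count_successes(server, ips, total_requests):
    """Send `total_requests` requests, rotating round-robin through `ips`,
    and return how many the server accepted."""
    rotation = cycle(ips)
    return sum(server.handle(next(rotation)) for _ in range(total_requests))
```

With a limit of 5 requests per address, a single address gets 5 of 20 requests through, while a pool of four addresses gets all 20 through, because each address stays at or under the limit.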
Note: deploying won't work if you don't have an AWS account with your credentials set up, or if you aren't logged in to Serverless.
Open settings.py and set DEBUG = False. If you don't do this, your production instance will try to query localhost, which won't work.
Use serverless to deploy to AWS
sls deploy --stage production
The deploy should take a couple of minutes. When it finishes, it prints the URL where your application is hosted. Copy and paste that URL into PRODUCTION_URL in settings.py.
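settings.py is not reproduced here, so the following is only a sketch of how a DEBUG flag and PRODUCTION_URL could together select the address that requests are sent to. The function name target_url, the LOCAL_URL constant, and the placeholder production URL are assumptions made for illustration.

```python
# Hypothetical stand-ins for the values in settings.py.
DEBUG = False
PRODUCTION_URL = "https://example.invalid/production"  # placeholder: paste the deploy URL here

# Local address used during development (from the local setup steps).
LOCAL_URL = "http://localhost:5000"


def target_url(debug=DEBUG, production_url=PRODUCTION_URL):
    """Return the base URL to query: localhost in debug mode, else production."""
    return LOCAL_URL if debug else production_url
```

This shows why leaving DEBUG = True in a production build is a problem: every request would still be addressed to localhost, which does not exist inside the deployed Lambda environment.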
If you visit the AWS console, you should be able to see the Lambda functions and DynamoDB tables that were created, along with all of the necessary connections between them. If you visit the URL returned by the deploy command, you should see that your Aerocene deployment is live.
Trials 1 and 2 can interact with the production build.
Trial 1 scrapes Instagram through AWS Lambda.
time python trial1.py <pages> <page_size>
Trial 2 scrapes a number of pages using the Aerocene CloudFormation, creating a scrape record and then periodically querying to see if it is finished.
time python trial2.py <pages> <page_size>
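The "create a record, then query periodically until it is finished" pattern that Trial 2 describes can be sketched as a small polling loop. This is not the code in trial2.py. The function name wait_for_scrape, the "finished" key in the status dictionary, and the interval and timeout defaults are all assumptions. get_status stands for whatever call returns the current state of the scrape record.

```python
import time


def wait_for_scrape(get_status, interval_seconds=2.0, timeout_seconds=60.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until it reports completion or the timeout passes.

    get_status must return a dict; a truthy "finished" value (assumed key)
    means the scrape is done. Returns the final status dict.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status.get("finished"):
            return status
        if clock() >= deadline:
            raise TimeoutError("scrape did not finish before the timeout")
        # Wait before asking again so the endpoint is not hammered.
        sleep(interval_seconds)
```

The sleep and clock parameters default to the standard library functions. They are injectable only so the loop can be tested without actually waiting.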