nusdbsystem/cool-covid-webapp

COOL for COVID19 Demo App

Introduction

COOL is a cohort online analytical processing system. It is an integrated system that supports several newly proposed operators on top of a sophisticated storage layer, and it processes both cohort queries and conventional OLAP queries with superb performance.

For more information, you can refer to the paper.

In this project, COOL is applied to COVID-19 analysis. A demo video is also available here.

Set up

  • Docker is required to set up the application's dependencies and can be installed here.

Quick Start

You can get the application up and running in your local dev environment with these steps:

  • Change the current working directory to the project directory
  • Set up the Docker environment first by building and running the required Docker containers
sh docker.sh start
  • Restart the containers of the COOL system
sh docker.sh restart
  • Stop the containers of the COOL system
sh docker.sh stop
  • Clean the containers and images of the COOL system
sh docker.sh clean
  • Manually load the Docker images (if you are running offline). download
docker load --input cool-front.zip
docker load --input cool-backend.zip
  • Once the containers have been started, the application is running at http://127.0.0.1:8201/

  • For an example user login, use the following details:

User ID: admin
Password: zaq12wsx

Requirements for uploaded datasets:

  1. CSV file with "," as delimiter (typically dumped from a database table)

  2. Preferably includes id, time, event and value columns, appropriately named

  3. The event columns in the dataset should match the elements in the event column

  4. The time column should follow the "YYYY-MM-DD" format

  5. All of the demographic columns (such as age, gender, race) should correspond to the value attributes.

  6. Value columns only accept integer (Int32) values.

  7. Fill the columns of events with None

  8. Remove all the events with no value

  • Example dataset: here.
  • Example video: here.
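
Some of the requirements above can be checked before uploading. The following is a minimal sketch, not part of the application: the function names are hypothetical, and the column names id, time, event and value are assumed, since the requirements only ask that the columns be appropriately named. It checks the delimiter-based header, the "YYYY-MM-DD" time format and the Int32 range of the value column.

```python
import csv
import io
import re
from datetime import datetime

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
# Assumed column names; adjust them to match your own dataset.
REQUIRED_COLUMNS = ("id", "time", "event", "value")
DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
INT_PATTERN = re.compile(r"[+-]?[0-9]+")


def is_valid_date(text):
    """True if text is a real calendar date written as YYYY-MM-DD."""
    if not DATE_PATTERN.fullmatch(text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def is_int32(text):
    """True if text is a base-10 integer that fits in a signed 32-bit value."""
    return bool(INT_PATTERN.fullmatch(text)) and INT32_MIN <= int(text) <= INT32_MAX


def find_problems(csv_text, required=REQUIRED_COLUMNS):
    """Return a list of problems found in a comma-delimited dataset."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=",")
    header = reader.fieldnames or []
    missing = [column for column in required if column not in header]
    if missing:
        return ["missing columns: " + ", ".join(missing)]
    problems = []
    for row_no, row in enumerate(reader, start=1):  # counts data rows only
        if not is_valid_date(row["time"] or ""):
            problems.append(f"data row {row_no}: time is not YYYY-MM-DD")
        if not is_int32(row["value"] or ""):
            problems.append(f"data row {row_no}: value is not an Int32")
    return problems
```

To check a file, read it as text and pass the contents to find_problems; an empty list means none of these checks failed. The remaining requirements (for example, matching event columns to event names) depend on the dataset and are not covered here.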

Directory Descriptions

  • cool_backend & cool_front: These two directories contain the Docker settings for the COOL system.
  • cool_dashboard: The cool_dashboard directory contains the settings for the Django server.
  • dashboard: The dashboard directory contains the main Django application.

Dataset Preparation

Table.yaml

This section describes the schema of the dataset used in data compacting and query processing.

  • Example file: here.

For data compacting

When compacting data, "table.yaml" defines the exact schema of the dataset:

For column i of the dataset, the i-th entry of the schema file describes the metadata of the column.

Each entry has three attributes, i.e., name, fieldType and dataType.

  1. The name attribute is a unique string representing the respective column name.

  2. For the fieldType attribute, there are five possible values: "UserKey", "Action", "ActionTime", "Segment" and "Metric".

    • "UserKey", "Action" and "ActionTime" indicate that the respective column contains the user id, event and event time. These three columns jointly compose the primary key of the dataset, and must be present in the dataset.

    • "Segment" indicates that the respective column contains String values.

    • "Metric" indicates that the respective column contains Int32 values.

  3. The dataType attribute only has two possible values: String or Int32.

Note: ActionTime is treated as Int32, although it may follow a timestamp format.
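
The rules above can be expressed as a small consistency check. The sketch below is illustrative only: it represents each schema entry as a Python dict with the three attributes described above, and check_schema is a hypothetical helper, not part of COOL. It assumes that "treated as Int32" means an ActionTime entry declares dataType Int32. The on-disk layout of "table.yaml" is not shown here; see the example file for that.

```python
FIELD_TYPES = {"UserKey", "Action", "ActionTime", "Segment", "Metric"}
DATA_TYPES = {"String", "Int32"}
KEY_FIELD_TYPES = ("UserKey", "Action", "ActionTime")
# Data types implied by the field type, as described above.
# The ActionTime mapping is an assumption based on the note about Int32.
EXPECTED_DATA_TYPE = {"Segment": "String", "Metric": "Int32", "ActionTime": "Int32"}


def check_schema(entries):
    """Return a list of rule violations for a list of schema entries."""
    errors = []
    seen_names = set()
    for index, entry in enumerate(entries):
        name = entry.get("name")
        field_type = entry.get("fieldType")
        data_type = entry.get("dataType")
        label = f"entry {index} ({name!r})"
        if not isinstance(name, str) or not name:
            errors.append(f"{label}: name must be a non-empty string")
        elif name in seen_names:
            errors.append(f"{label}: duplicate name")
        else:
            seen_names.add(name)
        if field_type not in FIELD_TYPES:
            errors.append(f"{label}: unknown fieldType {field_type!r}")
        if data_type not in DATA_TYPES:
            errors.append(f"{label}: unknown dataType {data_type!r}")
        expected = EXPECTED_DATA_TYPE.get(field_type)
        if expected and data_type in DATA_TYPES and data_type != expected:
            errors.append(f"{label}: {field_type} expects dataType {expected}")
    present = {entry.get("fieldType") for entry in entries}
    for key in KEY_FIELD_TYPES:
        if key not in present:
            errors.append(f"missing required {key} column")
    return errors


# Hypothetical schema for a dataset with columns id, event, time, gender, age.
example_schema = [
    {"name": "id", "fieldType": "UserKey", "dataType": "String"},
    {"name": "event", "fieldType": "Action", "dataType": "String"},
    {"name": "time", "fieldType": "ActionTime", "dataType": "Int32"},
    {"name": "gender", "fieldType": "Segment", "dataType": "String"},
    {"name": "age", "fieldType": "Metric", "dataType": "Int32"},
]
```

The order of the entries matters, because entry i describes column i of the dataset.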

For query processing

Users can add more entries (used as cohort selection attributes in the query processing) to "table.yaml".

Each such entry defines an aggregate function and therefore has two additional attributes, "baseField" and "aggregator", apart from those defined in the original schema file.

  • "name" is specified by the user and should be different from the names of the other entries.

  • "fieldType" and "dataType" of such entries are fixed and take the values Metric and Aggregate, respectively.

  • "baseField" indicates which column of the original schema will be aggregated and takes the name attribute of that column as its value.

  • "aggregator" indicates the aggregate function to apply. For now, it can be RETENTION. More aggregate functions are being developed.
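
As an illustration of the attributes above, the following sketch builds one such entry as a Python dict and enforces the stated constraints. make_aggregate_entry and the sample schema are hypothetical and only mirror the description in this section; they are not part of COOL.

```python
# Only RETENTION is mentioned as available for now.
SUPPORTED_AGGREGATORS = {"RETENTION"}


def make_aggregate_entry(schema, name, base_field, aggregator="RETENTION"):
    """Build an aggregate entry for the given list of schema entries."""
    all_names = {entry["name"] for entry in schema}
    # Columns of the original schema are the entries without a baseField.
    column_names = {entry["name"] for entry in schema if "baseField" not in entry}
    if name in all_names:
        raise ValueError(f"name {name!r} is already used by another entry")
    if base_field not in column_names:
        raise ValueError(f"baseField {base_field!r} is not a column of the schema")
    if aggregator not in SUPPORTED_AGGREGATORS:
        raise ValueError(f"unsupported aggregator {aggregator!r}")
    return {
        "name": name,
        "fieldType": "Metric",    # fixed for aggregate entries
        "dataType": "Aggregate",  # fixed for aggregate entries
        "baseField": base_field,
        "aggregator": aggregator,
    }


# Hypothetical original schema with the three key columns.
schema = [
    {"name": "id", "fieldType": "UserKey", "dataType": "String"},
    {"name": "event", "fieldType": "Action", "dataType": "String"},
    {"name": "time", "fieldType": "ActionTime", "dataType": "Int32"},
]
retention_entry = make_aggregate_entry(schema, "retention", "id")
```

Appending retention_entry to the schema list corresponds to adding one more entry to "table.yaml".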

Cube.yaml

cube.yaml has two parts: dimensions and measures.

For now, dimensions is not used and can be omitted.

Each entry in the measures part contains three attributes: "aggregator", "name" and "tableFieldName".

  • "aggregator" and "name" have the same meaning as in the schema file table.yaml

  • "tableFieldName" is the same as the baseField attribute of the schema file. The entries of the measures part provide the metrics that can be specified in cohort queries.

  • Example file: here.
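
Because the three attributes of a measure mirror the aggregate entries of the schema file, the measures part can be derived from them. The sketch below shows that mapping with Python dicts. measures_from_schema and the sample entries are hypothetical, and whether measure names must match the schema names is an assumption; consult the example file for the exact YAML layout.

```python
def measures_from_schema(schema):
    """Map aggregate entries of the schema to cube measure entries."""
    return [
        {
            "aggregator": entry["aggregator"],
            "name": entry["name"],
            "tableFieldName": entry["baseField"],  # same as baseField
        }
        for entry in schema
        if "aggregator" in entry  # plain column entries have no aggregator
    ]


# Hypothetical schema: three key columns plus one aggregate entry.
schema = [
    {"name": "id", "fieldType": "UserKey", "dataType": "String"},
    {"name": "event", "fieldType": "Action", "dataType": "String"},
    {"name": "time", "fieldType": "ActionTime", "dataType": "Int32"},
    {
        "name": "retention",
        "fieldType": "Metric",
        "dataType": "Aggregate",
        "baseField": "id",
        "aggregator": "RETENTION",
    },
]

# The dimensions part is omitted because it is not used for now.
cube = {"measures": measures_from_schema(schema)}
```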

Literature References

Contact
