CS 425 MP 4

Simple Map Reduce by Minh Phan (minhnp2) and Shivam Pankaj Kumar (shivamk4)

Getting Started

This project is written in Rust. Follow these steps to run it:

To install Rust, run the following command and follow the on-screen instructions:

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Then clone the project:

    git clone https://gitlab.engr.illinois.edu/shivamk4/cs-425-mp-4.git

Then, on every machine, build and start a node:

    cd cs-425-mp-4/sdfs
    cargo build --release
    cargo run --release

A command prompt will appear where you can enter the commands listed below.

Note: The scripts used for map and reduce operations must be Python scripts.

List of available commands:

Listing the node's membership list (stored as IP addresses):

    list_mem

Listing the node's own IP address:

    list_self

Leaving the system:

    leave

PUT'ing a file onto the filesystem:

    put <local_file_path> <remote_file_name>

Example:

    put /home/tmp/local_file.dat remote_file.dat

GET'ing a file from the filesystem:

    get <remote_file_name> <local_file_path>

Example:

    get remote_file.dat /home/tmp/local_file.dat

Listing nodes storing a particular file:

    ls <remote_file_name>

Example:

    ls remote_file.dat

Listing files stored by this particular node:

    store

Initiating a GET of the same SDFS file from multiple nodes (multi-read):

    multiread <remote_file_name> <local_file_path> <ip_1> <ip_2> <ip_3> ..

The IPs are the addresses of the nodes that should perform the read, and you can list as many as you like. Example:

    multiread remote_file.dat /home/tmp/local_file.dat 127.0.0.1 128.0.0.1 129.0.0.1 130.0.0.1

Perform a map operation:

    maple <local_python_script_path> <num_tasks> <output_prefix> <remote_source_directory> <executable argument 1> <executable argument 2> ..

You can pass as many executable arguments as you want. The following example puts a dataset onto the filesystem, then performs a regex search:

    put dataset.csv dataset.csv
    maple /home/scripts/regex_search_map.py 7 regex dataset \w*
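The <num_tasks> argument to maple sets how many map tasks share the work. As an illustration only, the hypothetical split_into_tasks() helper below divides input lines into that many near-equal contiguous chunks; the project's actual strategy for splitting the files under <remote_source_directory> is not documented here.

```python
from typing import List, Sequence


def split_into_tasks(lines: Sequence[str], num_tasks: int) -> List[List[str]]:
    """Divide input lines into at most num_tasks contiguous, near-equal chunks.

    Hypothetical helper: it illustrates one way <num_tasks> map tasks could
    share the input, not necessarily how this project does it.
    """
    if num_tasks < 1:
        raise ValueError("num_tasks must be at least 1")
    base, extra = divmod(len(lines), num_tasks)
    chunks: List[List[str]] = []
    start = 0
    for i in range(num_tasks):
        # The first `extra` chunks each take one additional line.
        size = base + (1 if i < extra else 0)
        if size == 0:
            # Fewer lines than tasks: the remaining tasks would be empty.
            break
        chunks.append(list(lines[start:start + size]))
        start += size
    return chunks
```

With 10 input lines and 3 tasks this yields chunks of 4, 3 and 3 lines; with fewer lines than tasks, some tasks simply receive no work.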

Perform a reduce operation:

    juice <local_python_script_path> <num_tasks> <input_prefix> <output_file_name> <true|false>

The last argument, true or false, specifies whether to delete the input files. The following example continues the previous one:

    juice /home/scripts/regex_search_reduce.py 7 regex search_output.txt true
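The <num_tasks> argument to juice controls how many reduce tasks run. A common way for a MapReduce framework to decide which task receives which key is hash partitioning; the Python sketch below shows that idea with a hypothetical partition() helper. It is an assumption for illustration, not this project's documented behavior.

```python
import zlib
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple


def partition(
    pairs: Iterable[Tuple[str, str]], num_tasks: int
) -> Dict[int, Dict[str, List[str]]]:
    """Group map output by key and assign each key to one of num_tasks tasks.

    Hypothetical: hash partitioning is a common MapReduce choice, not
    necessarily the one used by this project's juice phase.
    """
    tasks: DefaultDict[int, DefaultDict[str, List[str]]] = defaultdict(
        lambda: defaultdict(list)
    )
    for key, value in pairs:
        # crc32 gives the same result on every run and machine, unlike
        # Python's built-in hash(), which is randomized per process for str.
        task_id = zlib.crc32(key.encode("utf-8")) % num_tasks
        tasks[task_id][key].append(value)
    # Convert to plain dicts: task id -> {key: [values...]}.
    return {task_id: dict(groups) for task_id, groups in tasks.items()}
```

Every value for a given key ends up in the same task, which is what lets a reduce script see all of a key's values at once. A stable hash matters here because all nodes must agree on the assignment.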

Perform an SQL-style filter using a regex:

    SELECT ALL FROM <dataset_directory> WHERE <regex>

The map and reduce examples above can be shortened to:

    SELECT ALL FROM dataset WHERE \w*

Note that you don't need to provide an executable, and you don't need to wrap the regex in quotes. The output file will be named dataset_filter.
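For intuition about what the filter query selects, here is a single-process Python sketch of its semantics, assuming a line is kept when the regex matches anywhere in it (re.search style). The regex_filter() name is hypothetical, and the project's own regex engine and matching mode may differ.

```python
import re
from typing import Iterable, List


def regex_filter(lines: Iterable[str], pattern: str) -> List[str]:
    """Keep every line containing a match for pattern.

    Sketch of SELECT ALL FROM <dataset> WHERE <regex>, assuming a match
    anywhere in the line counts; not the project's actual implementation.
    """
    regex = re.compile(pattern)
    # A Match object is truthy even when the match is empty.
    return [line for line in lines if regex.search(line)]
```

Because `\w*` can match the empty string, it matches every line, so under this assumption the example query above would return the entire dataset. A pattern such as `^car` keeps only lines that start with "car".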

Perform an SQL-style join on two datasets:

    SELECT ALL FROM <dataset_1_directory> <dataset_2_directory> WHERE <d1_field> = <d2_field>

There must be spaces around the =. The following example uploads two datasets to the filesystem, then joins them:

    put cars.csv cars.csv
    put trucks.csv trucks.csv
    SELECT ALL FROM cars trucks WHERE cars.price = trucks.price

The output file will be named cars_trucks_join.
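To illustrate what the join condition selects, the Python sketch below performs an in-memory hash join. It assumes CSV input with a header row and prefixes output columns with the dataset names (for example cars.price). The equi_join() name and output layout are hypothetical; the project runs its join as a distributed map/reduce job, which this sketch does not reproduce.

```python
import csv
import io
from collections import defaultdict
from typing import DefaultDict, Dict, List


def equi_join(
    left_name: str, left_csv: str, left_field: str,
    right_name: str, right_csv: str, right_field: str,
) -> List[Dict[str, str]]:
    """Return one merged row per pair of rows whose join fields are equal.

    Sketch of WHERE <d1_field> = <d2_field>, assuming CSV text with headers.
    """
    # Build phase: index the right dataset by its join field.
    index: DefaultDict[str, List[Dict[str, str]]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(right_csv)):
        index[row[right_field]].append(row)

    # Probe phase: look up each left row's join value in the index.
    joined: List[Dict[str, str]] = []
    for left_row in csv.DictReader(io.StringIO(left_csv)):
        # .get() does not insert missing keys into the defaultdict.
        for right_row in index.get(left_row[left_field], []):
            merged = {f"{left_name}.{k}": v for k, v in left_row.items()}
            merged.update({f"{right_name}.{k}": v for k, v in right_row.items()})
            joined.append(merged)
    return joined
```

For example, `equi_join("cars", cars_text, "price", "trucks", trucks_text, "price")` pairs every car with every truck of the same price. Rows with no partner are dropped, as in an SQL inner join.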
