MinhPhan8803/simple-map-reduce

CS 425 MP 4

Simple Map Reduce by Minh Phan (minhnp2) and Shivam Pankaj Kumar (shivamk4)

Getting Started

This project is written in Rust. Follow these steps to run it:

  1. Install Rust by running the following command, then follow the on-screen instructions:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  1. Then clone the project:
git clone https://gitlab.engr.illinois.edu/shivamk4/cs-425-mp-4.git  
  1. On every machine, build and start a node:
    cd cs-425-mp-4/sdfs
    cargo build --release
    cargo run --release

An input prompt will appear, where you can enter your commands/requests.

Note: The scripts used for map and reduce operations must be Python scripts.

List of available commands:

  1. Listing the node's membership list (stored as IP addresses):
    list_mem
  1. Listing the node's own IP address:
    list_self
  1. Leaving the system:
    leave
  1. Uploading (PUT) a file to the filesystem:
    put <local_file_path> <remote_file_name>

Example:

    put /home/tmp/local_file.dat remote_file.dat
  1. Downloading (GET) a file from the filesystem:
    get <remote_file_name> <local_file_path>

Example:

    get remote_file.dat /home/tmp/local_file.dat
  1. Listing nodes storing a particular file:
    ls <remote_file_name>

Example:

    ls remote_file.dat
  1. Listing files stored by this particular node:
    store
  1. Initiating a GET of the same SDFS file from multiple nodes (multi-read):
    multiread <remote_file_name> <local_file_path> <ip_1> <ip_2> <ip_3> ..

The IPs are your nodes' IP addresses. You can list as many IPs as you like. Example:

    multiread remote_file.dat /home/tmp/local_file.dat 127.0.0.1 128.0.0.1 129.0.0.1 130.0.0.1
  1. Perform a map operation:
    maple <local_python_script_path> <num_tasks> <output_prefix> <remote_source_directory> <executable argument 1> <executable argument 2> ..

You can add as many executable arguments as you want. The following example puts a dataset onto the filesystem, then performs a regex search:

    put dataset.csv dataset.csv
    maple /home/scripts/regex_search_map.py 7 regex dataset \w*
  1. Perform a reduce operation:
    juice <local_python_script_path> <num_tasks> <input_prefix> <output_file_name> <true|false>

For the last argument, enter true or false to indicate whether the input files should be deleted. The following example follows up on the previous one:

    juice /home/scripts/regex_search_reduce.py 7 regex search_output.txt true
  1. Perform a SQL-style filter using regex:
    SELECT ALL FROM <dataset_directory> WHERE <regex>

The map and reduce examples above can be shortened to:

    SELECT ALL FROM dataset WHERE \w*

Note that you don't need to provide an executable or wrap the regex string in quotes. The output file name will be dataset_filter.

  1. Perform a SQL-style join:
    SELECT ALL FROM <dataset_1_directory> <dataset_2_directory> WHERE <d1_field> = <d2_field>

There must be spaces around =. The following example uploads two datasets to the filesystem, then performs a join:

    put cars.csv cars.csv
    put trucks.csv trucks.csv
    SELECT ALL FROM cars trucks WHERE cars.price = trucks.price

The output file name will be cars_trucks_join.
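
To make the filter and join commands above more concrete, here is a minimal Python sketch of how such queries can be expressed as map and reduce steps. It is not taken from this project's source: the function names (`filter_map`, `join_map`, `join_reduce`), the assumption that each CSV has a header row, and the sample data are all hypothetical, and the way the system actually passes input to scripts is not shown here.

```python
import csv
import io
import re
from collections import defaultdict


def filter_map(lines, pattern):
    """Map step for a filter: keep every line the regex matches."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


def join_map(table, csv_text, field):
    """Map step for a join: emit (join key, (table name, row)) pairs.

    Assumes the CSV text starts with a header row (hypothetical).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row[field], (table, row)) for row in reader]


def join_reduce(pairs, left, right):
    """Reduce step for a join: pair rows of both tables that share a key."""
    groups = defaultdict(lambda: {left: [], right: []})
    for key, (table, row) in pairs:
        groups[key][table].append(row)
    joined = []
    for rows in groups.values():
        for l_row in rows[left]:
            for r_row in rows[right]:
                joined.append((l_row, r_row))
    return joined


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    cars = "model,price\nA,100\nB,200\n"
    trucks = "model,price\nX,200\nY,300\n"

    # Roughly: SELECT ALL FROM cars WHERE B,\d+
    print(filter_map(cars.splitlines(), r"B,\d+"))

    # Roughly: SELECT ALL FROM cars trucks WHERE cars.price = trucks.price
    pairs = join_map("cars", cars, "price") + join_map("trucks", trucks, "price")
    for car, truck in join_reduce(pairs, "cars", "trucks"):
        print(car["model"], truck["model"], car["price"])
```

In this sketch the map step tags each row with its join key so that the reduce step only needs to combine rows that landed in the same group, which is one common way to run an equality join over partitioned data.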
