In this final project, you have two options: you can work on the pre-prepared questions of the Standard Final Project (Option 1), or you can choose one of the five provided datasets and formulate your own data question and dashboards in Tableau (Option 2).

It's up to you!

Note

If you choose to work on your own project (Option 2) and feel lost after working on it for an hour or so (for example, you can't decide on a question based on the dataset, or you are finding it difficult to get inspired), we recommend switching to the Standard Final Project (Option 1) so that you finish the project on time.

Option 1: Standard Final Project

You will be working with data from the Canadian Open Data portal. Specifically, you will work with the following datasets, which are available for download in this folder:

  • Weekly earnings from 1.1.2001 to 15.4.2015 (weekly_earnings - CSV)
  • Housing constructions from 1955 to 2019 (real_estate_numbers - CSV)
  • House prices from 1.1.2005 to 1.9.2020 (real_estate_prices - EXCEL)
  • Housing_price_index from November 1979 to September 2020
  • Office_realestate_index from November 1979 to September 2020
  • Consumer index from November 1979 to September 2020

Tasks:

Use Tableau to answer the following questions, and present your results in a 5-minute PowerPoint or PDF presentation. Answer every question with an appropriate visualization:

  • Show the trend of house prices across Canada in the last 40 years (table housing_price_index).
  • Compare the trend after 2005 with actual benchmark prices in table real_estate_prices to see if there are any differences.
  • Compare this trend with the trend of office prices. Which one is getting more expensive faster?
  • Create a heatmap of Canada with current house prices for each available district.
  • Are the price differences between different districts increasing?
  • Compare the trend of house prices with earnings. Note: if you want to plot monthly salary, be aware that the earnings values are weekly.
  • Did people spend more of their earnings in 2014 than they did in 2001?
  • There were several economic crises in the world in the last 40 years, including these four: Black Monday (1987), the early-1990s recession, the dot-com bubble (2000-2002), and the financial crisis (2007-2009). Show the effect of these crises on:
    • Earnings
    • House prices
    • Office prices
    • House constructions
    • Consumer index
  • Plot consumer_index together with housing_price_index and fit a regression line between them. Can we predict consumer_index from housing_price_index?
  • Try to find an interesting pattern, trend, outlier, etc. from the data used in the above questions.
    • Hint: Double-check all units in the tables before making any comparison.
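Several of the questions above rest on simple arithmetic that is worth checking outside Tableau before you build the charts. The sketch below is optional and not part of the required deliverable. It does three things:

- It converts a weekly earnings value to an approximate monthly value, assuming 52 weeks spread evenly over 12 months.
- It measures the relative spread of district prices with the coefficient of variation. Comparing this value year by year is one possible way to ask whether district differences are increasing.
- It fits an ordinary least-squares line of consumer_index on housing_price_index and reports R², which helps judge how well one predicts the other.

The function names and toy numbers are hypothetical and are not taken from the datasets.

```python
from statistics import mean, pstdev

# Assumption: 52 weeks per year spread evenly over 12 months.
WEEKS_PER_MONTH = 52 / 12


def weekly_to_monthly(weekly_earnings: float) -> float:
    """Approximate monthly earnings from a weekly earnings value."""
    return weekly_earnings * WEEKS_PER_MONTH


def coefficient_of_variation(prices: list[float]) -> float:
    """Relative spread of district prices for one period (population std / mean)."""
    return pstdev(prices) / mean(prices)


def fit_line(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Ordinary least-squares fit y = intercept + slope * x; also returns R^2."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy values for illustration only; they are not taken from the datasets.
    print(round(weekly_to_monthly(900.0), 2))
    print(round(coefficient_of_variation([300_000.0, 450_000.0, 600_000.0]), 3))
    housing_index = [60.0, 70.0, 80.0, 90.0]
    consumer_index = [80.0, 85.0, 91.0, 95.0]
    print(fit_line(housing_index, consumer_index))
```

A high R² only shows that the two series move together linearly over the period. It does not by itself show that one drives the other.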

Option 2: Create Your Own Question and Dashboard

If you decide to go with Option 2, you can choose from the five datasets included in this folder. They are also all linked below.

I) FAA Wildlife Strikes, 2015:

This is a cleaned table of wildlife strikes in the United States from 2000 to 2015. For background, visit the FAA Wildlife Strike Database, which contains records of wildlife strikes reported by airlines, airports, pilots, and other sources. The dataset is available here as faa_data_subset.xlsx.

Follow the steps below:

  1. Connect your data.
  2. Detect the different data types in your data.
  3. Build at least 5 different visualizations to learn more about the dataset.
  4. There are three main categorical features in the table. Try to learn more about these categories and find appropriate numerical features to study different trends.
    • Effect
    • When
    • Wildlife
  5. In your final project, you should show visualizations with:
    • Maps
    • Date and time
    • Analytical visuals (forecasting, clustering)
    • Show Me tables
  6. Try to find an interesting pattern, trend, outlier, etc. in the data used in the above steps.
  7. From step 5, try to detect meaningful key points. This is the starting point for designing your dashboard.
  8. Now that you are familiar with your dataset and its columns, come up with different questions that you will answer and present at the end of this project.
  9. Create the dashboard to answer the questions you came up with in step 8, and revise your questions along the way.
  10. Get ready to present your dashboard.
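The final-project requirements above ask for analytical visuals such as clustering. Tableau's built-in clustering is based on k-means. The minimal sketch below shows the idea on two numeric features:

1. Assign each point to its nearest centroid.
2. Move each centroid to the mean of its assigned points.
3. Repeat until no point changes cluster.

This is a simplified illustration, not Tableau's implementation. It uses the first k points as starting centroids, does not rescale features, and does not pick k automatically. The toy points are hypothetical.

```python
import math

Point = tuple[float, float]


def kmeans(points: list[Point], k: int, max_iter: int = 100) -> tuple[list[int], list[Point]]:
    """Plain k-means (Lloyd's algorithm) on 2-D points.

    Returns the cluster label of each point and the final centroids.
    Starting centroids are simply the first k points (a simplifying assumption).
    """
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical 2-D points, e.g. two numeric features per record.
    pts = [(1.0, 1.0), (8.0, 8.0), (1.5, 1.2), (8.2, 7.9), (0.9, 1.1)]
    print(kmeans(pts, 2))
```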

II) FIFA2018 Player Ratings:

This is a cleaned dataset of more than 17k players from FIFA 2018, with 70 attributes extracted from the game. The dataset is available here as fif18_clean.csv.

Follow the steps below:

  1. Connect your data.
  2. Detect the different data types in your data. Which important data type is missing?
  3. Build at least 5 different visualizations to learn more about the dataset.
  4. This table has 70 different features. Can you detect the most important ones? Why do you think they are important? Can you define a question for yourself? Try to learn more about these features and find appropriate numerical features to study different trends in them.
    • For example, the following features could be important for your analysis:
      • Value
      • Overall
      • Acceleration
      • Balance
      • Ball Control
      • Finishing
      • GK reflexes
      • Reactions
      • Nationalities
  5. In your final project presentation, you should show visualizations with:
    • Maps
    • Analytical visuals (forecasting, clustering)
    • Show Me tables
  6. Try to find an interesting pattern, trend, outlier, etc. in the data used in the above steps.
  7. From step 5, try to detect meaningful key points. This is the starting point for designing your dashboard.
  8. Now that you are familiar with your dataset and its columns, come up with different questions that you will answer and present at the end of this project.
  9. Create the dashboard to answer the questions you came up with in step 8, and revise your questions along the way.
  10. Get ready to present your dashboard.
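One simple way to start looking for the most important features is to rank numeric attributes by how strongly they correlate with a target such as Overall. The sketch below computes the Pearson correlation in plain Python. The column names come from the example list above but may not match the exact headers in fif18_clean.csv, and the toy values are hypothetical. Correlation only measures linear association, so treat the ranking as a starting point rather than a verdict on importance.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length numeric columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_features(columns: dict[str, list[float]], target: str) -> list[tuple[str, float]]:
    """Rank every other numeric column by absolute correlation with the target column."""
    scores = [
        (name, pearson(values, columns[target]))
        for name, values in columns.items()
        if name != target
    ]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical toy columns; real values would come from fif18_clean.csv.
    toy = {
        "Overall": [60.0, 70.0, 80.0, 90.0],
        "Reactions": [55.0, 68.0, 79.0, 91.0],
        "Balance": [80.0, 60.0, 75.0, 65.0],
    }
    for name, r in rank_features(toy, "Overall"):
        print(f"{name}: {r:+.2f}")
```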

III) AIRBNB:

This table contains 30,478 Airbnb listings in New York City. This data was compiled from Inside Airbnb. The dataset is available here as airbnb.xlsx.

Follow the steps below:

  1. Connect your data.
  2. Detect the different data types in your data.
  3. Build at least 5 different visualizations to learn more about the dataset.
  4. The table contains several main features, listed below. Try to learn more about the categorical ones and find appropriate numerical features to study different trends in them.
    • Room Type
    • Beds
    • Price
    • Neighbourhood
    • Host Since
  5. In your final project presentation, you should show visualizations with:
    • Maps
    • Date and time
    • Analytical visuals (forecasting, clustering)
    • Show Me tables
  6. Try to find an interesting pattern, trend, outlier, etc. in the data used in the above steps.
  7. From step 5, try to detect meaningful key points. This is the starting point for designing your dashboard.
  8. Now that you are familiar with your dataset and its columns, come up with different questions that you will answer and present at the end of this project.
  9. Create the dashboard to answer the questions you came up with in step 8, and revise your questions along the way.
  10. Get ready to present your dashboard.
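Pairing a categorical feature with a numerical one usually means aggregating the number per category. The sketch below groups listings by a category such as Room Type and reports the median price per group. In Tableau, this corresponds to placing Room Type on Rows and MEDIAN(Price) on Columns. The field names and listings are hypothetical. The median is used here because a few very expensive listings can skew an average.

```python
from collections import defaultdict
from statistics import median


def median_by_category(rows: list[dict], category: str, value: str) -> dict[str, float]:
    """Median of a numeric field for each value of a categorical field."""
    groups: dict[str, list[float]] = defaultdict(list)
    for row in rows:
        groups[row[category]].append(float(row[value]))
    # Sort the categories so the output order is stable.
    return {key: median(values) for key, values in sorted(groups.items())}


if __name__ == "__main__":
    # Hypothetical listings; real rows would come from airbnb.xlsx.
    listings = [
        {"Room Type": "Entire home/apt", "Price": 200},
        {"Room Type": "Private room", "Price": 80},
        {"Room Type": "Entire home/apt", "Price": 150},
        {"Room Type": "Private room", "Price": 65},
        {"Room Type": "Shared room", "Price": 40},
    ]
    print(median_by_category(listings, "Room Type", "Price"))
```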

IV) Tuberculosis Burden by Country:

The World Health Organization estimates the prevalence and mortality of tuberculosis (TB) by country. The dataset is available here as TD_Burden_Country.csv.

Follow the steps below:

  1. Connect your data.
  2. Detect the different data types in your data.
  3. Build at least 5 different visualizations to learn more about the dataset.
  4. This table has 47 different features. Can you detect the most important ones? Why do you think they are important? Can you define a question for yourself? Try to learn more about these features and find appropriate numerical features to study different trends in them.
    • For example, the following features could be important for your analysis:
      • Country or territory name
      • Year
      • Estimated total population number
      • Estimated prevalence of TB (all forms) per 100 000 population
      • Estimated prevalence of TB (all forms)
      • Method to derive prevalence estimates
      • Estimated number of deaths from TB (all forms, excluding HIV)
      • Method to derive mortality estimates
      • Case detection rate (all forms), percent
  5. In your final project, you should show visualizations with:
    • Maps
    • Date and time
    • Analytical visuals (forecasting, clustering)
    • Show Me tables
  6. Try to find an interesting pattern, trend, outlier, etc. in the data used in the above steps.
  7. From step 5, try to detect meaningful key points. This is the starting point for designing your dashboard.
  8. Now that you are familiar with your dataset and its columns, come up with different questions that you will answer and present at the end of this project.
  9. Create the dashboard to answer the questions you came up with in step 8, and revise your questions along the way.
  10. Get ready to present your dashboard.
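Some of the TB features are rates per 100 000 population, while others, such as the estimated number of deaths, are absolute counts. Countries with large populations will dominate any comparison of raw counts. To compare countries of different sizes, convert a count to the same scale with rate = count / population × 100 000. The sketch below applies that formula. The helper name and the example numbers are hypothetical.

```python
def rate_per_100k(count: float, population: float) -> float:
    """Convert an absolute count into a rate per 100 000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count / population * 100_000


if __name__ == "__main__":
    # Made-up example: 1 200 deaths in a population of 4 000 000.
    print(rate_per_100k(1_200, 4_000_000))
```

In Tableau, the same conversion can be written as a calculated field that divides the deaths column by the population column and multiplies by 100 000.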

V) Causes of Death - Our World In Data:

The Global Burden of Disease is a major global study on the causes of death and disease, published in the medical journal The Lancet. This dataset contains its estimates of the annual number of deaths by cause. The dataset is available from the Kaggle website: Cause of Death - Our World In Data.

  1. Connect your data.
  2. Detect the different data types in your data.
  3. Build at least 5 different visualizations to learn more about the dataset.
  4. This table has 36 different features. Can you detect the most important ones? Why do you think they are important? Can you define a question for yourself? Try to learn more about these features and find appropriate numerical features to study different trends in them. In this dataset, every cause column is equally important, because each one describes a condition that leads to death. Your task is to identify the most important causes of death (features) across all countries as well as within each country.
  5. In your final project, you should show visualizations with:
    • Maps
    • Date and time
    • Analytical visuals (forecasting, clustering)
    • Show Me tables
  6. Try to find an interesting pattern, trend, outlier, etc. in the data used in the above steps.
  7. From step 5, try to detect meaningful key points. This is the starting point for designing your dashboard.
  8. Now that you are familiar with your dataset and its columns, come up with different questions that you will answer and present at the end of this project.
  9. Create the dashboard to answer the questions you came up with in step 8, and revise your questions along the way.
  10. Get ready to present your dashboard.
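The task asks for the most important causes of death both across all countries and within each country. The sketch below assumes the data is in wide format, with one row per country and year and one numeric column per cause. It sums deaths per cause over all years, first globally and then per country, and returns the top causes. The identifier column names, cause names, and sample rows are hypothetical.

```python
from collections import Counter, defaultdict

# Assumption: the wide table has these identifier columns and one column per cause.
ID_COLUMNS = {"Country", "Year"}


def top_causes(rows: list[dict], n: int = 3):
    """Return the n causes with the most deaths overall and within each country.

    Deaths are summed over all years present in `rows`.
    """
    overall: Counter = Counter()
    per_country: dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        for column, deaths in row.items():
            if column in ID_COLUMNS:
                continue  # skip identifiers; every other column is a cause
            overall[column] += deaths
            per_country[row["Country"]][column] += deaths
    by_country = {country: counts.most_common(n) for country, counts in per_country.items()}
    return overall.most_common(n), by_country


if __name__ == "__main__":
    # Hypothetical rows; real rows would come from the Kaggle dataset.
    sample = [
        {"Country": "A", "Year": 2018, "Malaria": 10, "Road injuries": 50, "Drowning": 5},
        {"Country": "A", "Year": 2019, "Malaria": 12, "Road injuries": 40, "Drowning": 6},
        {"Country": "B", "Year": 2019, "Malaria": 90, "Road injuries": 20, "Drowning": 1},
    ]
    print(top_causes(sample, n=2))
```

Because totals are summed over years, a ranking computed over a different year range can come out differently.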