DataBricks scripts for data ingestion, wrangling, processing, and basic visualizations.

NRCan/DataHub-Databricks

DataHub-Databricks

The NRCan DataHub uses Databricks in its data projects for data ingestion, wrangling, processing, and basic visualizations.

This repository contains sample projects, sample code, and demos.

Sample Projects

  • CITSM - Databricks ingests publications related to NRCan, together with their citation data, from the Elsevier API and structures the results into Hive tables. Power BI connects to these tables as its data source and summarizes the results.
  • Departmental Resource Framework - Ingests project tracker data from an Excel file and loads it into a Hive table (Delta table) to be consumed by Power BI for analysis.
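The Excel-to-Delta flow described for the Departmental Resource Framework could look roughly like the sketch below. It is not the project's actual notebook code: the function names, the path, and the table name are hypothetical, and it assumes a Databricks notebook where `spark` is already defined and where pandas (with `openpyxl` for `.xlsx` files) is installed. Column headers are cleaned first because Delta tables reject spaces and a few other characters in column names unless column mapping is enabled.

```python
import re

# Characters that Delta/Parquet reject in column names by default.
_INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]+")


def safe_column_name(name) -> str:
    """Turn a spreadsheet header into a lowercase, underscore-separated name."""
    cleaned = _INVALID_CHARS.sub("_", str(name).strip())
    return cleaned.strip("_").lower()


def load_excel_to_delta(spark, excel_path: str, table_name: str, sheet_name=0) -> None:
    """Read one worksheet with pandas and overwrite a Delta table with it.

    `spark` is the SparkSession that Databricks notebooks provide;
    `excel_path` and `table_name` are placeholders chosen by the caller.
    """
    import pandas as pd  # imported lazily so safe_column_name works without pandas

    pdf = pd.read_excel(excel_path, sheet_name=sheet_name)
    pdf.columns = [safe_column_name(col) for col in pdf.columns]

    (
        spark.createDataFrame(pdf)   # pandas DataFrame -> Spark DataFrame
        .write.format("delta")
        .mode("overwrite")           # replace the table contents on each run
        .saveAsTable(table_name)     # registers the table in the metastore
    )
```

In a notebook this might be called as `load_excel_to_delta(spark, "/dbfs/path/to/tracker.xlsx", "project_tracker")`, where both arguments are made-up examples. Power BI can then read the resulting table through its Databricks connector.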

Sample Code

  • Load GeoChem Data.ipynb - Contains sample code for extracting and loading data from multiple worksheets in an Excel spreadsheet using Python (pandas).
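As a rough illustration of the multi-worksheet technique (not the notebook's own code), pandas can return every worksheet at once when `sheet_name=None` is passed to `read_excel`. The helper names and the `source_sheet` column below are assumptions made for this sketch, and it presumes the worksheets share a compatible column layout.

```python
import pandas as pd


def combine_sheets(sheets: dict) -> pd.DataFrame:
    """Stack worksheets into one DataFrame, tagging each row with its sheet name."""
    frames = [df.assign(source_sheet=name) for name, df in sheets.items()]
    return pd.concat(frames, ignore_index=True)


def load_all_sheets(path: str) -> pd.DataFrame:
    """Read every worksheet in an Excel workbook and combine them."""
    # sheet_name=None makes pandas return {sheet name: DataFrame} for all sheets.
    sheets = pd.read_excel(path, sheet_name=None)
    return combine_sheets(sheets)
```

Keeping the sheet name as a column makes it possible to trace each row back to its worksheet after the data has been combined.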

Demo

  • Contains code snippets used by Datahub to demo Databricks and its capabilities
