Merge pull request #30 from InfuseAI/feature/upgrade-dbt-16
[Feature] Bump to dbt 1.6 and support new metric
wcchang1115 authored Sep 7, 2023
2 parents fd5aa1a + 1f75528 commit c04e4a0
Showing 18 changed files with 74 additions and 247 deletions.
7 changes: 0 additions & 7 deletions .piperider/config.yml
@@ -1,13 +1,6 @@
 dataSources: []
 dbt:
   projectDir: .
   tag: 'piperider'
-
-profiler:
-  table:
-    # the maximum row count to profile. (Default unlimited)
-    # limit: 1000000
-    duplicateRows: true
-
 telemetry:
   id: f3373c578173414fb4af8574d1a9725f
9 changes: 2 additions & 7 deletions Makefile
@@ -1,14 +1,10 @@
 all: fetch load transform report piperider

-fetch_gh:
-	@python -m git_repo_analytics.fetch_github_api
-	@echo
-
 clone_repos:
 	@python -m git_repo_analytics.clone_repos
 	@echo

-fetch: clone_repos fetch_gh
+fetch: clone_repos

 load:
 	@python -m git_repo_analytics.load
@@ -28,6 +24,5 @@ piperider:
 	@piperider run -o data/piperider --open
 	@echo

-
 clean:
-	@rm -rf data
+	@rm -rf data
7 changes: 2 additions & 5 deletions README.md
@@ -29,9 +29,7 @@ This is a demo project for [PipeRider](https://github.com/InfuseAI/piperider). I
 make load
 ```

-The file `git_repo.duckdb` is generated
-
-> Note: If rate limit exceeded, you can get a higher rate limit with authenticated requests to get a higher rate limit. <br/> You could set the `GITHUB_TOKEN` environment variable with your token value. <br/> `export GITHUB_TOKEN=XXXX`<br/> Check out the [GitHub documentation](https://docs.github.com/rest/overview/resources-in-the-rest-api#rate-limiting) for more details.
+The file `git_repo.duckdb` is generated under `./data`

 1. Run dbt
 ```
@@ -53,5 +51,4 @@ make

 # Screenshots

-![](assets/screenshot3.png)
-![](assets/screenshot4.png)
+![](assets/screenshot5.png)
Binary file added assets/screenshot5.png
3 changes: 0 additions & 3 deletions dbt_project.yml
@@ -31,9 +31,6 @@ models:
git_repo_analytics:
staging:
+tags: [piperider]
intermediate:
+materialized: table
+tags: [piperider]
marts:
+materialized: table
+tags: [piperider]
71 changes: 0 additions & 71 deletions git_repo_analytics/fetch_github_api.py

This file was deleted.

23 changes: 6 additions & 17 deletions git_repo_analytics/gen_report.py
@@ -16,39 +16,28 @@ def generate_table():
                {'selector': 'td',
                 'props': [('padding', '5px')]}]

-        df = conn.query('SELECT * FROM repos').fetchdf()
+        df = conn.query(
+            'SELECT count(distinct(author)) as contributors, count(distinct(hash)) as commits '
+            'FROM stg_commits '
+            'GROUP BY repo'
+        ).fetchdf()
         df = df.fillna('')
         df = df.replace({pd.NaT: None})
         styled_table = df.style.set_table_styles(css)

-        styled_table = styled_table.format({'repo': lambda url: f'<a href="https://github.com/{url}">{url}</a>',
-                                            'homepage': lambda url: f'<a href="{url}">{url}</a>'})
+        styled_table = styled_table.format({'repo': lambda url: f'<a href="https://github.com/{url}">{url}</a>'})
         html_table = styled_table.to_html(index=False)
         return html_table


-def generate_chart(path):
-    with duckdb.connect(database='data/git_repo.duckdb') as conn:
-        df = conn.query('SELECT * FROM commit_weekly').fetchdf()
-        fig, ax = plt.subplots()
-
-        for key, grp in df.groupby('repo'):
-            ax = grp.plot(ax=ax, kind='line', x='date_week', y='total_commits', label=key)
-        plt.legend(loc='best')
-        plt.savefig(path)
-
-
 def generate():
     html_table = generate_table()
     os.makedirs('data/report', exist_ok=True)
-    generate_chart('data/report/weekly_commits.png')
     html_path = 'data/report/index.html'
     with open(html_path, 'w') as f:
         f.write('<html><body>')
         f.write('<h2>Git repo analytics</h2>')
         f.write(html_table)
-        f.write('<h3>Weekly commits</h3>')
-        f.write("<img src='weekly_commits.png'>")
         f.write('</body></html>')

     abs_html_path = os.path.abspath(html_path)
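
The new `generate_table` query aggregates `stg_commits` per repo. Below is a minimal sketch of that aggregation, assuming an in-memory SQLite table as a stand-in for the DuckDB model; the sample rows and the `repo_summary` helper are hypothetical. The sketch also selects `repo`, which the committed query omits even though the formatter that follows it targets a `repo` column. Including it is an assumption about the intended output, not part of the commit.

```python
import sqlite3

# Hypothetical sample rows standing in for the stg_commits model:
# (repo, hash, author)
ROWS = [
    ("InfuseAI/piperider", "a1", "alice"),
    ("InfuseAI/piperider", "a2", "bob"),
    ("InfuseAI/piperider", "a3", "alice"),
    ("duckdb/duckdb", "b1", "carol"),
]


def repo_summary(rows):
    """Return (repo, contributors, commits) tuples, one per repo, ordered by repo."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE stg_commits (repo TEXT, hash TEXT, author TEXT)")
        conn.executemany("INSERT INTO stg_commits VALUES (?, ?, ?)", rows)
        return conn.execute(
            "SELECT repo, "
            "count(DISTINCT author) AS contributors, "
            "count(DISTINCT hash) AS commits "
            "FROM stg_commits "
            "GROUP BY repo "
            "ORDER BY repo"
        ).fetchall()
    finally:
        conn.close()


summary = repo_summary(ROWS)
```

With `repo` in the result set, a per-column formatter keyed on `'repo'` has a column to apply to.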
18 changes: 0 additions & 18 deletions git_repo_analytics/load.py
@@ -69,23 +69,6 @@ def commit(c):
     conn.executemany("INSERT INTO raw_commits VALUES(?, ?, ?, ?, ?, ?, ?)", map(commit, commits))


-def load_github(conn):
-    print("Loading github data...")
-
-    conn.execute(
-        '''
-        create table raw_repos as
-        select * from 'data/gh/*/repo.csv';
-        '''
-    )
-    conn.execute(
-        '''
-        create table raw_contributers as
-        select * from 'data/gh/*/contributers.csv';
-        '''
-    )
-
-
 if __name__ == '__main__':
     fname = 'data/git_repo.duckdb'

@@ -94,4 +77,3 @@ def load_github(conn):

     with duckdb.connect(database=fname) as conn:
         load_repos(conn)
-        load_github(conn)
9 changes: 0 additions & 9 deletions models/marts/commit_weekly.sql

This file was deleted.

58 changes: 36 additions & 22 deletions models/marts/metrics.yml
@@ -1,31 +1,45 @@
 version: 2

+semantic_models:
+  - name: commits
+    model: ref('stg_commits')
+    description: commit information from git repo's raw commit
+    defaults:
+      agg_time_dimension: datetime
+
+    entities:
+      - name: repo
+        type: primary
+
+    dimensions:
+      - name: datetime
+        type: time
+        type_params:
+          time_granularity: day
+
+    measures:
+      - name: total_commits
+        description: "The total number of commits in the repo"
+        agg: count_distinct
+        expr: hash
+
+      - name: active_authors
+        description: "The total number of active authors in the repo"
+        agg: count_distinct
+        expr: author

 metrics:

   - name: total_commits
     description: "The total number of commits in the repo"
+    type: simple
     label: Total commits
-    model: ref('stg_commits')
-
-    calculation_method: count_distinct
-    expression: hash
-
-    timestamp: datetime
-    time_grains: [ day, week, month, quarter, year, all_time ]
-    dimensions: [ repo ]
-
-    tags:
-      - piperider
+    type_params:
+      measure: total_commits

   - name: active_authors
     description: "The total number of active authors in the repo"
+    type: simple
     label: Active authors
-    model: ref('stg_commits')
-
-    calculation_method: count_distinct
-    expression: author
-
-    timestamp: datetime
-    time_grains: [ day, week, month ]
-    dimensions: [ repo ]
-
-    tags:
-      - piperider
+    type_params:
+      measure: active_authors
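
Both measures use `agg: count_distinct`, and the `datetime` dimension sets `time_granularity: day` as the finest grain. As a rough illustration of what such a measure evaluates to when grouped by `repo` and day, here is a plain-Python sketch. It does not use dbt or MetricFlow; the `count_distinct_by_day` helper and the sample commits are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical commits: (repo, hash, author, datetime)
COMMITS = [
    ("piperider", "a1", "alice", datetime(2023, 9, 6, 10, 0)),
    ("piperider", "a2", "alice", datetime(2023, 9, 6, 15, 30)),
    ("piperider", "a3", "bob", datetime(2023, 9, 7, 9, 0)),
]


def count_distinct_by_day(commits, column):
    """Count distinct values of `column` ("hash" or "author") per (repo, day)."""
    index = {"hash": 1, "author": 2}[column]
    seen = defaultdict(set)
    for row in commits:
        key = (row[0], row[3].date())  # group by repo and calendar day
        seen[key].add(row[index])
    return {key: len(values) for key, values in seen.items()}


total_commits = count_distinct_by_day(COMMITS, "hash")
active_authors = count_distinct_by_day(COMMITS, "author")
```

Here two commits by the same author on one day count as two for `total_commits` and one for `active_authors`.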
26 changes: 0 additions & 26 deletions models/marts/repos.sql

This file was deleted.

24 changes: 24 additions & 0 deletions models/metricflow_time_spine.sql
@@ -0,0 +1,24 @@
+{{
+  config(
+    materialized = 'table',
+  )
+}}
+
+with days as (
+
+    {{
+        dbt_utils.date_spine(
+            'day',
+            "strptime('01/01/2000','%m/%d/%Y')",
+            "strptime('01/01/2027','%m/%d/%Y')"
+        )
+    }}
+
+),
+
+final as (
+    select cast(date_day as date) as date_day
+    from days
+)
+
+select * from final
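
The time spine model produces one row per day for MetricFlow to join against. A small Python sketch of the same range follows, assuming `date_spine` includes the start date and excludes the end date, which is how the dbt_utils documentation describes the macro. The Python `date_spine` function here is a hypothetical stand-in, not the macro itself.

```python
from datetime import date, timedelta


def date_spine(start, end):
    """Yield each day from start (inclusive) up to end (exclusive)."""
    day = start
    while day < end:
        yield day
        day += timedelta(days=1)


# Same bounds as the model: 2000-01-01 up to, but not including, 2027-01-01.
days = list(date_spine(date(2000, 1, 1), date(2027, 1, 1)))
```

Under that assumption the last row of the spine is 2026-12-31.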
2 changes: 0 additions & 2 deletions models/sources.yml
@@ -6,5 +6,3 @@ sources:
     schema: main
     tables:
       - name: raw_commits
-      - name: raw_repos
-      - name: raw_contributers
3 changes: 0 additions & 3 deletions models/staging/stg_contributers.sql

This file was deleted.

3 changes: 0 additions & 3 deletions models/staging/stg_repos.sql

This file was deleted.

4 changes: 2 additions & 2 deletions packages.yml
@@ -1,3 +1,3 @@
 packages:
-  - package: dbt-labs/metrics
-    version: [">=1.5.0", "<1.6.0"]
+  - package: dbt-labs/dbt_utils
+    version: 1.1.1