Planet blog feed analysis #4

shakthimaan · 2018-01-29T08:12:32Z

A report is required for dgplug students and planet.dgplug.org to answer:

The number of posts per month
The interval between posts per user
The users who have not posted for over a month
The number of words per blog post per user

If the information can be fed into a database periodically using an application container, a Grafana dashboard can be built on top of it.
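As a rough illustration of the four questions above, here is a minimal sketch in plain Python. It assumes each post has already been reduced to a `(user, published, word_count)` tuple; the sample data and all function names are hypothetical and not part of any existing dgplug tooling.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Hypothetical sample records: (user, published timestamp, word count).
POSTS = [
    ("alice", datetime(2018, 1, 5), 420),
    ("alice", datetime(2018, 1, 19), 380),
    ("bob", datetime(2017, 11, 2), 150),
    ("alice", datetime(2018, 2, 1), 500),
]


def posts_per_month(posts):
    """Count posts per calendar month, keyed by 'YYYY-MM'."""
    return dict(Counter(ts.strftime("%Y-%m") for _, ts, _ in posts))


def intervals_per_user(posts):
    """Return, per user, the gaps between consecutive posts."""
    by_user = defaultdict(list)
    for user, ts, _ in posts:
        by_user[user].append(ts)
    intervals = {}
    for user, stamps in by_user.items():
        stamps.sort()
        intervals[user] = [b - a for a, b in zip(stamps, stamps[1:])]
    return intervals


def inactive_users(posts, now, threshold=timedelta(days=30)):
    """Return users whose latest post is older than the threshold."""
    latest = {}
    for user, ts, _ in posts:
        if user not in latest or ts > latest[user]:
            latest[user] = ts
    return sorted(u for u, ts in latest.items() if now - ts > threshold)


def words_per_post_per_user(posts):
    """Return, per user, the word count of each post in date order."""
    by_user = defaultdict(list)
    for user, ts, words in sorted(posts, key=lambda p: p[1]):
        by_user[user].append(words)
    return dict(by_user)
```

Calling `inactive_users(POSTS, now=datetime(2018, 2, 9))` on the sample data reports only `bob`, whose last post is more than 30 days old. Treating "a month" as 30 days is an assumption made here for simplicity.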

farhaanbukhsh · 2018-02-01T18:57:00Z

This seems really interesting. I have a little experience with Grafana, so let me do a setup and let's see how we can better visualize it.

farhaanbukhsh · 2018-02-02T03:20:01Z

So I tried setting up Grafana and was able to do it with the Docker image that Grafana provides. I am thinking of using feedparser and giving it the GitHub raw URLs of the planet pages [1] and [2]. For now, I am thinking we could run this script as a cron job to generate the data. I have not explored the data source part yet, but I feel a simple MySQL or Postgres database can do it. What I really loved and would like to use here, though, is InfluxDB [3].

Once the data is captured, performing queries over it should not be very difficult. My only concern is finding a neat way to get the data for each blog and populate InfluxDB with it, and this should be done incrementally. For example, when a blog is updated with a new post, I don't want all the information again; I want just the new post.

I am thinking about writing a service that can listen for such events. Frankly, with Grafana I feel the visualization is taken care of; the data collection part is the challenge here.

Schubisu · 2018-02-09T13:01:42Z

@farhaanbukhsh I'm not sure I understand that correctly.
When using feedparser, to answer the questions from @shakthimaan above, imho you would need to save the following fields:

['source']['id'] -> the blog identifier (since the author field may be empty)
['id'] -> unique blog post identifier
a word count
['updated'] -> creation or last update date

If you check your db for the unique post id before inserting data, you're not going to have duplicates. Linking multiple blogs of a single author could also be discussed, as this special case might occur more often. This would, however, require some manual editing of the db.
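To make the duplicate check concrete, here is a minimal sketch that uses SQLite from the Python standard library as a stand-in for whichever database ends up being chosen. The entries are plain dicts shaped like the feedparser fields listed above. The table layout, the function names, and the use of a `summary` field as the text to count words in are assumptions made for illustration.

```python
import re
import sqlite3


def word_count(html):
    """Count words after crudely stripping HTML tags."""
    text = re.sub(r"<[^>]+>", " ", html)
    return len(text.split())


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) posts table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS posts (
               post_id TEXT PRIMARY KEY,  -- entry['id']
               blog_id TEXT NOT NULL,     -- entry['source']['id']
               updated TEXT NOT NULL,     -- entry['updated']
               words   INTEGER NOT NULL
           )"""
    )
    return conn


def store_entries(conn, entries):
    """Insert entries not seen before; return how many rows were added."""
    before = conn.total_changes
    for e in entries:
        # The primary key on post_id makes SQLite skip known posts,
        # so re-running the script over the same feed is harmless.
        conn.execute(
            "INSERT OR IGNORE INTO posts VALUES (?, ?, ?, ?)",
            (e["id"], e["source"]["id"], e["updated"],
             word_count(e["summary"])),  # 'summary' is an assumed field
        )
    conn.commit()
    return conn.total_changes - before
```

With this layout a cron job can feed the whole feed to `store_entries` on every run and only new posts are written. Letting the primary key reject duplicates, rather than issuing a separate lookup before each insert, is a design choice of this sketch.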
