Scraping System

Summary

steve is used to scrape video metadata and links from lists of videos, convert that data into the form a richard instance expects, and then push it to that richard instance.

Previously we used the vidscraper library. It is no longer maintained and no longer works well, so we need to implement something ourselves.

Because there are so many different kinds of video sites and we get data in many forms, it's important to have a flexible system that minimizes the time it takes someone to assemble video data.

This wiki page mulls over that problem domain.

Requirements

steve fetch takes a URL and downloads the metadata for all the videos at that URL
steve scrapevideo takes the URL of a single video and returns the metadata for that video
easy to generate a processing pipeline that, given a URL, fetches the data and passes it through a series of transforms until it is in a form richard can accept
easy to build new scrapers and have steve use them without installing additional software
easy to create, reuse and maintain a set of default scrapers that ship with steve
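The pipeline requirement above could be modeled as an ordered list of transforms applied one after another. This is a minimal sketch, assuming each scraped video is a plain dict and each transform is a function that takes and returns a list of such dicts. `run_pipeline`, the example transforms, and the field names (`title`, `url`, `category`) are hypothetical and are not richard's actual schema.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical representation: a video is a plain dict of metadata fields,
# and a transform maps a list of videos to a new list of videos.
Video = Dict[str, object]
Transform = Callable[[List[Video]], List[Video]]


def run_pipeline(videos: List[Video], transforms: Iterable[Transform]) -> List[Video]:
    """Pass the scraped videos through each transform in order."""
    for transform in transforms:
        videos = transform(videos)
    return videos


def strip_titles(videos: List[Video]) -> List[Video]:
    """Example transform: trim stray whitespace from each title."""
    return [{**v, "title": str(v.get("title", "")).strip()} for v in videos]


def set_category(category: str) -> Transform:
    """Example transform factory: tag every video with the same category."""
    def transform(videos: List[Video]) -> List[Video]:
        return [{**v, "category": category} for v in videos]
    return transform


def drop_incomplete(required: Iterable[str]) -> Transform:
    """Example transform factory: discard videos missing any required field."""
    fields = tuple(required)

    def transform(videos: List[Video]) -> List[Video]:
        return [v for v in videos if all(v.get(f) for f in fields)]
    return transform


if __name__ == "__main__":
    # Illustrative input, standing in for what `steve fetch` might return.
    scraped = [
        {"title": "  Opening keynote ", "url": "https://example.com/v/1"},
        {"title": "Lightning talks", "url": ""},
    ]
    pipeline = [strip_titles, set_category("Example Conf"), drop_incomplete(["title", "url"])]
    print(run_pipeline(scraped, pipeline))
```

In this example the second video is dropped because its url is empty. Keeping each transform a small function that returns new dicts, rather than mutating its input, makes it easy to reorder, reuse, or add per-site steps without touching the others.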

Architecture

FIXME: work through this

Scrapers

YouTube playlist
YouTube user
YouTube channel
others?
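One way to make the scrapers listed above pluggable is a registry keyed by URL pattern, plus a loader that imports plain .py files from a directory so new scrapers work without installing anything. This is a sketch under stated assumptions: `scraper`, `find_scraper`, `load_plugins`, and the plugin-directory convention are hypothetical; the YouTube scrapers are stubs that don't contact YouTube; and the URL patterns cover only the common `/playlist?list=`, `/user/` and `/channel/` forms.

```python
import importlib.util
import re
from pathlib import Path
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical scraper signature: takes a URL, returns a list of video dicts.
Scraper = Callable[[str], List[Dict[str, object]]]

# Registered (compiled URL pattern, scraper) pairs, checked in registration order.
_REGISTRY: List[Tuple[re.Pattern, Scraper]] = []


def scraper(pattern: str) -> Callable[[Scraper], Scraper]:
    """Decorator that registers a scraper for URLs matching a regex."""
    def register(func: Scraper) -> Scraper:
        _REGISTRY.append((re.compile(pattern), func))
        return func
    return register


def find_scraper(url: str) -> Optional[Scraper]:
    """Return the first registered scraper whose pattern matches the URL, if any."""
    for regex, func in _REGISTRY:
        if regex.search(url):
            return func
    return None


def load_plugins(directory: str) -> None:
    """Import every .py file in a directory so its @scraper decorators run.

    The `scraper` decorator is injected into each module's namespace, so a
    plugin file doesn't need to import anything from steve.
    """
    for path in sorted(Path(directory).glob("*.py")):
        spec = importlib.util.spec_from_file_location(f"steve_plugin_{path.stem}", str(path))
        if spec is None or spec.loader is None:
            continue
        module = importlib.util.module_from_spec(spec)
        module.scraper = scraper  # type: ignore[attr-defined]
        spec.loader.exec_module(module)


# Default scrapers. These are stubs: a real implementation would ask
# YouTube for the list of videos instead of echoing the URL back.
@scraper(r"youtube\.com/playlist\?(?:.*&)?list=[\w-]+")
def youtube_playlist(url: str) -> List[Dict[str, object]]:
    return [{"source": "youtube_playlist", "url": url}]


@scraper(r"youtube\.com/user/[\w-]+")
def youtube_user(url: str) -> List[Dict[str, object]]:
    return [{"source": "youtube_user", "url": url}]


@scraper(r"youtube\.com/channel/[\w-]+")
def youtube_channel(url: str) -> List[Dict[str, object]]:
    return [{"source": "youtube_channel", "url": url}]
```

Injecting `scraper` into each plugin's namespace means a plugin file can be a few lines with no imports. The trade-off is that loading a plugin runs arbitrary code, so the plugin directory should be trusted.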
