Open Edu Hub Search ETL

Step 1: Project Setup - Python 3.12 (manual approach)

sudo apt install python3-dev python3-pip python3-venv libpq-dev -y
python3 -m venv .venv

source .venv/bin/activate (on Linux/Unix)

.venv\Scripts\activate.bat (on Windows)

pip3 install poetry

poetry install
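
After poetry install finishes, an optional sanity check from inside the activated environment confirms that the interpreter and dependencies are usable. This is not part of the official setup; it only assumes that Scrapy was pulled in as a dependency:

# sanity_check.py -- optional; verifies the virtual environment (sketch)
import sys

print(sys.version)  # expect a 3.12.x interpreter from your .venv

import scrapy  # installed via `poetry install`

print(scrapy.__version__)  # printing a version means the setup worked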

Step 1 (alternative): Project Setup - Python (automated, via poetry)

  • Step 1: Make sure that you have Poetry v1.5.0+ installed.
  • Step 2: Open your terminal in the project root directory.
    • Step 2.1 (optional): If you want the .venv to be created inside the project root directory, run poetry config virtualenvs.in-project true. (This is strictly a personal preference.)
  • Step 3: Install the dependencies declared in pyproject.toml by running poetry install.

Step 2: Project Setup - required Docker Containers

If you have Docker installed, use docker-compose up to start the containers required for the Splash and Playwright integrations.

As a last step, set up your config variables by copying the .env.example file and modifying it where necessary:

cp converter/.env.example converter/.env
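
For orientation, the values from that file end up as ordinary environment variables. The following is a minimal sketch of the common python-dotenv pattern for reading such a file; the key name is purely illustrative, and the real keys are listed in converter/.env.example:

# sketch: reading .env values with python-dotenv
import os

from dotenv import load_dotenv

load_dotenv("converter/.env")  # loads KEY=VALUE pairs into the process environment
value = os.getenv("SOME_CONFIG_KEY")  # illustrative key -- see converter/.env.example
print(value)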

Running crawlers

  • A crawler can be run with scrapy crawl <spider-name>. (If you prefer launching a spider from Python, see the sketch after this list.)
    • (This assumes that an edu-sharing v6.0+ instance that can accept the data is configured in your .env settings.)
  • If a crawler implements Scrapy Spider Contracts, you can test those by running scrapy check <spider-name>.
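
For launching a spider from Python instead of the scrapy CLI, Scrapy's CrawlerProcess does the same job as scrapy crawl. A minimal sketch (the spider name below is a placeholder):

# run_spider.py -- minimal sketch for launching a spider programmatically
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

def main():
    settings = get_project_settings()  # picks up this project's Scrapy settings
    process = CrawlerProcess(settings)
    process.crawl("sample_spider")     # placeholder -- same name you would pass to `scrapy crawl`
    process.start()                    # blocks until the crawl has finished

if __name__ == "__main__":
    main()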

Running crawlers via Docker

git clone https://github.com/openeduhub/oeh-search-etl
cd oeh-search-etl
cp converter/.env.example .env
# modify .env with your edu-sharing instance settings
export CRAWLER=your_crawler_id_spider  # e.g. wirlernenonline_spider
docker compose build scrapy
docker compose up
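
Conceptually, the CRAWLER variable tells the scrapy container which spider to run. The pattern is roughly the following sketch (the actual entrypoint may differ -- check the repository's Dockerfile and compose file):

# entrypoint sketch: run the spider named by the CRAWLER environment variable
import os
import subprocess

spider = os.environ["CRAWLER"]  # e.g. wirlernenonline_spider
subprocess.run(["scrapy", "crawl", spider], check=True)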

Building a Crawler

  • We use Scrapy as our crawling framework; see the Scrapy spider tutorial (https://docs.scrapy.org/en/latest/intro/tutorial.html).
  • To create a new spider, create a file at converter/spiders/<myname>_spider.py.
  • We recommend inheriting from the LomBase class in order to get out-of-the-box support for our metadata model.
  • You may also inherit another base class for crawling data: if your site provides LRMI metadata, LrmiBase is a good start; if your system provides an OAI interface, you may use OAIBase.
  • As a sample/template, please take a look at sample_spider.py or sample_spider_alternative.py (the skeleton after this list is modeled on them).
  • To learn more about the LOM standard we're using, you'll find useful information at https://en.wikipedia.org/wiki/Learning_object_metadata.
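
The skeleton below illustrates the LomBase pattern. It is a sketch, not a drop-in spider: the import path and the getId/getHash methods mirror the existing spiders, but sample_spider.py remains the authoritative reference.

# converter/spiders/myname_spider.py -- illustrative skeleton
import scrapy

from converter.spiders.base_classes import LomBase  # import path as used by the existing spiders

class MynameSpider(scrapy.Spider, LomBase):
    name = "myname_spider"
    friendlyName = "My Example Source"  # human-readable name of the source
    start_urls = ["https://example.org/materials"]  # illustrative URL
    version = "0.0.1"  # bump this when the crawler's output changes

    def getId(self, response) -> str:
        # a stable, unique identifier for the item on the source site
        return response.url

    def getHash(self, response) -> str:
        # changes whenever the item or the crawler version changes,
        # so the target edu-sharing instance can detect updates
        return f"{self.getId(response)}v{self.version}"

    def parse(self, response):
        # LomBase.parse assembles the LOM metadata item from the get* callbacks
        return LomBase.parse(self, response)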

Still have questions? Check out our GitHub Wiki!

If you need help getting started or setting up your work environment, please don't hesitate to visit our GitHub Wiki at https://github.com/openeduhub/oeh-search-etl/wiki

About

The backend includes everything needed for the ETL process (Scrapy, Postgres, Elasticsearch).
