From 115c0a0abeb518487dd8a4ef1146205a77ad42fe Mon Sep 17 00:00:00 2001
From: Dan Saattrup Nielsen
Date: Fri, 19 Apr 2024 07:54:08 +0200
Subject: [PATCH] docs: Update readme

---
 README.md | 153 ++++--------------------------------------------------
 1 file changed, 11 insertions(+), 142 deletions(-)

diff --git a/README.md b/README.md
index 584db26..104e002 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
-# foqa
+# FoQA
 
-Faroese question-answering dataset.
+Faroese question-answering dataset, generated by GPT-4.
 
 ______________________________________________________________________
 [![Code Coverage](https://img.shields.io/badge/Coverage-100%25-brightgreen.svg)](https://github.com/alexandrainst/foqa/tree/main/tests)
@@ -16,149 +16,18 @@ Developer(s):
 - Dan Saattrup Nielsen (dan.nielsen@alexandra.dk)
 
 
-## Setup
+## Quickstart
 
-### Installation
-
-1. Run `make install`, which sets up a virtual environment and all Python dependencies therein.
+1. Run `make install`, which sets up a virtual environment and all Python dependencies
+   therein.
 2. Run `source .venv/bin/activate` to activate the virtual environment.
+3. Run `python src/scripts/create_dataset.py` to create the dataset.
 
-### Adding and Removing Packages
-
-To install new PyPI packages, run:
-```
-poetry add
-```
-
-To remove them again, run:
-```
-poetry remove
-```
-
-To show all installed packages, run:
-```
-poetry show
-```
-
-
-## A Word on Modules and Scripts
-In the `src` directory there are two subdirectories, `foqa`
-and `scripts`. This is a brief explanation of the differences between the two.
-
-### Modules
-All Python files in the `foqa` directory are _modules_
-internal to the project package. Examples here could be a general data loading script,
-a definition of a model, or a training function. Think of modules as all the building
-blocks of a project.
-
-When a module is importing functions/classes from other modules we use the _relative
-import_ notation - here's an example:
-
-```
-from .other_module import some_function
-```
-
-### Scripts
-Python files in the `scripts` folder are scripts, which are short code snippets that
-are _external_ to the project package, and which is meant to actually run the code. As
-such, _only_ scripts will be called from the terminal. An analogy here is that the
-internal `numpy` code are all modules, but the Python code you write where you import
-some `numpy` functions and actually run them, that a script.
-
-When importing module functions/classes when you're in a script, you do it like you
-would normally import from any other package:
-
-```
-from foqa import some_function
-```
-
-Note that this is also how we import functions/classes in tests, since each test Python
-file is also a Python script, rather than a module.
-
-
-## Features
-
-### Docker Setup
-
-A Dockerfile is included in the new repositories, which by default runs
-`src/scripts/your_script.py`. You can build the Docker image and run the Docker
-container by running `make docker`.
-
-### Automatic Documentation
-
-Run `make docs` to create the documentation in the `docs` folder, which is based on
-your docstrings in your code. You can view this by running `make view-docs`.
-
-### Automatic Test Coverage Calculation
-
-Run `make test` to test your code, which also updates the "coverage badge" in the
-README, showing you how much of your code base that is currently being tested.
-
-### Continuous Integration
-
-Github CI pipelines are included in the repo, running all the tests in the `tests`
-directory, as well as building online documentation, if Github Pages has been enabled
-for the repository (can be enabled on Github in the repository settings).
-
-### Code Spaces
+The raw dataset will be stored in `data/raw` and will be updated continuously during
+creation, and the final dataset will appear in your `data/final`.
 
-Code Spaces is a new feature on Github, that allows you to develop on a project
-completely in the cloud, without having to do any local setup at all. This repo comes
-included with a configuration file for running code spaces on Github. When hosted on
-`alexandrainst/foqa` then simply press the `<> Code` button
-and add a code space to get started, which will open a VSCode window directly in your
-browser.
 
+## Docker
 
-## Project structure
-```
-.
-├── .devcontainer
-│   └── devcontainer.json
-├── .github
-│   └── workflows
-│       ├── ci.yaml
-│       └── docs.yaml
-├── .gitignore
-├── .pre-commit-config.yaml
-├── CODE_OF_CONDUCT.md
-├── CONTRIBUTING.md
-├── Dockerfile
-├── LICENSE
-├── README.md
-├── config
-│   ├── __init__.py
-│   ├── config.yaml
-│   └── hydra
-│       └── job_logging
-│           └── custom.yaml
-├── data
-│   ├── final
-│   │   └── .gitkeep
-│   ├── processed
-│   │   └── .gitkeep
-│   └── raw
-│       └── .gitkeep
-├── docs
-│   └── .gitkeep
-├── gfx
-│   ├── .gitkeep
-│   └── alexandra_logo.png
-├── makefile
-├── models
-│   └── .gitkeep
-├── notebooks
-│   └── .gitkeep
-├── poetry.toml
-├── pyproject.toml
-├── src
-│   ├── scripts
-│   │   ├── fix_dot_env_file.py
-│   │   └── your_script.py
-│   └── foqa
-│       ├── __init__.py
-│       └── your_module.py
-└── tests
-    ├── __init__.py
-    └── test_dummy.py
-```
+You can also run the `Dockerfile` directly, which builds the dataset without having to
+set up a Python environment.