docs: Update readme
saattrupdan committed Apr 19, 2024
1 parent 6191930 commit 115c0a0
Showing 1 changed file with 11 additions and 142 deletions.
153 changes: 11 additions & 142 deletions README.md
@@ -1,7 +1,7 @@
<a href="https://github.com/alexandrainst/foqa"><img src="https://github.com/alexandrainst/foqa/raw/main/gfx/alexandra_logo.png" width="239" height="175" align="right" /></a>
# foqa
# FoQA

Faroese question-answering dataset.
Faroese question-answering dataset, generated by GPT-4.

______________________________________________________________________
[![Code Coverage](https://img.shields.io/badge/Coverage-100%25-brightgreen.svg)](https://github.com/alexandrainst/foqa/tree/main/tests)
@@ -16,149 +16,18 @@ Developer(s):
- Dan Saattrup Nielsen ([email protected])


## Setup
## Quickstart

### Installation

1. Run `make install`, which sets up a virtual environment and all Python dependencies therein.
1. Run `make install`, which sets up a virtual environment and all Python dependencies
therein.
2. Run `source .venv/bin/activate` to activate the virtual environment.
3. Run `python src/scripts/create_dataset.py` to create the dataset.

### Adding and Removing Packages

To install new PyPI packages, run:
```
poetry add <package-name>
```

To remove them again, run:
```
poetry remove <package-name>
```

To show all installed packages, run:
```
poetry show
```


## A Word on Modules and Scripts
In the `src` directory there are two subdirectories, `foqa`
and `scripts`. This is a brief explanation of the differences between the two.

### Modules
All Python files in the `foqa` directory are _modules_
internal to the project package. Examples here could be a general data loading script,
a definition of a model, or a training function. Think of modules as all the building
blocks of a project.

When a module is importing functions/classes from other modules we use the _relative
import_ notation - here's an example:

```
from .other_module import some_function
```

### Scripts
Python files in the `scripts` folder are scripts, which are short code snippets that
are _external_ to the project package, and which are meant to actually run the code. As
such, _only_ scripts will be called from the terminal. An analogy here is that the
internal `numpy` code consists of modules, but the Python code you write where you
import some `numpy` functions and actually run them is a script.

When importing module functions/classes when you're in a script, you do it like you
would normally import from any other package:

```
from foqa import some_function
```

Note that this is also how we import functions/classes in tests, since each test Python
file is also a Python script, rather than a module.
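
The difference between the two import styles can be tried out in isolation. The following is a minimal sketch, not part of the repository: it writes a throwaway package to a temporary directory (the name `demo_pkg` and the returned string are hypothetical, chosen only for illustration), uses a relative import inside the package, and then imports the package the way a script would.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package name, used only for this demonstration.
PACKAGE_NAME = "demo_pkg"


def build_demo_package(root: Path) -> None:
    """Write a tiny package with two modules into `root`."""
    package_dir = root / PACKAGE_NAME
    package_dir.mkdir()
    # A module that defines a function.
    (package_dir / "other_module.py").write_text(
        "def some_function():\n    return 'called from a module'\n"
    )
    # Inside the package, modules refer to each other with a relative import.
    (package_dir / "__init__.py").write_text(
        "from .other_module import some_function\n"
    )


def run_as_script() -> str:
    """Play the role of a script: import the package by name and call it."""
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_package(Path(tmp))
        sys.path.insert(0, tmp)
        try:
            package = importlib.import_module(PACKAGE_NAME)
            return package.some_function()
        finally:
            # Clean up so the demonstration leaves no trace in the interpreter.
            sys.path.remove(tmp)
            for name in [n for n in sys.modules if n.split(".")[0] == PACKAGE_NAME]:
                del sys.modules[name]


if __name__ == "__main__":
    print(run_as_script())
```

The relative import only works because `__init__.py` is loaded as part of a package; running that file directly from the terminal would fail, which is why only scripts are called from the terminal.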


## Features

### Docker Setup

A Dockerfile is included in the new repositories, which by default runs
`src/scripts/your_script.py`. You can build the Docker image and run the Docker
container by running `make docker`.

### Automatic Documentation

Run `make docs` to create the documentation in the `docs` folder, which is based on
your docstrings in your code. You can view this by running `make view-docs`.

### Automatic Test Coverage Calculation

Run `make test` to test your code, which also updates the "coverage badge" in the
README, showing you how much of your code base is currently being tested.

### Continuous Integration

Github CI pipelines are included in the repo, running all the tests in the `tests`
directory, as well as building online documentation, if Github Pages has been enabled
for the repository (can be enabled on Github in the repository settings).

### Code Spaces
The raw dataset will be stored in `data/raw` and will be updated continuously during
creation, and the final dataset will appear in `data/final`.
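
To check on progress while the dataset is being created, you can list what is currently in those two directories. This is a minimal sketch, assuming it is run from the repository root; the function name `list_dataset_files` is hypothetical and not part of the project, and nothing is assumed about the file format of the dataset itself.

```python
from pathlib import Path


def list_dataset_files(repo_root: Path) -> dict[str, list[str]]:
    """Return the sorted file names found in data/raw and data/final."""
    listing: dict[str, list[str]] = {}
    for stage in ("raw", "final"):
        stage_dir = repo_root / "data" / stage
        if stage_dir.is_dir():
            # Skip the .gitkeep placeholder that keeps the empty folder in git.
            listing[stage] = sorted(
                path.name
                for path in stage_dir.iterdir()
                if path.is_file() and path.name != ".gitkeep"
            )
        else:
            listing[stage] = []
    return listing


if __name__ == "__main__":
    for stage, names in list_dataset_files(Path(".")).items():
        print(f"data/{stage}: {names if names else 'no files yet'}")
```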

Code Spaces is a new feature on GitHub that allows you to develop on a project
completely in the cloud, without having to do any local setup at all. This repo
includes a configuration file for running code spaces on GitHub. On the
`alexandrainst/foqa` repository page, simply press the `<> Code` button
and add a code space to get started, which will open a VSCode window directly in your
browser.

## Docker

## Project structure
```
.
├── .devcontainer
│   └── devcontainer.json
├── .github
│   └── workflows
│       ├── ci.yaml
│       └── docs.yaml
├── .gitignore
├── .pre-commit-config.yaml
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── Dockerfile
├── LICENSE
├── README.md
├── config
│   ├── __init__.py
│   ├── config.yaml
│   └── hydra
│       └── job_logging
│           └── custom.yaml
├── data
│   ├── final
│   │   └── .gitkeep
│   ├── processed
│   │   └── .gitkeep
│   └── raw
│       └── .gitkeep
├── docs
│   └── .gitkeep
├── gfx
│   ├── .gitkeep
│   └── alexandra_logo.png
├── makefile
├── models
│   └── .gitkeep
├── notebooks
│   └── .gitkeep
├── poetry.toml
├── pyproject.toml
├── src
│   ├── scripts
│   │   ├── fix_dot_env_file.py
│   │   └── your_script.py
│   └── foqa
│       ├── __init__.py
│       └── your_module.py
└── tests
    ├── __init__.py
    └── test_dummy.py
```
You can also build and run the image defined in the `Dockerfile` directly, which builds
the dataset without having to set up a Python environment.
