Commit

Merge pull request #319 from csc-training/update-allas
shorten and simplify allas slides
rkronberg authored Feb 6, 2024
2 parents ffefb82 + b99b3bb commit 7df149f
Showing 1 changed file with 27 additions and 70 deletions.
97 changes: 27 additions & 70 deletions _slides/07_allas.md
@@ -18,45 +18,23 @@ Unported License, [http://creativecommons.org/licenses/by-sa/4.0/](http://creati
</small>
</div>

# How to get access to Allas service

- Use [https://my.csc.fi](https://my.csc.fi) to
1. Create a CSC account (log in with Haka/Virtu)
- If Haka/Virtu is not an option, contact <[email protected]>
2. Set up a project at CSC (Principal Investigator)
3. Apply for Puhti and Allas services, quota and billing units for your project
4. Add other registered users to your project
5. Members have to register and accept the terms of use in [My CSC](https://my.csc.fi)
- All project members have equal access to the data in Allas and Puhti (`/scratch` and `/projappl` disks)

# The Allas object storage: what is it?

- Allas is a storage service for all CSC computing and cloud services
- CEPH-based object storage
- Allas is a CEPH-based object storage service for all CSC computing and cloud services
- Possible to upload data from personal laptops or organizational storage systems into Allas
- Meant for data storage during project lifetime
- Default quota is 10 TB per project
- All project members have equal access to the data in Allas
- Default quota is 10 TB per project
- Clients available on Puhti and Mahti
- See Docs CSC for instructions on [accessing Allas from LUMI](https://docs.csc.fi/data/Allas/allas_lumi/)

# Connections to Allas

<div class="column">
- Data can be moved to and from Allas directly without using Puhti or Mahti
- Usage through S3 and Swift APIs are supported
- Data can be shared publicly to the Internet, which is otherwise not easily possible at CSC
</div>
<div class="column">
![](img/allas.png "Allas"){width=90%}
</div>

# The Allas object storage: what it is NOT

- **Allas is not a file system** (even though many tools try to fool you to think so)
- It is just a place for static data objects
- **Allas is not a data management environment**
- Tools for search, metadata, version control and access management are minimal
- **Allas is not a back up service**
- **Allas is not a proper backup service**
- Project members can delete all the data with just one command

# Storing files in Allas
@@ -67,6 +45,7 @@ Unported License, [http://creativecommons.org/licenses/by-sa/4.0/](http://creati
- Data cannot be modified in the object storage
- For computation, the data has to be typically copied to a file system on some computer
- Some data management features are built on top of Allas
- Data can be shared publicly to the Internet, which is otherwise not easily possible at CSC

# Allas buckets

@@ -89,15 +68,10 @@ Unported License, [http://creativecommons.org/licenses/by-sa/4.0/](http://creati

- S3 (used by `s3cmd`, `rclone`, `a-tools`)
- Swift (used by `swift`, `rclone`, `a-tools`, `cyberduck`)
- Authentication is different
- S3: permanent key-based authentication -- nice, easy and unsecure
- Swift: authentication based on temporary tokens -- more secure, requires authentication every 8 hours
- File handling is different
- Metadata is handled in different ways
- Files larger than 5 GB are managed in different ways
- **Avoid cross-using Swift and S3-based objects!**
- Authentication and file handling differ between the protocols
- **Avoid cross-using Swift and S3-based objects!**

# Allas Clients
# Allas clients

- **Puhti, Mahti, Linux servers, Mac:**
- `rclone`, `swift`, `s3cmd`, `a-tools`
@@ -109,61 +83,44 @@ Unported License, [http://creativecommons.org/licenses/by-sa/4.0/](http://creati
# Allas -- first steps

- Use [My CSC](https://my.csc.fi) to apply for Allas access for your project -- Allas is not automatically available
- In Puhti/Mahti, setup connection to Allas with the commands:

```bash
module load allas
allas-conf
```

- Study the manual and [start using Allas with `rclone` or `a-tools`](https://docs.csc.fi/data/Allas/)
- This [course](https://csc-training.github.io/csc-env-eff/#7-allas-and-where-to-keep-your-data) includes also hands-on tutorials and a tutorial video about Allas
- In Puhti/Mahti, set up the connection to Allas using the commands:
```bash
module load allas
allas-conf
```
- [Study the manual and start using Allas with `rclone` or `a-tools`](https://docs.csc.fi/data/Allas/)
- [This course](https://csc-training.github.io/csc-env-eff/part-1/allas/) also includes hands-on tutorials and a tutorial video about Allas

# Allas -- `rclone`

- Straightforward power-user tool with a wide range of features
- Fast and efficient
- Available for Linux, Mac and Windows
- Overwrites and removes data without asking!
- The default configuration at CSC uses `swift`-protocol, but S3 can also be used
- Use with care: [`rclone` instructions at Docs CSC](https://docs.csc.fi/data/Allas/using_allas/rclone/)
- **Overwrites and removes data without asking!**
- Use with care: [`rclone` instructions at Docs CSC](https://docs.csc.fi/data/Allas/using_allas/rclone/)
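
A minimal sketch of a basic `rclone` round trip, assuming `module load allas` and `allas-conf` have already been run in the session. It only prints the commands unless `DRY_RUN=0` is set, which is a deliberate precaution given that `rclone` overwrites and removes data without asking. The remote name `allas:` is assumed to be the one set up by `allas-conf` for the Swift protocol; the bucket name `my-example-bucket`, the file `results.csv`, and the helper functions `run` and `allas_rclone_round_trip` are hypothetical placeholders.

```bash
#!/usr/bin/env bash
# Sketch only: prints the rclone commands by default.
# Set DRY_RUN=0 to actually run them against Allas.

DRY_RUN="${DRY_RUN:-1}"
BUCKET="${BUCKET:-my-example-bucket}"   # hypothetical bucket name
FILE="${FILE:-results.csv}"             # hypothetical local file

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

allas_rclone_round_trip() {
    run rclone lsd allas:                              # list the buckets
    run rclone copy "$FILE" "allas:$BUCKET/"           # upload one file
    run rclone ls "allas:$BUCKET"                      # list objects in the bucket
    run rclone copy "allas:$BUCKET/$FILE" ./restored/  # download a copy
}

allas_rclone_round_trip
```

Reviewing the printed commands before switching to `DRY_RUN=0` is one way to follow the "use with care" advice above.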

# Allas -- `a-tools`

- `rclone`-based scripts for using Allas in Puhti and Mahti
- `a-tools` provide an easy and safe way to use Allas for occasional Allas users
- Default bucket names are based on directories on Puhti/Mahti
- Unlike `rclone`, `a-tools` does not overwrite or remove data without asking!
- Developed for the CSC supercomputers, but you can install the tools in other Linux and Mac machines as well
- Automatic packing (compression can be enabled as well if needed)
- [a-commands instructions at Docs CSC](https://docs.csc.fi/data/Allas/using_allas/a_commands/)
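
A corresponding sketch for `a-tools`, again assuming an active Allas connection. It prints the commands unless `DRY_RUN=0` is set. The directory `my_data`, the object path `my-bucket/my_data.tar`, and the helpers `run` and `a_tools_round_trip` are hypothetical; the actual bucket and object names that `a-put` chooses should be read from the `a-list` output rather than taken from this example.

```bash
#!/usr/bin/env bash
# Sketch only: prints the a-tools commands by default.
# Set DRY_RUN=0 to actually run them.

DRY_RUN="${DRY_RUN:-1}"
DATA_DIR="${DATA_DIR:-my_data}"              # hypothetical directory to store
OBJECT="${OBJECT:-my-bucket/my_data.tar}"    # hypothetical; check a-list output

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

a_tools_round_trip() {
    run a-put "$DATA_DIR"   # pack the directory and upload it to Allas
    run a-list              # list buckets to find where the data went
    run a-get "$OBJECT"     # retrieve the stored object
}

a_tools_round_trip
```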

# `a-put`/`a-get`: pros and cons

<div class="column">
➕ Saving data as a tar package preserves time stamps, access settings, and internal links of the directory
➕ Optional `zstdmt` compression reduces size
➕ The default bucket name and the metadata reflect the directory structure on Puhti/Mahti
➕ Checks to prevent overwriting data accidentally
</div>
<div class="column">
➖ Usage of objects created by `a-put` can be complicated when other object storage tools are used
➖ Usage from Windows is problematic
➖ Each object has an additional `_ameta` object
</div>


# Issues with Allas

- 8-hour connection limit with `swift`
- No way to check quota
- Moving data inside Allas is not possible (`swift`)
- No way to freeze data
- Use two projects if you need to prevent others from editing your data
- Use two projects if you need to prevent others from editing your data
- Different interfaces may work in different ways

# Questions that users should consider

- Should I store each file as a separate object or should I collect them into bigger chunks?
- Should I store each file as a separate object, or should I collect them into bigger chunks?
- In general: consider how you use the data
- Should I use compression?
- Who can use the data: projects and access rights?
@@ -176,16 +133,16 @@ allas-conf
- Suitable for all static digital research material and related metadata
- Free of charge for users in Finnish higher education institutions and research institutes
- **[IDA](https://ida.fairdata.fi):** storage for research data
- **[Qvain](https://qvain.fairdata.fi/):** Describe you dataset and get a persistent indentifier for it
- **[Qvain](https://qvain.fairdata.fi/):** Describe your dataset and get a persistent identifier for it
- **[Etsin](https://etsin.fairdata.fi/):** Discover datasets based on metadata

# Sensitive data services

- [CSC Sensitive Data Services](https://docs.csc.fi/data/sensitive-data/) for processing sensitive data
- **SD Desktop** [https://sd-desktop.csc.fi](https://sd-desktop.csc.fi) is a secure virtual desktop
- Controlled access
- Data importing _only_ through the [**SD Connect**](https://sd-connect.csc.fi) service
- Isolation from the Internet
- No direct data export
- [**SD Desktop**](https://sd-desktop.csc.fi) is a secure virtual desktop
- Controlled access
- Data importing _only_ through the [**SD Connect**](https://sd-connect.csc.fi) service
- Isolation from the Internet
- No direct data export
- Allas could be used for sensitive data, but _only_ if the data is properly encrypted
- The [**SD Connect**](https://sd-connect.csc.fi) procedure does the encryption
- The [**SD Connect**](https://sd-connect.csc.fi) procedure does the encryption
