PROJECT Data Storage
The PROJECT data storage is a central storage for projects' and users' data at IT4Innovations. It is accessible from all IT4Innovations clusters and allows data to be shared across clusters. The storage is intended to be used throughout a project's whole life cycle.
Technical Overview
The PROJECT storage consists of three equal file storages (blocks) called PROJ1, PROJ2, and PROJ3. Each file storage implements a GPFS file system exported via the NFS protocol using three NFS servers. The file storages provide high availability and redundancy.
Specification | Total | Per Block |
---|---|---|
Protocol | NFS over GPFS | |
Total capacity | 15 PB | 5 PB |
Throughput | 39 GB/s | 13 GB/s |
IO Performance | 57 kIOPS | 19 kIOPS |
Accessing PROJECT
All aspects of allocating, provisioning, accessing, and using the PROJECT storage are driven by the project paradigm. Storage allocation and access are based on projects (i.e. allocations of computing resources) and project membership.
A project directory (actually implemented as an independent fileset) is created for every active project. Default limits (quotas), default file permissions, and ACLs are set. The project directory life cycle strictly follows the project's life cycle. The project directory is removed after the project's data expiration.
POSIX File Access
Mountpoints
PROJECT file storages are accessible at mountpoints /mnt/proj1, /mnt/proj2, and /mnt/proj3.
The PROJECT storage can be accessed via the following nodes:
Cluster | Node(s) |
---|---|
Karolina | Login, Compute, Visualization |
Barbora | Login, Compute, Visualization |
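As a quick sanity check, the mountpoints listed above can be probed from a shell on any of these nodes. The following is a minimal sketch; the helper name check_dirs is hypothetical and not part of any IT4Innovations tooling.

```shell
# check_dirs DIR... -> prints one status line per directory.
# A directory counts as "ok" when it exists and is readable by the current user.
check_dirs() {
    local dir
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -r "$dir" ]; then
            echo "ok      $dir"
        else
            echo "missing $dir"
        fi
    done
}

# The three PROJECT mountpoints named in the text above.
check_dirs /mnt/proj1 /mnt/proj2 /mnt/proj3
```

A "missing" line on a cluster node may simply mean the node is not one of those listed in the table, so it is a hint rather than a diagnosis.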
To show the path to your project's directory on the PROJECT storage, use the it4i-get-project-dir command:
$ it4i-get-project-dir OPEN-XX-XX
/mnt/proj3/open-XX-XX
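In scripts, it can be convenient to capture that path in a variable instead of hard-coding the block number. The sketch below assumes only what the example above shows, namely that it4i-get-project-dir prints the directory on standard output; the wrapper name project_dir and the error handling are illustrative assumptions.

```shell
# Placeholder project ID, as in the example above.
PROJECT_ID="OPEN-XX-XX"

# project_dir ID -> prints the project directory, or fails with a message.
project_dir() {
    local id="$1" dir
    if ! command -v it4i-get-project-dir >/dev/null 2>&1; then
        echo "it4i-get-project-dir not found; run this on a cluster node" >&2
        return 1
    fi
    dir="$(it4i-get-project-dir "$id")" || return 1
    # Guard against an empty or stale answer before anything uses the path.
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    printf '%s\n' "$dir"
}

if PROJECT_DIR="$(project_dir "$PROJECT_ID")"; then
    echo "Project directory: $PROJECT_DIR"
fi
```

Resolving the path at run time keeps job scripts independent of which of the three blocks hosts the project.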
Project Quotas
The PROJECT storage enforces quotas on projects' usage (used capacity and allocated inodes). Default quotas for capacity and number of inodes per project are set by IT4Innovations.
Project default quota | |
---|---|
Space quota | 20 TB |
Inodes quota | 20 million |
You can check the actual usage of the PROJECT storage (e.g. location of the project directory, used capacity, allocated inodes) by running the it4ifsusage command from the command line on the Login nodes. The command lists all projects associated with the user.
[vop999@login1.barbora ~]$ it4ifsusage
Quota Type Cluster / PID File System Space used Space limit Entries used Entries limit Last update
------------- --------------- ------------- ------------ ------------- -------------- --------------- -------------------
User barbora /home 11.1 MB 25.0 GB 122 500,000 2021-08-24 07:50:09
User karolina /home 354.6 MB 25.0 GB 3,194 500,000 2021-08-24 08:20:08
User barbora /scratch 256.5 GB 10.0 TB 169 10,000,000 2021-08-24 07:50:19
User karolina /scratch 52.5 GB 100.0 TB 967 20,000,000 2021-08-24 08:20:18
Project open-XX-XX proj1 3.9 TB 20.0 TB 212,377 5,000,000 2021-08-24 08:20:02
Project open-YY-YY proj3 9.5 MB 20.0 TB 182 5,000,000 2021-08-24 08:20:02
Project open-ZZ-ZZ proj2 844.4 GB 20.0 TB 797 5,000,000 2021-08-24 08:20:02
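To judge how close a project is to its limits, the used and limit columns can be turned into a percentage. The helper below is a hypothetical sketch, not part of it4ifsusage; it expects both values in the same unit and does not parse the command's output.

```shell
# percent_used USED LIMIT -> prints usage as a percentage with one decimal place.
# USED and LIMIT must be plain numbers in the same unit (no thousands separators).
percent_used() {
    LC_ALL=C awk -v used="$1" -v limit="$2" 'BEGIN {
        if (limit <= 0) { print "n/a"; exit }
        printf "%.1f\n", 100 * used / limit
    }'
}

# Values from the open-XX-XX row above.
percent_used 3.9 20.0          # space:   3.9 TB of 20.0 TB      -> 19.5
percent_used 212377 5000000    # entries: 212,377 of 5,000,000   -> 4.2
```

Both quotas are enforced, so a project with many small files can reach its entries limit long before its space limit.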
The information can also be found in IT4Innovations' SCS information system.
Note
At this time, only PIs can see the quotas of their respective projects in IT4Innovations' SCS information system. We are working on making this information available to all users assigned to their projects.
Increasing Project Quotas
Preferably, request additional storage space in advance, in your application for computational resources. If the project is already active, contact IT4I support.
ACL and File Permissions
Access to a project directory and the files it contains is restricted by Unix file permissions and file access control lists (ACLs). Default file permissions and ACLs are set by IT4Innovations during project directory provisioning.
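The Unix permission part can be inspected with standard tools. The sketch below uses a hypothetical helper, describe_access, and only looks at the mode bits, owner, and group; it does not display ACLs, since the right tool for that depends on how the ACLs are exposed on the mount, which this page does not specify.

```shell
# describe_access DIR -> prints the long listing of DIR and its setgid status.
describe_access() {
    local dir="$1"
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    ls -ld -- "$dir"
    # On Linux, a setgid directory makes new files inherit the directory's group.
    if [ -g "$dir" ]; then
        echo "setgid: yes (new files inherit the directory's group)"
    else
        echo "setgid: no"
    fi
}
```

For example, describe_access /mnt/proj3/open-XX-XX would report on the placeholder project directory used elsewhere on this page.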
Backup and Safety
Data Backup
Data on the PROJECT storage is not backed up.
The PROJECT storage uses a fully redundant design, redundant devices, highly available services, data redundancy, and snapshots. For increased data protection, disks in each disk array are connected in Distributed RAID6 with two hot-spare disks, meaning the disk array can recover full redundancy after two simultaneous disk failures.
However, the storage does not provide data backup, so we strongly recommend using the CESNET storage for making independent copies of your data.
Snapshots
The PROJECT storage provides snapshot functionality. A snapshot represents the state of a filesystem at a particular point in time. Snapshots are created for all projects at the fileset (i.e. project directory) level. Snapshots are created every day, and snapshots older than seven days are deleted.
Files in snapshots are directly accessible to users in a special subdirectory of each project directory named .snapshots.
Snapshots are read-only.
Snapshot names have the YYYY-MM-DD-hhmmss format.
[vop999@login1.karolina ~]$ ls -al /mnt/proj3/open-XX-XX/.snapshots
total 4
dr-xr-xr-x. 2 root root 4096 led 14 12:14 .
drwxrws---. 16 vop999 open-XX-XX 4096 led 20 16:36 ..
drwxrws---. 16 vop999 open-XX-XX 4096 led 20 16:36 2021-03-01-022441
drwxrws---. 16 vop999 open-XX-XX 4096 led 20 16:36 2021-03-02-022544
drwxrws---. 16 vop999 open-XX-XX 4096 led 20 16:36 2021-03-03-022949
drwxrws---. 16 vop999 open-XX-XX 4096 led 20 16:36 2021-03-04-023454
drwxrws---. 16 vop999 open-XX-XX 4096 led 20 16:36 2021-03-05-024152
drwxrws---. 16 vop999 open-XX-XX 4096 led 20 16:36 2021-03-06-020412
drwxrws---. 16 vop999 open-XX-XX 4096 led 20 16:36 2021-03-07-021446
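Because snapshots are read-only, recovering a file means copying it out of the snapshot. The following is a minimal sketch with two hypothetical helpers. It assumes that each snapshot directory mirrors the project directory's layout, which the listing above suggests but does not state outright.

```shell
# latest_snapshot PROJECT_DIR -> prints the name of the newest snapshot.
# Names in YYYY-MM-DD-hhmmss format sort chronologically as plain strings.
latest_snapshot() {
    local snapdir="$1/.snapshots"
    [ -d "$snapdir" ] || { echo "no snapshot directory: $snapdir" >&2; return 1; }
    ls -1 "$snapdir" | sort | tail -n 1
}

# restore_from_snapshot PROJECT_DIR SNAPSHOT RELATIVE_PATH
# Copies the file out of the snapshot and places it next to the live file
# with a ".restored" suffix, so the current version is never overwritten.
restore_from_snapshot() {
    local project_dir="$1" snapshot="$2" relpath="$3"
    local src="$project_dir/.snapshots/$snapshot/$relpath"
    local dst="$project_dir/$relpath.restored"
    [ -f "$src" ] || { echo "not found in snapshot: $src" >&2; return 1; }
    cp -p -- "$src" "$dst"
}
```

A call such as restore_from_snapshot /mnt/proj3/open-XX-XX 2021-03-05-024152 results/run.log (with a made-up file path) would create results/run.log.restored in the project directory. Writing to a separate name is a deliberate choice: it lets you compare the two versions before replacing anything.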
Computing on PROJECT
I/O Intensive Jobs
Stage files for intensive I/O calculations onto the SCRATCH storage.
The PROJECT storage is not primarily intended for computing, and in the majority of cases it is strongly recommended to avoid using it directly for computing.
On the other hand, the PROJECT storage is accessible from compute nodes and can be used for computing jobs with low I/O demands when copying data to other storage for computing is not feasible or efficient. However, take care not to overload the storage, as this degrades performance for other users of the PROJECT storage or makes it unavailable.
For maximum performance, you should always copy the files of I/O intensive jobs onto the SCRATCH storage. The files should be copied to SCRATCH from Login nodes before submitting the job.
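The staging step can be sketched as a small helper run on a Login node before job submission. The function name stage_to_scratch is hypothetical, and the destination path is left to the caller because the layout of the SCRATCH storage is not described on this page.

```shell
# stage_to_scratch SRC_DIR DST_DIR
# Copies the contents of SRC_DIR (including hidden files) into DST_DIR,
# preserving modes and timestamps. DST_DIR is created if needed.
stage_to_scratch() {
    local src="$1" dst="$2"
    [ -d "$src" ] || { echo "source not found: $src" >&2; return 1; }
    mkdir -p "$dst" && cp -a -- "$src/." "$dst/"
}
```

Pass a subdirectory that holds the job's input rather than the project root: hidden entries are copied too, so staging the root would also pull in the .snapshots directory. Results written to SCRATCH still need to be copied back to the PROJECT storage after the job if they are to be kept with the project's data.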
Summary
PROJECT Storage | |
---|---|
Mountpoint | /mnt/proj{1,2,3} |
Capacity | 15 PB |
Throughput | 39 GB/s |
IO Performance | 57 kIOPS |
Default project space quota | 20 TB |
Default project inodes quota | 20 million |