PROJ4 Object Storage

OpenStack Swift is an open-source, distributed object storage system for storing and retrieving large amounts of unstructured data. It is designed to be highly available and fault-tolerant, which makes it suitable as a storage backend for applications and services.

Swift is modular and exposes a RESTful API that can be called from a variety of programming languages, so it is easy to integrate with existing applications.

Swift scales horizontally to handle large data volumes and high levels of traffic. For durability, data is replicated across multiple nodes so that it remains available.

Accessing PROJ4

PROJ4 is accessible from the login nodes of all IT4Innovations clusters as well as from the outside. It can also be used to share data across clusters.

To use the S3 storage, you must be a member of a project that is allowed to use it. If you have not received your S3 credentials after your project was created, send a request to support@it4i.cz asking for "S3 PROJECT ACCESS" and state your IT4I login and the name of your project (in the OPEN-XX-YY format or similar). An active role on the S3 storage will then be created for you, and you will receive the credentials for the S3 storage by email.

How to Configure S3 Client

$ sudo yum install s3cmd -y  # on Debian-based systems, use apt-get instead

$ s3cmd --configure
.
.
.
.
  Access Key: ***your_access***
  Secret Key: ***your_secret_key***
  Default Region: US
  S3 Endpoint: obj.proj4.it4i.cz
  DNS-style bucket+hostname:port template for accessing a bucket: obj.proj4.it4i.cz
  Encryption password: RANDOM
  Path to GPG program: /usr/bin/gpg
  Use HTTPS protocol: True
  HTTP Proxy server name:
  HTTP Proxy server port: 0
.
.
.

Configuration saved to '/home/IT4USER/.s3cfg'
.
.

Note that you should set the Encryption password yourself instead of using the value "RANDOM".
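After the configuration is saved, it can be useful to confirm that the endpoint settings ended up in the file. The following is a minimal sketch, assuming that the answers to "S3 Endpoint", "DNS-style bucket+hostname:port template" and "Use HTTPS protocol" are stored under the `host_base`, `host_bucket` and `use_https` keys of the s3cmd configuration file. The function name `show_s3_endpoint` is hypothetical and not part of s3cmd.

```shell
# Print the endpoint-related settings stored in an s3cmd configuration file.
# Usage: show_s3_endpoint [CONFIG_FILE]   (defaults to ~/.s3cfg)
# NOTE: show_s3_endpoint is a hypothetical helper, not an s3cmd command.
show_s3_endpoint() {
    local cfg="${1:-$HOME/.s3cfg}"
    if [ ! -r "$cfg" ]; then
        echo "cannot read configuration file: $cfg" >&2
        return 1
    fi
    # Keep only the lines that define the endpoint and the protocol.
    grep -E '^(host_base|host_bucket|use_https)[[:space:]]*=' "$cfg"
}
```

Calling `show_s3_endpoint` without arguments inspects `~/.s3cfg`. For the configuration shown above, both host entries should point to obj.proj4.it4i.cz.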

Next, create a bucket for your data with a storage policy that references your project (e.g. OPEN-XX-YY, provided that OPEN-XX-YY is your active and eligible project). If you create a bucket without a policy, we will not be able to manage your project's data expiration and you might lose the data before the end of your project, so always use the policy.

~ $ s3cmd --add-header=X-Storage-Policy:OPEN-XX-YY mb s3://test-bucket

~ $ s3cmd put test.sh s3://test-bucket/
upload: 'test.sh' -> 's3://test-bucket/test.sh'  [1 of 1]
1239 of 1239   100% in    0s    19.59 kB/s  done

~ $ s3cmd ls
2023-10-17 13:00  s3://test-bucket

~ $ s3cmd ls s3://test-bucket
2023-10-17 13:09      1239   s3://test-bucket/test.sh
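Because a bucket created without a policy cannot be managed for data expiration, it may help to wrap bucket creation in a small function that refuses to run without a project ID. This is only a sketch: the function name `make_project_bucket` and the `S3CMD` override variable (which lets you substitute another command for `s3cmd`, for example for a dry run) are hypothetical.

```shell
# Create a bucket bound to a project storage policy; refuse to run without one.
# Usage: make_project_bucket PROJECT_ID BUCKET_NAME
# NOTE: make_project_bucket and the S3CMD variable are hypothetical helpers.
make_project_bucket() {
    local project="${1:-}" bucket="${2:-}"
    if [ -z "$project" ] || [ -z "$bucket" ]; then
        echo "usage: make_project_bucket PROJECT_ID BUCKET_NAME" >&2
        return 2
    fi
    # Same command as in the example above, with the policy always set.
    "${S3CMD:-s3cmd}" --add-header="X-Storage-Policy:${project}" mb "s3://${bucket}"
}
```

For example, `make_project_bucket OPEN-XX-YY test-bucket` runs the same `mb` command as shown above.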

Permissions cannot be set for all members of a project at once, so you have to set them for each user individually. Only the owner of the bucket can set permissions.

~ $ s3cmd setacl s3://test-bucket/test1.log --acl-grant=full_control:user1
    s3://test-bucket/test1.log: ACL updated

~ $ s3cmd setacl s3://test-bucket/test1.log --acl-grant=full_control:user2
    s3://test-bucket/test1.log: ACL updated

~ $ s3cmd setacl s3://test-bucket/test1.log --acl-grant=read:user3
    s3://test-bucket/test1.log: ACL updated

~ $ s3cmd setacl s3://test-bucket/test1.log --acl-grant=write:user4
    s3://test-bucket/test1.log: ACL updated

~ $ s3cmd info s3://test-bucket/test1.log
    s3://test-bucket/test1.log (object):
       File size: 1024000000
       Last mod:  Mon, 09 Oct 2023 08:06:12 GMT
       MIME type: application/xml
       Storage:   STANDARD
       MD5 sum:   b5c667a723a10a3485a33263c4c2b978
       SSE:       none
       Policy:    none
       CORS:      none
       ACL:       OBJtest:user2: FULL_CONTROL
       ACL:       *anon*: READ
       ACL:       user1: FULL_CONTROL
       ACL:       user2: FULL_CONTROL
       ACL:       user3: READ
       ACL:       user4: WRITE
       URL:       http://195.113.250.1:8080/test-bucket/test1.log
       x-amz-meta-s3cmd-attrs: atime:1696588450/ctime:1696588452/gid:1001/gname:******/md5:******/mode:33204/mtime:1696588452/uid:******/uname:******
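Since permissions have to be granted user by user, a loop can save some typing when several users need the same permission. The sketch below repeats the `setacl` call shown above once per user. The function name `grant_acl` and the `S3CMD` override variable are hypothetical.

```shell
# Grant one permission on one object to several users, one setacl call per user.
# Usage: grant_acl PERMISSION S3_URI USER...
# NOTE: grant_acl and the S3CMD variable are hypothetical helpers.
grant_acl() {
    if [ "$#" -lt 3 ]; then
        echo "usage: grant_acl PERMISSION S3_URI USER..." >&2
        return 2
    fi
    local perm="$1" uri="$2" user
    shift 2
    for user in "$@"; do
        # Stop at the first failure so that a problem is not silently skipped.
        "${S3CMD:-s3cmd}" setacl "$uri" --acl-grant="${perm}:${user}" || return 1
    done
}
```

For example, `grant_acl read s3://test-bucket/test1.log user3 user5` would grant read access to both users. As noted above, only the owner of the bucket can run it successfully.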

Access to Multiple Projects

If you need to access the data of multiple projects, ask IT4I support for separate credentials for each additional project. If credentials were not assigned to you when the project was activated, send a request to support@it4i.cz.

First, rename your current S3 configuration file so that it uniquely identifies your current project, or organize it on your local storage accordingly.

$ mv /home/IT4USER/.s3cfg /home/IT4USER/.s3cfg-OPEN-XX-YY

Then create a new S3 configuration for the additional project (e.g. OPEN-AA-BB).

$ s3cmd --configure

Rename or organize your newly created configuration file.

$ mv /home/IT4USER/.s3cfg /home/IT4USER/.s3cfg-OPEN-AA-BB

When accessing the data of a different project, specify the matching configuration file with the -c option of the S3 commands.

~ $ s3cmd -c /home/IT4USER/.s3cfg-OPEN-AA-BB --add-header=X-Storage-Policy:OPEN-AA-BB mb s3://test-bucket

~ $ s3cmd -c /home/IT4USER/.s3cfg-OPEN-AA-BB put test.sh s3://test-bucket/
upload: 'test.sh' -> 's3://test-bucket/test.sh'  [1 of 1]
1239 of 1239   100% in    0s    19.59 kB/s  done

~ $ s3cmd -c /home/IT4USER/.s3cfg-OPEN-AA-BB ls
2023-10-17 13:00  s3://test-bucket

~ $ s3cmd -c /home/IT4USER/.s3cfg-OPEN-AA-BB ls s3://test-bucket
2023-10-17 13:09      1239   s3://test-bucket/test.sh
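Typing the full configuration path for every command is error-prone, so a small wrapper that selects the file by project ID may be convenient. This is a sketch that assumes the configuration files are named `.s3cfg-<PROJECT_ID>` in your home directory, as in the steps above. The function name `s3proj` and the `S3CFG_DIR` and `S3CMD` override variables are hypothetical.

```shell
# Run s3cmd with the configuration file that belongs to the given project.
# Usage: s3proj PROJECT_ID S3CMD_ARGUMENTS...
# NOTE: s3proj, S3CFG_DIR and S3CMD are hypothetical helpers; configuration
# files are assumed to be stored as <dir>/.s3cfg-<PROJECT_ID>.
s3proj() {
    local project="${1:-}"
    if [ -z "$project" ]; then
        echo "usage: s3proj PROJECT_ID S3CMD_ARGUMENTS..." >&2
        return 2
    fi
    shift
    local cfg="${S3CFG_DIR:-$HOME}/.s3cfg-${project}"
    if [ ! -r "$cfg" ]; then
        echo "no configuration found for ${project}: ${cfg}" >&2
        return 1
    fi
    # Pass all remaining arguments to s3cmd unchanged.
    "${S3CMD:-s3cmd}" -c "$cfg" "$@"
}
```

With this in place, `s3proj OPEN-AA-BB ls s3://test-bucket` corresponds to the last command shown above.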

Bugs & Features

By default, the s3cmd client uses "multipart upload", which splits the uploaded file into chunks with a default size of 15 MB. However, this upload method has major implications for the data capacity of the filesystem/fileset when overwriting existing files. When an existing file is overwritten in multipart mode, the allocated capacity is doubled: the file is not overwritten in place but written again, and the original data remains, so both copies count against the capacity. This is a documented Swift bug for which there is no fix yet. As a workaround, disable multipart upload on the s3cmd client side.

~ $ s3cmd --disable-multipart put /install/test1.log s3://test-bucket1
upload: '/install/test1.log' -> 's3://test-bucket1/test1.log'  [1 of 1]
 1024000000 of 1024000000   100% in    9s    99.90 MB/s  done

This method is not recommended for large files because it is not as fast and reliable as multipart upload, but it is the only way to overwrite files without duplicating capacity.
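One way to limit the slower single-request upload to the cases that need it is to disable multipart only when the target object already exists. The sketch below assumes that `s3cmd info` exits with a non-zero status when the object does not exist; verify this with your s3cmd version before relying on it. The function name `put_no_duplicate` and the `S3CMD` override variable are hypothetical.

```shell
# Upload FILE to OBJECT_URI. If the object already exists, disable multipart
# so that the overwrite does not duplicate the allocated capacity.
# Usage: put_no_duplicate FILE OBJECT_URI   (e.g. s3://test-bucket1/test1.log)
# NOTE: put_no_duplicate and the S3CMD variable are hypothetical helpers.
# ASSUMPTION: "s3cmd info" returns non-zero when the object is missing.
put_no_duplicate() {
    local file="${1:-}" uri="${2:-}"
    if [ -z "$file" ] || [ -z "$uri" ]; then
        echo "usage: put_no_duplicate FILE OBJECT_URI" >&2
        return 2
    fi
    if "${S3CMD:-s3cmd}" info "$uri" >/dev/null 2>&1; then
        # Existing object: overwrite it in a single request.
        "${S3CMD:-s3cmd}" --disable-multipart put "$file" "$uri"
    else
        # New object: the default (multipart) upload is safe to use.
        "${S3CMD:-s3cmd}" put "$file" "$uri"
    fi
}
```

Note that the second argument must be the full object URI, not only the bucket, so that the existence check looks at the object that would be overwritten.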