Overview

The OURRstore system, an NSF-funded project, is a very cost-effective way to archive data that needs to be reliably kept. OURRstore uses LTO media, which has a bit error rate 10 times lower than spinning disk and an expected lifetime of at least 8 years. (The tapes theoretically can last longer if properly stored, but the hardware that reads the tapes might not.) OURRstore increases the reliability and safety of storage by creating redundant copies of the data. OURRstore procedures create at least 2 copies of archived data, though we strongly recommend using the 3 copies option. One copy stays in the OURRstore robot and is the 'online' copy, retrievable at any time. A second copy gets shipped back to UAMS, where we store it in an environmentally controlled, physically locked location, giving us an offsite backup of our data stored in OURRstore. In a pinch, we can recover data from the copy sent to us. The optional (but highly recommended) third copy is taken out of the robot and stored in an environmentally controlled storage facility at the University of Oklahoma Health Sciences Campus. This third copy allows the OURRstore team to generate replacements for the primary copy, should that be necessary, without the risk of shipping the backup copy kept at UAMS to Oklahoma.

OURRstore is the least expensive, most reliable archival storage option available to UAMS basic science researchers.

Costs to use OURRstore

The cost to use OURRstore is only the media costs (extremely low compared to other options) plus a small additional amount for shipping cartridges to and from Oklahoma.  All the equipment and management costs are covered by NSF and OU.  All a researcher has to do is make certain that their research project, their lab, their department, or someone has provided cartridges and return shipping materials to the OURRstore team.  The HPC Admins will assist researchers, labs, and departments in ordering the cartridges, to make certain that they get credited appropriately.

...

We cannot guarantee that your data is stored on the cartridges that you purchase. Instead, the usable storage capacity of the cartridges would be added to your quota for you to draw from, and your cartridges would be put into the pool of cartridges that we use for everybody. If you need tighter controls (i.e., your data and nothing else goes on the cartridges that you purchase), then you would need to use the "complicated" option for archiving data to OURRstore mentioned below (i.e., you would need to get your own OURRstore account and manage your own archiving, shipping, and storage instead of using the pooled services that the HPC admins provide).

Restrictions on data that can be archived on OURRstore

Being an NSF project, there are certain stipulations on the kind of data that can be placed in OURRstore.  OURRstore is intended for NON-CLINICAL STEM RESEARCH DATA that is NOT LEGALLY REGULATED.  Non-STEM data is currently FORBIDDEN on OURRstore, because OURRstore was funded by the NSF, and non-STEM data is outside the NSF’s mandate. 

...

Warning
If you decide to use OURRstore for archiving your data, you must ensure that your data complies with the above rules.


How to request storage on OURRstore

If you have a means for covering the costs and agree to the data restrictions of OURRstore, please send a request via e-mail to hpcadmin@uams.edu confirming that you agree to the terms, and requesting access.  We will then work with you in archiving your data.

How data is stored on OURRstore

Data stored on OURRstore should be collected into compressed archive files, preferably between 20 and 200 GB in size, for the best storage efficiency without excessive access times. Currently, the absolute minimum size of an archived file is 1 GB. The absolute maximum size is 1 TB. These archive files need to be created at UAMS prior to electronic transfer to the OURRstore system. The initial transfer is disk to disk, so it goes fairly quickly. Once the data is in the OURRstore disk cache, the OURRstore archive management software will start copying the data onto a media cartridge for safekeeping. When a cartridge is full, the system makes a copy of the cartridge and ejects it from the system, and the OURRstore team, using the prepaid label that we provide them, ships that copy back to us. We then store that copy offline in a locked, environmentally controlled location in Arkansas. If the optional third copy is requested, the OURRstore system makes that third copy, which is ejected from the system and stored in an environmentally controlled location in Oklahoma.
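The size limits above can be checked before an archive file is submitted. The following is only a minimal sketch: the function name and the category labels are made up for illustration, and it assumes decimal units (1 GB = 10^9 bytes), which this page does not specify.

```python
# Sketch only: checks an archive file size against the limits quoted above.
# ASSUMPTION: decimal units (1 GB = 10**9 bytes); the page does not say
# whether OURRstore counts in decimal or binary units.
GB = 10**9
TB = 10**12

MIN_BYTES = 1 * GB          # absolute minimum size
MAX_BYTES = 1 * TB          # absolute maximum size
PREFERRED_MIN = 20 * GB     # lower end of the preferred range
PREFERRED_MAX = 200 * GB    # upper end of the preferred range


def classify_archive_size(size_bytes: int) -> str:
    """Return 'too small', 'too large', 'preferred', or 'acceptable'."""
    if size_bytes < MIN_BYTES:
        return "too small"
    if size_bytes > MAX_BYTES:
        return "too large"
    if PREFERRED_MIN <= size_bytes <= PREFERRED_MAX:
        return "preferred"
    # Within the hard limits but outside the 20 to 200 GB sweet spot.
    return "acceptable"
```

For a file on disk, the size could be obtained with os.path.getsize(path) and passed to classify_archive_size.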

Using OURRstore

Archiving data to OURRstore

Because of the need to collect data into bundled, compressed archive files, we offer several options to assist users in archiving data. In the simplest option, you just create a directory tree or bucket where you place the data to be archived, and the HPC admins take care of bundling up the data for OURRstore. Or you can create the compressed archives yourself, and then ask the HPC Admins to transmit them to OURRstore using the pooled account. Power users who want complete control of the process are welcome to work directly with the OURRstore team to get trained in using OURRstore, get a private account, and manage the archiving process themselves (not recommended, but possible).

Simple option for data on Grace (possibly the research NAS)

For the simple option, all you need to do is collect your data to be archived into a sub-directory tree with just the files to be archived.  Please move the sub-directory under a parent directory named "ToBeArchived" in your home directory.  Please name (or rename) the sub-directory tree to be archived with the current date in "yyyy-mm-dd" format, for example "/home/john/ToBeArchived/2021-08-07/".  The sub-sub-directory tree under the dated subdirectory can be organized any way you see fit.  Please use the "mv" command, not "cp" or "rsync" to collect data, since you eventually want the data to disappear from Grace once it is safely in OURRstore, and you don't want to run into a space crunch while organizing your archive directory.  (Remember, Grace's storage is only supposed to be a temporary holding place for running jobs.) 
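The staging step described above could be scripted roughly as follows. This is a sketch under stated assumptions, not an official tool: the function name stage_for_archiving and its parameters are hypothetical, and it renames the source directory to the dated name, which is one reading of the naming rule above. It uses shutil.move, which behaves like "mv": a rename when source and destination are on the same filesystem, otherwise a copy followed by removal of the source.

```python
import shutil
from datetime import date
from pathlib import Path


def stage_for_archiving(source_dir, home_dir, today=None):
    """Move source_dir to <home_dir>/ToBeArchived/<yyyy-mm-dd>/ (hypothetical helper)."""
    today = today or date.today()
    parent = Path(home_dir) / "ToBeArchived"
    parent.mkdir(parents=True, exist_ok=True)

    # date.isoformat() yields the "yyyy-mm-dd" form, e.g. "2021-08-07".
    destination = parent / today.isoformat()
    if destination.exists():
        # Refuse to merge into an existing dated directory.
        raise FileExistsError(f"{destination} already exists")

    # Move rather than copy, so the data does not occupy space twice.
    shutil.move(str(source_dir), str(destination))
    return destination
```

For example, stage_for_archiving("/home/john/results", "/home/john") run on 2021-08-07 would leave the data in /home/john/ToBeArchived/2021-08-07/ (the "results" path here is illustrative).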

...

This method can also be used for data on the research NAS, assuming that the HPC admins can get access to that storage area on the NAS. For data on the NAS, just create the "ToBeArchived" directory somewhere in your storage area, and let the HPC admin team know the storage area and path to the directory when you submit the archive request.

Simple option for data on ROSS

For data on ROSS, the simplest way to archive that data is to collect the data to be archived into a bucket. When ready for archiving, you could optionally alter the permissions on the bucket to read-only to minimize the chance of accidental changes, if you wish. Then send a request via e-mail to hpcadmin@uams.edu giving them the name of the bucket to be archived and the namespace that the bucket lives in.

...

If you prefer to create your own compressed archive files ready for OURRstore, you could simply create them and place them in a bucket. In this case, when you e-mail hpcadmin@uams.edu the name of the bucket and namespace, let them know that you have already generated the compressed archive files in the bucket. Alternatively, you could use the "Slightly less simple option for data on Grace", temporarily placing copies of your archive files on Grace.

Slightly less simple option for data on Grace (possibly the research NAS)

If you would rather create the compressed tar files yourself, feel free to do so, and then collect the archive files into the top level of your "ToBeArchived" subdirectory in your home directory. In this option, feel free to use any method of your choosing (e.g., tar, zip, or some custom format) that can collect the data you want archived into files. The ideal choice should keep the archive files between 20 and 200 GB in size, though OURRstore will accept anything between 1 GB and 1 TB. We encourage you to use compression and encryption for efficiency and safety, but that is your choice. If you do encrypt, please safeguard your encryption key, since likely no one but you knows it. (The HPC admins would not know your encryption keys, for example.) In this option you are responsible for creating your own manifests of the content of your archive files, if desired. The names of the archive files must be globally unique. In other words, do not give any 2 archive files the same name. They are all going into a single directory in OURRstore, so the name of an archive file that you create must not clash with the name of any archive file that you previously created. Otherwise you run the risk of losing the previous archive file (i.e., it could get overwritten).
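As one illustration of this option, the sketch below builds a gzip-compressed tar file with a timestamped name and writes a plain-text manifest of its contents. Everything specific here is an assumption made for the example: the function name create_archive, the "label_timestamp.tar.gz" naming scheme, and the one-path-per-line manifest format are not prescribed by OURRstore or the HPC admins. A timestamp lowers the chance of a name clash but does not guarantee global uniqueness, so you should still track the names you have used. Encryption is not shown.

```python
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def create_archive(source_dir, output_dir, manifest_dir, label, stamp=None):
    """Create <label>_<stamp>.tar.gz plus a manifest (hypothetical helper)."""
    source = Path(source_dir)
    stamp = stamp or datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive_path = Path(output_dir) / f"{label}_{stamp}.tar.gz"

    names = []

    def record(tarinfo):
        # Remember every member name so it can go into the manifest.
        names.append(tarinfo.name)
        return tarinfo

    # Mode "x:gz" refuses to overwrite an existing file (FileExistsError),
    # which guards against clobbering an earlier archive of the same name.
    with tarfile.open(archive_path, "x:gz") as tar:
        tar.add(source, arcname=source.name, filter=record)

    # ASSUMED manifest format: one archived path per line.
    manifest_path = Path(manifest_dir) / (archive_path.name + ".manifest.txt")
    manifest_path.write_text("\n".join(names) + "\n")
    return archive_path, manifest_path
```

Keeping the manifest in a directory outside "ToBeArchived" means only the archive files themselves sit at the top level to be picked up, while the manifest stays with you for later lookups.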

...

This method can also be used for data on the research NAS, assuming that the HPC admins can get access to that storage area on the NAS. For data on the NAS, just create the "ToBeArchived" directory somewhere in your storage area, and let the HPC admin team know the storage area and path to the directory when you send the archive request e-mail.

Complicated Option

To exercise the complicated option, you would need to approach the OURRstore team directly, sign agreements, and go through the mandatory training to get your own account on OURRstore. You would then be responsible for following all of their rules, for purchasing your own media, for shipping to and from Oklahoma, for creating and tracking your own archive files, etc. This option really is for the power user who wants full control over the process of archiving and retrieving data with minimal or no assistance from the HPC admin team. This option is also appropriate for users whose data is on systems that the HPC Admins do not have access to. While we do not encourage people to use this option due to the complications and responsibility of getting your own account on OURRstore, it is a possibility for those who prefer it. For more information, see OURRstore: OU & Regional Research Store

Retrieving Data from OURRstore

How to retrieve archived data depends on which of the above options you used to archive it.

Simple option for data archived from Grace (possibly the research NAS)

When the HPC Admins archived data for you, they left manifests of which files or objects are in which compressed tar file in your /home/archived directory. You can search through those manifests (e.g., using grep) to find the archive file or files that contain the data you are interested in. Send a request via e-mail to hpcadmin@uams.edu with the name of the subdirectory tree that you would like retrieved and the name of the archive file or files it is in. If you lost the manifests, don't fret. The HPC Admins kept a backup copy and can help. In the case of a lost manifest, still send a request via e-mail to hpcadmin@uams.edu with the name of the subdirectory tree that you would like retrieved and the approximate date that it was archived. The HPC admins will do their best to find the archive file names in their manifest backup copies. However, since yours is the primary copy, please do not lose it, as there is always a chance that the backup copy gets lost as well.
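The manifest search mentioned above can be done with grep, or with a few lines of Python as sketched here. The manifest layout is an assumption: this page does not describe the file format, so the sketch supposes plain-text manifests with one archived path per line, all sitting in a single directory. The function name find_in_manifests is hypothetical.

```python
from pathlib import Path


def find_in_manifests(manifest_dir, pattern):
    """Map manifest file name -> lines containing pattern (hypothetical helper).

    ASSUMPTION: manifests are plain text, one archived path per line.
    """
    hits = {}
    for manifest in sorted(Path(manifest_dir).iterdir()):
        if not manifest.is_file():
            continue
        # errors="replace" keeps the scan going past odd bytes in file names.
        with manifest.open(errors="replace") as handle:
            matches = [line.rstrip("\n") for line in handle if pattern in line]
        if matches:
            hits[manifest.name] = matches
    return hits
```

The keys of the returned dictionary are the manifest files that mention the pattern, which in turn point to the archive file names to quote in the retrieval request.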

Once the HPC admins receive your e-mail, they will pull the pertinent archive files from OURRstore and restore the archived data to your /home/<username>/RestoredArchives directory.  In general, your data should be restored within 1 business/work day (i.e. things might not get restored on weekends and holidays).  The HPC admins will then notify you by e-mail that your data are restored.

Simple option for data archived from ROSS

Simply send a request via e-mail to hpcadmin@uams.edu with the name of the bucket that you want restored, and a name prefix of the objects that you want restored.  Leave the name prefix blank if you want the entire bucket restored.  You should also include the namespace where the bucket should be located.  

Once the HPC admins receive your request, they will restore the objects into the bucket in the namespace, and notify you when it is ready.

Slightly less simple option for data archived from Grace (possibly the research NAS)

In this option, since you created the archive files yourself, the HPC Admins did not create manifest files.  It is up to you to keep track of what data is in which archive file.  When you want to retrieve one of your archive files, send a request via e-mail to hpcadmin@uams.edu with the names of the archive files that you would like retrieved.  

...

Once you get notification that your archive files have been restored, you may then use whatever means you choose to pull data from those files. Don't forget that you may have compressed and encrypted the archive files before you archived them. Remember that the HPC Admins would not know the encryption key if you encrypted the files before archiving, and cannot help if you have lost it. So when you create encrypted archive files, do make certain that you safely store your encryption key.
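If the restored archive is an unencrypted tar file (optionally compressed), pulling out only part of it might look like the sketch below. It is one possible approach, not a required procedure: the function name extract_members is hypothetical, it handles regular files only, and it does not restore permissions or timestamps. Archives made with zip or a custom format, or encrypted archives, need their own tools.

```python
import shutil
import tarfile
from pathlib import Path


def extract_members(archive_path, prefix, dest_dir):
    """Extract regular files whose names start with prefix (hypothetical helper)."""
    dest = Path(dest_dir).resolve()
    extracted = []
    # "r:*" lets tarfile detect the compression (gzip, bzip2, xz, or none).
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar:
            if not member.isfile() or not member.name.startswith(prefix):
                continue
            target = (dest / member.name).resolve()
            # Refuse member names that would land outside dest_dir.
            if dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
            target.parent.mkdir(parents=True, exist_ok=True)
            with tar.extractfile(member) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            extracted.append(target)
    return extracted
```

Passing an empty prefix extracts every regular file in the archive.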

Complicated Option

You are in complete control of your retrieval of archived data, since the archived data is on your OURRstore account.  The HPC admins are not involved.