The server-side encryption can either be turned on at the namespace level, where all buckets in a namespace are required to be encrypted, or on the bucket level for namespaces that do not have 'encryption required' turned on.  If you want namespace level encryption, please inform the HPC admins when requesting a namespace.  The encryption choice must be made at namespace or bucket creation time, and cannot be changed afterwards.  (Should a change be needed after bucket creation, one can copy data from an unencrypted bucket to an encrypted one and then destroy the unencrypted bucket, or vice versa.)
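Should you need to make such a switch, any S3-capable tool can do the bucket-to-bucket copy.  Below is a minimal sketch using rclone (one of the tools recommended later in this article); it assumes an rclone remote named "ross" has already been configured for your namespace, and the bucket names are placeholders.

Code Block
languagebash
titleMoving data to an encrypted bucket with rclone (sketch)
# Assumes a remote named "ross" is already configured (see the rclone sketch
# later in this article) and that "mydata-encrypted" was created with
# encryption enabled.  Bucket names are placeholders.
rclone copy ross:mydata ross:mydata-encrypted --progress

# Verify the two buckets match before deleting anything
rclone check ross:mydata ross:mydata-encrypted

# Once satisfied, remove the old, unencrypted bucket and its contents
rclone purge ross:mydata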

...

Preparing to Use ROSS

Info
Before requesting access to ROSS, in addition to reading this article, please also read the Access to the Research Object Store System (ROSS) article on this wiki, since it describes key concepts needed to understand how storage is laid out and accessed. 

...

If you are good with the data restrictions and prepared to cover costs that might be incurred for using ROSS, there are steps that must be completed before you can actually move data between ROSS and Grace.  First, you must request or be assigned to a namespace.  Second, decide which APIs you might use.  Third, your namespace administrator may wish to pre-create, in the ECS Portal, the buckets that you might use; creating buckets in the ECS Portal can be more convenient than creating them using APIs or tools.  Finally, get access credentials for the APIs that you might use.


Requesting a namespace assignment on ROSS

All namespaces have namespace administrators who manage buckets and object users within that namespace.  A particular username@domain can only be a namespace administrator for one namespace.  If a particular person needs to be the namespace administrator in more than one namespace, for example a personal namespace as well as a group namespace, they must use different login names and domains.  This is why we suggest using your HPC credentials for personal namespaces, and other credentials (e.g. UAMS AD) for group namespaces.

Don't forget that a namespace administrator is not the same as an object user.  See Access to the Research Object Store System (ROSS) for the difference between a namespace administrator and an object user.  An object user has a different API-specific set of credentials.

The steps to gain access to a namespace differ, depending on whether the namespace already exists or not. 

New namespaces

To initiate access to ROSS for a new personal or group (e.g. project, lab, department) namespace, please send a request via e-mail to HPCAdmin@uams.edu.  In your request,

  • Please indicate whether this is for a personal namespace (e.g. primary storage for processing data on Grace), or for a group (shared storage). 
    • For a personal namespace, please indicate
      • your name
      • your e-mail
      • your departmental affiliation
      • the login name (not the password) and domain you will use to access ROSS's administrative interface, e.g. johndoe@hpc.uams.edu (for personal namespaces we prefer that you use your HPC username)
    • For a group namespace, please give
      • a name and brief description for the group
      • the primary e-mail contact for the group
      • the departmental affiliation of the group
      • who will be the namespace administrators - we need
        • their names
        • their e-mail addresses
        • their login names (not their passwords) and domains, e.g. janedoe@ad.uams.edu (for group namespaces we generally prefer campus Active Directory usernames, e.g. the name the namespace administrator might use to log in to Outlook Mail)
        • you may ask for more than one namespace administrator
        • if all the members of a particular campus AD group should be namespace administrators, you could also just give us the name and domain of the group instead of their individual names
  • Please estimate approximately how much storage you or your group intend to use in ROSS for the requested namespace, divided into local and replicated amounts.  Remember that replicated data costs twice as much as non-replicated data.  The HPC Admins will use this information in setting the initial quotas and for capacity planning.  You will be allowed to request increases in quota if needed and space is available. 
  • We would also appreciate a brief description of what you will be using the storage for.  The "what it is used for" assists us in drumming up support (and possibly dollars) for expanding the system. 

Existing namespaces

If you wish to access an existing group namespace as an object user, please contact the namespace administrator of that namespace and ask to be added as an object user for that namespace.  If you need assistance determining what namespaces are available and who the namespace administrator is, feel free to contact the HPC Admins via e-mail (hpcadmin@uams.edu).

If you supervised an employee or student who is no longer with you, for example due to a change of assignment or leaving the university, then you are responsible for the final disposition of the storage used by that person.  In this case you may request access to their personal namespace by sending that request to the HPC Admins via e-mail (hpcadmin@uams.edu).  If the person is still with UAMS, the HPC Admins will coordinate with the two of you, in case the person wishes to keep their namespace, including portions of the existing data.

APIs supported by ROSS

ROSS supports multiple object protocol APIs.  The most common is S3, followed by Swift (from the OpenStack family).  Although ROSS can support the NFS protocol, our experiments show that the native NFS support is slower for many use cases than other options that simply emulate a POSIX file system using object protocols.  But NFS is available for those who would like to use it.  ROSS can also support HDFS (used in Hadoop clusters) and a couple of Dell/EMC proprietary protocols (ATMOS and CAS); however, we have not tested any of these other protocols, so the HPC Admins can only offer limited support for them. 

...

The file-based protocols NFS and HDFS must be enabled on a bucket at creation time if you intend to use them, since the underlying system adds additional metadata when file access is enabled.  Once enabled, NFS and HDFS can be used interchangeably with the S3 and Swift object protocols on the same bucket.  However, a bucket with file access enabled loses some of the object features, such as life cycle management, so consider carefully before enabling them.  (You can still access a bucket using file system semantics with some external tools even when the bucket is not enabled for file access.)

Creating a bucket

There are three ways to create buckets (aka containers in the Swift world):

  1. Many object tools have options to create and manipulate buckets, which is convenient and generally portable (i.e. it would work on any object storage system), but also limited.  In general, tools cannot create buckets with ECS options, such as encryption, replication and file access; for those you have to use one of the other bucket creation options.  Object tools only need the object user's access credentials.  Please see the tools' documentation for details, or the sketch after this list for an example.
  2. The RESTful APIs, including S3, Swift, and an ECS-specific management API, can be used to create and manipulate buckets.  With the appropriate headers and parameters, ECS options can be enabled.  More information can be found in the ECS Data Access Guide and the ECS API Reference (hint - use the search function). 
    1. The protocol-specific S3 or Swift REST APIs need an object user's credentials, and the namespace administrator must have given appropriate permissions to the object user.
    2. The ECS-specific management APIs (for which we do not offer user support) require namespace administrator credentials.
  3. Use the ECS Portal.  This is the simplest option that gives full control over bucket characteristics.  The ECS Portal can only be accessed by the namespace administrator.  (That would be you, for your personal namespace.)  The namespace administrator logs into the ECS Portal (https://ross.hpc.uams.edu) with the credentials tied to that namespace.  Details on how to use the ECS Portal to manage buckets can be found in the Buckets chapter of the ECS Administration Guide.
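To make the first option concrete, here is a minimal sketch of creating a bucket with the AWS CLI used as a generic S3 client.  The endpoint URL is an assumption based on the ECS S3 settings shown later in this article (HTTP on port 9020 against the ross*.hpc.uams.edu nodes); substitute the correct endpoint for your setup, and use your object user's credentials from the next section.

Code Block
languagebash
titleCreating a bucket with a generic S3 tool (sketch)
# Credentials are the object user name (Access Key ID) and the S3 Secret Access
# Key from the ECS Portal (see "Getting access credentials" below).
export AWS_ACCESS_KEY_ID='<object-user-name>'
export AWS_SECRET_ACCESS_KEY='<s3-secret-key>'

# The endpoint is an assumption - replace it with the ROSS S3 endpoint you use.
aws s3 mb s3://my-new-bucket --endpoint-url http://ross01.hpc.uams.edu:9020

# Buckets created this way get default settings; ECS options such as encryption,
# replication, and file access still require the ECS Portal or management API.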

Getting access credentials

In order to use either S3 or Swift, you need credentials. 

The credentials are tied to a particular object user, who belongs in a particular namespace.  For personal namespaces, the HPC Admins have already added the owner as an object user as well as the namespace administrator.  For group namespaces, the namespace administrators may add object users to the namespace using the ECS Portal, following the instructions in the "Add an Object User" section of the ECS Administration Guide.  Note that object user names are not necessarily tied to any domain, though sometimes people add a domain-like suffix to differentiate object user names.  Although an object user name is only tied to one namespace, it needs to be unique across all namespaces within ROSS.  A domain-like suffix linked to the namespace (e.g. "@<namespace-name>") can help ensure uniqueness.

The namespace administrator can retrieve an object user's credentials by logging into the ECS Portal at https://ross.hpc.uams.edu using the credentials that are tied to that namespace for the namespace administrator.  Remember, the owner of a personal namespace is their own namespace administrator, and typically logs in with their HPC credentials in the form of <user>@hpc.uams.edu.  Once in the portal, click on "Manage" in the left side menu, then click on "Users".  Make certain that the "Object Users" tab is highlighted.  Then type into the "Search" box the object user name for whom you wish to retrieve credentials and click the search icon (a magnifying glass).  Find the line with the correct object user's name and the correct namespace name.  In the "Actions" column click the "Edit" button associated with the name.  This should bring you to the object user's profile.

For S3 credentials, click the "Show Secret Key" checkbox.  This should allow you to see, as well as copy to the clipboard, the S3 Secret Access Key.  The object user's name, visible at the top of the profile page, serves as the S3 Access Key ID.

For Swift credentials, note the "Swift Group" and "Swift Password" fields.  If this is a new object user, you may need to fill in the group and password for Swift.  Note that we do not currently run a Keystone server, hence only Swift v1 or v2 credentials work.
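Once you have the Access Key ID and Secret Access Key, most S3 tools accept them through a configuration step or environment variables.  A minimal sketch for checking that the credentials work, again using the AWS CLI as a generic S3 client with an assumed endpoint:

Code Block
languagebash
titleTesting S3 credentials (sketch)
# Placeholders: the object user name is the Access Key ID, and the Secret
# Access Key is the value copied from the ECS Portal.
export AWS_ACCESS_KEY_ID='<object-user-name>'
export AWS_SECRET_ACCESS_KEY='<s3-secret-key>'

# The endpoint is an assumption - replace it with the ROSS S3 endpoint you use.
# A successful call lists the buckets visible to this object user.
aws s3 ls --endpoint-url http://ross01.hpc.uams.edu:9020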

Using ROSS

After getting access credentials for your chosen APIs, you are ready to use ROSS.  

In experiments we have noticed that write times to ROSS are considerably slower than read times, and slower than many POSIX file systems.  In other words, it takes longer to store new data into ROSS than to pull existing data out of ROSS.  We also notice (as is typical of most file systems) that transfers of large objects go significantly faster than transfers of tiny objects.  Please keep these facts in mind when planning your use of ROSS - ROSS favors reads over writes and big things over little things.

In theory, a bucket can hold millions of objects.  We have noticed, though, that with path prefix searching, the more objects that share a particular prefix, the longer the search takes.  This is not dissimilar to POSIX filesystems, where the more files there are in a directory, the longer a directory lookup takes; however, because an object store is a flat namespace (no directory hierarchy), such searches can be much slower than on a typical POSIX file system.  The total number of objects in a bucket also has a mild, though not dramatic, impact on the lookup speed of particular objects in a bucket.  Again, keep this in mind when planning your use of ROSS.  One way to get around this speed penalty is to use an external database (e.g. sqlite) to track in which buckets and objects your data resides, as sketched below.
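As a sketch of that last suggestion, a small index kept outside ROSS can map your data sets to bucket names and object keys, so finding data never depends on a large prefix listing.  The table layout and names below are only an example.

Code Block
languagebash
titleTracking objects in a local SQLite index (sketch)
# One-time step: create a tiny index database
sqlite3 ~/ross-index.db "CREATE TABLE IF NOT EXISTS objects (
    dataset TEXT, bucket TEXT, object_key TEXT, bytes INTEGER, uploaded TEXT);"

# Record each object as you upload it
sqlite3 ~/ross-index.db "INSERT INTO objects VALUES (
    'example-dataset', 'example-bucket', 'run42/sample7.dat', 1073741824, '2021-06-01');"

# Later, find where a data set lives without listing the bucket
sqlite3 ~/ross-index.db "SELECT bucket, object_key FROM objects WHERE dataset = 'example-dataset';"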

Tool recommendations for accessing ROSS

The bottom line - any tool that uses any of the supported protocols theoretically could work with ROSS.  There are a number of free and commercial tools that work nicely with object stores like ROSS.  Like most things in life, there are pros and cons to different tools.  Here is some guidance as to what we look for in tools.

Because ROSS is inherently parallel (remember the 23+ storage nodes), the best performance is gained when operations proceed in parallel.  Keep that in mind if building your own scripts (e.g. using ECS's variant of s3curl) or when comparing tools.  More parallelism (i.e. multiple transfers happening simultaneously across multiple storage nodes) generally yields better performance, up to a limit.  Eventually the parallel transfers become limited by other factors such as memory or network bandwidth constraints.  Generally there is a 'sweet spot' in the number of parallel transfer threads that can run before they step on each other's toes. 

The S3 protocol includes support for multipart upload of objects, where a large object upload can be broken into multiple pieces and transferred in parallel.  Tools that support multipart upload likely will have better performance than tools that don't.  In addition, tools that support moving entire directory trees and that work in parallel will have better performance than tools that move files one object at a time.
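To illustrate what such knobs look like in practice, the AWS CLI (one generic S3 client) exposes its parallelism and multipart settings as shown below; the values are only a starting point for finding your own sweet spot.

Code Block
languagebash
titleTuning parallelism and multipart settings in the AWS CLI (sketch)
# Number of simultaneous transfer requests (the parallelism 'sweet spot' knob)
aws configure set default.s3.max_concurrent_requests 16

# Objects larger than the threshold are split into multipart uploads of the
# given chunk size and transferred in parallel
aws configure set default.s3.multipart_threshold 64MB
aws configure set default.s3.multipart_chunksize 64MB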

After evaluating several tools, the HPC admins settled on two tools, ecs-sync and rclone, as 'best of breed' for moving data between ROSS and Grace's cluster storage system, where the /home, /scratch, and /storage directories live.  The ecs-sync program is the more efficient and faster of the two for bulk data moves.  It consumes fewer compute resources and less memory than rclone, and when properly tweaked for the number of threads (i.e. when the sweet spot is found) it moves data significantly faster than rclone.  The rclone program has more features than ecs-sync, including ways to browse data in ROSS, to mount a ROSS bucket as if it were a POSIX file system, and to synchronize content using a familiar rsync-like command syntax.  While ecs-sync is great for fast, bulk moves, rclone works very well for nuanced access to ROSS and small transfers.
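To give a feel for rclone's side of that trade-off, here is a minimal sketch of defining and using an rclone remote for ROSS.  The remote name, bucket names, and endpoint are placeholders (the endpoint is an assumption based on the ECS S3 settings shown later); substitute your object user's credentials and the correct ROSS S3 endpoint.

Code Block
languagebash
titleUsing rclone with ROSS (sketch)
# One-time: define an S3-type remote named "ross" (endpoint is an assumption)
rclone config create ross s3 provider Other \
    access_key_id '<object-user-name>' secret_access_key '<s3-secret-key>' \
    endpoint 'http://ross01.hpc.uams.edu:9020'

# Browse: list buckets, then the objects under a prefix
rclone lsd ross:
rclone ls ross:mybucket/some/prefix

# Synchronize a directory tree into a bucket (rsync-like), with parallel transfers
rclone sync /scratch/myuser/results ross:mybucket/results --transfers 8 --progress

# Mount a bucket as if it were a POSIX file system (unmount with fusermount -u)
mkdir -p ~/ross-mnt
rclone mount ross:mybucket ~/ross-mnt --read-only --daemon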

ecs-sync

The ecs-sync program is specifically designed for moving data from one storage technology to another.  It comes from the Dell/EMC support labs, and is what EMC support engineers use for migrating data.  Due to security issues, we do not offer the job-oriented service mode described in the ecs-sync documentation.  (If the ecs-sync maintainers ever fixed the security holes, we could reconsider.)  Instead we only support running ecs-sync in what its documentation calls "Alternate (legacy) CLI execution", where a user runs ecs-sync as a command, instead of queuing up a job.  This allows the command to be run in the context of the user's login, honoring permissions.  One side effect is that the copied data could end up being owned by the user running the command, regardless of who owned the source data, which might not be desirable for all cases.

A command alias exists on Grace's login and compute nodes for running ecs-sync.  The alias actually runs the command on a data transfer node, so it does not bog down the node on which the command is issued.  In other words, feel free to use ecs-sync from the login node, from the command prompt in Grace's Open OnDemand portal, or even from a compute job if needed, though it is somewhat wasteful of resources to run ecs-sync from a compute job. 

The syntax for calling ecs-sync interactively on Grace is:

Code Block
titleInteractive ecs-sync command
ecs-sync --xml-config <config-file>.xml

The <config-file>.xml file contains the instructions for what to move, where to move it, how many threads to use, and which storage nodes to use.  Of course, replace <config-file> with a name of your choosing.  We describe the XML config file contents later.

When running interactively, you see all the messages coming back from ecs-sync as it does its job, giving instant feedback.  But if the shell session dies, the command may stop.  To avoid this, you can use nohup to run ecs-sync as a background job that continues after you log out, redirecting standard out/err into a file, for example by using the following syntax:

Code Block
titleRunning ecs-sync in the background
nohup ecs-sync --xml-config <config-file>.xml > <log-file>.log &

Again, replace <config-file> and <log-file> with any name you wish.

The most complicated task in using ecs-sync is setting up the xml config file.  The config file consists of three parts: 

  • the global settings define various parameters of how the sync should be run, such as buffer sizes, number of retries, number of parallel threads, etc. that are not specific to either the source or the target
  • the source settings define where the data to be copied is coming from, including the type of storage, the location of the data, and access information
  • the target settings define where the data being copied should be stored, including the type of storage, the destination bucket or directory, and other access related information

The basic layout for the xml config file is as follows:

Code Block
languagexml
titleOverview of config file
<syncConfig xmlns="http://www.emc.com/ecs/sync/model">
    <options>
        <!-- ... the global settings ... -->
    </options>

    <source>
        <filesystemConfig>
            <!-- ... file copy settings ... -->
        </filesystemConfig>
    </source>

    <target>
        <ecsS3Config>
            <!-- ... ecs s3 copy settings ... -->
        </ecsS3Config>
    </target>
</syncConfig>

Note that the contents of <source> and <target> are interchangeable, and the order of the blocks doesn't matter.  So the direction of the copy can be switched by simply swapping the <source> and <target> keywords.

Although ecs-sync knows about other types of storage, this guide is for moving data between Grace's storage and ROSS, so we only describe the filesystem (for Grace's storage) and ecsS3 (for ROSS's storage) types.  If you are interested in other storage types, please consult the ecs-sync documentation.

The easiest way to create an xml config file is to copy an existing template and just change the pertinent values.  To help guide you, here are examples of the three sections with descriptions of the parameters.

Global Options

This example shows the available global parameter options.  These are typical values, and may be changed.  The important options are deleteSource, forceSync, recursive, retryAttempts, threadCount, verify and verifyOnly.

Code Block
languagexml
titleGlobal Options
<syncConfig xmlns="http://www.emc.com/ecs/sync/model">
    <options>
        <bufferSize>524288</bufferSize>
        <!-- <dbConnectString>dbConnectString</dbConnectString>
        <dbEncPassword>ChangeMe1234</dbEncPassword> -->
        <dbFile>~/ecs-dbFile</dbFile>
        <dbTable>ecs-dbTable</dbTable>
        <deleteSource>false</deleteSource>
        <estimationEnabled>true</estimationEnabled>
        <forceSync>false</forceSync>
        <ignoreInvalidAcls>false</ignoreInvalidAcls>
        <logLevel>quiet</logLevel>
        <monitorPerformance>true</monitorPerformance>
        <recursive>true</recursive>
        <rememberFailed>false</rememberFailed>
        <retryAttempts>2</retryAttempts>
        <!-- <sourceListFile>sourceListFile</sourceListFile> -->
        <syncAcl>false</syncAcl>
        <syncData>true</syncData>
        <syncMetadata>true</syncMetadata>
        <syncRetentionExpiration>false</syncRetentionExpiration>
        <threadCount>24</threadCount>
        <timingWindow>1000</timingWindow>
        <timingsEnabled>false</timingsEnabled>
        <verify>false</verify>
        <verifyOnly>false</verifyOnly>
    </options>


File System Options

This example shows the file system as a source.  It could also be a target.  The example shows typical values, which generally need to change to fit your situation.

Code Block
languagexml
titleFile System as Source
    <source>
        <filesystemConfig>
            <path>/home/user/the/subdirectory/to/copy</path>
            <followLinks>true</followLinks>
            <storeMetadata>false</storeMetadata>
        </filesystemConfig>
    </source>


Ross (ECS S3) Options

ROSS uses the ECS S3 storage system type.  This example shows ROSS as a target (destination).  It could also be a source.  The example shows typical values, some of which must be changed for the sync to work, such as accessKey, bucketName, keyPrefix, and secretKey.

Code Block
titleROSS ecs S3 as Target
    <target>
        <ecsS3Config>
            <accessKey>accessKey</accessKey>
            <apacheClientEnabled>false</apacheClientEnabled>
            <bucketName>bucketName</bucketName>
            <createBucket>false</createBucket>
            <decodeKeys>false</decodeKeys>
            <enableVHosts>false</enableVHosts>
            <geoPinningEnabled>false</geoPinningEnabled>
            <!-- <host>host</host> -->
            <includeVersions>false</includeVersions>
            <keyPrefix>keyPrefix</keyPrefix>
            <mpuEnabled>true</mpuEnabled>
            <mpuPartSizeMb>128</mpuPartSizeMb>
            <mpuThreadCount>4</mpuThreadCount>
            <mpuThresholdMb>512</mpuThresholdMb>
            <port>9020</port>
            <preserveDirectories>false</preserveDirectories>
            <protocol>http</protocol>
            <remoteCopy>false</remoteCopy>
            <resetInvalidContentType>true</resetInvalidContentType>
            <secretKey>secretKey</secretKey>
            <smartClientEnabled>true</smartClientEnabled>
            <socketConnectTimeoutMs>15000</socketConnectTimeoutMs>
            <socketReadTimeoutMs>60000</socketReadTimeoutMs>
            <urlEncodeKeys>false</urlEncodeKeys>
            <vdcs>uams(ross01.hpc.uams.edu,ross02.hpc.uams.edu,ross03.hpc.uams.edu,ross04.hpc.uams.edu,ross05.hpc.uams.edu,ross06.hpc.uams.edu,ross07.hpc.uams.edu,ross08.hpc.uams.edu,ross09.hpc.uams.edu,ross10.hpc.uams.edu,ross11.hpc.uams.edu,ross12.hpc.uams.edu,ross13.hpc.uams.edu,ross14.hpc.uams.edu,ross15.hpc.uams.edu)</vdcs>
        </ecsS3Config>
     </target>