WIP data-mover #2474
# Data-mover

Data-mover is a tool for moving data between the local filesystems of Puhti and
Mahti and the Allas and LUMI-O object storage servers in cases where
[simple transfers](../faq/how-to-move-data-between-puhti-and-allas.md#move-data-with-rclone)
are not practical, either because the dataset contains many small files or
because it is large.

The data-mover tool `dm` aims to be simple to use while still handling the hard
corner cases. It is essentially a wrapper around the
[Restic backup tool](https://restic.readthedocs.io), and it stores the data in
Restic repositories. Restic in turn uses [Rclone](https://rclone.org) for the
actual data transfers to and from the object storage servers. In addition, `dm`
runs the data transfers in the background as batch jobs, which allows larger
transfers than would be practical in a regular interactive login session.

Below is a guide for a simple scenario: moving data from a Puhti project scratch
directory to the corresponding project in Allas, and then back. The same
instructions work on Mahti. LUMI-O can be used as well, but it works slightly
differently and requires more to be specified. Please have a look at `dm help`
and `dm <sub-command> --help` for additional documentation.

similar what works? Unclear

I guess you mean exact same instructions work on mahti, which is true. Lumi-O is slightly different, need to specify more

These object storages are really bad from the perspective of traditional HPC use. The mapping between filesystem to object storage is far from 1-to-1, object storage is completely separate machine with its own authentication and authorisation, there are many different transfer tools/clients, APIs, and object storage server configurations, all different and often incompatible, instead of OS just handling it... I started writing it all out, noticed that it would be a long article, wrote TLDR text (what it is now), and deleted the start of the more complete guide. This tool is supposed to be easy to use. If the documentation is long, it means the tool is not easy to use :)

I'll see if I can redirect the reader quicker to more comprehensive docs for using other services than puhti and allas

Ok. I reread the doc. Similar is exactly how it is. Very unclear, but truthfully so :)

you could just say that easy instructions work the same way in Mahti. Cross service usage, using Lumi-O instead of allas is possible, please go read the advanced section if you are interested

## Setting up the connection from Puhti to Allas

1. Your CSC project needs to have the Allas service enabled. If it is not
already enabled, the project PI can add Allas to the project in
[my.csc.fi](https://my.csc.fi), and the project members then need to
[accept the service terms](../../accounts/how-to-add-service-access-for-project.md).

2. Create an rclone configuration and store the authentication token in the
file `$HOME/.config/rclone/rclone.conf` in Puhti. This is easiest to do from the
[Puhti web interface](https://puhti.csc.fi): open "Cloud storage configuration"
from the "Tools" drop-down menu, and
[create an Allas S3 rclone configuration for the project](../../computing/webinterface/file-browser.md#accessing-allas-and-lumi-o).

Ok. So this uses the Open OnDemand style S3 configuration. This will be a bit confusing for the old users of S3 Allas.

3. Open a terminal to Puhti, and load the data-mover tool `dm` with
```
module load .data-mover
```

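Step 2 should leave an rclone configuration in `$HOME/.config/rclone/rclone.conf`. A quick way to confirm this before running `dm` is to list the remote names defined in that file. The helper below is a hypothetical sketch, not part of `dm`; it assumes the standard rclone INI layout, where each remote starts with a `[name]` section header. If the `rclone` command is available, `rclone listremotes` reports the same names.

```shell
# Hypothetical helper (not part of dm): print the remote names defined in an
# rclone configuration file. Defaults to the file created in step 2.
# Usage: list_rclone_remotes [CONFIG_FILE]
list_rclone_remotes() {
    conf="${1:-$HOME/.config/rclone/rclone.conf}"
    if [ ! -f "$conf" ]; then
        echo "No rclone configuration found at $conf" >&2
        return 1
    fi
    # Each remote starts with a "[name]" section header; print the name only
    sed -n 's/^\[\(.*\)]$/\1/p' "$conf"
}
```

An empty result, or the "No rclone configuration found" message, suggests that step 2 has not been completed yet.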
## Moving data from Puhti to Allas

1. Put the data in a single directory, for example
`/scratch/project_<projid>/exampledir` in Puhti, _deleting all the files that
you do not need_. There is no need to compress the files.

2. Move the data to Allas with
```
dm export /scratch/project_<projid>/exampledir
```

3. Check the status of the data transfer with
```
dm status
```

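Step 1 asks you to clean up the directory before exporting it, and the introduction notes that many small files or a large total size are what make simple transfers impractical. The following sketch, a hypothetical helper that is not part of `dm`, uses standard `find` and `du` to report the number of files, how many of them fall below a size threshold (1024 KiB by default, an arbitrary choice), and the total disk usage. For a more thorough inventory, see the Lue tool listed under related material.

```shell
# Hypothetical helper (not part of dm): summarise a directory before export.
# Usage: summarize_dir DIR [LIMIT_KIB]
#   e.g. summarize_dir /scratch/project_<projid>/exampledir
summarize_dir() {
    dir="$1"
    limit_kib="${2:-1024}"   # size threshold for "small" files, in KiB
    # Count regular files (assumes file names contain no newlines)
    total=$(find "$dir" -type f | wc -l)
    # -size -Nk matches files whose size, rounded up to whole KiB, is below N
    small=$(find "$dir" -type f -size -"${limit_kib}"k | wc -l)
    # Total disk usage in human-readable form, e.g. "1.5G"
    usage=$(du -sh "$dir" | cut -f1)
    echo "files: $total"
    echo "small files (< ${limit_kib} KiB): $small"
    echo "disk usage: $usage"
}
```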
## Listing the data in Allas

```
dm list
```

## Moving data from Allas to Puhti

Import the data back to the original directory with
```
dm import /scratch/project_<projid>/exampledir
```

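This page does not say how `dm import` treats files that already exist in the target directory (`dm import --help` may cover it). Until that is clear, a cautious approach is to make sure the target directory does not yet contain anything. The hypothetical helper below, which is not part of `dm`, checks this with `find`; hidden files and subdirectories count as content, and a directory that does not exist yet counts as empty.

```shell
# Hypothetical helper (not part of dm): succeed if DIR is missing or empty.
# Usage: target_is_empty DIR
#   e.g. target_is_empty /scratch/project_<projid>/exampledir && echo "target is empty"
target_is_empty() {
    [ -e "$1" ] || return 0   # a directory that does not exist yet counts as empty
    [ -d "$1" ] || return 1   # an existing non-directory is not a valid target
    # -mindepth 1 skips DIR itself; -print -quit stops at the first entry found,
    # so hidden files and subdirectories also count as content
    [ -z "$(find "$1" -mindepth 1 -print -quit)" ]
}
```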
What happens for the overlapping files in exampledir?

How do you remove old exports?

What do you mean by overlapping files? Deleting an export from Allas means you moved something that you should have simply deleted in the first place :D Ok, there is

## Links to related material

- [Lue tool for data inventory](lue.md)
- [Data cleaning](clean-up-data.md)
- [Allas introduction](../../data/Allas/introduction.md)