HDFS Snapshots are read-only point-in-time copies of the file system. Snapshots can be taken on a subtree of the file system or the entire file system and are:
- Performant and Reliable: Snapshot creation is atomic and instantaneous, no matter the size or depth of the directory subtree
- Scalable: Snapshots do not create extra copies of blocks on the file system. Snapshots are highly optimized in memory and stored along with the NameNode’s file system namespace
Allow snapshots on a directory
$ hdfs dfsadmin -allowSnapshot /data/myfolder
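To confirm that the directory really became snapshottable, the `hdfs lsSnapshottableDir` command lists the snapshottable directories visible to the current user. The following is a minimal sketch that chains the two steps. The function name `enable_snapshots` and the `HDFS_CMD` override are assumptions added here for illustration (the override allows a dry run without a cluster); they are not part of HDFS.

```shell
# Hypothetical helper: mark a directory as snapshottable, then list all
# snapshottable directories so the result can be checked by eye.
# HDFS_CMD is an illustrative override; it defaults to the real hdfs client.
enable_snapshots() {
    dir=$1
    # Requires HDFS superuser privileges.
    ${HDFS_CMD:-hdfs} dfsadmin -allowSnapshot "$dir" || return 1
    # Lists the snapshottable directories the current user can see.
    ${HDFS_CMD:-hdfs} lsSnapshottableDir
}

# Usage (against a real cluster):
#   enable_snapshots /data/myfolder
```

Setting `HDFS_CMD=echo` before calling the function prints the commands instead of running them.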
Create a snapshot
$ hdfs dfs -createSnapshot /data/myfolder
This creates a snapshot with a default name derived from the creation timestamp. The snapshot appears under the .snapshot directory of the snapshottable folder, for example /data/myfolder/.snapshot/s20130903-000941.091.
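The default name can be replaced with one of your choosing: `-createSnapshot` accepts an optional snapshot name after the path. Below is a small sketch of that variant. The `daily-YYYYMMDD` naming scheme, the function names, and the `HDFS_CMD` override are all assumptions made for illustration, not HDFS conventions.

```shell
# Build a snapshot name from the current date (illustrative naming scheme).
snapshot_name() {
    printf 'daily-%s' "$(date +%Y%m%d)"
}

# Hypothetical helper: create a snapshot with an explicit name.
# HDFS_CMD defaults to the real hdfs client and can be overridden
# (for example HDFS_CMD=echo) for a dry run.
create_named_snapshot() {
    dir=$1
    ${HDFS_CMD:-hdfs} dfs -createSnapshot "$dir" "$(snapshot_name)"
}

# Usage (against a real cluster):
#   create_named_snapshot /data/myfolder
```

An explicit name makes it easier to tell snapshots apart later, at the cost of having to keep the names unique within the directory.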
Recovering from data loss
Now imagine someone deleted a folder, for example /data/myfolder/test.
To recover it, note that the folder still exists inside the snapshot directory. We only need to copy it from the snapshot back to its original location.
Locate the folder inside the snapshot:
$ hdfs dfs -ls /data/myfolder/.snapshot/s20130903-000941.091/test
Restore it to its original parent folder:
$ hdfs dfs -cp /data/myfolder/.snapshot/s20130903-000941.091/test /data/myfolder/
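The locate-and-restore steps can be wrapped in a small script. The sketch below assumes the snapshot layout used above (`<dir>/.snapshot/<snapshot name>/<relative path>`). The function names and the `HDFS_CMD` override are hypothetical. It also passes `-ptopax` to `-cp`, which is an addition to the plain copy shown above and preserves timestamps, ownership, permissions, ACLs and XAttrs.

```shell
# Path of an entry inside a snapshot: <dir>/.snapshot/<name>/<relative path>.
snapshot_path() {
    printf '%s/.snapshot/%s/%s' "$1" "$2" "$3"
}

# Hypothetical helper: copy one entry from a snapshot back to where it
# originally lived. HDFS_CMD defaults to the real hdfs client and can be
# overridden (for example HDFS_CMD=echo) for a dry run.
restore_from_snapshot() {
    dir=$1 snap=$2 rel=$3
    src=$(snapshot_path "$dir" "$snap" "$rel")
    # Destination is the original parent directory of the lost entry.
    dest=$(dirname "$dir/$rel")
    # Stop early if the entry is not present in this snapshot.
    ${HDFS_CMD:-hdfs} dfs -test -e "$src" || {
        echo "not found in snapshot: $src" >&2
        return 1
    }
    # -ptopax keeps timestamps, ownership, permissions, ACLs and XAttrs.
    ${HDFS_CMD:-hdfs} dfs -cp -ptopax "$src" "$dest"
}

# Usage (against a real cluster):
#   restore_from_snapshot /data/myfolder s20130903-000941.091 test
```

The copy fails if the destination entry already exists, so this sketch is meant for restoring something that was deleted, not for overwriting a modified folder.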