HDFS Snapshots

HDFS Snapshots are read-only point-in-time copies of the file system. Snapshots can be taken on a subtree of the file system or the entire file system and are:

  • Performant and Reliable: Snapshot creation is atomic and instantaneous, no matter the size or depth of the directory subtree
  • Scalable: Snapshots do not create extra copies of blocks on the file system. Snapshots are highly optimized in memory and stored along with the NameNode’s file system namespace

Examples

Enable snapshots

$ hdfs dfsadmin -allowSnapshot /data/myfolder

Create a snapshot

$ hdfs dfs -createSnapshot /data/myfolder

This creates a snapshot with a default name derived from the current timestamp; the snapshot directory will look similar to this:

/data/myfolder/.snapshot/s20150803-000922.092
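The default snapshot name encodes the creation time as sYYYYMMDD-HHMMSS.mmm. As a rough illustration (the helper below is hypothetical, not part of HDFS), such a name can be parsed back into a timestamp:

```python
from datetime import datetime

def parse_snapshot_name(name):
    """Parse a default HDFS snapshot name such as 's20150803-000922.092'
    into a datetime (assumes the sYYYYMMDD-HHMMSS.mmm naming pattern)."""
    # Strip the leading 's'; %f right-pads '092' to 092000 microseconds
    return datetime.strptime(name[1:], "%Y%m%d-%H%M%S.%f")

created = parse_snapshot_name("s20150803-000922.092")
print(created)  # 2015-08-03 00:09:22.092000
```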

Recovering from data loss

Now imagine someone deleted a folder, for example /data/myfolder/test, and we need to recover it. The deleted folder is still available under the snapshot directory, so recovering it is just a matter of copying it from the snapshot back to its original location.

Locate the directory inside the snapshot:

hdfs dfs -ls /data/myfolder/.snapshot/s20150803-000922.092/test

Restore to the specific dir:

hdfs dfs -cp /data/myfolder/.snapshot/s20150803-000922.092/test /data/myfolder/
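The restore step is just path assembly: the source lives under <snapshottable dir>/.snapshot/<snapshot name>/<relative path>. As a small sketch (the helper name is made up for illustration), the command can be built like this:

```python
def restore_command(snapshot_root, snapshot_name, relative_path):
    """Build the 'hdfs dfs -cp' command that copies a deleted path back
    out of a snapshot. snapshot_root is the snapshottable directory."""
    src = f"{snapshot_root}/.snapshot/{snapshot_name}/{relative_path}"
    return f"hdfs dfs -cp {src} {snapshot_root}/"

print(restore_command("/data/myfolder", "s20150803-000922.092", "test"))
# hdfs dfs -cp /data/myfolder/.snapshot/s20150803-000922.092/test /data/myfolder/
```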

More details:
http://hortonworks.com/products/hortonworks-sandbox/#tutorial_gallery
http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.2/bk_user-guide/content/user-guide-hdfs-snapshots.html

Kafka Basics – Topics, Producers and Consumers

Workdir on HortonWorks

cd /usr/hdp/2.3.4.0-3485/kafka/bin

Create a new topic “test”:

./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

List topics:

./kafka-topics.sh --list --zookeeper localhost:2181

Details about a specific topic:

./kafka-topics.sh --describe --zookeeper localhost:2181 --topic test
Topic:test	PartitionCount:1	ReplicationFactor:1	Configs:
	Topic: test	Partition: 0	Leader: 0	Replicas: 0	Isr: 0

Here “Leader” is the broker that handles all reads and writes for the partition, “Replicas” lists the brokers that replicate its log, and “Isr” is the set of replicas currently in sync with the leader.

Send some messages

Kafka comes with a command line client that will take input from a file or from standard input and send it out as messages to the Kafka cluster. By default each line will be sent as a separate message.

Run the producer and then type a few messages into the console to send to the server.

./kafka-console-producer.sh --broker-list localhost:9092 --topic test 
This is a message
This is another message
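The line-per-message behaviour of the console producer can be sketched in plain Python (this only illustrates the splitting semantics; the real client would send each resulting record to a broker):

```python
import io

def lines_to_messages(stream):
    """Mimic the console producer's default behaviour: each line of
    input becomes one separate message (trailing newline stripped)."""
    return [line.rstrip("\n") for line in stream]

stdin = io.StringIO("This is a message\nThis is another message\n")
print(lines_to_messages(stdin))  # ['This is a message', 'This is another message']
```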

Start a consumer

Kafka also has a command line consumer that will dump out messages to standard output.

./kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
This is a message
This is another message

More info: http://kafka.apache.org/081/quickstart.html