Kafka tools – dump / debug a topic

How to use kafkacat + ssl

kafkacat -b "$BROKERS" \
  -X security.protocol=SSL \
  -X ssl.key.location=/etc/ssl/kafka-client.key \
  -X ssl.certificate.location=/etc/ssl/kafka-client.crt \
  -J \
  -C -o beginning \
  -t "$tpc"
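For a quick, human-readable dump, the same SSL settings can be combined with the -f format string and with -c / -e so the command stops on its own. This is only a minimal sketch, assuming the key and certificate paths from the command above; the dump_topic function name and the KAFKACAT override variable are made up for illustration and are not part of kafkacat.

```shell
#!/bin/sh
# dump_topic TOPIC [COUNT]: print up to COUNT messages (default 10) from the
# start of TOPIC, one formatted line per message.
# KAFKACAT is a hypothetical override so another binary can be substituted.
dump_topic() {
  topic=$1
  count=${2:-10}
  "${KAFKACAT:-kafkacat}" -b "$BROKERS" \
    -X security.protocol=SSL \
    -X ssl.key.location=/etc/ssl/kafka-client.key \
    -X ssl.certificate.location=/etc/ssl/kafka-client.crt \
    -C -o beginning -e -c "$count" \
    -f 'Topic %t [%p] at offset %o: key %k: %s\n' \
    -t "$topic"
}
```

With BROKERS exported, something like dump_topic "$tpc" 20 would print the first 20 messages and exit.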

Kafkacat Options

General options:
  -C | -P | -L | -Q  Mode: Consume, Produce, Metadata List, Query mode
  -G <group-id>      Mode: High-level KafkaConsumer (Kafka 0.9 balanced consumer groups)
                     Expects a list of topics to subscribe to
  -t <topic>         Topic to consume from, produce to, or list
  -p <partition>     Partition
  -b <brokers,..>    Bootstrap broker(s) (host[:port])
  -D <delim>         Message delimiter character:
                     a-z.. | \r | \n | \t | \xNN
                     Default: \n
  -E                 Do not exit on non fatal error
  -K <delim>         Key delimiter (same format as -D)
  -c <cnt>           Limit message count
  -X list            List available librdkafka configuration properties
  -X prop=val        Set librdkafka configuration property.
                     Properties prefixed with "topic." are
                     applied as topic properties.
  -X dump            Dump configuration and exit.
  -d <dbg1,...>      Enable librdkafka debugging:
  -q                 Be quiet (verbosity set to 0)
  -v                 Increase verbosity
  -V                 Print version
  -h                 Print usage help

Producer options:
  -z snappy|gzip     Message compression. Default: none
  -p -1              Use random partitioner
  -D <delim>         Delimiter to split input into messages
  -K <delim>         Delimiter to split input key and message
  -l                 Send messages from a file separated by
                     delimiter, as with stdin.
                     (only one file allowed)
  -T                 Output sent messages to stdout, acting like tee.
  -c <cnt>           Exit after producing this number of messages
  -Z                 Send empty messages as NULL messages
  file1 file2..      Read messages from files.
                     With -l, only one file permitted.
                     Otherwise, the entire file contents will
                     be sent as one single message.

Consumer options:
  -o <offset>        Offset to start consuming from:
                     beginning | end | stored |
                     <value>  (absolute offset) |
                     -<value> (relative offset from end)
  -e                 Exit successfully when last message received
  -f <fmt..>         Output formatting string, see below.
                     Takes precedence over -D and -K.
  -J                 Output with JSON envelope
  -D <delim>         Delimiter to separate messages on output
  -K <delim>         Print message keys prefixing the message
                     with specified delimiter.
  -O                 Print message offset using -K delimiter
  -c <cnt>           Exit after consuming this number of messages
  -Z                 Print NULL messages and keys as "NULL"(instead of empty)
  -u                 Unbuffered output

Metadata options (-L):
  -t <topic>         Topic to query (optional)

Query options (-Q):
  -t <t>:<p>:<ts>    Get offset for topic <t>,
                     partition <p>, timestamp <ts>.
                     Timestamp is the number of milliseconds
                     since epoch UTC.
                     Requires broker >= and librdkafka >= 0.9.3.
                     Multiple -t .. are allowed but a partition
                     must only occur once.

Format string tokens:
  %s                 Message payload
  %S                 Message payload length (or -1 for NULL)
  %R                 Message payload length (or -1 for NULL) serialized
                     as a binary big endian 32-bit signed integer
  %k                 Message key
  %K                 Message key length (or -1 for NULL)
  %T                 Message timestamp (milliseconds since epoch UTC)
  %t                 Topic
  %p                 Partition
  %o                 Message offset
  \n \r \t           Newlines, tab
  \xXX \xNNN         Any ASCII character
 Example:
  -f 'Topic %t [%p] at offset %o: key %k: %s\n'

Consumer mode (writes messages to stdout):
  kafkacat -b <broker> -t <topic> -p <partition>
 or:
  kafkacat -C -b ...

High-level KafkaConsumer mode:
  kafkacat -b <broker> -G <group-id> topic1 top2 ^aregex\d+

Producer mode (reads messages from stdin):
  ... | kafkacat -b <broker> -t <topic> -p <partition>
 or:
  kafkacat -P -b ...

Metadata listing:
  kafkacat -L -b <broker> [-t <topic>]

Query offset by timestamp:
  kafkacat -Q -b broker -t <topic>:<partition>:<timestamp>
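Query mode expects the timestamp in milliseconds since epoch, which is easy to get wrong by a factor of 1000. The sketch below builds the topic:partition:timestamp argument for "N seconds ago"; query_arg is a hypothetical helper name, and it assumes a date command that supports +%s (GNU and BSD date both do).

```shell
#!/bin/sh
# query_arg TOPIC PARTITION SECONDS_AGO
# Prints TOPIC:PARTITION:TIMESTAMP_MS, the form expected by kafkacat -Q -t.
query_arg() {
  topic=$1
  partition=$2
  seconds_ago=$3
  now_s=$(date +%s)                          # seconds since epoch
  ts_ms=$(( (now_s - seconds_ago) * 1000 ))  # kafkacat wants milliseconds
  printf '%s:%s:%s\n' "$topic" "$partition" "$ts_ms"
}

# Usage (needs a reachable broker; add the -X SSL options if required):
#   kafkacat -Q -b "$BROKERS" -t "$(query_arg "$tpc" 0 3600)"
```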

HortonWorks Hive enable auth using Ambari users

Configure the admin users in Ambari

Access -> Hive -> Configs -> Advanced -> Custom hive-site -> Add Property,admin

Save and reload Hive.

SSH into your Hadoop cluster, run beeline, and connect as user hive with password hive:

$ beeline
Beeline version 1.2.1000. by Apache Hive
beeline> !connect jdbc:hive2://hadoop-2:10000 hive hive  org.apache.hive.jdbc.HiveDriver
Connecting to jdbc:hive2://hadoop-2:10000
Connected to: Apache Hive (version 1.2.1000.
Driver: Hive JDBC (version 1.2.1000.

Set the admin role for the hive user

0: jdbc:hive2://hadoop-2:10000> SET ROLE ADMIN;
No rows affected (2.192 seconds)

Create basic roles (*Optional*)

Create two roles: one for read-only users and one for read-write users.

1: jdbc:hive2://hadoop-2:10000> create role RWUSER;
No rows affected (0.229 seconds)
1: jdbc:hive2://hadoop-2:10000> create role ROUSER;
No rows affected (0.221 seconds)
1: jdbc:hive2://hadoop-2:10000> ALTER DATABASE DEFAULT SET OWNER ROLE RWUSER;

Enable Security Authorization

Go to Hive -> Config -> Settings -> Choose Authorization -> SQLStdAuth.

Save and Reload Hive
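Once authorization is enabled, the RWUSER and ROUSER roles only become useful after privileges and users are attached to them. A rough sketch of how that could be scripted with beeline's -f option follows. The table name sales and the users alice and bob are hypothetical, as are the function names and the BEELINE override variable; the JDBC URL and the hive/hive credentials are the ones used above.

```shell
#!/bin/sh
# Emit the HiveQL statements. Table "sales" and users "alice"/"bob" are
# placeholders; replace them with real objects.
write_grants() {
  cat <<'SQL'
SET ROLE ADMIN;
GRANT SELECT ON TABLE sales TO ROLE ROUSER;
GRANT ALL ON TABLE sales TO ROLE RWUSER;
GRANT ROLE ROUSER TO USER alice;
GRANT ROLE RWUSER TO USER bob;
SHOW ROLES;
SQL
}

# Write the statements to a temp file and run them in a single beeline session,
# so SET ROLE ADMIN is still in effect for the GRANT statements.
apply_grants() {
  sql_file=$(mktemp) || return 1
  write_grants > "$sql_file"
  "${BEELINE:-beeline}" -u "jdbc:hive2://hadoop-2:10000" -n hive -p hive -f "$sql_file"
  status=$?
  rm -f "$sql_file"
  return "$status"
}
```

Running apply_grants on a cluster node should end with the SHOW ROLES listing, which is a quick way to confirm both roles exist.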

Kubernetes – Guestbook

git clone

# Install kubectl (the Kubernetes command-line client)
gcloud components install kubectl

# Create container cluster
gcloud container clusters create guestbook
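After the cluster is created, it is worth checking that kubectl can actually reach it before deploying anything. A small sketch, assuming the cluster name guestbook from above and a default zone or region already set in the gcloud configuration; the run helper and the DRY_RUN variable are illustrative only.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

verify_cluster() {
  # Fetch credentials so kubectl points at the new cluster.
  run gcloud container clusters get-credentials guestbook
  # Every node should eventually report STATUS Ready.
  run kubectl get nodes
}
```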


Hadoop Admin Bigdata

Study Review

  1. What are the three V’s commonly used to describe Big Data?
    Volume, velocity and variety
  2. What are the three Big Data formats?
    Structured, semi-structured, and unstructured
  3. List one example of structured data.
    A relational database or a data warehouse
  4. What are the goals of Hadoop?
    To leverage inexpensive enterprise-grade hardware to create large clusters, and to create massively scalable clusters through distributed storage and processing
  5. Which type of Hadoop cluster nodes provide resources for data processing?
    Worker nodes
  6. Which service manages cluster CPU and memory resources?
    YARN
  7. Which service manages cluster storage resources?
    HDFS
  8. Which framework provides a high-performance coordination service for distributed applications?
    ZooKeeper
  9. Which framework provides provisioning, management, and monitoring capabilities?
    Ambari

Finding processes with heavy disk I/O

  • iotop
  • sar
  • iostat
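The three tools answer slightly different questions: iotop shows which processes are doing the I/O, while iostat and sar show which devices are busy. One possible set of invocations, sampling three times at one-second intervals, is sketched below. It assumes iotop and the sysstat package are installed and that iotop needs root; the run helper and DRY_RUN variable are illustrative.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

sample_disk_io() {
  # Per-process view: -o shows only processes doing I/O, -b is batch mode,
  # -n 3 stops after three iterations.
  run sudo iotop -o -b -n 3
  # Per-device view with extended statistics, 3 samples 1 second apart.
  run iostat -dx 1 3
  # Block device activity from sysstat; -p prints readable device names.
  run sar -d -p 1 3
}
```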


Install my vimrc -O ~/.vimrc

Force sudo on save

:w !sudo tee %

or if you have my .vimrc just use


Editing multiple columns using visual block mode

Press CTRL+v to select multiple columns or lines.

Then press I, type the text you want to insert on every selected line, and press ESC.

Search and Replace

To search, press / and type what you want to find.

To replace all occurrences in the file:

:%s/foo/bar/g

To replace only within a block of lines, first show line numbers:

:set nu

Then note the first and last line numbers of the block and run:

:10,30 s/foo/bar/g

The /g flag replaces every match on a line instead of only the first. Without it:

:s/foo/bar/

Will replace "foo foo foo" with "bar foo foo".

With /g:

:s/foo/bar/g

Will replace "foo foo foo" with "bar bar bar".
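The substitute syntax is shared with sed, so the effect of /g and of a line range can be checked from the shell without opening Vim. This is just an illustration using sed in place of Vim's :s command; the variable names are arbitrary.

```shell
#!/bin/sh
line='foo foo foo'

# Without /g only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/foo/bar/')
echo "$first_only"    # bar foo foo

# With /g every match on the line is replaced.
every_match=$(printf '%s\n' "$line" | sed 's/foo/bar/g')
echo "$every_match"   # bar bar bar

# A line range limits the substitution, like :10,30s/foo/bar/g in Vim.
# Here only lines 2 and 3 of a four-line input are changed.
ranged=$(printf 'foo\nfoo\nfoo\nfoo\n' | sed '2,3s/foo/bar/g')
echo "$ranged"
```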