Hadoop Admin Bigdata

Study Review

  1. What are the three V’s commonly used to describe Big Data?
    Volume, velocity and variety
  2. What are the three Big Data formats?
    Structured, semi-structured, and unstructured
  3. List one example of structured data.
    A relational database or a data warehouse
  4. What are the goals of Hadoop?
    To leverage inexpensive enterprise-grade hardware to create large clusters, and to create massively scalable clusters through distributed storage and processing
  5. Which type of Hadoop cluster nodes provide resources for data processing?
    Worker nodes
  6. Which service manages cluster CPU and memory resources?
    YARN (Yet Another Resource Negotiator)
  7. Which service manages cluster storage resources?
    HDFS (Hadoop Distributed File System)
  8. Which framework provides a high-performance coordination service for distributed applications?
    ZooKeeper
  9. Which framework provides provisioning, management, and monitoring capabilities?
    Ambari

File Handles and Open Files

Show Kernel Limits

 sysctl fs.file-max
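
The limit can also be raised if a busy cluster node exhausts it. A minimal sketch, assuming root privileges (the value 500000 is an arbitrary illustration, not a recommendation):

```shell
# Read the current system-wide limit (value only)
sysctl -n fs.file-max

# Temporary change, lost on reboot (requires root)
sysctl -w fs.file-max=500000

# Persistent change: record it in /etc/sysctl.conf, then reload
echo "fs.file-max = 500000" >> /etc/sysctl.conf
sysctl -p
```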

To determine the current usage of file handles:

cat /proc/sys/fs/file-nr
1154    133     8192

The file-nr file displays three parameters:

  • the total allocated file handles.
  • the number of currently used file handles (with the 2.4 kernel); or the number of currently unused file handles (with the 2.6 kernel).
  • the maximum file handles that can be allocated (also found in /proc/sys/fs/file-max).
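
Those three fields can be pulled apart directly in the shell; a minimal sketch, assuming a Linux /proc filesystem:

```shell
# Read the three file-nr fields into shell variables
read allocated unused maximum < /proc/sys/fs/file-nr

echo "Allocated handles: $allocated"
echo "Unused handles:    $unused"
echo "Maximum handles:   $maximum"

# On a 2.6+ kernel, handles actually in use = allocated minus unused
echo "In use:            $((allocated - unused))"
```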


Introduction to lsof

# lsof | more
COMMAND     PID   TID                            USER   FD      TYPE             DEVICE   SIZE/OFF       NODE NAME
systemd       1                                  root  cwd       DIR              254,0       4096          2 /
systemd       1                                  root  rtd       DIR              254,0       4096          2 /
systemd       1                                  root  txt       REG              254,0    1309064   15205040 /lib/systemd/systemd
systemd       1                                  root  mem       REG              254,0      18640   15204380 /lib/x86_64-linux-gnu/libattr.so.1.1.0
systemd       1                                  root  mem       REG              254,0      14664   15204731 /lib/x86_64-linux-gnu/libdl-2.19.so
systemd       1                                  root  mem       REG              254,0     448440   15204402 /lib/x86_64-linux-gnu/libpcre.so.3.13.1
systemd       1                                  root  mem       REG              254,0      31784   15204765 /lib/x86_64-linux-gnu/librt-2.19.so
systemd       1                                  root  mem       REG              254,0      92888   15204453 /lib/x86_64-linux-gnu/libkmod.so.2.2.8
systemd       1                                  root  mem       REG              254,0      19016   15204533 /lib/x86_64-linux-gnu/libcap.so.2.24
systemd       1                                  root  mem       REG              254,0     113024   15204445 /lib/x86_64-linux-gnu/libaudit.so.1.0.0
systemd       1                                  root  mem       REG              254,0      64024   15204422 /lib/x86_64-linux-gnu/libpam.so.0.83.1
systemd       1                                  root  mem       REG              254,0     142728   15204457 /lib/x86_64-linux-gnu/libselinux.so.1
systemd       1                                  root  mem       REG              254,0    1738176   15204667 /lib/x86_64-linux-gnu/libc-2.19.so

By default, one file is displayed per line. Most of the columns are self-explanatory; the two more cryptic ones (FD and TYPE) are explained below.

FD – Represents the file descriptor. Some of the values of FD are:

  • cwd – Current working directory
  • rtd – Root directory
  • txt – Program text (code and data)
  • mem – Memory-mapped file
  • mmap – Memory-mapped device
  • NUMBER – The actual file descriptor number. The character after the number, e.g. the ‘u’ in ‘1u’, represents the mode in which the file is opened: r for read, w for write, u for read and write.

TYPE – Specifies the type of the file. Some of the values of TYPEs are:
REG – Regular File
DIR – Directory
FIFO – First In First Out
CHR – Character special file
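
With the FD and TYPE columns in mind, a few common lsof invocations (a sketch; assumes lsof is installed, and the path /var/log/syslog is only an illustration):

```shell
# All files opened by a single process (here PID 1)
lsof -p 1

# All files opened by a specific user
lsof -u root

# Which processes are holding a particular file open
lsof /var/log/syslog

# Count open files per command name, busiest first
lsof | awk '{print $1}' | sort | uniq -c | sort -rn | head
```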


Finding Processes Using Heavy Disk I/O
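
A sketch of common approaches, assuming the sysstat package (for pidstat) and iotop are installed:

```shell
# Per-process disk I/O statistics, one report per second (sysstat package)
pidstat -d 1

# Interactive view showing only processes currently doing I/O (requires root)
iotop -o

# Cumulative read/write byte counters for one process, straight from /proc
cat /proc/self/io
```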