How to use Sqoop to import and append to Hive

# Import from MySQL to Hive

sqoop import --connect jdbc:mysql://<HOST>/<DB> \
  --username <MYUSER> \
  --password <MYPASS> \
  --table <MYTABLE> \
  --hive-import --hive-table <DBNAMEONHIVE>.<TABLE> \
  --fields-terminated-by ','

# Change the Hive table to external (the data will live on HDFS independently of Hive)

ALTER TABLE <TABLE NAME> SET TBLPROPERTIES('EXTERNAL'='TRUE');

# Verify where the table is stored on HDFS by looking at the Location field.

DESCRIBE FORMATTED <TABLE NAME>;
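If you script this check, the Location line can be pulled out of the DESCRIBE FORMATTED output with awk. A minimal sketch, run against canned sample output rather than a live Hive connection (the namenode host and warehouse path below are made up):

```shell
# Canned sample of what `hive -e "DESCRIBE FORMATTED <TABLE NAME>"` prints;
# the HDFS path is a made-up example, not a real cluster location.
sample_output='# Detailed Table Information
Database:  default
Location:  hdfs://namenode:8020/user/hive/warehouse/mytable
Table Type:  EXTERNAL_TABLE'

# Keep only the value after "Location:"
location=$(echo "$sample_output" | awk '/^Location:/ {print $2}')
echo "$location"
```

The same awk filter works on real output piped from the Hive CLI or beeline, since the field value is the second whitespace-separated column.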

# Now you can use Sqoop to import and append directly to the HDFS directory, and the new rows will show up immediately in the external table.

sqoop import --connect jdbc:mysql://<HOST>/<DB> \
  --username <MYUSER> \
  --password <MYPASS> \
  --table <MYTABLE> \
  --target-dir '<HDFS_LOCATION_OUTPUT>' --incremental append --check-column '<PRIMARY_KEY_COLUMN>' --last-value <LAST_VALUE_IMPORTED>
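In practice the --last-value has to be carried over between runs (Sqoop's own saved-job metastore can also track it for you). One hand-rolled approach is to keep the last imported key in a small state file and splice it into the command. The sketch below only builds and prints the command line as a dry run; the host, database, table, and paths are made-up placeholders:

```shell
# State file holding the highest key imported so far (made-up value).
state_file=$(mktemp)
echo 42 > "$state_file"

last_value=$(cat "$state_file")

# Build the sqoop command line; echoed instead of executed (dry run).
# All connection details below are illustrative placeholders.
cmd="sqoop import --connect jdbc:mysql://dbhost/mydb \
  --username myuser --password mypass \
  --table orders --target-dir /data/orders \
  --incremental append --check-column id --last-value $last_value"

echo "$cmd"
rm -f "$state_file"
```

After a successful run you would write the new maximum key back into the state file before the next invocation.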

How to set up automatic filesystem checks and repair on Linux

Trigger Automatic Filesystem Check upon Boot
If you want to trigger fsck automatically upon boot, there are distro-specific ways to set up an unattended fsck at boot time.

On Debian, Ubuntu or Linux Mint, edit /etc/default/rcS as follows.

$ sudo vi /etc/default/rcS
# automatically repair filesystems with inconsistencies during boot
FSCKFIX=yes
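The edit can also be done non-interactively with sed. A sketch that works on a throwaway copy of the file so it is safe to try as-is; point it at the real /etc/default/rcS (with sudo) to apply it for real:

```shell
# Work on a temporary copy; substitute /etc/default/rcS (run via sudo)
# to change the real config file.
rcs_copy=$(mktemp)
printf 'FSCKFIX=no\n' > "$rcs_copy"

# Flip FSCKFIX to yes whatever its current value is
sed -i 's/^FSCKFIX=.*/FSCKFIX=yes/' "$rcs_copy"

result=$(cat "$rcs_copy")
echo "$result"
```

Note that this only rewrites an existing FSCKFIX line; if the file lacks one, append it instead.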
On CentOS, edit /etc/sysconfig/autofsck (or create it if it doesn’t exist) with the following content.

$ sudo vi /etc/sysconfig/autofsck
AUTOFSCK_DEF_CHECK=yes

Force One-Time Filesystem Check on the Next Reboot
If you want to trigger a one-time filesystem check on your next reboot, you can use this command.

$ sudo touch /forcefsck

Kubernetes – Guestbook

git clone https://github.com/kubernetes/kubernetes.git

# Install kubectl
gcloud components install kubectl

# Create container cluster
gcloud container clusters create guestbook

…..
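The walkthrough is truncated here; the upstream guestbook example goes on to deploy a Redis backend and a PHP frontend. As a sketch of the kind of manifest involved, here is a minimal frontend Deployment (the image name comes from the upstream google-samples guestbook; the replica count and labels are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: guestbook
      tier: frontend
  template:
    metadata:
      labels:
        app: guestbook
        tier: frontend
    spec:
      containers:
      - name: php-redis
        image: gcr.io/google-samples/gb-frontend:v4
        ports:
        - containerPort: 80
```

Apply it with kubectl apply -f frontend-deployment.yaml (the filename is arbitrary).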

How to debug Yarn / Hive Jobs

The easy way is to access the Hadoop ResourceManager UI, which normally runs on port 8088, for example:

http://yourambari:8088/cluster/apps
Click the link to the application details and you will see detailed information about your job.

Useful commands

To list running jobs use:

yarn application -list

                Application-Id	    Application-Name	    Application-Type	      User	     Queue	             State	       Final-State	       Progress	                       Tracking-URL
application_1480129450160_0003	HIVE-5cb06afa-2102-418c-8716-4726c1d13f35	                 TEZ	   testq	   default	          ACCEPTED	         UNDEFINED	             0%	                                N/A
application_1480129450160_0002	HIVE-c4a9098b-530e-4737-9c29-d3f9cc8e45ba	                 TEZ	   testq	   default	           RUNNING	         UNDEFINED	             0%	    http://testserver:32917/ui/

To debug a job, after running the application -list command you can check the Tracking-URL field. In the example output above, the URL for Application-Id application_1480129450160_0002 is http://testserver:32917/ui/
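When scripting against this listing, awk can pull the interesting fields out. A sketch run against a canned copy of the sample output above (no live cluster needed; awk's default whitespace splitting also handles the tab-separated real output):

```shell
# Canned `yarn application -list` rows (same apps as the sample above,
# with the long application names shortened for readability).
listing='application_1480129450160_0003 HIVE-5cb06afa TEZ testq default ACCEPTED UNDEFINED 0% N/A
application_1480129450160_0002 HIVE-c4a9098b TEZ testq default RUNNING UNDEFINED 0% http://testserver:32917/ui/'

# Print "<Application-Id> <Tracking-URL>" for RUNNING applications only
running=$(echo "$listing" | awk '$6 == "RUNNING" {print $1, $9}')
echo "$running"
```

On a real cluster you would pipe `yarn application -list` straight into the same awk filter.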

To kill a job use:

yarn application -kill <Application-Id>
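To kill several applications at once, the IDs can be fed through a loop. A dry-run sketch that only prints the commands it would run (the IDs are the made-up ones from the sample listing above; drop the echo to actually issue the kills):

```shell
# Application IDs to kill (placeholders from the sample listing above)
app_ids="application_1480129450160_0002
application_1480129450160_0003"

# Dry run: print each kill command instead of executing it.
output=$(for id in $app_ids; do
  echo "yarn application -kill $id"
done)
echo "$output"
```

In a real script the ID list would typically come from filtering `yarn application -list` output, as in the awk sketch above.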