Hortonworks Hadoop tuning

Tez

tez.task.resource.memory.mb
tez.am.resource.memory.mb
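The two Tez sizes only take effect as YARN grants them. This is a minimal sketch of that relationship under two assumptions not stated in these notes. First, the Capacity Scheduler rounds each container request up to a multiple of yarn.scheduler.minimum-allocation-mb. Second, roughly 80% of a container ends up as JVM heap; that ratio mirrors the default of tez.container.max.java.opts.fraction in recent Tez releases, so check it for your version. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions (not from the notes above):
#  - YARN rounds each container request up to a multiple of the minimum allocation
#  - about 80% of a container is used as JVM heap (-Xmx)

# round_up_mb REQUEST_MB MIN_ALLOC_MB: REQUEST rounded up to a multiple of MIN_ALLOC
round_up_mb() {
  local req=$1 min_alloc=$2
  echo $(( (req + min_alloc - 1) / min_alloc * min_alloc ))
}

# heap_mb CONTAINER_MB: 80% of the container, leaving room for non-heap JVM memory
heap_mb() {
  echo $(( $1 * 8 / 10 ))
}

# tez_memory_settings TASK_MB AM_MB MIN_ALLOC_MB (hypothetical helper):
# prints both Tez sizes as YARN would grant them, plus the approximate heaps
tez_memory_settings() {
  local task am
  task=$(round_up_mb "$1" "$3")
  am=$(round_up_mb "$2" "$3")
  echo "tez.task.resource.memory.mb=$task"
  echo "tez.am.resource.memory.mb=$am"
  echo "# approx. task heap: -Xmx$(heap_mb "$task")m"
  echo "# approx. AM heap: -Xmx$(heap_mb "$am")m"
}
```

For example, tez_memory_settings 3000 4096 1024 reports a task size of 3072 MB, because a 3000 MB request is rounded up to the next multiple of 1024. The AM stays at 4096 MB, and the approximate heaps are 2457m and 3276m.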

MapReduce2

MR Map Java Heap Size
MR Reduce Java Heap Size
MR AppMaster Java Heap Size
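The three heap fields above are Ambari labels. This sketch assumes they map to mapreduce.map.java.opts, mapreduce.reduce.java.opts and yarn.app.mapreduce.am.command-opts. A common starting point, also used in Hortonworks' manual memory sizing guidance, is a heap of about 80% of the matching container size (mapreduce.map.memory.mb, mapreduce.reduce.memory.mb, yarn.app.mapreduce.am.resource.mb). This leaves headroom so that non-heap JVM memory does not push the process past its YARN limit. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: derive the three MapReduce2 heap settings from their container sizes,
# using the ~80% heap-to-container rule of thumb (an assumption, not a fixed rule).

# xmx_for CONTAINER_MB: prints "-Xmx<80% of container>m"
xmx_for() {
  printf -- '-Xmx%dm\n' $(( $1 * 8 / 10 ))
}

# mr_heap_settings MAP_MB REDUCE_MB AM_MB (hypothetical helper); the arguments are the
# container sizes mapreduce.map.memory.mb, mapreduce.reduce.memory.mb and
# yarn.app.mapreduce.am.resource.mb
mr_heap_settings() {
  echo "mapreduce.map.java.opts=$(xmx_for "$1")"
  echo "mapreduce.reduce.java.opts=$(xmx_for "$2")"
  echo "yarn.app.mapreduce.am.command-opts=$(xmx_for "$3")"
}
```

With 2048 MB map containers and 4096 MB reduce and AM containers, mr_heap_settings 2048 4096 4096 suggests -Xmx1638m for maps and -Xmx3276m for reducers and the AppMaster.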

YARN

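yarn.scheduler.capacity.maximum-am-resource-percent=0.8 (the value is a fraction, so 0.8 means 80% of cluster resources may be used by ApplicationMasters; this is a MUST, because with the default of 0.1 only a few AMs can run at once and further jobs, including Hive-on-Tez sessions, sit in the ACCEPTED state)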
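A rough estimate shows why the default is too low. The sketch below counts how many ApplicationMasters fit under the limit. It makes two simplifications: one queue owns the whole cluster, and every AM requests the same memory. Real limits are computed per queue and per user. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Rough estimate only. Simplifying assumptions: one queue owns the whole cluster and
# every ApplicationMaster requests the same amount of memory.

# max_concurrent_ams CLUSTER_MB AM_PERCENT AM_MB (hypothetical helper)
#   AM_PERCENT is maximum-am-resource-percent as a whole percentage (0.8 -> 80)
max_concurrent_ams() {
  local cluster_mb=$1 am_percent=$2 am_mb=$3
  echo $(( cluster_mb * am_percent / 100 / am_mb ))
}

# Compare the default (0.1) with 0.8 on a 40 GB cluster with 1 GB AMs
for pct in 10 80; do
  echo "AM percent ${pct}%: $(max_concurrent_ams 40960 "$pct" 1024) concurrent AMs"
done
```

On this hypothetical 40960 MB cluster, the default allows about 4 concurrent 1 GB AMs, while 0.8 allows about 32.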

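  • Memory:
    • Node (yarn.nodemanager.resource.memory-mb)
  • Container:
    • Minimum container size (yarn.scheduler.minimum-allocation-mb)
    • Maximum container size (yarn.scheduler.maximum-allocation-mb)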
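Starting values for node memory and the minimum and maximum container size can come from the HDP manual sizing formula:

containers = min(2 × cores, 1.8 × disks, available RAM / minimum container size)
RAM per container = max(minimum container size, available RAM / containers)

This sketch implements the formula in bash integer arithmetic, so 1.8 × disks is rounded down. "Available RAM" means node RAM minus what is reserved for the OS and other services. Both it and the minimum container size are inputs you choose. The function names are hypothetical, and the output is only a starting point.

```shell
#!/usr/bin/env bash
# Sketch of the HDP manual sizing formula (starting values only):
#   containers        = min(2 * cores, 1.8 * disks, available_mb / min_container_mb)
#   ram_per_container = max(min_container_mb, available_mb / containers)

min_of() { if (( $1 < $2 )); then echo "$1"; else echo "$2"; fi; }
max_of() { if (( $1 > $2 )); then echo "$1"; else echo "$2"; fi; }

# yarn_container_sizing AVAILABLE_MB CORES DISKS MIN_CONTAINER_MB (hypothetical helper)
#   AVAILABLE_MB = node RAM minus memory reserved for the OS and other services
yarn_container_sizing() {
  local avail_mb=$1 cores=$2 disks=$3 min_mb=$4
  local containers ram_per_container
  containers=$(min_of $(( 2 * cores )) $(( 18 * disks / 10 )))  # 1.8 * disks, rounded down
  containers=$(min_of "$containers" $(( avail_mb / min_mb )))
  containers=$(max_of "$containers" 1)                          # guard against dividing by zero
  ram_per_container=$(max_of "$min_mb" $(( avail_mb / containers )))
  echo "yarn.nodemanager.resource.memory-mb=$(( containers * ram_per_container ))"
  echo "yarn.scheduler.minimum-allocation-mb=$ram_per_container"
  echo "yarn.scheduler.maximum-allocation-mb=$(( containers * ram_per_container ))"
}
```

Consider a node with 12 cores and 12 disks, with 42 GB (43008 MB) left for YARN and a 2048 MB minimum. For that node, yarn_container_sizing 43008 12 12 2048 gives 21 containers of 2048 MB. The node total and the maximum container size are therefore both 43008 MB.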

Hive

  • Tez:
    • Tez Container Size
    • Hold containers to reduce latency = true
    • Number of containers held = 10
    • Memory (For Map Join)
  • hive-site (or per session with set):
      set hive.execution.engine=tez;
      set hive.vectorized.execution.reduce.enabled=true;
      set hive.vectorized.execution.enabled=true;
      set hive.cbo.enable=true;
      set hive.compute.query.using.stats=true;
      set hive.stats.fetch.column.stats=true;
      set hive.stats.fetch.partition.stats=true;
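The session settings above can be kept in an init file and loaded at startup with hive -i <file>; recent Beeline versions also accept -i. The sketch below writes such a file. It assumes the Ambari fields map as follows: "Tez Container Size" is hive.tez.container.size, in MB, and "Memory (For Map Join)" is hive.auto.convert.join.noconditionaltask.size, in bytes. It also applies a commonly quoted rule of thumb, not stated in these notes, that sets the map join threshold to about one third of the Tez container. The function name and file path are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write Hive-on-Tez session settings to an init file.
# Assumptions: hive.tez.container.size is in MB; the map join threshold
# hive.auto.convert.join.noconditionaltask.size is in bytes and set to ~1/3 of the container.

# write_hive_init FILE TEZ_CONTAINER_MB (hypothetical helper)
write_hive_init() {
  local file=$1 container_mb=$2
  local mapjoin_bytes=$(( container_mb * 1024 * 1024 / 3 ))
  cat > "$file" <<EOF
set hive.execution.engine=tez;
set hive.tez.container.size=${container_mb};
set hive.auto.convert.join.noconditionaltask.size=${mapjoin_bytes};
set hive.vectorized.execution.enabled=true;
set hive.vectorized.execution.reduce.enabled=true;
set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
EOF
}
```

For example, write_hive_init /tmp/hive-tuning.hql 4096 writes a map join threshold of 1431655765 bytes. The file can then be used with hive -i /tmp/hive-tuning.hql.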

Sqoop (use ORC to improve performance)

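# import: one Sqoop job per MySQL table, each landing in an ORC-backed HCatalog table
# (-N drops the "Tables_in_<db>" header line, which would otherwise be imported as a table;
#  < /dev/null stops sqoop from reading the remaining table names from stdin)
mysql -N -h "$myhost" -u "$myuser" -p"$mypass" "$mydb" -e 'show tables' |
while read -r table; do
  sqoop import --connect "jdbc:mysql://$myhost/$mydb" \
    --username "$myuser" --password "$mypass" -m 1 \
    --table "$table" \
    --hcatalog-database "$mydb" --hcatalog-table "$table" \
    --create-hcatalog-table \
    --hcatalog-storage-stanza "stored as orcfile" < /dev/null
done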
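When importing a whole database, you may want to review the generated Sqoop commands before running them. The dry-run sketch below prints one command per table name read from stdin. It also swaps the plain-text --password for Sqoop's --password-file option, which is a change from the original command. The helper name and the password file path in the example are hypothetical.

```shell
#!/usr/bin/env bash
# Dry-run variant: print the Sqoop commands instead of executing them, so they can be
# reviewed first. Reads one table name per line on stdin.

# print_sqoop_imports HOST DB USER PASSWORD_FILE (hypothetical helper)
#   PASSWORD_FILE is passed to sqoop's --password-file instead of a plain-text --password
print_sqoop_imports() {
  local host=$1 db=$2 user=$3 password_file=$4 table
  while IFS= read -r table; do
    if [[ -z $table ]]; then continue; fi   # skip blank lines
    # %q quotes each value so the printed line can be fed back to bash safely
    printf 'sqoop import --connect %q --username %q --password-file %q -m 1 --table %q --hcatalog-database %q --hcatalog-table %q --create-hcatalog-table --hcatalog-storage-stanza %q\n' \
      "jdbc:mysql://$host/$db" "$user" "$password_file" "$table" "$db" "$table" "stored as orcfile"
  done
}
```

Usage: mysql -N -h "$myhost" -u "$myuser" -p"$mypass" "$mydb" -e 'show tables' | print_sqoop_imports "$myhost" "$mydb" "$myuser" /user/etl/mysql.password > imports.sh, then inspect imports.sh and run it with bash imports.sh.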