EMR Snippets


Here is a collection of shell snippets I often use when working on an EMR master EC2 instance.

Log into EMR Master EC2 Instance

First, SSH into the master instance. Replace master-ec2-instance-ip-address with the actual IP address of your master EC2 instance.

ssh hadoop@master-ec2-instance-ip-address

Spark shell

/usr/lib/spark/bin/spark-shell

Hue web access

If you've installed Hue, remember that it runs on port 8888 on your master node.

http://master-ec2-instance-ip-address:8888/
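
If port 8888 is not reachable from your machine (the cluster's security group may only allow SSH), an SSH tunnel is one way in. The helper below is only a sketch: hue_tunnel_cmd is a name I made up, the hadoop login user and the key path are assumptions, and it only prints the ssh command so you can check it before running it.

```shell
# Print an ssh command that forwards local port 8888 to Hue on the master node.
# Assumptions: login user "hadoop" and a hypothetical default key location.
hue_tunnel_cmd() {
  master_host="$1"
  key_file="${2:-$HOME/.ssh/emr-key.pem}"   # hypothetical path, adjust to your key
  # -N: run no remote command, -L: forward local 8888 to port 8888 on the master
  printf 'ssh -i %s -N -L 8888:localhost:8888 hadoop@%s\n' "$key_file" "$master_host"
}
```

Run hue_tunnel_cmd master-ec2-instance-ip-address, run the command it prints, and browse to http://localhost:8888/ while the tunnel is open.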

List Yarn Applications

yarn application -list
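
When I only need the application IDs (for example, to pass one to yarn application -kill), I filter the listing. This is a rough sketch: yarn_app_ids is a hypothetical helper name, and it assumes the ID is the first column of each row, as in the listing's usual layout.

```shell
# Read `yarn application -list` output on stdin and print only the application IDs.
# Header and summary lines are skipped because their first field is not an ID.
yarn_app_ids() {
  awk '$1 ~ /^application_[0-9]+_[0-9]+$/ { print $1 }'
}
```

For example: yarn application -list -appStates RUNNING | yarn_app_ids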

Start Apache Spark Thrift Server

sudo /usr/lib/spark/sbin/start-thriftserver.sh --master yarn-client

Stop Apache Spark Thrift Server

sudo /usr/lib/spark/sbin/stop-thriftserver.sh

Edit Hive configuration file

sudo nano /etc/hive/conf.dist/hive-site.xml

Start all Spark services

sudo /usr/lib/spark/sbin/start-all.sh

Find all start-* shell scripts

sudo find / -name 'start-*.sh'

Start HiveServer2

sudo start hive-server2

Stop HiveServer2

sudo stop hive-server2

Start Hive Catalog Server

sudo start hive-hcatalog-server

Stop Hive Catalog Server

sudo stop hive-hcatalog-server

Copy Hive configuration file to Spark

sudo cp /etc/hive/conf.dist/hive-site.xml /etc/spark/conf/
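
The copy overwrites any hive-site.xml already in the Spark conf directory, so a backup can be worth keeping. The function below is one possible sketch, not part of EMR: copy_with_backup and the SUDO variable are names I made up, and the backup suffix format is an arbitrary choice.

```shell
# Prefix for privileged commands; set SUDO="" when already running as root.
SUDO="${SUDO-sudo}"

# Copy a file into a directory, first saving a timestamped backup
# of any file with the same name that is already there.
copy_with_backup() {
  src="$1"
  dest_dir="$2"
  dest="$dest_dir/$(basename "$src")"
  if [ -f "$dest" ]; then
    $SUDO cp -p "$dest" "$dest.bak.$(date +%Y%m%d%H%M%S)"
  fi
  $SUDO cp "$src" "$dest"
}
```

For example: copy_with_backup /etc/hive/conf.dist/hive-site.xml /etc/spark/conf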

View HiveServer2 Log

more /var/log/hive/hive-server2.log

grep WARN /var/log/hive/hive-server2.log

tail /var/log/hive/hive-server2.log
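
For a quick overview before reading the log, I sometimes count how many lines mention each level. A minimal sketch, assuming the level names appear verbatim in the log lines; count_log_levels is a hypothetical helper name.

```shell
# Print "<level> <count>" for each level given after the log file path.
# A line counts once per level, however many times the level appears on it.
count_log_levels() {
  log_file="$1"
  shift
  for level in "$@"; do
    printf '%s %s\n' "$level" "$(grep -c -e "$level" "$log_file")"
  done
}
```

For example: count_log_levels /var/log/hive/hive-server2.log WARN ERROR. To watch new entries as they arrive, tail -f /var/log/hive/hive-server2.log also works.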

View Spark Thrift Log

grep ERROR /var/log/spark/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-ip-10-237-188-160.out
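
The file name above contains my master node's hostname (ip-10-237-188-160), so it will differ on your cluster. One way around that is to pick the newest matching file with a glob. This is a sketch: latest_thrift_log is a made-up helper name, and the glob pattern is an assumption based on the file name shown above.

```shell
# Print the most recently modified Thrift server .out file in a log directory.
# Prints nothing if no file matches.
latest_thrift_log() {
  log_dir="${1:-/var/log/spark}"
  ls -t "$log_dir"/*HiveThriftServer2*.out 2>/dev/null | head -n 1
}
```

For example: grep ERROR "$(latest_thrift_log)"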

Start Beeline shell

/usr/lib/spark/bin/beeline
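
Beeline needs a JDBC URL to connect to a running server. The sketch below only builds that URL. hive_jdbc_url is a hypothetical helper name, and the defaults (localhost, port 10000) are assumptions: 10000 is the usual HiveServer2 port, but the Spark Thrift Server may listen on a different one on your cluster, so check your configuration.

```shell
# Build a HiveServer2-style JDBC URL for Beeline.
# Defaults are assumptions: host "localhost" and port 10000.
hive_jdbc_url() {
  host="${1:-localhost}"
  port="${2:-10000}"
  printf 'jdbc:hive2://%s:%s\n' "$host" "$port"
}
```

For example: /usr/lib/spark/bin/beeline -u "$(hive_jdbc_url)" -n hadoop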
