EMR Snippets
Here is a collection of shell snippets I often use when working on an EMR master EC2 instance.
Log into EMR Master EC2 Instance
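First, SSH into the master instance. Replace master-ec2-instance-ip-address with the actual IP address of your master EC2 instance, and ~/mykeypair.pem with the private key file of the key pair the cluster was launched with. EMR master nodes accept SSH logins as the hadoop user.
ssh -i ~/mykeypair.pem hadoop@master-ec2-instance-ip-address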
Spark shell
/usr/lib/spark/bin/spark-shell
Hue web access
If you've installed Hue, remember that it runs on port 8888 on your master node.
http://master-ec2-instance-ip-address:8888/
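If port 8888 isn't open in the master node's security group, an SSH tunnel is one way to reach Hue. A minimal sketch, assuming the hadoop login user and a key file you pass in; the default ~/.ssh/mykeypair.pem and the function name hue_tunnel are placeholders of my own:

```shell
# Forward local port 8888 to port 8888 on the master node over SSH.
# Assumptions: login user "hadoop"; the default key path is a placeholder.
hue_tunnel() {
  local host="$1" key="${2:-$HOME/.ssh/mykeypair.pem}"
  # -N: run no remote command; -L local_port:host:remote_port sets up the forward.
  ssh -i "$key" -N -L 8888:localhost:8888 "hadoop@$host"
}
```

Usage: run `hue_tunnel master-ec2-instance-ip-address ~/mykeypair.pem`, then browse to http://localhost:8888/ while the tunnel stays open.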
List YARN Applications
yarn application -list
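To feed application IDs into other commands (for example `yarn application -kill`), the list output can be trimmed to just the IDs. A sketch, assuming the data rows are the only lines whose first field starts with application_ (the summary and header lines do not); yarn_app_ids is a hypothetical name:

```shell
# Read `yarn application -list` output on stdin and print only the application IDs.
# Assumption: data rows begin (after optional padding) with "application_".
yarn_app_ids() {
  awk '$1 ~ /^application_/ { print $1 }'
}
```

Usage: `yarn application -list | yarn_app_ids`.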
Start Apache Spark Thrift Server
sudo /usr/lib/spark/sbin/start-thriftserver.sh --master yarn --deploy-mode client
Stop Apache Spark Thrift Server
sudo /usr/lib/spark/sbin/stop-thriftserver.sh
Edit Hive configuration file
sudo nano /etc/hive/conf.dist/hive-site.xml
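To check a single setting without opening an editor, a rough awk reader is enough for simple files. This is a hypothetical helper, not an XML parser: it assumes each `<name>` and `<value>` element sits on one line and that every `<name>` comes before its `<value>`.

```shell
# Print the value of one property from a hive-site.xml style file.
# Assumption: <name>/<value> each on a single line, name before value.
hive_prop() {
  awk -v want="$1" '
    /<name>/ { n = $0; gsub(/.*<name>|<\/name>.*/, "", n) }
    /<value>/ && n == want { v = $0; gsub(/.*<value>|<\/value>.*/, "", v); print v; exit }
  ' "${2:-/etc/hive/conf.dist/hive-site.xml}"
}
```

Usage: `hive_prop hive.metastore.uris`.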
Start Spark Standalone Daemons (Master and Workers)
sudo /usr/lib/spark/sbin/start-all.sh
Find all start-* shell scripts
sudo find / -name 'start-*.sh'
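A narrower variant that searches only the directory trees you give it (default /usr/lib, where the Spark scripts above live) and hides permission errors. The function name find_start_scripts is my own:

```shell
# List start-*.sh scripts under the given directories (default: /usr/lib).
find_start_scripts() {
  if [ "$#" -eq 0 ]; then
    set -- /usr/lib
  fi
  # Quote the pattern so the shell passes it to find instead of expanding it.
  find "$@" -type f -name 'start-*.sh' 2>/dev/null | sort
}
```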
Start HiveServer2
sudo start hive-server2
Stop HiveServer2
sudo stop hive-server2
Start Hive HCatalog Server
sudo start hive-hcatalog-server
Stop Hive HCatalog Server
sudo stop hive-hcatalog-server
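The start and stop commands above are Upstart commands. As far as I know, newer EMR releases (5.30.0 and later, and 6.x) manage these services with systemd instead, where `sudo systemctl start hive-server2` is the equivalent. A sketch of a hypothetical emr_service helper that picks whichever is available, assuming the service names are the same under both:

```shell
# Start/stop an EMR service such as hive-server2 or hive-hcatalog-server.
# Assumption: if systemctl exists, the service is a systemd unit of the same
# name; otherwise fall back to the Upstart start/stop commands.
emr_service() {
  local action="$1" service="$2"
  if command -v systemctl >/dev/null 2>&1; then
    sudo systemctl "$action" "$service"
  else
    sudo "$action" "$service"
  fi
}
```

Usage: `emr_service stop hive-hcatalog-server`.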
Copy Hive configuration file to Spark
sudo cp /etc/hive/conf.dist/hive-site.xml /etc/spark/conf/
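Before overwriting Spark's copy, it can help to keep the old one. A minimal sketch of a hypothetical copy_with_backup helper that saves the existing file with a timestamp suffix, then copies the new one in:

```shell
# Copy SRC into DEST_DIR, first backing up any file of the same name there.
copy_with_backup() {
  local src="$1" dest_dir="$2" target
  target="$dest_dir/$(basename "$src")"
  # Keep a timestamped copy of the existing file before overwriting it.
  if [ -e "$target" ]; then
    sudo cp -p "$target" "$target.bak.$(date +%Y%m%d%H%M%S)"
  fi
  sudo cp "$src" "$target"
}
```

Usage: `copy_with_backup /etc/hive/conf.dist/hive-site.xml /etc/spark/conf`.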
View HiveServer2 Log
more /var/log/hive/hive-server2.log
grep WARN /var/log/hive/hive-server2.log
tail /var/log/hive/hive-server2.log
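For a quick overview of how noisy the log is, a sketch that counts lines per log level. It assumes level names appear as whole words on each line, which fits the usual log4j layout but is not guaranteed; log_level_counts is a hypothetical name:

```shell
# Count ERROR, WARN and INFO lines in a log file (default: the HiveServer2 log).
# Assumption: the level appears as a whole word on each log line.
log_level_counts() {
  local log="${1:-/var/log/hive/hive-server2.log}" level
  for level in ERROR WARN INFO; do
    printf '%s %s\n' "$level" "$(grep -cw "$level" "$log")"
  done
}
```

To follow the log live instead, `tail -f /var/log/hive/hive-server2.log` keeps printing new lines as they arrive.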
View Spark Thrift Server Log
The file name ends with the host name of your master node, so match it with a wildcard.
grep ERROR /var/log/spark/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-*.out
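If several Thrift Server logs have piled up, a sketch that picks the most recently modified one in a directory (default /var/log/spark), assuming the file name contains HiveThriftServer2 and ends in .out; latest_thrift_log is a hypothetical name:

```shell
# Print the newest Spark Thrift Server log in a directory.
# Assumption: file names contain "HiveThriftServer2" and end in ".out".
latest_thrift_log() {
  local dir="${1:-/var/log/spark}"
  # ls -t sorts newest first; head keeps only that one.
  ls -t "$dir"/*HiveThriftServer2*.out 2>/dev/null | head -n 1
}
```

Usage: `grep ERROR "$(latest_thrift_log)"`.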
Start Beeline shell
/usr/lib/spark/bin/beeline
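Beeline can also run a single statement and exit, which is handy in scripts. A sketch using Beeline's -u (JDBC URL) and -e (query) options. The default URL jdbc:hive2://localhost:10000 assumes the server listens on the standard HiveServer2 port, so override JDBC_URL if yours differs; the BEELINE and JDBC_URL variables and the beeline_query name are my own:

```shell
# Run one SQL statement through Beeline and exit.
# Assumptions: server on localhost:10000 unless JDBC_URL is set;
# Beeline at /usr/lib/spark/bin/beeline unless BEELINE is set.
beeline_query() {
  local url="${JDBC_URL:-jdbc:hive2://localhost:10000}"
  "${BEELINE:-/usr/lib/spark/bin/beeline}" -u "$url" -e "$1"
}
```

Usage: `beeline_query 'SHOW DATABASES;'`.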