Hortonworks Data Platform (HDP) Sandbox on AWS EC2

This post walks through installing the Docker image of the Hortonworks Data Platform (HDP) Sandbox, a single-node Hadoop cluster for exploring Big Data Analytics applications. The installation is done on an AWS EC2 instance running Amazon Linux 2.

The HDP Sandbox makes it easy to get started with Apache Hadoop, Apache Spark, Apache Hive, Apache HBase, Druid and Data Analytics Studio (DAS).

Installation of HDP Sandbox via Docker on an AWS EC2 instance

  1. Spin up an AWS EC2 instance with the following specifications
    • OS: Amazon Linux 2 (Amazon Linux 2 AMI (HVM), SSD Volume Type)
    • 4 vCPUs, 16 GB RAM (t2.xlarge)
    • 50 GB SSD
    • In the security group:
      • Open SSH port 22, accessible only from your local machine
      • Open TCP port 8080, accessible from any IPv4 address (0.0.0.0/0)
      • Open TCP port 2222, used to reach the Sandbox over SSH, accessible from any IPv4 address
    • Default user name is ec2-user
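
For readers who prefer the command line to the AWS console, the three inbound rules above could be added roughly as follows. This is only a sketch: it assumes the AWS CLI is installed and configured, that the security group already exists, and the function name `open_sandbox_ports` and its two arguments (a security group ID and your own public IP) are hypothetical placeholders, not part of the original post.

```shell
# Sketch: add the three inbound rules from step 1 to an existing security group.
# Arguments (both hypothetical placeholders supplied by the caller):
#   $1 = security group ID, $2 = your own public IPv4 address
open_sandbox_ports() {
  local sg_id="$1"
  local my_ip="$2"
  local port

  # SSH to the EC2 host: reachable only from your own machine.
  aws ec2 authorize-security-group-ingress --group-id "$sg_id" \
    --protocol tcp --port 22 --cidr "${my_ip}/32"

  # Ambari (8080) and Sandbox SSH (2222): open to any IPv4 address, as in the list above.
  for port in 8080 2222; do
    aws ec2 authorize-security-group-ingress --group-id "$sg_id" \
      --protocol tcp --port "$port" --cidr 0.0.0.0/0
  done
}
```

It would be called as `open_sandbox_ports <security-group-id> <your-public-ip>`.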
  2. SSH into the instance on port 22 as user ec2-user. In the terminal, update your Amazon Linux packages:
sudo yum update -y
  3. Install and start Docker
sudo amazon-linux-extras install docker
sudo service docker start
  4. Add the ec2-user to the docker group so you can execute Docker commands without using sudo.
sudo usermod -a -G docker ec2-user
  5. Log out and log back in again to pick up the new docker group permissions.
logout
  6. Log in again with the same username: ec2-user

  7. Verify that the ec2-user can run Docker commands without sudo.

docker info
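
If `docker info` still reports a permission error, the usual cause is that the current session has not picked up the new group yet. A small check along these lines can confirm it. This is a sketch; the function name `session_has_docker_group` is made up for illustration. It relies on `id -nG` without a user name, which reports the groups of the current session rather than the groups recorded in the group database.

```shell
# Sketch: succeed only if the *current login session* already carries the docker group.
session_has_docker_group() {
  # id -nG prints the session's group names separated by spaces;
  # put one name per line and look for an exact match on "docker".
  id -nG | tr ' ' '\n' | grep -qx docker
}
```

For example, `session_has_docker_group && echo "docker group active" || echo "log out and back in first"`.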
  8. Switch the current session to the ‘root’ user
sudo -s
  9. Download the HDP 2.6.5 sandbox files
curl -O https://archive.cloudera.com/hwx-sandbox/hdp/hdp-2.6.5/HDP_2.6.5_deploy-scripts_180624d542a25.zip

This link is obtained from: https://www.cloudera.com/downloads/hortonworks-sandbox/hdp.html
10. Unzip the downloaded files

unzip HDP_2.6.5_deploy-scripts_180624d542a25.zip
  11. Run the HDP setup script
bash docker-deploy-hdp265.sh
  12. Test whether the HDP sandbox is running properly
curl localhost:8080
  • If a response is received rather than a connection error, then all is well.
  13. Apache Ambari should now be available in the browser at
    <AWS_Public_IP_host_name>:8080
    Username: raj_ops
    Password: raj_ops
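
Ambari may need a few minutes after the setup script finishes before it answers on port 8080, so the `curl localhost:8080` test can be wrapped in a retry loop. The sketch below is illustrative only: the function name `wait_for_ambari`, the retry count, and the delay are assumptions, and it assumes Ambari answers the root URL with HTTP 200 once it is up.

```shell
# Sketch: poll the Ambari port until it answers with HTTP 200 or we give up.
# Arguments (all optional, defaults are illustrative): URL, attempts, delay in seconds.
wait_for_ambari() {
  local url="${1:-http://localhost:8080}"
  local tries="${2:-30}"
  local delay="${3:-10}"
  local i=1
  local code

  while [ "$i" -le "$tries" ]; do
    # -s: silent, -o /dev/null: discard the body, -w: print only the HTTP status code.
    code="$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)"
    if [ "$code" = "200" ]; then
      echo "Ambari answered after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done

  echo "No HTTP 200 from $url after $tries attempts" >&2
  return 1
}
```

Calling `wait_for_ambari` with no arguments polls `http://localhost:8080` for up to about five minutes.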

  14. Log out now, then log in via port 2222. The HDP Sandbox terminal can be accessed with these credentials:

  • Port: 2222
  • Username: root
  • Password: hadoop
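
From a local terminal, the Sandbox login on port 2222 can be done with the standard `ssh` client. The wrapper below is a minimal sketch; the function name `sandbox_ssh` is hypothetical, and its argument stands for the same `<AWS_Public_IP_host_name>` used for Ambari.

```shell
# Sketch: open an SSH session to the Sandbox (not the EC2 host) on port 2222.
# $1 = public IP or host name of the EC2 instance (placeholder supplied by the caller).
sandbox_ssh() {
  local host="$1"
  # -p selects the port; the password prompt expects the Sandbox root password.
  ssh -p 2222 "root@${host}"
}
```

For example, `sandbox_ssh <AWS_Public_IP_host_name>`.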
  15. After logging in, verify the sandbox version:
sandbox-version

The command should print the sandbox version information.

16. Check the installed versions of Hadoop and Hive

hadoop version
hive --version

The output should report Hadoop 2.7.3.2.6.5.0-292 and Hive 1.2.1000.2.6.5.0-292.
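
The version comparison can also be scripted inside the Sandbox terminal. The following is a rough sketch, not part of the original setup: the function name `check_versions` is made up, and it assumes each command prints a line starting with `Hadoop ` or `Hive ` followed by the version string quoted in this post.

```shell
# Sketch: compare the reported Hadoop and Hive versions with the ones this post expects.
check_versions() {
  local want_hadoop="Hadoop 2.7.3.2.6.5.0-292"
  local want_hive="Hive 1.2.1000.2.6.5.0-292"
  local got_hadoop got_hive
  local status=0

  # Keep only the first line that names the product; ignore warnings on stderr.
  got_hadoop="$(hadoop version 2>/dev/null | grep '^Hadoop ' | head -n 1)"
  got_hive="$(hive --version 2>/dev/null | grep '^Hive ' | head -n 1)"

  if [ "$got_hadoop" = "$want_hadoop" ]; then
    echo "OK: $got_hadoop"
  else
    echo "MISMATCH: expected '$want_hadoop', got '$got_hadoop'"
    status=1
  fi

  if [ "$got_hive" = "$want_hive" ]; then
    echo "OK: $got_hive"
  else
    echo "MISMATCH: expected '$want_hive', got '$got_hive'"
    status=1
  fi

  return "$status"
}
```

Running `check_versions` prints one line per component and returns a non-zero status if either version differs.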

  17. After a reboot, the Docker containers do not start automatically. Do the following so that they restart after every reboot.
docker stop sandbox-hdp
docker stop sandbox-proxy
docker update --restart unless-stopped sandbox-hdp
docker update --restart unless-stopped sandbox-proxy
docker container start sandbox-hdp
docker container start sandbox-proxy

Check the Docker container status with:

docker container ls -a
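
To confirm that the restart policy actually took effect, the policy can be read back with `docker inspect` and reapplied only where needed. This is a sketch under stated assumptions: the function name `ensure_restart_policy` is hypothetical, and it assumes the Docker daemon itself is started at boot (for example via `sudo systemctl enable docker`), which the steps above do not cover.

```shell
# Sketch: report the restart policy of both sandbox containers and fix it if needed.
ensure_restart_policy() {
  local name policy
  for name in sandbox-hdp sandbox-proxy; do
    # Read the current restart policy name (e.g. "no", "always", "unless-stopped").
    policy="$(docker inspect -f '{{.HostConfig.RestartPolicy.Name}}' "$name")"
    if [ "$policy" != "unless-stopped" ]; then
      docker update --restart unless-stopped "$name"
    fi
    echo "$name: restart policy was '$policy'"
  done
}
```

Running `ensure_restart_policy` a second time should report `unless-stopped` for both containers.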

You can now continue your journey of exploring Big Data Analytics using the HDP Sandbox. Enjoy!

Credits: Peter Reiter, Thomas Feilhauer, Armin Simma from FH Vorarlberg University of Applied Sciences, Austria