ServiceCatalog

Service Catalog EMR-ML Lab


This Page https://tinyurl.com/y298k84u


EMR Lab

    The lab objective is to quickly create your own Service Catalog EMR environment.

Pre-requisites:

  • DO NOT use your ROOT user
  • Create a user called emrscadmin and give it an Administrator policy (optional if you already have a user with an admin policy)

Set Up the Lab Environment

  1. Log in to your AWS account using the emrscadmin user (or an admin user)
  2. Launch this stack that will create the Service Catalog environment

  3. On the Create stack page, choose Next
  4. On the stack details page, fill in the parameters and then choose Next
    • Stack Parameters
      Stack Name: EMRLABsetup
      PortfolioName: EMRloft

  5. On the Configure stack options page choose Next
  6. On the Review page, select the check box for "I acknowledge that AWS CloudFormation might create IAM resources with custom names."
  7. Choose Create Stack
  8. When the stack has completed, choose the Outputs tab
  9. Copy the Output URLs into a text document
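
For readers who prefer the command line, the console steps above can be approximated with the AWS CLI. This is a minimal sketch, not the lab's official method. `TEMPLATE_URL` is a placeholder for the template behind the launch link (it is not reproduced in this text), `create_lab_stack` is a hypothetical helper name, and the sketch assumes the template's parameter key is `PortfolioName`, as listed under Stack Parameters. `CAPABILITY_NAMED_IAM` corresponds to the acknowledgement check box on the Review page.

```shell
# Sketch only: create the lab stack and print its outputs with the AWS CLI.
# TEMPLATE_URL is a placeholder; set it to the lab template URL yourself.
create_lab_stack() {
  stack_name="${1:-EMRLABsetup}"
  portfolio_name="${2:-EMRloft}"

  if [ -z "${TEMPLATE_URL:-}" ]; then
    echo "Set TEMPLATE_URL to the lab template URL first" >&2
    return 1
  fi

  # Create the stack and acknowledge IAM resources with custom names.
  aws cloudformation create-stack \
    --stack-name "$stack_name" \
    --template-url "$TEMPLATE_URL" \
    --parameters "ParameterKey=PortfolioName,ParameterValue=$portfolio_name" \
    --capabilities CAPABILITY_NAMED_IAM || return 1

  # Block until stack creation has finished.
  aws cloudformation wait stack-create-complete --stack-name "$stack_name" || return 1

  # Print each output key with its value (the URLs to copy).
  aws cloudformation describe-stacks \
    --stack-name "$stack_name" \
    --query 'Stacks[0].Outputs[].[OutputKey,OutputValue]' \
    --output text
}
```

Once `TEMPLATE_URL` is exported, calling `create_lab_stack` with no arguments uses the stack and portfolio names from this lab.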



Launch the EMR product

  1. Copy the URL with the SwitchRoleSCEndUser Key
  2. Open a new browser tab, paste the URL, and press Enter
  3. Choose the Switch Role button
  4. Go to the Service Catalog console
  5. Choose the EMR Service Catalog Product
  6. Choose the LAUNCH PRODUCT button
  7. Enter a name: myemrlab
  8. Choose version V1
  9. Choose Next
  10. Choose a VPC
  11. Choose a Subnet (choose a public Subnet/network)
  12. Enter a Remote Access CIDR Block (the address range you will access EMR from)
  13. * Note: Look up your public IP address and copy it
  14. Enter it in this format: paste_the_address_you_copied/32, e.g. 100.122.122.22/32
  15. Choose Next
  16. On the TagOption page, choose Next
  17. On the Notifications page, choose Next
  18. On the Review page, choose Launch
  19. This will take about 15 minutes to complete
  20. View the output
  21. Copy the value of Hive Script; you will need it to run the query
  22. Copy the value of S3LogBucket; you will need it for the lab cleanup steps
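
The Remote Access CIDR Block value is easy to mistype, so here is a small sketch that checks an IPv4 address and appends the /32 suffix described in the steps above. `to_cidr32` is a hypothetical helper name and is not part of the lab.

```shell
# Sketch only: validate one IPv4 address and print it as a /32 CIDR block.
to_cidr32() {
  ip="${1:-}"
  # Reject empty input, non-digit characters, and misplaced dots.
  case "$ip" in
    ''|*[!0-9.]*|.*|*.|*..*)
      echo "not an IPv4 address: $ip" >&2
      return 1
      ;;
  esac

  # Walk the dot-separated octets, counting them and checking the 0-255 range.
  rest="$ip."
  count=0
  while [ -n "$rest" ]; do
    octet="${rest%%.*}"
    rest="${rest#*.}"
    count=$((count + 1))
    if [ "${#octet}" -gt 3 ] || [ "$octet" -gt 255 ]; then
      echo "octet out of range in: $ip" >&2
      return 1
    fi
  done

  if [ "$count" -ne 4 ]; then
    echo "expected four octets in: $ip" >&2
    return 1
  fi

  printf '%s/32\n' "$ip"
}
```

For example, `to_cidr32 100.122.122.22` prints `100.122.122.22/32`. One way to obtain your address from a terminal is `curl -s https://checkip.amazonaws.com`; whether that is the lookup the lab originally linked to is not known.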

Running the EMR Query


Use Case: There are logs of who accessed an important website. We need to know what kind of computers were used to access the website so we can improve the website. The query will give us that information

  1. Open the JupyterNotebook link in a new browser tab (right-click, then Open Link in New Tab)
  2. Open a new terminal: New, Terminal
  3. Paste the Hive script
  4. Press Enter; the script runs for about 1 minute
  5. View the results

    1. In the same EMR window
    2. Run the command below to view the results of the Hive script. Replace the S3 URL in the command with the ‘Moving data to directory’ S3 URL copied from the previous command’s output
    3. aws s3 cp --recursive s3://sc-9999999999-pp-btjexf5pcesao-logbucket-1d5ejm5a251rx/out/os_requests/ here

    4. cat here/000000_0
    5. Your results should look like this
      Android855
      Linux813
      MacOS852
      OSX799
      Windows883
      iOS794
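
The names and counts appear run together most likely because Hive, by default, separates columns with a non-printing Ctrl-A character (`\001`) when it writes query results to a directory. Assuming the lab's Hive script keeps that default (not verified here), the sketch below prints the same file as two aligned columns. `show_os_requests` is a hypothetical helper name.

```shell
# Sketch only: print a Hive result file as two aligned columns.
# Assumes fields are separated by Hive's default Ctrl-A (\001) delimiter.
show_os_requests() {
  result_file="${1:-here/000000_0}"
  # Turn each Ctrl-A into a tab, then pad the first column for readability.
  tr '\001' '\t' < "$result_file" |
    awk -F '\t' '{ printf "%-10s %s\n", $1, $2 }'
}
```

Run `show_os_requests` from the directory where you ran the copy command, or pass a different file path as the first argument.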



Clean up the environment so you don't incur extra costs

    Manually empty the bucket

    1. Close the browser tab for EMR
    2. Close the jupyter browser tab
    3. Switch back to the emrscadmin user (or the user you used to create the initial stack)
    4. Open the CloudFormation console
    5. Choose the stack created by Service Catalog; its name will have an SC-99999999-pp format
    6. Expand the Resources section
    7. Copy the name of the Logging S3 Bucket to the clipboard
    8. Open the S3 console
    9. Paste the bucket name into the “Search for Buckets” box
    10. Select the bucket
    11. Click Empty
    12. Paste the bucket name again into the “are you sure” dialog box and press Confirm
    13. Open the CloudFormation console
    14. Choose the stack created by Service Catalog; its name will have an SC-99999999-pp format
    15. Choose Delete
    16. Choose Delete stack. Wait for deletion to complete; use refresh if needed
    17. * Note: if the delete fails, repeat steps 5 - 16
    18. Choose the EMRLABsetup stack created for this exercise
    19. Choose Delete
    20. Choose Delete stack. Wait for deletion to complete; use refresh if needed
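
The cleanup steps above can also be sketched with the AWS CLI, run as the same admin user. This is an approximation, not the lab's official procedure. `cleanup_lab` is a hypothetical helper name, and you must pass the bucket and stack names from your own account. Note that `aws s3 rm --recursive` deletes current objects only, so a bucket with versioning enabled may still need the console's Empty action.

```shell
# Sketch only: empty the log bucket, then delete both lab stacks.
cleanup_lab() {
  log_bucket="${1:-}"              # the logging S3 bucket name you copied
  product_stack="${2:-}"           # the SC-...-pp-... stack from Service Catalog
  setup_stack="${3:-EMRLABsetup}"  # the stack created at the start of the lab

  if [ -z "$log_bucket" ] || [ -z "$product_stack" ]; then
    echo "usage: cleanup_lab <log-bucket> <product-stack> [setup-stack]" >&2
    return 1
  fi

  # Empty the bucket so the product stack can delete it.
  aws s3 rm "s3://$log_bucket" --recursive || return 1

  # Delete the Service Catalog product stack and wait for it to finish.
  aws cloudformation delete-stack --stack-name "$product_stack" || return 1
  aws cloudformation wait stack-delete-complete --stack-name "$product_stack" || return 1

  # Delete the lab setup stack and wait for it to finish.
  aws cloudformation delete-stack --stack-name "$setup_stack" || return 1
  aws cloudformation wait stack-delete-complete --stack-name "$setup_stack"
}
```

If a wait command reports a failure, check the stack events in the CloudFormation console before retrying.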


    Congratulations, the lab environment has been cleaned up. The lab is complete.