
NOTE: The following is incomplete. The newest version of the HCP_NITRC AMI does not seem to work. It seems to have root access disabled which adds another layer of incompatibility with being a StarCluster node.

 

 

Table of Contents

Terms and Acronyms

The goal of this tutorial is for the reader to gain experience with running the HCP pipelines in the "Amazon Cloud". In order for this to make sense, it is important that you start out with a basic understanding of the following terms.

AWS - Amazon Web Services

A collection of remote computing services that make up a cloud computing platform. Two of the central services are Amazon EC2 (the service that provides compute power, “machines” that are remotely available) and Amazon S3 (the service that provides storage space for your data).

EC2 – Elastic Compute Cloud

Amazon service that allows users to rent virtual machines (VMs) on which to run their applications. Users can create, launch, and terminate VMs as needed, paying an hourly fee only for VMs that are currently active (this is the “elastic” nature).

S3 – Simple Storage Service

Amazon online data storage service. Not a traditional file system. Stores large “objects” instead of files. These objects are accessible virtually anywhere on the web. Multiple running EC2 instances can access an S3 object simultaneously. Intended for large, shared pools of data. Conceptually similar to a shared, web-accessible drive.

S3 Bucket

Data in S3 is stored in buckets. For our purposes, a bucket is simply a named container for the files that we store and share via Amazon S3. HCP's data is made available publicly in a bucket named hcp-openaccess.

AMI – Amazon Machine Image – The Software

A read-only image of a file system that includes an Operating System (OS) and additional software installed. Conceptually, this is comparable to a CD/DVD that contains an OS and other software that is installed on a “machine” for you. The creator of the AMI chooses which OS to include and then installs and configures other software. For example, an AMI creator might choose to start with CentOS Linux or Ubuntu Linux and then pre-install a set of tools that are useful for a particular purpose.

An AMI might be created for Photo Editing which would contain a pre-installed suite of software that the AMI creator deems useful for Photo Editing.

An AMI might be created for Neuroimaging with a chosen OS (e.g. Ubuntu 12.04.1 LTS) and a pre-installed suite of software for Neuroimaging (e.g. FSL, AFNI, FreeSurfer, the HCP Pipelines, Workbench, etc.)

The AMI is the software distribution that will be installed and run on your virtual machine instance (see below.)

Amazon EC2 Instance Types – The Available Hardware

An EC2 Instance Type is a particular combination of CPU, memory (RAM), storage, and networking capacity optimized for a particular purpose. There are instance types defined for use as:

  • General Purpose systems

  • Compute Optimized (e.g. high performance) systems

  • Memory Optimized systems

  • GPU application systems

  • Storage Optimized (high I/O) systems

An Instance Type is a virtual hardware configuration.

Amazon EBS – Elastic Block Store

Online data storage service that provides a more traditional file system. An EBS volume is attached to a running EC2 instance. From the EC2 instance's point of view, an EBS volume is a “local drive”.

EBS volumes can be configured such that the data continues to exist after the EC2 instance is shut down. By default, however, they are configured such that the volume is deleted upon instance shut down.

NITRC

Neuroimaging Informatics Tools and Resources Clearinghouse

 


Return to Table of Contents


Step 1: Getting Credentials to access HCP S3 Data

  • In order to have access to the HCP data via Amazon S3, you will need to have a ConnectomeDB account and have accepted the Open Access Data Use Terms.

  • In a web browser (e.g. Firefox), log in to your ConnectomeDB account by visiting https://db.humanconnectome.org and entering your ConnectomeDB user name and password.

  • There should be an orange icon and text stating Data Available on Amazon S3

 


  • To create your AWS Credentials, click on the Amazon Web Services icon at the upper right of the page. The credentials will include your ConnectomeDB username and a pair of keys created for your account that will be used for secure access to the HCP S3 Bucket

  • These keys are your AWS Access Key ID and your AWS Secret Access Key. While not exactly the same, it may be helpful to think of your Access Key ID as your username and your Secret Access Key as your password for accessing the HCP S3 Bucket.

  • Upon clicking the Amazon Web Services icon, you should see a Set Up Credentials dialog

 


  • Select the Create my AWS Credentials button on the dialog



  • When your credentials have been created, you should see the AWS Connection Manager: Success dialog

 

  • Notice that the Success dialog has a temporary link to use to download your credentials. Click on that link and record the 3 pieces of information that make up your AWS credentials:

    • Your username – the same as your ConnectomeDB username

    • Your Access Key ID – this is stored by ConnectomeDB

    • Your Secret Access Key – this is yours to store and maintain access to

  • Keep this information in a secure place as it is your access information to the HCP S3 data. It should not be shared with others.
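
One way to follow the advice above is to keep the two keys in a file that only you can read. The following is a minimal sketch, not part of the HCP or NITRC tooling: the function name save_hcp_credentials, the profile name hcp, and the file location in the usage comment are all assumptions chosen for illustration, and the key values shown are placeholders. The file layout is the INI style used by AWS credentials files.

```shell
# Hypothetical helper: write an Access Key ID and a Secret Access Key to a
# file that only the current user can read.
# Usage: save_hcp_credentials <file> <access-key-id> <secret-access-key>
save_hcp_credentials() (
    cred_file="$1"
    umask 077                                  # new files and directories: owner only
    mkdir -p "$(dirname "$cred_file")"
    {
        printf '[hcp]\n'                       # "hcp" is an assumed profile name
        printf 'aws_access_key_id = %s\n' "$2"
        printf 'aws_secret_access_key = %s\n' "$3"
    } > "$cred_file"
    chmod 600 "$cred_file"                     # owner-only even if the file already existed
)

# Example with placeholder values (not real keys); the path is an assumption:
# save_hcp_credentials "${HOME}/.hcp/credentials" "YOUR_ACCESS_KEY_ID" "YOUR_SECRET_ACCESS_KEY"
```

Note that the helper overwrites the target file, so point it at a file dedicated to your HCP keys.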

 




Step 2: Getting Started with AWS

  • Before coming to the Exploring the Human Connectome course, you should have received an email with instructions for how to set up an Amazon Web Services (AWS) account and configure the account to use the $100 of credit that has been provided by Amazon to students in the course.

  • You will need your AWS account information (login email address and password) to complete the steps of the practical.

Step 2a: Login to AWS


  • In the upper right of the web page, just to the right of the name associated with your account, there is a region indication (e.g. N. Virginia, Oregon, Ireland, Sydney, etc.) At the time these instructions were written, the AMI that we want to use is available only in the US East (N. Virginia) region. If the region indicator does not already read N. Virginia, select the down arrow to the right of the region and select US East (N. Virginia).

 




Step 2b: Create an Instance

  • Once you've successfully logged in to the AWS Management Console, select the EC2 - Virtual Servers in the Cloud link in the upper left.


 

  • Select the Launch Instance button



  • Select the AWS Marketplace tab on the left hand side of the page.

  • In the search box near the top of the page enter HCP


  • The result of this search should be at least one AMI, the NITRC Human Connectome Project Computational Environment.

  • If you select the more info link, you should see a section with the heading Product highlights. The last sentence in this section should confirm for you that, This is the NITRC-CE version for the 2015 OHBM HCP course and matching the course notes "Cloud-based Processing using HCP Pipelines and Amazon Web Services."

  • Select the NITRC Human Connectome Project Computational Environment AMI by pressing the Select button to the right of the AMI listing.

  • You will then see a pop-up dialog with product and pricing details. Press the Continue button at the bottom of that dialog.
  • You'll then be asked to choose an instance type.


  • For this exercise, choose the m3.medium instance type by making sure that the “check box” to the left of that instance type is selected.

  • Since we won't have time to fully run any pipelines in this exercise, this is a good, relatively inexpensive choice. When you do this “for real” you will want to consider whether you are going to run the pipelines on this machine (in which case you will probably want a high memory, high CPU power instance type) or use this machine as a master for running a cluster (a.k.a. grid) of other machines that actually run the pipelines (in which case you will probably want a relatively low memory and low CPU power instance type in order to save money.)

  • We will illustrate using a cluster to run pipelines later in this tutorial.

  • Select the Review and Launch button. 
  • You may see a pop-up asking you whether you want to boot from a General Purpose (SSD). If so, accept the default settings and select Next. 
  • Select the Launch button
  • If you have previously used your AWS console and created a key pair for accessing an instance, you will see a dialog asking you whether you want to use an existing key pair. If you have the private key file for an existing key pair, you can choose to use that existing key pair. Otherwise, you should choose to create a new key pair.

  • Give the key pair a name that does not have any spaces in it, e.g. MyHcpKeyPair (not My Hcp Key Pair) and download the key pair using the Download Key Pair button.

  • You must save the private key file (e.g. MyHcpKeyPair.pem) on your local computer system. Depending upon how your browser is configured, you may have to look in your ${HOME}/Downloads directory for the downloaded .pem file. You will need this file later.

  • If necessary, select the check box indicating that you acknowledge that you will have access to the private key file.

  • Select the Launch Instances button on the key pair dialog

  • After a short time, you should see the Launch Status page

 


  • Take note of the listed instance ID

    • The instance ID is shown in green after the text reading, “The following instance launches have been initiated:”

    • In the above illustration, the instance ID is i-a2b3888d. Your instance ID will be different.

  •  Select the instance ID (e.g. i-a2b3888d) link provided.
  • This should get you to an instance table that only shows the instance you just created.



  • In the instance table, note the following information about your instance. You will need all of this information later.

    • Instance ID

      • In the example, the Instance ID is: i-a2b3888d.

      • Yours will be different
    • Public DNS

      • In the example, the Public DNS is: ec2-52-7-106-116.compute-1.amazonaws.com

      • Yours will be different

    • Key Name

      • In the example, the KeyName is: MyHcpKeyPair

      • Yours may be different

  • Open another tab on your browser and visit your machine at http://<your-public-dns> (It may take a few minutes for your instance to finish initializing and be ready to respond. If you get an error message similar to "Server not found" or "Webpage is not available", verify that you are using the correct public DNS and try again. You can check whether your instance has completed initializing by refreshing your instance table.)

  • This should bring you to a Security Redirect page that looks like the following:

 


  • Select the “For better security, please click here” link.

  • Click through the “Your connection is not private” warnings by selecting the Advanced and then the Proceed to … (unsafe) links as necessary. (Different browsers may have different responses and messages associated with not making a valid secure connection. Choose whichever option allows you to continue to connect to your instance (e.g. Continue, Connect, Accept, etc.))

  • This should get you to the main page served up by your running HCP_NITRC instance. Note the address in the address bar of your browser window:

 


  • The part of the address starting with ec2 and going through amazonaws.com (in the above example: ec2-52-4-26-132.compute-1.amazonaws.com) is your virtual machine's public DNS address. As has been already mentioned, you will want to have this information recorded for future reference.

  • This web-based interface to your running instance is part of the service provided by the NITRC AMI and allows you to configure your NITRC-CE instance.

  • Enter your saved Amazon instance ID (e.g. i‑e09d961c, yours will be different) and a username and password for an account that will be created for you on your running instance (e.g. hcpuser and hcppassword). You are entering a username and password for an account that will be created for you on the instance, not an account that already exists.

  • Fill in an email address for notification if the machine instance is left running. You will be charged for the time the machine instance is left running whether you are “doing anything” on it or not. So it is worth keeping track of a machine instance and being notified if it is left running. If you'd prefer not to receive such notifications, uncheck the box next to the If your machine is left running... text.

  • Select the Submit button.

  • This sets up an account for you on the running instance and starts a logged in Virtual Network Computing (VNC) server session on the instance.

  • If you get a page that has a red banner across the top reading, “There was an unexpected error creating your account. Please try again”, then simply press the Submit button again.

  • You will know that the account has been set up and the VNC server session has been started when you see a page that looks similar to the following:



  • The VNC server session will allow you to connect to the machine instance with a full GUI Desktop interface. You can connect to this GUI Desktop interface either from inside your browser or by using VNC client software installed on your local machine. For this demonstration, we will connect to the GUI Desktop from the browser. Later we will also see how to establish a simple terminal connection to your instance using SSH.

 




Step 2c: Configure Your Machine Instance

  • Select the Control Panel button to get to a page that allows you to:

    • Configure your software licenses

    • Setup access to the HCP OpenAccess S3 bucket (hcp-openaccess)

    • Start, Connect To, and End a VNC server session.



  • You will be able to return to this Control Panel page in the future by using your public DNS (e.g. ec2-52-4-26-132.compute-1.amazonaws.com), clicking through the Security Redirect page, and entering the username and password for the account that was just created for you (e.g. hcpuser and hcppassword).

  • Notice the bold text on the page providing you with the address to use to connect to the VNC server session with VNC client software. In the above example it states, “You may access it by directing a VNC client to ec2-52-4-26-132.compute-1.amazonaws.com::5901 or with the “Connect” button below.”

  • This is your instance's public DNS (ec2-52-4-26-132.compute-1.amazonaws.com) with a port number (::5901) associated with the VNC server session added to the end of it.

  • Select the Licenses tab on the Control Panel




  • Notice that the FreeSurfer Status is “License not installed”

  • Select the click here link after the “To update your FreeSurfer License...” text

  • In the text box presented, place the following license information.

Important Notes:

  1. The following is the FreeSurfer license that we are using for this course. It is only intended for your use during this course. If you want to continue using FreeSurfer after the course, please get your own FreeSurfer license and install it on any machine instances you use.
  2. In the FreeSurfer license information below, please carefully note that there are single space characters before lines 3 and 4.
tsc5yc@mst.edu
7361
 *CMS6c5mP.wmk
 FSQVaStVzhzXA
  • Once you have entered the FreeSurfer license information, press the Submit button below the text area. You should then be returned to the Console tab.

  • Select the Settings tab



  • Enter your AWS Access Key ID and AWS Secret Access Key in the provided text fields.

  • Be very careful if you are copying and pasting your Access Key ID and Secret Access Key from somewhere into the text fields that you do not accidentally add a space character at either end of the copied and pasted text. Extra characters (even extra spaces) will prevent you from properly mounting the HCP bucket.

  • Notice that the Public S3 Bucket for the HCP is already configured to mount, but needs to have your AWS keys entered, under the Mount header it says Enter AWS keys.

  • Select the Apply button at the bottom left of the page, and a check box should appear under the Mount heading for the HCP S3 Bucket.



  • Select the check box asking that the HCP S3 Bucket be mounted



  • Select the Apply button again.

  • This time you should see the following notification across the top of the page above the Console, Licenses, and Settings tabs. Notice the “Mounted hcp at /s3/hcp” notification.






Step 2d: Connect to Your Running Machine Instance

  • Select the Console tab on the Control Panel Page

  • To connect to the VNC server session within your browser, press the Connect button. You will then need to supply your account username and password (e.g. hcpuser and hcppassword) and press the Login button.

  • This presents a web page (from your machine instance) that allows you to use the Guacamole clientless remote desktop gateway (http://guac-dev.org) to connect to the running VNC server session.




  • Select the link
  • A new tab will open up in your browser showing you a complete Ubuntu Desktop in which you are logged in to your created account.



  • Feel free to issue some simple commands in the presented terminal window (ls, pwd, fslview, …)

  • In a moment, we'll do a little more “looking around” to take note of what software is already installed and available for you to use on this system. But first, let's look at one other way to connect to the system.





Step 2e: Make a Terminal Connection using SSH

  • Rather than have a full Desktop GUI, you can simply connect to your running machine instance using a terminal emulator and SSH.

  • Start a terminal emulator on your local machine.

  • Locate the private key file that was created as part of creating your machine instance (e.g. ${HOME}/Downloads/MyAmazon1Click.pem) and use the chmod command to make sure your private key file isn't publicly viewable.

Important Notes:

  1. If your private key file was downloaded to somewhere different than your ~/Downloads directory, you will need to substitute the location of your private key file for ~/Downloads in the below commands.
  2. If your private key file was named something other than MyAmazon1Click.pem, you will need to substitute the name of your private key file for MyAmazon1Click.pem in the below commands.
$ cd ~/Downloads
$ chmod 400 MyAmazon1Click.pem

 

  • Issue the following command to use SSH to connect to your running instance:

Important Notes:

  1. You will need to substitute the full path to your private key file name for ${HOME}/Downloads/MyAmazon1Click.pem in the below commands.
  2. You will need to substitute your machine instance's public DNS for ec2-52-4-26-132.compute-1.amazonaws.com in the below commands.
$ ssh -X -i ${HOME}/Downloads/MyAmazon1Click.pem hcpuser@ec2-52-4-26-132.compute-1.amazonaws.com
  • You will likely be informed that the authenticity of the host to which you are connecting cannot be established and asked if you want to continue connecting. Answer yes. Once you've answered yes to this question one time, you shouldn't receive this notification in the future.

  • When prompted, enter the password for the account you created on your running instance (e.g. hcppassword).

  • You should then receive a welcome to NITRC Computational Environment message that looks similar to:



  • You have now successfully used 2 of the 3 possible ways to access your machine instance.

  • The third way would be to use VNC Client software on your local system to connect to the running VNC Server session. We will not use this mechanism in this tutorial, but you should be aware of the option.
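
Because the key file path and public DNS must be substituted each time, it can help to assemble the SSH command in one place. The following is only a sketch under stated assumptions: the function name hcp_ssh_command is hypothetical, and the key file, user name, and public DNS in the usage comment are the example values from this tutorial. It applies the same chmod 400 protection described above and prints the command rather than running it.

```shell
# Hypothetical helper: check that the private key file exists, protect it,
# and print the ssh command for the given key file, user, and public DNS.
# Usage: hcp_ssh_command <key-file> <user> <public-dns>
hcp_ssh_command() (
    key_file="$1"; user="$2"; public_dns="$3"
    if [ ! -f "$key_file" ]; then
        echo "Private key file not found: $key_file" >&2
        exit 1                        # leaves the subshell; the function returns 1
    fi
    chmod 400 "$key_file"             # same protection as the chmod step above
    printf 'ssh -X -i %s %s@%s\n' "$key_file" "$user" "$public_dns"
)

# Example using the tutorial's example values (yours will be different):
# hcp_ssh_command "${HOME}/Downloads/MyAmazon1Click.pem" hcpuser ec2-52-4-26-132.compute-1.amazonaws.com
```

Printing the command first lets you confirm the public DNS before connecting, which is useful because the DNS changes whenever the instance is restarted.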





Step 3: Take Note of the Pre-installed Software

Step 3a: Note FSL Installation

  • From either the terminal (SSH) connection or from the terminal window on the Desktop GUI inside your browser, enter the following commands to see that FSL has been pre-installed for you on your machine instance.

 

$ which fslview
/usr/share/fsl/5.0/bin/fslview
$ fslmerge

Usage: fslmerge <-x/y/z/t/a/tr> <output> <file1 file2 .......> 
 -t : concatenate images in time
 -x : concatenate images in the x direction
 -y : concatenate images in the y direction
 ...
 
$ flirt -version
FLIRT version 6.0

$ fsl 

 

  • See the fsl main window (similar to below) and then exit.






Step 3b: Note FreeSurfer Installation

  • Enter the following commands:

$ which freesurfer
/usr/local/freesurfer/bin/freesurfer
$ freesurfer

FreeSurfer is a set of tools for analysis and visualization
of structural and functional brain imaging data. FreeSurfer
also refers to the structural imaging stream within the 
FreeSurfer suite.

Users should consult ...

 

  • Be sure to note that you are running the v5.3.0-HCP version of FreeSurfer





Step 3c: Note Connectome Workbench Installation

  • Enter the following commands:

$ wb_command -version
Connectome Workbench
Version: 1.0
Qt Compiled Version: 4.8.1
Qt Runtime Version: 4.8.1
commit: unknown (NeuroDebian build from source)
commit date: unknown
Compiler: c++ (/usr/bin)
Compiler Version:
Compiled Debug: NO
Operating System: Linux

$ wb_view
  • See the wb_view "splash screen" window

  • Click on the Skip button, then exit from wb_view using File → Exit → Exit.





Step 3d: Note the HCP Pipelines Installation

  • From either the terminal (SSH) connection or from the terminal window on the Desktop GUI inside your browser, enter the following commands:

 

$ cd ~/tools/Pipelines
$ ls
DiffusionPreprocessing  fMRISurface  FreeSurfer  LICENSE.md ..
...
$ more version.txt
V3.6.0-RCd





Step 3e: Note All Available Pre-installed Software





Step 4: Take Note of Available HCP data

  • From either the terminal (SSH) connection or from the terminal window on the Desktop GUI inside your browser, enter the following commands:

$ cd /s3/hcp
$ ls
  • The first time you run this command, it can take quite a while (a few minutes) before you get a full listing. Subsequent commands should give results much more quickly. This has to do with caching of the data in the S3 bucket. If you allow a “significant” period of time to go by between accesses, the first access after that delay will again take a few minutes with subsequent accesses going quickly.

  • Notice that the latest release of HCP data is mounted and available for your use. However, it is read-only data in read-only directories.

  • If you want to run pipelines on this data, you will need to copy some of this data to your “local” EBS disk (or link to it) in order to run pipelines or do other processing with it.

  • On our example configuration, you will have roughly 60GB of available free space on your “local” EBS disk.
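
The difference between copying data and linking to it can be sketched as follows. This is an illustration only: the function name stage_file is hypothetical, and it handles a single file rather than a whole subject. A copy consumes space on your “local” EBS disk, while a symbolic link consumes almost none but still depends on the /s3/hcp mount being available.

```shell
# Hypothetical helper: make one read-only source file available in a local
# directory, either as a real copy or as a symbolic link.
# Usage: stage_file <absolute-source-path> <dest-dir> <copy|link>
stage_file() (
    src="$1"; dest_dir="$2"; mode="$3"
    mkdir -p "$dest_dir"
    case "$mode" in
        copy) cp "$src" "$dest_dir/" ;;        # uses local disk space
        link) ln -s "$src" "$dest_dir/" ;;     # uses almost no local disk space
        *)    echo "mode must be copy or link" >&2; exit 1 ;;
    esac
)

# To see how much free space remains on the disk holding your home directory:
# df -h "${HOME}"
```

Pass an absolute source path when linking so that the link resolves regardless of where it is created.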





Step 5: Create directory structure on which HCP Pipelines can be run

  • There is a utility that we have made available that should help in creating a directory on your local EBS disk that contains links to data in the read-only /s3/hcp directory. Use the following commands to get and install the utility.

$ cd ${HOME}/tools
$ wget https://github.com/Washington-University/access_hcp_data/archive/v3.0.0.tar.gz
$ tar xvf v3.0.0.tar.gz
$ ln -s access_hcp_data-3.0.0 access_hcp_data
$ cd ${HOME}
  • Next, use the utility to set up a local directory that is ready for us to run HCP pipelines for subjects 100307 and 111413 using the following commands.

Important Notes:

  1. The last command in the code block below should be entered entirely on one line (or wrapped only by the width of the terminal). Do not press Enter until you've typed the entire command.
$ cd ${HOME}
$ ./tools/access_hcp_data/link_hcp_data --source=/s3/hcp --dest=${HOME}/data --subjlist=${HOME}/tools/access_hcp_data/example_subject_list.txt --stage=unproc
  • We are, by default, using the link_hcp_data utility in “verbose” mode. So you should see a lot of informational messages scroll by. It will take a minute or two to complete this step.

  • Feel free to also issue the command ./tools/access_hcp_data/link_hcp_data with no options to see the usage information for the tool.

  • Once the utility finishes, take a quick look at the data in your ${HOME}/data directory. It should look familiar to you as the directory structure for a study with only unprocessed data for 2 subjects in it.
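
A quick way to confirm that the directory holds links rather than copies is to count both kinds of entry. The following is a small sketch, not part of the access_hcp_data utility; the function name summarize_links is hypothetical.

```shell
# Hypothetical helper: report how many symbolic links and how many regular
# files a study directory contains (links are not followed).
# Usage: summarize_links <study-dir>
summarize_links() (
    study_dir="$1"
    links=$(find "$study_dir" -type l | wc -l | tr -d ' ')
    files=$(find "$study_dir" -type f | wc -l | tr -d ' ')
    echo "symbolic links: $links, regular files: $files"
)

# Example:
# summarize_links "${HOME}/data"
```

Running it again after a pipeline stage has started should show the regular-file count growing as outputs are written locally.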





Step 6: Editing files to run a pipeline stage

This step should be familiar to you as it is very similar to the modifications you made to the PreFreeSurferPipelineBatch.sh script and the SetUpHCPPipeline.sh script in a previous practical. The point is to make similar modifications to adapt these scripts to the configuration of your running EC2 instance.

  • Make copies of these two script files in the ${HOME}/tools/Pipelines/Examples/Scripts directory on your running EC2 instance.

$ cd ${HOME}/tools/Pipelines/Examples/Scripts
$ cp PreFreeSurferPipelineBatch.sh PreFreeSurferPipelineBatch.mine.sh
$ cp SetUpHCPPipeline.sh SetUpHCPPipeline.mine.sh

 

  • Edit your version of the PreFreeSurfer “batch” file to change the StudyFolder, Subjlist, and EnvironmentScript variable settings to look like the following:

StudyFolder="${HOME}/data"
Subjlist="100307 111413"
EnvironmentScript="${HOME}/tools/Pipelines/Examples/Scripts/SetUpHCPPipeline.mine.sh"

 

  • Edit your version of the setup script file to change the values for the FSLDIR, FREESURFER_HOME, HCPPIPEDIR, CARET7DIR environment variables to look like the following:

Important Notes:

  1. Commented out commands very similar to the following are already in the setup script file. If you want to modify those commands instead of entering all new commands yourself, you will need to make sure you add the keyword export to the appropriate places, remove the comment marker (#) from the beginning of the appropriate lines, and carefully check that the values set for the FSLDIR, FREESURFER_HOME, HCPPIPEDIR, and CARET7DIR variables are as shown below.
# Set up FSL (if not already done so in the running environment)
export FSLDIR="/usr/share/fsl/5.0"
. ${FSLDIR}/etc/fslconf/fsl.sh

# Set up FreeSurfer (if not already done so in the running environment)
export FREESURFER_HOME="/usr/local/freesurfer"
. ${FREESURFER_HOME}/SetUpFreeSurfer.sh > /dev/null 2>&1

# Set up specific environment variables for the HCP Pipeline
export HCPPIPEDIR="${HOME}/tools/Pipelines"
export CARET7DIR="/usr/bin"
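
After editing the setup script, a quick check that each variable names an existing directory can catch typos before a pipeline run. The following is a sketch only and is not part of the HCP Pipelines; the function name check_pipeline_dirs is hypothetical, and it assumes the four variables above have already been set in the current shell (for example by sourcing your edited setup script).

```shell
# Hypothetical check: confirm that each variable set in the setup script
# names a directory that exists on this machine.
# Usage: check_pipeline_dirs   (after sourcing your setup script)
check_pipeline_dirs() (
    status=0
    for name in FSLDIR FREESURFER_HOME HCPPIPEDIR CARET7DIR; do
        eval "value=\${$name:-}"      # look up the variable by name
        if [ -d "$value" ]; then
            echo "OK       $name=$value"
        else
            echo "MISSING  $name=$value"
            status=1
        fi
    done
    exit "$status"                    # non-zero if any directory is missing
)

# Example:
# . "${HOME}/tools/Pipelines/Examples/Scripts/SetUpHCPPipeline.mine.sh"
# check_pipeline_dirs
```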





Step 7: Starting up a set of PreFreeSurfer Pipeline jobs

Again, this step should be familiar as it is essentially the same as the test run you did of the PreFreeSurferPipelineBatch.mine.sh script in a previous practical.

  • Issue the following commands:

Important Notes:

  1. There are 2 (two) hyphens in front of runlocal
$ cd ${HOME}/tools/Pipelines/Examples/Scripts
$ ./PreFreeSurferPipelineBatch.mine.sh --runlocal 
  • You should see a number of logging messages indicating that PreFreeSurferPipeline.sh is running.

  • The first of these logging messages should look similar to:

This script must be SOURCED to correctly setup the environment prior to running ...
  • After a few more informational logging messages, you should see something like:

START: ACPCAlignment
Final FOV is:
0.000000 ...
  • As before, we will not wait for this to finish. So you can press Ctrl-C to cancel this run.

  • You can briefly examine the ${HOME}/data/100307 directory to see that the pipeline script has already started creating new directories and data files in your study location. ${HOME}/data/100307 should now contain MNINonLinear, T1w, and T2w directories in addition to the release-notes and unprocessed directories that were already there.

  • If, for example, you look in the ${HOME}/data/100307/T1w directory, you should see files named T1w1_gdc.nii.gz and T1w.nii.gz. Note that unlike the files in the ${HOME}/data/100307/unprocessed/3T/T1w_MPR1 directory, these files are not symbolic links to files over in the /s3/hcp directory tree.

 

$ cd ${HOME}/data/100307
$ ls
MNINonLinear  release-notes  T1w  T2w  unprocessed
$ cd T1w
$ ls -l
total 62808
drwxrwxr-x 2 hcpuser hcpuser 4096 May 8 17:08 ACPCAlignment
-r-xr-xr-x 1 hcpuser hcpuser 32150559 May 8 17:08 T1w1_gdc.nii.gz
-r-xr-xr-x 1 hcpuser hcpuser 32150559 May 8 17:08 T1w.nii.gz
drwxrwxr-x 2 hcpuser hcpuser 4096 May 8 17:08 xfms
 
$ cd ${HOME}/data/100307/unprocessed/3T/T1w_MPR1
$ ls -l
total 20
lrwxrwxrwx 1 hcpuser hcpuser 59 May 8 16:51 100307_3T_AFI.nii.gz -> /s3/hcp/100307/unprocessed/3T/T1w_MPR1/100307_3T_AFI.nii.gz
lrwxrwxrwx 1 hcpuser hcpuser 65 May 8 16:51 100307_3T_BIAS_32CH.nii.gz -> /s3/hcp/100307/unprocessed/3T/T1w_MPR1/100307_3T_BIAS_32CH.nii.gz
lrwxrwxrwx 1 hcpuser hcpuser 63 May 8 16:51 100307_3T_BIAS_BC.nii.gz -> /s3/hcp/100307/unprocessed/3T/T1w_MPR1/100307_3T_BIAS_BC.nii.gz
lrwxrwxrwx 1 hcpuser hcpuser 74 May 8 16:51 100307_3T_FieldMap_Magnitude.nii.gz -> /s3/hcp/100307/unprocessed/3T/T1w_MPR1/100307_3T_FieldMap_Magnitude.nii.gz
lrwxrwxrwx 1 hcpuser hcpuser 70 May 8 16:51 100307_3T_FieldMap_Phase.nii.gz -> /s3/hcp/100307/unprocessed/3T/T1w_MPR1/100307_3T_FieldMap_Phase.nii.gz
lrwxrwxrwx 1 hcpuser hcpuser 64 May 8 16:51 100307_3T_T1w_MPR1.nii.gz -> /s3/hcp/100307/unprocessed/3T/T1w_MPR1/100307_3T_T1w_MPR1.nii.gz





Step 8: Shutdown and Restart of an instance

Step 8a: Shutdown of a running machine instance

  • Log out of any running terminal connections and close any web browser VNC connections to your running instance.

  • Then visit your control panel at http://<your-public-dns> and press the Logout button

  • Visit your Amazon EC2 Dashboard at https://console.aws.amazon.com/ec2

  • Click on Instances (not INSTANCES) on the left side of the page

  • See your Instance table

  • Select the instance you want to stop by clicking in the selection box to the left of that instance

    • Note: Multiple instances can be selected.

  • Select the drop down button and select Instance State → Stop

    • Note: The Terminate option is equivalent to deleting the machine instance for good. Only use this option if you really want the machine instance to be deleted, not just stopped.

    • The data on the “local” EBS drive connected to an instance generated from the HCP_NITRC AMI is not “ephemeral storage”. It will persist while the machine instance is stopped. Of course, it will not persist if the machine is terminated.

  • Select the Yes, Stop button in the pop-up dialog.





Step 8b: Restart of a machine instance

  • Visit your Amazon EC2 Dashboard at https://console.aws.amazon.com/ec2

  • Click on Instances (not INSTANCES) on the left side of the page

  • See your Instance table

  • Select the instance you want to start by clicking in the selection box to the left of that instance

  • Select the drop down button and select Instance State → Start

  • Select the Yes, Start button in the pop-up dialog.





Important Notes about Stopping and Restarting machine instances:

It is important to stop your machine instance when it is not in use. Amazon charges you for the instance while it is active/running (whether you are actually using it or not). You are not charged for the instance during the time that it is stopped. You are still charged a monthly rental fee for provisioned EBS storage.

When you do restart an instance that has been stopped, you'll find that it has a new Public IP address and a new Public DNS entry. You will have to modify the commands you use to connect to the running instance to take into account this new DNS entry.
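Because the Public DNS changes on every restart, you may find it convenient to look it up from a script rather than from the console. This is a sketch, assuming the AWS command line interface (the aws command) is installed and configured for your account, which this tutorial does not cover. The function names and the instance ID in the usage comment are illustrative placeholders.

```shell
# Sketch only: assumes the AWS CLI is installed and configured
# (not covered in this tutorial). Function names are hypothetical.

# Start a previously stopped instance.
start_instance() {
    aws ec2 start-instances --instance-ids "$1"
}

# Print the current Public DNS name of an instance.
public_dns() {
    aws ec2 describe-instances --instance-ids "$1" \
        --query 'Reservations[0].Instances[0].PublicDnsName' --output text
}

# Usage (replace the placeholder with your own instance ID):
#   start_instance i-0123456789abcdef0
#   public_dns i-0123456789abcdef0
```

The name printed by public_dns is what you would substitute for <your-public-DNS> in the connection commands used elsewhere in this tutorial.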

Similarly, each time you shut down your running instance, the VNC Server session will be shut down. After you start up the instance again, if you want to run another VNC Server session, you will need to visit your HCP_NITRC control panel at http://<your-public-DNS>, log in, and then press the Start Session button to get a VNC Server session restarted.

After restarting your instance, you may find that you no longer have access to the HCP S3 bucket at the /s3/hcp mount point. If you try to use a command like:

$ cd /s3/hcp

and receive an error message similar to:

-bash: cd: /s3/hcp: Transport endpoint is not connected

then you will need to remount the S3 bucket. This can be done using the following steps:

  • Visit the Settings tab on the control panel for your instance (visit http://<your-public-DNS>, log in, and go to the Settings tab)

  • Make sure the check box to the left of the hcp Public S3 Bucket is checked

  • Press the Apply button.

  • Notice the banner that appears across the top of the page.

  • After remounting, your first access to the data will again take a few minutes. Subsequent accesses will be much faster.

  • There is no need to always remount the S3 bucket. Only do this if you have tried to access the bucket at the /s3/hcp mount point and received an error message.
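The check described above (try the mount point, remount only on error) can be sketched as a small shell function. The function name mount_is_usable is hypothetical; the only thing it relies on is that listing a stale mount point fails, as the cd example above does.

```shell
# Sketch only: mount_is_usable is a hypothetical helper name.
# Succeeds (exit status 0) only if the given directory can be listed.
# A stale mount point fails here, e.g. with
# "Transport endpoint is not connected".
mount_is_usable() {
    ls "$1" >/dev/null 2>&1
}

# Usage on your HCP_NITRC instance:
#   mount_is_usable /s3/hcp || echo "Remount the bucket from the Settings tab"
```

A check like this only tells you whether a remount is needed; the remount itself is still done from the Settings tab.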

 





For the Exploring the Human Connectome Course (Summer 2015), the following steps are optional.

Step 9: Installing StarCluster

If we were to allow the PreFreeSurfer pipeline processing that we started a couple steps back to continue, it would run the PreFreeSurfer processing to completion for subject 100307 before moving on to running the PreFreeSurfer processing for the next subject in our list, 111413. To do this for very many subjects would be very time consuming as the processing would be happening serially (one subject, then the next, then the next, etc.) on this single machine instance.

To make this processing less time consuming and more cost efficient, we can, instead of just running the pipelines on this one Amazon EC2 instance, distribute the jobs across a cluster of EC2 instances.

StarCluster (http://star.mit.edu/cluster) is available from the STAR (Software Tools for Academics and Researchers) program at MIT. StarCluster is a cluster-computing toolkit specifically designed for Amazon's EC2. Installation documentation for StarCluster can be found at http://star.mit.edu/cluster/docs/latest/installation.html.

StarCluster is written in the Python programming language, and your HCP_NITRC instance already has a tool installed on it (called easy_install, part of the Python setuptools package) that allows for easy installation of Python packages. Therefore, installing StarCluster is as simple as entering the following command in a terminal connected to your running HCP_NITRC instance (followed by the password for your hcpuser account when prompted).

 

$ sudo easy_install StarCluster
(enter your password e.g. hcppassword when prompted)

The installation process will display a number of messages about installing prerequisite software and should end by returning you to the $ prompt. Note, whenever the $ prompt is used for the remainder of this document, your actual prompt will likely not be just a $. For example, it may include a user name (e.g. hcpuser), a node name (e.g. nitrcce), and your current working directory before the $. So your actual $ prompt might look like: hcpuser@nitrcce:~$

You can verify that the installation was successful by asking for the StarCluster version number with a command like:

$ starcluster --version
StarCluster - (http://star.mit.edu/cluster) (v. 0.95.6)
Software Tools for Academics and Researchers (STAR)
 
0.95.6
$

 




Step 10: Create an AWS Access Key Pair

In order to configure and use StarCluster, you will need an AWS Access Key ID and AWS Secret Access Key for your AWS account. This is a different AWS access key pair from the one you created for accessing the HCP S3 data. That earlier pair is associated with your HCP ConnectomeDB account. The pair that you create in this step is for access to your Amazon AWS account, which the StarCluster software will need to access.

To create the necessary AWS key pair, do the following:

  • Visit the AWS console at https://console.aws.amazon.com and login if necessary

  • Near the upper right hand corner, click on the down arrow next to your name and select Security Credentials. When/if you see a pop-up dialog that starts with “You are accessing the security credentials page for your AWS account”, select the Continue to Security Credentials button.

  • Click on the plus sign to the left of the entry Access Keys (Access Key ID and Secret Access Key)

  • Select the Create New Access Key button

  • Choose to Download the Key File and perhaps also select the option to Show the Access Key information. Either way, the point is to obtain the Access Key ID and the Secret Access Key.

  • Record your Access Key ID and the Secret Key information for use in the next step.

 




 

Step 11: Setup a cluster for running HCP Pipelines

Step 11a: Supply StarCluster with your AWS credentials

Next, you will need to begin the process of creating and editing a StarCluster configuration file.

  • Start by simply asking StarCluster for help

$ starcluster help
StarCluster - (http://star.mit.edu/cluster) (v. 0.95.6)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu
 
!!! ERROR - config file /home/hcpuser/.starcluster/config does not exist

Options:
--------
[1] Show the StarCluster config template
[2] Write config template to /home/hcpuser/.starcluster/config
[q] Quit
 
Please enter your selection: 
  • Select option 2 to create a configuration file based on a template

Please enter your selection: 2
 
>>> Config template written to /home/hcpuser/.starcluster/config
>>> Please customize the config template

 

  • Edit the config file to supply your account information:

$ cd ~/.starcluster
$ gedit config

 

  • The configuration file will need to have your Amazon Web Services (AWS) account information added to it, including the AWS Access Key ID and AWS Secret Access Key you created in Step 10.

  • In addition to your Access Key ID and your Secret Access Key, you will need to add your AWS Account ID number to the configuration file. To obtain your Account ID, visit the AWS console (https://console.aws.amazon.com) and log in if necessary.

  • Click on the down arrow next to your name in the upper right hand corner of the page, and select My Account. At the very top of the Account page under the heading Account Settings, you should see a field labelled Account Id. This will show your 12 digit Account ID number.

  • Add these 3 pieces of information to the [aws info] section of the StarCluster config file. In your editor with the StarCluster config file open, search for a section that looks like:

[aws info]
# This is the AWS credentials section (required).
# These settings apply to all clusters
# replace these with your AWS keys
AWS_ACCESS_KEY_ID = #your_aws_access_key_id
AWS_SECRET_ACCESS_KEY = #your_secret_access_key
# replace this with your account number
AWS_USER_ID= #your userid
    • Replace #your_aws_access_key_id with your AWS Access Key ID

    • Replace #your_secret_access_key with your AWS Secret Access Key

    • Replace #your userid with your 12 digit Account ID number

  • Save the StarCluster config file you are editing and exit from the editor. 
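After saving, a quick check can confirm that none of the three template placeholders was left behind. This is a minimal sketch: the function name has_unfilled_placeholders is hypothetical, and it assumes the placeholder text in your config file matches the template excerpt shown above.

```shell
# Sketch only: has_unfilled_placeholders is a hypothetical helper name.
# Succeeds (exit status 0) if any of the template placeholder strings is
# still present in the given StarCluster config file.
has_unfilled_placeholders() {
    grep -q -e '#your_aws_access_key_id' \
            -e '#your_secret_access_key' \
            -e '#your userid' "$1"
}

# Usage:
#   has_unfilled_placeholders ~/.starcluster/config && echo "Config still needs editing"
#   chmod 600 ~/.starcluster/config   # the file now holds your secret key
```

Restricting the file to your own user with chmod 600 is a sensible precaution, since the config file now contains credentials for your AWS account.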

 




Step 11b: Creating an Amazon EC2 key pair

StarCluster will be creating and configuring a number of machine instances for you. To do this, in addition to needing access to your account, StarCluster will also need an EC2 key pair to use to connect to and configure EC2 instances on your behalf. Therefore, you must create at least one EC2 key pair to supply to StarCluster via its configuration file.

You can have multiple EC2 key pairs. Each cluster that you create will be associated with one of your key pairs. For now, we will just create a single key pair.

StarCluster itself has a convenient mechanism built in (once it has your AWS account credentials) for creating an EC2 key pair.

  • Issue a StarCluster createkey command like the following.

$ cd
$ mkdir -p .ssh
$ starcluster createkey mykey -o ~/.ssh/mykey.rsa
StarCluster - (http://star.mit.edu/cluster) (v. 0.95.6)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu
 
>>> Successfully created keypair: mykey
>>> fingerprint: e9:9a:a8:f6:7f:63:cb:87:40:2e:14:6d:1a:3e:14:e4:9f:9b:f4:43
>>> keypair written to /home/hcpuser/.ssh/mykey.rsa
$
  • This will create a key named mykey and create a file named ~/.ssh/mykey.rsa

  • Include information about this key pair in your StarCluster configuration file by editing the ~/.starcluster/config file again and filling in the sections below (or making sure they are already filled in) as shown

[key mykey]
key_location = ~/.ssh/mykey.rsa
...
[cluster smallcluster]
keyname = mykey
  • Note: The [key mykey] section and the [cluster smallcluster] section may already be filled in as shown. So you may not need to make any changes.

  • Save the file you are editing and close the editor.

 




Step 11c: Start an example cluster

  • Next, we'll start an example cluster just to verify that everything is set up correctly. The cluster we start now will not be one on which we can actually run pipelines; we have further configuration work to do before we get to that point.

  • Start an example cluster by issuing commands like the following. Note that in using the following commands, you are starting a cluster and giving it the cluster name: mysmallcluster. You are allowing StarCluster to use the default cluster template (which defines the machines that you would like to be in your cluster). Your default cluster template is also set in your StarCluster config file, and should already be set to smallcluster.

Important Notes:

  1. The output supplied while creating the cluster is somewhat long and is not all included below. To confirm the success of this operation, look for the text that reads "The cluster is now ready to use" in the output.
$ starcluster start mysmallcluster
StarCluster - (http://star.mit.edu/cluster) (v. 0.95.6)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

>>> Using default cluster template: smallcluster
>>> Validating cluster template settings...
>>> Cluster template settings are valid
>>> Starting cluster...
>>> Launching a 2-node cluster...
>>> Creating security group @sc-mysmallcluster...
>>> Waiting for security group @sc-mysmallcluster...
Reservation:r-77fcd49b
>>> Waiting for instances to propagate...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for cluster to come up... (updating every 30s)
>>> Waiting for all nodes to be in a 'running' state...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for SSH to come up on all nodes...
.
.
.
>>> Configuring cluster took 1.801 mins
>>> Starting cluster took 3.970 mins

The cluster is now ready to use. To login to the master node
as root, run:
.
.
You can activate a 'stopped' cluster by passing the -x
option to the 'start' command:

    $ starcluster start -x mysmallcluster

This will start all 'stopped' nodes and reconfigure the
cluster.
$
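Since the start-up output is long, you may prefer to save it and search it for the success message rather than scrolling. A minimal sketch follows; the function name cluster_reported_ready and the log file name start.log are hypothetical.

```shell
# Sketch only: cluster_reported_ready is a hypothetical helper name.
# Succeeds (exit status 0) if a saved 'starcluster start' log contains
# the success message quoted in the notes above.
cluster_reported_ready() {
    grep -q 'The cluster is now ready to use' "$1"
}

# Usage: save the output while still watching it, then check it.
#   starcluster start mysmallcluster 2>&1 | tee start.log
#   cluster_reported_ready start.log && echo "Cluster is up"
```

Using tee keeps the output visible in the terminal while also writing it to the file that is searched afterwards.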

 




Step 11d: Navigate your example cluster

  • After starting your cluster, visit your EC2 console at https://console.aws.amazon.com/ec2 and view your instance table by selecting the Instances link on the left hand side. You may have to refresh your instance table by clicking on the icon in the upper right.

  • You should see the instance that you created previously, installed StarCluster software on, and used to create a cluster. This instance will likely have no name. If it doesn't have a name, you should give it a name to help distinguish it from other instances. Point your cursor at the Name field for the instance and then click on the pencil icon that appears in the field. You can then fill in the field with a name for your instance (e.g. MyHCP_NITRC) and select the check mark to confirm the change.

  • In addition to your HCP_NITRC instance, you should see 2 new instances which have been created for you, one named master and one named node001. This is your small cluster of computers consisting of one “master” node which you would typically log in to in order to start and control jobs and one “worker” node that the cluster engine software controls and processing jobs would be run on. (Note: In this type of cluster, the master node actually also functions as a worker node. As such, processing jobs can and will also run on the master node.)

Example StarCluster Commands

Now is a good time to become familiar with some basic StarCluster commands. You will issue such StarCluster commands on a terminal connected to your HCP_NITRC instance. In the examples below, text enclosed in angle brackets, < >, should be replaced by names that you provide.

Important Notes:

  1.  There is no need to enter the example commands below now. These examples are provided here just to familiarize you with some of the available StarCluster commands and concepts. After the example commands, we will return to steps that you should carry out.
    • To see what clusters you have currently in existence:

starcluster listclusters

 

    • To start a new cluster based on a cluster template defined in your StarCluster config file:
starcluster start -c <template-name> <new-cluster-name>

 

    • To reboot all the nodes in a running cluster:
starcluster restart <running-cluster-name>

    • To stop a running cluster:
starcluster stop <running-cluster-name>

 

    • To terminate a cluster (whether running or not):
starcluster terminate <cluster-name>

NOTE: Stopping a cluster is analogous to turning off the machines. Terminating a cluster is analogous to throwing away the machines. When you terminate, the instances go away and cannot be restarted; they are gone.

 

    • To restart a stopped (not terminated) cluster:
starcluster start -x <cluster-name>

 

    • To login to the master node of a cluster:
starcluster sshmaster <cluster-name>

 

    • To login to one of the worker nodes of a cluster:
starcluster sshnode <cluster-name> <node-name>

 

Important Notes:

  1. Resume carrying out the steps of this tutorial from this point.
  • On your HCP_NITRC instance, use the starcluster sshmaster command to login to the master node of your cluster named mysmallcluster and place a file in the /home directory 
$ starcluster sshmaster mysmallcluster
# cd /home
# ls
sgeadmin  ubuntu
# echo "hello there" > hello.txt
# ls
hello.txt  sgeadmin  ubuntu
# more hello.txt
hello there
# exit


  • Use the starcluster sshnode command to login to the worker node named node001 of your cluster named mysmallcluster and note that the file you placed in the /home directory while logged in to the master node is available from the worker node.

$ starcluster sshnode mysmallcluster node001
# cd /home
# ls
hello.txt  sgeadmin  ubuntu
# cat hello.txt
hello there
# exit

 




Step 11e: Terminate your small cluster

  • The machine instances that are part of the cluster you have started are not based upon the HCP_NITRC AMI or upon the machine instance that you have configured to access the S3 HCP open access data. So neither the master nor the worker nodes can run HCP pipelines.

  • Terminate the cluster so that you can move on to configuring a cluster with nodes that can run the HCP Pipelines.

$ starcluster terminate mysmallcluster
StarCluster - (http://star.mit.edu/cluster) (v. 0.95.6)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu
 
Terminate EBS cluster mysmallcluster (y/n)? y
>>> Running plugin starcluster.plugins.sge.SGEPlugin
>>> Running plugin starcluster.clustersetup.DefaultClusterSetup
>>> Terminating node: master (i-5bd7a38d)
>>> Terminating node: node001 (i-5ad7a38c)
>>> Waiting for cluster to terminate...
>>> Removing security group: @sc-mysmallcluster
$

 




Step 11f: Create an instance to use as a model for your pipeline cluster nodes

The HCP_NITRC AMI is currently not configured so that instances created from it can be nodes in a StarCluster-managed cluster. So we now need to create an Amazon EC2 instance that will be used as a “template” for creating the nodes of a cluster that can run HCP pipelines. The Human Connectome Project also supplies an AMI called HCP_PipelineClusterNode (also written HCP_PipelineClusterNode_AMI), which is a modified version of an early form of the HCP_NITRC AMI. The steps below start by creating another instance that is based on the HCP_NITRC AMI and then prepare that instance for StarCluster use.

  • Visit the instance table in your Amazon EC2 console (http://console.aws.amazon.com/ec2 → select Instances)

  • Select the Launch Instance button; Select the Community AMIs tab, search for the HCP_NITRC AMI, and press the Select button.

  • Choose the instance type that you want to use for running pipelines.

    This is where you might consider choosing a high memory and high compute power instance type (e.g. c4.4xlarge with 16 CPUs and 30GB of RAM) or a GPU enabled instance type (e.g. g2.8xlarge with 32 CPUs and 60GB of RAM). But you should consider the costs associated with these choices.

    For simplicity in this exercise, we'll just use one instance type for both the master and worker nodes. Choose m3.medium and select the Next: Configure Instance Details button.

  • Select Next: Add Storage followed by Next: Tag Instance.

  • On the Tag Instance page, give your instance a name by filling in a Value for the Key Name which is already supplied. A name like PipelineNodeTemplate is appropriate.


  • Next, select the Next: Configure Security Group button.

  • On the Configure Security Group page, you will need to add rules to configure the security group just as you did when creating your original HCP_NITRC instance.

  • The figure showing the security rules to configure is repeated here for your reference.


  • Once you have the security rules configured, select the Review and Launch button followed by the Launch button.

  • Selecting the Launch button should cause the Select an existing key pair or create a new key pair pop-up dialog to appear.

  • As we saw earlier, you should use a key pair to control access to the instances you create. StarCluster will need to access the instance you are currently creating. To simplify things, we'll create a key pair that is intended just for accessing this instance. To do so, change the pull down that says “Choose an existing key pair” to “Create a new key pair”. Supply the new key pair with a name (e.g. PipelineNodeTemplate), and then select the Download Key Pair button.

  • Locate the downloaded file (e.g. PipelineNodeTemplate.pem). It will most likely be in the ${HOME}/Downloads folder on your local system as before. You will need to copy this file to your HCP_NITRC instance so that StarCluster running on that instance can access the instance you are creating now. The transfer can be done by using the following commands in a terminal window on your local system (not a terminal connected to your HCP_NITRC instance). Carefully note that you should be transferring this file to your HCP_NITRC instance, not to your PipelineNodeTemplate instance. So the Public DNS to use in the sftp command below is the Public DNS for your original HCP_NITRC instance, not the instance you are currently creating.

$ cd <directory-containing-your-PipelineNodeTemplate.pem-file>
$ sftp hcpuser@<your-HCP_NITRC-instance-public-dns>.compute-1.amazonaws.com
Enter your password (e.g. hcppassword) when prompted
sftp> cd .ssh
sftp> pwd
Remote working directory: /home/hcpuser/.ssh
sftp> put PipelineNodeTemplate.pem
Uploading PipelineNodeTemplate.pem to /home/hcpuser/.ssh/PipelineNodeTemplate.pem
PipelineNodeTemplate.pem 100% 1692 1.7KB/s 00:00
sftp> exit
$ 
  • After you've transferred the key file to your HCP_NITRC instance, go ahead and press the Launch Instances button on the pop-up.

  • While the new instance is launching, return to a terminal attached to your HCP_NITRC instance and add the following lines to the ~/.starcluster/config file a few lines after the comment that reads, “You can of course have multiple key sections”

# You can of course have multiple key sections
# [key myotherkey]
# KEY_LOCATION=~/.ssh/myotherkey.rsa

[key PipelineNodeTemplate]
KEY_LOCATION=~/.ssh/PipelineNodeTemplate.pem
  • Save the config file and exit from the editor.
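The ssh command refuses to use a private key file that other users on the system can read, so it is worth confirming that the copied key file is restricted to your own user. The sketch below shows one way to check this; the function name key_is_private is hypothetical, and the check simply inspects the permission string printed by ls.

```shell
# Sketch only: key_is_private is a hypothetical helper name.
# Succeeds (exit status 0) if group and others have no permissions on the
# file, i.e. characters 5-10 of the 'ls -l' mode string are all dashes.
key_is_private() {
    [ "$(ls -ld "$1" | cut -c5-10)" = "------" ]
}

# Usage on your HCP_NITRC instance:
#   chmod 600 ~/.ssh/PipelineNodeTemplate.pem
#   key_is_private ~/.ssh/PipelineNodeTemplate.pem && echo "permissions OK"
```

The same check applies to any copy of the key file that you later use with ssh -i or scp -i.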




Step 11g: Further prepare your new instance for StarCluster use

  • In a new tab in your browser,

    • Visit your newly created instance (your PipelineNodeTemplate instance) by going to http://<your public DNS>. You will have to visit your Amazon AWS instance table to get the public DNS for the PipelineNodeTemplate instance you just created.

    • Continue through the Security Redirect page by pressing the here link and clicking through the security warnings to proceed to your running instance

  • At the page that looks like:



  • Enter the Amazon Instance ID, a username (e.g. hcpuser), a password (e.g. hcppassword), and an email address, then press the Submit button.

  • Wait for the page to return from pressing Submit and show you the “Your account was successfully configured” message.

  • If you instead receive a message like, “There was an unexpected error creating your account. Please try again”, then press the Submit button again.

  • Once you successfully see the “Your account was successfully configured” page, select the Control Panel button. Visit the Licenses tab and enter your FreeSurfer License information as you did back in Step 2c.

  • The FreeSurfer license in use for the course is repeated here for your convenience. As before, please note that this license is for use in the course and not intended to be your regular FreeSurfer license. Visit https://surfer.nmr.mgh.harvard.edu/registration.html to get your own license.

tsc5yc@mst.edu
7361
 *CMS6c5mP.wmk
 FSQVaStVzhzXA
  • Press the Connect button to connect to your newly running PipelineNodeTemplate instance using Guacamole. Enter your username (e.g. hcpuser) and password (e.g. hcppassword) when prompted, and click on the NITRC_CE Desktop link.

  • In the terminal window now in your browser, enter the following sets of commands.

Turn off the software firewall on your PipelineNodeTemplate instance

  • Use the following commands on your PipelineNodeTemplate instance. (The Amazon Security Groups that you configure for each instance are the virtual firewall for your EC2 instances. Having the additional software firewall included with Ubuntu enabled just adds a layer of confusion when trying to configure your instance and sometimes prevents StarCluster from sharing data across nodes.)

$ sudo ufw disable
Enter your password (e.g. hcppassword) when prompted
Firewall stopped and disabled on system startup
$

Delete gridengine software from your PipelineNodeTemplate instance

  • StarCluster expects the gridengine software to be installed in a particular location and fails to create a cluster node if the software is installed differently. We'll fix this up in the next sub-step.

$ sudo apt-get remove gridengine-client gridengine-common gridengine-master
Enter your password (e.g. hcppassword) if/when prompted
Enter Y when asked if you want to continue to remove

Delete the sgeadmin account and group.

  •  Double check your use of the sudo rm -rf command below to make sure it matches exactly what is written below before you press enter.
$ sudo userdel sgeadmin
Enter your password (e.g. hcppassword) when/if prompted
$ sudo rm -rf /var/lib/gridengine
$ sudo delgroup sgeadmin
  • Do not worry if the system's response to the delgroup command is, “The group `sgeadmin' does not exist.”

Remove the SGE_ROOT setting in the /etc/profile file

  • Use the following command:
$ sudo sed -i 's/export SGE_ROOT/#export SGE_ROOT/g' /etc/profile
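If you would like to preview the effect of this edit before changing /etc/profile, a variant of the substitution can be run without the -i option so that it only prints the result. This is a sketch, not a replacement for the command above: the function name comment_out_sge_root is hypothetical, and the pattern is anchored to the start of the line so that a line that is already commented out is left alone (the unanchored command above would add a second # if run twice).

```shell
# Sketch only: comment_out_sge_root is a hypothetical helper name.
# Prints the given file with any active 'export SGE_ROOT' line commented
# out. Already-commented lines do not match, so repeating it is harmless.
comment_out_sge_root() {
    sed 's/^\([[:space:]]*\)export SGE_ROOT/\1#export SGE_ROOT/' "$1"
}

# Usage: preview the change without modifying the file.
#   comment_out_sge_root /etc/profile | diff /etc/profile -
```

Because nothing is written back to the file, this preview does not require sudo.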





Step 11h: Install SGE files

We need to create another running instance. This instance needs to be based on an officially released StarCluster AMI. We'll need to copy some files from that running instance to our PipelineNodeTemplate instance.

  • Return to your instance table in a browser tab and again select Launch Instance.

  • Select the Community AMIs tab and enter starcluster-base-ubuntu-12.04-x86_64 (or the AMI ID ami-765b3e1f) in the search box.

  • Select the starcluster-base-ubuntu-12.04-x86_64 AMI (no "hvm" or "lustre" or "public" in the AMI name)

  • Select the t1.micro Instance Type and press the Review and Launch button

  • Select the Launch button

  • Select Choose an existing key pair in the first (upper) pull down menu, and then select your PipelineNodeTemplate key pair in the second (lower) pull down menu. Check the box to acknowledge that you have access to the private key file, and select the Launch Instances button.

  • Visit your instances table again in the browser and get the Public DNS for the instance that you are starting up.

  • When that instance is up and running, log in to it from a terminal on your local machine using commands like the following. (These commands assume that a copy of PipelineNodeTemplate.pem is in the ~/.ssh directory on your local machine.)

  • In the following,

    • <new-instance-DNS> = the Public DNS for the instance you just created

    • <PipelineNodeTemplate-DNS> = the Public DNS for your PipelineNodeTemplate instance

Create a compressed tar file containing what StarCluster needs

$ ssh -i ~/.ssh/PipelineNodeTemplate.pem root@<new-instance-DNS>
# cd /
# tar cavf opt_starcluster.tar.gz ./opt
# exit

Copy the compressed tar file you just made to your local machine

$ scp -i ~/.ssh/PipelineNodeTemplate.pem root@<new-instance-DNS>:/opt_starcluster.tar.gz ./

Copy the compressed tar file from your local machine to your PipelineNodeTemplate instance

$ scp -i ~/.ssh/PipelineNodeTemplate.pem opt_starcluster.tar.gz root@<PipelineNodeTemplate-DNS>:.

Unpack the compressed tar file and copy its contents to where StarCluster expects it

$ ssh -i ~/.ssh/PipelineNodeTemplate.pem root@<PipelineNodeTemplate-DNS>
# tar xvf opt_starcluster.tar.gz
# mv opt/sge6-fresh /opt
# exit
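Since the final mv depends on the archive containing an opt/sge6-fresh directory, it can be worth checking the archive's contents at any point in the sequence above. A minimal sketch follows; the function name archive_has_sge is hypothetical.

```shell
# Sketch only: archive_has_sge is a hypothetical helper name.
# Succeeds (exit status 0) if the gzip-compressed tar archive lists at
# least one path under opt/sge6-fresh.
archive_has_sge() {
    tar tzf "$1" | grep 'opt/sge6-fresh' >/dev/null
}

# Usage:
#   archive_has_sge opt_starcluster.tar.gz || echo "sge6-fresh is missing from the archive"
```

Listing the archive with tar t does not unpack anything, so the check is safe to run on any of the machines involved.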

Terminate the instance you just created based on the StarCluster AMI

  • Visit your Instance table and select only the instance that you just created (it will be of Instance Type t1.micro). Make sure no other instances are selected.
  • Select Actions → Instance State → Terminate followed by Yes, Terminate.





Step 11i: Create an EBS volume to hold data to be shared across your cluster

You now need an EBS volume (think of it as a simple Hard Disk Drive) that will contain your data for processing. It would be best if this volume is independent of any particular EC2 instance (machine), whether that instance is part of a cluster or not. That way, if you terminate the instances, your data will persist. We'll create such a volume, and then set up StarCluster so that the created volume gets mounted to all the nodes in the cluster that we create for running pipelines.

  • Return to a terminal logged in to your original HCP_NITRC (MyHCP_NITRC) instance and use StarCluster to create a volume which will be shared between your cluster's master node and all of the cluster's worker nodes.

$ starcluster createvolume --name=mydata 200 us-east-1a --shutdown-volume-host
  • Note the 200 value in the above is the size of the volume to be created in Gigabytes (GB). You should consider changing that value to something larger when you create such a volume later (back home) for your actual use. The us-east-1a is an “availability zone” for your volume. The first part of that specification (us-east) should match the region your account is operating within.

  • Creating and formatting the volume can take a while and is somewhat dependent upon the volume size. The size also determines how much you will pay for the volume. So, while you might want to create a bigger volume later, for this exercise you should probably stick with just 200GB.

  • If you visit your AWS console at https://console.aws.amazon.com/ec2 and select the Volumes link on the left, you should be able to (eventually) notice the creation of your 200GB volume named mydata.

  • The creation process will report the volume id for the newly created volume. It will look something like:

.
.
>>> Checking for required remote commands...
>>> Creating 200GB volume in zone us-east-1a
>>> New volume id: vol-4b5b480c
>>> Waiting for vol-4b5b480c to become 'available'...
.
>>> Your new 200GB volume vol-4b5b480c has been created successfully
.
.
  • You'll need this volume id, so take note of it.
  • After the volume is created, edit the ~/.starcluster/config file. In the section after comments that look like:

#############################
## Configuring EBS Volumes ##
#############################
# StarCluster can attach one or more EBS volumes to the master and then
# NFS_share these volumes to all of the worker nodes. ...  
  • Add a volume section that looks like:

[volume mydata]
VOLUME_ID = vol-4b5b480c
MOUNT_PATH = /mydata
  • The volume id value that you use should not be vol-4b5b480c. Instead it should be the id of the volume you just created.
  • Save the file and exit from the editor.
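To avoid copying the volume id by hand, it can be pulled out of saved createvolume output. The following is a sketch under stated assumptions: the function name extract_volume_id and the log file name createvolume.log are hypothetical, and the saved output is assumed to contain a "New volume id" line in the format shown above.

```shell
# Sketch only: extract_volume_id is a hypothetical helper name.
# Reads saved 'starcluster createvolume' output and prints the volume id
# found on the ">>> New volume id: ..." line (prints nothing if absent).
extract_volume_id() {
    sed -n 's/^>>> New volume id: \(vol-[0-9a-f][0-9a-f]*\).*$/\1/p' "$1"
}

# Usage:
#   starcluster createvolume --name=mydata 200 us-east-1a --shutdown-volume-host 2>&1 | tee createvolume.log
#   extract_volume_id createvolume.log
```

The printed value is what belongs on the VOLUME_ID line of the [volume mydata] section.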





Step 11j: Create an AMI for cluster nodes

  • Visit your Amazon instance table and get the instance id of the PipelineNodeTemplate instance you've created.

  • Continuing to work in a terminal attached to your HCP_NITRC instance (MyHCP_NITRC), issue the following command to create an Amazon Machine Image based upon your running PipelineNodeTemplate instance.

$ starcluster ebsimage i-12e5e73d pipelineclusterami
  • Substitute your instance id for the i-12e5e73d in the above. (The process of creating an ebsimage from a running instance makes it such that you will no longer be able to access that instance using SSH from a terminal. You will still be able to access it via Guacamole in a browser.)

  • At the end of this AMI creation process, you will be informed of the AMI ID for the AMI that is created for you.

.
.
>>> New EBS AMI created: ami-feb7aa96
>>> Waiting for snapshot to complete: snap-d6a61fa0
snap-d6a61fa0: | 100% ETA: --:--:--
>>> Waiting for ami-feb7aa96 to become available...
>>> create_image took 7.253 mins
>>> Your new AMI id is: ami-feb7aa96
  • As you might expect, you'll need this AMI ID in the next sub-step.

 

 

 


Return to Table of Contents


Step 11k: Configure and Start a Pipeline Cluster

Next we'll modify the StarCluster configuration file to create a template for a cluster that is appropriate for running HCP Pipelines. It will use the AMI that we just created as the starting point image (e.g. ami-feb7aa96 above, but yours will be different) for both the master and the worker nodes.

  • On your HCP_NITRC instance (e.g. MyHCP_NITRC), edit the ~/.starcluster/config file and add the following lines just before the section labelled [cluster smallcluster]

[cluster pipelinecluster]
KEYNAME = mykey
CLUSTER_SIZE = 5
NODE_IMAGE_ID = ami-feb7aa96
NODE_INSTANCE_TYPE = m3.medium
VOLUMES = mydata
  • Your entry for NODE_IMAGE_ID will be your AMI ID from the previous substep, not ami-feb7aa96.
  • Save the config file and exit from the editor.

  • Start a cluster based on your pipelinecluster template

$ starcluster start -c pipelinecluster mypipelinecluster
  • Once the cluster is fully running (this could take several minutes), you will receive a “The cluster is now ready to use.” message followed by a summary of starcluster commands.
  • If you visit your Instances table, you should see a node named master and worker nodes named node001, node002, node003, and node004.

  • Use the starcluster sshmaster and starcluster sshnode commands to log in to your cluster nodes and verify that the /mydata and the /home directories are shared between the cluster nodes.

$ starcluster sshmaster mypipelinecluster
# cd /home
# ls
hcpuser  sgeadmin  ubuntu
# touch afileinhome.txt
# ls
afileinhome.txt  hcpuser  sgeadmin  ubuntu
# cd /mydata
# ls
lost+found
# touch afileinmydata.txt
# ls
afileinmydata.txt  lost+found
# exit
$ starcluster sshnode mypipelinecluster node001
# cd /home
# ls
afileinhome.txt  hcpuser  sgeadmin  ubuntu
# cd /mydata
# ls
afileinmydata.txt  lost+found
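The [cluster pipelinecluster] section added at the start of this step can also be generated from variables, which avoids leaving the example AMI ID in place by mistake. This is only a sketch: the function name render_cluster_template is hypothetical, and the key name, instance type, and volume name are simply the values used in this tutorial.

```shell
# Hypothetical helper: print a cluster template section for ~/.starcluster/config.
#   $1 = AMI ID of your node image (required)
#   $2 = cluster size (optional, defaults to 5 as in this tutorial)
render_cluster_template() {
    cat <<EOF
[cluster pipelinecluster]
KEYNAME = mykey
CLUSTER_SIZE = ${2:-5}
NODE_IMAGE_ID = ${1}
NODE_INSTANCE_TYPE = m3.medium
VOLUMES = mydata
EOF
}

# Example use: print the section, then paste it into the config file
# just before the [cluster smallcluster] section.
#   render_cluster_template ami-feb7aa96
```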

 


Return to Table of Contents


Step 12: Getting the HCP OpenAccess data available to your cluster

You now have a running cluster that has the necessary software installed for running the HCP Pipelines. However, none of the nodes in the cluster (master or workers) have direct access to the HCP OpenAccess S3 data. For this exercise, we will see how to easily copy the data you would like to use for pipeline processing from the HCP OpenAccess S3 bucket to the /mydata directory that is shared between your cluster nodes.

Step 12a: Setting up s3cmd on your master node

S3cmd (http://s3tools.org/s3cmd) is a free command line tool for uploading, retrieving and managing data in an Amazon S3 bucket. S3cmd is pre-installed in the HCP_NITRC AMI. Therefore, it is available to use on your cluster nodes. In particular for our use now, it is available for use on the master node of your cluster.

To configure s3cmd so that it can access the HCP OpenAccess bucket, you will need the AWS Access Key ID and AWS Secret Access Key that you obtained back in Step 1 and that are associated with your Connectome DB account. These are not the AWS Access Key ID and AWS Secret Access Key that you obtained in Step 10.

  • Login to your HCP_NITRC instance (either via Guacamole or via SSH as below)

$ ssh -X -i ${HOME}/Downloads/Amazon1Click.pem hcpuser@<your-HCP_NITRC-Public-DNS>

Substitute the path to your key file for accessing your HCP_NITRC instance for ${HOME}/Downloads/Amazon1Click.pem in the above.

Substitute the Public DNS for your HCP_NITRC instance for <your-HCP_NITRC-Public-DNS> in the above.

Enter the password for the hcpuser account (e.g. hcppassword) when prompted.

  • Use StarCluster to login to your cluster's master node

$ starcluster sshmaster mypipelinecluster
# 
  • Configure s3cmd (Note the access key and secret key to use are the ones you obtained back in Step 1.) Items in angle brackets < > are where you substitute something.
root@master:~# s3cmd --configure

Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.
 
Access key and Secret key are your identifiers for Amazon S3
Access Key: <your-access-key>
Secret Key: <your-secret-key>
 
Encryption password is used to protect your files from reading
by unauthorized persons while in transfer to S3
Encryption password: <just-press-enter>
Path to GPG program [/usr/bin/gpg]: <just-press-enter>
 
When using secure HTTPS protocol all communication with Amazon S3
servers is protected from 3rd party eavesdropping. This method is
slower than plain HTTP and can't be used if you're behind a proxy
Use HTTPS protocol [No]: <just-press-enter>
 
On some networks all internet access must go through a HTTP proxy.
Try setting it here if you can't conect to S3 directly
HTTP Proxy server name: <just-press-enter>
 
New settings:
  Access Key: <your-access-key>
  Secret Key: <your-secret-key>
  Encryption password: 
  Path to GPG program: /usr/bin/gpg
  Use HTTPS protocol: False
  HTTP Proxy server name: 
  HTTP Proxy server port: 0

Test access with supplied credentials? [Y/n] y
Please wait...
 
Success. Your access key and secret key worked fine :-)
 
Now verifying that encryption works…
Not configured. Never mind.
 
Save settings? [y/N] y
Configuration saved to '/root/.s3cfg'
root@master:~#
  • Now, you should be able to list the S3 buckets that you have access to by virtue of the credentials you entered.
root@master:~# s3cmd ls
2014-05-15 18:56  s3://hcp-openaccess
2014-05-15 18:57  s3://hcp-openaccess-logs
  • And list the contents of the hcp-openaccess bucket
root@master:~# s3cmd ls s3://hcp-openaccess
                       DIR s3://hcp-openaccess/HCP/
root@master:~# s3cmd ls s3://hcp-openaccess/HCP
ERROR: Access to bucket 'hcp-openaccess' was denied
root@master:~# s3cmd ls s3://hcp-openaccess/HCP/
                       DIR s3://hcp-openaccess/HCP/100307/
                       DIR s3://hcp-openaccess/HCP/100408/
                       DIR s3://hcp-openaccess/HCP/101006/
                       DIR s3://hcp-openaccess/HCP/101107/
                       DIR s3://hcp-openaccess/HCP/101309/
. . .
2015-01-24 21:34         0 s3://hcp-openaccess/HCP/
2015-05-08 08:26      3577 s3://hcp-openaccess/HCP/S500.txt
2015-01-28 08:22       700 s3://hcp-openaccess/HCP/UR100.txt
root@master:~#

Notice that the ls subcommand of s3cmd (s3cmd ls) is a bit picky with regard to whether you include the final / in the name of a directory. Without the /, you get an access denied error. With the /, you can see the subdirectories containing subject data.
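One way to avoid the access denied error is to normalize the path before passing it to s3cmd ls. The sketch below uses a hypothetical helper named with_trailing_slash that appends a / only when one is missing; it does not contact S3 itself.

```shell
# Hypothetical helper: print the given S3 path with exactly one trailing
# slash added if it does not already end in one.
with_trailing_slash() {
    case "$1" in
        */) printf '%s\n' "$1" ;;
        *)  printf '%s/\n' "$1" ;;
    esac
}

# Example use on the master node:
#   s3cmd ls "$(with_trailing_slash s3://hcp-openaccess/HCP)"
```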

 


Return to Table of Contents


Step 12b: Retrieving data to process from the HCP OpenAccess S3 Bucket

  • Just as there was a utility available to help create a link structure (see Step 5), there is a utility available to help retrieve copies of data from the HCP OpenAccess S3 bucket using s3cmd.

  • While logged in to the master node of your cluster, enter the following commands to get data for a couple of subjects

# cd /mydata
# wget https://github.com/Washington-University/access_hcp_data/archive/v3.0.0.tar.gz
# tar xvf v3.0.0.tar.gz
# /mydata/access_hcp_data-3.0.0/sync_hcp_data --subjlist=/mydata/access_hcp_data-3.0.0/example_subject_list.txt --dest=/mydata --stage=unproc
  • Once this command completes (which could take a few minutes), you will have subject data for two subjects 100307 and 111413 in your /mydata directory. Recall that this /mydata directory is shared across all nodes in your cluster.

  • While you are waiting for the command to complete, visit the following page in a browser
    https://sagebionetworks.jira.com/wiki/display/SCICOMP/Configuration+of+Cluster+for+Scientific+Computing.
    That page contains a simple diagram that illustrates our AWS and StarCluster configuration. With a few minor adjustments, the illustration in the Overview section of that page shows our configuration.

    • The node that is labeled admin in the illustration is equivalent to the HCP_NITRC instance we created in the early parts of this practical (a.k.a. MyHCP_NITRC).

    • The disk icon in the illustration shows that the NFS Mounted EBS Volume is available at a mount point called /shared. Our NFS Mounted EBS Volume is available at a mount point called /mydata instead.

    • Not shown is that the /home directory is also shared between the master and the worker nodes.

    • Our current cluster only has 4 worker nodes (node001 … node004) instead of the 999 nodes shown in the diagram.
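Once the sync command has finished, it can be worth confirming that a directory exists for every subject you asked for. The following sketch assumes the subject list file holds one subject ID per line (an assumption about its format) and that each subject is stored in a directory named after its ID directly under the destination, as described above. The function name missing_subjects is hypothetical.

```shell
# Hypothetical helper: print every subject ID from the list file that has
# no directory under the destination directory.
#   $1 = destination directory (e.g. /mydata)
#   $2 = subject list file, assumed to hold one subject ID per line
missing_subjects() {
    while IFS= read -r subject; do
        # Skip blank lines in the list file.
        [ -n "$subject" ] || continue
        [ -d "$1/$subject" ] || printf '%s\n' "$subject"
    done < "$2"
}

# Example use on the master node:
#   missing_subjects /mydata /mydata/access_hcp_data-3.0.0/example_subject_list.txt
```

No output means every listed subject has a directory. It does not check that the contents of each directory are complete.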

 


Return to Table of Contents


Step 13: Editing files to run a pipeline stage

Once again, this step should be familiar as you are editing the PreFreeSurferPipelineBatch.sh script and the SetUpHCPPipeline.sh script to match your cluster configuration.

  • To edit files on the master node, you can either use an “in-terminal” editor like vi or nano, or reconnect to the master node using the -X option of the starcluster sshmaster command so that you can use a graphical editor. If you are comfortable using vi or nano, go ahead and make the edits that way. Otherwise, exit from your current connection to the master node and issue a new connection command as follows:

$ starcluster sshmaster -X mypipelinecluster
  • Note the addition of the -X after sshmaster. This addition will allow you to use the gedit text editor if you prefer.

nano is a relatively user-friendly editor that, like vi, doesn't need to open a separate window on your screen in which to edit files. Instead it uses your terminal window. To invoke nano just use a command like nano PreFreeSurferPipelineBatch.mine.sh. Navigation and editing of text is straightforward. Use the arrow keys to move around in the file; use the Delete or Backspace keys for deleting text; and add new text by simply typing. Once you have made the necessary changes, press Ctrl-X to exit the editor, answer Y when prompted to save the buffer, and press Enter when asked for the name of the file to write before exiting.

  • On your master node, make copies of these two script files to versions you will edit.

# cd /home/hcpuser/tools/Pipelines/Examples/Scripts
# cp PreFreeSurferPipelineBatch.sh PreFreeSurferPipelineBatch.mine.sh
# cp SetUpHCPPipeline.sh SetUpHCPPipeline.mine.sh
  • Edit your version of the “batch” file (PreFreeSurferPipelineBatch.mine.sh) to change the StudyFolder, Subjlist, and EnvironmentScript variable settings to look like the following:
StudyFolder=/mydata
Subjlist="100307 111413"
EnvironmentScript="/home/hcpuser/tools/Pipelines/Examples/Scripts/SetUpHCPPipeline.mine.sh"
  • Further down in the batch script there are lines that look like:

#if [ X$SGE_ROOT != X ] ; then
#    QUEUE="-q long.q"
    QUEUE="-q hcp_priority.q"
#fi
  • Change the queue specification in that section to the queue named all.q. So that code should look like:

#if [ X$SGE_ROOT != X ] ; then
#    QUEUE="-q long.q"
    QUEUE="-q all.q"
#fi
  • Even further down in the batch script (close to the end of the file) there are lines that look like:

  if [ -n "${command_line_specified_run_local}" ] ; then
      echo "About to run ${HCPPIPEDIR}/PreFreeSurfer/PreFreeSurferPipeline.sh"
      queuing_command=""
  else
      echo "About to use fsl_sub to queue or run ${HCPPIPEDIR}/PreFreeSurfer/PreFreeSurferPipeline.sh"
      queuing_command="${FSLDIR}/bin/fsl_sub ${QUEUE}"
  fi
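  • Change the else clause by substituting qsub for ${FSLDIR}/bin/fsl_sub and adding -o and -e options that send each job's standard output and standard error to per-subject log files, so that the code looks like the following: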
  if [ -n "${command_line_specified_run_local}" ] ; then
      echo "About to run ${HCPPIPEDIR}/PreFreeSurfer/PreFreeSurferPipeline.sh"
      queuing_command=""
  else
      echo "About to use qsub to queue ${HCPPIPEDIR}/PreFreeSurfer/PreFreeSurferPipeline.sh"
      queuing_command="qsub ${QUEUE}"
      queuing_command+=" -o ${HCPPIPEDIR}/Examples/Scripts/${Subject}.PreFreeSurfer.stdout.log"
      queuing_command+=" -e ${HCPPIPEDIR}/Examples/Scripts/${Subject}.PreFreeSurfer.stderr.log"
  fi
  • Edit your version of the setup script file to change the values for FSLDIR, FREESURFER_HOME, HCPPIPEDIR, CARET7DIR environment variables to look like the following:
# Set up FSL (if not already done so in the running environment)
export FSLDIR="/usr/share/fsl/5.0"
. ${FSLDIR}/etc/fslconf/fsl.sh
 
# Set up FreeSurfer (if not already done so in the running environment)
export FREESURFER_HOME="/usr/local/freesurfer"
. ${FREESURFER_HOME}/SetUpFreeSurfer.sh > /dev/null 2>&1
 
# Set up specific environment variables for the HCP Pipeline
export HCPPIPEDIR="/home/hcpuser/tools/Pipelines"
export CARET7DIR="/usr/bin"
  • Finally, you will need to edit the actual PreFreeSurferPipeline.sh script (located at /home/hcpuser/tools/Pipelines/PreFreeSurfer/PreFreeSurferPipeline.sh).
  • After the header comments in this script file (lines starting with #), the first actual non-comment line is:

set -e
  • Immediately after that line, add the following two lines of code

EnvironmentScript="/home/hcpuser/tools/Pipelines/Examples/Scripts/SetUpHCPPipeline.mine.sh"
. ${EnvironmentScript}

Note that we are making these edits in order to run the PreFreeSurfer portion of Structural Preprocessing. Similar edits to example batch files (e.g. FreeSurferPipelineBatch.sh, GenericfMRISurfaceProcessingPipelineBatch.sh, DiffusionPreprocessingBatch.sh, etc.) would be necessary in order to run those pipelines on your cluster. Edits similar to the one made to PreFreeSurferPipeline.sh would also be necessary in files like FreeSurferPipeline.sh, DiffPreprocPipeline.sh, etc. to run those pipelines on your cluster.

(If you don't want to lose your edits to the Pipeline script files when your cluster is terminated, you should consider moving the entire /home/hcpuser/tools directory over to somewhere in the /mydata directory. This will put the scripts and your changes to them on the shared volume that persists beyond the life of any given instance. You will need to modify the paths specified in your script files accordingly.)
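If you expect to repeat the queue edit for several batch scripts, it can be scripted instead of done by hand. The sketch below is one possible approach, not part of the HCP Pipelines: the function name set_batch_queue is hypothetical, and it assumes the active line has the form QUEUE="-q name" with only whitespace in front of it, as in the excerpt shown earlier in this step. Commented-out QUEUE lines are left alone.

```shell
# Hypothetical helper: point the uncommented QUEUE="-q ..." line of a batch
# script at a different queue. A backup copy is kept as <script>.bak.
#   $1 = batch script to edit
#   $2 = queue name (must not contain |, & or backslash characters)
set_batch_queue() {
    cp -p "$1" "$1.bak" &&
        # Writing back into the existing file keeps its permissions intact.
        sed "s|^\([[:space:]]*\)QUEUE=\"-q [^\"]*\"|\1QUEUE=\"-q ${2}\"|" "$1.bak" > "$1"
}

# Example use on the master node:
#   set_batch_queue PreFreeSurferPipelineBatch.mine.sh all.q
```

Comparing the result against the backup (for example with diff) before running the script is a sensible precaution.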

 


Return to Table of Contents


Step 14: Starting up a set of PreFreeSurfer Pipeline jobs

  • From the master node of your cluster, issue the following commands

# cd /home/hcpuser/tools/Pipelines/Examples/Scripts
# ./PreFreeSurferPipelineBatch.mine.sh
  • Note that there is no --runlocal option being used.
  • The output to your terminal window should show you jobs being submitted and include text showing you all the command line parameters supplied to the PreFreeSurferPipeline.sh script.

  • Look in the on screen output for confirmation lines that look something like:

...
Your job n ("PreFreeSurferPipeline.sh") has been submitted
...
Your job n+1 ("PreFreeSurferPipeline.sh") has been submitted
...
  • Do an ls command to see the log files being produced by your jobs
# ls *.log
100307.PreFreeSurfer.stderr.log  100307.PreFreeSurfer.stdout.log
111413.PreFreeSurfer.stderr.log  111413.PreFreeSurfer.stdout.log
#
  • These are the standard output (stdout) and standard error (stderr) files being produced by your pipeline jobs submitted to the cluster queue. You can use the more command to see the contents of the files. (e.g. more 100307.PreFreeSurfer.stdout.log)

  • To see the status of your jobs use the qstat command

# qstat
job-ID  prior   name       user         state submit/start at     queue                         slots
-------------------------------------------------------------------------------------------------------
     20 0.55500 PreFreeSur root         r     05/20/2015 16:00:44 all.q@master                      1
     21 0.55500 PreFreeSur root         r     05/20/2015 16:00:44 all.q@node004                     1
  • The left hand column of the qstat output provides the job IDs for the jobs you currently have queued or running. In the example above, the job IDs are 20 and 21. Your job IDs will probably be 1 and 2.

  • The state value for a job tells you whether it is running (r) or queued and waiting (qw) or any number of other states.

  • You can use the job ID to get further information about a job by supplying the -j option and the job ID number to the qstat command. For example:

# qstat -j 20
  • You can also use the job ID to delete a running job if necessary using the qdel command. For example:

# qdel 20
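When many jobs are queued, a quick count by state can be easier to read than the full qstat listing. The following sketch assumes the default qstat layout shown above, with two header lines and the state in the fifth column. The function name count_jobs_in_state is hypothetical.

```shell
# Hypothetical helper: read qstat output on stdin and print how many jobs
# are in the given state (e.g. r for running, qw for queued and waiting).
count_jobs_in_state() {
    # Skip the two header lines; the state is the fifth whitespace-separated field.
    awk -v state="$1" 'NR > 2 && $5 == state { n++ } END { print n + 0 }'
}

# Example use on the master node:
#   qstat | count_jobs_in_state r
#   qstat | count_jobs_in_state qw
```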

 


Return to Table of Contents


Step 15: Using the StarCluster load balancer

As you might imagine there can be disadvantages to keeping worker nodes of your cluster running even when they are not being used. In our example so far, we have created a cluster that contains one master node and 4 worker nodes, but we only have 2 jobs running. So at most we really need only 2 nodes right now.

To lower costs, we can take advantage of the StarCluster load balancer. The StarCluster load balancer can observe the job queue for a cluster and start new worker nodes or remove worker nodes from the cluster based on demand.

The load balancer is an experimental feature of StarCluster. To allow the use of an experimental feature, you must edit the .starcluster/config file on your HCP_NITRC instance. This is the instance on which you have StarCluster installed and from which you started the cluster, not the master node of the cluster on which you were editing scripts in the previous step.

In the [global] section of your .starcluster/config file include the following line

ENABLE_EXPERIMENTAL=True

You should be able to do this by simply removing the comment marker (#) from a line in the config file that already looks like:

#ENABLE_EXPERIMENTAL=True

Once you have enabled experimental features and have a cluster up and running (e.g. mypipelinecluster), you can start the load balancer for the cluster by issuing the following command:

$ nohup starcluster loadbalance -m 20 -n 3 mypipelinecluster & 

 

The -m option specifies the maximum number of nodes in your cluster and the -n option specifies the minimum number of nodes in your cluster. You will need to press enter twice to return to the system prompt.

To find out the process ID of your load balancer issue a command like the following

$ ps -ef | grep loadbalance
hcpuser  24161 20520  1 18:16 pts/1    00:00:03 /usr/bin/python /usr/local/bin/starcluster loadbalance mypipelinecluster
hcpuser  24243 20520  0 18:21 pts/1    00:00:00 grep --color=auto loadbalance

The process ID of your load balancer is the first number after hcpuser on the output line that ends with mypipelinecluster (24161 in the example above). To stop the load balancer issue a command like:

$ kill -9 24161

Of course, you will need to substitute your loadbalancer process ID for 24161 in the above.
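Picking the process ID out of the ps listing can also be done with a small filter. This is a sketch under the assumption that the ps -ef layout matches the example above, with the process ID in the second column and the cluster name as the last word of the command. The function name loadbalancer_pids is hypothetical.

```shell
# Hypothetical helper: read "ps -ef" output on stdin and print the process ID
# of each StarCluster load balancer running for the named cluster.
#   $1 = cluster name (e.g. mypipelinecluster)
loadbalancer_pids() {
    # The [s] keeps this awk command from matching its own entry in the listing.
    awk -v cluster="$1" '/[s]tarcluster loadbalance/ && $NF == cluster { print $2 }'
}

# Example use on your HCP_NITRC instance:
#   ps -ef | loadbalancer_pids mypipelinecluster
```

The printed number is the value to pass to the kill command.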

If you allow the load balancer to continue to run and only have the PreFreeSurfer jobs running for two subjects as we have started in the previous steps, then when you visit your Instance Table in a browser you will likely see the worker nodes that are not being used by your running jobs have been terminated. It can take in the neighborhood of 30 minutes before nodes will be terminated. If you have more jobs queued to run than there are nodes available to run them on (and this situation lasts for a while), the load balancer will (eventually) add new nodes to your cluster.

If your cluster is using spot instances for worker nodes (see the next step), the load balancer will also use spot instances for worker nodes that it adds to your cluster.


Return to Table of Contents


Step 16: Using spot instances as worker nodes

To lower costs even further, we can take advantage of the spot instance mechanism of Amazon AWS. The spot instance mechanism is a way for you to bid on Amazon EC2 instances such that instances are run only when your bid exceeds the current Spot Price for the instance type that you want to use.

Amazon's documentation at http://aws.amazon.com/ec2/purchasing-options/spot-instances/ describes spot instances as follows:

Spot Instances are spare Amazon EC2 instances for which you can name your own price. The Spot Price is set by Amazon EC2, which fluctuates in real-time according to Spot Instances supply and demand. When your bid exceed the Spot Price, your Spot instance is launched and your instance will run until the Spot Price exceeds your bid (a Spot interruption) or you choose to terminate them. …

To use Spot Instances, you place a Spot Instance request that specifies the instance type, the Availability Zone desired, the number of Spot Instances desired, and the maximum price you are willing to pay per instance hour (your bid).

To determine how that maximum price compares to past Spot Prices, the Spot Price history for the past 90 days is available via the Amazon EC2 API and the AWS Management Console. ...

Starting a StarCluster cluster using spot instances as your worker nodes is as simple as using the -b option when starting your cluster from your HCP_NITRC instance. For example:

$ starcluster start -c pipelinecluster -b 0.50 myspotpipelinecluster

The above command would start a cluster named myspotpipelinecluster with a bid of $0.50 per hour for each worker node. (By default, StarCluster will not use spot instances for the master node of a cluster. It is unlikely that you would want your master node to be stopped if the current price exceeds your bid.) But how do you decide what to bid for your worker nodes? As is noted in the quote above, Amazon makes available spot bid history for instance types. StarCluster provides an easy command for viewing that history.

For example:

$ starcluster spothistory m3.medium
. . .
>>> Fetching spot history for m3.medium (VPC)
>>> Current price: $0.1131
>>> Max price: $0.7000
>>> Average price: $0.1570

Adding the -p option (e.g. starcluster spothistory -p m3.medium) will launch a web browser tab and supply you with a graph of the spot price over the last 30 days. (It may take a while to generate the graph and open the browser, so you may not want to do this during class.)
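To compare a candidate bid against the reported prices in a script, the individual values can be extracted from the spothistory output. The sketch below assumes the ">>> label: $value" line format shown in the example above. The function name spot_price is hypothetical.

```shell
# Hypothetical helper: read spothistory output on stdin and print the dollar
# amount (without the $ sign) reported for the given label.
#   $1 = label, e.g. "Current price", "Max price" or "Average price"
spot_price() {
    awk -v label="$1" -F': ' '$1 == ">>> " label { sub(/^\$/, "", $2); print $2 }'
}

# Example use on your HCP_NITRC instance:
#   starcluster spothistory m3.medium | spot_price "Average price"
```

How far above the average or current price to bid remains a judgment call; the helper only extracts the numbers.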

It is very useful to take note of the warning message that you get when starting a cluster using spot instances.

$ starcluster start -c pipelinecluster -b 0.50 myspotpipelinecluster
 .
 .
 .
 *** WARNING - ************************************************************
 *** WARNING - SPOT INSTANCES ARE NOT GUARANTEED TO COME UP
 *** WARNING - 
 *** WARNING - Spot instances can take a long time to come up and may not
 *** WARNING - come up at all depending on the current AWS load and your
 *** WARNING - max spot bid price.
 *** WARNING - 
 *** WARNING - StarCluster will wait indefinitely until all instances (5)
 *** WARNING - come up. If this takes too long, you can cancel the start
 *** WARNING - command using CTRL-C. You can then resume the start command
 *** WARNING - later on using the --no-create (-x) option:
 *** WARNING - 
 *** WARNING - $ starcluster start -x myspotpipelinecluster
 *** WARNING - 
 *** WARNING - This will use the existing spot instances launched
 *** WARNING - previously and continue starting the cluster. If you don't
 *** WARNING - wish to wait on the cluster any longer after pressing CTRL-C
 *** WARNING - simply terminate the cluster using the 'terminate' command.
 *** WARNING - ************************************************************

 


Return to Table of Contents


Links and references

Browse Amazon S3 buckets with Ubuntu Linux: http://makandracards.com/makandra/31999-browse-amzon-s3-buckets-with-ubuntu-linux

Expanding the Storage Space of an EBS Volume on Linux: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-expand-volume.html

What Is Amazon EC2?: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/concepts.html

StarCluster Quick Start: http://star.mit.edu/cluster/docs/latest/quickstart.html

StarCluster Configuration File information: http://star.mit.edu/cluster/docs/latest/manual/configuration.html

Defining StarCluster Templates: http://star.mit.edu/cluster/docs/latest/manual/configuration.html#defining-cluster-templates

Configuration of a Cluster for Scientific Computing: https://sagebionetworks.jira.com/wiki/display/SCICOMP/Configuration+of+Cluster+for+Scientific+Computing

StarCluster Elastic Load Balancer: http://star.mit.edu/cluster/docs/0.93.3/manual/load_balancer.html

 


Return to Table of Contents

