Machine Learning with the Shogun Toolbox

by Samuel S. Shepard


Steps

  1. Introductory slides may be downloaded for your reference.
    1. Watch the SVM Introduction video for part 1.
    2. Watch the Shogun Installation video for part 2.
    3. Watch the Lab Introduction video for part 3.
    4. Watch the Lab Exercises video for part 4.

  2. Installing Octave and the Shogun Toolbox.
    1. Compilation and installation procedure for Mac OS X users.
      1. Download the MacPorts installer appropriate for your version of OS X and run it.
      2. Open the Mac OS X Terminal application, which can be found in /Applications/Utilities.
      3. Run the following commands as shown:
        sudo port selfupdate
        sudo port install octave
        
        sudo port install swig -php5 -ruby -perl +python +octave
        sudo port install shogun +octave
        
      4. Each install command may take a long time.

    2. Compilation and installation procedure for Linux users.
      1. Download the latest stable Octave source code (.gz file assumed below) and install it:
        tar -xzf octave*gz
        cd octave*
        ./configure
        make
        sudo make install
        cd .. 
        
      2. Download the SWIG source and install it:
        tar -xzf swig*gz
        cd swig*
        ./configure
        make
        sudo make install
        cd ..
        
      3. Download the Shogun source and install it:
        bzip2 -cd shogun*bz2 | tar xvf -
        cd shogun*/src/
        ./configure
        make
        sudo make install
        
      4. The build takes a long time. If Shogun then fails to load in Octave (file not found), run the following command (replace /usr/local/lib with the Octave install directory if different) and re-open your terminal connection:
        echo 'export LD_LIBRARY_PATH=/lib:/usr/local/lib:$LD_LIBRARY_PATH' >> ~/.bash_profile
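
        To check whether the library path setting took effect after you re-open the terminal, something like the following can be used. It is only a minimal sketch: path_contains is a hypothetical helper name (not part of Octave or Shogun), and /usr/local/lib is the default install directory assumed in the step above.

        ```shell
        # Return success (0) if directory $2 appears as an entry in the
        # colon-separated list $1, and failure (1) otherwise.
        path_contains() {
            case ":$1:" in
                *":$2:"*) return 0 ;;
                *)        return 1 ;;
            esac
        }

        # Assumes the default directory from the step above; change it if
        # you installed somewhere else.
        if path_contains "${LD_LIBRARY_PATH:-}" /usr/local/lib; then
            echo "/usr/local/lib is on LD_LIBRARY_PATH"
        else
            echo "/usr/local/lib is NOT on LD_LIBRARY_PATH"
        fi
        ```

        Wrapping the list in colons before matching means only a whole entry counts, so /usr/local/lib64 is not mistaken for /usr/local/lib.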
        

    3. Compilation and installation procedure for Windows users.
      1. Cygwin is a Linux compatibility layer for Windows. Use the setup.exe file to install it. Any working mirror should be fine when you are prompted to choose one. When you are prompted to select packages, make sure to select the "Devel" category as well as all Octave-related software (under "Math"). Installation will take a long time!
      2. Download the Shogun source. Move the downloaded file to C:\cygwin\home\account_name from your download location.
      3. Open your Cygwin bash shell and execute:
        tar -xvjf shogun*bz2
        cd shogun*/src
        ./configure
        make
        make DESTDIR= install
        

    4. Precompiled binary versions of Octave are available from Octave-Forge, but you must still compile and install Shogun yourself (you are on your own if you go this route).

  3. Download the example Excel file for the SVM training data.
    1. Look at the data worksheet and find point 1. Each column x1, x2, x3, ... is a prediction score for the data points. A positive score predicts that a point is exonic and a negative score predicts that it is intronic, according to that column's gene prediction scheme. The class (label) column on the far right gives the true classification (-1 for introns, +1 for exons). Using only x1, what do you predict point 1 to be? Now try using x1 and x2. Now try using all 10 attributes. What is your conclusion? How did it change as you added more data? Was it difficult to figure out?
    2. Open the worksheet called "1d". This is a zoomed (close-up) view of the prediction scores in the x1 column. The two classes are shown in red and blue. Can you pick a horizontal line to separate the blue and red data? Now look at the further zoomed "1d +/-10" worksheet. Does your opinion hold?
    3. Open the worksheet called "2d". This represents the prediction scores from columns x1 and x2 for each data point (slightly zoomed). Can you draw a line to separate the two classes of red and blue? Open "2d 100x100" then the worksheet "2d 10x10". How does your opinion change?
    4. By now you should see that finding the "just right" rules to make a decision boundary is difficult to do by hand.
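
    The sign rule from step 1 (a positive score predicts exon, a negative score predicts intron) can also be tried mechanically on a single column. The sketch below is illustrative only: sign_rule_accuracy is a hypothetical helper name, the six (score, label) pairs are made up rather than taken from the spreadsheet, and counting a score of exactly 0 as a miss is an assumption of the sketch.

    ```shell
    # sign_rule_accuracy: read "score label" pairs on stdin and print the
    # fraction of rows where the sign of the score matches the label
    # (+1 for exon, -1 for intron). A score of exactly 0 counts as a miss.
    sign_rule_accuracy() {
        awk '
            NF == 2 {
                pred = ($1 > 0) ? 1 : (($1 < 0) ? -1 : 0)
                if (pred == $2) hits++
                n++
            }
            END { if (n > 0) printf "%.2f\n", hits / n }
        '
    }

    # Made-up scores for illustration only; not values from the lab data.
    printf '%s\n' \
        ' 2.5  1' \
        '-1.0 -1' \
        ' 0.4 -1' \
        '-3.2 -1' \
        ' 1.1  1' \
        '-0.3  1' | sign_rule_accuracy
    ```

    Four of the six made-up rows agree with the sign rule, so this prints 0.67. A fixed cut at zero on one column misclassifies points near the boundary, which is the difficulty the worksheets are meant to show.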

  4. Try out Shogun/Octave on your personal computer (after following step 2 for your OS of choice).
    1. Download the exercise files and move them to the working directory of your Terminal/Cygwin Shell (type pwd to find your current working directory). Unzip them and move to the created directory, then launch Octave:
      unzip svm_lab.zip
      cd svm_lab
      octave
      
    2. Run the Sigmoid kernel script from Octave: run sigmoid.m
    3. Use a plain text editor (TextEdit, Vim editor, Notepad, etc.) to change the coef0 and gamma variables (try non-negative real numbers) in the sigmoid.m script. Re-run it and report the new parameters and values you got in your homework.
    4. Run the Gaussian kernel script from Octave: run gaussian.m
    5. Change the C variable. Run the gaussian.m script again and report the new parameter and result in your homework.
    6. Run the Polynomial kernel script from Octave: run polynomial.m
    7. Change the degree variable. Run the script again and report your changes and the resulting values.
    8. Which SVM kernel did the best job (use the average of the exon and intron accuracy to define "best") under the default script parameters? Which one took the longest to run? When you altered the parameters for the kernels, did it improve, reduce, or basically make no difference to the accuracy? Report each answer in your homework.
    9. Type exit to quit Octave from the command-line.
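
    For the comparison in step 8, "best" is defined as the average of the exon and intron accuracy. After quitting Octave, that average can be computed in the shell with a small helper like the one below. This is a sketch under assumptions: mean_accuracy is a hypothetical name, and the two numbers passed to it are placeholders, not results from the lab scripts.

    ```shell
    # mean_accuracy EXON_ACC INTRON_ACC
    # Print the average of the two per-class accuracies, to two decimals.
    mean_accuracy() {
        awk -v exon="$1" -v intron="$2" \
            'BEGIN { printf "%.2f\n", (exon + intron) / 2 }'
    }

    # Placeholder values only; substitute the accuracies each script reports.
    mean_accuracy 90 70
    ```

    With the placeholder values this prints 80.00. Averaging the two classes keeps a kernel from looking good just because it does well on only one of them.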


Last updated: 2.2011  |  Author: Samuel S. Shepard, Ph.D.  |  Contact:  sammysheep@gmail.com