.. _R_Tutorial:

R Tutorial
==========

This tutorial provides a sample workflow for new users of H\ :sub:`2`\ O's R API.
Readers will learn the basic syntax of H\ :sub:`2`\ O, including importing and parsing files,
specifying a model, and obtaining model output.

New H\ :sub:`2`\ O users should refer to the `quick start guide <http://s3.amazonaws.com/h2o-release/h2o/master/1532/docs-website/newuser/top.html>`_ for additional
instructions on how to run H\ :sub:`2`\ O. The following tutorial assumes that H\ :sub:`2`\ O is installed in R.

Getting Started
"""""""""""""""

R uses a REST API to send functions to H\ :sub:`2`\ O, so a reference object in R to the H\ :sub:`2`\ O instance is required.
You can start H\ :sub:`2`\ O outside of R and connect to it, or launch it directly from R. If you launch from R, closing the R session also closes the H\ :sub:`2`\ O instance. The client object is used to direct R to datasets and models located in
H\ :sub:`2`\ O.

**Launch From R**

By default, if the argument **max_mem_size** is not specified when running **h2o.init()**, the heap size of an H\ :sub:`2`\ O instance running on 32-bit Java is
1g. On 64-bit Java, the heap size is 1/4 of the total memory available on the machine. On 32-bit Java, the function runs a check
and suggests an upgrade.

::

 > library(h2o)
 > localH2O <- h2o.init(ip = 'localhost', port = 54321, max_mem_size = '4g')
  Successfully connected to http://localhost:54321 
	R is connected to H2O cluster:
    H2O cluster uptime:         11 minutes 35 seconds 
    H2O cluster version:        2.7.0.1497 
    H2O cluster name:           H2O_started_from_R 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   3.56 GB 
    H2O cluster total cores:    8 
    H2O cluster allowed cores:  8 
    H2O cluster healthy:        TRUE 

	
**Launch From Command Line**

Follow one of the `deployment tutorials <http://docs.0xdata.com/index.html?highlight=deployment>`_ to launch an instance from the command line:

* on your desktop
* on EC2 instances
* on Hadoop servers
	
After launching the H\ :sub:`2`\ O cluster, initialize the connection by choosing one node in the cluster and running **h2o.init** with that node's
IP address and port as arguments.
Note that the IP address must be on your local machine. In the following example, replace **192.168.1.161** with the address of your local host.
::

 > library(h2o)
 > localH2O <- h2o.init(ip = '192.168.1.161', port = 54321)

.. WARNING::
  If the version of the current H\ :sub:`2`\ O instance is not the same as the package version loaded in R,
  a "version mismatch" warning message displays. To fix this issue, update the R package
  or launch an H\ :sub:`2`\ O instance using the jar file from the installed package.

::

	Error in h2o.init():
	Version mismatch! H2O is running version # but R package is version # 
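
The comparison behind this message can be sketched in base R. This is a minimal illustration, not the package's own check: ``versions_match`` is a hypothetical helper, and the version strings are placeholders. In practice, ``packageVersion("h2o")`` (from the ``utils`` package) reports the installed package version.

::

  # Hypothetical helper (not part of the h2o package): compare two version strings.
  versions_match <- function(cluster_version, package_version_string) {
    package_version(cluster_version) == package_version(package_version_string)
  }

  # Placeholder version strings, chosen only for illustration.
  versions_match("2.7.0.1497", "2.7.0.1497")
  versions_match("2.7.0.1497", "2.7.0.1532")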
  	

**Cluster Info**

To check the status and health of the H\ :sub:`2`\ O cluster, use **h2o.clusterInfo()** to display an easy-to-read
summary of information about the cluster.

::

  > library(h2o)
  > localH2O = h2o.init(ip = 'localhost', port = 54321)
  > h2o.clusterInfo(localH2O)
  R is connected to H2O cluster:
    H2O cluster uptime:         43 minutes 43 seconds 
    H2O cluster version:        2.7.0.1497 
    H2O cluster name:           H2O_started_from_R 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   3.56 GB 
    H2O cluster total cores:    8 
    H2O cluster allowed cores:  8 
    H2O cluster healthy:        TRUE 


Importing Data
""""""""""""""

**Import File**

The H\ :sub:`2`\ O package consolidates all of the supported import functions into **h2o.importFile**. Although **h2o.importFolder** and **h2o.importHDFS** still work, these functions are deprecated and should be replaced with **h2o.importFile**.

::

  ## To import the small iris data file from H2O's package
  > irisPath = system.file("extdata", "iris.csv", package="h2o")
  > iris.hex = h2o.importFile(localH2O, path = irisPath, key = "iris.hex")
	|=================================================| 100%

  ## To import an entire folder of files as one data object
  > pathToFolder = "/Users/Amy/0xdata/data/airlines/"
  > airlines.hex = h2o.importFile(localH2O, path = pathToFolder, key = "airlines.hex")
	|=================================================| 100%  

  ## To import from HDFS
  > pathToData = "hdfs://mr-0xd6.0xdata.loc/datasets/airlines_all.csv"
  > airlines.hex = h2o.importFile(localH2O, path = pathToData, key = "airlines.hex")
	|=================================================| 100%
  

**Upload File**

To upload a file from your local disk, **h2o.importFile** is recommended. However, you can still use **h2o.uploadFile**.

::

  > irisPath = system.file("extdata", "iris.csv", package="h2o")
  > iris.hex = h2o.uploadFile(localH2O, path = irisPath, key = "iris.hex")
  |====================================================| 100%


Data Manipulation and Description
"""""""""""""""""""""""""""""""""
**Any Factor**

  Determine if any column in a data set is a factor.

::

  > irisPath = system.file("extdata", "iris_wheader.csv", package="h2o")
  > iris.hex = h2o.importFile(localH2O, path = irisPath)
  |===================================================| 100%
  > h2o.anyFactor(iris.hex)
  [1] TRUE
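
For comparison, the same question can be asked of an ordinary R data frame. The following is a base R sketch on the built-in ``iris`` data frame; it does not call H\ :sub:`2`\ O, and ``has_factor`` is a name chosen for illustration.

::

  # Base R analogue of the check above, using the built-in iris data frame.
  data(iris)
  has_factor <- any(sapply(iris, is.factor))
  print(has_factor)  # Species is a factor, so this prints TRUE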


**As Data Frame**

  Convert an H\ :sub:`2`\ O parsed data object into an R data frame
  that can be manipulated using R calls. While this can be very useful, be careful with **as.data.frame** when
  converting H\ :sub:`2`\ O parsed data objects. Data sets that are easily and
  quickly handled by H\ :sub:`2`\ O are often too large to be handled
  equally well in R.

::

  > prosPath <- system.file("extdata", "prostate.csv", package="h2o")
  > prostate.hex = h2o.importFile(localH2O, path = prosPath)
   |===================================================| 100%

  > prostate.data.frame <- as.data.frame(prostate.hex)
  > summary(prostate.data.frame)
        ID            CAPSULE            AGE             RACE      
 Min.   :  1.00   Min.   :0.0000   Min.   :43.00   Min.   :0.000  
 1st Qu.: 95.75   1st Qu.:0.0000   1st Qu.:62.00   1st Qu.:1.000  
 	....
  > head(prostate.data.frame)
  	  ID CAPSULE AGE RACE DPROS DCAPS  PSA  VOL GLEASON
	1  1       0  65    1     2     1  1.4  0.0       6
	2  2       0  72    1     3     2  6.7  0.0       7
		....


**As Factor**

  Convert an integer column into a non-ordered factor (also
  called an enum or categorical).

::

  > prosPath = system.file("extdata", "prostate.csv", package="h2o")
  > prostate.hex = h2o.importFile(localH2O, path = prosPath)
  |===================================================| 100%
  > prostate.hex[,4] = as.factor(prostate.hex[,4])
  > summary(prostate.hex)
  	ID               CAPSULE          AGE             RACE    DPROS          
 	Min.   :  1.00   Min.   :0.0000   Min.   :43.00   1 :341  Min.   :1.000  
 	1st Qu.: 95.75   1st Qu.:0.0000   1st Qu.:62.00   2 : 36  1st Qu.:1.000  
 		....
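
The effect on the summary can also be seen in plain R. This is a base R sketch on the built-in ``mtcars`` data set, assumed here as a stand-in for an H\ :sub:`2`\ O frame; ``cars`` is a name chosen for illustration.

::

  # Base R analogue: treat a numerically coded column as categorical.
  data(mtcars)
  cars <- mtcars
  cars$cyl <- as.factor(cars$cyl)   # cyl holds the codes 4, 6, and 8

  levels(cars$cyl)    # the distinct codes become factor levels
  summary(cars$cyl)   # counts per level instead of min/mean/max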


**As H2O** 

  Pass a data frame from inside the R environment to the H\ :sub:`2`\ O instance.

::

  > data(iris)
  > summary(iris)
   	Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
	 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
 	1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300 
 		....
  > iris.r <- iris
  > iris.h2o <- as.h2o(localH2O, iris.r, key="iris.h2o")
    |===================================================| 100%
  > class(iris.h2o)
  	[1] "H2OParsedData"
	attr(,"package")
	[1] "h2o"


**Assign H2O**

  Create a hex key on the server running H\ :sub:`2`\ O for data sets manipulated in R. 
  For instance, in the example below, the prostate data set was
  uploaded to the H\ :sub:`2`\ O instance and subset to its outlying
  PSA values. To save the new data set on the H\ :sub:`2`\ O server so that it can
  subsequently be analyzed with H\ :sub:`2`\ O without overwriting the original
  data set, use **h2o.assign**.

::
 
  > prosPath = system.file("extdata", "prostate.csv", package="h2o")
  > prostate.hex = h2o.importFile(localH2O, path = prosPath)
      |===================================================| 100%      
  > prostate.qs = quantile(prostate.hex$PSA)
  > PSA.outliers = prostate.hex[prostate.hex$PSA <= prostate.qs[2] |
      prostate.hex$PSA >= prostate.qs[10],]
  > PSA.outliers = h2o.assign(PSA.outliers, "PSA.outliers")
  > nrow(prostate.hex) 
  [1] 380 
  > nrow(PSA.outliers)
  [1] 380
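
The row-selection pattern in this example can be tried on a local data frame. The sketch below is base R only; the ``mtcars`` data set and the 5% and 95% cutoffs are assumptions chosen for illustration and are not taken from the H\ :sub:`2`\ O example.

::

  # Base R analogue: keep only the rows in the tails of one column.
  data(mtcars)
  cutoffs <- quantile(mtcars$mpg, probs = c(0.05, 0.95))  # illustrative cutoffs
  in_tails <- mtcars$mpg <= cutoffs[1] | mtcars$mpg >= cutoffs[2]
  mpg_tails <- mtcars[in_tails, ]

  nrow(mtcars)      # rows in the full data set
  nrow(mpg_tails)   # rows at or beyond the cutoffs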


**Colnames**

  Obtain a list of the column names in a data set. 

::

  > irisPath = system.file("extdata", "iris.csv", package="h2o")
  > iris.hex = h2o.importFile(localH2O, path = irisPath, key = "iris.hex")
        |===================================================| 100%      
  > colnames(iris.hex)
  [1] "C1" "C2" "C3" "C4" "C5"
  

**Extremes**

 Obtain the maximum and minimum values in real-valued columns. 

::

  > ausPath = system.file("extdata", "australia.csv", package="h2o")
  > australia.hex = h2o.importFile(localH2O, path = ausPath, key = "australia.hex")
   |===================================================| 100%
  > min(australia.hex)
  [1] 0
  > min(c(-1, 0.5, 0.2), FALSE, australia.hex[,1:4])
  [1] -1


**Quantile**

  Request quantiles for an H\ :sub:`2`\ O parsed data set. To request quantiles for a single numeric column, use the column name (for example, **$AGE**). When you request
  quantiles for a full parsed data set, **quantile()** returns a matrix that displays
  quantile information for all numeric columns in the data set.
 

::

  > prosPath = system.file("extdata", "prostate.csv", package="h2o")
  > prostate.hex = h2o.importFile(localH2O, path = prosPath)
     |===================================================| 100%
  > quantile(prostate.hex$AGE)
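
The two return shapes described above have a close counterpart in base R. This is a sketch on the built-in ``mtcars`` data set; base R's default probabilities are the quartiles, and the defaults used by H\ :sub:`2`\ O may differ.

::

  data(mtcars)
  # Single numeric column: a named vector of quantiles.
  quantile(mtcars$mpg)

  # Several numeric columns: one column of quantiles per variable.
  q <- sapply(mtcars[, c("mpg", "hp", "wt")], quantile)
  dim(q)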


**Summary**

  Generate an R-like summary for each of the columns in a data
  set. For continuous real-valued columns, this produces a summary that includes
  information on quartiles, min, max, and mean. For factors, this
  produces counts of the elements within each factor
  level. For information on the Summary algorithm, see :ref:`SUMmath`.

::

   > prosPath = system.file("extdata", "prostate.csv", package="h2o")
   > prostate.hex = h2o.importFile(localH2O, path = prosPath)
        |===================================================| 100%
   > summary(prostate.hex)
           ID            CAPSULE            AGE             RACE      
 Min.   :  1.00   Min.   :0.0000   Min.   :43.00   Min.   :0.000  
 1st Qu.: 95.75   1st Qu.:0.0000   1st Qu.:62.00   1st Qu.:1.000  
 	....
   > summary(prostate.hex$GLEASON)
   GLEASON        
 Min.   :0.000  
 1st Qu.:6.000  
 Median :6.000  
 Mean   :6.384  
 3rd Qu.:7.000  
 Max.   :9.000  
   > summary(prostate.hex[,4:6])
   RACE            DPROS           DCAPS          
 Min.   :0.000   Min.   :1.000   Min.   :1.000  
 1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.000  
 Median :1.000   Median :2.000   Median :1.000  
 Mean   :1.087   Mean   :2.271   Mean   :1.108  
 3rd Qu.:1.000   3rd Qu.:3.000   3rd Qu.:1.000  
 Max.   :2.000   Max.   :4.000   Max.   :2.000  


**H2O Table**

  Summarize information in data. Because H\ :sub:`2`\ O handles such large data sets, 
  it is possible to generate tables that are larger than R's
  capacity. To minimize this risk and enable uninterrupted work,
  call **h2o.table** inside a call to **head()** or **tail()**. Within
  **head()** and **tail()**, specify the number of rows in
  the table to return. 

::

  > head(h2o.table(prostate.hex[,3]))
 	   row.names Count
	1        43     1
	2        47     1
	3        50     2
	4        51     3
	5        52     2
	6        53     4

  > head(h2o.table(prostate.hex[,c(3,4)]))
 	  row.names X0 X1 X2
	1        43  1  0  0
	2        47  0  1  0
	3        50  0  2  0
	4        51  0  3  0
	5        52  0  2  0
	6        53  0  3  1


**Generate Random Uniformly Distributed Numbers**

  **h2o.runif()** appends a column of random numbers to an H\ :sub:`2`\ O data
  frame and facilitates creating testing/training data splits for
  analysis and validation in H\ :sub:`2`\ O. 

::

  > prosPath = system.file("extdata", "prostate.csv", package="h2o")
  > prostate.hex = h2o.importFile(localH2O, path = prosPath, key = "prostate.hex")
       |===================================================| 100%
  > s = h2o.runif(prostate.hex)
  > summary(s)
  rnd               
	Min.   :0.001434  
    1st Qu.:0.241275  
    Median :0.496995  
    Mean   :0.489468  
    3rd Qu.:0.740592  
    Max.   :0.994894  

  > prostate.train = prostate.hex[s <= 0.8,]
  > prostate.train = h2o.assign(prostate.train, "prostate.train")
  > prostate.test = prostate.hex[s > 0.8,]
  > prostate.test = h2o.assign(prostate.test, "prostate.test")
  > nrow(prostate.train) + nrow(prostate.test)
  [1] 380
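
The same splitting logic can be checked locally. The following base R sketch uses the built-in ``iris`` data frame; the seed and the 0.8 cutoff are arbitrary choices for illustration.

::

  set.seed(42)                       # arbitrary seed so the split is repeatable
  data(iris)
  s <- runif(nrow(iris))             # one uniform draw per row
  iris_train <- iris[s <= 0.8, ]     # roughly 80% of rows
  iris_test  <- iris[s > 0.8, ]      # the remaining rows

  # Every row lands in exactly one of the two subsets.
  nrow(iris_train) + nrow(iris_test) == nrow(iris)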


**Split Frame**

  Generate two subsets from an existing H\ :sub:`2`\ O data set according to user-specified ratios. The subsets can be used as testing/training sets.
  This is the preferred method of splitting a data frame because it's faster and more stable than running **runif** across the entire data set. However, **runif**
  can be used for customized frame splitting.

::

  > prosPath = system.file("extdata", "prostate.csv", package="h2o")
  > prostate.hex = h2o.importFile(localH2O, path = prosPath, key = "prostate.hex")
         |===================================================| 100%
  > prostate.split = h2o.splitFrame(data = prostate.hex , ratios = 0.75)
  > prostate.train = prostate.split[1]
  > prostate.test = prostate.split[2]
  > summary(prostate.train)
     Length Class         Mode
	[1,] 9      H2OParsedData S4  
  > summary(prostate.test)
    Length Class         Mode
	[1,] 9      H2OParsedData S4  
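
The ``Length``/``Class``/``Mode`` output above is consistent with how R summarizes a list: single brackets on a list return a shorter list, while double brackets return the element itself. A base R sketch of the difference, with plain data frames standing in for the two H\ :sub:`2`\ O frames (``parts`` is a name chosen for illustration):

::

  data(iris)
  parts <- list(iris[1:100, ], iris[101:150, ])  # stand-ins for the two splits

  class(parts[1])      # "list": still a list, holding one data frame
  class(parts[[1]])    # "data.frame": the element itself
  nrow(parts[[1]])

To work with a split as a data set rather than as a one-element list, index with double brackets, for example ``prostate.split[[1]]``.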


Running Models
""""""""""""""

**GBM**

  Generate Gradient Boosted Models (GBM), which are used to develop forward-learning ensembles. For information on the GBM algorithm, see :ref:`GBMmath`.

::

  > ausPath = system.file("extdata", "australia.csv", package="h2o")
  > australia.hex = h2o.importFile(localH2O, path = ausPath)
     |===================================================| 100%
  > independent <- c("premax", "salmax","minairtemp", "maxairtemp",
  "maxsst", "maxsoilmoist", "Max_czcs")
  > dependent <- "runoffnew"
  > h2o.gbm(y = dependent, x = independent, data = australia.hex,
      n.trees = 10, interaction.depth = 3,
      n.minobsinnode = 2, shrinkage = 0.2, distribution = "gaussian")
       |======================================================| 100%
	IP Address: 127.0.0.1 
	Port      : 54321 
	Parsed Data Key: australia1.hex 

	GBM Model Key: GBM_a3ae2edf5dfadbd9ba5dc2e9560c405d 

	Mean-squared Error by tree:
	 [1] 230760.11 166957.80 124904.30  94031.17  72367.01  57180.17  47092.85
 	[8]  39168.05  34456.00  31095.86  28397.10


*Run multinomial classification GBM on australia data*

To generate a classification model that uses labels, use a **multinomial** distribution. 

::

  > h2o.gbm(y = dependent, x = independent, data = australia.hex,
      n.trees = 15, interaction.depth = 5,
      n.minobsinnode = 2, shrinkage = 0.01, distribution = "multinomial")
	IP Address: 127.0.0.1 
	Port      : 54321 
	Parsed Data Key: australia1.hex 

	GBM Model Key: GBM_8e4591a9b413407b983d73fbd9eb44cf 

	Confusion matrix:
	Reported on australia1.hex 
        Predicted
	Actual     0 3 6 7 14 16 17 19 20 25 38 43 61 75 82 107 138 150 167 191 200
	  0      115 0 0 0  0  0  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0
	  3        0 1 0 0  0  0  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0
	  6        0 0 1 0  0  0  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0
	  7        0 0 0 2  0  0  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0
	....
	 Totals 120 1 1 2  1  2  2  2  2 31  1  1  1  6  1   1   1   6   1   1   1
        Predicted
	Actual   210 245 300 343 396 400 462 480 514 533 545 600 750 764 840 933 960
 	 0        0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
	 3        0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  	 6        0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  	 7        0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  	 14       0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  	 16       0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
	....
	 Totals   1   1  20   1   1   1   1   1   1   1   1   8   1   1   1   1   1
        Predicted
	Actual   1154 1200 2000 2400 Error
	  0         0    0    0    0 0.000
 	  3         0    0    0    0 0.000
  	  6         0    0    0    0 0.000
 	  7         0    0    0    0 0.000
	....
	Mean-squared Error by tree:
	 [1] 0.9529478 0.9337646 0.9157476 0.8985756 0.8818316 0.8654845 0.8497011
 	[8] 0.8341974 0.8187867 0.8036760 0.7887764 0.7741757 0.7594546 0.7452223
	[15] 0.7309634 0.7168317


**GLM**

  Generate Generalized Linear Models (GLM), which are used to develop linear models
  for exponential-family distributions. Regularization can be applied. For
  information on the GLM algorithm, see :ref:`GLMmath`.


::

  > prostate.hex = h2o.importFile(localH2O, path =
  "https://raw.github.com/0xdata/h2o/master/smalldata/logreg/prostate.csv", 
  key = "prostate.hex")
         |===================================================| 100%
  > h2o.glm(y = "CAPSULE", x = c("AGE","RACE","PSA","DCAPS"), data =
  prostate.hex, family = "binomial", nfolds = 10, alpha = 0.5)
   |=====================================================================| 100%
	IP Address: 127.0.0.1 
	Port      : 54321 
	Parsed Data Key: prostate.hex 

	GLM2 Model Key: GLMModel__a2fdb4e3fdd92e0325141cdbd1bd43e1

	Coefficients:
      AGE      RACE     DCAPS       PSA Intercept 
	 -0.01104  -0.63136   1.31888   0.04713  -1.10896 

	Normalized Coefficients:
      AGE      RACE     DCAPS       PSA Intercept 
	 -0.07208  -0.19495   0.40972   0.94253  -0.33707 

	Degrees of Freedom: 379 Total (i.e. Null);  375 Residual
	Null Deviance:     514.9
	Residual Deviance: 461.3  AIC: 471.3
	Deviance Explained: 0.10404 
	AUC: 0.68875  Best Threshold: 0.328

	Confusion Matrix:
        Predicted
	Actual   false true Error
 	false    127  100 0.441
  	true      51  102 0.333
  	Totals   178  202 0.397

	Cross-Validation Models:
         Nonzeros       AUC Deviance Explained
	Model 1         4 0.6532738        0.048419803
	Model 2         4 0.6316527       -0.006414532
	Model 3         4 0.7100840        0.087779178
	Model 4         4 0.8268698        0.243020554
	Model 5         4 0.6354167        0.153190735
	Model 6         4 0.6888889        0.041892118
	Model 7         4 0.7366071        0.164717509
	Model 8         4 0.6711310        0.004897310
	Model 9         4 0.7803571        0.200384622
	Model 10        4 0.7435897        0.114548543


::

  > myX = setdiff(colnames(prostate.hex), c("ID", "DPROS", "DCAPS", "VOL"))
  > h2o.glm(y = "VOL", x = myX, data = prostate.hex, family = "gaussian", nfolds = 5, 
  alpha = 0.1)
	 |=========================================================| 100%

	IP Address: 127.0.0.1 
	Port      : 54321 
	Parsed Data Key: prostate.hex 

	GLM2 Model Key: GLMModel__b8339af00fbe8951ba0871611c9e42eb

	Coefficients:
  	CAPSULE       AGE      RACE       PSA   GLEASON Intercept 
 	-4.29014   0.29787   4.35557   0.04946  -0.51274  -4.35359 

	Normalized Coefficients:
  	CAPSULE       AGE      RACE       PSA   GLEASON Intercept 
 	-2.10678   1.94424   1.34488   0.98908  -0.55989  15.81292 

	Degrees of Freedom: 379 Total (i.e. Null);  374 Residual
	Null Deviance:     126623.9
	Residual Deviance: 127402  AIC: 11059.1
	Deviance Explained: -0.00615 

	Cross-Validation Models:
        Nonzeros      AIC Deviance Explained
	Model 1        5 685.6101        -0.02827868
	Model 2        5 660.3719        -0.15397511
	Model 3        5 658.0768         0.05826293
	Model 4        5 665.8665         0.05117173
	Model 5        5 683.6276         0.01333543


**K-Means**

  Generate a K-Means model. K-Means is a clustering algorithm that allows users to characterize
  data. This algorithm does not rely on a dependent variable. For
  information on the K-Means algorithm, see :ref:`KMmath`.

::

  > prosPath = system.file("extdata", "prostate.csv", package="h2o")
  > prostate.hex = h2o.importFile(localH2O, path = prosPath)
  |=========================================================| 100%
  > prostate.km = h2o.kmeans(data = prostate.hex, centers = 10, 
  cols = c("AGE", "RACE", "VOL", "GLEASON"))
    |=========================================================| 100%
  > print(prostate.km)
  IP Address: 127.0.0.1 
  Port      : 54321 
  Parsed Data Key: prostate6.hex 

  K-Means Model Key: KMeans2_99fea55be4a22f741df74532d7844bb4

  K-means clustering with 10 clusters of sizes 41, 27, 59, 17, 21, 47, 26, 61, 47, 34

	Cluster means:
        AGE     RACE         VOL  GLEASON
	1  69.73171 1.024390 37.99756098 6.512195
	2  54.48148 1.111111  0.32222222 6.518519
	3  62.59322 1.067797  0.19322034 5.966102
	.....


**Principal Components Analysis**

  Map a set of variables onto a
  subspace using linear transformations. Principal Components Analysis (PCA) is the first step in
  Principal Components Regression. For more information on PCA, 
  see :ref:`PCAmath`.

::

  > ausPath = system.file("extdata", "australia.csv", package="h2o")
  > australia.hex = h2o.importFile(localH2O, path = ausPath)
    |=========================================================| 100%
  > australia.pca = h2o.prcomp(data = australia.hex, standardize = TRUE)
    |=========================================================| 100%
  > print(australia.pca)
  IP Address: 127.0.0.1 
   Port      : 54321 
   Parsed Data Key: australia2.hex 

   PCA Model Key: PCA_90d7162c6d4855392ba1272c2f314bec

   Standard deviations:
   1.750703 1.512142 1.031181 0.8283127 0.6083786 0.5481364 0.4181621 0.2314953
	....

  > summary(australia.pca)
  Importance of components:
  ....
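
For readers who want to inspect the same kind of output locally, base R's ``prcomp`` reports standard deviations and an importance summary in a similar layout. This sketch uses the built-in ``USArrests`` data set and is not the H\ :sub:`2`\ O implementation; ``scale. = TRUE`` is assumed here to play the role of ``standardize = TRUE``.

::

  data(USArrests)
  arrests_pca <- prcomp(USArrests, scale. = TRUE)  # standardize columns first

  arrests_pca$sdev        # standard deviation of each component
  summary(arrests_pca)    # importance of components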

**Principal Components Regression**

  Map a set of variables to a
  new set of linearly independent variables. The new set of variables
  are linearly independent linear combinations of the original
  variables and exist in a subspace of lower dimension. This
  transformation is then prepended to a regression model, often
  improving results. For more information on PCA, see :ref:`PCAmath`.

::

  > prostate.hex = h2o.importFile(localH2O, path =
    "https://raw.github.com/0xdata/h2o/master/smalldata/logreg/prostate.csv",
  key = "prostate.hex")
      |=========================================================| 100%
  > h2o.pcr(x = c("AGE","RACE","PSA","DCAPS"), y = "CAPSULE", data =
  prostate.hex, family = "binomial", 
  nfolds = 10, alpha = 0.5, ncomp = 3)
   |==========================================================| 100%

	IP Address: 127.0.0.1 
	Port      : 54321 
	Parsed Data Key: PCAPredict_80069467adfe441c92282ac766f9de7e 

	GLM2 Model Key: GLMModel__a1454a5b8a212d1069376356543a4887

	Coefficients:
      PC0       PC1       PC2 Intercept 
  	3.76219   1.26824  -1.35455  -0.36271 
  	....


Obtaining Predictions
"""""""""""""""""""""

**Predict**

  Apply an H\ :sub:`2`\ O model to a holdout set to obtain predictions
  based on model results. 
  In the examples below, models are generated first, and then the
  predictions for that model are displayed. 

::

  > prostate.hex = h2o.importFile(localH2O, path =
    "https://raw.github.com/0xdata/h2o/master/smalldata/logreg/prostate.csv", 
    key = "prostate.hex")
    |==========================================================| 100%
  > prostate.glm = h2o.glm(y = "CAPSULE", x =
  c("AGE","RACE","PSA","DCAPS"), data = prostate.hex, 
  family = "binomial", nfolds = 10, alpha = 0.5)
      |==========================================================| 100%

  > prostate.fit = h2o.predict(object = prostate.glm, newdata = prostate.hex)
  > (prostate.fit)
  IP Address: 127.0.0.1 
	Port      : 54321 
	Parsed Data Key: GLM2Predict_8b6890653fa743be9eb3ab1668c5a6e9 

  predict        X0        X1
	1       0 0.7452267 0.2547732
	2       1 0.3969807 0.6030193
	3       1 0.4120950 0.5879050
	4       1 0.3726134 0.6273866
	5       1 0.6465137 0.3534863
	6       1 0.4331880 0.5668120


Other Useful Functions
""""""""""""""""""""""

**Get Frame**

  For users who alternate between the web interface and the R API, or for multiple users accessing the same H\ :sub:`2`\ O instance,
  this function creates a reference object for a data frame stored in H\ :sub:`2`\ O (assuming there's a
  **prostate.hex** in the KV store).

::

  > prostate.hex = h2o.getFrame(h2o = localH2O, key = "prostate.hex")
  

**Get Model**

  For users who alternate between the web interface and the R API, this function creates a reference object
  for a model stored in H\ :sub:`2`\ O (assuming there's a **GLMModel__ba724fe4f6d6d5b8b6370f776df94e47** model in the KV store).

::

  > glm.model = h2o.getModel(h2o = localH2O, key = "GLMModel__ba724fe4f6d6d5b8b6370f776df94e47")
  > glm.model


**List all H2O Objects**

  Generate a list of all H\ :sub:`2`\ O objects generated
  during a work session, along with each object's byte size. 

::

  > prosPath = system.file("extdata", "prostate.csv", package="h2o")
  > prostate.hex = h2o.importFile(localH2O, path = prosPath, key = "prostate.hex")
      |==========================================================| 100%
  > prostate.split = h2o.splitFrame(prostate.hex, ratios = 0.8)
  > prostate.train = prostate.split[[1]]
  > prostate.train = h2o.assign(prostate.train, "prostate.train")
  > h2o.ls(localH2O)
                                                     Key Bytesize
	1               GBM_8e4591a9b413407b983d73fbd9eb44cf    40617
	2               GBM_a3ae2edf5dfadbd9ba5dc2e9560c405d     1516


**Remove an H2O object from the server where H2O is running**
  
  Remove an H\ :sub:`2`\ O object from the server with **h2o.rm**. If an object in the
  R environment is associated with it, we recommend also removing that object from the R environment.

::

  > h2o.ls(localH2O)
                 Key Bytesize
 1      Last.value.39      448
 2      Last.value.42       73
 3       prostate.hex     4874
 4     prostate.train     4028
 5 prostate_part0.hex     4028
 6 prostate_part1.hex     1432
 
 > h2o.rm(object = localH2O, keys = "prostate.train")
 > h2o.ls(localH2O)
                 Key Bytesize
 1      Last.value.39      448
 2      Last.value.42       73
 3       prostate.hex     4874
 4 prostate_part0.hex     4028
 5 prostate_part1.hex     1432