.. _References:

References
==============

The references listed below cover prior publications and useful
reading, both for general analysis and for each algorithm.

Understanding the algorithms in H2O is an integral part of using the
platform correctly and getting the most out of your analysis.

Below are citations of seminal articles and of articles
demonstrating rigorous application of the algorithms in H2O.
This list is not meant to be exhaustive, but it provides an
abbreviated syllabus to help develop a strong understanding.

""""

Recommended Reading
""""""""""""""""""""

Hastie, Trevor, Robert Tibshirani, and Jerome H Friedman. The
Elements of Statistical Learning.
Vol. 1. N.p.: Springer New York, 2001.
http://www.stanford.edu/~hastie/local.ftp/Springer/OLD//ESLII_print4.pdf

""""

GLM
""""""

Breslow, N E. "Generalized Linear Models: Checking Assumptions and
Strengthening Conclusions." Statistica Applicata 8 (1996): 23-41.

Goldberger, Arthur S. "Best Linear Unbiased Prediction in the
Generalized Linear Regression Model." Journal of the American
Statistical Association 57.298 (1962): 369-375.
http://people.umass.edu/~bioep740/yr2009/topics/goldberger-jasa1962-369.pdf

Guisan, Antoine, Thomas C Edwards Jr, and Trevor Hastie. "Generalized
Linear and Generalized Additive Models in Studies of Species
Distributions: Setting the Scene." Ecological Modelling
157.2 (2002): 89-100. 
http://www.stanford.edu/~hastie/Papers/GuisanEtAl_EcolModel-2003.pdf

Nelder, John A, and Robert WM Wedderburn. "Generalized Linear Models."
Journal of the Royal Statistical Society. Series A (General) (1972): 370-384.
http://biecek.pl/MIMUW/uploads/Nelder_GLM.pdf

Snee, Ronald D. "Validation of Regression Models: Methods and
Examples." Technometrics 19.4 (1977): 415-428.

""""

Poisson
"""""""""

Frome, E L. "The Analysis of Rates Using Poisson Regression Models." 
Biometrics (1983): 665-674.
http://www.csm.ornl.gov/~frome/BE/FP/FromeBiometrics83.pdf

""""

Logistic (binomial and multinomial)
"""""""""""""""""""""""""""""""""""""

Press, S James, and Sandra Wilson. "Choosing Between Logistic
Regression and Discriminant Analysis." Journal of the American
Statistical Association 73.364 (1978): 699-705.
http://www.statpt.com/logistic/press_1978.pdf

Pearce, Jennie, and Simon Ferrier. "Evaluating the Predictive
Performance of Habitat Models Developed Using Logistic Regression."
Ecological Modelling 133.3 (2000): 225-245.
http://www.whoi.edu/cms/files/Ecological_Modelling_2000_Pearce_53557.pdf

""""

GBM
""""

Dietterich, Thomas G, and Eun Bae Kong. "Machine Learning Bias,
Statistical Bias, and Statistical Variance of Decision Tree
Algorithms." ML-95 255 (1995).

Elith, Jane, John R Leathwick, and Trevor Hastie. "A Working Guide to
Boosted Regression Trees." Journal of Animal Ecology 77.4 (2008): 802-813.

Friedman, Jerome H. "Greedy Function Approximation: A Gradient
Boosting Machine." Annals of Statistics (2001): 1189-1232.

Friedman, Jerome, Trevor Hastie, Saharon Rosset, Robert Tibshirani,
and Ji Zhu. "Discussion of Boosting Papers." The Annals of Statistics
32 (2004): 102-107.

Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. "Additive
Logistic Regression: A Statistical View of Boosting (With Discussion
and a Rejoinder by the Authors)." The Annals of Statistics 28.2
(2000): 337-407.
http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.aos/1016218223

""""

Neural Networks
""""""""""""""""

Baldi, Pierre, and Kurt Hornik. "Neural Networks and Principal
Component Analysis: Learning From Examples Without Local Minima."
Neural Networks 2.1 (1989): 53-58.

Coolen, A C C. Concepts for Neural Networks. N.p.: Springer, 1998. 13-70.

""""

Tweedie
""""""""""

Dunn, Peter K. "Occurrence and Quantity of Precipitation Can Be
Modelled Simultaneously." International Journal of Climatology 24.10 
(2004): 1231-1239.

""""

K-Means
"""""""""

Napoleon, D, and S Pavalakodi. "A New Method for Dimensionality
Reduction Using K-Means Clustering Algorithm for High Dimensional Data
Set." International Journal of Computer Applications 13.7 (2011): 41-46.

Xiong, Hui, Junjie Wu, and Jian Chen. "K-means Clustering Versus
Validation Measures: A Data-Distribution Perspective." IEEE Transactions
on Systems, Man, and Cybernetics, Part B: Cybernetics 39.2 (2009): 318-331.

""""
