References
==========

The references below cover prior publications and useful reading, both for
analysis in general and for each algorithm. Understanding the algorithms of
H2O is an integral part of using the platform correctly and getting the most
out of an analysis. The citations include seminal articles and articles
demonstrating rigorous application of the algorithms of H2O, as well as
references for algorithms currently in the works. This list is not meant to
be exhaustive, but rather to provide an abbreviated syllabus to help develop
a strong understanding.

Recommended Reading
-------------------

Hastie, Trevor, Robert Tibshirani, and Jerome H Friedman. The Elements of Statistical Learning. Vol. 1. N.p.: Springer New York, 2001.
http://www.stanford.edu/~hastie/local.ftp/Springer/OLD//ESLII_print4.pdf

GLM
---

Breslow, N E. "Generalized Linear Models: Checking Assumptions and Strengthening Conclusions." Statistica Applicata 8 (1996): 23-41.

Goldberger, Arthur S. "Best Linear Unbiased Prediction in the Generalized Linear Regression Model." Journal of the American Statistical Association 57.298 (1962): 369-375.
http://people.umass.edu/~bioep740/yr2009/topics/goldberger-jasa1962-369.pdf

Guisan, Antoine, Thomas C Edwards Jr, and Trevor Hastie. "Generalized Linear and Generalized Additive Models in Studies of Species Distributions: Setting the Scene." Ecological Modelling 157.2 (2002): 89-100.
http://www.stanford.edu/~hastie/Papers/GuisanEtAl_EcolModel-2003.pdf

Nelder, John A, and Robert WM Wedderburn. "Generalized Linear Models." Journal of the Royal Statistical Society. Series A (General) (1972): 370-384.
http://biecek.pl/MIMUW/uploads/Nelder_GLM.pdf

Snee, Ronald D. "Validation of Regression Models: Methods and Examples." Technometrics 19.4 (1977): 415-428.

Poisson
-------

Frome, E L. "The Analysis of Rates Using Poisson Regression Models." Biometrics (1983): 665-674.
http://www.csm.ornl.gov/~frome/BE/FP/FromeBiometrics83.pdf

Logistic (binomial and multinomial)
-----------------------------------

Press, S James, and Sandra Wilson. "Choosing Between Logistic Regression and Discriminant Analysis." Journal of the American Statistical Association 73.364 (1978): 699-705.
http://www.statpt.com/logistic/press_1978.pdf

Pearce, Jennie, and Simon Ferrier. "Evaluating the Predictive Performance of Habitat Models Developed Using Logistic Regression." Ecological Modelling 133.3 (2000): 225-245.
http://www.whoi.edu/cms/files/Ecological_Modelling_2000_Pearce_53557.pdf

GBM
---

Dietterich, Thomas G, and Eun Bae Kong. "Machine Learning Bias, Statistical Bias, and Statistical Variance of Decision Tree Algorithms." ML-95 255 (1995).

Elith, Jane, John R Leathwick, and Trevor Hastie. "A Working Guide to Boosted Regression Trees." Journal of Animal Ecology 77.4 (2008): 802-813.

Friedman, Jerome H. "Greedy Function Approximation: A Gradient Boosting Machine." Annals of Statistics (2001): 1189-1232.

Friedman, Jerome, Trevor Hastie, Saharon Rosset, Robert Tibshirani, and Ji Zhu. "Discussion of Boosting Papers." Annals of Statistics 32 (2004): 102-107.

Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. "Additive Logistic Regression: A Statistical View of Boosting (With Discussion and a Rejoinder by the Authors)." The Annals of Statistics 28.2 (2000): 337-407.
http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.aos/1016218223

Neural Networks
---------------

Baldi, Pierre, and Kurt Hornik. "Neural Networks and Principal Component Analysis: Learning From Examples Without Local Minima." Neural Networks 2.1 (1989): 53-58.

Coolen, A C C. Concepts for Neural Networks. N.p.: Springer, 1998. 13-70.

Tweedie
-------

Dunn, Peter K. "Occurrence and Quantity of Precipitation Can Be Modelled Simultaneously." International Journal of Climatology 24.10 (2004): 1231-1239.

K-Means
-------

Napoleon, D, and S Pavalakodi. "A New Method for Dimensionality Reduction Using KMeans Clustering Algorithm for High Dimensional Data Set." International Journal of Computer Applications 13.7 (2011): 41-46.

Xiong, Hui, Junjie Wu, and Jian Chen. "K-means Clustering Versus Validation Measures: A Data-distribution Perspective." Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on 39.2 (2009): 318-331.