mlflow.mleap
The mlflow.mleap module provides an API for saving Spark MLlib models using the
MLeap persistence mechanism.
A companion module for loading MLflow models with the MLeap flavor format is available in the
mlflow/java package.
-
exception mlflow.mleap.MLeapSerializationException
Bases: mlflow.exceptions.MlflowException
Exception thrown when a model or DataFrame cannot be serialized in MLeap format.
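As a rough illustration of where this exception might surface, the sketch below wraps a call to mlflow.mleap.save_model and reports a serialization failure instead of letting it propagate. The helper name save_or_report is hypothetical, and the sketch assumes the installed MLflow version provides the mlflow.mleap module; the import is done lazily inside the function for that reason.

```python
def save_or_report(spark_model, sample_input, path):
    """Hypothetical helper: try to save in MLeap format; return True on success."""
    # Imported lazily so the helper can be defined even where MLflow is absent.
    import mlflow.mleap

    try:
        mlflow.mleap.save_model(spark_model, sample_input, path)
    except mlflow.mleap.MLeapSerializationException as exc:
        # Raised when the model or the sample DataFrame cannot be serialized.
        print("MLeap serialization failed: {}".format(exc))
        return False
    return True
```

A caller could use the boolean result to fall back to another flavor or to skip the MLeap export for pipelines that contain unsupported stages.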
-
mlflow.mleap.add_to_model(mlflow_model, path, spark_model, sample_input)
Add the MLeap flavor to an existing MLflow model.
Parameters:
- mlflow_model – mlflow.models.Model to which this flavor is being added.
- path – Path of the model to which this flavor is being added.
- spark_model – Spark PipelineModel to be saved. This model must be MLeap-compatible and cannot contain any custom transformers.
- sample_input – Sample PySpark DataFrame input that the model can evaluate. This is required by MLeap for data schema inference.
>>> import mlflow
>>> import mlflow.mleap
>>> # Set these values as appropriate
>>> mlflow_model = ...
>>> spark_model = ...
>>> model_path_dir = ...
>>> sample_input_df = ...
>>> # Add the MLeap flavor to the existing MLflow model
>>> mlflow.mleap.add_to_model(mlflow_model, model_path_dir, spark_model, sample_input_df)
-
mlflow.mleap.log_model(spark_model, sample_input, artifact_path)
Log a Spark MLlib model in MLeap format as an MLflow artifact for the current run. The logged model will have the MLeap flavor.
Note
The MLeap model flavor cannot be loaded in Python; it must be loaded using the Java module within the mlflow/java package.
Parameters:
- spark_model – Spark PipelineModel to be saved. This model must be MLeap-compatible and cannot contain any custom transformers.
- sample_input – Sample PySpark DataFrame input that the model can evaluate. This is required by MLeap for data schema inference.
- artifact_path – Run-relative artifact path.
>>> import mlflow
>>> import mlflow.mleap
>>> import pyspark
>>> from pyspark.ml import Pipeline
>>> from pyspark.ml.classification import LogisticRegression
>>> from pyspark.ml.feature import HashingTF, Tokenizer
>>> # Assumes an active SparkSession is available as `spark`
>>> # Training DataFrame
>>> training = spark.createDataFrame([
...     (0, "a b c d e spark", 1.0),
...     (1, "b d", 0.0),
...     (2, "spark f g h", 1.0),
...     (3, "hadoop mapreduce", 0.0)], ["id", "text", "label"])
>>> # Testing DataFrame
>>> test_df = spark.createDataFrame([
...     (4, "spark i j k"),
...     (5, "l m n"),
...     (6, "spark hadoop spark"),
...     (7, "apache hadoop")], ["id", "text"])
>>> # Create an MLlib pipeline
>>> tokenizer = Tokenizer(inputCol="text", outputCol="words")
>>> hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
>>> lr = LogisticRegression(maxIter=10, regParam=0.001)
>>> pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])
>>> model = pipeline.fit(training)
>>> # Log parameters
>>> mlflow.log_param("max_iter", 10)
>>> mlflow.log_param("reg_param", 0.001)
>>> # Log the Spark MLlib model in MLeap format
>>> mlflow.mleap.log_model(model, test_df, "mleap-model")
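Because log_model attaches the artifact to the current run, it can help to make the run boundaries explicit. The following is a minimal sketch, assuming an MLflow version that provides both mlflow.start_run as a context manager and the mlflow.mleap module; the helper name log_in_new_run and its default artifact path are hypothetical.

```python
def log_in_new_run(spark_model, sample_input, artifact_path="mleap-model"):
    """Hypothetical helper: log an MLeap-flavored model inside its own run."""
    # Imported lazily so the helper can be defined even where MLflow is absent.
    import mlflow
    import mlflow.mleap

    # Using start_run() as a context manager ends the run on exit,
    # even if logging raises.
    with mlflow.start_run():
        mlflow.mleap.log_model(spark_model, sample_input, artifact_path)
```

With the pipeline from the example above, this could be called as log_in_new_run(model, test_df).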
-
mlflow.mleap.save_model(spark_model, sample_input, path, mlflow_model=<mlflow.models.Model object>)
Save a Spark MLlib PipelineModel in MLeap format at a local path. The saved model will have the MLeap flavor.
Note
The MLeap model flavor cannot be loaded in Python; it must be loaded using the Java module within the mlflow/java package.
Parameters:
- spark_model – Spark PipelineModel to be saved. This model must be MLeap-compatible and cannot contain any custom transformers.
- sample_input – Sample PySpark DataFrame input that the model can evaluate. This is required by MLeap for data schema inference.
- path – Local path where the model is to be saved.
- mlflow_model – mlflow.models.Model to which this flavor is being added.
>>> import mlflow
>>> import mlflow.mleap
>>> # Set these values as appropriate
>>> spark_model = ...
>>> model_save_dir = ...
>>> sample_input_df = ...
>>> # Save the Spark MLlib model with the MLeap flavor
>>> mlflow.mleap.save_model(spark_model, sample_input_df, model_save_dir)
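Since the saved model cannot be loaded back in Python, one lightweight way to sanity-check the output is to look at the MLmodel file that MLflow writes at the root of the model directory. The sketch below is a naive line scan, not a YAML parser, and it assumes the flavor is recorded under the key mleap inside the flavors section; the helper name has_mleap_flavor is hypothetical.

```python
import os


def has_mleap_flavor(model_dir):
    """Return True if model_dir/MLmodel appears to list an 'mleap' flavor."""
    mlmodel_path = os.path.join(model_dir, "MLmodel")
    if not os.path.isfile(mlmodel_path):
        return False

    in_flavors = False
    with open(mlmodel_path) as f:
        for line in f:
            stripped = line.strip()
            if not stripped or stripped.startswith("#"):
                continue  # skip blank lines and comments
            if line[0] not in (" ", "\t"):
                # A new top-level key starts; note whether it is "flavors".
                in_flavors = stripped.startswith("flavors:")
            elif in_flavors and stripped == "mleap:":
                # Assumed flavor key nested under "flavors".
                return True
    return False
```

After the save_model call above, has_mleap_flavor(model_save_dir) would be expected to return True if the assumption about the flavor key holds.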