NLP Configuration

`enable_tensorflow_textcnn`

Enable word-based CNN TensorFlow transformers for NLP (String)

Default value 'auto'

Whether to use out-of-fold predictions of word-based CNN TensorFlow models as transformers for NLP, if TensorFlow is enabled.

`enable_tensorflow_textbigru`

Enable word-based BiGRU TensorFlow transformers for NLP (String)

Default value 'auto'

Whether to use out-of-fold predictions of word-based Bi-GRU TensorFlow models as transformers for NLP, if TensorFlow is enabled.

`enable_tensorflow_charcnn`

Enable character-based CNN TensorFlow transformers for NLP (String)

Default value 'auto'

Whether to use out-of-fold predictions of character-level CNN TensorFlow models as transformers for NLP, if TensorFlow is enabled.

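`enable_pytorch_nlp_transformer`

Enable PyTorch transformers for NLP (String)

Default value 'auto'

Whether to use pretrained PyTorch models as transformers for NLP tasks. A linear model is fitted on top of the pretrained embeddings. Requires an internet connection. The default of 'auto' means disabled. To enable, set to 'on'. Using a GPU is highly recommended.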

`pytorch_nlp_transformer_max_rows_linear_model`

Max number of rows to use for fitting the linear model on top of the pretrained embeddings. (Number)

Default value 50000

More rows make the fitting process slower. Values below 100000 are recommended.

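`enable_pytorch_nlp_model`

Enable PyTorch models for NLP (String)

Default value 'auto'

Whether to use pretrained PyTorch models and fine-tune them for NLP tasks. Requires an internet connection. The default of 'auto' means disabled. To enable, set to 'on'. These models use only the first text column and can be slow to train. Using a GPU is highly recommended.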

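`pytorch_nlp_pretrained_models`

Select which pretrained PyTorch NLP model(s) to use. (List)

Default value ['bert-base-uncased', 'distilbert-base-uncased', 'bert-base-multilingual-cased']

Select which pretrained PyTorch NLP models to use. Non-default models might not have MOJO support. Requires an internet connection. This setting applies only if PyTorch models or transformers for NLP are set to 'on'.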

`tensorflow_max_epochs_nlp`

Max. TensorFlow epochs for NLP (Number)

Default value 2

Maximum number of epochs for the TensorFlow models that create NLP features.

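`enable_tensorflow_nlp_accuracy_switch`

Accuracy setting at or above which TensorFlow NLP is enabled by default for all models (Number)

Default value 5

When the TensorFlow NLP transformers are set to 'auto', an Accuracy setting equal to or above this value adds all enabled TensorFlow NLP models at the start of the experiment for text-dominated problems. If the transformers are set to 'on', this parameter is ignored. Otherwise, at lower accuracy, TensorFlow NLP transformations are created only as mutations.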
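The rule described above can be read as a small decision function. The following is only a sketch of that reading, not Driverless AI's actual implementation: the function name, the returned labels, and the handling of an 'off' value are assumptions made for illustration.

```python
def tensorflow_nlp_mode(setting: str, accuracy: int, text_dominated: bool,
                        accuracy_switch: int = 5) -> str:
    """Hypothetical summary of how one TensorFlow NLP transformer is used.

    setting is the value of an enable_tensorflow_* option; the 'off' branch
    and the returned labels are illustrative assumptions.
    """
    if setting == "off":
        return "disabled"
    if setting == "on":
        # The accuracy switch is ignored when the transformer is forced on.
        return "enabled regardless of accuracy"
    # setting == "auto": the switch applies only to text-dominated problems.
    if text_dominated and accuracy >= accuracy_switch:
        return "added at start"
    return "mutation only"


print(tensorflow_nlp_mode("auto", accuracy=7, text_dominated=True))
print(tensorflow_nlp_mode("auto", accuracy=3, text_dominated=True))
```

With the default switch of 5, the first call falls in the "added at start" branch and the second in the "mutation only" branch.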

`tensorflow_nlp_pretrained_embeddings_file_path`

Path to pretrained embeddings for TensorFlow NLP models. If empty, will train from scratch. (String)

Default value ''

The path to the pretrained embeddings for TensorFlow NLP models can be either a path in the local file system or an S3 location (s3://). For example, download and unzip https://nlp.stanford.edu/data/glove.6B.zip, then set:

tensorflow_nlp_pretrained_embeddings_file_path = /path/on/server/to/glove.6B.300d.txt
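Before pointing the setting at a file, it can help to confirm that the file has the expected shape. GloVe text files store one token per line followed by its space-separated vector components. The sketch below parses that layout from an in-memory sample; the function name and the sample values are made up for illustration, and this is not how Driverless AI itself loads the file.

```python
import io
from typing import Dict, List, TextIO


def read_embeddings(handle: TextIO) -> Dict[str, List[float]]:
    """Parse GloVe-style text: a token followed by its vector components."""
    vectors: Dict[str, List[float]] = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


# Tiny stand-in for a real file such as glove.6B.300d.txt (values invented).
sample = io.StringIO("the 0.1 -0.2 0.3\ncat 0.4 0.5 -0.6\n")
vectors = read_embeddings(sample)
dimensions = {len(vector) for vector in vectors.values()}
print(len(vectors), dimensions)
```

A well-formed file yields a single dimension for every token; more than one value in `dimensions` suggests a truncated or corrupted download.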

`tensorflow_nlp_pretrained_s3_access_key_id`

S3 access key ID to use when tensorflow_nlp_pretrained_embeddings_file_path is set to an S3 location. (String)

Default value ''

`tensorflow_nlp_pretrained_s3_secret_access_key`

S3 secret access key to use when tensorflow_nlp_pretrained_embeddings_file_path is set to an S3 location. (String)

Default value ''

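`tensorflow_nlp_pretrained_embeddings_trainable`

For TensorFlow NLP, allow training of unfrozen pretrained embeddings (in addition to fine-tuning of the rest of the graph) (Boolean)

Default value False

Allow training of all weights of the neural network graph, including the weights of the pretrained embedding layer. If this setting is disabled, the embedding layer is frozen, but all other weights are still fine-tuned.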

`pytorch_tokenizer_parallel`

pytorch_tokenizer_parallel (Boolean)

Default value True

Whether to parallelize tokenization for BERT models/transformers.

`pytorch_nlp_fine_tuning_num_epochs`

Number of epochs for fine-tuning of PyTorch NLP models. (Number)

Default value -1

Number of epochs for fine-tuning of PyTorch NLP models. Larger values can give higher accuracy but take longer to train.

`pytorch_nlp_fine_tuning_batch_size`

Batch size for PyTorch NLP models. -1 for automatic. (Number)

Default value -1

Batch size for PyTorch NLP models. Larger models and larger batch sizes use more memory.

`pytorch_nlp_fine_tuning_padding_length`

Maximum sequence length (padding length) for PyTorch NLP models. -1 for automatic. (Number)

Default value -1

Maximum sequence length (padding length) for PyTorch NLP models. Larger models and larger padding lengths use more memory.
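To illustrate why padding length and batch size both drive memory use, the sketch below pads or truncates token ID sequences to a fixed length and counts the token positions in one batch. The helper name, the pad ID of 0, and the example numbers are assumptions for illustration only.

```python
from typing import List


def pad_or_truncate(token_ids: List[int], padding_length: int,
                    pad_id: int = 0) -> List[int]:
    """Clip a sequence to padding_length, then right-pad it with pad_id."""
    clipped = token_ids[:padding_length]
    return clipped + [pad_id] * (padding_length - len(clipped))


batch = [[101, 7592, 102], [101, 2023, 2003, 1037, 2146, 102]]
padded = [pad_or_truncate(sequence, padding_length=4) for sequence in batch]
# Every sequence now occupies padding_length positions, so one batch holds
# batch_size * padding_length token positions regardless of the input text.
positions_per_batch = len(padded) * 4
print(padded, positions_per_batch)
```

Doubling either value doubles the number of token positions the model has to process per batch.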

`pytorch_nlp_pretrained_models_dir`

Path to pretrained PyTorch NLP models. If empty, will get models from S3 (String)

Default value ''

Path to pretrained PyTorch NLP models. This can be a path in the local file system (/path/on/server/to/bert_models_folder), a URL, or an S3 location (s3://). To get all models, download http://s3.amazonaws.com/artifacts.h2o.ai/releases/ai/h2o/pretrained/bert_models.zip, unzip it, and store it in a directory on the instance where DAI is installed. For example:

pytorch_nlp_pretrained_models_dir = /path/on/server/to/bert_models_folder
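The three accepted kinds of location differ only in their scheme. As a rough sketch, the following classifies a value the way the description above distinguishes them; the function name, the returned labels, and the bucket name in the example are hypothetical, and the product's own parsing may differ.

```python
from urllib.parse import urlparse


def location_kind(location: str) -> str:
    """Classify a pytorch_nlp_pretrained_models_dir value (illustrative)."""
    if not location:
        return "default"  # empty value: models are fetched from S3
    scheme = urlparse(location).scheme
    if scheme == "s3":
        return "s3"  # the S3 access key settings become relevant
    if scheme in ("http", "https"):
        return "url"
    return "local"


for value in ("", "/path/on/server/to/bert_models_folder",
              "s3://example-bucket/bert_models"):
    print(repr(value), "->", location_kind(value))
```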

`pytorch_nlp_pretrained_s3_access_key_id`

S3 access key ID to use when pytorch_nlp_pretrained_models_dir is set to an S3 location. (String)

Default value ''

`pytorch_nlp_pretrained_s3_secret_access_key`

S3 secret access key to use when pytorch_nlp_pretrained_models_dir is set to an S3 location. (String)

Default value ''

`text_fraction_for_text_dominated_problem`

Fraction of text columns out of all features to be considered a text-dominated problem (Float)

Default value 0.3

The fraction of text columns among all features for a problem to be considered text-dominated.

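`text_transformer_fraction_for_text_dominated_problem`

Fraction of text transformers out of all transformers that triggers a text-dominated problem (Float)

Default value 0.3

The fraction of text transformers among all transformers above which the problem is treated as text-dominated.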
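Both fraction settings above compare a ratio against a threshold. A minimal sketch of that comparison follows, assuming an inclusive comparison (`>=`); the source does not say whether the boundary itself counts, and the function name and example counts are made up.

```python
def exceeds_fraction(part: int, total: int, threshold: float = 0.3) -> bool:
    """Return True when part/total reaches the threshold (assumed inclusive)."""
    if total <= 0:
        return False  # nothing to compare against
    return part / total >= threshold


# 4 text columns out of 10 features: 0.4 is above the default of 0.3.
print(exceeds_fraction(4, 10))
# 2 text transformers out of 10 transformers: 0.2 is below the default.
print(exceeds_fraction(2, 10))
```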

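`string_col_as_text_threshold`

Threshold for string columns to be treated as text (0.0 - text, 1.0 - string) (Float)

Default value 0.3

Threshold on the average string-is-text score, as determined by internal heuristics. It decides when a string column is treated as text (for an NLP problem) or only as a standard categorical variable. Higher values favor treating string columns as categoricals, and lower values favor treating them as text.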

`string_col_as_text_threshold_preview`

string_col_as_text_threshold_preview (Float)

Default value 0.1

Threshold for string columns to be treated as text during preview. It should be less than string_col_as_text_threshold so that data whose first 20 rows don’t look like text still works with Text-only transformers (0.0 - text, 1.0 - string).
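The effect of the two thresholds can be sketched with a function that takes a column's string-is-text score as input. The score itself comes from internal heuristics that are not documented here, so it is treated as a given number; the function name, the returned labels, and the inclusive comparison are assumptions.

```python
def column_role(text_score: float, threshold: float = 0.3) -> str:
    """Treat a string column as text when its score reaches the threshold."""
    return "text" if text_score >= threshold else "categorical"


score = 0.2  # hypothetical score for one string column
# With the main threshold (0.3) the column is handled as a categorical.
print(column_role(score))
# With the more permissive preview threshold (0.1) it is handled as text.
print(column_role(score, threshold=0.1))
```

This is consistent with the scale given in the setting labels: a threshold of 0.0 treats every string column as text, and 1.0 treats almost none as text.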

`tokenize_single_chars`

Tokenize single characters. (Boolean)

Default value True

If disabled, require 2 or more alphanumeric characters for a token in Text (Count and TF/IDF) transformers, otherwise create tokens out of single alphanumeric characters. True means that ‘Street 3’ is tokenized into ‘Street’ and ‘3’, while False means that it’s tokenized into ‘Street’.
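The ‘Street 3’ example can be reproduced with two regular expressions. This sketch assumes a word-boundary token pattern similar to scikit-learn's default `token_pattern`; the pattern Driverless AI actually uses is not stated here. Note that `\w` also matches underscores, so it is slightly broader than "alphanumeric".

```python
import re
from typing import List


def tokenize(text: str, tokenize_single_chars: bool = True) -> List[str]:
    """Split text into word tokens, optionally dropping 1-character tokens."""
    # \w+ keeps single characters; \w\w+ requires at least two.
    pattern = r"\b\w+\b" if tokenize_single_chars else r"\b\w\w+\b"
    return re.findall(pattern, text)


print(tokenize("Street 3", tokenize_single_chars=True))   # ['Street', '3']
print(tokenize("Street 3", tokenize_single_chars=False))  # ['Street']
```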

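`text_transformers_max_vocabulary_size`

Max size of the vocabulary for text transformers. (List)

Default value [1000, 5000]

Maximum size (in tokens) of the vocabulary created while fitting Tfidf/Count-based text transformers (not CNN/BERT). If multiple values are provided, the first is used for the initial models and the remaining values are used during parameter tuning and feature evolution. Values smaller than 10000 are recommended for speed.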
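As a rough illustration of what a vocabulary cap does, the sketch below keeps only the most frequent tokens and shows how a list of sizes splits into an initial value and tuning values. The whitespace tokenization, the function name, and the sample documents are simplifications invented for this example.

```python
from collections import Counter
from typing import List


def build_vocabulary(documents: List[str], max_size: int) -> List[str]:
    """Keep the max_size most frequent whitespace-separated tokens."""
    counts = Counter(token for doc in documents for token in doc.lower().split())
    return [token for token, _ in counts.most_common(max_size)]


# First value for the initial models, the rest for tuning and evolution.
initial_size, *tuning_sizes = [1000, 5000]

docs = ["red apple red", "green apple", "red pear"]
vocab = build_vocabulary(docs, max_size=2)
print(vocab)  # ['red', 'apple']
print(initial_size, tuning_sizes)
```

A smaller cap means fewer Count or TF/IDF columns to compute, which is the reason for the speed recommendation above.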
