tf.keras.layers.experimental.preprocessing.CategoryEncoding

Category encoding layer.

Inherits From: PreprocessingLayer, Layer, Module

tf.keras.layers.experimental.preprocessing.CategoryEncoding(
    max_tokens=None, output_mode=BINARY, sparse=False, **kwargs
)

This layer provides options for condensing data into a categorical encoding. It accepts integer values as inputs and outputs a dense representation of those inputs: each sample becomes a 1-D tensor of float values describing that sample's tokens.

Examples:

layer = tf.keras.layers.experimental.preprocessing.CategoryEncoding(
          max_tokens=4, output_mode="count")
layer([[0, 1], [0, 0], [1, 2], [3, 1]])
<tf.Tensor: shape=(4, 4), dtype=float32, numpy=
  array([[1., 1., 0., 0.],
         [2., 0., 0., 0.],
         [0., 1., 1., 0.],
         [0., 1., 0., 1.]], dtype=float32)>
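
To make the "count" output above concrete, here is a minimal pure-Python sketch of the same computation, independent of TensorFlow. The helper name `count_encode` is hypothetical and is not part of the Keras API. Rejecting out-of-range tokens with a `ValueError` is a choice made for this sketch, not documented layer behavior.

```python
def count_encode(batch, max_tokens):
    """Illustrative stand-in for output_mode="count" (not the Keras implementation)."""
    encoded = []
    for sample in batch:
        # One slot per possible token index, all starting at zero.
        row = [0.0] * max_tokens
        for token in sample:
            # Assumption of this sketch: tokens must lie in [0, max_tokens).
            if not 0 <= token < max_tokens:
                raise ValueError(f"token {token} is outside [0, {max_tokens})")
            row[token] += 1.0
        encoded.append(row)
    return encoded


result = count_encode([[0, 1], [0, 0], [1, 2], [3, 1]], max_tokens=4)
for row in result:
    print(row)
```

Each input row of two token indices becomes a row of four counts, which matches the values in the tensor shown above.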

Examples with weighted inputs:

import numpy as np

layer = tf.keras.layers.experimental.preprocessing.CategoryEncoding(
          max_tokens=4, output_mode="count")
count_weights = np.array([[.1, .2], [.1, .1], [.2, .3], [.4, .2]])
layer([[0, 1], [0, 0], [1, 2], [3, 1]], count_weights=count_weights)
<tf.Tensor: shape=(4, 4), dtype=float64, numpy=
  array([[0.1, 0.2, 0. , 0. ],
         [0.2, 0. , 0. , 0. ],
         [0. , 0.2, 0.3, 0. ],
         [0. , 0.2, 0. , 0.4]])>
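
The weighted result above can be read as "add the weight instead of adding 1". The following pure-Python sketch illustrates that reading. The function `weighted_count_encode` is a hypothetical helper written for this page, not the layer's implementation.

```python
def weighted_count_encode(batch, weights, max_tokens):
    """Illustrative weighted variant of count mode (not the Keras implementation)."""
    encoded = []
    for sample, sample_weights in zip(batch, weights):
        # The weights must line up one-to-one with the token indices.
        if len(sample) != len(sample_weights):
            raise ValueError("each sample needs exactly one weight per token")
        row = [0.0] * max_tokens
        for token, weight in zip(sample, sample_weights):
            # Accumulate the weight for this token instead of incrementing by 1.
            row[token] += weight
        encoded.append(row)
    return encoded


weighted = weighted_count_encode(
    [[0, 1], [0, 0], [1, 2], [3, 1]],
    [[0.1, 0.2], [0.1, 0.1], [0.2, 0.3], [0.4, 0.2]],
    max_tokens=4,
)
for row in weighted:
    print(row)
```

In the second sample, token 0 appears twice with weight 0.1 each, so its slot accumulates to 0.2, as in the tensor above.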

Call arguments:

inputs: A 2D tensor of shape (samples, timesteps).
count_weights: A 2D tensor with the same shape as inputs, giving the weight of each sample value when summing in count mode. Not used in binary or tf-idf mode.

Attributes
`max_tokens`	The maximum size of the vocabulary for this layer. If None, there is no cap on the size of the vocabulary.
`output_mode`	Specification for the output of the layer. Defaults to "binary". Values can be "binary", "count" or "tf-idf", configuring the layer as follows: "binary": Outputs a single array per batch item, of either vocab_size or max_tokens size, containing 1s in all elements where the token mapped to that index exists at least once in the batch item. "count": As "binary", but the array contains a count of the number of times the token at that index appeared in the batch item. "tf-idf": As "binary", but the TF-IDF algorithm is applied to find the value in each token slot.
`sparse`	Boolean. If true, returns a `SparseTensor` instead of a dense `Tensor`. Defaults to `False`.
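
As a rough illustration of how the three `output_mode` values differ, the sketch below encodes one sample in plain Python. The function `encode_sample` and its `idf_weights` parameter are hypothetical. In particular, treating "tf-idf" as the token count multiplied by a caller-supplied per-token weight is an assumption made for this sketch. The layer's actual TF-IDF computation is not specified on this page.

```python
def encode_sample(sample, max_tokens, output_mode="binary", idf_weights=None):
    """Illustrative encoding of a single sample (not the Keras implementation)."""
    # Count how often each token index occurs in the sample.
    counts = [0.0] * max_tokens
    for token in sample:
        counts[token] += 1.0

    if output_mode == "count":
        return counts
    if output_mode == "binary":
        # 1 where the token occurs at least once, 0 elsewhere.
        return [1.0 if c > 0 else 0.0 for c in counts]
    if output_mode == "tf-idf":
        # Assumption: scale each count by a per-token weight supplied by the caller.
        if idf_weights is None or len(idf_weights) != max_tokens:
            raise ValueError("tf-idf mode needs one weight per token slot")
        return [c * w for c, w in zip(counts, idf_weights)]
    raise ValueError(f"unknown output_mode: {output_mode!r}")


sample = [0, 0, 2]
binary = encode_sample(sample, 4, "binary")
counted = encode_sample(sample, 4, "count")
tfidf = encode_sample(sample, 4, "tf-idf", idf_weights=[0.5, 1.0, 2.0, 1.0])
print(binary, counted, tfidf)
```

Token 0 occurs twice in the sample, so it contributes 1 in binary mode, 2 in count mode, and 2 times its weight in the assumed tf-idf mode.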

Methods

`adapt`

adapt(
    data, reset_state=True
)

Fits the state of the preprocessing layer to the dataset.

Overrides the default adapt method to apply relevant preprocessing to the inputs before passing to the combiner.

Arguments
`data`	The data to train on. It can be passed either as a tf.data Dataset, or as a numpy array.
`reset_state`	Optional argument specifying whether to clear the state of the layer at the start of the call to `adapt`. This must be True for this layer, which does not support repeated calls to `adapt`.

Raises
`RuntimeError`	if the layer cannot be adapted at this time.
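
The following pure-Python sketch suggests what "fitting the state" to a dataset could involve. It is a guess at the general idea, not the layer's combiner. The name `sketch_adapt` is hypothetical, and two things are assumptions: that the vocabulary size is inferred as the largest token index plus one when `max_tokens` is None, and that per-token document counts are collected for later tf-idf weighting.

```python
def sketch_adapt(data, max_tokens=None):
    """Hypothetical illustration of adapting to data (not the Keras implementation)."""
    # Assumption: with no cap, the vocabulary spans 0..largest token index.
    largest = max((token for sample in data for token in sample), default=-1)
    num_elements = max_tokens if max_tokens is not None else largest + 1

    # Assumption: count, for each token, how many samples contain it.
    document_counts = [0] * num_elements
    for sample in data:
        for token in set(sample):  # set() so a repeated token counts once per sample
            if 0 <= token < num_elements:
                document_counts[token] += 1

    return num_elements, document_counts, len(data)


num_elements, document_counts, num_documents = sketch_adapt(
    [[0, 1], [0, 0], [1, 2], [3, 1]]
)
print(num_elements, document_counts, num_documents)
```

Because the sketch recomputes everything from the data it is given, it mirrors the documented constraint that state is reset on each call rather than accumulated across calls.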

`set_num_elements`

set_num_elements(
    num_elements
)

`set_tfidf_data`

set_tfidf_data(
    tfidf_data
)

© 2020 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/versions/r2.4/api_docs/python/tf/keras/layers/experimental/preprocessing/CategoryEncoding