tf.raw_ops.GenerateVocabRemapping

Given a path to new and old vocabulary files, returns a remapping Tensor of

length num_new_vocab, where remapping[i] contains the row number in the old vocabulary that corresponds to row i in the new vocabulary (starting at line new_vocab_offset and up to num_new_vocab entities), or -1 if entry i in the new vocabulary is not in the old vocabulary. The old vocabulary is constrained to the first old_vocab_size entries if old_vocab_size is not the default value of -1.

num_vocab_offset enables use in the partitioned variable case, and should generally be set through examining partitioning info. The format of the files should be a text file, with each line containing a single entity within the vocabulary.

For example, with new_vocab_file a text file containing each of the following elements on a single line: [f0, f1, f2, f3], old_vocab_file = [f1, f0, f3], num_new_vocab = 3, new_vocab_offset = 1, the returned remapping would be [0, -1, 2].

The op also returns a count of how many entries in the new vocabulary were present in the old vocabulary, which is used to calculate the number of values to initialize in a weight matrix remapping

This functionality can be used to remap both row vocabularies (typically, features) and column vocabularies (typically, classes) from TensorFlow checkpoints. Note that the partitioning logic relies on contiguous vocabularies corresponding to div-partitioned variables. Moreover, the underlying remapping uses an IndexTable (as opposed to an inexact CuckooTable), so client code should use the corresponding index_table_from_file() as the FeatureColumn framework does (as opposed to tf.feature_to_id(), which uses a CuckooTable).

Args
new_vocab_file A Tensor of type string. Path to the new vocab file.
old_vocab_file A Tensor of type string. Path to the old vocab file.
new_vocab_offset An int that is >= 0. How many entries into the new vocab file to start reading.
num_new_vocab An int that is >= 0. Number of entries in the new vocab file to remap.
old_vocab_size An optional int that is >= -1. Defaults to -1. Number of entries in the old vocab file to consider. If -1, use the entire old vocabulary.
name A name for the operation (optional).
Returns
A tuple of Tensor objects (remapping, num_present).
remapping A Tensor of type int64.
num_present A Tensor of type int32.

© 2020 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/versions/r2.4/api_docs/python/tf/raw_ops/GenerateVocabRemapping