tf.contrib.learn.RunConfig

This class specifies the configurations for an Estimator run.

Inherits From: RunConfig

This class is a deprecated implementation of the tf.estimator.RunConfig interface.

Args
master TensorFlow master. Defaults to empty string for local.
num_cores Number of cores to be used. If 0, the system picks an appropriate number (default: 0).
log_device_placement Log the op placement to devices (default: False).
gpu_memory_fraction Fraction of GPU memory used by the process on each GPU uniformly on the same machine.
tf_random_seed Random seed for TensorFlow initializers. Setting this value allows consistency between reruns.
save_summary_steps Save summaries every this many steps.
save_checkpoints_secs Save checkpoints every this many seconds. Cannot be specified together with save_checkpoints_steps.
save_checkpoints_steps Save checkpoints every this many steps. Cannot be specified together with save_checkpoints_secs.
keep_checkpoint_max The maximum number of recent checkpoint files to keep. As new files are created, older files are deleted. If None or 0, all checkpoint files are kept. Defaults to 5 (that is, the 5 most recent checkpoint files are kept).
keep_checkpoint_every_n_hours Number of hours between each checkpoint to be saved. The default value of 10,000 hours effectively disables the feature.
log_step_count_steps The frequency, in number of global steps, that the global step/sec will be logged during training.
protocol An optional argument that specifies the protocol used when starting the server. None means the default, grpc.
evaluation_master The master on which to perform evaluation.
model_dir Directory where model parameters, the graph, etc. are saved. If None, the model_dir property in the TF_CONFIG environment variable is used. If both are set, they must have the same value. If both are None, see Estimator about where the model will be saved.
session_config A ConfigProto used to set session parameters, or None. Note: with this argument it is easy to provide settings that break otherwise perfectly good models. Use with care.
session_creation_timeout_secs Max time workers should wait for a session to become available (on initialization or when recovering a session) with MonitoredTrainingSession. Defaults to 7200 seconds, but users may want to set a lower value to detect problems with variable / session (re)-initialization more quickly.
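The model_dir rule described above can be illustrated with a small pure-Python sketch. It is not the library's code: the helper name resolve_model_dir is hypothetical, and the sketch assumes that model_dir is a top-level key of the TF_CONFIG JSON and that a mismatch is reported as a ValueError.

```python
import json
from typing import Optional


def resolve_model_dir(model_dir: Optional[str], tf_config_json: str) -> Optional[str]:
    """Hypothetical helper mirroring the documented model_dir rule.

    Assumptions (not confirmed by the text above): `model_dir` is a
    top-level key of TF_CONFIG, and a mismatch raises ValueError.
    """
    tf_config = json.loads(tf_config_json or "{}")
    env_model_dir = tf_config.get("model_dir")

    # If both are set, they must have the same value.
    if model_dir and env_model_dir and model_dir != env_model_dir:
        raise ValueError(
            "model_dir %r does not match TF_CONFIG model_dir %r"
            % (model_dir, env_model_dir)
        )

    # Prefer the explicit argument, fall back to TF_CONFIG.
    # Returns None when neither is set; the Estimator then decides.
    return model_dir or env_model_dir


from_env = resolve_model_dir(None, json.dumps({"model_dir": "/tmp/model"}))
from_arg = resolve_model_dir("/tmp/model", "")
neither = resolve_model_dir(None, "")
```

Here from_env and from_arg both resolve to the same directory, and neither is None, which corresponds to the case where Estimator chooses the location.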
Attributes
cluster_spec
device_fn Returns the device_fn.

If device_fn is not None, it overrides the default device function used in Estimator. Otherwise the default one is used.

environment
eval_distribute Optional tf.distribute.Strategy for evaluation.
evaluation_master
experimental_max_worker_delay_secs
global_id_in_cluster The global id in the training cluster.

All global ids in the training cluster are assigned from an increasing sequence of consecutive integers. The first id is 0.

Note: Task id (the property field task_id) tracks the index of the node among all nodes with the SAME task type. For example, given the cluster definition as follows:
cluster = {'chief': ['host0:2222'],
           'ps': ['host1:2222', 'host2:2222'],
           'worker': ['host3:2222', 'host4:2222', 'host5:2222']}

Nodes with task type worker can have id 0, 1, 2. Nodes with task type ps can have id 0, 1. So task_id is not unique, but the pair (task_type, task_id) uniquely determines a node in the cluster.

Global id, i.e., this field, tracks the index of the node among ALL nodes in the cluster. It is uniquely assigned. For example, for the cluster spec given above, the global ids are assigned as:

task_type  | task_id  |  global_id
--------------------------------
chief      | 0        |  0
worker     | 0        |  1
worker     | 1        |  2
worker     | 2        |  3
ps         | 0        |  4
ps         | 1        |  5
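The assignment in the table above can be reproduced with a short pure-Python sketch. The helper name assign_global_ids is hypothetical, and the ordering it uses (chief first, then the remaining task types in sorted order, then ps last) is an assumption inferred from the table rather than a documented guarantee.

```python
from typing import Dict, List, Tuple


def assign_global_ids(cluster: Dict[str, List[str]]) -> Dict[Tuple[str, int], int]:
    """Map each (task_type, task_id) pair to a global id.

    Assumed ordering, inferred from the table above: 'chief' first,
    other task types in sorted order, and 'ps' last.
    """
    ordered_types = []
    if "chief" in cluster:
        ordered_types.append("chief")
    ordered_types.extend(t for t in sorted(cluster) if t not in ("chief", "ps"))
    if "ps" in cluster:
        ordered_types.append("ps")

    global_ids = {}
    next_id = 0  # The first global id is 0; ids are consecutive integers.
    for task_type in ordered_types:
        for task_id in range(len(cluster[task_type])):
            global_ids[(task_type, task_id)] = next_id
            next_id += 1
    return global_ids


cluster = {
    "chief": ["host0:2222"],
    "ps": ["host1:2222", "host2:2222"],
    "worker": ["host3:2222", "host4:2222", "host5:2222"],
}
global_ids = assign_global_ids(cluster)
```

Looking up global_ids[("worker", 0)] gives the worker row of the table, while task_id alone (0) would be ambiguous between chief, worker and ps.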
is_chief
keep_checkpoint_every_n_hours
keep_checkpoint_max
log_step_count_steps
master
model_dir
num_ps_replicas
num_worker_replicas
protocol Returns the optional protocol value.
save_checkpoints_secs
save_checkpoints_steps
save_summary_steps
service Returns the platform defined (in TF_CONFIG) service dict.
session_config
session_creation_timeout_secs
task_id
task_type
tf_config
tf_random_seed
train_distribute Optional tf.distribute.Strategy for training.

Methods

get_task_id

Returns the task index from the TF_CONFIG environment variable.

If you have a ClusterConfig instance, you can access its task_id property instead of calling this function and re-parsing the environment variable.

Returns
TF_CONFIG['task']['index']. Defaults to 0.
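As a rough illustration of the lookup described above, the following pure-Python sketch reads TF_CONFIG['task']['index'] and falls back to 0. The function name task_index_from_env and its environ parameter are hypothetical; this is not the library's implementation.

```python
import json
import os


def task_index_from_env(environ=None) -> int:
    """Return TF_CONFIG['task']['index'], or 0 if it is not present.

    `environ` is a hypothetical parameter that lets the caller pass a
    plain dict instead of the real process environment.
    """
    environ = os.environ if environ is None else environ
    tf_config = json.loads(environ.get("TF_CONFIG") or "{}")
    task = tf_config.get("task") or {}
    return int(task.get("index", 0))


example_env = {
    "TF_CONFIG": json.dumps({
        "cluster": {"worker": ["host3:2222", "host4:2222"]},
        "task": {"type": "worker", "index": 1},
    })
}
index_from_config = task_index_from_env(example_env)
index_default = task_index_from_env({})  # TF_CONFIG not set
```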

replace

Returns a new instance of RunConfig replacing specified properties.

Only the properties in the following list are allowed to be replaced:

  • model_dir,
  • tf_random_seed,
  • save_summary_steps,
  • save_checkpoints_steps,
  • save_checkpoints_secs,
  • session_config,
  • keep_checkpoint_max,
  • keep_checkpoint_every_n_hours,
  • log_step_count_steps,
  • train_distribute,
  • device_fn,
  • protocol,
  • eval_distribute,
  • experimental_distribute,
  • experimental_max_worker_delay_secs.

In addition, either save_checkpoints_steps or save_checkpoints_secs can be set, but not both.

Args
**kwargs Keyword-named properties with new values.
Raises
ValueError If any property name in kwargs does not exist or is not allowed to be replaced, or both save_checkpoints_steps and save_checkpoints_secs are set.
Returns
A new instance of RunConfig.
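The validation rules above can be modeled with a small pure-Python sketch that operates on a plain dict rather than a RunConfig. The names REPLACEABLE and replace_properties are hypothetical. The sketch only checks the values passed in kwargs; how the library reconciles a newly set save_checkpoints_steps with an existing save_checkpoints_secs (or the reverse) is not modeled here.

```python
REPLACEABLE = frozenset([
    "model_dir", "tf_random_seed", "save_summary_steps",
    "save_checkpoints_steps", "save_checkpoints_secs", "session_config",
    "keep_checkpoint_max", "keep_checkpoint_every_n_hours",
    "log_step_count_steps", "train_distribute", "device_fn", "protocol",
    "eval_distribute", "experimental_distribute",
    "experimental_max_worker_delay_secs",
])


def replace_properties(current: dict, **kwargs) -> dict:
    """Hypothetical stand-in for the documented replace() rules."""
    for name in kwargs:
        if name not in REPLACEABLE:
            raise ValueError("Replacing %r is not supported." % name)

    # Setting both checkpoint triggers in one call is rejected.
    if (kwargs.get("save_checkpoints_steps") is not None
            and kwargs.get("save_checkpoints_secs") is not None):
        raise ValueError(
            "Set save_checkpoints_steps or save_checkpoints_secs, not both.")

    # Return a new mapping; the original is left untouched.
    updated = dict(current)
    updated.update(kwargs)
    return updated


base = {"model_dir": "/tmp/model", "save_summary_steps": 100}
updated = replace_properties(base, save_summary_steps=50)
```

As with the documented method, the original object (base here) is unchanged and a new one is returned.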

uid

Generates a 'Unique Identifier' based on all internal fields. (experimental)

Callers should use the uid string to check RunConfig instance integrity within a single session, but should not rely on the implementation details, which are subject to change.

Args
whitelist A list of the string names of the properties uid should not include. If None, defaults to _DEFAULT_UID_WHITE_LIST, which includes most properties that users are allowed to change.
Returns
A uid string.

© 2020 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/contrib/learn/RunConfig