tf.nn

Reference: tf.nn - 云+社区 - 腾讯云

Contents

I. Function List

II. Important APIs

1. tf.nn.sparse_softmax_cross_entropy_with_logits

2. tf.nn.softmax

3. tf.compat.v1.nn.dropout

4. tf.compat.v1.nn.sigmoid_cross_entropy_with_logits

5. tf.nn.bias_add

6. tf.nn.atrous_conv2d

7. tf.nn.relu

8. tf.nn.l2_loss

9. tf.nn.max_pool

10. tf.nn.softmax_cross_entropy_with_logits


I. Function List

These functions are mainly used for building neural networks.

  • all_candidate_sampler(): Generate the set of all classes.
  • atrous_conv2d(): Atrous convolution (a.k.a. convolution with holes or dilated convolution).
  • atrous_conv2d_transpose(): The transpose of atrous_conv2d.
  • avg_pool(): Performs the average pooling on the input.
  • avg_pool1d(): Performs the average pooling on the input.
  • avg_pool2d(): Performs the average pooling on the input.
  • avg_pool3d(): Performs the average pooling on the input.
  • avg_pool_v2(): Performs the average pooling on the input.
  • batch_norm_with_global_normalization(): Batch normalization.
  • batch_normalization(): Batch normalization.
  • bias_add(): Adds bias to value.
  • bidirectional_dynamic_rnn(): Creates a dynamic version of a bidirectional recurrent neural network. (deprecated)
  • collapse_repeated(): Merges repeated labels into single labels.
  • compute_accidental_hits(): Computes the position ids in sampled_candidates matching true_classes.
  • conv1d(): Computes a 1-D convolution given 3-D input and filter tensors. (deprecated argument values)
  • conv1d_transpose(): The transpose of conv1d.
  • conv2d(): Computes a 2-D convolution given 4-D input and filter tensors.
  • conv2d_backprop_filter(): Computes the gradients of convolution with respect to the filter.
  • conv2d_backprop_input(): Computes the gradients of convolution with respect to the input.
  • conv2d_transpose(): The transpose of conv2d.
  • conv3d(): Computes a 3-D convolution given 5-D input and filter tensors.
  • conv3d_backprop_filter(): Computes the gradients of 3-D convolution with respect to the filter.
  • conv3d_backprop_filter_v2(): Computes the gradients of 3-D convolution with respect to the filter.
  • conv3d_transpose(): The transpose of conv3d.
  • conv_transpose(): The transpose of convolution.
  • convolution(): Computes sums of N-D convolutions (actually cross-correlation).
  • crelu(): Computes Concatenated ReLU.
  • ctc_beam_search_decoder(): Performs beam search decoding on the logits given in input.
  • ctc_beam_search_decoder_v2(): Performs beam search decoding on the logits given in input.
  • ctc_greedy_decoder(): Performs greedy decoding on the logits given in input (best path).
  • ctc_loss(): Computes the CTC (Connectionist Temporal Classification) loss.
  • ctc_loss_v2(): Computes the CTC (Connectionist Temporal Classification) loss.
  • ctc_unique_labels(): Gets unique labels and indices for batched labels for tf.nn.ctc_loss.
  • depth_to_space(): DepthToSpace for tensors of type T.
  • depthwise_conv2d(): Depthwise 2-D convolution.
  • depthwise_conv2d_backprop_filter(): Computes the gradients of depthwise convolution with respect to the filter.
  • depthwise_conv2d_backprop_input(): Computes the gradients of depthwise convolution with respect to the input.
  • depthwise_conv2d_native(): Computes a 2-D depthwise convolution given 4-D input and filter tensors.
  • depthwise_conv2d_native_backprop_filter(): Computes the gradients of depthwise convolution with respect to the filter.
  • depthwise_conv2d_native_backprop_input(): Computes the gradients of depthwise convolution with respect to the input.
  • dilation2d(): Computes the grayscale dilation of 4-D input and 3-D filter tensors.
  • dropout(): Computes dropout. (deprecated arguments)
  • dynamic_rnn(): Creates a recurrent neural network specified by RNNCell cell. (deprecated)
  • elu(): Computes exponential linear: exp(features) - 1 if < 0, features otherwise.
  • embedding_lookup(): Looks up ids in a list of embedding tensors.
  • embedding_lookup_sparse(): Computes embeddings for the given ids and weights.
  • erosion2d(): Computes the grayscale erosion of 4-D value and 3-D kernel tensors.
  • fixed_unigram_candidate_sampler(): Samples a set of classes using the provided (fixed) base distribution.
  • fractional_avg_pool(): Performs fractional average pooling on the input. (deprecated)
  • fractional_max_pool(): Performs fractional max pooling on the input. (deprecated)
  • fused_batch_norm(): Batch normalization.
  • in_top_k(): Says whether the targets are in the top K predictions.
  • l2_loss(): L2 Loss.
  • l2_normalize(): Normalizes along dimension axis using an L2 norm. (deprecated arguments)
  • leaky_relu(): Computes the Leaky ReLU activation function.
  • learned_unigram_candidate_sampler(): Samples a set of classes from a distribution learned during training.
  • local_response_normalization(): Local Response Normalization.
  • log_poisson_loss(): Computes log Poisson loss given log_input.
  • log_softmax(): Computes log softmax activations. (deprecated arguments)
  • log_uniform_candidate_sampler(): Samples a set of classes using a log-uniform (Zipfian) base distribution.
  • lrn(): Local Response Normalization.
  • max_pool(): Performs the max pooling on the input.
  • max_pool1d(): Performs the max pooling on the input.
  • max_pool2d(): Performs the max pooling on the input.
  • max_pool3d(): Performs the max pooling on the input.
  • max_pool_v2(): Performs the max pooling on the input.
  • max_pool_with_argmax(): Performs max pooling on the input and outputs both max values and indices.
  • moments(): Calculates the mean and variance of x.
  • nce_loss(): Computes and returns the noise-contrastive estimation training loss.
  • normalize_moments(): Calculates the mean and variance based on the sufficient statistics.
  • pool(): Performs an N-D pooling operation.
  • quantized_avg_pool(): Produces the average pool of the input tensor for quantized types.
  • quantized_conv2d(): Computes a 2-D convolution given quantized 4-D input and filter tensors.
  • quantized_max_pool(): Produces the max pool of the input tensor for quantized types.
  • quantized_relu_x(): Computes Quantized Rectified Linear X: min(max(features, 0), max_value).
  • raw_rnn(): Creates an RNN specified by RNNCell cell and loop function loop_fn.
  • relu(): Computes rectified linear: max(features, 0).
  • relu6(): Computes Rectified Linear 6: min(max(features, 0), 6).
  • relu_layer(): Computes Relu(x * weight + biases).
  • safe_embedding_lookup_sparse(): Looks up embedding results, accounting for invalid IDs and empty features.
  • sampled_softmax_loss(): Computes and returns the sampled softmax training loss.
  • selu(): Computes scaled exponential linear: scale * alpha * (exp(features) - 1) if < 0, scale * features otherwise.
  • separable_conv2d(): 2-D convolution with separable filters.
  • sigmoid(): Computes sigmoid of x element-wise.
  • sigmoid_cross_entropy_with_logits(): Computes sigmoid cross entropy given logits.
  • softmax(): Computes softmax activations. (deprecated arguments)
  • softmax_cross_entropy_with_logits(): Computes softmax cross entropy between logits and labels. (deprecated)
  • softmax_cross_entropy_with_logits_v2(): Computes softmax cross entropy between logits and labels. (deprecated arguments)
  • softplus(): Computes softplus: log(exp(features) + 1).
  • softsign(): Computes softsign: features / (abs(features) + 1).
  • space_to_batch(): SpaceToBatch for 4-D tensors of type T.
  • space_to_depth(): SpaceToDepth for tensors of type T.
  • sparse_softmax_cross_entropy_with_logits(): Computes sparse softmax cross entropy between logits and labels.
  • static_bidirectional_rnn(): Creates a bidirectional recurrent neural network. (deprecated)
  • static_rnn(): Creates a recurrent neural network specified by RNNCell cell. (deprecated)
  • static_state_saving_rnn(): RNN that accepts a state saver for time-truncated RNN calculation. (deprecated)
  • sufficient_statistics(): Calculates the sufficient statistics for the mean and variance of x.
  • tanh(): Computes hyperbolic tangent of x element-wise.
  • top_k(): Finds values and indices of the k largest entries for the last dimension.
  • uniform_candidate_sampler(): Samples a set of classes using a uniform base distribution.
  • weighted_cross_entropy_with_logits(): Computes a weighted cross entropy. (deprecated arguments)
  • weighted_moments(): Returns the frequency-weighted mean and variance of x.
  • with_space_to_batch(): Performs op on the space-to-batch representation of input.
  • xw_plus_b(): Computes matmul(x, weights) + biases.
  • zero_fraction(): Returns the fraction of zeros in value.

II. Important APIs

1. tf.nn.sparse_softmax_cross_entropy_with_logits

Computes sparse softmax cross entropy between logits and labels.

tf.nn.sparse_softmax_cross_entropy_with_logits(
    _sentinel=None,
    labels=None,
    logits=None,
    name=None
)

Measures the probability error in discrete classification tasks in which the classes are mutually exclusive (each entry is in exactly one class). For example, each CIFAR-10 image is labeled with one and only one label: an image can be a dog or a truck, but not both.

NOTE: For this operation, the probability of a given label is considered exclusive. That is, soft classes are not allowed, and the labels vector must provide a single specific index for the true class for each row of logits (each minibatch entry). For soft softmax classification with a probability distribution for each entry, see softmax_cross_entropy_with_logits_v2.

WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.

A common use case is to have logits of shape [batch_size, num_classes] and labels of shape [batch_size], but higher dimensions are supported, in which case the dim-th dimension is assumed to be of size num_classes. logits must have the dtype of float16, float32, or float64, and labels must have the dtype of int32 or int64. Note that to avoid confusion, it is required to pass only named arguments to this function.

Args:

  • _sentinel: Used to prevent positional parameters. Internal, do not use.
  • labels: Tensor of shape [d_0, d_1, ..., d_{r-1}] (where r is the rank of labels and result) and dtype int32 or int64. Each entry in labels must be an index in [0, num_classes). Other values will raise an exception when this op is run on CPU, and return NaN for the corresponding loss and gradient rows on GPU.
  • logits: Per-label activations (typically a linear output) of shape [d_0, d_1, ..., d_{r-1}, num_classes] and dtype float16, float32, or float64. These activation energies are interpreted as unnormalized log probabilities.
  • name: A name for the operation (optional).

Returns:

  • A Tensor of the same shape as labels and of the same type as logits, with the softmax cross entropy loss.

Raises:

  • ValueError: If logits are scalars (need to have rank >= 1) or if the rank of the labels is not equal to the rank of the logits minus one.
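
A minimal usage sketch (assuming TensorFlow 2.x with eager execution; the tensor values below are made up for illustration):

    import tensorflow as tf

    # Unscaled logits of shape [batch_size, num_classes] and integer class ids of shape [batch_size].
    logits = tf.constant([[2.0, 0.5, -1.0],
                          [0.1, 1.5, 2.2]])
    labels = tf.constant([0, 2])  # each entry is an index in [0, num_classes)

    # One cross-entropy value per minibatch entry; shape [batch_size].
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)
    print(loss)
    print(tf.reduce_mean(loss))  # typical scalar training loss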

2. tf.nn.softmax

Computes softmax activations.

Aliases:

  • tf.compat.v2.math.softmax
  • tf.compat.v2.nn.softmax
  • tf.math.softmax

tf.nn.softmax(
    logits,
    axis=None,
    name=None
)

Used in the tutorials:

  • Custom training: walkthrough
  • Image captioning with visual attention
  • Neural machine translation with attention
  • Transformer model for language understanding

This function performs the equivalent of

softmax = tf.exp(logits) / tf.reduce_sum(tf.exp(logits), axis)

Args:

  • logits: A non-empty Tensor. Must be one of the following types: half, float32, float64.
  • axis: The dimension softmax would be performed on. The default is -1 which indicates the last dimension.
  • name: A name for the operation (optional).

Returns:

A Tensor. Has the same type and shape as logits.

Raises:

  • InvalidArgumentError: if logits is empty or axis is beyond the last dimension of logits.
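
A small sketch of the equivalence above (assuming TensorFlow 2.x with eager execution). Note that keepdims=True is needed in the manual version so the per-row sums broadcast back against logits:

    import tensorflow as tf

    logits = tf.constant([[1.0, 2.0, 3.0],
                          [1.0, 1.0, 1.0]])

    probs = tf.nn.softmax(logits, axis=-1)

    # Manual equivalent of the formula above (less numerically stable for large logits).
    manual = tf.exp(logits) / tf.reduce_sum(tf.exp(logits), axis=-1, keepdims=True)

    print(probs)
    print(tf.reduce_sum(probs, axis=-1))  # each row sums to 1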

3. tf.compat.v1.nn.dropout

Computes dropout. (deprecated arguments)

tf.compat.v1.nn.dropout(
    x,
    keep_prob=None,
    noise_shape=None,
    seed=None,
    name=None,
    rate=None
)

Warning: SOME ARGUMENTS ARE DEPRECATED: (keep_prob). They will be removed in a future version. Instructions for updating: Please use rate instead of keep_prob. Rate should be set to rate = 1 - keep_prob.

For each element of x, with probability rate, outputs 0, and otherwise scales up the input by 1 / (1-rate). The scaling is such that the expected sum is unchanged.

By default, each element is kept or dropped independently. If noise_shape is specified, it must be broadcastable to the shape of x, and only dimensions with noise_shape[i] == shape(x)[i] will make independent decisions. For example, if shape(x) = [k, l, m, n] and noise_shape = [k, 1, 1, n], each batch and channel component will be kept independently and each row and column will be kept or not kept together.

Args:

  • x: A floating point tensor.
  • keep_prob: (deprecated) A deprecated alias for (1-rate).
  • noise_shape: A 1-D Tensor of type int32, representing the shape for randomly generated keep/drop flags.
  • seed: A Python integer. Used to create random seeds. See tf.compat.v1.set_random_seed for behavior.
  • name: A name for this operation (optional).
  • rate: A scalar Tensor with the same type as x. The probability that each element of x is discarded.

Returns:

A Tensor of the same shape of x.

Raises:

  • ValueError: If rate is not in [0, 1) or if x is not a floating point tensor.
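
A minimal sketch of rate and noise_shape (assuming TensorFlow 2.x, where tf.compat.v1.nn.dropout runs eagerly; the shapes are made up):

    import tensorflow as tf

    x = tf.ones([2, 3, 4])  # e.g. [batch, rows, channels]

    # Drop each element independently with probability 0.5; kept elements are scaled by 1 / (1 - 0.5).
    y = tf.compat.v1.nn.dropout(x, rate=0.5)

    # Share one keep/drop decision across the row dimension: noise_shape broadcasts against shape(x).
    y_rows = tf.compat.v1.nn.dropout(x, rate=0.5, noise_shape=[2, 1, 4])

    print(y)
    print(y_rows)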

4. tf.compat.v1.nn.sigmoid_cross_entropy_with_logits

Computes sigmoid cross entropy given logits.

tf.compat.v1.nn.sigmoid_cross_entropy_with_logits(
    _sentinel=None,
    labels=None,
    logits=None,
    name=None
)

Measures the probability error in discrete classification tasks in which each class is independent and not mutually exclusive. For instance, one could perform multilabel classification where a picture can contain both an elephant and a dog at the same time.

For brevity, let x = logits, z = labels. The logistic loss is

  z * -log(sigmoid(x)) + (1 - z) * -log(1 - sigmoid(x))
= z * -log(1 / (1 + exp(-x))) + (1 - z) * -log(exp(-x) / (1 + exp(-x)))
= z * log(1 + exp(-x)) + (1 - z) * (-log(exp(-x)) + log(1 + exp(-x)))
= z * log(1 + exp(-x)) + (1 - z) * (x + log(1 + exp(-x)))
= (1 - z) * x + log(1 + exp(-x))
= x - x * z + log(1 + exp(-x))

For x < 0, to avoid overflow in exp(-x), we reformulate the above

  x - x * z + log(1 + exp(-x))
= log(exp(x)) - x * z + log(1 + exp(-x))
= - x * z + log(1 + exp(x))

Hence, to ensure stability and avoid overflow, the implementation uses this equivalent formulation

max(x, 0) - x * z + log(1 + exp(-abs(x)))

logits and labels must have the same type and shape.

Args:

  • _sentinel: Used to prevent positional parameters. Internal, do not use.
  • labels: A Tensor of the same type and shape as logits.
  • logits: A Tensor of type float32 or float64.
  • name: A name for the operation (optional).

Returns:

A Tensor of the same shape as logits with the componentwise logistic losses.

Raises:

  • ValueError: If logits and labels do not have the same shape.
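
A short sketch checking the stable formulation above against the op itself (assuming TensorFlow 2.x with eager execution; the values are made up):

    import tensorflow as tf

    logits = tf.constant([-3.0, -0.5, 0.0, 2.0])
    labels = tf.constant([ 0.0,  1.0, 1.0, 1.0])

    loss = tf.compat.v1.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)

    # max(x, 0) - x * z + log(1 + exp(-abs(x))), the stable formulation quoted above.
    x, z = logits, labels
    manual = tf.maximum(x, 0.0) - x * z + tf.math.log(1.0 + tf.exp(-tf.abs(x)))

    print(loss)
    print(manual)  # matches element-wise up to floating-point error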

5. tf.nn.bias_add

Adds bias to value.

Aliases:

  • tf.compat.v1.nn.bias_add
  • tf.compat.v2.nn.bias_add
tf.nn.bias_add(
    value,
    bias,
    data_format=None,
    name=None
)

This is (mostly) a special case of tf.add where bias is restricted to 1-D. Broadcasting is supported, so value may have any number of dimensions. Unlike tf.add, the type of bias is allowed to differ from value in the case where both types are quantized.

Args:

  • value: A Tensor with type float, double, int64, int32, uint8, int16, int8, complex64, or complex128.
  • bias: A 1-D Tensor with size matching the channel dimension of value. Must be the same type as value unless value is a quantized type, in which case a different quantized type may be used.
  • data_format: A string. 'N...C' and 'NC...' are supported.
  • name: A name for the operation (optional).

Returns:

A Tensor with the same type as value.
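
A minimal sketch (assuming TensorFlow 2.x with eager execution; the shapes are made up):

    import tensorflow as tf

    value = tf.zeros([1, 2, 2, 3])          # NHWC: bias is added along the channel (last) axis
    bias = tf.constant([0.1, 0.2, 0.3])     # one entry per channel

    out = tf.nn.bias_add(value, bias)       # default data_format 'N...C'
    print(out.shape)                        # (1, 2, 2, 3)
    print(out[0, 0, 0])                     # [0.1, 0.2, 0.3]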

6. tf.nn.atrous_conv2d

Atrous convolution (a.k.a. convolution with holes or dilated convolution).

Aliases:

  • tf.compat.v1.nn.atrous_conv2d
  • tf.compat.v2.nn.atrous_conv2d
tf.nn.atrous_conv2d(
    value,
    filters,
    rate,
    padding,
    name=None
)

This function is a simpler wrapper around the more general tf.nn.convolution, and exists only for backwards compatibility. You can use tf.nn.convolution to perform 1-D, 2-D, or 3-D atrous convolution.

Computes a 2-D atrous convolution, also known as convolution with holes or dilated convolution, given 4-D value and filters tensors. If the rate parameter is equal to one, it performs regular 2-D convolution. If the rate parameter is greater than one, it performs convolution with holes, sampling the input values every rate pixels in the height and width dimensions. This is equivalent to convolving the input with a set of upsampled filters, produced by inserting rate - 1 zeros between two consecutive values of the filters along the height and width dimensions, hence the name atrous convolution or convolution with holes (the French word trous means holes in English).

More specifically:

output[batch, height, width, out_channel] = sum_{dheight, dwidth, in_channel} (
    filters[dheight, dwidth, in_channel, out_channel] *
    value[batch, height + rate*dheight, width + rate*dwidth, in_channel]
)

Atrous convolution allows us to explicitly control how densely to compute feature responses in fully convolutional networks. Used in conjunction with bilinear interpolation, it offers an alternative to conv2d_transpose in dense prediction tasks such as semantic image segmentation, optical flow computation, or depth estimation. It also allows us to effectively enlarge the field of view of filters without increasing the number of parameters or the amount of computation.

For a description of atrous convolution and how it can be used for dense feature extraction, please see: Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. The same operation is investigated further in Multi-Scale Context Aggregation by Dilated Convolutions. Previous works that effectively use atrous convolution in different ways are, among others, OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks and Fast Image Scanning with Deep Max-Pooling Convolutional Neural Networks. Atrous convolution is also closely related to the so-called noble identities in multi-rate signal processing.

There are many different ways to implement atrous convolution (see the refs above). The implementation here reduces

    atrous_conv2d(value, filters, rate, padding=padding)

to the following three operations:

    paddings = ...
    net = space_to_batch(value, paddings, block_size=rate)
    net = conv2d(net, filters, strides=[1, 1, 1, 1], padding="VALID")
    crops = ...
    net = batch_to_space(net, crops, block_size=rate)

Advanced usage. Note the following optimization: A sequence of atrous_conv2d operations with identical rate parameters, 'SAME' padding, and filters with odd heights/widths:

    net = atrous_conv2d(net, filters1, rate, padding="SAME")
    net = atrous_conv2d(net, filters2, rate, padding="SAME")
    ...
    net = atrous_conv2d(net, filtersK, rate, padding="SAME")

can be equivalently performed cheaper in terms of computation and memory as:

    pad = ...  # padding so that the input dims are multiples of rate
    net = space_to_batch(net, paddings=pad, block_size=rate)
    net = conv2d(net, filters1, strides=[1, 1, 1, 1], padding="SAME")
    net = conv2d(net, filters2, strides=[1, 1, 1, 1], padding="SAME")
    ...
    net = conv2d(net, filtersK, strides=[1, 1, 1, 1], padding="SAME")
    net = batch_to_space(net, crops=pad, block_size=rate)

because a pair of consecutive space_to_batch and batch_to_space ops with the same block_size cancel out when their respective paddings and crops inputs are identical.

Args:

  • value: A 4-D Tensor of type float. It needs to be in the default "NHWC" format. Its shape is [batch, in_height, in_width, in_channels].
  • filters: A 4-D Tensor with the same type as value and shape [filter_height, filter_width, in_channels, out_channels]. filters' in_channels dimension must match that of value. Atrous convolution is equivalent to standard convolution with upsampled filters with effective height filter_height + (filter_height - 1) * (rate - 1) and effective width filter_width + (filter_width - 1) * (rate - 1), produced by inserting rate - 1 zeros along consecutive elements across the filters' spatial dimensions.
  • rate: A positive int32. The stride with which we sample input values across the height and width dimensions. Equivalently, the rate by which we upsample the filter values by inserting zeros across the height and width dimensions. In the literature, the same parameter is sometimes called input stride or dilation.
  • padding: A string, either 'VALID' or 'SAME'. The padding algorithm.
  • name: Optional name for the returned tensor.

Returns:

A Tensor with the same type as value. Output shape with 'VALID' padding is:

[batch, height - 2 * (filter_width - 1), width - 2 * (filter_height - 1), out_channels].

Output shape with 'SAME' padding is:

[batch, height, width, out_channels].

Raises:

  • ValueError: If input/output depth does not match filters' shape, or if padding is other than 'VALID' or 'SAME'.
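
A minimal usage sketch (assuming TensorFlow 2.x with eager execution; the shapes are arbitrary). With rate=1 the op reduces to an ordinary unit-stride conv2d, as noted above:

    import tensorflow as tf

    value = tf.random.normal([1, 8, 8, 3])       # NHWC input
    filters = tf.random.normal([3, 3, 3, 16])    # [filter_height, filter_width, in_channels, out_channels]

    # Dilated convolution sampling the input every 2 pixels in height and width.
    atrous = tf.nn.atrous_conv2d(value, filters, rate=2, padding="SAME")

    # rate=1 is plain 2-D convolution with unit strides.
    plain = tf.nn.atrous_conv2d(value, filters, rate=1, padding="SAME")
    same = tf.nn.conv2d(value, filters, strides=[1, 1, 1, 1], padding="SAME")

    print(atrous.shape)                                 # (1, 8, 8, 16) with 'SAME' padding
    print(tf.reduce_max(tf.abs(plain - same)).numpy())  # ~0.0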

7. tf.nn.relu

Defined in generated file: python/ops/gen_nn_ops.py

Computes rectified linear: max(features, 0).

Aliases:

  • tf.compat.v1.nn.relu
  • tf.compat.v2.nn.relu
tf.nn.relu(
    features,
    name=None
)

Used in the guide:

  • Better performance with tf.function and AutoGraph
  • Writing custom layers and models with Keras

Used in the tutorials:

  • Custom layers
  • Image captioning with visual attention

Args:

  • features: A Tensor. Must be one of the following types: float32, float64, int32, uint8, int16, int8, int64, bfloat16, uint16, half, uint32, uint64, qint8.
  • name: A name for the operation (optional).

Returns:

A Tensor. Has the same type as features.
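
A tiny sketch (assuming TensorFlow 2.x with eager execution):

    import tensorflow as tf

    x = tf.constant([-2.0, -0.5, 0.0, 1.5])
    print(tf.nn.relu(x))  # [0., 0., 0., 1.5]; negatives are clamped to zero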

8. tf.nn.l2_loss

Defined in generated file: python/ops/gen_nn_ops.py

L2 Loss.

Aliases:

  • tf.compat.v1.nn.l2_loss
  • tf.compat.v2.nn.l2_loss
tf.nn.l2_loss(
    t,
    name=None
)

Computes half the L2 norm of a tensor without the sqrt:

output = sum(t ** 2) / 2

Args:

  • t: A Tensor. Must be one of the following types: half, bfloat16, float32, float64. Typically 2-D, but may have any dimensions.
  • name: A name for the operation (optional).

Returns:

A Tensor. Has the same type as t.
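
A quick check of the formula above (assuming TensorFlow 2.x with eager execution):

    import tensorflow as tf

    t = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    print(tf.nn.l2_loss(t))           # 15.0
    print(tf.reduce_sum(t ** 2) / 2)  # same value, per output = sum(t ** 2) / 2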

9. tf.nn.max_pool

Performs the max pooling on the input.

Aliases:

  • tf.compat.v1.nn.max_pool_v2
  • tf.compat.v2.nn.max_pool
tf.nn.max_pool(
    input,
    ksize,
    strides,
    padding,
    data_format=None,
    name=None
)

Args:

  • input: Tensor of rank N+2, of shape [batch_size] + input_spatial_shape + [num_channels] if data_format does not start with "NC" (default), or [batch_size, num_channels] + input_spatial_shape if data_format starts with "NC". Pooling happens over the spatial dimensions only.
  • ksize: An int or list of ints that has length 1, N or N+2. The size of the window for each dimension of the input tensor.
  • strides: An int or list of ints that has length 1, N or N+2. The stride of the sliding window for each dimension of the input tensor.
  • padding: A string, either 'VALID' or 'SAME'. The padding algorithm. See the "returns" section of tf.nn.convolution for details.
  • data_format: A string. Specifies the channel dimension. For N=1 it can be either "NWC" (default) or "NCW", for N=2 it can be either "NHWC" (default) or "NCHW" and for N=3 either "NDHWC" (default) or "NCDHW".
  • name: Optional name for the operation.

Returns:

  • A Tensor of format specified by data_format. The max pooled output tensor.
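
A minimal sketch (assuming TensorFlow 2.x with eager execution; the input values are made up):

    import tensorflow as tf

    x = tf.reshape(tf.range(16, dtype=tf.float32), [1, 4, 4, 1])  # NHWC

    # 2x2 window, stride 2, 'VALID' padding: spatial dimensions are halved.
    y = tf.nn.max_pool(x, ksize=2, strides=2, padding="VALID")
    print(y.shape)        # (1, 2, 2, 1)
    print(tf.squeeze(y))  # [[ 5.,  7.], [13., 15.]]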

10. tf.nn.softmax_cross_entropy_with_logits

Computes softmax cross entropy between logits and labels.

Aliases:

  • tf.compat.v2.nn.softmax_cross_entropy_with_logits
tf.nn.softmax_cross_entropy_with_logits(
    labels,
    logits,
    axis=-1,
    name=None
)

Used in the guide:

  • Distributed training with TensorFlow

Measures the probability error in discrete classification tasks in which the classes are mutually exclusive (each entry is in exactly one class). For example, each CIFAR-10 image is labeled with one and only one label: an image can be a dog or a truck, but not both.

NOTE: While the classes are mutually exclusive, their probabilities need not be. All that is required is that each row of labels is a valid probability distribution. If they are not, the computation of the gradient will be incorrect.

If using exclusive labels (wherein one and only one class is true at a time), see sparse_softmax_cross_entropy_with_logits.

WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.

A common use case is to have logits and labels of shape [batch_size, num_classes], but higher dimensions are supported, with the axis argument specifying the class dimension.

logits and labels must have the same dtype (either float16, float32, or float64).

Backpropagation will happen into both logits and labels. To disallow backpropagation into labels, pass label tensors through tf.stop_gradient before feeding it to this function.

Note that to avoid confusion, it is required to pass only named arguments to this function.

Args:

  • labels: Each vector along the class dimension should hold a valid probability distribution e.g. for the case in which labels are of shape [batch_size, num_classes], each row of labels[i] must be a valid probability distribution.
  • logits: Per-label activations, typically a linear output. These activation energies are interpreted as unnormalized log probabilities.
  • axis: The class dimension. Defaulted to -1 which is the last dimension.
  • name: A name for the operation (optional).

Returns:

A Tensor that contains the softmax cross entropy loss. Its type is the same as logits and its shape is the same as labels except that it does not have the last dimension of labels.
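
A minimal usage sketch (assuming TensorFlow 2.x with eager execution; the values are made up). Each row of labels is a full distribution over the classes, so soft labels are allowed:

    import tensorflow as tf

    logits = tf.constant([[2.0, 0.5, -1.0],
                          [0.1, 1.5, 2.2]])
    labels = tf.constant([[1.0, 0.0, 0.0],
                          [0.0, 0.3, 0.7]])

    loss = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
    print(loss)                  # shape [batch_size]: one loss per row
    print(tf.reduce_mean(loss))  # typical scalar training loss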