[Caffe]: HDF5Data Layer

Date: 2023-07-18, admin, Internet


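Since the successive releases of TensorFlow, PyTorch and Caffe2, Caffe has gradually lost both its popularity and its advantages.
Recently I needed to run some comparison experiments and used the HDF5Data layer for them.
I ran into a few problems along the way, so I am writing this post to record them.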

HDF5Data Layer

// Message that stores parameters used by HDF5DataLayer
message HDF5DataParameter {
  // Specify the data source.
  optional string source = 1;
  // Specify the batch size.
  optional uint32 batch_size = 2;

  // Specify whether to shuffle the data.
  // If shuffle == true, the ordering of the HDF5 files is shuffled,
  // and the ordering of data within any given HDF5 file is shuffled,
  // but data between different files are not interleaved; all of a file's
  // data are output (in a random order) before moving onto another file.
  optional bool shuffle = 3 [default = false];
}

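The HDF5Data layer is the interface Caffe provides for flexible control over input data, and its parameter definition (from caffe.proto) is shown above. It takes only three parameters: source, the path of a text file listing the HDF5 files to read; batch_size, the number of samples in each batch; and shuffle, whether to shuffle the order of the training data.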
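To make the shuffle semantics in the proto comment concrete, here is a small pure-Python model of the order in which one epoch's samples would be read. It only illustrates the documented behavior and is not Caffe's C++ implementation; the function name hdf5_epoch_order and its arguments are hypothetical.

```python
import random


def hdf5_epoch_order(file_sizes, shuffle, seed=0):
    """Return (file_index, row_index) pairs in the order one epoch reads them.

    Illustrative model of the ordering documented in HDF5DataParameter:
    files are visited one at a time and never interleaved; with shuffle=True
    both the file order and the row order inside each file are permuted.
    file_sizes[i] is the number of samples stored in file i (hypothetical input).
    """
    rng = random.Random(seed)
    file_order = list(range(len(file_sizes)))
    if shuffle:
        rng.shuffle(file_order)  # permute which file is read first, second, ...
    order = []
    for f in file_order:
        rows = list(range(file_sizes[f]))
        if shuffle:
            rng.shuffle(rows)  # permute rows only within this one file
        order.extend((f, r) for r in rows)  # all of file f before the next file
    return order


print(hdf5_epoch_order([3, 2], shuffle=False))
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1)]
```

As I understand it, Caffe then fills each batch of batch_size samples from this stream in sequence, moving to the next file (and back to the first file at the end of the list) when the current one runs out, so a single batch can straddle a file boundary.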

Generating HDF5 files

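To use this layer, the data must first be stored in HDF5 format, which is easy to do in Python.
I recommend the h5py library: writing a NumPy array to an HDF5 file takes only three lines.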

import h5py
import numpy as np

data = np.random.randn(128, 1)
with h5py.File('test.h5', 'w') as fh:
    fh.create_dataset('data', data=data)
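The three-line snippet writes only a 'data' dataset. As far as I know, the HDF5Data layer looks up one dataset per top blob, by the top's name, and all of those datasets must share the same size along the first (sample) dimension, so a layer with tops data and label needs both datasets in every file. Below is a slightly fuller sketch. The file name train_0.h5, the helper write_hdf5 and the toy shapes are hypothetical, and casting everything to float32 is a conservative assumption on my part (I believe older Caffe builds only accepted floating-point datasets).

```python
import h5py
import numpy as np


def write_hdf5(path, data, label):
    """Write one HDF5 file intended for Caffe's HDF5Data layer.

    Dataset names match the layer's top names ("data" and "label" here),
    and both datasets share the same first dimension (number of samples).
    """
    data = np.asarray(data, dtype=np.float32)    # conservative: store as float32
    label = np.asarray(label, dtype=np.float32)
    if data.shape[0] != label.shape[0]:
        raise ValueError("data and label must have the same number of samples")
    with h5py.File(path, 'w') as fh:
        fh.create_dataset('data', data=data)
        fh.create_dataset('label', data=label)


# Hypothetical toy data: 128 samples of 3x32x32 "images" (N, C, H, W)
# and one integer class label per sample.
rng = np.random.default_rng(0)
images = rng.standard_normal((128, 3, 32, 32))
labels = rng.integers(0, 10, size=(128, 1))
write_hdf5('train_0.h5', images, labels)
```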

Using the HDF5Data layer

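layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  hdf5_data_param {
    source: "train.txt"  # text file listing the .h5 file paths; may list several, one per line
    batch_size: 256
    shuffle: true        # shuffle the file order and the sample order within each file (files are not interleaved)
  }
}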
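The source file is plain text with one HDF5 path per line. A minimal way to generate it is sketched below, assuming the .h5 files follow a hypothetical train_*.h5 naming scheme; the helper name write_source_list is also made up for illustration. Writing absolute paths avoids depending on the directory Caffe is launched from, and since, as far as I know, Caffe reads this list token by token, the helper rejects paths containing whitespace.

```python
import glob
import os


def write_source_list(h5_paths, list_path='train.txt'):
    """Write the file named by hdf5_data_param.source: one .h5 path per line."""
    with open(list_path, 'w') as fh:
        for p in h5_paths:
            full = os.path.abspath(p)  # absolute path, independent of the working directory
            if any(ch.isspace() for ch in full):
                raise ValueError("path contains whitespace: %r" % full)
            fh.write(full + '\n')


# Collect every hypothetical train_*.h5 file in the current directory.
write_source_list(sorted(glob.glob('train_*.h5')))
```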

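As the parameter definition shows, the HDF5Data layer does not support data transformations such as mean subtraction or random cropping, so these have to be applied when the HDF5 files are created.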
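Here is one way to do that preprocessing in NumPy before writing the file. It is a sketch under my own assumptions: images are stored as float arrays of shape (N, C, H, W), the mean is per channel (similar in spirit to Caffe's mean_value), and the function name preprocess is hypothetical. Because the crop is chosen once at write time, the augmentation is static rather than re-drawn every epoch; writing several cropped copies of the data is one way to recover some variety.

```python
import numpy as np


def preprocess(images, crop, rng, mean=None):
    """Mean-subtract and randomly crop NCHW images once, before saving to HDF5.

    images: array of shape (N, C, H, W); crop: output height and width.
    mean:   per-channel mean of shape (C,); computed from `images` if omitted
            (for validation data, pass the training mean instead).
    Returns (cropped_images, mean).
    """
    images = np.asarray(images, dtype=np.float32)
    if mean is None:
        mean = images.mean(axis=(0, 2, 3))           # one value per channel
    centered = images - np.asarray(mean).reshape(1, -1, 1, 1)
    n, c, h, w = centered.shape
    out = np.empty((n, c, crop, crop), dtype=np.float32)
    for i in range(n):
        # Each sample gets its own random offset, but it is fixed once written.
        y = rng.integers(0, h - crop + 1)
        x = rng.integers(0, w - crop + 1)
        out[i] = centered[i, :, y:y + crop, x:x + crop]
    return out, mean


rng = np.random.default_rng(0)
raw = rng.standard_normal((4, 3, 36, 36))           # hypothetical 36x36 inputs
cropped, channel_mean = preprocess(raw, 32, rng)
print(cropped.shape)  # (4, 3, 32, 32)
```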

All in all, the HDF5Data layer is quite handy.

Written on 2018.03.16