Pooling Models¶
TODO…
Bilinear pooling¶
bilinearpooling.py provides a few convenience functions for creating
symmetric or asymmetric B-CNN models in Keras with bilinear pooling, as
proposed in Bilinear CNNs for Fine-grained Visual
Recognition
(ICCV, 2015).
bilinearpooling.pooling:
- Average pooling of local feature vector outer products in TensorFlow
- Includes element-wise signed square root and L2 normalization
- If using combine, you won’t need to reference this explicitly
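The pooling steps listed above can be sketched in NumPy. This is an illustrative re-implementation, not the TensorFlow code in bilinearpooling.py: the function name bilinear_pool, the eps guard, and the choice to L2-normalize over each sample's flattened cA*cB descriptor are assumptions made for this example.

```python
import numpy as np

def bilinear_pool(fa, fb, eps=1e-12):
    """Average-pool outer products of local features, then normalize.

    fa: array of shape (N, H, W, cA); fb: array of shape (N, H, W, cB).
    Returns an array of shape (N, cA, cB).
    """
    n, h, w, _ = fa.shape
    # Sum of outer products over all H*W locations, divided by H*W.
    pooled = np.einsum("nhwa,nhwb->nab", fa, fb) / (h * w)
    # Element-wise signed square root.
    pooled = np.sign(pooled) * np.sqrt(np.abs(pooled))
    # L2-normalize each sample's flattened (cA*cB) descriptor (assumed convention).
    norms = np.sqrt((pooled ** 2).sum(axis=(1, 2), keepdims=True))
    return pooled / np.maximum(norms, eps)

rng = np.random.default_rng(0)
fa = rng.standard_normal((2, 7, 7, 8))   # stand-in for fA.output
fb = rng.standard_normal((2, 7, 7, 5))   # stand-in for fB.output
out = bilinear_pool(fa, fb)
print(out.shape)  # (2, 8, 5)
```

Passing the same array as both fa and fb gives the symmetric case; two different feature maps give the asymmetric one.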
bilinearpooling.combine:
- Takes two Keras models fA and fB with output shapes (N, H, W, cA), (N, H, W, cB)
- Maps [fA.output, fB.output] to shape (N, cA, cB) with bilinearpooling.pooling
- Flattens, connects to softmax output using a specifiable number of Dense layers
- Returns the resulting keras.models.Model instance
Usage Notes¶
- Be careful with reuse of a single model for fA and fB (e.g., asymmetry via different output layers). Weights will be shared if you use the same instantiation of the original model to generate both models.
- If the dimensionality of local feature vectors is 512, and there are
N classes, the fully-connected classification layer will be very
large (512*512*N = 262,144*N weights). With random weight
initialization, it seems pretty difficult to train a layer of this size
for moderate to large N, so I’m looking at writing an initializer
that uses logistic regression, something which is not mentioned in the
paper, but which is present in the authors’ MATLAB release.
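To make the size estimate concrete, the short calculation below evaluates 512*512*N for a few class counts. The helper name bilinear_classifier_weights and the class counts are chosen only for illustration, and bias terms are ignored.

```python
def bilinear_classifier_weights(c_a, c_b, num_classes):
    """Weights (biases excluded) in a dense layer fed by a flattened (c_a, c_b) descriptor."""
    return c_a * c_b * num_classes

for n in (10, 200, 1000):
    print(n, f"{bilinear_classifier_weights(512, 512, n):,}")
# 10 2,621,440
# 200 52,428,800
# 1000 262,144,000
```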
KernelPooling Layer¶
Implementation of Kernel Pooling for Convolutional Neural
Networks
[CVPR, 2017]. The layer uses the Count Sketch projection to compute a
p-order Taylor series kernel with learnable composition. The
composition weights alpha are initialized to approximate a Gaussian
RBF kernel. The kernel is computed over all local feature vectors
(h_i, w_j) in the input volume and then average pooled.
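As a rough illustration of the Count Sketch step, the NumPy sketch below builds the order-i feature for a single local feature vector by multiplying the FFTs of i independent Count Sketches. It is not the layer's code: all function names are hypothetical, and the hash and sign vectors are drawn at random here purely for demonstration.

```python
import numpy as np

def make_count_sketch(c, d, rng):
    """Draw a random hash h: [0, c) -> [0, d) and signs s in {-1, +1}."""
    return rng.integers(0, d, size=c), rng.choice([-1.0, 1.0], size=c)

def count_sketch(x, h, s, d):
    """Project x (length c) to length d: out[h[k]] += s[k] * x[k]."""
    out = np.zeros(d)
    np.add.at(out, h, s * x)
    return out

def tensor_sketch(x, sketches, d):
    """Sketch the order-i term from i independent Count Sketches of x.

    Multiplying the FFTs element-wise is a circular convolution of the
    individual sketches, so inner products of the result approximate
    <x, y>**i in expectation.
    """
    prod = np.ones(d, dtype=complex)
    for h, s in sketches:
        prod *= np.fft.fft(count_sketch(x, h, s, d))
    return np.real(np.fft.ifft(prod))

rng = np.random.default_rng(0)
c, d, order = 16, 64, 2
sketches = [make_count_sketch(c, d, rng) for _ in range(order)]
x = rng.standard_normal(c)        # one local feature vector
phi = tensor_sketch(x, sketches, d)
print(phi.shape)  # (64,)
```

For a full feature map, the same projection would be applied at every spatial location, scaled by the corresponding composition weight, and averaged over locations.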
Construction parameters include p (order of the kernel
approximation) and d_i (dimensionality for each order i>=2). Output
has shape (batches, 1+C+(p-1)*d_i), where C is the number of
input channels.
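The output width can be checked with a one-line calculation. The helper name and the example values (C=512, p=3, d_i=4096) are hypothetical, and reading the terms as one constant for order 0, C values for order 1, and d_i values for each order from 2 to p is an interpretation of the formula above.

```python
def kernel_pooling_output_dim(channels, p, d_i):
    """1 (order 0) + C (order 1) + d_i for each of the orders 2..p."""
    return 1 + channels + (p - 1) * d_i

print(kernel_pooling_output_dim(channels=512, p=3, d_i=4096))  # 8705
```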
The gamma parameter, which determines alpha values in the approximation under the assumption of L2-normalized input vectors, can optionally be estimated using a set of training feature vectors.
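The role of gamma can be illustrated with the Taylor expansion of the Gaussian RBF kernel. For L2-normalized x and y, ||x-y||^2 = 2 - 2<x, y>, so exp(-gamma*||x-y||^2) = exp(-2*gamma) * exp(2*gamma*<x, y>), which expands into powers of <x, y>. The sketch below computes those series coefficients and checks the truncated sum numerically. The function name is hypothetical, and whether the layer stores these coefficients or their square roots as alpha is not stated here.

```python
import math
import numpy as np

def rbf_taylor_coefficients(gamma, p):
    """Coefficients c_i with exp(-gamma*||x-y||^2) ~= sum_i c_i * <x, y>**i, i = 0..p."""
    return np.array([math.exp(-2 * gamma) * (2 * gamma) ** i / math.factorial(i)
                     for i in range(p + 1)])

rng = np.random.default_rng(0)
x, y = rng.standard_normal(8), rng.standard_normal(8)
x, y = x / np.linalg.norm(x), y / np.linalg.norm(y)  # L2-normalize both vectors

gamma, p = 0.5, 8
coeffs = rbf_taylor_coefficients(gamma, p)
approx = sum(c * np.dot(x, y) ** i for i, c in enumerate(coeffs))
exact = np.exp(-gamma * np.sum((x - y) ** 2))
print(abs(approx - exact) < 1e-5)  # True
```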