Recursive Layers


The TensorTree class decodes a tensor into a tree structure. Its input, content, is a tensor that encodes a constituency parse tree. The tensor must have the following structure:

Each row of the tensor represents one tree node, and the row number is the node number (numbering starts at 1). In each row, every column except the last holds the node number of one of the node's children: a non-zero value p means the node has a child whose node number is p, i.e. the child is stored in the p-th row. The last column holds the leaf number. A leaf has no children, so all of its columns except the last are zero. If a node is the root, its last column is -1; in the example below, the other internal nodes have 0 in the last column.

Note: if the tensor contains padding rows, they must be placed after all real node rows, with every element equal to -1.

For example, the following tensor represents a binary tree:

[11, 10, -1;
 0, 0, 1;
 0, 0, 2;
 0, 0, 3;
 0, 0, 4;
 0, 0, 5;
 0, 0, 6;
 4, 5, 0;
 6, 7, 0;
 8, 9, 0;
 2, 3, 0;
 -1, -1, -1;
 -1, -1, -1]
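The decoding rules above can be checked with a small sketch in plain Python. decode_tree and to_nested are hypothetical helpers, not BigDL's TensorTree API. They assume the 1-based numbering used in the example, stop at the first all -1 padding row, and rebuild the tree as nested tuples of leaf numbers.

```python
# Hypothetical helpers, not part of BigDL: decode the TensorTree row format
# described above. Node numbers and child references are 1-based.


def decode_tree(rows):
    """Return (root, nodes), where nodes maps a node number to
    ("node", [child numbers]) or ("leaf", leaf number)."""
    root = None
    nodes = {}
    for node_id, row in enumerate(rows, start=1):
        values = [int(v) for v in row]
        if all(v == -1 for v in values):
            break  # padding rows sit at the end, so stop at the first one
        children = [v for v in values[:-1] if v != 0]
        if values[-1] == -1:
            root = node_id
        if children:
            nodes[node_id] = ("node", children)
        else:
            nodes[node_id] = ("leaf", values[-1])
    if root is None:
        raise ValueError("no row has -1 in its last column")
    return root, nodes


def to_nested(node_id, nodes):
    """Rebuild the subtree rooted at node_id as nested tuples of leaf numbers."""
    kind, value = nodes[node_id]
    if kind == "leaf":
        return value
    return tuple(to_nested(child, nodes) for child in value)


example = [
    [11, 10, -1],
    [0, 0, 1],
    [0, 0, 2],
    [0, 0, 3],
    [0, 0, 4],
    [0, 0, 5],
    [0, 0, 6],
    [4, 5, 0],
    [6, 7, 0],
    [8, 9, 0],
    [2, 3, 0],
    [-1, -1, -1],
    [-1, -1, -1],
]

root, nodes = decode_tree(example)
print(root)                    # 1
print(to_nested(root, nodes))  # ((1, 2), ((3, 4), (5, 6)))
```

Applied to the example tensor, the root is node 1 and the leaves 1 to 6 come out in left-to-right order; the two padding rows are ignored.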

Parameters: * content: the tensor that encodes the tree


TreeLSTM is the base class of all other kinds of Tree-LSTMs, as described in the paper Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks by Kai Sheng Tai, Richard Socher, and Christopher Manning.



Scala:

    val treeLSTM = BinaryTreeLSTM(inputSize, hiddenSize, gateOutput = true, withGraph = true)

Python:

    tree_lstm = BinaryTreeLSTM(input_size, hidden_size)

This class implements the Binary TreeLSTM (Constituency Tree-LSTM), which takes constituency-based parse trees as input. Tree-LSTM is a kind of recursive neural network, as described in the paper Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks by Kai Sheng Tai, Richard Socher, and Christopher Manning.

The input to forward(input) is expected to be a table whose first element is a 3D tensor of leaf inputs (batch x leaf number x inputSize) and whose second element is a 3D tensor encoding the tree in the TensorTree format described above (batch x tree node number x (number of branches + 1)). The output of forward(input) is a 3D tensor (batch x tree node number x hiddenSize), i.e. one hidden vector per tree node.
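To build the second element of the input table for a batch, each tree has to be encoded and the batch padded to a common node count with all -1 rows. The sketch below does this in plain Python; encode_tree and pad_batch are hypothetical helpers, not part of BigDL. The row layout (root in row 1, leaves next in left-to-right order, remaining internal nodes after them) is an assumption that mirrors the examples in this document rather than a documented requirement. Converting the result to a NumPy array or BigDL Tensor, and padding the leaf inputs, are left out.

```python
# Hypothetical helpers, not part of BigDL: encode trees given as nested
# tuples (any non-tuple value is a leaf) into TensorTree-style rows and
# pad a batch with all -1 rows to a common node count.


def _count_leaves(t):
    if not isinstance(t, tuple):
        return 1
    return sum(_count_leaves(c) for c in t)


def encode_tree(tree, width=2):
    """Return one row per node: `width` child columns plus the last column.

    Assumed layout (mirrors the examples): row 1 is the root, rows
    2..n+1 hold the n leaves left to right, and the remaining internal
    nodes follow in post-order.
    """
    if not isinstance(tree, tuple):
        raise ValueError("the root must be an internal node")
    rows = {}
    leaf_count = 0
    next_internal = _count_leaves(tree) + 2

    def visit(t, is_root=False):
        nonlocal leaf_count, next_internal
        if not isinstance(t, tuple):
            leaf_count += 1
            row_id = leaf_count + 1  # leaf k lives in row k + 1
            rows[row_id] = [0] * width + [leaf_count]
            return row_id
        if len(t) > width:
            raise ValueError("node has more children than child columns")
        child_ids = [visit(c) for c in t]  # children first (post-order)
        if is_root:
            row_id, last = 1, -1
        else:
            row_id, last = next_internal, 0
            next_internal += 1
        rows[row_id] = child_ids + [0] * (width - len(child_ids)) + [last]
        return row_id

    visit(tree, is_root=True)
    return [rows[i] for i in range(1, len(rows) + 1)]


def pad_batch(trees, width=2):
    """Encode each tree and append all -1 rows so every tree has the same row count."""
    encoded = [encode_tree(t, width) for t in trees]
    n_rows = max(len(e) for e in encoded)
    return [e + [[-1] * (width + 1) for _ in range(n_rows - len(e))] for e in encoded]


batch = pad_batch([("w1", ("w2", "w3")), ("v1", "v2")])
for rows in batch:
    print(rows)
# [[2, 5, -1], [0, 0, 1], [0, 0, 2], [0, 0, 3], [3, 4, 0]]
# [[2, 3, -1], [0, 0, 1], [0, 0, 2], [-1, -1, -1], [-1, -1, -1]]
```

For the tree ("w1", ("w2", "w3")) the encoding is identical to the tree tensor used in the Scala and Python examples; the second, smaller tree is padded with two -1 rows.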

Parameters:
* inputSize: the size of each input vector
* hiddenSize: the hidden unit size
* gateOutput: whether to gate the output. Default is true
* withGraph: whether to create the LSTMs as graph-based modules. Default is true
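To make the roles of hiddenSize and gateOutput more concrete, here is a minimal sketch of Binary Tree-LSTM node updates after Tai et al., written in plain Python rather than BigDL. It assumes that a leaf's memory cell is a linear projection of its input vector, that each gate of an internal node combines the left and right children's hidden states through separate matrices, and that gateOutput = false replaces o * tanh(c) with tanh(c). Biases are omitted, the mapping of leaf number k to the k-th input vector is assumed, and leaf, compose and the weights layout are hypothetical names, not BigDL's API.

```python
import math
import random

# Minimal sketch of Binary Tree-LSTM node updates (after Tai et al.), using
# plain Python lists. Not BigDL code: biases are omitted, and the leaf rule
# and the effect of gate_output are assumptions.


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def tanh(v):
    return [math.tanh(x) for x in v]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add(*vs):
    return [sum(xs) for xs in zip(*vs)]


def mul(a, b):
    return [x * y for x, y in zip(a, b)]


def leaf(x, w_c, w_o, gate_output=True):
    """Leaf: the memory cell is a linear projection of the input vector."""
    c = matvec(w_c, x)
    h = mul(sigmoid(matvec(w_o, x)), tanh(c)) if gate_output else tanh(c)
    return c, h


def compose(left, right, weights, gate_output=True):
    """Internal node: combine the (c, h) states of the two children.

    weights maps each gate name to a pair (U_left, U_right) of
    hidden_size x hidden_size matrices.
    """
    (lc, lh), (rc, rh) = left, right

    def pre(name):
        u_left, u_right = weights[name]
        return add(matvec(u_left, lh), matvec(u_right, rh))

    i = sigmoid(pre("i"))    # input gate
    lf = sigmoid(pre("lf"))  # forget gate for the left child's cell
    rf = sigmoid(pre("rf"))  # forget gate for the right child's cell
    u = tanh(pre("u"))       # candidate cell update
    c = add(mul(i, u), mul(lf, lc), mul(rf, rc))
    h = mul(sigmoid(pre("o")), tanh(c)) if gate_output else tanh(c)
    return c, h


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
input_size, hidden_size = 2, 2
w_c = rand_matrix(rng, hidden_size, input_size)
w_o = rand_matrix(rng, hidden_size, input_size)
weights = {
    g: (rand_matrix(rng, hidden_size, hidden_size),
        rand_matrix(rng, hidden_size, hidden_size))
    for g in ("i", "lf", "rf", "u", "o")
}

# Tree from the examples: row 1 = (row 2, row 5), row 5 = (row 3, row 4).
# Rows 2..4 are leaves 1..3, assumed to take the three input vectors in order.
inputs = [[1.0, 2.0], [2.0, 3.0], [4.0, 5.0]]
leaves = [leaf(x, w_c, w_o) for x in inputs]
node5 = compose(leaves[1], leaves[2], weights)
root = compose(leaves[0], node5, weights)

# One hidden vector per node, in row order 1..5
hidden = [root[1]] + [state[1] for state in leaves] + [node5[1]]
print(len(hidden), len(hidden[0]))  # 5 2
```

Collecting one hidden vector per node gives the same node-by-hidden layout as the 1x5x2 output in the examples, although the values differ because the weights here are random and the sketch is not BigDL's implementation.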

Scala example:



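    import com.intel.analytics.bigdl.nn.BinaryTreeLSTM
    import com.intel.analytics.bigdl.numeric.NumericFloat
    import com.intel.analytics.bigdl.tensor.Tensor
    import com.intel.analytics.bigdl.utils.T

    val hiddenSize = 2
    val inputSize = 2

    // Leaf inputs: batch x leaf number x inputSize
    val inputs = Tensor(
        T(T(T(1f, 2f),
          T(2f, 3f),
          T(4f, 5f))))

    // Tree encoding: batch x tree node number x (number of branches + 1)
    val tree = Tensor(
        T(T(T(2f, 5f, -1f),
          T(0f, 0f, 1f),
          T(0f, 0f, 2f),
          T(0f, 0f, 3f),
          T(3f, 4f, 0f))))

    val input = T(inputs, tree)

    // Same size as the output: batch x tree node number x hiddenSize
    val gradOutput = Tensor(
        T(T(T(2f, 3f),
          T(4f, 5f),
          T(2f, 3f),
          T(4f, 5f),
          T(6f, 7f))))

    val model = BinaryTreeLSTM(inputSize, hiddenSize)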

    val output = model.forward(input)
    (1,.,.) =
    -0.07799375 -0.14419462 
    -0.23495524 -0.04679072 
    -0.15945151 -0.026039641    
    -0.0454074  -0.007066241    
    -0.058696028    -0.13559057 

    [ of size 1x5x2]

    val gradInput = model.backward(input, gradOutput)
        2: (1,.,.) =
           0.0  0.0 0.0 
           0.0  0.0 0.0 
           0.0  0.0 0.0 
           0.0  0.0 0.0 
           0.0  0.0 0.0 

           [ of size 1x5x3]
        1: (1,.,.) =
           0.56145966   -0.3383652  
           0.81720364   -0.46767634 
           0.37739626   -0.23355529 

           [ of size 1x3x2]

Python example:

    from bigdl.nn.layer import *
    from bigdl.nn.criterion import *
    import numpy as np

    hidden_size = 2
    input_size = 2

    # Leaf inputs: batch x leaf number x input_size
    inputs = np.array([[
      [1.0, 2.0],
      [2.0, 3.0],
      [4.0, 5.0]
    ]])

    # Tree encoding: batch x tree node number x (number of branches + 1)
    tree = np.array([[
      [2.0, 5.0, -1.0],
      [0.0, 0.0, 1.0],
      [0.0, 0.0, 2.0],
      [0.0, 0.0, 3.0],
      [3.0, 4.0, 0.0]
    ]])

    input = [inputs, tree]

    # Same size as the output: batch x tree node number x hidden_size
    grad_output = np.array([[
      [2.0, 3.0],
      [4.0, 5.0],
      [2.0, 3.0],
      [4.0, 5.0],
      [6.0, 7.0]
    ]])

    model = BinaryTreeLSTM(input_size, hidden_size)
    output = model.forward(input)
    print(output)
    [[[-0.08113038 -0.0289295 ]
      [ 0.1378704   0.00550814]
      [ 0.33053339 -0.02395477]
      [ 0.26895314 -0.02019646]
      [ 0.34085754 -0.12480961]]]

    gradient = model.backward(input, grad_output)
    print(gradient)
    [array([[[ 0.43623093,  0.97416967],
            [-0.02283204,  0.99245077],
            [-1.11290622,  0.84173977]]], dtype=float32), array([[[ 0.,  0.,  0.],
            [ 0.,  0.,  0.],
            [ 0.,  0.,  0.],
            [ 0.,  0.,  0.],
            [ 0.,  0.,  0.]]], dtype=float32)]