In a Transformer model, a hidden state is the output representation of the input tokens after they have been processed by a layer. Unlike in Recurrent Neural Networks (RNNs), where a hidden state carries a sequential memory, each hidden state in a Transformer is a vector that represents the comb...
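To make the idea concrete, the following is a minimal sketch in plain Python, not a real Transformer implementation. It assumes a hypothetical `toy_layer` that uses unparameterized self-attention plus a residual connection as a stand-in for a full layer; real layers also have learned projections, multiple heads, a feed-forward block, and normalization, all omitted here. The names `toy_layer`, `run`, and `embeddings` are illustrative only. It shows that each layer yields one hidden-state vector per token, and that the hidden states of a layer are computed for all tokens together rather than being carried forward step by step.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def toy_layer(states):
    """Hypothetical stand-in for a Transformer layer (assumption: no learned
    weights). Each token attends to every token, then adds the result to
    its own vector (residual connection)."""
    d = len(states[0])
    out = []
    for q in states:
        # Scaled dot-product score between this token and every token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in states]
        weights = softmax(scores)
        # Weighted average of all token vectors.
        mixed = [sum(w * v[i] for w, v in zip(weights, states)) for i in range(d)]
        out.append([x + y for x, y in zip(q, mixed)])
    return out


def run(embeddings, num_layers):
    """Return the hidden states after each layer. Index 0 holds the input
    embeddings; that indexing is a choice made for this sketch."""
    hidden_states = [embeddings]
    for _ in range(num_layers):
        hidden_states.append(toy_layer(hidden_states[-1]))
    return hidden_states


# Three tokens, each embedded as a 4-dimensional vector (made-up values).
embeddings = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
hidden_states = run(embeddings, num_layers=2)

for i, layer_states in enumerate(hidden_states):
    # Prints: layer index, number of token vectors, vector dimension.
    print(i, len(layer_states), len(layer_states[0]))
```

Every entry of `hidden_states` has the same shape: three vectors of four numbers, one per token. Changing any single input token changes the vectors of the other tokens from the first layer onward, because the attention step mixes information across positions.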