What is Seq2Seq?
The so-called Seq2Seq (Sequence to Sequence) is a method that generates another sequence from a given sequence. It was first proposed in 2014 in two papers that share its main idea: the Google Brain team's "Sequence to Sequence Learning with Neural Networks" and the Yoshua Bengio team's "Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation". The two papers independently arrived at a similar solution, and Seq2Seq was born.
As a simple example, in machine translation the input is a sentence in one language and the output is its translation in another: input (你好) ---> output (Hello). For another example, in human-machine dialogue, we ask the machine "Who are you?", and the machine returns the answer "I am XX".
This figure shows a simple email conversation.
The Encoder and Decoder in the figure are each shown with only one layer of ordinary LSTM cells (to prevent misunderstanding, note that the cell used by Seq2Seq is not limited to LSTM). From this structure we can see that the overall model is still very simple: the state of the encoder cell at the last time step is the intermediate semantic vector C, which becomes the initial state of the decoder cell. Then, in the decoder, the output of each time step is used as the input of the next time step, and so on until the decoder predicts the special end symbol <END>.
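To make this flow concrete, here is a minimal sketch of such an encoder and decoder, assuming PyTorch; the class names, layer sizes, and token ids are illustrative assumptions, not details from the papers above.

# A minimal sketch of the encoder-decoder flow described above, assuming
# PyTorch; names, sizes, and token ids are illustrative assumptions.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

    def forward(self, src):
        # The state at the last time step is the intermediate
        # semantic vector C.
        _, (h, c) = self.lstm(self.embed(src))
        return h, c

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token, state):
        # One decoding step: the previous output token is the next input.
        output, state = self.lstm(self.embed(token), state)
        return self.out(output), state

# Greedy decoding: C initializes the decoder, and each predicted token
# is fed back as the next input until <END> (id end_id) is produced.
def decode(encoder, decoder, src, bos_id, end_id, max_len=20):
    state = encoder(src)                 # semantic vector C
    token = torch.tensor([[bos_id]])     # start from the begin symbol
    result = []
    for _ in range(max_len):
        logits, state = decoder(token, state)
        token = logits.argmax(dim=-1)    # output becomes next input
        if token.item() == end_id:
            break
        result.append(token.item())
    return result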
Here we call the Encoder stage the encoding stage, and the corresponding Decoder stage the decoding stage. The intermediate semantic vector C can be regarded as a summary of all the input content: everything in the input is compressed into C. The details will be explained later. Let's first look at the specific application scenarios of the Seq2Seq technology we will learn.
Application scenarios of Seq2Seq
With the development of computer technology, artificial intelligence, and algorithm research, and with the needs of social development, Seq2Seq has found applications in many fields:
Machine translation (the famous Google Translate is developed based on Seq2Seq + the attention mechanism)
Chatbots (Microsoft XiaoIce also uses seq2seq technology)
Automatic text summarization (this technology is used by Toutiao)
Automatic image caption generation
Machine poetry writing, code completion, commit message generation, story style rewriting, etc.
Seq2Seq principle analysis
First, we need to be clear that the main idea of seq2seq is to map a sequence as input to a sequence as output through a deep neural network model (commonly an LSTM encoder and an LSTM decoder); it consists of two stages, encoding and decoding.
Here we must emphasize that in the Seq2Seq implementation program presented here, the input sequence length and the output sequence length are designed to be fixed.
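When the lengths are fixed like this, shorter sequences are typically padded and longer ones truncated. A toy sketch, assuming a hypothetical "<pad>" token; the example words are illustrative:

# A toy sketch of fixing the sequence length via padding/truncation,
# assuming a hypothetical "<pad>" token.
def to_fixed_length(tokens, length, pad="<pad>"):
    tokens = tokens[:length]                        # truncate if too long
    return tokens + [pad] * (length - len(tokens))  # pad if too short

print(to_fixed_length(["who", "are", "you"], 5))
# ['who', 'are', 'you', '<pad>', '<pad>']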
Basic seq2seq model
The Notation of the Sequence
The seq2seq model converts the input sequence into the output sequence. Let the input sequence and output sequence be $X$ and $Y$ respectively. The i-th element of the input sequence is represented as $x_i$, and the j-th element of the output sequence is represented as $y_j$. Generally, each $x_i$ and $y_j$ is a one-hot vector of a symbol. For example, in Natural Language Processing (NLP), the one-hot vector represents a word, and its size is the vocabulary size.
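A toy illustration of this one-hot representation, using a hypothetical five-word vocabulary; the vector length equals the vocabulary size:

# One-hot encoding over a hypothetical five-word vocabulary.
import torch

vocab = {"<bos>": 0, "<eos>": 1, "i": 2, "am": 3, "xx": 4}

def one_hot(word):
    vec = torch.zeros(len(vocab))  # length = vocabulary size
    vec[vocab[word]] = 1.0
    return vec

print(one_hot("am"))  # tensor([0., 0., 0., 1., 0.])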
Let's think of the seq2seq model in terms of NLP. Let the vocabularies of the input and the output be $\mathcal{V}^{(s)}$ and $\mathcal{V}^{(t)}$; all the elements $x_i$ and $y_j$ satisfy $x_i \in \mathcal{V}^{(s)}$ and $y_j \in \mathcal{V}^{(t)}$. The input sequence $X$ and the output sequence $Y$ are represented as the following equations:

$X = (x_1, x_2, \ldots, x_I)$
$Y = (y_0, y_1, \ldots, y_J, y_{J+1})$

$I$ and $J$ are the lengths of the input sequence and the output sequence. Using the NLP notation, $y_0$ is the one-hot vector of BOS (beginning of sentence), the virtual word representing the beginning of the sentence, and $y_{J+1}$ is EOS (end of sentence), the virtual word representing the end of the sentence.
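For example, with the hypothetical words from the vocabulary sketch above, an output sentence of J = 3 real words maps to this notation as follows:

# y_0 is <bos>, y_{J+1} is <eos>, and J is the number of real words (J = 3).
words = ["i", "am", "xx"]
Y = ["<bos>"] + words + ["<eos>"]  # (y_0, y_1, ..., y_J, y_{J+1})
print(Y)  # ['<bos>', 'i', 'am', 'xx', '<eos>']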
The Notation of the Conditional Probability $P(Y \mid X)$
Now, let's think about the conditional probability $P(Y \mid X)$ of generating the output sequence $Y$ when the input sequence $X$ is given. The purpose of the seq2seq model is to model the probability $P(Y \mid X)$. However, the seq2seq model does not model $P(Y \mid X)$ directly. It actually models $P(y_j \mid Y_{<j}, X)$, the probability of generating the j-th element of the output sequence $y_j$ given $Y_{<j}$ and $X$, where $Y_{<j}$ means the output sequence from 1 to $j - 1$. We can then write the model $P_\theta(Y \mid X)$ as the product of $P_\theta(y_j \mid Y_{<j}, X)$:

$P_\theta(Y \mid X) = \prod_{j=1}^{J+1} P_\theta(y_j \mid Y_{<j}, X)$
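A small numeric sketch of this factorization, using hypothetical per-step probabilities $P(y_j \mid Y_{<j}, X)$; the values are made up for illustration:

# Multiplying hypothetical per-step probabilities into P(Y | X).
step_probs = [0.9, 0.8, 0.7, 0.95]  # one value per output position j

p_y_given_x = 1.0
for p in step_probs:
    p_y_given_x *= p
print(p_y_given_x)  # ≈ 0.4788

In practice, implementations sum log-probabilities instead of multiplying raw probabilities, to avoid numerical underflow on long sequences.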