Python : Day 11 – Lesson 11


What is Encoder-Decoder?

The Encoder-Decoder model is mainly a concept from the NLP field. It does not refer to one specific algorithm; rather, it is a general term for a class of algorithms. Encoder-Decoder can be regarded as a general framework under which different algorithms can be used to solve different tasks.


The Encoder-Decoder framework is a good illustration of the core ideas of machine learning:


The Encoder, also called the encoder, has the role of "transforming the real-world problem into a mathematical problem".




The Decoder, also called the decoder, has the role of "solving the mathematical problem and translating the solution back into the real world".


The two stages are chained together: the input goes through the Encoder to an intermediate representation, which the Decoder then turns into the output.



The Encoder-Decoder model and its RNN implementation

Encoder-Decoder (encoding-decoding) is a very common model framework in deep learning. For example, the auto-encoder used in unsupervised learning is designed and trained with an encoding-decoding structure; image description generation (discussed below) uses a CNN-RNN encoding-decoding framework; and neural machine translation (NMT) models often use an LSTM-LSTM encoding-decoding framework.


Therefore, to be precise, Encoder-Decoder is not a specific model but a kind of framework. The Encoder and Decoder can operate on any kind of data (text, speech, images, video), and the models used can be CNN, RNN, BiRNN, LSTM, GRU, and so on. Based on Encoder-Decoder, we can therefore design a wide variety of application algorithms.


One of the most significant features of the Encoder-Decoder framework is that it is an end-to-end learning algorithm. Such models are often used in machine translation, for example translating French to English, and this setup is also called Sequence to Sequence (Seq2Seq) learning. "Encoding" converts the input sequence into a fixed-length vector, and "decoding" converts that fixed-length vector into the output sequence.
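To make this concrete, here is a minimal sketch of the idea in PyTorch. The class names, vocabulary sizes, and layer sizes are illustrative assumptions rather than details from any particular paper: a GRU encoder compresses the input word sequence into one fixed-length vector, and a GRU decoder turns that vector into the output sequence.

import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, src_vocab, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(src_vocab, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)

    def forward(self, src):                     # src: (batch, src_len) of word ids
        _, hidden = self.rnn(self.embed(src))   # hidden: (1, batch, hid_dim)
        return hidden                           # the fixed-length vector "C"

class Decoder(nn.Module):
    def __init__(self, tgt_vocab, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(tgt_vocab, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, tgt, hidden):             # tgt: (batch, tgt_len) of word ids
        output, hidden = self.rnn(self.embed(tgt), hidden)
        return self.out(output), hidden         # logits over the target vocabulary

# Toy usage: encode a batch of 5-word inputs, decode 7-word outputs.
enc, dec = Encoder(src_vocab=1000), Decoder(tgt_vocab=1200)
src = torch.randint(0, 1000, (2, 5))
tgt = torch.randint(0, 1200, (2, 7))
context = enc(src)               # encoding: input sequence -> fixed-length vector
logits, _ = dec(tgt, context)    # decoding: fixed-length vector -> output sequence
print(logits.shape)              # torch.Size([2, 7, 1200])

During training the decoder is usually fed the true previous words (teacher forcing, as in the toy usage above); at inference time it feeds its own predictions back in, as shown in the decoding loop later in this lesson.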


Intuition behind the Encoder-Decoder framework

The framework can be thought of as a general model for processing one sentence (or passage) and generating another sentence (or passage). For a sentence pair <X, Y>, the goal is to take the input sentence X and generate the target sentence Y through the Encoder-Decoder framework. X and Y can be in the same language or in two different languages, and each consists of its own sequence of words: X = (x1, x2, ..., xm) and Y = (y1, y2, ..., yn).


The Encoder, as its name implies, encodes the input sentence X, transforming it through a non-linear transformation into an intermediate semantic representation C = F(x1, x2, ..., xm).



The Decoder's task is to generate the word yi at time step i from the intermediate semantic representation C of sentence X and the history y1, y2, ..., y(i-1) that has already been generated: yi = g(C, y1, y2, ..., y(i-1)).
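Continuing the sketch above, the following greedy decoding loop makes this dependence explicit: the context C initializes the decoder's hidden state, and the words generated so far influence each new prediction through that recurrent state. The start-of-sequence and end-of-sequence token ids are assumptions for this sketch.

def greedy_decode(enc, dec, src, sos_id=1, eos_id=2, max_len=20):
    hidden = enc(src)                              # C = F(x1, x2, ..., xm)
    prev = torch.full((src.size(0), 1), sos_id, dtype=torch.long)
    generated = []
    for _ in range(max_len):
        logits, hidden = dec(prev, hidden)         # yi = g(C, y1, ..., y(i-1))
        prev = logits.argmax(dim=-1)               # greedily pick the likeliest word
        generated.append(prev)
        if (prev == eos_id).all():                 # stop once every sentence has ended
            break
    return torch.cat(generated, dim=1)             # (batch, generated_len) word ids

print(greedy_decode(enc, dec, src).shape)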



Encoder-Decoder is not a model, but a framework, a way to deal with problems.


Applications

Encoder-Decoder is a model framework in the field of NLP. It is widely used for tasks such as machine translation and speech recognition.


Machine translation, dialogue bots, poetry generation, code completion, text summarization (text-to-text)


"Text-text" is the most typical application, and the length of the input sequence and output sequence may be quite different.


Google's paper " Sequence to Sequence Learning with Neural Networks (https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf) " using Seq2Seq for machine translation.



Speech recognition (audio-to-text)

Speech is also strongly sequential, which makes speech recognition a good fit for the Encoder-Decoder model.


Google's paper " A Comparison of Sequence-to-Sequence Models for Speech Recognition (https://research.google/pubs/pub46169/) " using Seq2Seq for speech recognition



Image description generation (image-to-text)

Put simply, this is "looking at a picture and describing it": the machine extracts features from the picture and then expresses them in words. This application combines computer vision and NLP.


A related paper on description generation is "Sequence to Sequence – Video to Text" (https://arxiv.org/abs/1505.00487).



Encoder-Decoder flaws

As mentioned above, the only link between the Encoder and the Decoder is a single vector C, and the length of C is fixed.


To make this easier to understand, we can draw an analogy with compression and decompression:


Compress an 800x800 pixel image to 100 KB and it still looks relatively clear. Compress a 3000x3000 pixel image to the same 100 KB and it looks blurry.



Although the Encoder-Decoder model is a classic, its limitations are also significant.


The biggest limitation is that the only connection between encoding and decoding is a fixed-length semantic vector C. That is, the encoder compresses the information of the entire sequence into a single fixed-length vector.


However, there are two disadvantages to this.


  1. The semantic vector cannot fully represent the information of the entire sequence.

  2. The information carried by earlier inputs is diluted, or even overwritten, by information entered later. The longer the input sequence, the more severe this problem becomes.


This makes it impossible to obtain enough information about the input sequence at the beginning of decoding, so the accuracy of decoding will naturally be compromised.
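The bottleneck is easy to see with the encoder sketched earlier in this lesson: whatever the length of the input, its output vector has exactly the same fixed size, so a longer input must be squeezed into the same amount of space (the 128-unit hidden size is the illustrative value assumed above).

short = torch.randint(0, 1000, (1, 5))     # a 5-word sentence
longer = torch.randint(0, 1000, (1, 200))  # a 200-word passage
print(enc(short).shape)                    # torch.Size([1, 1, 128])
print(enc(longer).shape)                   # torch.Size([1, 1, 128]): same fixed capacity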

