In [1]:
import tensorflow as tf
print(tf.__version__)

2.0.0

In [2]:
!pip install nltk
Processing c:\users\win10\appdata\local\pip\cache\wheels\de\5e\42\64abaeca668161c3e2cecc24f864a8fc421e3d07a104fc8a51\nltk-3.5-py3-none-any.whl
Collecting tqdm
  Downloading tqdm-4.49.0-py2.py3-none-any.whl (69 kB)
Collecting regex
  Using cached regex-2020.7.14-cp36-cp36m-win_amd64.whl (268 kB)
Collecting joblib
  Using cached joblib-0.16.0-py3-none-any.whl (300 kB)
Collecting click
  Using cached click-7.1.2-py2.py3-none-any.whl (82 kB)
Installing collected packages: tqdm, regex, joblib, click, nltk
Successfully installed click-7.1.2 joblib-0.16.0 nltk-3.5 regex-2020.7.14 tqdm-4.49.0
import csv
import tensorflow as tf
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from nltk.corpus import stopwords
STOPWORDS = set(stopwords.words('english'))
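If the NLTK stopword corpus has not been downloaded on this machine yet, the stopwords.words('english') call above raises a LookupError. A minimal one-time fix, run once before the cell above (assuming an internet connection is available):

import nltk

# One-time download of the NLTK stopword corpus used above.
nltk.download('stopwords')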
In [3]:
Put the hyperparameters at the top like this to make them easier to change and edit.
vocab_size = 5000
embedding_dim = 64
max_length = 200
trunc_type = 'post'
padding_type = 'post'
oov_tok = '<OOV>'
training_portion = .8
In [4]:
First, let's define two lists to contain the articles and labels. While reading the data in, we also remove stopwords.
articles = []
labels = []

with open("bbc-text.csv", 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    next(reader)
    for row in reader:
        labels.append(row[0])
        article = row[1]
        for word in STOPWORDS:
            token = ' ' + word + ' '
            article = article.replace(token, ' ')
            article = article.replace('  ', ' ')
        articles.append(article)

print(len(labels))
print(len(articles))
In [5]:
2225
2225
There are only 2,225 articles in the data. We then split them into a training set and a validation set, according to the parameter we set earlier: 80% for training, 20% for validation.
train_size = int(len(articles) * training_portion)

train_articles = articles[0: train_size]
train_labels = labels[0: train_size]

validation_articles = articles[train_size:]
validation_labels = labels[train_size:]

print(train_size)
print(len(train_articles))
print(len(train_labels))
print(len(validation_articles))
print(len(validation_labels))
In [6]:
1780
1780
1780
445
445
Tokenizer does all the heavy lifting for us. When tokenizing our articles, it keeps only the 5,000 most common words. oov_token specifies a special value to use when an unseen word is encountered; this means we want "<OOV>" to be used for words that are not in the word index. fit_on_texts goes through all the text and creates a dictionary like this:
tokenizer = Tokenizer(num_words=vocab_size, oov_token=oov_tok)
tokenizer.fit_on_texts(train_articles)
word_index = tokenizer.word_index
In [7]:
You can see that "<OOV>" is number 1, "said" is number 2, "mr" is number 3, and so on.
In [8]:
dict(list(word_index.items())[0:10])
Out[8]:
{'<OOV>': 1,
'said': 2,
'mr': 3,
'would': 4,
'year': 5,
'also': 6,
'people': 7,
'new': 8,
'us': 9,
'one': 10}
This process also cleans up our text: everything is lowercased and punctuation is removed.
After tokenization, the next step is to turn those tokens into lists of sequences.
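To make the OOV behaviour concrete, here is a minimal sketch on made-up sentences (demo_tokenizer and the example text are only for illustration, not part of the BBC data):

from tensorflow.keras.preprocessing.text import Tokenizer

demo_tokenizer = Tokenizer(num_words=10, oov_token='<OOV>')
demo_tokenizer.fit_on_texts(['the cat sat on the mat'])

# 'dog' was never seen by fit_on_texts, so it is mapped to the <OOV> index (1).
print(demo_tokenizer.texts_to_sequences(['the dog sat on the mat']))
# [[2, 1, 4, 5, 2, 6]]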
train_sequences = tokenizer.texts_to_sequences(train_articles)
In [9]:
This is the 11th article in the training data after it has been turned into a sequence.
In [10]:
print(train_sequences[10])
[2431, 1, 225, 4996, 22, 642, 587, 225, 4996, 1, 1, 1663, 1, 1, 2431, 22, 565, 1, 1, 140, 278, 1, 140,
 278, 796, 823, 662, 2307, 1, 1144, 1694, 1, 1721, 4997, 1, 1, 1, 1, 1, 4738, 1, 1, 122, 4514, 1, 2,
 2874, 1505, 352, 4739, 1, 52, 341, 1, 352, 2172, 3962, 41, 22, 3795, 1, 1, 1, 1, 543, 1, 1, 1, 835,
 631, 2366, 347, 4740, 1, 365, 22, 1, 787, 2367, 1, 4302, 138, 10, 1, 3666, 682, 3531, 1, 22, 1, 414,
 823, 662, 1, 90, 13, 633, 1, 225, 4996, 1, 599, 1, 1694, 1021, 1, 4998, 808, 1864, 117, 1, 1, 1, 2974,
 22, 1, 99, 278, 1, 1608, 4999, 543, 493, 1, 1443, 4741, 778, 1320, 1, 1861, 10, 33, 642, 319, 1, 62,
 478, 565, 301, 1506, 22, 479, 1, 1, 1666, 1, 797, 1, 3066, 1, 1365, 6, 1, 2431, 565, 22, 2971, 4735,
 1, 1, 1, 1, 1, 850, 39, 1825, 675, 297, 26, 979, 1, 882, 22, 361, 22, 13, 301, 1506, 1343, 374, 20,
 63, 883, 1096, 4303, 247]
When we train neural networks for NLP, we need the sequences to all be the same size; that's why we use padding. Our max_length is 200, so we use pad_sequences to make all of our articles the same length, which is 200 in this example. That's why you see that the 1st article, which was 425 tokens long, becomes 200; the 2nd article was 192 tokens long and becomes 200; and so on.
In [11]:
train_padded = pad_sequences(train_sequences, maxlen=max_length, padding=padding_type, truncating=trunc_type)
print(len(train_sequences[0]))
print(len(train_padded[0]))

print(len(train_sequences[1]))
print(len(train_padded[1]))

print(len(train_sequences[10]))
print(len(train_padded[10]))
In [12]:
425
200
192
200
186
200
In addition, there are a padding type and a truncating type, and they are both 'post'. That means, for example, the 11th article, which was 186 tokens long, is padded out to 200 by adding 14 zeros at the end.
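As a quick sanity check on what 'post' padding and 'post' truncating do, here is a minimal sketch on two toy sequences (the numbers are made up for illustration):

from tensorflow.keras.preprocessing.sequence import pad_sequences

toy = [[1, 2, 3], [4, 5, 6, 7, 8, 9]]

# padding='post' appends zeros at the end; truncating='post' cuts extra tokens from the end.
print(pad_sequences(toy, maxlen=5, padding='post', truncating='post'))
# [[1 2 3 0 0]
#  [4 5 6 7 8]]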
In [13]:
print(train_sequences[10])
[2431, 1, 225, 4996, 22, 642, 587, 225, 4996, 1, 1, 1663, 1, 1, 2431, 22, 565, 1, 1, 140, 278, 1, 140,
 278, 796, 823, 662, 2307, 1, 1144, 1694, 1, 1721, 4997, 1, 1, 1, 1, 1, 4738, 1, 1, 122, 4514, 1, 2,
 2874, 1505, 352, 4739, 1, 52, 341, 1, 352, 2172, 3962, 41, 22, 3795, 1, 1, 1, 1, 543, 1, 1, 1, 835,
 631, 2366, 347, 4740, 1, 365, 22, 1, 787, 2367, 1, 4302, 138, 10, 1, 3666, 682, 3531, 1, 22, 1, 414,
 823, 662, 1, 90, 13, 633, 1, 225, 4996, 1, 599, 1, 1694, 1021, 1, 4998, 808, 1864, 117, 1, 1, 1, 2974,
 22, 1, 99, 278, 1, 1608, 4999, 543, 493, 1, 1443, 4741, 778, 1320, 1, 1861, 10, 33, 642, 319, 1, 62,
 478, 565, 301, 1506, 22, 479, 1, 1, 1666, 1, 797, 1, 3066, 1, 1365, 6, 1, 2431, 565, 22, 2971, 4735,
 1, 1, 1, 1, 1, 850, 39, 1825, 675, 297, 26, 979, 1, 882, 22, 361, 22, 13, 301, 1506, 1343, 374, 20,
 63, 883, 1096, 4303, 247]
print(train_padded[10])
In [14]:
[2431    1  225 4996   22  642  587  225 4996    1    1 1663    1    1
 2431   22  565    1    1  140  278    1  140  278  796  823  662 2307
    1 1144 1694    1 1721 4997    1    1    1    1    1 4738    1    1
  122 4514    1    2 2874 1505  352 4739    1   52  341    1  352 2172
 3962   41   22 3795    1    1    1    1  543    1    1    1  835  631
 2366  347 4740    1  365   22    1  787 2367    1 4302  138   10    1
 3666  682 3531    1   22    1  414  823  662    1   90   13  633    1
  225 4996    1  599    1 1694 1021    1 4998  808 1864  117    1    1
    1 2974   22    1   99  278    1 1608 4999  543  493    1 1443 4741
  778 1320    1 1861   10   33  642  319    1   62  478  565  301 1506
   22  479    1    1 1666    1  797    1 3066    1 1365    6    1 2431
  565   22 2971 4735    1    1    1    1    1  850   39 1825  675  297
   26  979    1  882   22  361   22   13  301 1506 1343  374   20   63
  883 1096 4303  247    0    0    0    0    0    0    0    0    0    0
    0    0    0    0]
And the 1st article, which was 425 tokens long, is truncated to 200; the truncation also happens at the end.
In [15]:
print(train_sequences[0])
[91, 160, 1141, 1106, 49, 979, 755, 1, 89, 1304, 4289, 129, 175, 3654, 1214, 1195, 1578, 42, 7, 893,
 91, 1, 334, 85, 20, 14, 130, 3262, 1215, 2421, 570, 451, 1376, 58, 3378, 3521, 1661, 8, 921, 730,
 10, 844, 1, 9, 598, 1579, 1107, 395, 1941, 1106, 731, 49, 538, 1398, 2012, 1623, 134, 249, 113, 2355,
 795, 4981, 980, 584, 10, 3957, 3958, 921, 2562, 129, 344, 175, 3654, 1, 1, 39, 62, 2867, 28, 9,
 4723, 18, 1305, 136, 416, 7, 143, 1423, 71, 4501, 436, 4982, 91, 1107, 77, 1, 82, 2013, 53, 1,
 91, 6, 1008, 609, ..., 1, 1602, 7, 893, 77, 77]
print(train_padded[0])
In [16]:
[  91  160 1141 1106   49  979  755    1   89 1304 4289  129  175 3654
 1214 1195 1578   42    7  893   91    1  334   85   20   14  130 3262
 ...
 1717    1  294  756]
Then we do the same for the validation sequences. Note that we should expect more out-of-vocabulary words in the validation articles, because the word index was derived from the training articles only.
In [17]:
validation_sequences = tokenizer.texts_to_sequences(validation_articles)
validation_padded = pad_sequences(validation_sequences, maxlen=max_length, padding=padding_type, truncating=trunc_type)

print(len(validation_sequences))
print(validation_padded.shape)
445
(445, 200)
Now let's look at the labels. Because our labels are text, we will tokenize them as well. When training, labels are expected to be numpy arrays, so we turn the lists of labels into numpy arrays like so:
print(set(labels))
In [18]:
{'politics', 'entertainment', 'tech', 'sport', 'business'}
label_tokenizer = Tokenizer()
label_tokenizer.fit_on_texts(labels)

training_label_seq = np.array(label_tokenizer.texts_to_sequences(train_labels))
validation_label_seq = np.array(label_tokenizer.texts_to_sequences(validation_labels))
In [19]:
print(training_label_seq[0])
print(training_label_seq[1])
print(training_label_seq[2])
print(training_label_seq.shape)

print(validation_label_seq[0])
print(validation_label_seq[1])
print(validation_label_seq[2])
print(validation_label_seq.shape)
In [20]:
[4]
[2]
[1]
(1780, 1)
[5]
[4]
[3]
(445, 1)
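Note that Tokenizer reserves index 0 and starts assigning label indices from 1, which is why the label sequences above contain values 1 to 5 and why the final Dense layer of the model later has 6 units rather than 5. A quick way to inspect the mapping with the label_tokenizer defined above (the exact ordering depends on label frequency in the data):

# Five classes, indices 1..5; index 0 is reserved by the tokenizer.
print(label_tokenizer.word_index)
# e.g. {'sport': 1, 'business': 2, 'politics': 3, 'tech': 4, 'entertainment': 5}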
Before training the deep neural network, let's explore what our original article and the article after padding look like. Running the following code on the 11th article, we can see that some words become "<OOV>" because they did not make it into the top 5,000.
In [22]:
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])

def decode_article(text):
    return ' '.join([reverse_word_index.get(i, '?') for i in text])

print(decode_article(train_padded[10]))
print('---')
print(train_articles[10])
berlin <OOV> anti nazi film german movie anti nazi <OOV> <OOV> drawn <OOV> <OOV> berlin film festival <OOV> <OOV> final days <OOV> final days member white rose movement <OOV> 21 arrested <OOV> brother hans <OOV> <OOV> <OOV> <OOV> <OOV> tyranny <OOV> <OOV> director marc <OOV> said feeling responsibility keep legacy <OOV> going must <OOV> keep ideas alive added film drew <OOV> <OOV> <OOV> <OOV> trial <OOV> <OOV> <OOV> east germany secret police discovery <OOV> behind film <OOV> worked closely <OOV> relatives including one <OOV> sisters ensure historical <OOV> film <OOV> members white rose <OOV> group first started <OOV> anti nazi <OOV> summer <OOV> arrested dropped <OOV> munich university calling day <OOV> <OOV> <OOV> regime film <OOV> six days <OOV> arrest intense trial saw <OOV> initially deny charges ended <OOV> appearance one three german films <OOV> top prize festival south african film version <OOV> <OOV> opera <OOV> shot <OOV> town <OOV> language also <OOV> berlin festival film entitled u <OOV> <OOV> <OOV> <OOV> <OOV> story set performed 40 strong music theatre <OOV> debut film performance film first south african feature 25 years second nominated golden bear award ? ? ? ? ? ? ? ? ? ? ? ? ? ?
---
berlin cheers anti-nazi film german movie anti-nazi resistance heroine drawn loud applause berlin film festival. sophie scholl - final days portrays final days member white rose movement. scholl 21 arrested beheaded brother hans 1943 distributing leaflets condemning abhorrent tyranny adolf hitler. director marc rothemund said: feeling responsibility keep legacy scholls going. must somehow keep ideas alive added. film drew transcripts gestapo interrogations scholl trial preserved archive communist east germany secret police. discovery inspiration behind film rothemund worked closely surviving relatives including one scholl sisters ensure historical accuracy film. scholl members white rose resistance group first started distributing anti-nazi leaflets summer 1942. arrested dropped leaflets munich university calling day reckoning adolf hitler regime. film focuses six days scholl arrest intense trial saw scholl initially deny charges ended defiant appearance. one three german films vying top prize festival. south african film version bizet tragic opera carmen shot cape town xhosa language also premiered berlin festival. film entitled u-carmen ekhayelitsha carmen khayelitsha township story set. performed 40-strong music theatre troupe debut film performance. film first south african feature 25 years second nominated golden bear award.
Now we can implement the LSTM. We build a tf.keras.Sequential model and start with an embedding layer. An embedding layer stores one vector per word. When called, it converts the sequences of word indices into sequences of vectors. After training, words with similar meanings often have similar vectors.
Next is how to implement the LSTM in code. The Bidirectional wrapper is used with an LSTM layer; this propagates the input forwards and backwards through the LSTM layer and then concatenates the outputs. This helps the LSTM learn long-term dependencies. We then feed the output into dense layers to do the classification.
model = tf.keras.Sequential([
    # Add an Embedding layer expecting input vocab of size 5000, and output embedding dimension 64.
    tf.keras.layers.Embedding(vocab_size, embedding_dim),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(embedding_dim)),
    # tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    # Use ReLU in place of the tanh function since they are very good alternatives to each other.
    tf.keras.layers.Dense(embedding_dim, activation='relu'),
    # Add a Dense layer with 6 units and softmax activation.
    # When we have multiple outputs, softmax converts the output layer into a probability distribution.
    tf.keras.layers.Dense(6, activation='softmax')
])
model.summary()
In [39]:
Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_3 (Embedding)      (None, None, 64)          320000
_________________________________________________________________
bidirectional_3 (Bidirection (None, 128)               66048
_________________________________________________________________
dense_6 (Dense)              (None, 64)                8256
_________________________________________________________________
dense_7 (Dense)              (None, 6)                 390
=================================================================
Total params: 394,694
Trainable params: 394,694
Non-trainable params: 0
In our model summary, we have our embedding, then our Bidirectional layer containing the LSTM, followed by two dense layers. The output of the Bidirectional layer is 128 because it is double what we put into the LSTM. We could also stack LSTM layers, but I found that made the results worse.
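For reference, stacking a second LSTM (the commented-out line in the model definition above) would look roughly like this; the first recurrent layer then needs return_sequences=True so that it emits a full sequence for the next layer to consume (stacked_model is just an illustrative name):

stacked_model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim),
    # return_sequences=True: pass the whole sequence of hidden states to the next LSTM.
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(embedding_dim, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(embedding_dim, activation='relu'),
    tf.keras.layers.Dense(6, activation='softmax')
])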
In [40]:
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
num_epochs = 10
history = model.fit(train_padded, training_label_seq, epochs=num_epochs,
                    validation_data=(validation_padded, validation_label_seq), verbose=2)
In [41]:
Train on 1780 samples, validate on 445 samples
Epoch 1/10
1780/1780 - 7s - loss: 1.5802 - accuracy: 0.2955 - val_loss: 1.3228 - val_accuracy: 0.4337
Epoch 2/10
1780/1780 - 5s - loss: 1.0225 - accuracy: 0.5798 - val_loss: 0.8686 - val_accuracy: 0.5820
Epoch 3/10
1780/1780 - 5s - loss: 0.5797 - accuracy: 0.7831 - val_loss: 0.5539 - val_accuracy: 0.8944
Epoch 4/10
1780/1780 - 5s - loss: 0.1793 - accuracy: 0.9646 - val_loss: 0.2454 - val_accuracy: 0.9416
Epoch 5/10
1780/1780 - 5s - loss: 0.1457 - accuracy: 0.9567 - val_loss: 0.3868 - val_accuracy: 0.8494
Epoch 6/10
1780/1780 - 5s - loss: 0.0972 - accuracy: 0.9691 - val_loss: 0.2848 - val_accuracy: 0.9124
Epoch 7/10
1780/1780 - 5s - loss: 0.0431 - accuracy: 0.9848 - val_loss: 0.2873 - val_accuracy: 0.9169
Epoch 8/10
1780/1780 - 5s - loss: 0.0226 - accuracy: 0.9927 - val_loss: 0.2457 - val_accuracy: 0.9416
Epoch 9/10
1780/1780 - 5s - loss: 0.0179 - accuracy: 0.9949 - val_loss: 0.2983 - val_accuracy: 0.9191
Epoch 10/10
1780/1780 - 5s - loss: 0.0094 - accuracy: 0.9983 - val_loss: 0.2793 - val_accuracy: 0.9281
In [42]:
!pip install matplotlib
Requirement already satisfied: matplotlib in c:\users\win10\.conda\envs\tensorflow2\lib\site-packages (3.3.2)
Requirement already satisfied: certifi>=2020.06.20 in c:\users\win10\.conda\envs\tensorflow2\lib\site-packages (from matplotlib) (2020.6.20)
Requirement already satisfied: numpy>=1.15 in c:\users\win10\.conda\envs\tensorflow2\lib\site-packages (from matplotlib) (1.19.1)
Requirement already satisfied: python-dateutil>=2.1 in c:\users\win10\.conda\envs\tensorflow2\lib\site-packages (from matplotlib) (2.8.1)
Requirement already satisfied: cycler>=0.10 in c:\users\win10\.conda\envs\tensorflow2\lib\site-packages (from matplotlib) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in c:\users\win10\.conda\envs\tensorflow2\lib\site-packages (from matplotlib) (1.2.0)
Requirement already satisfied: pillow>=6.2.0 in c:\users\win10\.conda\envs\tensorflow2\lib\site-packages (from matplotlib) (7.2.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in c:\users\win10\.conda\envs\tensorflow2\lib\site-packages (from matplotlib) (2.4.7)
Requirement already satisfied: six>=1.5 in c:\users\win10\.conda\envs\tensorflow2\lib\site-packages (from python-dateutil>=2.1->matplotlib) (1.15.0)
In [ ]:
from matplotlib import pyplot as plt
def plot_graphs(history, string):
    plt.plot(history.history[string])
    plt.plot(history.history['val_' + string])
    plt.xlabel("Epochs")
    plt.ylabel(string)
    plt.legend([string, 'val_' + string])
    plt.show()

plot_graphs(history, "accuracy")
plot_graphs(history, "loss")
In [43]:
txt = ["A WeWork shareholder has taken the company to court over the n seq = tokenizer.texts_to_sequences(txt) padded = pad_sequences(seq, maxlen=max_length) pred = model.predict(padded) pred |
||
|
|
|
In [44]:
Out[44]: array([[0.11773098, 0.11342432, 0.05740624, 0.43609414, 0.12342227,
0.15192208]], dtype=float32)
In [48]:
txt = ["A WeWork shareholder has taken the company to court over the n seq = tokenizer.texts_to_sequences(txt) padded = pad_sequences(seq, maxlen=max_length) pred = model.predict(padded) labels = ['sport', 'bussiness', 'politics', 'tech', 'entertainment','u print(pred, labels[np.argmax(pred)]) |
||
|
[[0.11773098 0.11342432 0.05740624 0.43609414 0.12342227 0.15192208]]
tech
np.argmax(pred)
In [49]:
Out[49]: 3
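Since the hard-coded labels list above has to mirror the (frequency-dependent) order that label_tokenizer assigned, an alternative that avoids maintaining a separate list is to read the class name straight from the tokenizer. A small sketch, assuming the label_tokenizer, model and pred from the cells above:

# index_word maps a predicted class index back to the original label string.
predicted_label = label_tokenizer.index_word[np.argmax(pred)]
print(predicted_label)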
In [ ]: