Convolutional Network for Text Classification
I am trying to train a convolutional neural network with Keras to recognize tags for Stack Exchange questions about cooking.
Each question in my dataset looks like this:
id 2
title How should I cook bacon in an oven?
content <p>I've heard of people cooking bacon in an ov...
tags oven cooking-time bacon
Name: 1, dtype: object
I have stripped the HTML tags with BeautifulSoup and removed punctuation as well.
Since the question bodies are very long, I have decided to focus on the titles.
I used sklearn's CountVectorizer to vectorize the words in the titles. However, there were more than 8000 words (excluding stop words), so I decided to apply part-of-speech tagging and keep only nouns and gerunds:
import nltk
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(stop_words='english')
titles = dataframes['cooking']['title']
pos_titles = []
for title in titles:
    pos = []
    pt_titl = nltk.pos_tag(word_tokenize(title))
    for pt in pt_titl:
        if pt[1] == 'NN' or pt[1] == 'NNS' or pt[1] == 'VBG':  # or pt[1]=='VBP' or pt[1]=='VBS'
            pos.append(pt[0])
    pos_titles.append(" ".join(pos))
This gives my input vectors. I have vectorized the tags too and extracted dense matrices for both the inputs and the tags:
tags = [" ".join(x) for x in dataframes['cooking']['tags']]
# X (not shown above) is presumably the vectorized titles, e.g.:
# X = vectorizer.fit_transform(pos_titles)
Xd = X.todense()
Y = vectorizer.fit_transform(tags)
Yd = Y.todense()
Then I split the data into training and validation sets:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(Xd, Yd, test_size=0.33, random_state=42)
Now I am trying to train a Conv1D network:
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D, Conv1D, Embedding, GlobalMaxPooling1D, Dropout, MaxPooling1D

model = Sequential()
model.add(Embedding(Xd.shape[1],
                    128,
                    input_length=Xd.shape[1]))
model.add(Conv1D(32, 5, activation='relu'))
model.add(MaxPooling1D(100, stride=50))
model.add(Conv1D(32, 5, activation='relu'))
model.add(GlobalMaxPooling1D())
model.add(Dense(Yd.shape[1], activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=32, verbose=1)
But it gets stuck at a very low accuracy, and the loss barely decreases across the epochs:
Epoch 1/10
10320/10320 [==============================] - 401s - loss: 15.8098 - acc: 0.0604
Epoch 2/10
10320/10320 [==============================] - 339s - loss: 15.5671 - acc: 0.0577
Epoch 3/10
10320/10320 [==============================] - 314s - loss: 15.5509 - acc: 0.0578
Epoch 4/10
10320/10320 [==============================] - 34953s - loss: 15.5493 - acc: 0.0578
Epoch 5/10
10320/10320 [==============================] - 323s - loss: 15.5587 - acc: 0.0578
Epoch 6/10
6272/10320 [=================>............] - ETA: 133s - loss: 15.6005 - acc: 0.0550
machine-learning deep-learning nlp keras
asked Mar 20 '17 at 6:59 by Sindico
Comment – Neil Slater (Mar 20 '17 at 7:51): Take a look through the processed data in human-readable form, i.e. just the filtered and stemmed nouns and gerunds, and test yourself: how well do you assign the correct tags? For comparison, try the same with just the raw titles. This is a sense check of whether the task is achievable at all. Humans are naturally good at language processing, but the data you have collected, or the way you have simplified it, may have made the task near impossible. If you can still do it, then it is time to consider whether your model needs changes.
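A quick way to run this sense check might look like the following. This is a hypothetical snippet, assuming the pos_titles and tags lists built in the question's code:

import random

# Hypothetical sketch: pos_titles and tags come from the question's code above.
# Print a few filtered titles next to their true tags, so a human can judge
# whether the tags are still guessable from the reduced input.
for i in random.sample(range(len(pos_titles)), 10):
    print("input:", pos_titles[i])
    print("tags :", tags[i])
    print("-" * 40)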
1 Answer
This WildML blog post has a very clear explanation of how to use 1D convolutions on text. And Debo, a data scientist at x.ai, provided some example Keras code for classifying text with a character-based model (the input documents are sequences of one-hot encoded characters rather than words or POS tags):
from keras.models import Model
from keras.layers import Input, Dense, Dropout, Flatten
from keras.layers.convolutional import Convolution1D, MaxPooling1D

inputs = Input(shape=(maxlen, vocab_size), name='input', dtype='float32')
conv = Convolution1D(nb_filter=nb_filter, filter_length=filter_kernels[0],
                     border_mode='valid', activation='relu',
                     input_shape=(maxlen, vocab_size))(inputs)
conv = MaxPooling1D(pool_length=3)(conv)
conv1 = Convolution1D(nb_filter=nb_filter, filter_length=filter_kernels[1],
                      border_mode='valid', activation='relu')(conv)
conv1 = MaxPooling1D(pool_length=3)(conv1)
conv2 = Convolution1D(nb_filter=nb_filter, filter_length=filter_kernels[2],
                      border_mode='valid', activation='relu')(conv1)
conv3 = Convolution1D(nb_filter=nb_filter, filter_length=filter_kernels[3],
                      border_mode='valid', activation='relu')(conv2)
conv4 = Convolution1D(nb_filter=nb_filter, filter_length=filter_kernels[4],
                      border_mode='valid', activation='relu')(conv3)
conv5 = Convolution1D(nb_filter=nb_filter, filter_length=filter_kernels[5],
                      border_mode='valid', activation='relu')(conv4)
conv5 = MaxPooling1D(pool_length=3)(conv5)
conv5 = Flatten()(conv5)
z = Dropout(0.5)(Dense(dense_outputs, activation='relu')(conv5))
z = Dropout(0.5)(Dense(dense_outputs, activation='relu')(z))
pred = Dense(n_out, activation='softmax', name='output')(z)
model = Model(input=inputs, output=pred)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop',
              metrics=['accuracy'])
The last 3 lines are the important ones. You can't use softmax on your output, and you can't use categorical_crossentropy as the loss, for multi-label tagging (your problem). Your text tagging problem should either be broken down into multiple binary classification problems, or you need a different loss function such as binary_crossentropy. And with binary_crossentropy, use a sigmoid activation function rather than softmax on the output. See this SO answer for details on multi-label tagging problems in Keras and TF.
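As a rough illustration, here is a minimal sketch of that advice applied to the question's X_train/y_train matrices. It is an assumption-laden example, not the asker's fixed model: it uses a plain Dense layer on the bag-of-words counts (an Embedding layer expects integer token indices, not count vectors) and simply swaps in sigmoid and binary_crossentropy:

from keras.models import Sequential
from keras.layers import Dense, Dropout

# Sketch only: X_train/y_train are the question's matrices; the Dense
# hidden layer replacing the Embedding/Conv stack is an assumption.
# Multi-hot tag targets get one independent sigmoid per tag, so each tag
# becomes its own binary yes/no decision.
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(X_train.shape[1],)))
model.add(Dropout(0.5))
model.add(Dense(y_train.shape[1], activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',  # per-tag binary loss
              metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=32, verbose=1)

At prediction time you would threshold each sigmoid output (e.g. at 0.5) to decide which tags to assign.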
If you want a more thorough explanation, check out Chapter 7 in my book, NLP In Action.
answered Jan 9 at 19:19 by hobs