Convolutional Network for Text Classification
I am trying to train a convolutional neural network with Keras to recognize tags for Stack Exchange questions about cooking.



An element of my dataset looks like this:



id                                                         2
title                    How should I cook bacon in an oven?
content     <p>I've heard of people cooking bacon in an ov...
tags                                 oven cooking-time bacon
Name: 1, dtype: object


I removed the HTML tags with BeautifulSoup and stripped punctuation as well.
Since the question bodies are very long, I decided to focus on the titles.
I used sklearn's CountVectorizer to vectorize the words in the titles. However, that gave more than 8000 words (excluding stop words), so I decided to apply part-of-speech tagging and keep only nouns and gerunds.



import nltk
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(stop_words='english')
titles = dataframes['cooking']['title']

# Keep only the nouns (NN, NNS) and gerunds (VBG) from each title
pos_titles = []
for i, title in enumerate(titles):
    pos = []
    pt_titl = nltk.pos_tag(word_tokenize(title))
    for pt in pt_titl:
        if pt[1] == 'NN' or pt[1] == 'NNS' or pt[1] == 'VBG':  # or pt[1]=='VBP' or pt[1]=='VBS':
            pos.append(pt[0])
    pos_titles.append(" ".join(pos))


This represents my input. I vectorized the tags too and extracted dense matrices for both the inputs and the tags:



tags = [" ".join(x) for x in dataframes['cooking']['tags']]

X = vectorizer.fit_transform(pos_titles)  # (restored; this step was omitted above)
Xd = X.todense()

Y = vectorizer.fit_transform(tags)
Yd = Y.todense()


Then I split the data into training and validation sets:



from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(Xd, Yd, test_size=0.33, random_state=42)


Now I am trying to train a Conv1D network:



from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D, Conv1D, Embedding, GlobalMaxPooling1D, Dropout, MaxPooling1D

model = Sequential()

model.add(Embedding(Xd.shape[1],
                    128,
                    input_length=Xd.shape[1]))
model.add(Conv1D(32, 5, activation='relu'))
model.add(MaxPooling1D(100, stride=50))
model.add(Conv1D(32, 5, activation='relu'))
model.add(GlobalMaxPooling1D())
model.add(Dense(Yd.shape[1], activation='softmax'))

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=32, verbose=1)


But it gets stuck at a very low accuracy, and the loss barely changes across the epochs:



Epoch 1/10
10320/10320 [==============================] - 401s - loss: 15.8098 - acc: 0.0604
Epoch 2/10
10320/10320 [==============================] - 339s - loss: 15.5671 - acc: 0.0577
Epoch 3/10
10320/10320 [==============================] - 314s - loss: 15.5509 - acc: 0.0578
Epoch 4/10
10320/10320 [==============================] - 34953s - loss: 15.5493 - acc: 0.0578
Epoch 5/10
10320/10320 [==============================] - 323s - loss: 15.5587 - acc: 0.0578
Epoch 6/10
6272/10320 [=================>............] - ETA: 133s - loss: 15.6005 - acc: 0.0550









machine-learning deep-learning nlp keras






asked Mar 20 '17 at 6:59 by Sindico
Take a look through the processed data in human-readable form, i.e. just the filtered and stemmed nouns and gerunds, and test yourself: how well do you assign the correct tags? For comparison, try again when faced with just the titles. This is a sanity check on whether the task is achievable at all: humans are naturally good at language processing, but the data you have collected, or the way you have simplified it, may have made the task near impossible. If you can still do it, then it is time to consider whether your model needs changes. – Neil Slater, Mar 20 '17 at 7:51
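A quick way to run that check (a minimal sketch; it assumes the pos_titles and tags lists built in the question are in scope):

import random

# Print a few processed titles next to their true tags, so a human
# can try to guess the tags from the filtered nouns/gerunds alone.
for idx in random.sample(range(len(pos_titles)), 5):
    print("processed:", pos_titles[idx])
    print("tags:     ", tags[idx])
    print()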














1 Answer


















This wildml blog post has a very clear explanation of how to use 1D convolution on text, and Debo, a data scientist at x.ai, provided some example Keras code for classifying text with a character-based model (the input documents are sequences of one-hot-encoded characters rather than words or POS tags):



from keras.models import Model
from keras.layers import Input, Dense, Dropout, Flatten
from keras.layers.convolutional import Convolution1D, MaxPooling1D

inputs = Input(shape=(maxlen, vocab_size), name='input', dtype='float32')
conv = Convolution1D(nb_filter=nb_filter, filter_length=filter_kernels[0],
                     border_mode='valid', activation='relu',
                     input_shape=(maxlen, vocab_size))(inputs)
conv = MaxPooling1D(pool_length=3)(conv)
conv1 = Convolution1D(nb_filter=nb_filter, filter_length=filter_kernels[1],
                      border_mode='valid', activation='relu')(conv)
conv1 = MaxPooling1D(pool_length=3)(conv1)
conv2 = Convolution1D(nb_filter=nb_filter, filter_length=filter_kernels[2],
                      border_mode='valid', activation='relu')(conv1)
conv3 = Convolution1D(nb_filter=nb_filter, filter_length=filter_kernels[3],
                      border_mode='valid', activation='relu')(conv2)
conv4 = Convolution1D(nb_filter=nb_filter, filter_length=filter_kernels[4],
                      border_mode='valid', activation='relu')(conv3)
conv5 = Convolution1D(nb_filter=nb_filter, filter_length=filter_kernels[5],
                      border_mode='valid', activation='relu')(conv4)
conv5 = MaxPooling1D(pool_length=3)(conv5)
conv5 = Flatten()(conv5)
z = Dropout(0.5)(Dense(dense_outputs, activation='relu')(conv5))
z = Dropout(0.5)(Dense(dense_outputs, activation='relu')(z))

pred = Dense(n_out, activation='softmax', name='output')(z)
model = Model(input=inputs, output=pred)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop',
              metrics=['accuracy'])


The last 3 lines are important. You can't use softmax on your output, and you can't use categorical_crossentropy, for multi-label tagging (your problem). Either break the tagging task down into multiple binary classification problems, or use a different loss function such as binary_crossentropy; and with binary_crossentropy, use a sigmoid activation rather than softmax on the output. See this SO answer for details on multi-label tagging problems in Keras and TensorFlow.
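Applied to the model in the question, the change would look roughly like this (a minimal sketch of the suggested fix, reusing the question's variables; not a tested drop-in):

# One independent sigmoid unit per tag, so several tags can be active
# at once; binary_crossentropy scores each tag as its own binary
# classification problem.
model.add(Dense(Yd.shape[1], activation='sigmoid'))

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])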



If you want a more thorough explanation, check out Chapter 7 in my book, NLP In Action.






answered Jan 9 at 19:19 by hobs