Matching similar strings
$begingroup$
I have a list of conferences on different topics, e.g.
1. Conference on genomics and neurosciences
2. Advances in string theory and astrophysics
3. Genomics and neuroscience: 20 years of research
4. Swiss Physics society meeting on string theory and astrophysics
...
They fall into different classes: for example, 1 and 3 belong together, as do 2 and 4. What is the right tool to group those titles?
nlp
$endgroup$
$begingroup$
You can try this approach: datascience.stackexchange.com/a/35482/54395. It only checks semantic similarity, but might be enough as a start.
$endgroup$
– BrunoGL
Jul 15 '18 at 9:53
asked Jun 25 '18 at 21:46
LazyCat
1 Answer
$begingroup$
I assume you have some labelled training data, i.e. titles that are already linked to a given class? If so, this is supervised learning (as opposed to unsupervised learning), and you could follow these steps:
Step 1: you have words as input, so you will need a method to create numerical representations (vectors). For that you could look into algorithms such as Word2Vec, Doc2Vec, GloVe or something like TF-IDF. If you go for the first, you might consider trying the spaCy library in Python. Here is a tutorial on Word2Vec using spaCy.
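As a rough illustration of Step 1, here is a minimal pure-Python TF-IDF sketch applied to the four example titles from the question. It is not the code from any linked tutorial; the helper names (`tokenize`, `tfidf_vectors`, `cosine`) and the smoothed IDF formula are my own assumptions. In practice you would probably reach for scikit-learn's `TfidfVectorizer` instead.

```python
import math
import re
from collections import Counter

titles = [
    "Conference on genomics and neurosciences",
    "Advances in string theory and astrophysics",
    "Genomics and neuroscience: 20 years of research",
    "Swiss Physics society meeting on string theory and astrophysics",
]

def tokenize(text):
    """Lowercase a title and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (assumed smoothed IDF)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many titles does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

vectors = tfidf_vectors(titles)
# Title 2 should be closer to title 4 than to title 3.
print(cosine(vectors[1], vectors[3]), cosine(vectors[1], vectors[2]))
```

Note that "neuroscience" and "neurosciences" are treated as different terms here; stemming or word embeddings would be needed to bridge that.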
Step 2: once you have a numerical representation for each of your titles, you need to group them somehow. You could do this a few ways. Perhaps the simplest would be a clustering algorithm, e.g. DBSCAN in scikit-learn - here is a demo. Note that clustering is unsupervised, so it does not need labels.
If you do have labels, you could try more complicated methods, such as Support Vector Machines or Neural Networks, but it is probably best to start with a method that gets you some results more quickly. In that case you are classifying titles, so be sure to frame your problem as classification rather than regression.
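To make the grouping step concrete, here is a small sketch that clusters the example titles without any labels. It is deliberately not DBSCAN: it simply links any two titles whose Jaccard word overlap reaches a threshold and returns the connected groups. The stop-word list, the `threshold=0.15` default and all function names are assumptions chosen for this toy data.

```python
import re
from itertools import combinations

titles = [
    "Conference on genomics and neurosciences",
    "Advances in string theory and astrophysics",
    "Genomics and neuroscience: 20 years of research",
    "Swiss Physics society meeting on string theory and astrophysics",
]

STOP_WORDS = {"and", "on", "in", "of", "the"}  # assumed list, extend as needed

def tokens(title):
    """Set of lowercase words in a title, minus stop words."""
    return {w for w in re.findall(r"[a-z]+", title.lower()) if w not in STOP_WORDS}

def jaccard(a, b):
    """Shared words divided by all words appearing in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster(titles, threshold=0.15):
    """Return groups of title indices linked by similarity >= threshold."""
    parent = list(range(len(titles)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    token_sets = [tokens(t) for t in titles]
    for i, j in combinations(range(len(titles)), 2):
        if jaccard(token_sets[i], token_sets[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(titles)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

print(cluster(titles))
```

The threshold is the main knob: too low and everything merges into one group, too high and every title ends up alone.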
Step 3: assess your results and try changing a part of the loop above.
In the above, I assumed you are talking about the semantic meaning of the conference titles, and not similarity between literal word/letter combinations. That could of course be computed analytically, without the use of a model that learns.
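For the literal (non-semantic) case, a quick sketch using only Python's standard library could look like the following. `literal_similarity` is a hypothetical helper name; `difflib.SequenceMatcher` is the real standard-library class doing the work.

```python
from difflib import SequenceMatcher

def literal_similarity(a, b):
    """Similarity in [0, 1] based on matching character blocks, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

print(literal_similarity("neuroscience", "neurosciences"))
print(literal_similarity("genomics", "astrophysics"))
```

This catches near-identical spellings such as singular and plural forms, but it knows nothing about meaning, so two titles on the same topic with different wording will still score low.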
In response to OP's comment:
From my experience, using TF-IDF or something called minimal new sets might be a good way to get your titles into representations that allow clustering. Once clusters are formed, it would be up to you to interpret them and assign labels. If you know that there are e.g. only 10 conference topics, it shouldn't be too difficult to reach results. Have a look at this master's thesis that does a similar thing - instead of conferences, they want to detect topics. Disclaimer: I supervised that thesis.
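One simple way to help with interpreting a cluster is to list the words its titles share most often. The sketch below assumes you already have a cluster as a list of titles; `top_terms` and the stop-word list are hypothetical names and choices of mine, not something taken from the thesis.

```python
import re
from collections import Counter

STOP_WORDS = {"and", "on", "in", "of", "the"}  # assumed list, extend as needed

def top_terms(cluster_titles, k=3):
    """Return the k words that occur in the most titles of one cluster."""
    counts = Counter(
        word
        for title in cluster_titles
        for word in set(re.findall(r"[a-z]+", title.lower()))
        if word not in STOP_WORDS
    )
    # Sort by descending count, then alphabetically, for a stable result.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]

physics_cluster = [
    "Advances in string theory and astrophysics",
    "Swiss Physics society meeting on string theory and astrophysics",
]
print(top_terms(physics_cluster))
```

The returned words are only a hint for a human-chosen label; they are not a label in themselves.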
$endgroup$
$begingroup$
Thank you, I am going over your suggestions. The input is as listed, so there are no labels. A rather naive question: I could, for example, just match words in conference titles: the more words match, the closer the titles are. I would then set a threshold, e.g. if more than 3 words match, declare them conferences on the same topic. There are a number of garbage words like "Conference", "Advances", "Workshop", "Society" and such, which I'll have to ignore. On a heuristic level, what would the relatively advanced tools that you've mentioned give me over this approach?
$endgroup$
– LazyCat
Jun 26 '18 at 13:59
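The word-matching heuristic described in this comment can be sketched in a few lines. The garbage-word list, the `min_shared` parameter and the function names are assumptions for illustration only.

```python
import re

GARBAGE = {"conference", "advances", "workshop", "society", "meeting",
           "and", "on", "in", "of", "the"}  # assumed list of words to ignore

def content_words(title):
    """Lowercase words of a title with the garbage words removed."""
    return {w for w in re.findall(r"[a-z]+", title.lower()) if w not in GARBAGE}

def shared_words(a, b):
    """Words that two titles have in common."""
    return content_words(a) & content_words(b)

def same_topic(a, b, min_shared=3):
    """Declare two titles the same topic if enough words match."""
    return len(shared_words(a, b)) >= min_shared

t1 = "Conference on genomics and neurosciences"
t2 = "Advances in string theory and astrophysics"
t3 = "Genomics and neuroscience: 20 years of research"
t4 = "Swiss Physics society meeting on string theory and astrophysics"

print(same_topic(t2, t4), shared_words(t2, t4))
print(same_topic(t1, t3), shared_words(t1, t3))
```

On the example titles this pairs 2 with 4, but it misses 1 and 3 at a threshold of three words, because short titles share few words and "neuroscience" does not literally match "neurosciences". Weighted or embedding-based representations are one way to soften that kind of mismatch.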
$begingroup$
@LazyCat - see my edit.
$endgroup$
– n1k31t4
Jun 26 '18 at 14:31
$begingroup$
@LazyCat you can usually turn unsupervised data into supervised. Is there a reason why you can't take a sample of your data and label it and proceed accordingly? That will give you the results you seek in an efficient, algorithmic approach.
$endgroup$
– I_Play_With_Data
Sep 24 '18 at 21:43
$begingroup$
@UnknownCoder It is easy to turn supervised into unsupervised, but this is the first time I hear about the other way around.
$endgroup$
– LazyCat
Sep 25 '18 at 0:15
$begingroup$
@LazyCat Sure, it happens all the time and is a perfectly acceptable practice. In fact, you can "bootstrap" your way into a model. Label a few dozen records, train a model and then use the model to run predictions. Then check the predictions by hand and add the correct ones to your training set. Re-train the model with the newly expanded training set, use the model to run predictions, etc, etc. Before you know it you will have a training set with a pretty good number of labeled examples.
$endgroup$
– I_Play_With_Data
Sep 26 '18 at 18:51
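The bootstrapping loop from this comment could be sketched as below. Everything here is an assumption made for illustration: the word-vote "model" is a toy stand-in for a real classifier, and the `review` callback stands in for the human who checks each prediction.

```python
import re
from collections import Counter, defaultdict

def words(title):
    """Set of lowercase words in a title."""
    return set(re.findall(r"[a-z]+", title.lower()))

def train(labeled):
    """Count how often each word appears under each label."""
    model = defaultdict(Counter)
    for title, label in labeled:
        for w in words(title):
            model[w][label] += 1
    return model

def predict(model, title):
    """Label with the most word votes, or None if no word is known."""
    votes = Counter()
    for w in words(title):
        votes.update(model.get(w, {}))
    return votes.most_common(1)[0][0] if votes else None

def bootstrap(labeled, unlabeled, review, rounds=3):
    """Grow the labeled set with predictions that the reviewer accepts."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        model = train(labeled)
        still_unlabeled = []
        for title in unlabeled:
            guess = predict(model, title)
            if guess is not None and review(title, guess):
                labeled.append((title, guess))
            else:
                still_unlabeled.append(title)
        if len(still_unlabeled) == len(unlabeled):
            break  # nothing new was accepted in this round
        unlabeled = still_unlabeled
    return labeled, unlabeled

seed = [
    ("Conference on genomics and neurosciences", "bio"),
    ("Advances in string theory and astrophysics", "physics"),
]
pool = [
    "Genomics and neuroscience: 20 years of research",
    "Swiss Physics society meeting on string theory and astrophysics",
]
# Simulated human check: the reviewer knows the true label of each title.
truth = {pool[0]: "bio", pool[1]: "physics"}
final_labeled, remaining = bootstrap(seed, pool, lambda t, g: truth[t] == g)
print(len(final_labeled), remaining)
```

Titles whose prediction is rejected simply stay in the unlabeled pool for the next round, when the retrained model may do better.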
edited Jun 26 '18 at 14:34
answered Jun 25 '18 at 22:46
n1k31t4