NLP: Fuzzy Word/Phrase Match

I am trying to determine whether a given phrase (or a few words) is present in a relatively large text. For example:

Text:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce sed
tristique purus, id lobortis justo. Vestibulum ante ipsum primis in
faucibus orci luctus et ultrices posuere cubilia Curae; Cras vitae
neque non nibh elementum malesuada convallis et nunc. Nam vel tellus
nec nunc dictum dignissim eu ut felis. In Tony Starkeget efficitur
nunc. Cras ultrices turpis est, ac eleifend leo congue at. Donec lorem
diam, mattis sed sollicitudin ac, tincidunt eu sem. Curabitur vel
euismod lectus, sit amet tempor massa. Vivamus ut dictum nisl. Aliquam
et urna sit amet urna hendrerit tincidunt in a mauris. Class aptent
taciti sociosqu ad litora torquent per conubia nostra, per inceptos
himenaeos. Maecenas vel justo metus. Sed gravida egestas velit,
porttitor pulvinar justo hendrerit et.

Phrases/words to match in the text above:

tony.stark
t.stark
stark_tony
starktony

The intention is to infer whether a person (Tony Stark) is mentioned in a block of text.

I have read up on fuzzy word-matching algorithms such as Levenshtein distance and Soundex, and tested them on this problem, but they appear to be suited to matching one word against one word. They do not handle this case, where various permutations of "Tony Stark" are possible in both the pattern and the text.

Could anyone advise which fuzzy matching algorithms would suit this application, and perhaps share resources and sample code for implementing them?

Thanks.
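
To make the problem concrete, here is a minimal sketch (not part of the original post) of one possible way to compare a handle such as `stark_tony` against multi-word spans of a text. It assumes Python and uses only the standard library: the handle and the text are both reduced to lowercase alphanumeric tokens, a window of consecutive text tokens is joined in every order, and `difflib.SequenceMatcher` scores each candidate. The helper names `tokens` and `best_match`, the window size of 2, and any cut-off applied to the score are illustrative assumptions that would need tuning on real data.

```python
import re
from difflib import SequenceMatcher
from itertools import permutations


def tokens(s):
    """Lowercase the string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", s.lower())


def best_match(handle, text, window=2):
    """Return (score, window_tokens) for the text window most similar to handle.

    The handle is squashed into one string ("tony.stark" -> "tonystark") and
    compared with every ordering of each window of consecutive text tokens,
    so "stark_tony" and "starktony" can still line up with "Tony Stark".
    """
    target = "".join(tokens(handle))
    words = tokens(text)
    best = (0.0, ())
    for i in range(len(words) - window + 1):
        chunk = words[i:i + window]
        for perm in permutations(chunk):
            score = SequenceMatcher(None, target, "".join(perm)).ratio()
            if score > best[0]:
                best = (score, tuple(chunk))
    return best


if __name__ == "__main__":
    sample = "Nam vel tellus nec nunc dictum. In Tony Starkeget efficitur nunc."
    for handle in ["tony.stark", "t.stark", "stark_tony", "starktony"]:
        score, span = best_match(handle, sample)
        print(f"{handle:12s} best span={span} score={score:.2f}")
```

A score near 1.0 suggests the handle and the span are nearly the same string once separators and word order are ignored. Abbreviated handles such as `t.stark` score lower, so a single global threshold is unlikely to suit every variant.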

















Tags: nlp

asked 8 mins ago by KohKoh
