Time series feature extraction from raw sensor data for classification?












I have tabular raw sensor data with an associated label, and I want to extract time series features such as mean, max, min, and std from all of the sensor columns. I then want to put the results in a new table (or export them to a CSV file) so that I can run a classification task on that data.



[Image: data table]







time-series feature-selection feature-extraction multiclass-classification feature-engineering














edited May 29 '18 at 19:16 by rosy
asked May 28 '18 at 20:38 by rosy








  • Could you be more precise about what your problem is and which tools/languages you are planning to use?
    – El Burro
    May 29 '18 at 11:40

  • I am using Python, and I want a data table with columns like data1_mean, data1_max, data1_min, etc. Also, how should I transform the label?
    – rosy
    May 29 '18 at 19:07
















4 Answers













For clarification: mean, max, min, and std are not "time series features" specifically; they are general features of any data.

Assuming that you want to do this in Python, you should take a look at the pandas.DataFrame class. Once you initialize a DataFrame object with your tabular data, you can call its methods DataFrame.min(), DataFrame.max(), DataFrame.mean(), and DataFrame.std() for your purpose.

You can insert all of these calculated characteristics into a new DataFrame and then call DataFrame.to_csv() to export them to a CSV file.
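As a minimal sketch of the approach described above: the column names `sensor1`, `sensor2`, and `label`, the toy values, and the output file name `features.csv` are assumptions made for illustration, not details from the question. `DataFrame.agg` computes the four statistics per sensor column in one call, and the result is flattened into a single feature row with names like `sensor1_mean`.

```python
import pandas as pd

# Hypothetical raw data: two sensor columns plus a label column.
df = pd.DataFrame({
    "sensor1": [1.0, 2.0, 3.0, 4.0],
    "sensor2": [10.0, 20.0, 30.0, 40.0],
    "label": [0, 0, 1, 1],
})

sensor_cols = ["sensor1", "sensor2"]

# One row per statistic, one column per sensor.
stats = df[sensor_cols].agg(["mean", "max", "min", "std"])

# Flatten into a single row with names like sensor1_mean, sensor1_max, ...
features = {
    f"{col}_{stat}": stats.loc[stat, col]
    for col in sensor_cols
    for stat in stats.index
}
feature_table = pd.DataFrame([features])

# Export the feature table (hypothetical file name).
feature_table.to_csv("features.csv", index=False)
```

This produces one feature row for the whole table; for per-window features, the same aggregation would be applied to each window separately.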

















answered May 29 '18 at 8:38 by pcko1












  • Thanks, but I want a rolling window of 10 seconds with 50% overlap. Also, what about the label?
    – rosy
    May 29 '18 at 19:05

  • This was not in the original question. Anyway, you can always write a for-loop that goes through all samples in batches of 10 s with 50% overlap, assign each batch to a DataFrame object inside the loop, and then call the appropriate methods on that batch. It will work smoothly.
    – pcko1
    May 29 '18 at 19:17

  • Also check this: pandas.pydata.org/pandas-docs/stable/generated/… :)
    – pcko1
    May 29 '18 at 19:37

  • Thanks, but if I loop across the dataset, what about the label corresponding to each window? Should I use the label with the highest frequency in that window?
    – rosy
    May 29 '18 at 19:41

  • Since the label is binary, you cannot "average" it into a real number because that would not make sense. Your best choice is the "majority vote" of the window, which is the label with the highest frequency, as you mentioned.
    – pcko1
    May 29 '18 at 19:44
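The windowing discussed in these comments can be sketched as follows. This is an illustration under stated assumptions, not code from the thread: it assumes a known, constant sampling rate (`fs_hz`, set to 10 Hz here), hypothetical column names `sensor1` and `label`, and it drops any trailing partial window. Each 10-second window advances by half its length (50% overlap), and the window label is the most frequent label inside it.

```python
import numpy as np
import pandas as pd


def window_features(df, sensor_cols, label_col, window_size, step):
    """Compute mean/max/min/std per sensor over sliding windows of rows.

    window_size and step are in samples; step = window_size // 2 gives
    50% overlap. The window label is the majority vote; on a tie the
    smallest label wins, because Series.mode() returns sorted values.
    """
    rows = []
    for start in range(0, len(df) - window_size + 1, step):
        window = df.iloc[start:start + window_size]
        row = {}
        for col in sensor_cols:
            row[f"{col}_mean"] = window[col].mean()
            row[f"{col}_max"] = window[col].max()
            row[f"{col}_min"] = window[col].min()
            row[f"{col}_std"] = window[col].std()
        # Majority vote: most frequent label in the window.
        row[label_col] = window[label_col].mode().iloc[0]
        rows.append(row)
    return pd.DataFrame(rows)


# Hypothetical data: 30 s sampled at an assumed 10 Hz; label switches at 12 s.
fs_hz = 10
n = 30 * fs_hz
raw = pd.DataFrame({
    "sensor1": np.arange(n, dtype=float),
    "label": [0] * (12 * fs_hz) + [1] * (n - 12 * fs_hz),
})

window_size = 10 * fs_hz   # 10-second window
step = window_size // 2    # 50% overlap
features = window_features(raw, ["sensor1"], "label", window_size, step)
```

With these toy values the 300 samples yield five windows, and the resulting `features` table can be exported with `to_csv` in the same way as any other DataFrame.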


















I don't have enough reputation to leave a comment, but could you please provide some sample data so that we can help you better?

When you say mean, max, and min, are you trying to aggregate multiple rows of data on a date column with these functions? Or do you have a timespan/datetime/timestamp column that you want to use?

















answered May 28 '18 at 22:16 by The Lyrist























Perhaps you should look at this self-contained blog post, Machine Learning with Signal Processing Techniques, which covers how to prepare your time series data and extract useful statistical estimates and features for machine learning models. At the end, an example is given for classification. I found it super useful and straightforward.

Somewhere in the middle of the post, this great method for the Detection of peaks in data is introduced as well.
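As a rough illustration of a peak-based feature: the sketch below uses a simple local-maximum rule written for this example, not the peak-detection method the answer links to. The function name `peak_features` and the `min_height` parameter are hypothetical. The peak count and mean peak height of a window could be added as extra feature columns next to mean, max, min, and std.

```python
import numpy as np


def peak_features(signal, min_height=None):
    """Return (peak_count, mean_peak_height) for a 1-D signal.

    A sample counts as a peak if it is strictly greater than both of its
    neighbours and, when min_height is given, at least min_height.
    """
    x = np.asarray(signal, dtype=float)
    if x.size < 3:
        return 0, float("nan")
    interior = x[1:-1]
    is_peak = (interior > x[:-2]) & (interior > x[2:])
    if min_height is not None:
        is_peak &= interior >= min_height
    heights = interior[is_peak]
    mean_height = float(heights.mean()) if heights.size else float("nan")
    return int(heights.size), mean_height


# Toy signal with local maxima of height 1, 2 and 3.
count, mean_height = peak_features([0, 1, 0, 2, 0, 3, 0])
count_tall, mean_tall = peak_features([0, 1, 0, 2, 0, 3, 0], min_height=2)
```

Real sensor data is usually noisy, so a rule this simple would likely need smoothing or a height threshold before the counts are meaningful.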

















answered May 29 '18 at 8:24 by Majid Mortazavi























You can look at the tsfresh repository on GitHub. It extracts time series features from sensor logs.
















answered 12 mins ago by Fahad Ali Sarwar

































