Time series feature extraction from raw sensor data for classification?
I have tabular raw sensor data with an associated label. I want to extract features such as the mean, max, min, and standard deviation from all of the sensor columns, then write them to a new table or export them to a CSV file so that I can run a classification task on the result.
[Image: data table]

Tags: time-series, feature-selection, feature-extraction, multiclass-classification, feature-engineering
asked May 28 '18 at 20:38 by rosy (edited May 29 '18 at 19:16)
Could you be more precise about what your problem is and which tools or languages you plan to use?
– El Burro
May 29 '18 at 11:40
I am using Python and want a data table with columns like data1_mean, data1_max, data1_min, etc. Also, how should I transform the label?
– rosy
May 29 '18 at 19:07
4 Answers
For clarification: mean, max, min, and std are not specifically "time series features"; they are general descriptive statistics that apply to any data.
Assuming you want to do this in Python, take a look at the pandas.DataFrame class. Once you initialize a DataFrame object with your tabular data, you can call its methods DataFrame.min(), DataFrame.max(), DataFrame.mean(), and DataFrame.std().
You can then insert these calculated statistics into a new DataFrame and call DataFrame.to_csv() to export them to a CSV file.
answered May 29 '18 at 8:38 by pcko1
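The approach in this answer can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the column names (data1, data2, label) and the values are made up, and the output naming (data1_mean, data1_max, ...) simply follows the pattern the asker mentioned in the comments.

```python
import pandas as pd

# Hypothetical sensor table; column names and values are assumptions.
df = pd.DataFrame({
    "data1": [1.0, 2.0, 3.0, 4.0],
    "data2": [10.0, 20.0, 30.0, 40.0],
    "label": [0, 0, 1, 1],
})

sensor_cols = ["data1", "data2"]

# agg() with a list returns one row per statistic and one column per sensor.
stats = df[sensor_cols].agg(["mean", "max", "min", "std"])

# Flatten into a single row with names like data1_mean, data1_max, ...
features = {
    f"{col}_{stat}": stats.loc[stat, col]
    for col in sensor_cols
    for stat in stats.index
}
feature_table = pd.DataFrame([features])

# With no path, to_csv() returns the CSV text; pass a filename to write a file.
csv_text = feature_table.to_csv(index=False)
print(csv_text)
```

Note that this summarizes the whole table into a single row. For classification you would normally want one row per time window, which is what the comments below discuss.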
Thanks, but I want a rolling window of 10 seconds with 50% overlap. Also, what about the label?
– rosy
May 29 '18 at 19:05
This was not in the original question. Anyway, you can always write a for-loop that goes through the samples in 10-second batches with 50% overlap, assign each batch to a DataFrame object inside the loop, and then call the appropriate methods on that batch. It should work smoothly.
– pcko1
May 29 '18 at 19:17
Also check this: pandas.pydata.org/pandas-docs/stable/generated/… :)
– pcko1
May 29 '18 at 19:37
Thanks, but if I loop across the dataset, what about the label for each window? Should I use the label with the highest frequency in that window?
– rosy
May 29 '18 at 19:41
Since the label is binary, you cannot average it into a real number; the result would not make sense. Your best choice is the "majority vote" of the window, i.e. the label with the highest frequency, as you mentioned.
– pcko1
May 29 '18 at 19:44
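The loop and majority vote discussed in these comments might look something like the sketch below. Several things here are assumptions for illustration only: a fixed 1 Hz sampling rate (so a 10-second window is 10 rows), the column names time_s, data1, and label, and the synthetic values. Windows are taken by row position, so only complete windows are kept.

```python
import pandas as pd

# Hypothetical data: 30 s of readings at an assumed 1 Hz, with a binary label.
df = pd.DataFrame({
    "time_s": list(range(30)),
    "data1": [float(i) for i in range(30)],
    "label": [0] * 12 + [1] * 18,
})

sampling_rate_hz = 1                   # assumed; use your sensor's actual rate
window_size = 10 * sampling_rate_hz    # 10-second window, in samples
step = window_size // 2                # 50% overlap

sensor_cols = ["data1"]
rows = []
for start in range(0, len(df) - window_size + 1, step):
    batch = df.iloc[start:start + window_size]
    row = {"window_start_s": batch["time_s"].iloc[0]}
    for col in sensor_cols:
        row[f"{col}_mean"] = batch[col].mean()
        row[f"{col}_max"] = batch[col].max()
        row[f"{col}_min"] = batch[col].min()
        row[f"{col}_std"] = batch[col].std()
    # Majority vote: mode() returns the most frequent value(s) in sorted
    # order, so a tie resolves to the smallest label.
    row["label"] = batch["label"].mode().iloc[0]
    rows.append(row)

features = pd.DataFrame(rows)
print(features)
```

If the sampling rate is irregular, slicing by row position no longer corresponds to 10 seconds, and the windows would need to be selected by timestamp instead.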
I don't have enough reputation to leave a comment, but could you please provide some sample data so that we can help you better?
When you say mean, max, and min, are you trying to aggregate multiple rows of data on a date column with these functions? Or do you have a timespan, datetime, or timestamp column that you want to use?
answered May 28 '18 at 22:16 by The Lyrist
Perhaps you should look at this self-contained blog post, "Machine Learning with Signal Processing Techniques", which shows how to prepare time series data and extract useful statistical estimates and features for machine learning models. It ends with a classification example. I found it very useful and straightforward.
Somewhere in the middle of the post, a great method for the detection of peaks in data is introduced as well.
answered May 29 '18 at 8:24 by Majid Mortazavi
You can look at the tsfresh repository on GitHub. It extracts time series features from sensor logs.
answered 12 mins ago by Fahad Ali Sarwar