Process mining with ML
I have a somewhat general question. My dataset consists of N sequences of events. One sequence might be [A,B,C,D,X,Y] and another [A,B,Z], where each letter represents a different event. The sequences are at most 80 steps long.
The idea is to predict the next event from the known previous events. As a very simple example, perhaps B always follows A. The next step would be to measure the time of each event; the ultimate goal is to predict how long it will take until the process reaches a specific event.
I tried an N-gram model, an MLP neural network, and lastly an LSTM network, which reached around 80% accuracy.
That would not be bad if the events were balanced in the dataset, but they are not. To account for the imbalance I used a weighted loss function when training the LSTM, and the overall accuracy is then around 66%. However, the less frequent classes have much higher accuracy (still not perfect, but higher). How can I create a model that gets the best of both, i.e. one that learns the less frequent and the most frequent classes at the same time?
I have also read that tree-based methods perform very well on unbalanced datasets. However, all the examples I found consider one long time series, whereas my data are many short time series. Is it possible to train a RandomForest on such data? How?
If you know of a different algorithm or method that could be applied to such data, please post it :)
Thank you.
machine-learning lstm sequential-pattern-mining
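To make the setup above concrete, here is a minimal sketch (not the asker's actual code) of how many short sequences can be expanded into fixed-width (prefix, next event) rows, and how inverse-frequency class weights of the kind used in a weighted loss might be derived. The function names, the window width, the padding symbol, and the toy sequences are all assumptions chosen for illustration.

```python
from collections import Counter

PAD = "_"  # hypothetical padding symbol for prefixes shorter than the window


def make_prefix_rows(sequences, window=3):
    """Expand each sequence into (last `window` events, next event) pairs."""
    rows = []
    for seq in sequences:
        for i in range(1, len(seq)):
            prefix = seq[max(0, i - window):i]
            # Left-pad so every row has exactly `window` columns.
            padded = [PAD] * (window - len(prefix)) + list(prefix)
            rows.append((tuple(padded), seq[i]))
    return rows


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


# Toy data modelled on the two example sequences from the question.
sequences = [list("ABCDXY"), list("ABZ")]
rows = make_prefix_rows(sequences, window=3)
labels = [target for _, target in rows]
weights = inverse_frequency_weights(labels)

for features, target in rows:
    print(features, "->", target)
print(weights)
```

Rows of this shape, once the symbols are encoded as integers or one-hot vectors, are the kind of tabular input a tree-based classifier expects, so many short sequences can be pooled into a single training table. Whether a fixed window keeps enough history for a given process is something that would have to be checked on the real data.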
asked Aug 15 '18 at 20:46
Matúš Košík
1 Answer
I suspect that the problem has more to do with your data than with your algorithms. I recommend spending some time studying your data and making sure it is a robust representation of the kinds of problems you expect to solve. If possible, come up with a way to generate extra data. Since you already have many permutations, you could perhaps write a script that creates additional permutations by modifying existing samples according to rules you know to hold.
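The suggestion to generate extra data could look something like the sketch below. The rule used here (certain pairs of adjacent events may occur in either order) is purely hypothetical, as are the names `SWAPPABLE` and `augment`; real rules would have to come from knowledge of the actual process.

```python
import random

# Hypothetical domain rule: these adjacent event pairs may occur in either order.
SWAPPABLE = {("C", "D"), ("X", "Y")}


def augment(sequence, rng):
    """Return a copy of `sequence` with one randomly chosen swappable pair reversed.

    Returns None when no rule applies to the sequence.
    """
    candidates = [
        i for i in range(len(sequence) - 1)
        if (sequence[i], sequence[i + 1]) in SWAPPABLE
    ]
    if not candidates:
        return None
    i = rng.choice(candidates)
    new_seq = list(sequence)
    new_seq[i], new_seq[i + 1] = new_seq[i + 1], new_seq[i]
    return new_seq


rng = random.Random(0)  # fixed seed so the sketch is reproducible
original = [list("ABCDXY"), list("ABZ")]
augmented = [s for s in (augment(seq, rng) for seq in original) if s is not None]
print(augmented)
```

Sequences produced this way would normally be added to the training split only, so that evaluation still reflects the real data.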
answered Aug 15 '18 at 21:22
David Shapiro