Process mining with ML

I have a somewhat general question. My dataset consists of N sequences of events. One sequence could be, for example, [A,B,C,D,X,Y] and another [A,B,Z], where the letters represent different events. The sequences are at most 80 steps long.



The idea is to predict the next event from the known previous events; in a very simple case, A might always be followed by B. The next step would be to measure the duration of each event, and the ultimate goal is to predict how long it will take until the process reaches a specific event.
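To make the task concrete, a minimal first-order Markov (bigram) baseline over such traces could look like the sketch below. The traces are the toy examples from above; everything here is illustrative, not my real data:

    from collections import defaultdict, Counter

    # Toy traces matching the examples above; letters are event names.
    traces = [["A", "B", "C", "D", "X", "Y"],
              ["A", "B", "Z"]]

    # Count how often each event is followed by each other event.
    transitions = defaultdict(Counter)
    for trace in traces:
        for current, nxt in zip(trace, trace[1:]):
            transitions[current][nxt] += 1

    def predict_next(event):
        """Most frequent successor of `event`, or None if the event was never seen."""
        successors = transitions.get(event)
        return successors.most_common(1)[0][0] if successors else None

    print(predict_next("A"))  # -> 'B' in this toy log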



I have tried an N-gram model, an MLP neural network, and finally an LSTM network, which reached around 80% accuracy.
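For reference, the kind of LSTM next-event model I mean is sketched below in Keras. It is not my exact network: the vocabulary size, layer sizes, and the two toy prefixes are placeholders; events are assumed to be integer-encoded with 0 reserved for padding.

    import numpy as np
    import tensorflow as tf

    num_events = 30   # assumed size of the event vocabulary
    max_len = 80      # traces are at most 80 steps long

    # X holds (padded) prefixes of traces, y the event that followed each prefix.
    X = tf.keras.preprocessing.sequence.pad_sequences(
        [[1, 2], [1, 2, 3]], maxlen=max_len)   # toy prefixes [A,B] and [A,B,C]
    y = np.array([3, 4])                       # toy "next event" labels

    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(num_events + 1, 32, mask_zero=True),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(num_events + 1, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(X, y, epochs=5, verbose=0)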



That would not be bad if the events were balanced in the dataset. To account for the imbalance, I trained the LSTM with a weighted loss function; the overall accuracy then drops to around 66%, but the less frequent classes are predicted much more accurately (still not perfectly, but better). How can I build a model that gets the best of both, i.e. learns the less frequent AND the most frequent classes at the same time?
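As a concrete example of the weighting, here is a sketch (assuming scikit-learn and Keras; the labels are toy, heavily imbalanced data) of how "balanced" class weights can be computed and then damped as a compromise between the two regimes:

    import numpy as np
    from sklearn.utils.class_weight import compute_class_weight

    # y_train: integer-encoded next-event labels (toy, heavily imbalanced).
    y_train = np.array([1, 1, 1, 1, 1, 1, 2, 3])

    classes = np.unique(y_train)
    weights = compute_class_weight(class_weight="balanced",
                                   classes=classes, y=y_train)

    # "balanced" weights can over-correct; damping them (e.g. a square root)
    # is one way to keep frequent classes from being penalised too hard.
    damped = weights ** 0.5
    class_weight = dict(zip(classes, damped))

    # Then train with e.g. model.fit(X, y_train, class_weight=class_weight, ...)
    print(class_weight)

When comparing the two models I also look at per-class recall or macro-averaged F1 (e.g. sklearn.metrics.classification_report) rather than overall accuracy, since accuracy alone hides exactly this trade-off.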



I have also read that tree-based methods perform very well on unbalanced datasets, but all the examples I have found consider one long time series, whereas my data consist of many short time series. Is it possible to train a random forest on such data, and if so, how?
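One framing I could imagine (a sketch with toy, integer-encoded traces, not my real data) is to slice every trace into fixed-length windows of the last k events and predict the event that follows:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # Toy integer-encoded traces; 0 is reserved for "no event" (padding).
    traces = [[1, 2, 3, 4, 5, 6], [1, 2, 7]]
    k = 3   # window size: predict the next event from the last k events

    X, y = [], []
    for trace in traces:
        for i in range(1, len(trace)):
            window = trace[max(0, i - k):i]
            window = [0] * (k - len(window)) + window   # left-pad short prefixes
            X.append(window)
            y.append(trace[i])

    # One-hot encoding the window positions may work better than raw integer
    # codes, since the event ids are categorical, but this keeps the sketch short.
    clf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                                 random_state=0)
    clf.fit(np.array(X), np.array(y))
    print(clf.predict([[0, 1, 2]]))   # next event after the prefix [A, B]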



If you know of another algorithm or method that could be applied to such data, please post it :)



Thank you.
machine-learning lstm sequential-pattern-mining






asked Aug 15 '18 at 20:46









Matúš Košík

1 Answer







I suspect that the problem has more to do with your data than with your algorithms. My recommendation is to spend some time studying your data and making sure it is a robust representation of the kinds of problems you expect to solve. If possible, come up with a way to generate extra data. Since you already have many permutations, you could perhaps write a script that creates additional permutations by modifying existing samples according to rules you know hold.
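As a sketch of what such a script could look like (the swap rule here is purely hypothetical; substitute whatever invariants actually hold in your process):

    import random

    # Purely hypothetical rule: assume events "X" and "Y" may occur in either
    # order without changing the process, so swapping adjacent X/Y pairs yields
    # new valid traces. Replace this with rules that actually hold in your domain.
    SWAPPABLE = {("X", "Y"), ("Y", "X")}

    def augment(trace):
        new = list(trace)
        for i in range(len(new) - 1):
            if (new[i], new[i + 1]) in SWAPPABLE and random.random() < 0.5:
                new[i], new[i + 1] = new[i + 1], new[i]
        return new

    traces = [["A", "B", "C", "D", "X", "Y"], ["A", "B", "Z"]]
    augmented = traces + [augment(t) for t in traces]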
answered Aug 15 '18 at 21:22
David Shapiro
