What's the best classification model for this recommendation engine?

I'm not a data scientist, but I'm trying to implement a recommendation engine at my company. My application runs on PHP, but I'll use Python to process this data.

My company is an online school with 40 online courses as of now. I have a CSV file with the preferences of around 30k users, and it looks like this:

[Image: Dataframe]

0 means that the user is not subscribed (I take this to mean no interest), while 1 means subscribed (interested).

My idea is to compare a single user's array, such as [0,1,0,0,0,1,1...], with all this data and return a score for each course giving the probability that this user is interested in it.

I was thinking of using multinomial logistic regression, but as far as I know (and I don't know much) it would return a binary result, right?

What classification model would you recommend? Ideally, my result should be something like:

[0.95, 0.1, 0.54, 0.3, 0.87...]

Cheers!

asked May 21 '18 at 14:34

grpaiva


Formulate the problem as a collaborative filtering task.
– Fadi Bakoura
May 21 '18 at 14:52

Thanks @FadiBakoura, I'll research this and let you know.
– grpaiva
May 21 '18 at 18:03

Can you include more information about the user (sex, age, ...)? A single 18-year-old user may like a course that a 50-year-old user does not.
– Intruso
Aug 20 '18 at 13:50

Seems like a prediction problem, not one of classification, so a neural network? Have you tried loading this data into Orange3? It seems you could test out your models pretty quickly. Orange3 uses scikit-learn, so once you find your workflow, you can use Python. By the way, if it is a neural network solution, TensorFlow has PHP bindings, so you could do the whole thing in PHP. Both may save you time.
– davmor
Nov 18 '18 at 11:10
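
To make the collaborative filtering suggestion a bit more concrete, here is a minimal sketch of item-based filtering on a 0/1 users-by-courses matrix, assuming cosine similarity between course columns. The toy matrix and the helper names `item_similarity` and `score_courses` are made up for illustration, and the resulting numbers are similarity-weighted scores between 0 and 1, not calibrated probabilities.

```python
import numpy as np


def item_similarity(X):
    """Cosine similarity between the course columns of a users x courses 0/1 matrix."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0  # avoid dividing by zero for courses nobody took
    sim = (X.T @ X) / np.outer(norms, norms)
    np.fill_diagonal(sim, 0.0)  # a course should not vote for itself
    return sim


def score_courses(user, sim):
    """Score every course for one user from the courses they already subscribed to."""
    user = np.asarray(user, dtype=float)
    denom = sim.sum(axis=1)
    denom[denom == 0] = 1.0
    scores = (sim @ user) / denom  # weighted average of the user's 0/1 flags
    scores[user == 1] = 0.0  # do not re-recommend existing subscriptions
    return scores


# Toy data (hypothetical): 5 users, 4 courses
X = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 1, 0, 1],
])

sim = item_similarity(X)
scores = score_courses([1, 0, 0, 0], sim)
print(np.round(scores, 2))
```

With the real data, `X` would be the 30k x 40 table loaded from the CSV, and the highest-scoring courses would be the ones to suggest first.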

python recommender-system multiclass-classification


1 Answer

Without more information about your dataset, it's impossible to recommend one particular classifier over another.

If you want your classifier to return a vector of probabilities and you're using the sklearn library, you could use the predict_proba method.

Here's an example:

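from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

# Load only the first two digit classes (0 and 1), giving a binary problem
digits = load_digits(n_class=2)

# Fit on the data, then get one row of class probabilities per sample
preds = LogisticRegression().fit(digits.data, digits.target).predict_proba(digits.data)

# Keep the probability of the second class (column 1) for each sample
print([i[1] for i in preds])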
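
As a rough sketch of how predict_proba might be applied to a users-by-courses table like the one in the question, one option is to fit one logistic regression per course, using the other courses as features. This is only one possible setup, not a recommendation of a particular model; the synthetic data and the helper names `fit_per_course` and `predict_interest` are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for the CSV (made-up data): 200 users x 5 courses of 0/1 flags
X = rng.integers(0, 2, size=(200, 5))
# Make course 1 mostly follow course 0 so there is a pattern to learn
flip = rng.random(200) < 0.1
X[:, 1] = np.where(flip, 1 - X[:, 0], X[:, 0])


def fit_per_course(X):
    """Fit one binary classifier per course, with the other courses as features."""
    models = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        y = X[:, j]
        if len(np.unique(y)) < 2:
            # Only one class present: store the constant instead of a model
            models.append(float(y[0]))
        else:
            models.append(LogisticRegression(max_iter=1000).fit(others, y))
    return models


def predict_interest(models, user):
    """Return one probability of interest per course for a single 0/1 user vector."""
    user = np.asarray(user).reshape(1, -1)
    probs = []
    for j, model in enumerate(models):
        if isinstance(model, float):
            probs.append(model)
        else:
            others = np.delete(user, j, axis=1)
            # Column 1 of predict_proba is the probability of class 1 (subscribed)
            probs.append(model.predict_proba(others)[0, 1])
    return np.array(probs)


models = fit_per_course(X)
probs = predict_interest(models, [1, 0, 0, 0, 0])
print(np.round(probs, 2))
```

The output has one value per course, which matches the shape of the result asked for in the question.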

answered May 21 '18 at 14:47

marco_gorelli


Thanks for your answer @Lupacante! What I don't get is that when I print digits.data.shape and digits.target.shape I get (360, 64) and (360,). Shouldn't the target shape be something like (64,)? My dataset's shapes look like this: (27920, 46) and (46,). I'm getting an error: ValueError: Found input variables with inconsistent numbers of samples: [27920, 46]
– grpaiva
May 21 '18 at 18:02

The predictors and the target from the training set should have the same number of rows. The first number in the tuple returned by shape gives you the number of rows, so (360, 64) and (360,) is exactly what we'd expect.
– marco_gorelli
May 22 '18 at 8:16
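
The shape point above can be illustrated with a small sketch. It assumes the (46,) target came from passing one user's row as the target, which is a guess about the original code. The data is made up: a target needs one value per user (a column), not one value per course (a row).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 6))  # made-up data: 100 users, 6 courses

wrong_y = X[0]      # one user's row: shape (6,), one value per course
right_y = X[:, 0]   # one course's column: shape (100,), one value per user

# A target whose length differs from the number of rows is rejected by fit
try:
    LogisticRegression().fit(X, wrong_y)
    mismatch_error = None
except ValueError as err:
    mismatch_error = str(err)
print(mismatch_error)

# Using a column as the target (and the remaining columns as predictors) lines up
features = X[:, 1:]
model = LogisticRegression().fit(features, right_y)
print(features.shape, right_y.shape)
```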
