Dimension reduction for data with categorical features
$begingroup$
I am trying to reduce the dimensionality of a dataset. It contains a large number of categorical features, which cause problems for the dimensionality reduction techniques I am using (for example, ones that rely on calculating the variance of each variable).
Do I need to convert every categorical variable to dummy variables before reducing the dimensions of the dataset, or is there another way?
data-cleaning categorical-data dimensionality-reduction
$endgroup$
asked 16 hours ago, edited 4 hours ago
Puneet Shekhawat
1 Answer
$begingroup$
If you want to apply dimensionality reduction techniques that only operate on numeric features, you will first need to convert your categorical features to a numeric format.
There are multiple ways of doing this, such as one-hot (dummy) encoding, but it might be worth your while to investigate target encoding (also called mean encoding), which replaces each category with a single number and so does not add a new column per category.
$endgroup$
answered 9 hours ago
bradS
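To make target encoding concrete, here is a minimal sketch in plain Python, assuming a single categorical feature and a numeric (here binary) target. The function name `target_encode` and the toy data are hypothetical, made up for illustration, and are not taken from any library.

```python
from collections import defaultdict


def target_encode(categories, targets):
    """Replace each category with the mean target value observed for it."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    # One number per category: the average target among rows with that category.
    means = {cat: sums[cat] / counts[cat] for cat in sums}
    return [means[cat] for cat in categories], means


# Hypothetical toy data: one categorical feature and a binary target.
city = ["paris", "rome", "paris", "oslo", "rome", "paris"]
bought = [1, 0, 1, 0, 1, 0]

encoded, means = target_encode(city, bought)
print(means)    # per-category mean of the target
print(encoded)  # the feature as a single numeric column
```

The categorical column becomes one numeric column regardless of how many distinct categories it has. In practice the means should be computed on the training data only, usually with some smoothing, since they are derived from the target and can otherwise leak it into the features.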
$begingroup$
I know about dummy encoding. It's just that the dataset has a huge number of features that are not int or float, so I was asking whether there is a way to convert all of the categorical data in one go, whether I have to apply one-hot encoding to each feature one at a time, or whether I can preprocess the data in its current format to reduce the dimensionality so that there are fewer categorical features to worry about. (Pardon me if this is a stupid question, but I am new to machine learning and looking for an easy way to do this; I couldn't find a solution using Google.)
$endgroup$
– Puneet Shekhawat
4 hours ago
$begingroup$
Ah, I misunderstood your question. It depends on the encoder / toolkit you use. For instance, category_encoders in Python allows you to specify which columns to perform the encoding on, and then performs the encoding all in one go.
$endgroup$
– bradS
3 hours ago
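As an illustration of encoding every categorical column in one go, here is a small sketch that uses pandas rather than category_encoders (a substitution made for this example). It assumes the categorical features are the non-numeric columns of a DataFrame; the column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical toy frame: two categorical columns and one numeric column.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],
    "size": ["S", "M", "L", "M"],
    "price": [10.0, 12.5, 9.0, 11.0],
})

# Pick every non-numeric column automatically instead of listing them by hand.
cat_cols = df.select_dtypes(exclude="number").columns.tolist()

# One call encodes all selected columns; numeric columns pass through unchanged.
encoded = pd.get_dummies(df, columns=cat_cols, dtype=int)

print(cat_cols)
print(encoded.columns.tolist())
```

Selecting columns by dtype avoids handling each feature separately. Note that one-hot encoding adds one column per category level, so the resulting frame is wider than the original and is what a numeric dimensionality reduction step would then operate on.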