How to find the most important attribute for each class
$begingroup$
I have a dataset with 28 attributes and 7 class values. I want to know if it's possible to find, for each class, the most important attribute(s) for deciding the class value.
For example, an answer could be: Attribute 2 is most important for class 1, Attribute 6 is most important for class 2, etc. An even more informative answer could be: Attribute 2 being below 0.5 is most important for class 1, Attribute 6 being above 0.75 is most important for class 2, etc.
My initial approach was to build a decision tree on the data and, for each class, find the node with the largest information gain/gain ratio; that would be the most determining factor for that class. The problem is that the decision tree implementations I have found don't expose the information gain/gain ratio per node, and as this project is time-bound I don't have time to implement my own version. My current thought is to create multiple datasets, each one class vs. the rest, and then perform attribute selection (e.g. information gain) on them to find the most important attribute. Is this the right direction to go in, or is there a better option?
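In code, the one-vs-rest approach I have in mind looks roughly like this (a sketch on random stand-in data with the shapes from the question; `mutual_info_classif` from scikit-learn is just one possible information-gain-style scorer):

```python
# One-vs-rest attribute selection: for each class, score every attribute by
# mutual information against a binary "this class vs. the rest" target.
# The data here is random and only illustrates the 28-attribute, 7-class shape.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X = rng.random((500, 28))          # 500 samples, 28 attributes
y = rng.integers(0, 7, size=500)   # 7 class values

for cls in np.unique(y):
    y_bin = (y == cls).astype(int)                      # one class vs. the rest
    scores = mutual_info_classif(X, y_bin, random_state=0)
    best = int(np.argmax(scores))
    print(f"class {cls}: highest-scoring attribute is {best}")
```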
machine-learning neural-network deep-learning feature-selection multiclass-classification
$endgroup$
$begingroup$
Covariance matrix for each attribute and the output.
$endgroup$
– Vaalizaadeh
Oct 16 '18 at 12:47
$begingroup$
@Media could you expand on this please? I thought a covariance matrix would only tell me the correlation between attributes. If you have a solution, you could add it as an answer.
$endgroup$
– Nate
Oct 16 '18 at 13:15
$begingroup$
Important in what way? In that it best splits your data (given that you were talking about decision trees and splits)? There are ML algorithms that return feature importances, e.g. random forests.
$endgroup$
– user2974951
Oct 16 '18 at 13:37
$begingroup$
Important in the sense that I want to know the attributes that have the biggest say in determining each class. I believe this means the best split of the data. As far as I know, random forest doesn't return a feature importance per class, or am I wrong about this?
$endgroup$
– Nate
Oct 16 '18 at 13:52
$begingroup$
@Nate, "My current thought is to create multiple datasets which are all one class vs the rest and then perform attribute selection (eg. information gain) on them to find the most important attribute." ---> if you're happy with that definition of "most important", then yes. Your job is to define what you mean by "most important", but your approach (classifying ONE vs ALL_OTHERS) is a good one.
$endgroup$
– f.g.
Oct 16 '18 at 17:33
asked Oct 16 '18 at 12:46 by Nate
edited Oct 16 '18 at 17:23 by Vaalizaadeh
4 Answers
$begingroup$
If you must split the dataset per class, then I suggest you try PCA. Principal Component Analysis is used for dimensionality reduction: it gives you the directions (combinations of attributes) that best represent the distribution of the data. You could apply it to each class separately and in that way find the attributes that most affect that class's data distribution.
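A minimal sketch of this suggestion, assuming scikit-learn's `PCA` on synthetic data, reading the absolute loadings of the first principal component as a coarse per-class ranking:

```python
# Fit PCA on each class's rows separately and rank attributes by the
# absolute loadings of the first principal component. Random data stands
# in for the real 28-attribute, 7-class dataset.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.random((500, 28))
y = rng.integers(0, 7, size=500)

for cls in np.unique(y):
    pca = PCA(n_components=1).fit(X[y == cls])
    loadings = np.abs(pca.components_[0])   # one weight per attribute
    print(f"class {cls}: top attribute by |loading| = {int(np.argmax(loadings))}")
```

Note that per-class PCA ranks attributes by within-class variance, not by how well they separate that class from the others, so treat the ranking as exploratory.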
$endgroup$
$begingroup$
Could I also 'rank' the importance of the attributes by looking at each attribute's coefficient?
$endgroup$
– Nate
Oct 16 '18 at 16:12
answered Oct 16 '18 at 14:00 by Saket Kumar Singh
$begingroup$
Or an even more informed answer could be: Attribute 2 being below 0.5
is most important for class 1, Attribute 6 being above 0.75 is most
important for class 2 etc
A way of doing this is to discretize your continuous variables into histograms. Every bin of a histogram can then be treated as a separate variable, and its importance can easily be found using standard decision tree implementations such as the one in sklearn, which provides a feature_importances_
attribute. This will give you insight into the important regions of each variable.
This is demonstrated in Figure 9 in this paper.
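A minimal sketch of the binning idea, assuming scikit-learn's `KBinsDiscretizer` for the histograms and a plain `DecisionTreeClassifier` for the importances (the bin count and tree depth are arbitrary choices, and the data is synthetic):

```python
# Discretize each attribute into 4 uniform bins, one-hot encode the bins so
# each bin becomes its own variable, then read feature_importances_ to see
# which value regions of which attributes the tree finds important.
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
X = rng.random((500, 28))
y = rng.integers(0, 7, size=500)

binner = KBinsDiscretizer(n_bins=4, encode="onehot-dense", strategy="uniform")
X_bins = binner.fit_transform(X)    # 28 attributes * 4 bins = 112 columns

tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_bins, y)
top = int(np.argmax(tree.feature_importances_))
print(f"most important region: attribute {top // 4}, bin {top % 4}")
```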
Welcome to the site, Nate!
$endgroup$
$begingroup$
Ah yes, good idea. The only thing is: wouldn't the feature_importances_ attribute tell me the importance of the features for the whole model, rather than the importance per class?
$endgroup$
– Nate
Oct 16 '18 at 14:23
$begingroup$
When a feature is important for classifying a sample into class A, it is also important for not classifying the same sample into class B. I mean that you shouldn't ask "which feature is important to classify the sample into class A and which one into class B" but rather "which feature is important to classify the sample correctly". That is feature importance.
$endgroup$
– pcko1
Oct 16 '18 at 14:35
answered Oct 16 '18 at 14:10 by pcko1
$begingroup$
Let's put it this way. The easiest way to ascertain the relation between the different features and the output is to use the covariance matrix. You can even visualise the data for each class.
[Scatter plot from the original answer, not preserved here: the output on the vertical axis against one feature on the horizontal axis, showing a clear trend.]
Suppose that the vertical axis is the output and the horizontal axis is one of the features. As you can see, knowing the feature informs us about changes in the output. Now consider the following illustration.
[Second scatter plot, not preserved: the same axes, but showing no trend.]
In this figure, you can see that this particular feature does not inform you about changes in the output.
Another approach is PCA, which finds the appropriate features itself: what it does is find a linear combination of the important features that is more relevant to the output.
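The covariance idea at the start of this answer can be sketched per class by correlating each attribute with a binary class indicator (synthetic data; `np.corrcoef` is used here as the linear-association measure):

```python
# Correlate each attribute with a "class k vs. the rest" indicator. A value
# near zero means the attribute alone carries little linear information
# about membership in that class (the flat-cloud picture described above).
import numpy as np

rng = np.random.default_rng(3)
X = rng.random((500, 28))
y = rng.integers(0, 7, size=500)

indicator = (y == 0).astype(float)   # class 0 vs. the rest
corrs = np.array([np.corrcoef(X[:, j], indicator)[0, 1] for j in range(28)])
print("most linearly related attribute for class 0:", int(np.argmax(np.abs(corrs))))
```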
$endgroup$
answered Oct 16 '18 at 17:22 by Vaalizaadeh
$begingroup$
It is difficult to get an answer to "which attribute is the most important" for every class; normally the distinction is made "between classes", not "for a specific class".
I use the feature importance from xgboost. It measures which features participate in more of the trees in xgboost's boosted forest. You can even plot these importances, which makes for very nice figures.
Feature importance in XGBoost
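As a library-agnostic sketch of this idea (the xgboost code itself isn't shown in the answer), scikit-learn's `GradientBoostingClassifier` is used below as a stand-in exposing the same kind of `feature_importances_` attribute; with xgboost itself you would fit `xgboost.XGBClassifier` and call `xgboost.plot_importance` for the chart the answer mentions. The data is synthetic:

```python
# Fit a small gradient-boosted forest and read one importance score per
# attribute. The scores are normalized to sum to 1; note this is a
# model-wide ranking, not a per-class one, matching the answer's caveat.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(4)
X = rng.random((400, 28))
y = rng.integers(0, 7, size=400)

model = GradientBoostingClassifier(n_estimators=10, max_depth=2).fit(X, y)
importances = model.feature_importances_
print("top attribute overall:", int(np.argmax(importances)))
```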
$endgroup$
answered 3 mins ago by Juan Esteban de la Calle (new contributor)