Why is my training accuracy decreasing higher degrees of polynomial features?












2












$begingroup$


I am new to Machine Learning and started solving the Titanic Survivor problem on Kaggle.



While solving the problem using Logistic Regression I used various models having polynomial features with degree $2,3,4,5,6$ . Theoretically the accuracy on training set should increase with degree however it started decreasing post degree $2$ . The graph is as per below enter image description here










share|improve this question









New contributor




Apoorv Jain is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.







$endgroup$












  • $begingroup$
    Welcome to the site! "Theoretically the accuracy on training set should increase with degree" - I disagree with this premise. Can you provide a citation or your rationale? I don't think this is a reasonable statement.
    $endgroup$
    – I_Play_With_Data
    10 hours ago










  • $begingroup$
    I read this in the Andrew NG course and logically speaking wouldn't the boundary fit more effectively if the degree of polynomial features increase ?
    $endgroup$
    – Apoorv Jain
    10 hours ago










  • $begingroup$
    No, not necessarily. The most common use of polynomials is when you have data that shows a correlation but isn't linear (so like an exponential curve, a parabola, etc). You can't just randomly try new polynomials, you should be trying a particular polynomial because it's better suited to the general layout of your data.
    $endgroup$
    – I_Play_With_Data
    10 hours ago










  • $begingroup$
    Could you please suggest a reading for this type of feature engineering .
    $endgroup$
    – Apoorv Jain
    10 hours ago
















2












$begingroup$


I am new to Machine Learning and started solving the Titanic Survivor problem on Kaggle.



While solving the problem using Logistic Regression I used various models having polynomial features with degree $2,3,4,5,6$ . Theoretically the accuracy on training set should increase with degree however it started decreasing post degree $2$ . The graph is as per below enter image description here










share|improve this question









New contributor




Apoorv Jain is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.







$endgroup$












  • $begingroup$
    Welcome to the site! "Theoretically the accuracy on training set should increase with degree" - I disagree with this premise. Can you provide a citation or your rationale? I don't think this is a reasonable statement.
    $endgroup$
    – I_Play_With_Data
    10 hours ago










  • $begingroup$
    I read this in the Andrew NG course and logically speaking wouldn't the boundary fit more effectively if the degree of polynomial features increase ?
    $endgroup$
    – Apoorv Jain
    10 hours ago










  • $begingroup$
    No, not necessarily. The most common use of polynomials is when you have data that shows a correlation but isn't linear (so like an exponential curve, a parabola, etc). You can't just randomly try new polynomials, you should be trying a particular polynomial because it's better suited to the general layout of your data.
    $endgroup$
    – I_Play_With_Data
    10 hours ago










  • $begingroup$
    Could you please suggest a reading for this type of feature engineering .
    $endgroup$
    – Apoorv Jain
    10 hours ago














2












2








2





$begingroup$


I am new to Machine Learning and started solving the Titanic Survivor problem on Kaggle.



While solving the problem using Logistic Regression I used various models having polynomial features with degree $2,3,4,5,6$ . Theoretically the accuracy on training set should increase with degree however it started decreasing post degree $2$ . The graph is as per below enter image description here










share|improve this question









New contributor




Apoorv Jain is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.







$endgroup$




I am new to Machine Learning and started solving the Titanic Survivor problem on Kaggle.



While solving the problem using Logistic Regression I used various models having polynomial features with degree $2,3,4,5,6$ . Theoretically the accuracy on training set should increase with degree however it started decreasing post degree $2$ . The graph is as per below enter image description here







scikit-learn logistic-regression accuracy classifier






share|improve this question









New contributor




Apoorv Jain is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











share|improve this question









New contributor




Apoorv Jain is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









share|improve this question




share|improve this question








edited 10 mins ago









Siong Thye Goh

1,122418




1,122418






New contributor




Apoorv Jain is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









asked 10 hours ago









Apoorv JainApoorv Jain

1112




1112




New contributor




Apoorv Jain is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.





New contributor





Apoorv Jain is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.






Apoorv Jain is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.












  • $begingroup$
    Welcome to the site! "Theoretically the accuracy on training set should increase with degree" - I disagree with this premise. Can you provide a citation or your rationale? I don't think this is a reasonable statement.
    $endgroup$
    – I_Play_With_Data
    10 hours ago










  • $begingroup$
    I read this in the Andrew NG course and logically speaking wouldn't the boundary fit more effectively if the degree of polynomial features increase ?
    $endgroup$
    – Apoorv Jain
    10 hours ago










  • $begingroup$
    No, not necessarily. The most common use of polynomials is when you have data that shows a correlation but isn't linear (so like an exponential curve, a parabola, etc). You can't just randomly try new polynomials, you should be trying a particular polynomial because it's better suited to the general layout of your data.
    $endgroup$
    – I_Play_With_Data
    10 hours ago










  • $begingroup$
    Could you please suggest a reading for this type of feature engineering .
    $endgroup$
    – Apoorv Jain
    10 hours ago


















  • $begingroup$
    Welcome to the site! "Theoretically the accuracy on training set should increase with degree" - I disagree with this premise. Can you provide a citation or your rationale? I don't think this is a reasonable statement.
    $endgroup$
    – I_Play_With_Data
    10 hours ago










  • $begingroup$
    I read this in the Andrew NG course and logically speaking wouldn't the boundary fit more effectively if the degree of polynomial features increase ?
    $endgroup$
    – Apoorv Jain
    10 hours ago










  • $begingroup$
    No, not necessarily. The most common use of polynomials is when you have data that shows a correlation but isn't linear (so like an exponential curve, a parabola, etc). You can't just randomly try new polynomials, you should be trying a particular polynomial because it's better suited to the general layout of your data.
    $endgroup$
    – I_Play_With_Data
    10 hours ago










  • $begingroup$
    Could you please suggest a reading for this type of feature engineering .
    $endgroup$
    – Apoorv Jain
    10 hours ago
















$begingroup$
Welcome to the site! "Theoretically the accuracy on training set should increase with degree" - I disagree with this premise. Can you provide a citation or your rationale? I don't think this is a reasonable statement.
$endgroup$
– I_Play_With_Data
10 hours ago




$begingroup$
Welcome to the site! "Theoretically the accuracy on training set should increase with degree" - I disagree with this premise. Can you provide a citation or your rationale? I don't think this is a reasonable statement.
$endgroup$
– I_Play_With_Data
10 hours ago












$begingroup$
I read this in the Andrew NG course and logically speaking wouldn't the boundary fit more effectively if the degree of polynomial features increase ?
$endgroup$
– Apoorv Jain
10 hours ago




$begingroup$
I read this in the Andrew NG course and logically speaking wouldn't the boundary fit more effectively if the degree of polynomial features increase ?
$endgroup$
– Apoorv Jain
10 hours ago












$begingroup$
No, not necessarily. The most common use of polynomials is when you have data that shows a correlation but isn't linear (so like an exponential curve, a parabola, etc). You can't just randomly try new polynomials, you should be trying a particular polynomial because it's better suited to the general layout of your data.
$endgroup$
– I_Play_With_Data
10 hours ago




$begingroup$
No, not necessarily. The most common use of polynomials is when you have data that shows a correlation but isn't linear (so like an exponential curve, a parabola, etc). You can't just randomly try new polynomials, you should be trying a particular polynomial because it's better suited to the general layout of your data.
$endgroup$
– I_Play_With_Data
10 hours ago












$begingroup$
Could you please suggest a reading for this type of feature engineering .
$endgroup$
– Apoorv Jain
10 hours ago




$begingroup$
Could you please suggest a reading for this type of feature engineering .
$endgroup$
– Apoorv Jain
10 hours ago










1 Answer
1






active

oldest

votes


















2












$begingroup$

I disagree with the assertion of, "Theoretically the accuracy on training set should increase with degree". The goal of polynomial regression is not to randomly try new polynomials. The goal is to use a polynomial that better fits your data because the correlation is not linear.



Let's think about the end result of linear regression - it usually something like y = mx + b



If you show that to a data scientist, they're going to tell you it's linear regression. You show that to a math student and they will tell you its the formula for a straight line. Either way, it's just a formula for a graph. But, note that this is for a straight line and not all data is linear. So, knowing that you're just coming up with a formula, you should think about polynomial regression in the same way - what graph am I trying to draw?



If you use a scatter plot and you are seeing a correlation but that relationship is exponential, then you should use the corresponding polynomial; same goes for all of the other variations. There is no logical explanation to use a polynomial that will not draw a graph that will closely align with your data correlation.






share|improve this answer









$endgroup$













  • $begingroup$
    Let's say my initial features were x,y hence I have degree 1 .Now lets say we come up with polynomial features of degree 2 ie x^2, y^2, xy
    $endgroup$
    – Apoorv Jain
    10 hours ago






  • 1




    $begingroup$
    @ApoorvJain Dont start with the formula, start with your data, start with a scatterplot. What does that plot look like? What polynomial would you use to draw a similar graph? When you start thinking in those terms, then you start to think like a data scientist :-)
    $endgroup$
    – I_Play_With_Data
    10 hours ago










  • $begingroup$
    Let's say my initial features were x,y hence I have degree 1 .Now lets say we come up with polynomial features of degree 2 ie x^2, y^2, xy then we have a boundary comprising x,y,xy,x^2,y^2 hence the boundary represented with the above features would be of the form ax+by+cxy+dx^2+ey^2 hence we could anyway construct the same boundary as we could have with single degree features . Since loss function would take every possible boundary hence shouldn't our error with degree 2 <= degree 1
    $endgroup$
    – Apoorv Jain
    10 hours ago











Your Answer





StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
});
});
}, "mathjax-editing");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "557"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});






Apoorv Jain is a new contributor. Be nice, and check out our Code of Conduct.










draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f46052%2fwhy-is-my-training-accuracy-decreasing-higher-degrees-of-polynomial-features%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









2












$begingroup$

I disagree with the assertion of, "Theoretically the accuracy on training set should increase with degree". The goal of polynomial regression is not to randomly try new polynomials. The goal is to use a polynomial that better fits your data because the correlation is not linear.



Let's think about the end result of linear regression - it usually something like y = mx + b



If you show that to a data scientist, they're going to tell you it's linear regression. You show that to a math student and they will tell you its the formula for a straight line. Either way, it's just a formula for a graph. But, note that this is for a straight line and not all data is linear. So, knowing that you're just coming up with a formula, you should think about polynomial regression in the same way - what graph am I trying to draw?



If you use a scatter plot and you are seeing a correlation but that relationship is exponential, then you should use the corresponding polynomial; same goes for all of the other variations. There is no logical explanation to use a polynomial that will not draw a graph that will closely align with your data correlation.






share|improve this answer









$endgroup$













  • $begingroup$
    Let's say my initial features were x,y hence I have degree 1 .Now lets say we come up with polynomial features of degree 2 ie x^2, y^2, xy
    $endgroup$
    – Apoorv Jain
    10 hours ago






  • 1




    $begingroup$
    @ApoorvJain Dont start with the formula, start with your data, start with a scatterplot. What does that plot look like? What polynomial would you use to draw a similar graph? When you start thinking in those terms, then you start to think like a data scientist :-)
    $endgroup$
    – I_Play_With_Data
    10 hours ago










  • $begingroup$
    Let's say my initial features were x,y hence I have degree 1 .Now lets say we come up with polynomial features of degree 2 ie x^2, y^2, xy then we have a boundary comprising x,y,xy,x^2,y^2 hence the boundary represented with the above features would be of the form ax+by+cxy+dx^2+ey^2 hence we could anyway construct the same boundary as we could have with single degree features . Since loss function would take every possible boundary hence shouldn't our error with degree 2 <= degree 1
    $endgroup$
    – Apoorv Jain
    10 hours ago
















2












$begingroup$

I disagree with the assertion of, "Theoretically the accuracy on training set should increase with degree". The goal of polynomial regression is not to randomly try new polynomials. The goal is to use a polynomial that better fits your data because the correlation is not linear.



Let's think about the end result of linear regression - it usually something like y = mx + b



If you show that to a data scientist, they're going to tell you it's linear regression. You show that to a math student and they will tell you its the formula for a straight line. Either way, it's just a formula for a graph. But, note that this is for a straight line and not all data is linear. So, knowing that you're just coming up with a formula, you should think about polynomial regression in the same way - what graph am I trying to draw?



If you use a scatter plot and you are seeing a correlation but that relationship is exponential, then you should use the corresponding polynomial; same goes for all of the other variations. There is no logical explanation to use a polynomial that will not draw a graph that will closely align with your data correlation.






share|improve this answer









$endgroup$













  • $begingroup$
    Let's say my initial features were x,y hence I have degree 1 .Now lets say we come up with polynomial features of degree 2 ie x^2, y^2, xy
    $endgroup$
    – Apoorv Jain
    10 hours ago






  • 1




    $begingroup$
    @ApoorvJain Dont start with the formula, start with your data, start with a scatterplot. What does that plot look like? What polynomial would you use to draw a similar graph? When you start thinking in those terms, then you start to think like a data scientist :-)
    $endgroup$
    – I_Play_With_Data
    10 hours ago










  • $begingroup$
    Let's say my initial features were x,y hence I have degree 1 .Now lets say we come up with polynomial features of degree 2 ie x^2, y^2, xy then we have a boundary comprising x,y,xy,x^2,y^2 hence the boundary represented with the above features would be of the form ax+by+cxy+dx^2+ey^2 hence we could anyway construct the same boundary as we could have with single degree features . Since loss function would take every possible boundary hence shouldn't our error with degree 2 <= degree 1
    $endgroup$
    – Apoorv Jain
    10 hours ago














2












2








2





$begingroup$

I disagree with the assertion of, "Theoretically the accuracy on training set should increase with degree". The goal of polynomial regression is not to randomly try new polynomials. The goal is to use a polynomial that better fits your data because the correlation is not linear.



Let's think about the end result of linear regression - it usually something like y = mx + b



If you show that to a data scientist, they're going to tell you it's linear regression. You show that to a math student and they will tell you its the formula for a straight line. Either way, it's just a formula for a graph. But, note that this is for a straight line and not all data is linear. So, knowing that you're just coming up with a formula, you should think about polynomial regression in the same way - what graph am I trying to draw?



If you use a scatter plot and you are seeing a correlation but that relationship is exponential, then you should use the corresponding polynomial; same goes for all of the other variations. There is no logical explanation to use a polynomial that will not draw a graph that will closely align with your data correlation.






share|improve this answer









$endgroup$



I disagree with the assertion of, "Theoretically the accuracy on training set should increase with degree". The goal of polynomial regression is not to randomly try new polynomials. The goal is to use a polynomial that better fits your data because the correlation is not linear.



Let's think about the end result of linear regression - it usually something like y = mx + b



If you show that to a data scientist, they're going to tell you it's linear regression. You show that to a math student and they will tell you its the formula for a straight line. Either way, it's just a formula for a graph. But, note that this is for a straight line and not all data is linear. So, knowing that you're just coming up with a formula, you should think about polynomial regression in the same way - what graph am I trying to draw?



If you use a scatter plot and you are seeing a correlation but that relationship is exponential, then you should use the corresponding polynomial; same goes for all of the other variations. There is no logical explanation to use a polynomial that will not draw a graph that will closely align with your data correlation.







share|improve this answer












share|improve this answer



share|improve this answer










answered 10 hours ago









I_Play_With_DataI_Play_With_Data

979419




979419












  • $begingroup$
    Let's say my initial features were x,y hence I have degree 1 .Now lets say we come up with polynomial features of degree 2 ie x^2, y^2, xy
    $endgroup$
    – Apoorv Jain
    10 hours ago






  • 1




    $begingroup$
    @ApoorvJain Dont start with the formula, start with your data, start with a scatterplot. What does that plot look like? What polynomial would you use to draw a similar graph? When you start thinking in those terms, then you start to think like a data scientist :-)
    $endgroup$
    – I_Play_With_Data
    10 hours ago










  • $begingroup$
    Let's say my initial features were x,y hence I have degree 1 .Now lets say we come up with polynomial features of degree 2 ie x^2, y^2, xy then we have a boundary comprising x,y,xy,x^2,y^2 hence the boundary represented with the above features would be of the form ax+by+cxy+dx^2+ey^2 hence we could anyway construct the same boundary as we could have with single degree features . Since loss function would take every possible boundary hence shouldn't our error with degree 2 <= degree 1
    $endgroup$
    – Apoorv Jain
    10 hours ago


















  • $begingroup$
    Let's say my initial features were x,y hence I have degree 1 .Now lets say we come up with polynomial features of degree 2 ie x^2, y^2, xy
    $endgroup$
    – Apoorv Jain
    10 hours ago






  • 1




    $begingroup$
    @ApoorvJain Dont start with the formula, start with your data, start with a scatterplot. What does that plot look like? What polynomial would you use to draw a similar graph? When you start thinking in those terms, then you start to think like a data scientist :-)
    $endgroup$
    – I_Play_With_Data
    10 hours ago










  • $begingroup$
    Let's say my initial features were x,y hence I have degree 1 .Now lets say we come up with polynomial features of degree 2 ie x^2, y^2, xy then we have a boundary comprising x,y,xy,x^2,y^2 hence the boundary represented with the above features would be of the form ax+by+cxy+dx^2+ey^2 hence we could anyway construct the same boundary as we could have with single degree features . Since loss function would take every possible boundary hence shouldn't our error with degree 2 <= degree 1
    $endgroup$
    – Apoorv Jain
    10 hours ago
















$begingroup$
Let's say my initial features were x,y hence I have degree 1 .Now lets say we come up with polynomial features of degree 2 ie x^2, y^2, xy
$endgroup$
– Apoorv Jain
10 hours ago




$begingroup$
Let's say my initial features were x,y hence I have degree 1 .Now lets say we come up with polynomial features of degree 2 ie x^2, y^2, xy
$endgroup$
– Apoorv Jain
10 hours ago




1




1




$begingroup$
@ApoorvJain Dont start with the formula, start with your data, start with a scatterplot. What does that plot look like? What polynomial would you use to draw a similar graph? When you start thinking in those terms, then you start to think like a data scientist :-)
$endgroup$
– I_Play_With_Data
10 hours ago




$begingroup$
@ApoorvJain Dont start with the formula, start with your data, start with a scatterplot. What does that plot look like? What polynomial would you use to draw a similar graph? When you start thinking in those terms, then you start to think like a data scientist :-)
$endgroup$
– I_Play_With_Data
10 hours ago












$begingroup$
Let's say my initial features were x,y hence I have degree 1 .Now lets say we come up with polynomial features of degree 2 ie x^2, y^2, xy then we have a boundary comprising x,y,xy,x^2,y^2 hence the boundary represented with the above features would be of the form ax+by+cxy+dx^2+ey^2 hence we could anyway construct the same boundary as we could have with single degree features . Since loss function would take every possible boundary hence shouldn't our error with degree 2 <= degree 1
$endgroup$
– Apoorv Jain
10 hours ago




$begingroup$
Let's say my initial features were x,y hence I have degree 1 .Now lets say we come up with polynomial features of degree 2 ie x^2, y^2, xy then we have a boundary comprising x,y,xy,x^2,y^2 hence the boundary represented with the above features would be of the form ax+by+cxy+dx^2+ey^2 hence we could anyway construct the same boundary as we could have with single degree features . Since loss function would take every possible boundary hence shouldn't our error with degree 2 <= degree 1
$endgroup$
– Apoorv Jain
10 hours ago










Apoorv Jain is a new contributor. Be nice, and check out our Code of Conduct.










draft saved

draft discarded


















Apoorv Jain is a new contributor. Be nice, and check out our Code of Conduct.













Apoorv Jain is a new contributor. Be nice, and check out our Code of Conduct.












Apoorv Jain is a new contributor. Be nice, and check out our Code of Conduct.
















Thanks for contributing an answer to Data Science Stack Exchange!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


Use MathJax to format equations. MathJax reference.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f46052%2fwhy-is-my-training-accuracy-decreasing-higher-degrees-of-polynomial-features%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Ponta tanko

Tantalo (mitologio)

Erzsébet Schaár