What is the difference between “equivariant to translation” and “invariant to translation”














I'm having trouble understanding the difference between “equivariant to translation” and “invariant to translation”.



In the book Deep Learning (I. Goodfellow, A. Courville, and Y. Bengio, MIT Press, 2016), one can find the following about convolutional networks:




  • [...] the particular form of parameter sharing causes the layer to have a property called equivariance to translation

  • [...] pooling helps to make the representation become approximately invariant to small translations of the input


Is there any difference between them, or are the terms used interchangeably?










neural-network deep-learning convolution






asked Jan 4 '17 at 8:41 by Aamir; edited by nbro








  • In the old days of Statistics, as in the time of Pitman, "invariant" was used in the meaning of "equivariant". – Xi'an, Oct 12 '18 at 18:12
3 Answers



















Equivariance and invariance are sometimes used interchangeably. As pointed out by @Xi'an, you can find such uses in the statistical literature, for instance in the notions of invariant estimator and especially Pitman estimator.



However, I would like to point out that it is better to keep both terms separate, as the prefix "in-" in invariant is privative (meaning "no variance" at all), while "equi-" refers to "varying in a similar or equivalent proportion".



Let us start from simple image features, and suppose that image $I$ has a unique maximum $m$ at location $(x_m,y_m)$, which is here the main classification feature.



An interesting property of classifiers is their ability to classify some distorted versions $I'$ of $I$ in the same manner, for instance translations by all vectors $(u,v)$. The maximum value $m'$ of $I'$ is invariant: $m'=m$, the value is the same. Its location, however, will be at $(x'_m,y'_m)=(x_m-u,y_m-v)$ and is equivariant, meaning that it varies "equally" with the distortion.
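A minimal NumPy sketch of this example, assuming circular (wrap-around) translations so that every shift is exactly invertible. The image size, pixel values and the helpers `translate` and `argmax_2d` are hypothetical choices for illustration; the shift direction is chosen to match $(x'_m,y'_m)=(x_m-u,y_m-v)$.

```python
import numpy as np

def translate(image, u, v):
    # Circular translation: content at (x, y) moves to ((x - u) mod H, (y - v) mod W).
    return np.roll(image, shift=(-u, -v), axis=(0, 1))

def argmax_2d(image):
    # Location (x, y) of the (unique) maximum, as plain Python ints.
    x, y = np.unravel_index(np.argmax(image), image.shape)
    return (int(x), int(y))

# Hypothetical 5x6 image with a unique maximum m = 9 at (x_m, y_m) = (3, 4).
I = np.zeros((5, 6))
I[3, 4] = 9.0
I[1, 2] = 4.0

u, v = 2, 1
I_prime = translate(I, u, v)

m, m_shifted = float(I.max()), float(I_prime.max())
loc, loc_shifted = argmax_2d(I), argmax_2d(I_prime)

print(m, m_shifted)      # 9.0 9.0        -> the value is invariant
print(loc, loc_shifted)  # (3, 4) (1, 3)  -> the location is equivariant
```

The maximum value does not change at all, while its location moves by exactly $(-u,-v)$, modulo the image size.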



The precise mathematical formulations of equivariance may depend on the objects and transformations one considers, so here I prefer the notion most often used in practice (and I may get the blame from a theoretical standpoint).



Here, translations (or some more generic action) can be equipped with the structure of a group $G$, $g$ being one specific translation operator. A function or feature $f$ is invariant under $G$ if, for all images in a class and for any $g$,
$$f(g(I)) = f(I)\,.$$



It becomes equivariant if there exists another structure (often a group) $G'$ that reflects the transformations in $G$ in a meaningful way; in other words, if for each $g$ there is a unique $g' \in G'$ such that

$$f(g(I)) = g'(f(I))\,.$$



In the above example on the group of translations, $g$ and $g'$ are the same (and hence $G'=G$): an integer translation of the image reflects as the exact same translation of the maximum location.



Another common definition is:



$$f(g(I)) = g(f(I))\,.$$



However, I used potentially different $G$ and $G'$ because sometimes $f(I)$ and $g(I)$ are not in the same domain. This happens, for instance, in multivariate statistics (see e.g. Equivariance and invariance properties of multivariate quantile and related functions, and the role of standardisation).
Here, however, the uniqueness of the mapping between $g$ and $g'$ allows one to get back to the original transformation $g$.
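Both definitions can be checked numerically over a whole finite translation group. This is a sketch under stated assumptions: images are small and circularly translated, so $G$ is the set of all $H \times W$ cyclic shifts. `f_max` is an invariant feature. `f_conv` (a circular cross-correlation) satisfies $f(g(I)) = g(f(I))$ with $G'=G$. `f_argmax` maps an image to pixel coordinates, so its $g'$ must act on coordinate pairs rather than on images, which is the case where $G'$ differs from $G$. All function names are hypothetical.

```python
import itertools
import numpy as np

H, W = 4, 5

def g(image, u, v):
    # An element of G: circular translation acting on images.
    return np.roll(image, shift=(-u, -v), axis=(0, 1))

def g_prime_coords(point, u, v):
    # The matching element of G': the same translation acting on (x, y) coordinates.
    x, y = point
    return ((x - u) % H, (y - v) % W)

def f_max(image):
    return image.max()

def f_argmax(image):
    x, y = np.unravel_index(np.argmax(image), image.shape)
    return (int(x), int(y))

def f_conv(image, kernel):
    # Circular cross-correlation: a weighted sum of translated copies of the image.
    out = np.zeros_like(image)
    for (a, b), w in np.ndenumerate(kernel):
        out += w * np.roll(image, shift=(-a, -b), axis=(0, 1))
    return out

rng = np.random.default_rng(0)
I = rng.random((H, W))  # continuous random values, so the maximum is unique
kernel = rng.random((2, 3))
G = list(itertools.product(range(H), range(W)))  # every translation (u, v)

invariant = all(f_max(g(I, u, v)) == f_max(I) for u, v in G)
equivariant_coords = all(f_argmax(g(I, u, v)) == g_prime_coords(f_argmax(I), u, v) for u, v in G)
equivariant_same_group = all(
    np.allclose(f_conv(g(I, u, v), kernel), g(f_conv(I, kernel), u, v)) for u, v in G
)
print(invariant, equivariant_coords, equivariant_same_group)  # True True True
```

Here `g` and `g_prime_coords` encode the same translation $(u,v)$, one on images and one on coordinates. Each $g'$ determines its $g$, which is the uniqueness property used above.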



People often use the term invariance because the concept of equivariance is unknown to them, or because everybody else uses invariance and equivariance would seem more pedantic.



For the record, other related notions (esp. in maths and physics) are termed covariance, contravariance, differential invariance.



In addition, translation invariance, at least approximate or in envelope, has been a goal for several signal and image processing tools. Notably, multi-rate (filter banks) and multi-scale (wavelets or pyramids) transformations have been designed in the past 25 years, for instance under the names of shift-invariant, cycle-spinning, stationary, complex and dual-tree wavelet transforms (for a review of 2D wavelets, see A panorama on multiscale geometric representations). The wavelets can absorb a few discrete scale variations. All these (approximate) invariances often come at the price of redundancy in the number of transformed coefficients.
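This is a toy illustration of the redundancy trade-off. It uses a one-level Haar detail filter as a stand-in (an assumption; it is not an implementation of any of the transforms named above). The decimated version keeps one coefficient per pair of samples and is not shift-invariant, not even in energy. The undecimated ("stationary") version keeps one coefficient per sample and is exactly shift-equivariant, at the cost of twice as many coefficients. Circular boundaries are assumed.

```python
import numpy as np

def haar_detail_decimated(x):
    # One Haar detail coefficient per pair of samples: N/2 outputs.
    return (x[0::2] - x[1::2]) / np.sqrt(2)

def haar_detail_undecimated(x):
    # The same filter applied at every position (circular boundary): N outputs.
    return (x - np.roll(x, -1)) / np.sqrt(2)

x = np.array([0, 0, 1, 1, 0, 0, 0, 0], dtype=float)
x_shifted = np.roll(x, 1)  # translate the signal by one sample

d, d_shifted = haar_detail_decimated(x), haar_detail_decimated(x_shifted)
s, s_shifted = haar_detail_undecimated(x), haar_detail_undecimated(x_shifted)

# Decimated: the detail energy jumps from 0 to about 1 after a one-sample shift.
print(np.sum(d**2), np.sum(d_shifted**2))
# Undecimated: the coefficients are simply shifted along with the signal.
print(np.allclose(s_shifted, np.roll(s, 1)))  # True
# Redundancy: 4 versus 8 coefficients for an 8-sample signal.
print(len(d), len(s))  # 4 8
```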






answered Jan 4 '17 at 22:53 by Laurent Duval; edited Dec 30 '18 at 14:17
  • Great! I really admire your effort for the detailed reply @Laurent Duval – Aamir, Jan 5 '17 at 8:32




















The terms are different:




  • Equivariant to translation means that a translation of input features results in an equivalent translation of outputs. So if your pattern 0,3,2,0,0 on the input results in 0,1,0,0 in the output, then the pattern 0,0,3,2,0 might lead to 0,0,1,0.


  • Invariant to translation means that a translation of input features does not change the outputs at all. So if your pattern 0,3,2,0,0 on the input results in 0,1,0 in the output, then the pattern 0,0,3,2,0 would also lead to 0,1,0.
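The two bullet examples can be reproduced with a tiny hand-built layer. The kernel `[1, 1]`, the bias `-4` (a hypothetical detector that fires where two neighbouring inputs sum to more than 4) and the stride-1 max pooling of width 3 are assumptions chosen for illustration. They are not values from the answer. The pooled output here is `[1, 1]` rather than `0,1,0`, but it is likewise unchanged by the shift.

```python
import numpy as np

def conv_relu(x, kernel, bias):
    # "Valid" 1D cross-correlation with one shared kernel and bias, then ReLU.
    k = len(kernel)
    pre = np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)]) + bias
    return np.maximum(pre, 0.0)

def max_pool(y, size):
    # Max pooling with window `size` and stride 1.
    return np.array([y[i:i + size].max() for i in range(len(y) - size + 1)])

kernel, bias = np.array([1.0, 1.0]), -4.0  # hypothetical detector: fires where two neighbours sum to more than 4
a = np.array([0, 3, 2, 0, 0], dtype=float)
b = np.array([0, 0, 3, 2, 0], dtype=float)  # the same pattern, translated by one sample

ya, yb = conv_relu(a, kernel, bias), conv_relu(b, kernel, bias)
print(ya, yb)                            # [0. 1. 0. 0.] [0. 0. 1. 0.]  -> equivariant
print(max_pool(ya, 3), max_pool(yb, 3))  # [1. 1.] [1. 1.]  -> unchanged by the shift
```

With "valid" convolution, the equivariance holds exactly as long as the pattern stays away from the borders. The local pooling only absorbs small shifts; larger shifts can change its output.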



For feature maps in convolutional networks to be useful, they typically need both properties in some balance. The equivariance allows the network to generalise edge, texture and shape detection to different locations. The invariance makes the precise location of the detected features matter less. These are two complementary types of generalisation for many image processing tasks.






  • A translated feature yields a translated output at some layer. Please elaborate on a considerably translated whole object being detected. It seems it will be detected even if the CNN was not trained with images containing different positions? Does equivariance hold in this case (it looks more similar to invariance)? – VladimirLenin, Jul 14 '17 at 10:14










  • @VladimirLenin: I don't think that elaboration is required for this question; it is definitely not something the OP has asked here. I suggest you ask a separate question, with a concrete example if possible. Even if visually a "whole object" has been translated, that does not mean feature maps in a CNN are tracking the same thing as you expect. – Neil Slater, Jul 14 '17 at 10:24




















Just adding my 2 cents:



Consider an image classification task solved with a typical CNN architecture, consisting of a Backend (Convolutions + NL + possibly Spatial Pooling) which performs Representation Learning, and a Frontend (e.g. Fully Connected Layers, MLP) which solves the specific task, in this case image classification. The idea is to build a function $ f : I \rightarrow L $ able to map from the Spatial Domain $ I $ (Input Image) to the Semantic Domain $ L $ (Label Set) in a 2-step process:




  • Backend (Representation Learning): $ f : I \rightarrow \mathcal{L} $ maps the Input to the Latent Semantic Space

  • Frontend (Task Specific Solver): $ f : \mathcal{L} \rightarrow L $ maps from the Latent Semantic Space to the Final Label Space


and it relies on the following properties:




  • spatial equivariance of the ConvLayer (Spatial 2D Convolution + NonLin, e.g. ReLU), as a shift in the Layer Input produces a shift in the Layer Output (note: this is about the whole Layer, not only the single Convolution Operator)

  • spatial invariance of the Pooling Operator (e.g. Max Pooling passes on the max value in its receptive field regardless of its spatial position)
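A small sketch of why equivariance is a property of the whole layer. A pointwise nonlinearity such as ReLU commutes with shifts, so convolution + shared bias + ReLU stays equivariant. Giving each position its own bias (dropping the parameter sharing mentioned in the question's quote) breaks it. Circular 1D signals, the sizes, the random values and the helper names are assumptions for illustration.

```python
import numpy as np

def shift(x, s):
    # Circular translation by s samples (boundary effects are ignored on purpose).
    return np.roll(x, s)

def circular_conv(x, kernel):
    # Cross-correlation with one kernel shared across all positions.
    return sum(w * np.roll(x, -k) for k, w in enumerate(kernel))

def conv_layer(x, kernel, bias):
    # Convolution + bias + pointwise ReLU; `bias` may be a scalar or one value per position.
    return np.maximum(circular_conv(x, kernel) + bias, 0.0)

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=8)
kernel = rng.uniform(-1, 1, size=3)
s = 3

shared_bias = 0.1
per_position_bias = 10.0 + np.linspace(0.0, 1.0, 8)  # hypothetical: one bias per position

shared_ok = np.allclose(conv_layer(shift(x, s), kernel, shared_bias),
                        shift(conv_layer(x, kernel, shared_bias), s))
unshared_ok = np.allclose(conv_layer(shift(x, s), kernel, per_position_bias),
                          shift(conv_layer(x, kernel, per_position_bias), s))
print(shared_ok, unshared_ok)  # True False
```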


The closer to the input layer, the closer to the purely spatial domain $ I $ and the more important the spatial equivariance property, which allows building a spatially equivariant, hierarchical, (increasingly) semantic representation.



The closer to the frontend, the closer to the latent, purely semantic domain $ \mathcal{L} $ and the more important the spatial invariance, as the meaning of the image should be independent of the spatial positions of the features.



Using fully connected layers in the frontend makes the classifier sensitive to feature position to some extent, depending on the backend structure: the deeper the backend and the more translation-invariant operators (Pooling) it uses, the lower this sensitivity.
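Here is an extreme-case sketch of this point, with hypothetical sizes and random weights. A fully connected frontend applied to the flattened feature map uses different weights at each position, so the same feature at another position produces different logits. Global max pooling before the fully connected layer discards the position entirely. Real architectures sit between these two extremes.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical backend output: a 1-channel 6x6 feature map with a single active feature.
fmap = np.zeros((6, 6))
fmap[1, 1] = 1.0
fmap_shifted = np.roll(fmap, shift=(2, 3), axis=(0, 1))  # same feature, different position

# Frontend 1: fully connected layer on the flattened map (separate weights per position).
W_fc = rng.normal(size=(3, 36))
logits_fc = W_fc @ fmap.ravel()
logits_fc_shifted = W_fc @ fmap_shifted.ravel()

# Frontend 2: global max pooling first, then a fully connected layer on the pooled value.
W_gp = rng.normal(size=(3, 1))
logits_gp = W_gp @ np.array([fmap.max()])
logits_gp_shifted = W_gp @ np.array([fmap_shifted.max()])

print(np.allclose(logits_fc, logits_fc_shifted))  # False: position matters
print(np.allclose(logits_gp, logits_gp_shifted))  # True: position is discarded
```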



It has been shown in Quantifying Translation-Invariance in Convolutional Neural Networks that, to improve the translation invariance of a CNN classifier, it is more effective to act on the dataset bias (data augmentation) than on the inductive bias (architecture, hence depth, pooling, …).
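For reference, here is a generic sketch of translation-based data augmentation. It is not the protocol of the cited paper, which is not described here. Each training image is shifted by a random integer offset, with zero padding at the borders. The function names and the padding choice are assumptions.

```python
import numpy as np

def random_translate(image, max_shift, rng):
    # Shift a 2D image by a random offset in [-max_shift, max_shift] along each axis,
    # filling uncovered pixels with zeros (content pushed out of the frame is dropped).
    h, w = image.shape
    du, dv = rng.integers(-max_shift, max_shift + 1, size=2)
    padded = np.pad(image, max_shift)  # zero padding on all sides
    top, left = max_shift - du, max_shift - dv
    return padded[top:top + h, left:left + w]

def augment(batch, max_shift, seed=0):
    # Apply an independent random translation to every image in an (N, H, W) batch.
    rng = np.random.default_rng(seed)
    return np.stack([random_translate(img, max_shift, rng) for img in batch])

batch = np.zeros((4, 5, 5))
batch[:, 2, 2] = 1.0  # one bright pixel in the centre of each image
augmented = augment(batch, max_shift=2)
print(augmented.shape)             # (4, 5, 5)
print(augmented.sum(axis=(1, 2)))  # [1. 1. 1. 1.]  (the pixel is moved, never lost)
```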






share|improve this answer









$endgroup$













    Your Answer





    StackExchange.ifUsing("editor", function () {
    return StackExchange.using("mathjaxEditing", function () {
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
    });
    });
    }, "mathjax-editing");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "557"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f16060%2fwhat-is-the-difference-between-equivariant-to-translation-and-invariant-to-tr%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    3 Answers
    3






    active

    oldest

    votes








    3 Answers
    3






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    24












    $begingroup$

    Equivariance and invariance are sometimes used interchangeably. As pointed out by @Xi'an, you can find uses in the statistical literature, for instance on the notions of Invariant estimator and especially Pitman estimator.



    However, I would like to mention that it would be better if both terms keep separated, as the prefix "in-" in invariant is privative (meaning "no variance" at all), while "equi-" refers to "varying in a similar or equivalent proportion".



    Let us start from simple image features, and suppose that image $I$ has a unique maximum $m$ at location $(x_m,y_m)$, which is here the main classification feature.



    An interesting property of classifiers is their ability to classify in the same manner some distorted versions $I'$ of $I$, for instance translations by all vectors $(u,v)$. The maximum value $m'$ of $I'$ is invariant: $m'=m$: the value is the same. While its location will be at $(x'_m,y'_m)=(x_m-u,y_m-v)$, and is equivariant, meaning that is varies "equally" with the distortion.



    The precise formulations given in mathematics for equivariance may depend on the objects and transformations one considers, so I prefer here the notion that is most often used in practice (and I may get the blame from a theoretical stand-point).



    Here, translations (or some more generic action) can be equipped with the structure of a group $G$, $g$ being one specific translation operator. A function or feature $f$ is invariant under $G$ if for all images in a class, and for any $g$,
    $$f(g(I)) = f(I),.$$



    It becomes equivariant if there exists another structure (often a group) $G'$ that reflects
    the
    transformations in $G$ in a meaningful way. In other words, such that for each $g$, you have one a unique $g' in G'$ such that



    $$f(g(I)) = g'(f(I)),.$$



    In the above example on the group of translations, $g$ and $g'$ are the same (and hence $G'=G$): an integer translation of the image reflects as the exact same translation of the maximum location.



    Another common definition is:



    $$f(g(I)) = g(f(I)),.$$



    I however used potentially different $G$ and $G'$ because sometimes $f(I)$ and $g(I)$ are not in the same domain. This happens for instance in multivariate statistics (see e.g. Equivariance and invariance properties of multivariate quantile and related functions, and the role of standardisation).
    But here, the uniqueness of the mapping between $g$ and $g'$ allows to get back to the original transformation $g$.



    Often people use the term invariance because the equivariance concept is unknown, or everybody else uses invariance, and equivariance would seem more pedantic.



    For the record, other related notions (esp. in maths and physics) are termed covariance, contravariance, differential invariance.



    In addition, translation-invariance, as least approximate, or in envelope, has been a quest for several signal and image processing tools. Notably, multi-rate (filter-banks) and multi-scale (wavelets or pyramids) transformations have been design in the past 25 years, for instance under the hood of shift-invariant, cycle-spinning, stationary, complex, dual-tree wavelet transforms (for a review on 2D wavelets, A panorama on multiscale geometric representations). The wavelets can absorb a few discrete scale variations. All theses (approximate) invariances often come with the price of redundancy in the number of transformed coefficients.






    share|improve this answer











    $endgroup$









    • 4




      $begingroup$
      Great! I really admire your effort for the detailed reply @Laurent Duval
      $endgroup$
      – Aamir
      Jan 5 '17 at 8:32
















    24












    $begingroup$

    Equivariance and invariance are sometimes used interchangeably. As pointed out by @Xi'an, you can find uses in the statistical literature, for instance on the notions of Invariant estimator and especially Pitman estimator.



    However, I would like to mention that it would be better if both terms keep separated, as the prefix "in-" in invariant is privative (meaning "no variance" at all), while "equi-" refers to "varying in a similar or equivalent proportion".



    Let us start from simple image features, and suppose that image $I$ has a unique maximum $m$ at location $(x_m,y_m)$, which is here the main classification feature.



    An interesting property of classifiers is their ability to classify in the same manner some distorted versions $I'$ of $I$, for instance translations by all vectors $(u,v)$. The maximum value $m'$ of $I'$ is invariant: $m'=m$: the value is the same. While its location will be at $(x'_m,y'_m)=(x_m-u,y_m-v)$, and is equivariant, meaning that is varies "equally" with the distortion.



    The precise formulations given in mathematics for equivariance may depend on the objects and transformations one considers, so I prefer here the notion that is most often used in practice (and I may get the blame from a theoretical stand-point).



    Here, translations (or some more generic action) can be equipped with the structure of a group $G$, $g$ being one specific translation operator. A function or feature $f$ is invariant under $G$ if for all images in a class, and for any $g$,
    $$f(g(I)) = f(I),.$$



    It becomes equivariant if there exists another structure (often a group) $G'$ that reflects
    the
    transformations in $G$ in a meaningful way. In other words, such that for each $g$, you have one a unique $g' in G'$ such that



    $$f(g(I)) = g'(f(I)),.$$



    In the above example on the group of translations, $g$ and $g'$ are the same (and hence $G'=G$): an integer translation of the image reflects as the exact same translation of the maximum location.



    Another common definition is:



    $$f(g(I)) = g(f(I)),.$$



    I however used potentially different $G$ and $G'$ because sometimes $f(I)$ and $g(I)$ are not in the same domain. This happens for instance in multivariate statistics (see e.g. Equivariance and invariance properties of multivariate quantile and related functions, and the role of standardisation).
    But here, the uniqueness of the mapping between $g$ and $g'$ allows to get back to the original transformation $g$.



    Often people use the term invariance because the equivariance concept is unknown, or everybody else uses invariance, and equivariance would seem more pedantic.



    For the record, other related notions (esp. in maths and physics) are termed covariance, contravariance, differential invariance.



    In addition, translation-invariance, as least approximate, or in envelope, has been a quest for several signal and image processing tools. Notably, multi-rate (filter-banks) and multi-scale (wavelets or pyramids) transformations have been design in the past 25 years, for instance under the hood of shift-invariant, cycle-spinning, stationary, complex, dual-tree wavelet transforms (for a review on 2D wavelets, A panorama on multiscale geometric representations). The wavelets can absorb a few discrete scale variations. All theses (approximate) invariances often come with the price of redundancy in the number of transformed coefficients.






    share|improve this answer











    $endgroup$









    • 4




      $begingroup$
      Great! I really admire your effort for the detailed reply @Laurent Duval
      $endgroup$
      – Aamir
      Jan 5 '17 at 8:32














    24












    24








    24





    $begingroup$

    Equivariance and invariance are sometimes used interchangeably. As pointed out by @Xi'an, you can find uses in the statistical literature, for instance on the notions of Invariant estimator and especially Pitman estimator.



    However, I would like to mention that it would be better if both terms keep separated, as the prefix "in-" in invariant is privative (meaning "no variance" at all), while "equi-" refers to "varying in a similar or equivalent proportion".



    Let us start from simple image features, and suppose that image $I$ has a unique maximum $m$ at location $(x_m,y_m)$, which is here the main classification feature.



    An interesting property of classifiers is their ability to classify in the same manner some distorted versions $I'$ of $I$, for instance translations by all vectors $(u,v)$. The maximum value $m'$ of $I'$ is invariant: $m'=m$: the value is the same. While its location will be at $(x'_m,y'_m)=(x_m-u,y_m-v)$, and is equivariant, meaning that is varies "equally" with the distortion.



    The precise formulations given in mathematics for equivariance may depend on the objects and transformations one considers, so I prefer here the notion that is most often used in practice (and I may get the blame from a theoretical stand-point).



    Here, translations (or some more generic action) can be equipped with the structure of a group $G$, $g$ being one specific translation operator. A function or feature $f$ is invariant under $G$ if for all images in a class, and for any $g$,
    $$f(g(I)) = f(I),.$$



    It becomes equivariant if there exists another structure (often a group) $G'$ that reflects
    the
    transformations in $G$ in a meaningful way. In other words, such that for each $g$, you have one a unique $g' in G'$ such that



    $$f(g(I)) = g'(f(I)),.$$



    In the above example on the group of translations, $g$ and $g'$ are the same (and hence $G'=G$): an integer translation of the image reflects as the exact same translation of the maximum location.



    Another common definition is:



    $$f(g(I)) = g(f(I)),.$$



    I however used potentially different $G$ and $G'$ because sometimes $f(I)$ and $g(I)$ are not in the same domain. This happens for instance in multivariate statistics (see e.g. Equivariance and invariance properties of multivariate quantile and related functions, and the role of standardisation).
    But here, the uniqueness of the mapping between $g$ and $g'$ allows to get back to the original transformation $g$.



    Often people use the term invariance because the equivariance concept is unknown, or everybody else uses invariance, and equivariance would seem more pedantic.



    For the record, other related notions (esp. in maths and physics) are termed covariance, contravariance, differential invariance.



    In addition, translation-invariance, as least approximate, or in envelope, has been a quest for several signal and image processing tools. Notably, multi-rate (filter-banks) and multi-scale (wavelets or pyramids) transformations have been design in the past 25 years, for instance under the hood of shift-invariant, cycle-spinning, stationary, complex, dual-tree wavelet transforms (for a review on 2D wavelets, A panorama on multiscale geometric representations). The wavelets can absorb a few discrete scale variations. All theses (approximate) invariances often come with the price of redundancy in the number of transformed coefficients.






    share|improve this answer











    $endgroup$



    Equivariance and invariance are sometimes used interchangeably. As pointed out by @Xi'an, you can find uses in the statistical literature, for instance on the notions of Invariant estimator and especially Pitman estimator.



    However, I would like to mention that it would be better if both terms keep separated, as the prefix "in-" in invariant is privative (meaning "no variance" at all), while "equi-" refers to "varying in a similar or equivalent proportion".



    Let us start from simple image features, and suppose that image $I$ has a unique maximum $m$ at location $(x_m,y_m)$, which is here the main classification feature.



    An interesting property of classifiers is their ability to classify in the same manner some distorted versions $I'$ of $I$, for instance translations by all vectors $(u,v)$. The maximum value $m'$ of $I'$ is invariant: $m'=m$: the value is the same. While its location will be at $(x'_m,y'_m)=(x_m-u,y_m-v)$, and is equivariant, meaning that is varies "equally" with the distortion.



    The precise formulations given in mathematics for equivariance may depend on the objects and transformations one considers, so I prefer here the notion that is most often used in practice (and I may get the blame from a theoretical stand-point).



    Here, translations (or some more generic action) can be equipped with the structure of a group $G$, $g$ being one specific translation operator. A function or feature $f$ is invariant under $G$ if for all images in a class, and for any $g$,
    $$f(g(I)) = f(I),.$$



    It becomes equivariant if there exists another structure (often a group) $G'$ that reflects
    the
    transformations in $G$ in a meaningful way. In other words, such that for each $g$, you have one a unique $g' in G'$ such that



    $$f(g(I)) = g'(f(I)),.$$



    In the above example on the group of translations, $g$ and $g'$ are the same (and hence $G'=G$): an integer translation of the image reflects as the exact same translation of the maximum location.



    Another common definition is:



    $$f(g(I)) = g(f(I)),.$$



    I however used potentially different $G$ and $G'$ because sometimes $f(I)$ and $g(I)$ are not in the same domain. This happens for instance in multivariate statistics (see e.g. Equivariance and invariance properties of multivariate quantile and related functions, and the role of standardisation).
    But here, the uniqueness of the mapping between $g$ and $g'$ allows to get back to the original transformation $g$.



    Often people use the term invariance because the equivariance concept is unknown, or everybody else uses invariance, and equivariance would seem more pedantic.



    For the record, other related notions (esp. in maths and physics) are termed covariance, contravariance, differential invariance.



    In addition, translation-invariance, as least approximate, or in envelope, has been a quest for several signal and image processing tools. Notably, multi-rate (filter-banks) and multi-scale (wavelets or pyramids) transformations have been design in the past 25 years, for instance under the hood of shift-invariant, cycle-spinning, stationary, complex, dual-tree wavelet transforms (for a review on 2D wavelets, A panorama on multiscale geometric representations). The wavelets can absorb a few discrete scale variations. All theses (approximate) invariances often come with the price of redundancy in the number of transformed coefficients.







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Dec 30 '18 at 14:17

























    answered Jan 4 '17 at 22:53









    Laurent DuvalLaurent Duval

    762619




    762619








    • 4




      $begingroup$
      Great! I really admire your effort for the detailed reply @Laurent Duval
      $endgroup$
      – Aamir
      Jan 5 '17 at 8:32














    • 4




      $begingroup$
      Great! I really admire your effort for the detailed reply @Laurent Duval
      $endgroup$
      – Aamir
      Jan 5 '17 at 8:32








    4




    4




    $begingroup$
    Great! I really admire your effort for the detailed reply @Laurent Duval
    $endgroup$
    – Aamir
    Jan 5 '17 at 8:32




    $begingroup$
    Great! I really admire your effort for the detailed reply @Laurent Duval
    $endgroup$
    – Aamir
    Jan 5 '17 at 8:32











    17












    $begingroup$

    The terms are different:




    • Equivariant to translation means that a translation of input features results in an equivalent translation of outputs. So if your pattern 0,3,2,0,0 on the input results in 0,1,0,0 in the output, then the pattern 0,0,3,2,0 might lead to 0,0,1,0


    • Invariant to translation means that a translation of input features doe not change the outputs at all. So if your pattern 0,3,2,0,0 on the input results in 0,1,0 in the output, then the pattern 0,0,3,2,0 would also lead to 0,1,0



    For feature maps in convolutional networks to be useful, they typically need both properties in some balance. The equivariance allows the network to generalise edge, texture, shape detection in different locations. The invariance allows precise location of the detected features to matter less. These are two complementary types of generalisation for many image processing tasks.














    edited Jan 4 '17 at 20:45

























    answered Jan 4 '17 at 20:39









    Neil Slater












    • Translated feature yields translated output at some layer. Please elaborate on a considerably translated whole object being detected. It seems it will be detected even if the CNN was not trained with images containing different positions? Does equivariance hold in this case (it looks more similar to invariance)?
      – VladimirLenin
      Jul 14 '17 at 10:14










    • @VladimirLenin: I don't think that elaboration is required for this question, it is definitely not something the OP has asked here. I suggest you ask a separate question, with a concrete example if possible. Even if visually a "whole object" has been translated, that does not mean feature maps in a CNN are tracking the same thing as you expect.
      – Neil Slater
      Jul 14 '17 at 10:24



















    Just adding my 2 cents



    Consider an image classification task solved with a typical CNN architecture consisting of a Backend (convolutions + nonlinearities + possibly spatial pooling), which performs representation learning, and a Frontend (e.g. fully connected layers, an MLP), which solves the specific task, here image classification. The idea is to build a function $ f : I \rightarrow L $ that maps from the spatial domain $ I $ (input image) to the semantic domain $ L $ (label set) in a 2-step process:




    • Backend (Representation Learning): $ f_B : I \rightarrow \mathcal{L} $ maps the input to the latent semantic space $ \mathcal{L} $

    • Frontend (Task-Specific Solver): $ f_F : \mathcal{L} \rightarrow L $ maps from the latent semantic space to the final label space, so that $ f = f_F \circ f_B $


    The process relies on the following properties:




    • spatial equivariance of the ConvLayer (spatial 2D convolution + nonlinearity, e.g. ReLU): a shift in the layer input produces the same shift in the layer output (note: this is a property of the whole layer, not only of the convolution operator)

    • spatial invariance of the pooling operator (e.g. max pooling passes on the maximum value in its receptive field regardless of where in that field it occurs)
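    Stated a little more formally (a standard formulation; the symbols $T_t$, $h$ and $g$ are introduced here only for illustration): let $T_t$ translate an input by $t$ positions. A layer $h$ is translation-equivariant and an operator $g$ is translation-invariant if, for every input $x$ and every shift $t$ (ignoring border effects),

    $$ h(T_t x) = T_t\, h(x) \qquad \text{(equivariance)} $$

    $$ g(T_t x) = g(x) \qquad \text{(invariance)} $$

    A pointwise nonlinearity such as ReLU commutes with $T_t$, so a convolution followed by ReLU is still equivariant. Composing an equivariant backend with an idealised invariant operator (e.g. global max pooling) then gives an invariant result: $g(h(T_t x)) = g(T_t\, h(x)) = g(h(x))$.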


    The closer to the input layer, the closer we are to the purely spatial domain $ I $ and the more important the spatial equivariance property, which allows building spatially equivariant, hierarchical, increasingly semantic representations.
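    A hedged numerical check of the layer-level claim, in plain Python: two stacked conv + ReLU layers commute with a translation, so the hierarchy stays equivariant. To make equivariance exact, this sketch assumes circular (wrap-around) padding and a circular shift; with zero padding or "valid" convolution the property only holds away from the borders. The kernels, biases and input are arbitrary illustrative values, and `backbone` is a hypothetical name.

```python
def circular_conv_relu(signal, kernel, bias):
    """Cross-correlation with wrap-around padding (output length == input length), then ReLU."""
    n, k = len(signal), len(kernel)
    return [max(0, sum(kernel[j] * signal[(i + j) % n] for j in range(k)) + bias)
            for i in range(n)]


def shift(signal, t):
    """Circular translation by t positions to the right."""
    t %= len(signal)
    return signal[-t:] + signal[:-t] if t else list(signal)


def backbone(signal):
    """Two stacked conv + ReLU layers with hypothetical weights."""
    h1 = circular_conv_relu(signal, [1, -1, 2], -1)
    return circular_conv_relu(h1, [2, 1], 0)


x = [0, 3, 2, 0, 0, 1, 0, 0]
for t in range(len(x)):
    # Equivariance: translating the input translates the output by the same amount.
    assert backbone(shift(x, t)) == shift(backbone(x), t)
print("equivariant for all circular shifts")
```

    Because ReLU and the bias act on each position independently, they cannot break the commutation that the convolution already has; this is why the note above speaks of the whole layer.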



    The closer to the frontend, the closer we are to the latent, purely semantic domain $ \mathcal{L} $ and the more important spatial invariance becomes, since the meaning of the image should be independent of the spatial positions of its features.



    Using fully connected layers in the frontend makes the classifier sensitive to feature position to some extent, depending on the backend structure: the deeper the backend and the more translation-invariant operators (pooling) it uses, the less sensitive the classifier is.
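    A hedged toy comparison of two frontends: a hypothetical fully connected read-out applied to the flattened feature map, versus the same kind of read-out applied after global max pooling. The feature maps and weights are made up for illustration; the point is only that a per-position weight vector sees where a feature is, while pooling first discards that information.

```python
def fully_connected(features, weights, bias):
    """Single output unit: weighted sum over every input position."""
    return sum(w * f for w, f in zip(weights, features)) + bias


# Hypothetical feature maps: the same detection at two different positions.
feature_map = [0, 1, 0, 0]
feature_map_shifted = [0, 0, 1, 0]

# Hypothetical FC weights: one weight per spatial position, so position matters.
fc_weights = [0.5, 2.0, -1.0, 0.3]
print(fully_connected(feature_map, fc_weights, 0.0))          # 2.0
print(fully_connected(feature_map_shifted, fc_weights, 0.0))  # -1.0

# Global max pooling first collapses the map to one value, so the FC input is identical.
pooled = [max(feature_map)]
pooled_shifted = [max(feature_map_shifted)]
print(fully_connected(pooled, [2.0], 0.0), fully_connected(pooled_shifted, [2.0], 0.0))  # 2.0 2.0
```

    Real backends usually pool only locally, so some positional information still reaches the fully connected layers; that is the "to some extent" above.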



    It has been shown in Quantifying Translation-Invariance in Convolutional Neural Networks that, to improve the translation invariance of a CNN classifier, acting on the dataset bias (data augmentation) is more effective than acting on the inductive bias (architecture, hence depth, pooling, …).
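    The data-augmentation route can be sketched as randomly translating each training image before it is fed to the network. The helpers below are hypothetical (not taken from the paper or from any library, and the paper's exact augmentation setup is not reproduced): `translate` shifts a 2-D image, given as a list of rows, and fills vacated pixels with zeros; `random_translate` picks a random offset of at most `max_shift` pixels per axis. In practice one would use the augmentation utilities of one's framework.

```python
import random


def translate(image, dy, dx, fill=0):
    """Shift a 2-D image (list of rows) down by dy and right by dx; vacated pixels get `fill`."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            src_y, src_x = y - dy, x - dx
            if 0 <= src_y < h and 0 <= src_x < w:
                out[y][x] = image[src_y][src_x]
    return out


def random_translate(image, max_shift, rng=random):
    """Hypothetical augmentation step: translate by a random offset in [-max_shift, max_shift]."""
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    return translate(image, dy, dx)


image = [
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(translate(image, 1, 2))
# [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 9], [0, 0, 0, 0]]

# A seeded generator makes the augmentation reproducible.
augmented = random_translate(image, max_shift=1, rng=random.Random(0))
```

    Each epoch then sees the same object at slightly different positions, so translation invariance is learned from the data rather than imposed by the architecture.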
















        answered Mar 15 '18 at 15:42









        Nicola Bernini





























