How to count categorical values including zero occurrence?












1












$begingroup$


I want to count number of code by month.
This is my example dataframe.



        id    month  code
0 sally 0 s_A
1 sally 0 s_B
2 sally 0 s_C
3 sally 0 s_D
4 sally 0 s_E
5 sally 0 s_A
6 sally 0 s_A
7 sally 0 s_B
8 sally 0 s_C
9 sally 0 s_A


I transformed to this Series using count().



df.groupby(['id', 'code', 'month']).m.count()

id code month count
sally s_A 0 12
1 10
2 3
7 15


But, I want to include zero occurrence, like this.



id      code   month  count
sally s_A 0 12
1 10
2 3
3 0
4 0
5 0
6 0
7 15
8 0
9 0
10 0
11 0









share|improve this question









$endgroup$












  • $begingroup$
    Without transforming it into a Series, just try this: df['month'].value_counts(), where df is your pandas dataframe
    $endgroup$
    – Nain
    Apr 6 '17 at 9:41










  • $begingroup$
    @Nain thanks, but I need to group by 'sally' and there are missing months like the example above.
    $endgroup$
    – planaria
    Apr 6 '17 at 9:44










  • $begingroup$
    @Kyle. Could you explain more? Months are not complete. How can I insert month with zero count?
    $endgroup$
    – planaria
    Apr 6 '17 at 22:21










  • $begingroup$
    df.month.value_counts(dropna=False)
    $endgroup$
    – Andrew L
    May 15 '17 at 9:03
















1












$begingroup$


I want to count number of code by month.
This is my example dataframe.



        id    month  code
0 sally 0 s_A
1 sally 0 s_B
2 sally 0 s_C
3 sally 0 s_D
4 sally 0 s_E
5 sally 0 s_A
6 sally 0 s_A
7 sally 0 s_B
8 sally 0 s_C
9 sally 0 s_A


I transformed to this Series using count().



df.groupby(['id', 'code', 'month']).m.count()

id code month count
sally s_A 0 12
1 10
2 3
7 15


But, I want to include zero occurrence, like this.



id      code   month  count
sally s_A 0 12
1 10
2 3
3 0
4 0
5 0
6 0
7 15
8 0
9 0
10 0
11 0









share|improve this question









$endgroup$












  • $begingroup$
    Without transforming it into a Series, just try this: df['month'].value_counts(), where df is your pandas dataframe
    $endgroup$
    – Nain
    Apr 6 '17 at 9:41










  • $begingroup$
    @Nain thanks, but I need to group by 'sally' and there are missing months like the example above.
    $endgroup$
    – planaria
    Apr 6 '17 at 9:44










  • $begingroup$
    @Kyle. Could you explain more? Months are not complete. How can I insert month with zero count?
    $endgroup$
    – planaria
    Apr 6 '17 at 22:21










  • $begingroup$
    df.month.value_counts(dropna=False)
    $endgroup$
    – Andrew L
    May 15 '17 at 9:03














1












1








1





$begingroup$


I want to count number of code by month.
This is my example dataframe.



        id    month  code
0 sally 0 s_A
1 sally 0 s_B
2 sally 0 s_C
3 sally 0 s_D
4 sally 0 s_E
5 sally 0 s_A
6 sally 0 s_A
7 sally 0 s_B
8 sally 0 s_C
9 sally 0 s_A


I transformed to this Series using count().



df.groupby(['id', 'code', 'month']).m.count()

id code month count
sally s_A 0 12
1 10
2 3
7 15


But, I want to include zero occurrence, like this.



id      code   month  count
sally s_A 0 12
1 10
2 3
3 0
4 0
5 0
6 0
7 15
8 0
9 0
10 0
11 0









share|improve this question









$endgroup$




I want to count number of code by month.
This is my example dataframe.



        id    month  code
0 sally 0 s_A
1 sally 0 s_B
2 sally 0 s_C
3 sally 0 s_D
4 sally 0 s_E
5 sally 0 s_A
6 sally 0 s_A
7 sally 0 s_B
8 sally 0 s_C
9 sally 0 s_A


I transformed to this Series using count().



df.groupby(['id', 'code', 'month']).m.count()

id code month count
sally s_A 0 12
1 10
2 3
7 15


But, I want to include zero occurrence, like this.



id      code   month  count
sally s_A 0 12
1 10
2 3
3 0
4 0
5 0
6 0
7 15
8 0
9 0
10 0
11 0






python pandas






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Apr 6 '17 at 8:23









planariaplanaria

3814




3814












  • $begingroup$
    Without transforming it into a Series, just try this: df['month'].value_counts(), where df is your pandas dataframe
    $endgroup$
    – Nain
    Apr 6 '17 at 9:41










  • $begingroup$
    @Nain thanks, but I need to group by 'sally' and there are missing months like the example above.
    $endgroup$
    – planaria
    Apr 6 '17 at 9:44










  • $begingroup$
    @Kyle. Could you explain more? Months are not complete. How can I insert month with zero count?
    $endgroup$
    – planaria
    Apr 6 '17 at 22:21










  • $begingroup$
    df.month.value_counts(dropna=False)
    $endgroup$
    – Andrew L
    May 15 '17 at 9:03


















  • $begingroup$
    Without transforming it into a Series, just try this: df['month'].value_counts(), where df is your pandas dataframe
    $endgroup$
    – Nain
    Apr 6 '17 at 9:41










  • $begingroup$
    @Nain thanks, but I need to group by 'sally' and there are missing months like the example above.
    $endgroup$
    – planaria
    Apr 6 '17 at 9:44










  • $begingroup$
    @Kyle. Could you explain more? Months are not complete. How can I insert month with zero count?
    $endgroup$
    – planaria
    Apr 6 '17 at 22:21










  • $begingroup$
    df.month.value_counts(dropna=False)
    $endgroup$
    – Andrew L
    May 15 '17 at 9:03
















$begingroup$
Without transforming it into a Series, just try this: df['month'].value_counts(), where df is your pandas dataframe
$endgroup$
– Nain
Apr 6 '17 at 9:41




$begingroup$
Without transforming it into a Series, just try this: df['month'].value_counts(), where df is your pandas dataframe
$endgroup$
– Nain
Apr 6 '17 at 9:41












$begingroup$
@Nain thanks, but I need to group by 'sally' and there are missing months like the example above.
$endgroup$
– planaria
Apr 6 '17 at 9:44




$begingroup$
@Nain thanks, but I need to group by 'sally' and there are missing months like the example above.
$endgroup$
– planaria
Apr 6 '17 at 9:44












$begingroup$
@Kyle. Could you explain more? Months are not complete. How can I insert month with zero count?
$endgroup$
– planaria
Apr 6 '17 at 22:21




$begingroup$
@Kyle. Could you explain more? Months are not complete. How can I insert month with zero count?
$endgroup$
– planaria
Apr 6 '17 at 22:21












$begingroup$
df.month.value_counts(dropna=False)
$endgroup$
– Andrew L
May 15 '17 at 9:03




$begingroup$
df.month.value_counts(dropna=False)
$endgroup$
– Andrew L
May 15 '17 at 9:03










1 Answer
1






active

oldest

votes


















2












$begingroup$

Based on the short example DataFrame you provided, this block of code will include all of the months. It is based on using the Series.reindex method and creating a new MultiIndex with the additional values for the months:



import pandas as pd
# Load example data into DataFrame
df = pd.read_table("categorical_data.txt", delim_whitespace=True)

# Transform to a count
count = df.groupby(['id', 'code', 'month']).month.count()

# Re-create a new array of levels, now including all 12 months
levels = [count.index.levels[0].values, count.index.levels[1].values, range(12)]
new_index = pd.MultiIndex.from_product(levels, names=count.index.names)

# Reindex the count and fill empty values with zero (NaN by default)
count = count.reindex(new_index, fill_value=0)
print(count)


Printing the result, I get something like this (only showing the first entry for sally/s_A):



id     code  month
sally s_A 0 4
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0





share|improve this answer











$endgroup$













    Your Answer





    StackExchange.ifUsing("editor", function () {
    return StackExchange.using("mathjaxEditing", function () {
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
    });
    });
    }, "mathjax-editing");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "557"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f18154%2fhow-to-count-categorical-values-including-zero-occurrence%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    2












    $begingroup$

    Based on the short example DataFrame you provided, this block of code will include all of the months. It is based on using the Series.reindex method and creating a new MultiIndex with the additional values for the months:



    import pandas as pd
    # Load example data into DataFrame
    df = pd.read_table("categorical_data.txt", delim_whitespace=True)

    # Transform to a count
    count = df.groupby(['id', 'code', 'month']).month.count()

    # Re-create a new array of levels, now including all 12 months
    levels = [count.index.levels[0].values, count.index.levels[1].values, range(12)]
    new_index = pd.MultiIndex.from_product(levels, names=count.index.names)

    # Reindex the count and fill empty values with zero (NaN by default)
    count = count.reindex(new_index, fill_value=0)
    print(count)


    Printing the result, I get something like this (only showing the first entry for sally/s_A):



    id     code  month
    sally s_A 0 4
    1 0
    2 0
    3 0
    4 0
    5 0
    6 0
    7 0
    8 0
    9 0
    10 0
    11 0





    share|improve this answer











    $endgroup$


















      2












      $begingroup$

      Based on the short example DataFrame you provided, this block of code will include all of the months. It is based on using the Series.reindex method and creating a new MultiIndex with the additional values for the months:



      import pandas as pd
      # Load example data into DataFrame
      df = pd.read_table("categorical_data.txt", delim_whitespace=True)

      # Transform to a count
      count = df.groupby(['id', 'code', 'month']).month.count()

      # Re-create a new array of levels, now including all 12 months
      levels = [count.index.levels[0].values, count.index.levels[1].values, range(12)]
      new_index = pd.MultiIndex.from_product(levels, names=count.index.names)

      # Reindex the count and fill empty values with zero (NaN by default)
      count = count.reindex(new_index, fill_value=0)
      print(count)


      Printing the result, I get something like this (only showing the first entry for sally/s_A):



      id     code  month
      sally s_A 0 4
      1 0
      2 0
      3 0
      4 0
      5 0
      6 0
      7 0
      8 0
      9 0
      10 0
      11 0





      share|improve this answer











      $endgroup$
















        2












        2








        2





        $begingroup$

        Based on the short example DataFrame you provided, this block of code will include all of the months. It is based on using the Series.reindex method and creating a new MultiIndex with the additional values for the months:



        import pandas as pd
        # Load example data into DataFrame
        df = pd.read_table("categorical_data.txt", delim_whitespace=True)

        # Transform to a count
        count = df.groupby(['id', 'code', 'month']).month.count()

        # Re-create a new array of levels, now including all 12 months
        levels = [count.index.levels[0].values, count.index.levels[1].values, range(12)]
        new_index = pd.MultiIndex.from_product(levels, names=count.index.names)

        # Reindex the count and fill empty values with zero (NaN by default)
        count = count.reindex(new_index, fill_value=0)
        print(count)


        Printing the result, I get something like this (only showing the first entry for sally/s_A):



        id     code  month
        sally s_A 0 4
        1 0
        2 0
        3 0
        4 0
        5 0
        6 0
        7 0
        8 0
        9 0
        10 0
        11 0





        share|improve this answer











        $endgroup$



        Based on the short example DataFrame you provided, this block of code will include all of the months. It is based on using the Series.reindex method and creating a new MultiIndex with the additional values for the months:



        import pandas as pd
        # Load example data into DataFrame
        df = pd.read_table("categorical_data.txt", delim_whitespace=True)

        # Transform to a count
        count = df.groupby(['id', 'code', 'month']).month.count()

        # Re-create a new array of levels, now including all 12 months
        levels = [count.index.levels[0].values, count.index.levels[1].values, range(12)]
        new_index = pd.MultiIndex.from_product(levels, names=count.index.names)

        # Reindex the count and fill empty values with zero (NaN by default)
        count = count.reindex(new_index, fill_value=0)
        print(count)


        Printing the result, I get something like this (only showing the first entry for sally/s_A):



        id     code  month
        sally s_A 0 4
        1 0
        2 0
        3 0
        4 0
        5 0
        6 0
        7 0
        8 0
        9 0
        10 0
        11 0






        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited 15 mins ago









        Tomasz Gandor

        1033




        1033










        answered Nov 8 '17 at 22:43









        ffuffu

        364




        364






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Data Science Stack Exchange!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            Use MathJax to format equations. MathJax reference.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f18154%2fhow-to-count-categorical-values-including-zero-occurrence%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Aikido

            Tivadar Csontváry Kosztka

            Metroo de Marsejlo