Loading own training data and labels into DataLoader using PyTorch?












I have x_data and labels as separate arrays. How can I combine them and load them into the model using torch.utils.data.DataLoader?



I have a dataset that I created: the training data has 20k samples, and the labels are stored separately. Let's say I want to load the dataset into the model, shuffle it each epoch, and use whatever batch size I prefer. DataLoader does exactly that. How can I combine the data and labels and pass them to DataLoader so that I can train the model in PyTorch?










python dataset preprocessing pytorch






asked Feb 20 at 21:13 by Amarnath (112)
edited 21 mins ago by Community
  • See discussion on StackOverflow here: stackoverflow.com/questions/41924453/…
    – joha, 9 hours ago




























2 Answers






Assuming both x_data and labels are lists or NumPy arrays, you can pair each sample with its label and pass the resulting list directly to DataLoader:



import torch

# Pair each sample with its label
train_data = []
for i in range(len(x_data)):
    train_data.append([x_data[i], labels[i]])

# DataLoader shuffles and batches the (sample, label) pairs
trainloader = torch.utils.data.DataLoader(train_data, shuffle=True, batch_size=100)
i1, l1 = next(iter(trainloader))
print(i1.shape)
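
A more compact way to build the same list of pairs is shown below; this is a sketch assuming x_data and labels have equal length, and it is equivalent for DataLoader's purposes.

# Equivalent pairing using zip; each element becomes a (sample, label) pair
train_data = list(zip(x_data, labels))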





answered 12 hours ago by Ashutosh Mishra (11), a new contributor

I think the standard way is to create a Dataset object from the arrays and pass that Dataset to the DataLoader.

One solution is to inherit from the Dataset class and define a custom class that implements __len__() and __getitem__(), passing X and y to __init__(self, X, y), as sketched below.
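
A minimal sketch of that subclass approach (the class name MyDataset is illustrative, and it assumes X and y are arrays of equal length):

import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, X, y):
        # Convert once up front; __getitem__ then just indexes into the tensors
        self.X = torch.as_tensor(X, dtype=torch.float32)
        self.y = torch.as_tensor(y)

    def __len__(self):
        # Number of samples
        return len(self.X)

    def __getitem__(self, i):
        # Return one (sample, label) pair; DataLoader batches these
        return self.X[i], self.y[i]

loader = DataLoader(MyDataset(X, y), batch_size=100, shuffle=True)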



For your simple case, with two arrays and no need for a custom __getitem__() beyond returning the values in row i, you can also transform the arrays into Tensor objects and pass them to TensorDataset.

Run the following code for a self-contained example.



# Create a dataset like the one you describe
from sklearn.datasets import make_classification
X, y = make_classification()

# Load the necessary PyTorch packages
from torch.utils.data import DataLoader, TensorDataset
from torch import Tensor

# Create a dataset from several tensors with matching first dimension
# Samples will be drawn from the first dimension (rows)
dataset = TensorDataset(Tensor(X), Tensor(y))

# Create a data loader from the dataset
# Type of sampling and batch size are specified at this step
loader = DataLoader(dataset, batch_size=3)

# Quick test
next(iter(loader))
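
To also reshuffle the data every epoch, as the question asks, pass shuffle=True. A minimal training-loop sketch follows; the model, loss, and optimizer here are placeholders (assumptions, not part of the answer above):

import torch
from torch import nn

# Placeholder model/loss/optimizer; substitute your own
model = nn.Linear(X.shape[1], 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

loader = DataLoader(dataset, batch_size=32, shuffle=True)  # reshuffled every epoch

for epoch in range(10):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb.long())  # CrossEntropyLoss expects integer class labels
        loss.backward()
        optimizer.step()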





answered 9 hours ago by joha (1012)