
    Understanding Representation Learning Paradigms with Applications to Low Resource Text Classification

    File(s)
    TR1862 Siddhant Garg.pdf (2.063 MB)
    Date
    2020-05-21
    Author
    Garg, Siddhant
    Abstract
    A crucial component of modern machine learning systems is learning input representations that can be used for prediction tasks. The high cost of labelling and the easy availability of unlabelled data have led to the popularity of representation learning techniques on unlabelled data. In this thesis we present two ideas in the domain of representation learning. First, we show that self-supervised representation learning approaches such as variational auto-encoders and masked self-supervision can be viewed as imposing a regularization on the representation via a learnable function. We present a discriminative theoretical framework for analysing the underlying assumptions and sample complexities of representation learning via such functional regularizations. Our results show that functional regularization on unlabelled data can prune the hypothesis space and reduce the sample complexity of labelled data. We then consider the domain of NLP, where fine-tuning pre-trained sentence embedding models like BERT has become the default transfer learning approach. We propose an alternative transfer learning approach called SimpleTran for low-resource text classification, characterized by small datasets. We train a simple sentence embedding model on the target dataset, combine its output embedding with that of the pre-trained model via concatenation or dimension reduction, and finally train a classifier on the combined embedding, either fixing the embedding model weights or training the classifier and the embedding models end-to-end. With end-to-end training, SimpleTran outperforms fine-tuning on small and medium-sized datasets with negligible computational overhead. We provide theoretical analysis for our method, identifying conditions under which it has advantages.
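
    To make the SimpleTran recipe concrete, here is a minimal PyTorch sketch under stated assumptions: "bert-base-uncased" stands in for the pre-trained sentence embedding model, a mean-pooled EmbeddingBag stands in for the simple model trained only on the target dataset, and concatenation is the combination scheme (the abstract also mentions a dimension-reduction variant). All class and parameter names here are illustrative, not the thesis's actual code.

        import torch
        import torch.nn as nn
        from transformers import AutoModel

        class SimpleTranClassifier(nn.Module):
            """Concatenate a pre-trained sentence embedding with a small
            embedding trained from scratch on the target dataset, then
            classify on the combined vector."""

            def __init__(self, num_classes, vocab_size, simple_dim=128):
                super().__init__()
                # Pre-trained encoder; freeze its weights for the
                # fixed-embedding regime, or leave them trainable for
                # end-to-end training.
                self.pretrained = AutoModel.from_pretrained("bert-base-uncased")
                # Simple sentence embedding model: mean-pooled token
                # embeddings learned only on the target dataset
                # (padding tokens are excluded from the mean).
                self.simple_emb = nn.EmbeddingBag(
                    vocab_size, simple_dim, mode="mean", padding_idx=0
                )
                combined_dim = self.pretrained.config.hidden_size + simple_dim
                self.classifier = nn.Linear(combined_dim, num_classes)

            def forward(self, input_ids, attention_mask):
                # The [CLS] output serves as the pre-trained sentence embedding.
                pretrained_vec = self.pretrained(
                    input_ids=input_ids, attention_mask=attention_mask
                ).last_hidden_state[:, 0]
                # Target-trained embedding, mean-pooled over the same token ids.
                simple_vec = self.simple_emb(input_ids)
                # Combine by concatenation; a learned linear projection here
                # would implement the dimension-reduction variant instead.
                combined = torch.cat([pretrained_vec, simple_vec], dim=-1)
                return self.classifier(combined)

    In use, one would tokenize with the matching BERT tokenizer, train with a standard cross-entropy loss, and either freeze self.pretrained (the fixed-weights regime) or train everything end-to-end, the setting in which the abstract reports gains over plain fine-tuning.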
    Subject
    representation learning
    self-supervised learning
    transfer learning
    sentence embeddings
    Permanent Link
    http://digital.library.wisc.edu/1793/80196
    Type
    Technical Report
    Part of
    • CS Theses and Dissertations

    Related items

    Showing items related by title, author, creator and subject.

    • Comparing the Transfer of Learning from In-Person Learning to Blended Learning in a Healthcare Environment 

      Betz, Eric C. (University of Wisconsin–Stout, 2021)
      The present study investigated the transfer of learning that occurred when the new-hire clinical course was moved from an in-person learning approach to a blended learning approach. Two surveys were sent out to two different ...
    • Leadership living/learning communities: understanding the history, growth and need for sustainable leadership living/learning communities to be applied to the leadership living/learning community at a Midwest university 

      King, David L. (2015)
    • Mobile learning - learning content, learner styles, and mobility - a differentiated examination on the advantages and disadvantages of mobile learning 

      Morana, Stefan (2010)
