    Audio generation from a single sample using deep convolutional generative adversarial networks

    File(s)
    Pfantz_Thesis_Final.pdf (4.820Mb)
    Date
    2021-12
    Author
    Pfantz, Levi
    Publisher
    University of Wisconsin - Whitewater
    Advisor(s)
    Gunawardena, Athula
    Mukherjee, Lopamudra
    Zhou, Jiazhen
    Abstract
    Training neural networks requires sizeable datasets to produce meaningful output, yet large datasets are difficult to acquire for many types of data, especially for individuals and small organizations. We have taken SinGAN [1], a model that addresses this problem in the image domain, and extended it to the audio domain. Our new model, AudioSinGAN, uses deep convolutional generative adversarial networks (DCGANs) trained on a single audio sample to generate new, unique audio samples. Like SinGAN, AudioSinGAN uses a pyramid of distinct GANs, each responsible for learning and generating a different level of detail. Our system is capable of generating unique audio with clear features from the single input audio clip. We explore and discuss the realities of converting and tuning a generative adversarial network (GAN) built for images into one built for audio, and we present our results. We also present a database of audio clips generated by AudioSinGAN and use singular value decomposition to analyze the dataset and confirm that our model successfully generates audio belonging to unique classes. We find that one challenge for our system is audio in which multiple sources overlap one another. Finally, we discuss methods to address this issue, including splitting audio into frequency bands before processing.
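    The abstract's closing suggestion, splitting audio into frequency bands before processing, can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes simple FFT masking with NumPy, and the function name `split_into_bands`, the band edges, and the synthetic two-tone clip are all hypothetical choices made for illustration.

```python
import numpy as np

def split_into_bands(signal, sample_rate, band_edges_hz):
    """Split a 1-D signal into band-limited components via FFT masking.

    Hypothetical helper for illustration only; the thesis may use a
    different filtering method.
    """
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    # Bands cover [0, e1), [e1, e2), ..., [eN, inf), so every bin lands in exactly one band.
    edges = [0.0, *band_edges_hz, np.inf]
    bands = []
    for low, high in zip(edges[:-1], edges[1:]):
        mask = (freqs >= low) & (freqs < high)
        bands.append(np.fft.irfft(spectrum * mask, n=len(signal)))
    return bands

# Synthetic stand-in for an audio clip: a 220 Hz tone plus a quieter 3 kHz tone.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clip = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)

bands = split_into_bands(clip, sample_rate, band_edges_hz=[1000.0, 4000.0])
reconstructed = np.sum(bands, axis=0)
```

    Because the masks partition the spectrum, the bands sum back to the original clip, so each band could in principle be processed separately and the results recombined afterwards.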
    Subject
    Machine learning
    Neural networks (Computer science)
    Audio frequency
    Permanent Link
    http://digital.library.wisc.edu/1793/82595
    Type
    Thesis
    Description
    This file was last viewed in Adobe Acrobat Pro.
    Part of
    • Master's Theses--UW-Whitewater
