archive.pkmital.com: computational audiovisual augmented reality research

Neural Audio Decollage – Whale Sounds
http://archive.pkmital.com/2022/01/30/neural-audio-decollage-whale-sounds/
Sun, 30 Jan 2022

I put some music together this winter, which you can hear on SoundCloud:

It features neural audio decollage alongside the voice of Jessa Carter. The decollage works with material from Four Tet and Burial’s “Moth” and recomposes John Tejada’s track “Farther and Fainter”.… Continue reading...

Talking Hypersurfaces, Computational Arts, NFTs, and more on R Mochi Podcast Episode
http://archive.pkmital.com/2021/09/04/talking-hypersurfaces-computational-arts-nfts-and-more-on-r-mochi-podcast-episode/
Sat, 04 Sep 2021

I discuss Hypersurfaces, computational arts, my PhD work on audiovisual scene synthesis, perception, machine learning, GANs, NFTs, crypto, simulated realities, video games, the merger of it all, and journeys with Mochi on the R Mochi Podcast:

Continue reading...

UCLA Course on “Cultural Appropriation with Machine Learning”
http://archive.pkmital.com/2021/09/01/ucla-course-on-cultural-appropriation-with-machine-learning/
Wed, 01 Sep 2021

During the Fall of 2020, I had the honor of teaching a new course at the University of California, Los Angeles (UCLA) Department of Design Media Arts (DMA) entitled “Cultural Appropriation with Machine Learning”. This provocatively titled course came together after wrestling with many questions I had that year in the wake of the pandemic, the Black civil rights movements, and a crushed economy.

Rather than teach a course focusing purely on the “how” of machine learning, as Creative Applications of Deep Learning does, I wanted to include a critical component guiding students through the questions they should be asking as they learn and employ these tools. I also wanted students to understand how these tools and algorithms came to be in today’s society, so that they would know better what questions to ask when using them. It became clear early on that cultural appropriation is a central theme across most generative arts practices: machine learning requires large amounts of data, which tend to come from existing corpora of creative content, such as Flickr archives or Instagram collections. What does it mean when an algorithm owned by Google or Microsoft is capable … Continue reading...

NFT Pixels
http://archive.pkmital.com/2021/08/31/nft-pixels/
Tue, 31 Aug 2021

NFTs, or non-fungible tokens, are data stored on a blockchain that certify and record transactions associated with a digital asset. Effectively, they are certificates of authenticity for digital assets traded on blockchain-based exchanges, and they have spurred an economy of scarcity for digital assets such as images, GIFs, sound clips, and videos. In the past, artists producing digital content might have considered selling their work on platforms such as Instagram, Behance, or DeviantArt, or via their own websites. It is unlikely, though, that this content would actually have been bought in any form, most likely because it is already accessible to all. On these platforms, if we are to call it art, it could be akin to a digital version of public art, there for all to see and consume. More likely, we would consider it documentation, or advertising and marketing of the artist’s work.

The entire point of NFTs, and the reason they are marketed as so valuable, is that we believe there is digital scarcity in the content being traded and sold.

Before NFT marketplaces for digital art existed, suppose someone were to decide that they wanted ownership of … Continue reading...

Memory Mosaic iOS – Generative Audio Mashup App
http://archive.pkmital.com/2015/08/23/memory-mosaic-ios-generative-audio-mashup-app/
Sun, 23 Aug 2015

I had a chance to incorporate some updates into Memory Mosaic, an iOS app I started developing during my PhD on audiovisual scene synthesis. The app organizes sounds in real time, clustering them by similarity. Using the device’s microphone or an iTunes song as input, any onset triggers a new audio segment to be created and stored in a database. The example video below shows how this works for Do Make Say Think’s songs “Minim” and “The Landlord is Dead”:

Memory Mosaic iOS from Parag K Mital on Vimeo.

Here’s an example of using it with live instruments:

Memory Mosaic – Technical Demo (iOS App) from Parag K Mital on Vimeo.

The app also works with AudioBus, meaning you can use it with other apps too, adding effect chains or sampling from another app’s output. Available on the iOS App Store: https://itunes.apple.com/us/app/memory-mosaic/id475759669?mt=8… Continue reading...
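
Under the hood, the core idea is straightforward: detect onsets, slice the audio into segments at those onsets, describe each segment with an audio feature, and cluster segments by similarity. Below is a rough offline sketch of that idea in Python using librosa and scikit-learn; the file path, feature choice, and cluster count are placeholder assumptions, and this is not the app’s actual real-time implementation.

    import librosa
    import numpy as np
    from sklearn.cluster import KMeans

    # Load a source recording ("input.wav" is a placeholder path)
    y, sr = librosa.load("input.wav")

    # Detect onsets and slice the signal into segments between them
    onsets = librosa.onset.onset_detect(y=y, sr=sr, units="samples")
    bounds = np.concatenate([onsets, [len(y)]])
    segments = [y[s:e] for s, e in zip(bounds[:-1], bounds[1:])]

    # Describe each segment by a mean MFCC vector, a simple timbre feature
    features = np.array([
        librosa.feature.mfcc(y=seg, sr=sr, n_mfcc=13).mean(axis=1)
        for seg in segments
    ])

    # Cluster segments by similarity; segments within a cluster can then be
    # swapped or resequenced to produce a generative "mosaic" of the source
    labels = KMeans(n_clusters=8, n_init=10).fit_predict(features)
    for i, label in enumerate(labels):
        print(f"segment {i} -> cluster {label}")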

C.A.R.P.E. version 0.1.1 release
http://archive.pkmital.com/2015/02/09/c-a-r-p-e-version-0-1-1-release/
Mon, 09 Feb 2015

I’ve updated C.A.R.P.E., a graphical tool for visualizing eye-movements and processing audio/video, to include a graphical timeline (thanks to ofxTimeline by James George/YCAM), support for audio playback and scrubbing (using pkmAudioWaveform), audio saving, and various bug fixes. This release has changed some parameters of the XML file and added others. Please refer to this example XML file for how to set up your own data:

See my previous post for information on the initial release.

Please fill out the following form if you’d like to use C.A.R.P.E.:

Continue reading...

Toolkit for Visualizing Eye-Movements and Processing Audio/Video
http://archive.pkmital.com/2015/02/06/toolkit-for-visualizing-eye-movements-and-processing-audio-video/
Fri, 06 Feb 2015

Original video still, without eye-movements and heatmap overlay, copyright Dropping Knowledge Video Republic.

From 2008 to 2010, I worked on the Dynamic Images and Eye-Movements (D.I.E.M.) project, led by John Henderson, with Tim Smith and Robin Hill. Together we collected nearly 200 participants’ eye-movements on nearly 100 short films ranging from 30 seconds to 5 minutes in length. The database is freely available and covers a wide range of film styles, from advertisements to movie and music trailers to news clips. During my time on the project, I developed an open-source toolkit, C.A.R.P.E. (Computational Algorithmic Representation and Processing of Eye-movements; the name was Tim’s idea!), to complement D.I.E.M. by visualizing and processing the data we collected, and used it in writing a journal paper describing a strong correlation between tightly clustered eye-movements and the motion in a scene. We also published visualizations of our entire corpus on our Vimeo channel. The project came to a halt, and so did the visualization software. I’ve since picked up the ball and rewritten it entirely from the ground up.
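
As a rough illustration of the analysis behind that correlation, the sketch below compares per-frame gaze dispersion (lower dispersion means more tightly clustered eye-movements) against a simple per-frame motion measure. The data layout, function names, and motion proxy here are assumptions for illustration, not D.I.E.M.’s actual pipeline.

    import numpy as np
    from scipy.stats import pearsonr

    def gaze_dispersion(points):
        # points: (n_subjects, 2) array of (x, y) gaze positions for one frame;
        # dispersion is the mean distance of gaze points from their centroid
        centroid = points.mean(axis=0)
        return np.linalg.norm(points - centroid, axis=1).mean()

    def motion_magnitude(prev_frame, frame):
        # Crude motion proxy: mean absolute difference of grayscale frames
        return np.abs(frame - prev_frame).mean()

    def gaze_motion_correlation(gaze_per_frame, frames):
        # Correlate negated dispersion with motion, so a positive r means
        # tighter gaze clustering co-occurs with more motion in the scene
        dispersion = np.array([gaze_dispersion(g) for g in gaze_per_frame[1:]])
        motion = np.array([motion_magnitude(a, b)
                           for a, b in zip(frames[:-1], frames[1:])])
        return pearsonr(-dispersion, motion)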

The image below shows how you can represent the movie, the motion in the scene of the movie (represented in … Continue reading...

Handwriting Recognition with LSTMs and ofxCaffe
http://archive.pkmital.com/2015/02/06/handwriting-recognition-with-lstms-and-ofxcaffe/
Fri, 06 Feb 2015

Long Short-Term Memory (LSTM) is a Recurrent Neural Network (RNN) architecture designed to model temporal sequences (e.g. audio, sentences, video) and their long-range dependencies better than conventional RNNs [1]. There is a lot of excitement in the machine learning communities around LSTMs (and DeepMind’s counterpart, “Neural Turing Machines” [2], or Facebook’s “Memory Networks” [3]), as they overcome a fundamental limitation of conventional RNNs and are able to achieve state-of-the-art benchmark performance on a number of tasks (a minimal sketch of a single LSTM cell follows the list below) [4,5]:

  • Text-to-speech synthesis (Fan et al., Microsoft, Interspeech 2014)
  • Language identification (Gonzalez-Dominguez et al., Google, Interspeech 2014)
  • Large vocabulary speech recognition (Sak et al., Google, Interspeech 2014)
  • Prosody contour prediction (Fernandez et al., IBM, Interspeech 2014)
  • Medium vocabulary speech recognition (Geiger et al., Interspeech 2014)
  • English to French translation (Sutskever et al., Google, NIPS 2014)
  • Audio onset detection (Marchi et al., ICASSP 2014)
  • Social signal classification (Brueckner & Schuller, ICASSP 2014)
  • Arabic handwriting recognition (Bluche et al., DAS 2014)
  • TIMIT phoneme recognition (Graves et al., ICASSP 2013)
  • Optical character recognition (Breuel et al., ICDAR 2013)
  • Image caption generation (Vinyals et al., Google, 2014)
  • Video to textual description (Donahue et al., 2014)
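
For intuition, here is a minimal single-step LSTM cell in NumPy, showing the forget, input, and output gates that let the cell preserve or erase its memory across long sequences. This is an illustrative sketch under assumed shapes and names, not Caffe’s or ofxCaffe’s implementation.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x, h, c, Wf, Wi, Wo, Wg, bf, bi, bo, bg):
        # x: input vector; h: previous hidden state; c: previous cell state.
        # Each W* has shape (n_hidden, n_in + n_hidden); each b* is (n_hidden,).
        z = np.concatenate([x, h])     # current input joined with prior state
        f = sigmoid(Wf @ z + bf)       # forget gate: what to erase from c
        i = sigmoid(Wi @ z + bi)       # input gate: what to write into c
        o = sigmoid(Wo @ z + bo)       # output gate: what to expose as h
        g = np.tanh(Wg @ z + bg)       # candidate cell contents
        c_new = f * c + i * g          # gated update of the long-term memory
        h_new = o * np.tanh(c_new)     # new hidden state for the next step
        return h_new, c_new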

The current dynamic state … Continue reading...

Real-Time Object Recognition with ofxCaffe
http://archive.pkmital.com/2015/01/04/real-time-object-recognition-with-ofxcaffe/
Sun, 04 Jan 2015

I’ve spent a little time with Caffe over the holiday break to try to understand how it might work in the context of real-time visualization/object recognition in more natural scenes/videos. Right now, I’ve implemented the following deep convolutional networks using the 1280×720 webcam on my 2014 MacBook Pro:

The image above depicts the output of an 8×8 grid detection, with brighter regions indicating higher probabilities of the class “snorkel” (automatically selected by the network as the highest-probability class out of 1000 possible classes).
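
To give a sense of how such a grid detection can work, here is a hedged Python sketch: divide each frame into an 8×8 grid of patches, classify every patch, and build a heatmap for the globally most probable class. The classify function stands in for a forward pass through a pretrained CNN (Caffe, in this post’s case) and is an assumption for illustration, not a real API.

    import numpy as np

    GRID = 8  # 8x8 grid of patches, as in the image described above

    def grid_heatmap(frame, classify):
        # frame: (H, W, 3) image array
        # classify: maps a patch to (n_classes,) class probabilities; this is
        #           a stand-in for a CNN forward pass, assumed for illustration
        ph, pw = frame.shape[0] // GRID, frame.shape[1] // GRID
        probs = np.stack([
            np.stack([classify(frame[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw])
                      for c in range(GRID)])
            for r in range(GRID)
        ])  # shape: (GRID, GRID, n_classes)
        # Pick the class with the highest total probability across the grid,
        # then return its per-cell probabilities as a heatmap
        best = probs.sum(axis=(0, 1)).argmax()
        return probs[:, :, best], best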

So far I have spent some time understanding how Caffe keeps each layer’s data during a forward/backward pass, and how the deeper layers could be “visualized” in a … Continue reading...

Extracting Automatically Labeled Volumetric ROIs from MRI
http://archive.pkmital.com/2014/05/22/extracting-automatically-labeled-volumetric-rois-from-mri/
Thu, 22 May 2014

Performing a region-of-interest analysis on MRI requires knowing where the regions are in your subject data. Typically, this has been done using hand-drawn masks in a 3D viewer. However, recent research has made the process mostly automatic, and the open-source community has implemented everything you need to automatically create labeled volumetric regions of interest [1-3]. With FreeSurfer 5.3, we have the option of performing cortical parcellation using four different atlases:

Destrieux atlas: aparc.a2009s
Desikan-Killiany atlas: aparc
Mindboggle: aparc.DKTatlas40
Brodman areas: BA and BA.thresh

We’ll first use FreeSurfer’s recon-all tool to perform a cortical reconstruction of our anatomical scans. Download FreeSurfer and register your copy. You’ll be sent an e-mail with a license. Follow the instructions and create the license file “.license” inside your FreeSurfer home directory (check the environment variable FREESURFER_HOME, e.g., “$ echo $FREESURFER_HOME”). Then source the script “$FREESURFER_HOME/FreeSurferEnv.sh” to set up the necessary paths.

Next, make sure you have set the environment variable SUBJECTS_DIR to where you’d like your analysis to go (e.g., “$ export SUBJECTS_DIR=/some/directory”). For our example, we’ll keep this to a directory called “freesurfer” in our home directory, “~/”. Each subject we analyze will have its own folder inside … Continue reading...
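
To show how these pieces fit together, here is a minimal sketch of driving the reconstruction step from Python. The subject ID and scan filename are placeholders, and recon-all can take many hours per subject.

    import os
    import subprocess

    # Point FreeSurfer at our analysis directory, as described above
    os.environ["SUBJECTS_DIR"] = os.path.expanduser("~/freesurfer")

    # Run the full cortical reconstruction pipeline for one subject;
    # "subject01" and "subject01_T1.nii.gz" are placeholder names
    subprocess.run(
        ["recon-all",
         "-subjid", "subject01",
         "-i", "subject01_T1.nii.gz",
         "-all"],
        check=True,
    )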
