machine learning - http://archive.pkmital.com

UCLA Course on “Cultural Appropriation with Machine Learning”

Parag K Mital — Wed, 01 Sep 2021 04:37:33 +0000

During the Fall of 2020, I had the honor of teaching a new course in the Department of Design Media Arts (DMA) at the University of California, Los Angeles (UCLA), entitled “Cultural Appropriation with Machine Learning”. This provocatively titled course came together after I wrestled with many questions that year in the wake of the pandemic, Black civil rights movements, and a crushed economy.

Rather than teach a course that focuses purely on the “how” of machine learning, as Creative Applications of Deep Learning does, I also wanted to include a critical component to guide students through the questions they should be asking as they learn and employ these tools. I wanted students to understand how these tools and algorithms came to be in today’s society, so that they would know better what questions to ask when using them. It became clear early on that cultural appropriation was a central theme across most generative arts practices. I say this because machine learning requires large amounts of data, which tend to come from existing corpora of creative content, such as Flickr archives or Instagram collections. What does it mean when an algorithm owned by Google or Microsoft is capable … Continue reading...

The post UCLA Course on “Cultural Appropriation with Machine Learning” first appeared on http://archive.pkmital.com.

Memory Mosaic iOS – Generative Audio Mashup App

Parag K Mital — Sun, 23 Aug 2015 06:55:54 +0000

I had a chance to incorporate some updates into Memory Mosaic, an iOS app I started developing during my PhD in Audiovisual Scene Synthesis. The app organizes sounds in real time and clusters them based on similarity. Using the microphone on the device, or an iTunes song, any onset triggers a new audio segment to be created and stored in a database. The example video below shows how this works for Do Make Say Think’s song, Minim and The Landlord is Dead:

Memory Mosaic iOS from Parag K Mital on Vimeo.

Here’s an example of using it with live instruments:

Memory Mosaic – Technical Demo (iOS App) from Parag K Mital on Vimeo.
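
As a rough illustration of onset-triggered segmentation (not the app's actual implementation, which the post does not describe), the sketch below splits a stream of samples wherever the frame energy jumps sharply. The function names, the RMS energy measure, and the `ratio` and `floor` thresholds are all assumptions made for this example; the similarity clustering step is not covered.

```python
import math

def frame_energies(samples, frame_size):
    """Return the RMS energy of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_size))
    return energies

def detect_onsets(energies, ratio=2.0, floor=1e-3):
    """Flag frame i as an onset when its energy is above the noise floor
    and at least `ratio` times the previous frame's energy (assumed rule)."""
    onsets = []
    for i in range(1, len(energies)):
        previous = max(energies[i - 1], floor)
        if energies[i] > floor and energies[i] >= ratio * previous:
            onsets.append(i)
    return onsets

def segment(samples, frame_size=4, ratio=2.0):
    """Split samples into segments that each begin at a detected onset."""
    energies = frame_energies(samples, frame_size)
    onsets = detect_onsets(energies, ratio)
    bounds = [i * frame_size for i in onsets] + [len(samples)]
    return [samples[a:b] for a, b in zip(bounds, bounds[1:])]

# Toy signal: silence, a loud burst, silence, then a quieter burst.
signal = [0.0] * 8 + [1.0] * 8 + [0.0] * 4 + [0.5] * 4
segments = segment(signal, frame_size=4)
```

Each returned segment would then be a candidate entry for the database; a real-time version would process frames as they arrive rather than a whole list at once.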

The app also works with AudioBus, meaning you can use it with other apps too, adding effects chains, or sampling from another app’s output. Available on the iOS App Store: https://itunes.apple.com/us/app/memory-mosaic/id475759669?mt=8… Continue reading...

Handwriting Recognition with LSTMs and ofxCaffe

Parag K Mital — Fri, 06 Feb 2015 04:41:49 +0000

Long Short-Term Memory (LSTM) is a Recurrent Neural Network (RNN) architecture designed to model temporal sequences (e.g. audio, sentences, video) and their long-range dependencies better than conventional RNNs [1]. There is a lot of excitement in the machine learning community about LSTMs (and DeepMind’s counterpart, “Neural Turing Machines” [2], or Facebook’s “Memory Networks” [3]), as they overcome a fundamental limitation of conventional RNNs and achieve state-of-the-art benchmark performance on a number of tasks [4,5]:

Text-to-speech synthesis (Fan et al., Microsoft, Interspeech 2014)
Language identification (Gonzalez-Dominguez et al., Google, Interspeech 2014)
Large vocabulary speech recognition (Sak et al., Google, Interspeech 2014)
Prosody contour prediction (Fernandez et al., IBM, Interspeech 2014)
Medium vocabulary speech recognition (Geiger et al., Interspeech 2014)
English to French translation (Sutskever et al., Google, NIPS 2014)
Audio onset detection (Marchi et al., ICASSP 2014)
Social signal classification (Brueckner & Schuller, ICASSP 2014)
Arabic handwriting recognition (Bluche et al., DAS 2014)
TIMIT phoneme recognition (Graves et al., ICASSP 2013)
Optical character recognition (Breuel et al., ICDAR 2013)
Image caption generation (Vinyals et al., Google, 2014)
Video to textual description (Donahue et al., 2014)
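
For readers unfamiliar with the architecture, one time step of a commonly used LSTM formulation (without peephole connections) can be sketched as follows. This is a minimal pure-Python illustration, not the ofxCaffe code from the post; the `params` layout and the helper names are assumptions made for this example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, v, b):
    """Return W @ v + b, where W is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(x, h_prev, c_prev, params):
    """One step of a standard LSTM cell.

    params maps each gate name to (W, b), where W acts on the
    concatenation [x, h_prev]:
      'i' input gate, 'f' forget gate, 'o' output gate, 'g' candidate.
    """
    z = x + h_prev  # list concatenation of input and previous hidden state
    i = [sigmoid(a) for a in affine(*params["i"][:1], z, params["i"][1])]
    f = [sigmoid(a) for a in affine(*params["f"][:1], z, params["f"][1])]
    o = [sigmoid(a) for a in affine(*params["o"][:1], z, params["o"][1])]
    g = [math.tanh(a) for a in affine(*params["g"][:1], z, params["g"][1])]
    # The cell state mixes the old memory (scaled by f) with new content.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

# Hypothetical parameters: 1 input, 2 hidden units, all weights zero.
zeros = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
params = {
    "i": (zeros, [-100.0, -100.0]),  # input gate held closed
    "f": (zeros, [100.0, 100.0]),    # forget gate held open
    "o": (zeros, [0.0, 0.0]),
    "g": (zeros, [0.0, 0.0]),
}
h, c = lstm_step([1.0], [0.0, 0.0], [1.0, -2.0], params)
```

With the forget gate held open and the input gate held closed, the cell state passes through the step essentially unchanged, which is the mechanism that lets an LSTM carry information across long spans.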

The current dynamic state … Continue reading...

Real-Time Object Recognition with ofxCaffe

Parag K Mital — Sun, 04 Jan 2015 03:53:48 +0000

I’ve spent a little time with Caffe over the holiday break to try to understand how it might work in the context of real-time visualization and object recognition in more natural scenes and videos. So far, I’ve implemented the following deep convolutional networks using the 1280×720 resolution webcam on my 2014 MacBook Pro:

VGG ILSVRC 2014 (16 Layers): 1000 ImageNet Object Categories (~7 FPS)
VGG ILSVRC 2014 (19 Layers): 1000 Object Categories (~5 FPS)
BVLC GoogLeNet: 1000 Object Categories (~24 FPS)
Region-CNN ILSVRC 2013: 200 Object Categories (~22 FPS)
BVLC Reference CaffeNet: 1000 Object Categories (~18 FPS)
BVLC Reference CaffeNet (Fully Convolutional) 8×8: 1000 Object Categories (~12 FPS)
BVLC Reference CaffeNet (Fully Convolutional) 34×17: 1000 Object Categories (~1 FPS)
MIT Places-CNN Hybrid (Places + ImageNet): 971 Object Categories + 200 Scene Categories = 1171 Categories (~12 FPS)

The above image depicts the output from an 8×8 grid detection, in which brighter regions indicate higher probabilities of the class “snorkel” (automatically selected by the network from 1000 possible classes as the one with the highest probability).
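
One way to turn a grid of per-cell class scores into such a heat map is sketched below. It is only an illustration under stated assumptions: the post does not say exactly how the displayed class is chosen, so this example picks the class whose probability is highest in any single cell, and the function names and input layout are hypothetical rather than part of ofxCaffe.

```python
import math

def softmax(scores):
    """Convert raw scores to probabilities (shifted for numerical stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_class_heatmap(grid_scores):
    """grid_scores[r][c] is the list of raw class scores for one grid cell.

    Returns (class_index, heatmap), where class_index is the class with
    the highest probability in any cell (assumed selection rule) and
    heatmap[r][c] is that class's probability in each cell.
    """
    probs = [[softmax(cell) for cell in row] for row in grid_scores]
    n_classes = len(probs[0][0])
    best = max(
        range(n_classes),
        key=lambda k: max(cell[k] for row in probs for cell in row),
    )
    heatmap = [[cell[best] for cell in row] for row in probs]
    return best, heatmap

# Toy 2x2 grid with 3 classes; class 1 fires strongly in the top-right cell.
grid = [
    [[0.0, 0.0, 0.0], [0.0, 3.0, 0.0]],
    [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
]
best, heatmap = top_class_heatmap(grid)
```

Scaling each `heatmap` value to a pixel brightness gives the kind of overlay described above, with an 8×8 grid and 1000 classes in place of the toy dimensions.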

So far I have spent some time understanding how Caffe keeps each layer’s data during a forward/backward pass, and how the deeper layers could be “visualized” in a … Continue reading...
