Latest Entries

Neural Audio Decollage – Whale Sounds

I put some music together this winter which you can hear on SoundCloud:

It features both neural audio decollage and the voice of Jessa Carter. The decollage works with material from Four Tet and Burial's Moth and recomposes John Tejada's track Farther and Fainter.… Continue reading...

Talking Hypersurfaces, Computational Arts, NFTs, and more on R Mochi Podcast Episode

I discuss Hypersurfaces, computational arts, my PhD work on audiovisual scene synthesis, perception, Machine Learning, GANs, NFTs, crypto, simulated realities, video games, the merger of it all, and journeys with Mochi on the R Mochi Podcast:

Continue reading...

UCLA Course on “Cultural Appropriation with Machine Learning”

During the Fall of 2020, I had the honor of teaching a new course at the University of California, Los Angeles (UCLA) Department of Design Media Arts (DMA) entitled "Cultural Appropriation with Machine Learning". This provocatively titled course came together after wrestling with many questions I had that year in the wake of the pandemic, the Black civil rights movement, and a crushed economy.

Rather than teach a course that focuses purely on the "how" of machine learning, as Creative Applications of Deep Learning does, I wanted to also include a critical component to guide students through the questions they should be asking as they learn and employ these tools. I also wanted students to understand how these tools and algorithms came to be in today's society, so that they would know better what questions to ask when using them. It became clear early on that cultural appropriation was a central theme across most generative arts practices. I say this because machine learning requires large amounts of data, which tend to come from existing corpora of creative content, such as Flickr archives or Instagram collections. What does it mean when an algorithm owned by Google or Microsoft is capable … Continue reading...

NFT Pixels

NFTs, or non-fungible tokens, are data stored on a blockchain that certify and record transactions associated with a digital asset. Effectively, they are certificates of authenticity for digital assets traded on blockchain-based exchanges, and they have spurred an economy of scarcity for digital assets such as images, GIFs, sound clips, and videos. In the past, artists producing digital content may have also considered selling their work in places such as Instagram, Behance, DeviantArt, or their own websites. However, it is unlikely that this content would have actually been bought in any form, likely because the content is already accessible to all. On these platforms, if we are to call it art, it could be akin in some ways to a digital version of public art, there for all to see and consume. Or, more likely, we would consider it either documentation or advertising and marketing of the artist's work.

The entire point of NFTs, and the reason they are so heavily marketed as being valuable, is that we believe there is digital scarcity in the content being traded and sold.

Prior to NFT marketplaces for digital art, let’s say that someone were to decide that they wanted ownership of … Continue reading...

Memory Mosaic iOS – Generative Audio Mashup App

I had a chance to incorporate some updates into Memory Mosaic, an iOS app I started developing during my PhD on audiovisual scene synthesis. The app organizes sounds in real time and clusters them based on similarity. Using the microphone on the device, or an iTunes song, any onset triggers a new audio segment to be created and stored in a database. The example video below shows how this works for Do Make Say Think's Minim and The Landlord is Dead:

Memory Mosaic iOS from Parag K Mital on Vimeo.

Here's an example of using it with live instruments:

Memory Mosaic – Technical Demo (iOS App) from Parag K Mital on Vimeo.
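
For the curious, the core loop is simple to sketch: detect an onset, slice out a segment, summarize it with a small feature vector, and match new audio against the stored segments. Here is a rough sketch in Python; the app itself is custom C++ DSP, so librosa and scikit-learn are stand-ins, and the file names and feature choices are purely illustrative.

```python
# Rough sketch of the idea behind Memory Mosaic: onset-triggered segmentation
# and similarity matching. All names here are illustrative, not the app's code.
import librosa
import numpy as np
from sklearn.neighbors import NearestNeighbors

def segment_on_onsets(path, sr=22050):
    """Slice an audio file into segments at detected onsets."""
    y, sr = librosa.load(path, sr=sr)
    onsets = librosa.onset.onset_detect(y=y, sr=sr, units='samples')
    bounds = np.concatenate([onsets, [len(y)]])
    segments = [y[s:e] for s, e in zip(bounds[:-1], bounds[1:]) if e - s > 512]
    return segments, sr

def describe(segment, sr):
    """Summarize a segment as a small feature vector (mean MFCCs)."""
    return librosa.feature.mfcc(y=segment, sr=sr, n_mfcc=13).mean(axis=1)

# Build a "memory" of segments from one recording...
corpus, sr = segment_on_onsets('corpus.wav')
index = NearestNeighbors(n_neighbors=1).fit([describe(s, sr) for s in corpus])

# ...then resynthesize a target by replacing each of its segments with the
# closest-sounding segment held in memory.
target, _ = segment_on_onsets('target.wav')
_, nearest = index.kneighbors([describe(s, sr) for s in target])
mosaic = np.concatenate([corpus[i[0]] for i in nearest])
```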

The app also works with AudioBus, meaning you can use it with other apps too, adding effects chains or sampling from another app's output. Available on the iOS App Store: https://itunes.apple.com/us/app/memory-mosaic/id475759669?mt=8… Continue reading...

C.A.R.P.E. version 0.1.1 release


I've updated C.A.R.P.E., a graphical tool for visualizing eye-movements and processing audio/video, to include a graphical timeline (thanks to ofxTimeline by James George/YCAM), support for audio playback/scrubbing (using pkmAudioWaveform), audio saving, and various bug fixes. This release has changed some parameters of the XML file and added others. Please refer to this example XML file for how to set up your own data:

See my previous post for information on the initial release.

Please fill out the following form if you'd like to use C.A.R.P.E.:

Continue reading...

Toolkit for Visualizing Eye-Movements and Processing Audio/Video


Original video still without eye-movements and heatmap overlay; copyright Dropping Knowledge Video Republic.

From 2008 – 2010, I worked on the Dynamic Images and Eye-Movements (D.I.E.M.) project, led by John Henderson, with Tim Smith and Robin Hill. We worked together to collect nearly 200 participants' eye-movements on nearly 100 short films from 30 seconds to 5 minutes in length. The database is freely available and covers a wide range of film styles, from advertisements, to movie and music trailers, to news clips. During my time on the project, I developed an open source toolkit to complement D.I.E.M. called C.A.R.P.E., or Computational Algorithmic Representation and Processing of Eye-movements (Tim's idea!), for visualizing and processing the data we collected, and I used it while writing up a journal paper describing a strong correlation between tightly clustered eye-movements and the motion in a scene. We also published visualizations of our entire corpus on our Vimeo channel. The project came to a halt and so did the visualization software. I've since picked up the ball and re-written it entirely from the ground up.
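
The analysis itself boils down to two per-frame quantities, how tightly the gaze points cluster and how much the scene is moving, plus the correlation between them. A minimal sketch of that idea follows; the data layout and names are illustrative rather than anything C.A.R.P.E. actually uses.

```python
# Minimal sketch: per-frame gaze clustering vs. per-frame motion.
# gaze[t] is an (n_viewers, 2) array of gaze positions for frame t;
# frames[t] is a grayscale frame as a 2-D array.
import numpy as np
from scipy.stats import pearsonr

def gaze_dispersion(gaze_xy):
    """Mean distance of viewers' gaze points from their centroid;
    low values mean tightly clustered eye-movements."""
    centroid = gaze_xy.mean(axis=0)
    return np.linalg.norm(gaze_xy - centroid, axis=1).mean()

def motion_magnitude(prev_frame, frame):
    """Crude motion estimate: mean absolute difference between frames."""
    return np.abs(frame.astype(float) - prev_frame.astype(float)).mean()

def clustering_vs_motion(gaze, frames):
    """Correlate gaze clustering (negated dispersion) with scene motion."""
    dispersion = np.array([gaze_dispersion(g) for g in gaze[1:]])
    motion = np.array([motion_magnitude(a, b)
                       for a, b in zip(frames[:-1], frames[1:])])
    return pearsonr(-dispersion, motion)  # (correlation, p-value)
```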

The image below shows how you can represent the movie, the motion in the scene of the movie (represented in … Continue reading...

Handwriting Recognition with LSTMs and ofxCaffe

Long Short-Term Memory (LSTM) is a Recurrent Neural Network (RNN) architecture designed to model temporal sequences (e.g. audio, sentences, video) and their long-range dependencies better than conventional RNNs [1]. There is a lot of excitement in the machine learning community around LSTMs (and DeepMind's counterpart, "Neural Turing Machines" [2], or Facebook's "Memory Networks" [3]) as they overcome a fundamental limitation of conventional RNNs, vanishing gradients over long sequences, and are able to achieve state-of-the-art benchmark performance on a number of tasks [4,5] (a minimal sketch of the basic architecture follows the list):

  • Text-to-speech synthesis (Fan et al., Microsoft, Interspeech 2014)
  • Language identification (Gonzalez-Dominguez et al., Google, Interspeech 2014)
  • Large vocabulary speech recognition (Sak et al., Google, Interspeech 2014)
  • Prosody contour prediction (Fernandez et al., IBM, Interspeech 2014)
  • Medium vocabulary speech recognition (Geiger et al., Interspeech 2014)
  • English to French translation (Sutskever et al., Google, NIPS 2014)
  • Audio onset detection (Marchi et al., ICASSP 2014)
  • Social signal classification (Brueckner & Schuller, ICASSP 2014)
  • Arabic handwriting recognition (Bluche et al., DAS 2014)
  • TIMIT phoneme recognition (Graves et al., ICASSP 2013)
  • Optical character recognition (Breuel et al., ICDAR 2013)
  • Image caption generation (Vinyals et al., Google, 2014)
  • Video to textual description (Donahue et al., 2014)
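
As a point of reference for the shape these models take in code, here is a minimal sketch of an LSTM sequence classifier for online handwriting: the input is a sequence of pen (dx, dy, pen-up) triples and the output is a character class. PyTorch stands in here for the Caffe LSTM layers wrapped by ofxCaffe, and all sizes and names are illustrative.

```python
# Minimal LSTM sequence classifier sketch (PyTorch standing in for Caffe).
import torch
import torch.nn as nn

class HandwritingLSTM(nn.Module):
    def __init__(self, n_features=3, n_hidden=128, n_classes=26):
        super().__init__()
        self.lstm = nn.LSTM(n_features, n_hidden, batch_first=True)
        self.classify = nn.Linear(n_hidden, n_classes)

    def forward(self, strokes):              # strokes: (batch, time, 3)
        outputs, (h_n, c_n) = self.lstm(strokes)
        return self.classify(h_n[-1])        # class scores from final hidden state

model = HandwritingLSTM()
dummy_strokes = torch.randn(8, 100, 3)        # 8 stroke sequences, 100 steps each
logits = model(dummy_strokes)                 # (8, 26) character scores
```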

The current dynamic state … Continue reading...

Real-Time Object Recognition with ofxCaffe


I've spent a little time with Caffe over the holiday break to try to understand how it might work in the context of real-time visualization/object recognition in more natural scenes/videos. Right now, I've implemented the following Deep Convolutional Networks using the 1280×720 webcam on my 2014 MacBook Pro:

The above image depicts the output of an 8×8 grid detection, with brighter regions indicating higher probabilities of the class "snorkel" (automatically selected by the network as the highest-probability class out of 1000 possible classes).
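
For the curious, here is a rough sketch of that grid-of-probabilities idea: classify the whole frame once to pick the top class, then run the classifier on each cell of an 8×8 grid and keep that cell's probability for the chosen class. The original runs Caffe inside openFrameworks; torchvision stands in below, and the preprocessing constants are just the usual ImageNet ones.

```python
# Sketch: per-cell class probabilities over an 8x8 grid (torchvision stand-in).
import torch
import torchvision.models as models
import torchvision.transforms as T

model = models.resnet18(weights="IMAGENET1K_V1").eval()
prep = T.Compose([
    T.ToPILImage(), T.Resize((224, 224)), T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def grid_class_map(frame, grid=8):
    """frame: H x W x 3 uint8 RGB array. Returns (top class id, grid x grid probs)."""
    h, w, _ = frame.shape
    with torch.no_grad():
        # One whole-frame pass picks the single most probable class...
        whole = torch.softmax(model(prep(frame).unsqueeze(0)), dim=1)[0]
        top = int(whole.argmax())
        # ...then each grid cell reports its probability for that class.
        probs = torch.zeros(grid, grid)
        for i in range(grid):
            for j in range(grid):
                cell = frame[i * h // grid:(i + 1) * h // grid,
                             j * w // grid:(j + 1) * w // grid]
                logits = model(prep(cell).unsqueeze(0))
                probs[i, j] = torch.softmax(logits, dim=1)[0, top]
    return top, probs
```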

So far I have spent some time understanding how Caffe keeps each layer’s data during a forward/backward pass, and how the deeper layers could be “visualized” in a … Continue reading...

Extracting Automatically Labeled Volumetric ROIs from MRI

Performing a region-of-interest analysis on MRI data requires knowing where the regions are in your subject data. Typically, this has been done using hand-drawn masks in a 3D viewer. However, recent research has made the process mostly automatic, and the open-source community has implemented everything you will need to automatically create labeled volumetric regions of interest [1-3]. With FreeSurfer 5.3, we have the option of performing cortical parcellation using 4 different atlases:

Destrieux atlas: aparc.a2009s
Desikan-Killiany atlas: aparc
Mindboggle: aparc.DKTatlas40
Brodmann areas: BA and BA.thresh

We'll first use FreeSurfer's recon-all tool to perform a cortical reconstruction of our anatomical scans. Download FreeSurfer and register your copy. You'll be sent an e-mail with a license. Follow the instructions and create the license file ".license" inside your FreeSurfer home directory (check the environment variable FREESURFER_HOME, e.g., "$ echo $FREESURFER_HOME"). Then run the script "$FREESURFER_HOME/FreeSurferEnv.sh" to set up the necessary paths.
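
Jumping ahead a little to the payoff: once recon-all finishes, each subject's mri/ folder contains aparc+aseg.mgz, a volume in which every voxel carries an anatomical label. Here is a minimal sketch of pulling a single labeled ROI out of that volume with nibabel; the subject name and label ID are placeholders, and the real ID should be looked up in FreeSurferColorLUT.txt.

```python
# Minimal sketch: extract one labeled ROI as a binary mask using nibabel.
import os
import nibabel as nib
import numpy as np

subjects_dir = os.environ['SUBJECTS_DIR']
aparc = nib.load(os.path.join(subjects_dir, 'subject01', 'mri', 'aparc+aseg.mgz'))
labels = aparc.get_fdata()

roi_label = 1024                              # placeholder: look this up in the LUT
roi_mask = (labels == roi_label).astype(np.uint8)

# Save the mask in the same space as the parcellation for later ROI analysis.
nib.save(nib.MGHImage(roi_mask, aparc.affine), 'roi_mask.mgz')
```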

Next, make sure you have set the environment variable SUBJECTS_DIR to where you'd like your analysis to go (e.g., "$ export SUBJECTS_DIR=/some/directory"). For our example, we'll keep this as a directory called "freesurfer" in our home directory, "~/". Each subject we analyze will have its own folder inside … Continue reading...

YouTube’s “Copyright School” Smash Up

Ever wonder what happens when you've been accused of violating copyright multiple times on YouTube? First, you get redirected to YouTube's "Copyright School" whenever you visit YouTube, forcing you to watch a Happy Tree Friends cartoon where the main character is dressed as an actual pirate:

Second, I’m guessing, your account will be banned. Third, you cry and wonder why you ever violated copyright in the first place.

In my case, I've disputed every one of the 4 copyright violation notices I've received on grounds of Fair Use and Fair Dealing. Here's what happens when you file a dispute using YouTube's online form (click for high-res):

3 of the 4 have been dropped after I filed disputes, though I'm still waiting to hear the response to the above dispute. Read the dispute letter to Sony ATV and UMPG Publishers in full here.

The picture above shows a few stills of what my Smash Ups look like. The process, described in greater detail on createdigitalmotion.com, is part of my ongoing research into how existing content can be transformed into artistic styles reminiscent of analytic cubist, figurative, and futurist paintings. The process to create the videos … Continue reading...

An open letter to Sony ATV and UMPG

Dear Sony ATV Publishing, UMPG Publishing, and other concerned parties,

I ask you to please withdraw your copyright violation notice on my video, "PSY – GANGNAM STYLE (강남스타일) M/V (YouTube SmashUp)", as I believe my use of any copyrighted material is protected under Fair Use or Fair Dealing. This video was created by an automated process as part of an art project developed during my PhD at Goldsmiths, University of London: http://archive.pkmital.com/projects/visual-smash-up/ and http://archive.pkmital.com/projects/youtube-smash-up/

The process which creates the audio and video is entirely automated, meaning the accused video is created by an algorithm. The algorithm begins by creating a large database of tiny fragments of audio and video (less than 1 second of audio per fragment) using 9 videos from YouTube's top 10 list. In this database, the tiny fragments of video and audio are stored as unrelated pieces of information and described only by a short series of 10-15 numbers. These numbers represent low-level features describing the texture and shape of each fragment of audio or video. These tiny fragments are then matched to the tiny fragments of audio and video detected within the target for resynthesis, in this case the number one YouTube video … Continue reading...

Copyright Violation Notice from “Rightster”

I've been working on an art project which takes the top 10 videos on YouTube and tries to resynthesize the #1 video using the remaining 9 videos. The computational model is based on low-level human perception and uses only very abstract features such as edges, textures, and loudness. I've created a new synthesis each week using that week's top 10 in the hope that, one day, I will be able to resynthesize my own video into the top 10. It is essentially a viral algorithm, though whether it will succeed remains unproven.
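
Stripped to its core, the resynthesis is a nearest-neighbour lookup: every tiny fragment of the source material is summarized by a short feature vector (standing in for those edge, texture, and loudness descriptions), and each fragment of the target is replaced by its closest match from the database. A rough sketch follows, with describe_fragment as a placeholder for the real low-level descriptors.

```python
# Sketch of the matching step only; feature extraction is a placeholder.
import numpy as np
from scipy.spatial import cKDTree

def describe_fragment(fragment):
    """Placeholder for the real low-level descriptor (edges/texture/loudness):
    here, just the first 12 values of the flattened fragment."""
    return np.asarray(fragment, dtype=float).ravel()[:12]

def resynthesize(database_fragments, target_fragments):
    """Replace each target fragment with its nearest database fragment."""
    features = np.array([describe_fragment(f) for f in database_fragments])
    tree = cKDTree(features)
    matched = []
    for fragment in target_fragments:
        _, i = tree.query(describe_fragment(fragment))
        matched.append(database_fragments[i])
    return matched
```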

The database of content used in the recreation of the above video comes from the following videos:
#2 News Anchor FAIL Compilation 2012 || PC
#3 Flo Rida – Whistle [Official Video]
#4 Carly Rae Jepsen – Call Me Maybe
#5 Jennifer Lopez – Goin’ In ft. Flo Rida
#6 Taylor Swift – We Are Never Ever Getting Back Together
#7 will.i.am – This Is Love ft. Eva Simons
#8 Call Me Maybe – Carly Rae Jepsen (Chatroulette Version)
#9 Justin Bieber – As Long As You Love Me ft. Big Sean
#10 Rihanna – Where Have You Been

It … Continue reading...

3D Musical Browser

I've been interested in exploring ways of navigating media archives. Typically, you might use iTunes and go from artist to artist, or you may have managed to tediously classify your collection into genres. Some may still even browse their music through a file browser, perhaps making sure the folders and filenames of their collection are descriptive of the artist, album, year, etc… But what about how the content actually sounds?

Wouldn't it be nice to hear all music which shares similar sounds, or similar phrases of sounds? Research over the last 10-15 years has developed methods precisely to solve this problem; they fall under the umbrella term content-based information retrieval (CBIR), or uncovering the relationships of an archive through the information within the content. For images, Google's Search by Image is a great example which only recently became public. For audio, audioDB and Shazam are good examples of discovering music through the way it sounds, or the content-based relationships of the audio itself. However, each of these interfaces presents a list of matches to an image or audio query, making it difficult to explore the content-based relationships of a specific set of material.
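
As a sketch of the content-based side of such a browser: summarize each track by a feature vector, project the collection into 3D for navigation, and answer "what sounds like this?" with nearest neighbours. librosa and scikit-learn are stand-ins for the actual implementation, and the file names are illustrative.

```python
# Sketch: content-based positions and lookups for a small music collection.
import librosa
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

def track_features(path):
    """Describe a track by the mean of its MFCCs (first minute only)."""
    y, sr = librosa.load(path, sr=22050, duration=60.0)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20).mean(axis=1)

paths = ['track_a.mp3', 'track_b.mp3', 'track_c.mp3']
features = np.array([track_features(p) for p in paths])

coords_3d = PCA(n_components=3).fit_transform(features)   # positions in the 3D browser
index = NearestNeighbors(n_neighbors=2).fit(features)      # content-based lookup
_, similar = index.kneighbors(features[:1])                 # tracks closest to track_a
```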

The video above demonstrates interaction with a novel 3D browser … Continue reading...

Intention in Copyright

The following article was written for the LUCID Studio for Speculative Art, based in India.

Introduction

My work in audiovisual resynthesis aims to create models of how humans represent and attend to audiovisual scenes. Built on pattern recognition of both audio and visual material, these models maintain large corpora of learned audiovisual material which can be matched to ongoing streams of incoming audio or video. The way audio and visual material is stored and segmented within the model is based heavily on neurobiological and behavioral evidence (the details are saved for another post). I have called the underlying model Audiovisual Content-based Information Description/Distortion (or ACID for short).

As an example, a live stream of audio may be matched to a database of learned sounds from recordings of nature, creating a re-synthesis of the present audio environment using only pre-recorded material from nature itself. These learned sounds may be fragments of a bird chirping, or the sound of footsteps. Incoming sounds of someone talking may then be synthesized using the closest-sounding material to that person talking, perhaps a bird chirp or a footstep. Instead of a live stream, one can also re-synthesize a pre-recorded stream. Consider using a database … Continue reading...

Course @ CEMA Srishti School of Design, Bangalore, IN

From November 21st to December 2nd, I'll have the pleasure of leading a course and workshop with Prayas Abhinav at the Center for Experimental Media Arts in the Srishti School of Design in Bangalore, IN. Many thanks to Meena Vari for all her help in organizing the project.

Stories are flowing trees

Keywords: 3D, interactive projects, data, histories, urban, creative coding, technology, sculpture, projection mapping

Project Brief:

Urban realities are more like fictions, constructed through folklore, media and policy. Compressing these constructions across time would offer some possibilities for the emergence of complexity and new discourse. Using video projections adapted for 3D surfaces, urban histories will become data and information – supple, malleable, and material.

The project will begin with a one-week workshop by Parag Mital on "Creative Coding" using the openFrameworks platform for C/C++ coding.

About the Artists:

Prayas Abhinav

Presently he teaches at the Srishti School of Art, Design and Technology and is a researcher at the Center for Experimental Media Arts (CEMA). He has previously taught at the Dutch Art Institute (DAI) and the Center for Environmental Planning and Technology (CEPT).
He has been supported by fellowships from Openspace India (2009), TED (2009), … Continue reading...

Memory Mosaicing

A product of my PhD research is now available on the iPhone App Store (for a small cost!): View in App Store.

This application is motivated by my interest in experiencing an augmented perception and is of course very much inspired by some of the work here at Goldsmiths. Applying existing approaches in soundspotting/mosaicing to a real-time stream situated in the real world allows one to play with their own sonic memories, and certainly requires an open ear for new experiences. Succinctly, the app records segments of sounds in real time using its own listening model as you walk around different environments (or sit at your desk). These segments are constantly built up the longer the app is left running to form a database (a working memory model) with which to understand new sounds. Incoming sounds are then matched to this database and the closest-matching sound is played instead. What you get is a polyphony of sound memories triggered by the incoming feed of audio, and an app which sounds more like your environment the longer it is left to run. A sort of gimmicky feature of this app is the ability to learn a song from your … Continue reading...

Concatenative Video Synthesis (or Video Mosaicing)


Working closely with my adviser Mick Grierson, I have developed a way to resynthesize existing videos using material from another set of videos. The process starts by learning a database of objects that appear in the set of videos to synthesize from. The target video to resynthesize is then broken into objects in a similar manner, and those objects are matched to objects in the database. What you get is a resynthesis of the video that appears as beautiful disorder. Here are two examples: the first uses Family Guy to resynthesize The Simpsons, and the second uses Jan Svankmajer's Food to resynthesize his Dimensions of Dialogue.
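
The idea can be shrunk down to patch-level matching for a rough sketch: frames from the source videos are cut into small patches and indexed by a simple colour descriptor, and each patch of a target frame is replaced by its closest match. The real system segments whole objects rather than fixed patches; OpenCV and numpy below are stand-ins, and all names are illustrative.

```python
# Patch-level sketch of concatenative video synthesis (video mosaicing).
import cv2
import numpy as np
from scipy.spatial import cKDTree

PATCH = 16

def patches(frame):
    """Cut a frame into PATCH x PATCH blocks (dropping any remainder)."""
    h, w = frame.shape[0] // PATCH * PATCH, frame.shape[1] // PATCH * PATCH
    return [frame[y:y + PATCH, x:x + PATCH]
            for y in range(0, h, PATCH) for x in range(0, w, PATCH)]

def describe(patch):
    """A tiny descriptor: the patch downsampled to 4x4 colour values."""
    return cv2.resize(patch, (4, 4)).astype(float).ravel()

def build_index(source_frames):
    """Index every patch appearing anywhere in the source material."""
    source_patches = [p for f in source_frames for p in patches(f)]
    return source_patches, cKDTree(np.array([describe(p) for p in source_patches]))

def mosaic_frame(target_frame, source_patches, tree):
    """Rebuild a target frame out of the closest-matching source patches."""
    out = target_frame.copy()
    h, w = out.shape[0] // PATCH * PATCH, out.shape[1] // PATCH * PATCH
    for y in range(0, h, PATCH):
        for x in range(0, w, PATCH):
            _, i = tree.query(describe(out[y:y + PATCH, x:x + PATCH]))
            out[y:y + PATCH, x:x + PATCH] = source_patches[i]
    return out
```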

Continue reading...

Google Earth + Atlantis Space Shuttle

I managed to catch the live feed from NASA.gov of the Atlantis Space Shuttle launch yesterday. What I found really interesting, though, was a real-time virtual view of the space shuttle launch from inside Google Earth. Screen capture below, with the obligatory 12x speedup to retain your attention span:

Continue reading...

Lunch Bites @ CULTURE Lab, Newcastle University

I was recently invited to the CULTURE Lab at Newcastle University by its director, Atau Tanaka. I would say it has the resources and creative power of 5 departments all housed in one spacious building. In the twelve-some studios spread over 3 floors, over the course of 2 short days, I found people building multitouch tables, controlling synthesizers with the touch of fabric, and researching augmented spatial sonic realities. There is a full suite of workshop tools including a laser cutter, multiple multi-channel sound studios, a full stage/theater with stage lighting and multiple projectors, a radio lab, and tons of light and interesting places to sit and do whatever you feel like doing. The other thing I found really interesting is that there are no "offices". Instead, the staff are dispersed amongst the students in the twelve-some studios, perhaps picking a new desk whenever they need a change of scenery. If you are ever in the area, it is certainly worth a visit, and I'm sure the people there will be very open to telling you what they are up to.

I also had the pleasure of giving a talk on my PhD research in Resynthesizing Audiovisual Perception with Augmented Reality at the Lunch … Continue reading...



Copyright © 2010 Parag K Mital. All rights reserved. Made with WordPress. RSS