Archived entries for technology

Neural Audio Decollage – Whale Sounds

I put some music together this winter, which you can hear on SoundCloud:

It features both some neural audio decollage and the voice of Jessa Carter. The decollage works with material from Four Tet and Burial’s “Moth” and recomposes John Tejada’s track, “Farther and Fainter”.… Continue reading...

UCLA Course on “Cultural Appropriation with Machine Learning”

During the Fall of 2020, I had the honor of teaching a new course in the University of California, Los Angeles (UCLA) Department of Design Media Arts (DMA) entitled “Cultural Appropriation with Machine Learning”. This provocatively titled course came together after wrestling with many questions I had that year in the wake of the pandemic, the Black civil rights movement, and a crushed economy.

Rather than teach a course that focuses purely on the “how” of machine learning, as Creative Applications of Deep Learning does, I wanted to also include a critical component to guide students through the questions they should be asking as they learn and employ these tools. I also wanted students to understand how these tools and algorithms came to be in today’s society, so that they would better know what questions to ask when using them. It became clear early on that cultural appropriation was a central theme across most generative arts practices. I say this because machine learning requires large amounts of data, which tend to come from existing corpora of creative content, such as Flickr archives or Instagram collections. What does it mean when an algorithm owned by Google or Microsoft is capable … Continue reading...

NFT Pixels

NFTs, or non-fungible tokens, are data stored on the blockchain that certify and record transactions associated with a digital asset. Effectively, they are certificates of authenticity for digital assets traded on blockchain-based exchanges, and they have spurred an economy of scarcity for digital assets such as images, GIFs, sound clips, and videos. In the past, artists producing digital content may have also considered selling their work in places such as Instagram, Behance, DeviantArt, or via their own websites. However, it is unlikely that this content would have actually been bought in any form, likely because the content is already accessible to all. On these platforms, if we are to call it art, it could be akin to a digital version of public art in some ways, for all to see and consume. Or, more likely, we would consider it either documentation or advertising and marketing of the artist’s work.

The entire point of NFTs, and the reason they are so marketed as being valuable, is because we believe there is digital scarcity of the content being traded and sold.

Prior to NFT marketplaces for digital art, let’s say that someone were to decide that they wanted ownership of … Continue reading...

Memory Mosaic iOS – Generative Audio Mashup App

I had a chance to incorporate some updates into Memory Mosaic, an iOS app I started developing during my PhD in audiovisual scene synthesis. The app organizes sound in real time, clustering segments based on similarity. Using the microphone on the device, or an iTunes song, each detected onset triggers a new audio segment to be created and stored in a database. The example video below shows how this works for Do Make Say Think’s songs “Minim” and “The Landlord is Dead”:

Memory Mosaic iOS from Parag K Mital on Vimeo.

Here’s an example of using it with live instruments

Memory Mosaic – Technical Demo (iOS App) from Parag K Mital on Vimeo.

The app also works with AudioBus, meaning you can use it with other apps too, adding effect chains or sampling from another app’s output. Available on the iOS App Store: https://itunes.apple.com/us/app/memory-mosaic/id475759669?mt=8… Continue reading...

C.A.R.P.E. version 0.1.1 release


I’ve updated C.A.R.P.E., a graphical tool for visualizing eye-movements and processing audio/video, to include a graphical timeline (thanks to ofxTimeline by James George/YCAM), support for audio playback/scrubbing (using pkmAudioWaveform), audio saving, and various bug fixes. This release has changed some parameters of the XML file and added others. Please refer to this example XML file for how to set up your own data:

See my previous post for information on the initial release.

Please fill out the following form if you’d like to use C.A.R.P.E.:

Continue reading...

Toolkit for Visualizing Eye-Movements and Processing Audio/Video


Original video still without eye-movements and heatmap overlay copyright Dropping Knowledge Video Republic.

From 2008 – 2010, I worked on the Dynamic Images and Eye-Movements (D.I.E.M.) project, led by John Henderson, with Tim Smith and Robin Hill. We worked together to collect nearly 200 participants’ eye-movements on nearly 100 short films from 30 seconds to 5 minutes in length. The database is freely available and covers a wide range of film styles, from advertisements to movie and music trailers to news clips. During my time on the project, I developed an open-source toolkit to complement D.I.E.M., called C.A.R.P.E., or Computational Algorithmic Representation and Processing of Eye-movements (Tim’s idea!), for visualizing and processing the data we collected, and used it while writing a journal paper describing a strong correlation between tightly clustered eye-movements and the motion in a scene. We also output visualizations of our entire corpus on our Vimeo channel. The project came to a halt, and so did the visualization software. I’ve since picked up the ball and re-written it entirely from the ground up.
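The clustering/motion analysis can be sketched loosely as follows, assuming per-frame gaze points and a per-frame motion magnitude. This is a minimal illustration of the idea, not the paper's actual measures, and all function names here are hypothetical:

```python
import numpy as np

def gaze_dispersion(points):
    """Mean Euclidean distance of gaze points to their centroid (low = tightly clustered)."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    return float(np.linalg.norm(pts - centroid, axis=1).mean())

def clustering_motion_correlation(gaze_per_frame, motion_per_frame):
    """Pearson correlation between per-frame gaze clustering (1/(1+dispersion))
    and per-frame motion magnitude."""
    clustering = np.array([1.0 / (1.0 + gaze_dispersion(p)) for p in gaze_per_frame])
    motion = np.asarray(motion_per_frame, dtype=float)
    return float(np.corrcoef(clustering, motion)[0, 1])
```

A positive correlation here would mean frames with more motion attract more tightly clustered gaze, which is the relationship described above.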

The image below shows how you can represent the movie, the motion in the scene of the movie (represented in … Continue reading...

Handwriting Recognition with LSTMs and ofxCaffe

Long Short-Term Memory (LSTM) is a Recurrent Neural Network (RNN) architecture designed to better model temporal sequences (e.g. audio, sentences, video) and long-range dependencies than conventional RNNs [1]. There is a lot of excitement in the machine learning communities around LSTMs (and DeepMind’s counterpart, “Neural Turing Machines” [2], or Facebook’s “Memory Networks” [3]) as they overcome a fundamental limitation of conventional RNNs and are able to achieve state-of-the-art benchmark performances on a number of tasks [4,5]:

  • Text-to-speech synthesis (Fan et al., Microsoft, Interspeech 2014)
  • Language identification (Gonzalez-Dominguez et al., Google, Interspeech 2014)
  • Large vocabulary speech recognition (Sak et al., Google, Interspeech 2014)
  • Prosody contour prediction (Fernandez et al., IBM, Interspeech 2014)
  • Medium vocabulary speech recognition (Geiger et al., Interspeech 2014)
  • English to French translation (Sutskever et al., Google, NIPS 2014)
  • Audio onset detection (Marchi et al., ICASSP 2014)
  • Social signal classification (Brueckner & Schulter, ICASSP 2014)
  • Arabic handwriting recognition (Bluche et al., DAS 2014)
  • TIMIT phoneme recognition (Graves et al., ICASSP 2013)
  • Optical character recognition (Breuel et al., ICDAR 2013)
  • Image caption generation (Vinyals et al., Google, 2014)
  • Video to textual description (Donahue et al., 2014)
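To make the mechanism concrete, here is a minimal numpy sketch of a single LSTM step using the standard gate equations (a bare-bones illustration, not the Caffe/ofxCaffe implementation; weight layout is an assumption of this sketch):

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W: (4*H, D+H) stacked gate weights [i; f; o; g], b: (4*H,)."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = 1.0 / (1.0 + np.exp(-z[:H]))        # input gate
    f = 1.0 / (1.0 + np.exp(-z[H:2*H]))     # forget gate
    o = 1.0 / (1.0 + np.exp(-z[2*H:3*H]))   # output gate
    g = np.tanh(z[3*H:])                    # candidate cell update
    c = f * c_prev + i * g                  # cell state: the long-range memory path
    h = o * np.tanh(c)                      # hidden state passed to the next step
    return h, c
```

The gated, additive cell-state update is exactly what lets gradients survive over long sequences, which conventional RNNs struggle with.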

The current dynamic state … Continue reading...

Real-Time Object Recognition with ofxCaffe


I’ve spent a little time with Caffe over the holiday break to try and understand how it might work in the context of real-time visualization/object recognition in more natural scenes/videos. Right now, I’ve implemented the following deep convolutional networks using the 1280×720 resolution webcam on my 2014 MacBook Pro:

The above image depicts the output from an 8×8 grid detection showing brighter regions as higher probabilities of the class “snorkel” (automatically selected by the network from 1000 possible classes as the highest probability).
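A sketch of how such a grid of per-cell class probabilities can be turned into a heat map for the single most probable class. This assumes a (rows, cols, n_classes) array of softmax outputs and mirrors the idea only, not ofxCaffe's code:

```python
import numpy as np

def top_class_heatmap(grid_probs):
    """grid_probs: (rows, cols, n_classes) softmax outputs, one per grid cell.
    Returns (class_index, heatmap), where heatmap[r, c] is that class's
    probability at cell (r, c) — brighter regions = higher probability."""
    mean_probs = grid_probs.reshape(-1, grid_probs.shape[-1]).mean(axis=0)
    top = int(mean_probs.argmax())          # class selected across the whole grid
    return top, grid_probs[:, :, top]
```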

So far I have spent some time understanding how Caffe keeps each layer’s data during a forward/backward pass, and how the deeper layers could be “visualized” in a … Continue reading...

Copyright Violation Notice from “Rightster”

I’ve been working on an art project which takes the top 10 videos on YouTube and tries to resynthesize the #1 video using the remaining 9 videos. The computational model is based on low-level human perception and uses only very abstract features such as edges, textures, and loudness. I’ve created a new synthesis each week using the top 10 of the week in the hopes that, one day, I will be able to resynthesize my own video into the top 10. It is, essentially, a viral algorithm, though whether it will succeed remains unproven.

The database of content used in the recreation of the above video comes from the following videos:
#2 News Anchor FAIL Compilation 2012 || PC
#3 Flo Rida – Whistle [Official Video]
#4 Carly Rae Jepsen – Call Me Maybe
#5 Jennifer Lopez – Goin’ In ft. Flo Rida
#6 Taylor Swift – We Are Never Ever Getting Back Together
#7 will.i.am – This Is Love ft. Eva Simons
#8 Call Me Maybe – Carly Rae Jepsen (Chatroulette Version)
#9 Justin Bieber – As Long As You Love Me ft. Big Sean
#10 Rihanna – Where Have You Been

It … Continue reading...

3D Musical Browser

I’ve been interested in exploring ways of navigating media archives. Typically, you may use iTunes and go from artist to artist, or you may have managed to tediously classify your collection into genres. Some may even still browse their music through a file browser, perhaps making sure the folders and filenames of their collection are descriptive of the artist, album, year, etc. But what about how the content actually sounds?

Wouldn’t it be nice to hear all music which shares similar sounds, or similar phrases of sounds? Research in the last 10–15 years has developed methods precisely to solve this problem; they fall under the umbrella term content-based information retrieval (CBIR), or uncovering the relationships within an archive through the information in the content itself. For images, Google’s Search by Image is a great example which only recently became public. For audio, audioDB and Shazam are good examples of discovering music through the way it sounds, or the content-based relationships of the audio itself. Each of these interfaces, though, presents a list of matches to an image or audio query, making it difficult to explore the content-based relationships of a specific set of material.
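At its core, CBIR reduces each item to a feature vector and ranks the archive by similarity to a query. A minimal sketch using cosine similarity over hypothetical precomputed features (real systems use far richer audio and image descriptors):

```python
import numpy as np

def build_index(features):
    """Stack feature vectors and L2-normalize rows so dot product = cosine similarity."""
    X = np.asarray(features, dtype=float)
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def query(index, feat, k=3):
    """Return indices and similarities of the k archive items closest to the query."""
    q = np.asarray(feat, dtype=float)
    q = q / np.linalg.norm(q)
    sims = index @ q
    order = np.argsort(-sims)[:k]
    return list(order), sims[order]
```

A 3D browser like the one above goes one step further: instead of a ranked list, it embeds these similarities into a navigable space.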

The video above demonstrates interaction with a novel 3D browser … Continue reading...

Memory Mosaicing

A product of my PhD research is now available on the iPhone App Store (for a small cost!): View in App Store.

This application is motivated by my interests in experiencing an augmented perception, and of course very much inspired by some of the work here at Goldsmiths. The application of existing approaches in soundspotting/mosaicing to a real-time stream situated in the real world allows one to play with their own sonic memories, and certainly requires an open ear for new experiences. Succinctly, the app records segments of sounds in real time using its own listening model as you walk around in different environments (or sit at your desk). These segments are continually built up the longer the app is left running to form a database (a working-memory model) with which to understand new sounds. Incoming sounds are then matched to this database, and the closest matching sound is played instead. What you get is a polyphony of sound memories triggered by the incoming feed of audio, and an app which sounds more like your environment the longer it is left to run. A sort of gimmicky feature of this app is the ability to learn a song from your … Continue reading...
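The app's actual listening model isn't reproduced here, but the matching structure described above — grow a database of segments, then return the closest stored segment for each incoming one — can be sketched like this (the spectral-envelope feature is a crude stand-in for the app's real features; class and method names are hypothetical):

```python
import numpy as np

class SoundMemory:
    """Grows a database of audio segments and matches incoming segments
    to the closest stored one by spectral-envelope distance."""

    def __init__(self):
        self.feats, self.segments = [], []

    def store(self, segment):
        self.feats.append(self._feature(segment))
        self.segments.append(np.asarray(segment, dtype=float))

    def match(self, segment):
        """Return the stored segment nearest to the incoming one, or None if empty."""
        if not self.feats:
            return None
        f = self._feature(segment)
        dists = [np.linalg.norm(f - g) for g in self.feats]
        return self.segments[int(np.argmin(dists))]

    @staticmethod
    def _feature(segment):
        # Normalized magnitude spectrum: phase-invariant, so a time-shifted
        # copy of a stored sound still matches it.
        mag = np.abs(np.fft.rfft(np.asarray(segment, dtype=float), 256))
        return mag / (mag.sum() + 1e-9)
```

In the app, each onset-triggered segment is both stored (growing the memory) and matched (triggering playback of its nearest neighbour), which is what produces the polyphony of sound memories.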

Concatenative Video Synthesis (or Video Mosaicing)


Working closely with my adviser Mick Grierson, I have developed a way to resynthesize existing videos using material from another set of videos. This process starts by learning a database of objects that appear in the set of videos to synthesize from. The target video to resynthesize is then broken into objects in a similar manner, but also matched to objects in the database. What you get is a resynthesis of the video that appears as beautiful disorder. Here are two examples, the first using Family Guy to resynthesize The Simpsons. And the second using Jan Svankmajer’s Food to resynthesize Jan Svankmajer’s Dimensions of Dialogue.

Continue reading...

Creative Community Spaces in INDIA

Jaaga – Creative Common Ground
Bangalore
http://www.jaaga.in/

CEMA – Center for Experimental Media Arts at Srishti School of Art, Design and Technology
Bangalore
http://cema.srishti.ac.in/site/

Bar1 – non-profit exchange programme by artists for artists to foster the local, Indian and international mutual exchange of ideas and experiences through guest residencies in Bangalore
Bangalore
http://www.bar1.org

Sarai – a space for research, practice, and conversation about contemporary media and urban constellations.
New Delhi
http://www.sarai.net/

Khoj/International Artists’ Association – artist led, alternative forum for experimentation and international exchange
New Delhi
http://www.khojworkshop.org/

Periferry – a nomadic space for hybrid art practices. It is a laboratory for people’s cross-disciplinary practices. The project focuses on the creation of a network space for negotiating the challenges of contemporary cultural production. It is located on a ferry barge on the river Brahmaputra, docked in Guwahati, Assam.
Narikolbari, Guwahati
http://www.periferry.in/

Point of View – non-profit organization that brings the points of view of women into community, social, cultural and public domains through media, art and culture.
Bombay
http://www.pointofview.org/

Majilis – a center for rights discourse and inter-disciplinary arts initiatives
Bombay
http://majlisbombay.org/

Camp – not an “artists collective” but a space, in which ideas and energies … Continue reading...

Facial Appearance Modeling/Tracking


I’ve been working on developing a method for automatic head-pose tracking, and along the way have come to model facial appearances. I start by initializing a facial bounding box using the Viola-Jones detector, a well-known and robust object detector. This allows me to localize and center the face. Once I know where the 2D plane of the face is in an image, I can register an Active Shape Model like so:

After multiple views of the possible appearance variations of my face, including slight rotations, I construct an appearance model.

The idea I am working with is using the first principal components of variation of this appearance model to determine pose. Here I show the first two basis vectors and the images they reconstruct:

As you may notice, these two basis vectors very neatly encode rotation. By looking at the projections onto these basis vectors, you can also interpret pose.… Continue reading...
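The appearance model boils down to PCA over vectorized face images. A minimal numpy sketch (function names are hypothetical; the real pipeline runs on registered, shape-normalized faces):

```python
import numpy as np

def appearance_model(images, k=2):
    """PCA over vectorized images: returns the mean face, the first k basis
    vectors (principal components of appearance variation), and their strengths."""
    X = np.asarray([np.asarray(im, dtype=float).ravel() for im in images])
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k], S[:k]

def reconstruct(mean, basis, coeffs):
    """Rebuild a face as the mean plus a weighted sum of basis vectors."""
    return mean + coeffs @ basis
```

Projecting a new face onto the basis gives the coefficients; if the first components encode rotation, those coefficients track pose.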

Short Time Fourier Transform using the Accelerate framework

Using the libraries pkmFFT and pkm::Mat, you can very easily perform a highly optimized short-time Fourier transform (STFT) with direct access to a floating-point based object.
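pkmFFT wraps Accelerate, but the underlying math is plain: slice the signal into hop-advanced windowed frames and FFT each one. A minimal numpy sketch of the same computation (illustrative only, not the Accelerate code):

```python
import numpy as np

def stft(x, win_size=512, hop=256):
    """Hann-windowed short-time Fourier transform.
    Returns an array of complex spectra, one row per frame."""
    win = np.hanning(win_size)
    n_frames = 1 + (len(x) - win_size) // hop
    return np.array([np.fft.rfft(x[i * hop:i * hop + win_size] * win)
                     for i in range(n_frames)])
```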

Get the code on my github:
http://github.com/pkmital/pkmFFT
Depends also on: http://github.com/pkmital/pkmMatrixContinue reading...

Real FFT/IFFT with the Accelerate Framework

Apple’s Accelerate Framework can really speed up your code without much effort, and it will also run on an iPhone. Even still, I banged my head a few times trying to get a straightforward real FFT and IFFT working, even after consulting the Accelerate documentation (reference and source code), Stack Overflow (here and here), and an existing implementation (thanks to Chris Kiefer and Mick Grierson). The previously mentioned examples weren’t very clear: they did not handle the case of overlapping FFTs, which I needed for an STFT, or they did not recover the power spectrum, or they just didn’t work for me (lots of blaring noise).
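For reference, the behaviour I was after can be sketched in numpy: FFT each overlapping Hann-windowed frame, IFFT it, and overlap-add the results. With a periodic Hann window at 50% overlap the window contributions sum to one, so the interior of the signal is recovered exactly. (This shows the math only, not the Accelerate/vDSP calls.)

```python
import numpy as np

def analysis_resynthesis(x, win_size=512, hop=256):
    """Round-trip: window, forward FFT, inverse FFT, overlap-add.
    A periodic Hann window with hop = win_size/2 satisfies w[n] + w[n+hop] = 1,
    so overlapping frames reconstruct the original signal (away from the edges)."""
    win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(win_size) / win_size)
    y = np.zeros(len(x))
    for i in range(0, len(x) - win_size + 1, hop):
        frame = np.fft.irfft(np.fft.rfft(x[i:i + win_size] * win), win_size)
        y[i:i + win_size] += frame
    return y
```

If this round trip produces anything other than the input in the interior of the signal, the FFT/IFFT scaling or the overlap handling is wrong — which is exactly the "blaring noise" failure mode described above.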

Get the code on my github:
http://github.com/pkmital/pkmFFTContinue reading...

Responsive Ecologies Documentation

As part of a system of numerous dynamic connections and networks, we are reactive within a complex system of cause and effect. The consequences of our actions upon ourselves, the society we live in, and the broader natural world are conditioned by how we perceive our involvement. The awareness of how we have impacted a situation is often realised and processed subconsciously; the extent and scope of these actions can be far beyond our knowledge, our consideration, and, importantly, beyond our sensory reception. With this in mind, how can we associate our actions, many of which may be overlooked as customary, with, for instance, honey bee depopulation syndrome or the declining numbers of Siberian tigers?

Responsive Ecologies is part of an ongoing collaboration with ZSL London Zoo and Musion Academy. Collectively we have been exploring innovative means of public engagement, to generate an awareness and understanding of nature and the effects of climate change. All of the contained footage has come from filming sessions within the Zoological Society; this coincidentally has raised some interesting questions on the spectacle of captivity, an issue which we have tried to reflect upon in the construction and presentation of … Continue reading...

Streaming Motion Capture Data from the Kinect using OSC on Mac OSX

This guide will help you get PrimeSense NITE’s skeleton tracking running inside Xcode on Mac OS X. It will also help you stream that data in case you’d like to use it in another environment such as Max. An example Max patch is also available.

PrimeSense NITE Skeletonization and Motion Capture to Max/MSP via OSC from pkmital on Vimeo.

Prerequisites:

0.) A Microsoft Kinect or other PrimeSense device.

1.) Install Xcode and the Java Developer Package located here: https://connect.apple.com/cgi-bin/WebObjects/MemberSite.woa/wa/getSoftware?bundleID=20719 – if you require a Mac OS X Developer account, just register at developer.apple.com since it is free.

2.) Install Macports: http://www.macports.org/

3.) Install libtool and libusb > 1.0.8:

$ sudo port install libusb-devel +universal

4.) Get the OpenNI Binaries for Mac OSX: http://www.openni.org/downloadfiles

5.) Install OpenNI by unzipping the file OpenNI-Bin-MacOSX (-v1.0.0.25 at the time of writing) and running,

$ sudo ./install.sh

6.) Get SensorKinect from avin2: https://github.com/avin2/SensorKinect/tree/unstable/Bin

7.) Install SensorKinect by unzipping and running

$ sudo ./install.sh

8.) Install OpenNI Compliant Middleware NITE from Primesense for Mac OSX: http://www.openni.org/downloadfiles

9.) Install NITE by unzipping and running

$ sudo ./install.sh

When prompted for a key, enter the key listed on the openni website.
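If you'd rather not pull in an OSC library on the sending side, the wire format for a message like a joint position with x, y, z floats is simple enough to pack by hand: strings are NUL-terminated and padded to 4-byte boundaries, floats are big-endian 32-bit. A minimal sketch (the /joint/head address is hypothetical — use whatever address your Max patch expects):

```python
import struct

def osc_message(address, *floats):
    """Pack an OSC message (address + 32-bit float arguments) into its
    binary wire format, ready to send over UDP to Max/MSP."""
    def pad(s):
        b = s.encode() + b"\x00"              # NUL-terminate...
        return b + b"\x00" * (-len(b) % 4)    # ...and pad to a 4-byte boundary
    typetag = "," + "f" * len(floats)         # e.g. ",fff" for three floats
    return pad(address) + pad(typetag) + struct.pack(">%df" % len(floats), *floats)
```

Sending is then just `socket.sendto(osc_message("/joint/head", x, y, z), (host, port))` to the port your Max udpreceive object listens on.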

Getting it up and running:

1.) Download the … Continue reading...

Responsive Ecologies Exhibition

Come check out the Watermans Arts Centre from the 6th of December until the 21st of January for an immersive and interactive visual experience entitled “Responsive Ecologies”, developed in collaboration with artists captincaptin. We will also be giving a talk on the 10th of December from 7 p.m. – 9 p.m. during CINE: 3D Imaging in Art at the Watermans Arts Centre.

Responsive Ecologies is part of a wider ongoing collaboration between artists captincaptin, the ZSL London Zoo and Musion Academy. Collectively they have been exploring innovative means of public engagement, to generate an awareness and understanding of nature and the effects of climate change. All of the contained footage has come from filming sessions within the Zoological Society; this coincidentally has raised some interesting questions on the spectacle of captivity, an issue which we have tried to reflect upon in the construction and presentation of this installation. The nature of interaction within Responsive Ecologies means that a visitor to the space cannot simply view the installation but must become a part of its environment. When attempting to perceive the content within the space, the visitor reshapes the installation. Everybody has a degree of impact, whether directed or incidental, and … Continue reading...

6DOF Head Tracking

The following demo works with SeeingMachines FaceAPI in openFrameworks, controlling a Mario avatar. It also has some really poor gesture recognition (and learning, though it’s not shown here); a simple threshold on the rotation DOFs would have produced better results for the simple task of recognizing look up/down and left/right gestures.
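The thresholding idea is as simple as it sounds — a sketch, assuming pitch and yaw in degrees from the tracker (function name and sign conventions are hypothetical; they depend on the tracker's coordinate frame):

```python
def classify_head_gesture(pitch, yaw, threshold=15.0):
    """Label a head pose as up/down/left/right by thresholding rotation DOFs.
    pitch/yaw in degrees; sign conventions depend on the tracker."""
    if pitch > threshold:
        return "up"
    if pitch < -threshold:
        return "down"
    if yaw > threshold:
        return "left"
    if yaw < -threshold:
        return "right"
    return "center"
```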

6DOF Head Tracking from pkmital on Vimeo.

interfacing seeingmachines faceapi with openFrameworks to control a 3D mario avatar

This is just with the non-commercial license. The full commercial license (~$3000?) gives you access to lip/mouth tracking and eyebrows, as well as much more flexibility in how you use their API with different/multiple cameras and in accessing image data.

Of course, there are other efforts at producing similar results. Mutual-information-based template trackers, for instance, seem to be state-of-the-art. Take a look at recent work by Panin and Knoll using OpenTL:

 

I imagine a lot of people would like this technology.… Continue reading...


Copyright © 2010 Parag K Mital. All rights reserved. Made with WordPress. RSS