Archived entries for

C.A.R.P.E. version 0.1.1 release

[Screenshot: C.A.R.P.E. 0.1.1 interface]

I’ve updated C.A.R.P.E., a graphical tool for visualizing eye-movements and processing audio/video, to include a graphical timeline (thanks to ofxTimeline by James George/YCAM), support for audio playback/scrubbing (using pkmAudioWaveform), audio saving, and various bug fixes. This release changes some parameters of the XML file and adds others. Please refer to this example XML file for how to set up your own data:

See my previous post for information on the initial release.

Please fill out the following form if you’d like to use C.A.R.P.E.:

Continue reading...

Toolkit for Visualizing Eye-Movements and Processing Audio/Video

[Screenshot: C.A.R.P.E. visualization]

Original video still (shown without the eye-movement and heatmap overlay) copyright Dropping Knowledge Video Republic.

From 2008–2010, I worked on the Dynamic Images and Eye-Movements (D.I.E.M.) project, led by John Henderson, with Tim Smith and Robin Hill. Together we collected nearly 200 participants’ eye-movements on nearly 100 short films, each 30 seconds to 5 minutes in length. The database is freely available and covers a wide range of film styles, from advertisements to movie and music trailers to news clips. During my time on the project, I developed an open-source toolkit to complement D.I.E.M.: C.A.R.P.E., or Computational Algorithmic Representation and Processing of Eye-movements (Tim’s idea!), for visualizing and processing the data we collected. I used it while writing a journal paper describing a strong correlation between tightly clustered eye-movements and the motion in a scene. We also output visualizations of our entire corpus on our Vimeo channel. The project came to a halt, and so did the visualization software. I’ve since picked up the ball and rewritten it entirely from the ground up.
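To give a sense of what “tightly clustered eye-movements” means quantitatively, here is a minimal sketch of one plausible per-frame clustering measure: the mean distance of all participants’ gaze points to their centroid. This is an illustrative stand-in only; the statistic actually used in the paper may differ.

```python
import numpy as np

def gaze_dispersion(points):
    """Mean Euclidean distance of gaze points to their centroid.

    points: sequence of (x, y) gaze coordinates, one per participant,
    for a single video frame. Lower values mean viewers' gaze is
    more tightly clustered on the same region of the frame.
    """
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    return float(np.linalg.norm(pts - centroid, axis=1).mean())

# Example: gaze tightly clustered on one object vs. spread across the frame
clustered = gaze_dispersion([(310, 240), (315, 238), (308, 245)])
spread = gaze_dispersion([(50, 60), (600, 400), (320, 100)])
```

Computing this per frame and correlating it against a frame-differencing motion measure would be one simple way to probe the clustering/motion relationship described above.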

The image below shows how you can represent the movie, the motion in the scene of the movie (represented in … Continue reading...

Handwriting Recognition with LSTMs and ofxCaffe

Long Short-Term Memory (LSTM) is a Recurrent Neural Network (RNN) architecture designed to model temporal sequences (e.g. audio, sentences, video) and their long-range dependencies better than conventional RNNs [1]. There is a lot of excitement in the machine learning community around LSTMs (and DeepMind’s counterpart, “Neural Turing Machines” [2], or Facebook’s “Memory Networks” [3]), as they overcome a fundamental limitation of conventional RNNs and achieve state-of-the-art benchmark performance on a number of tasks [4,5]:

  • Text-to-speech synthesis (Fan et al., Microsoft, Interspeech 2014)
  • Language identification (Gonzalez-Dominguez et al., Google, Interspeech 2014)
  • Large vocabulary speech recognition (Sak et al., Google, Interspeech 2014)
  • Prosody contour prediction (Fernandez et al., IBM, Interspeech 2014)
  • Medium vocabulary speech recognition (Geiger et al., Interspeech 2014)
  • English to French translation (Sutskever et al., Google, NIPS 2014)
  • Audio onset detection (Marchi et al., ICASSP 2014)
  • Social signal classification (Brueckner & Schulter, ICASSP 2014)
  • Arabic handwriting recognition (Bluche et al., DAS 2014)
  • TIMIT phoneme recognition (Graves et al., ICASSP 2013)
  • Optical character recognition (Breuel et al., ICDAR 2013)
  • Image caption generation (Vinyals et al., Google, 2014)
  • Video to textual description (Donahue et al., 2014)
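The key idea behind that limitation-beating behavior is the gated, additive cell-state update. Below is a minimal NumPy sketch of a single LSTM step using the common fused-gate formulation; the variable names and the single stacked weight matrix are illustrative choices, not Caffe’s internal layout.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W has shape (4n, len(x) + n), b has shape (4n,),
    where n is the hidden-state size; all four gates are computed from
    one fused matrix multiply, then sliced apart."""
    z = W @ np.concatenate([x, h_prev]) + b
    n = h_prev.size
    i = sigmoid(z[:n])         # input gate: how much new info to write
    f = sigmoid(z[n:2 * n])    # forget gate: how much old memory to keep
    o = sigmoid(z[2 * n:3 * n])  # output gate: how much memory to expose
    g = np.tanh(z[3 * n:])     # candidate cell values
    c = f * c_prev + i * g     # additive update: gradients flow across long ranges
    h = o * np.tanh(c)         # new hidden state
    return h, c
```

Because the cell state `c` is updated additively (scaled by the forget gate) rather than squashed through a nonlinearity at every step, gradients can propagate over many timesteps without vanishing, which is what lets LSTMs capture the long-range dependencies conventional RNNs miss.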

The current dynamic state … Continue reading...


Copyright © 2010 Parag K Mital. All rights reserved. Made with WordPress. RSS