caffe - http://archive.pkmital.com
computational audiovisual augmented reality research

Handwriting Recognition with LSTMs and ofxCaffe
https://archive.pkmital.com/2015/02/06/handwriting-recognition-with-lstms-and-ofxcaffe/
Fri, 06 Feb 2015 04:41:49 +0000

Long Short-Term Memory (LSTM) is a Recurrent Neural Network (RNN) architecture designed to model temporal sequences (e.g. audio, sentences, video) and their long-range dependencies better than conventional RNNs [1]. There is a lot of excitement in the machine learning community about LSTMs (and about DeepMind’s counterpart, “Neural Turing Machines” [2], and Facebook’s “Memory Networks” [3]), as they overcome a fundamental limitation of conventional RNNs and achieve state-of-the-art performance on a number of benchmark tasks [4,5]:

  • Text-to-speech synthesis (Fan et al., Microsoft, Interspeech 2014)
  • Language identification (Gonzalez-Dominguez et al., Google, Interspeech 2014)
  • Large vocabulary speech recognition (Sak et al., Google, Interspeech 2014)
  • Prosody contour prediction (Fernandez et al., IBM, Interspeech 2014)
  • Medium vocabulary speech recognition (Geiger et al., Interspeech 2014)
  • English to French translation (Sutskever et al., Google, NIPS 2014)
  • Audio onset detection (Marchi et al., ICASSP 2014)
  • Social signal classification (Brueckner & Schuller, ICASSP 2014)
  • Arabic handwriting recognition (Bluche et al., DAS 2014)
  • TIMIT phoneme recognition (Graves et al., ICASSP 2013)
  • Optical character recognition (Breuel et al., ICDAR 2013)
  • Image caption generation (Vinyals et al., Google, 2014)
  • Video to textual description (Donahue et al., 2014)
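
To make the gating idea behind these results a little more concrete, here is a minimal sketch of a single LSTM time step in plain Python. It assumes a commonly used formulation with input, forget, and output gates and no peephole connections; it is not taken from this post, from Caffe, or from ofxCaffe. The function name `lstm_step`, the scalar (size-1) state, and the weight values are all illustrative assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One time step of a scalar LSTM cell (hypothetical helper).

    x, h_prev and c_prev are single floats. `w` maps each gate name to an
    assumed (input weight, recurrent weight, bias) triple.
    """
    def pre(name):
        w_x, w_h, b = w[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("input"))         # how much new information to write
    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    g = math.tanh(pre("candidate"))   # candidate value to write

    c = f * c_prev + i * g            # additive update of the cell state
    h = o * math.tanh(c)              # new hidden state
    return h, c


# Illustrative weights only; a trained network would learn these.
weights = {
    "input": (0.5, 0.1, 0.0),
    "forget": (0.0, 0.0, 5.0),   # large bias keeps the forget gate near 1
    "output": (0.5, 0.1, 0.0),
    "candidate": (1.0, 0.1, 0.0),
}

# Feed one non-zero input followed by zeros and carry the state forward.
h, c = 0.0, 0.0
for x in [1.0, 0.0, 0.0, 0.0, 0.0]:
    h, c = lstm_step(x, h, c, weights)
```

With the forget gate held close to 1, the cell state written at the first step is still largely present several steps later, which is the kind of long-range memory the paragraph above refers to.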

The current dynamic state …

The post Handwriting Recognition with LSTMs and ofxCaffe first appeared on http://archive.pkmital.com.

Real-Time Object Recognition with ofxCaffe
https://archive.pkmital.com/2015/01/04/real-time-object-recognition-with-ofxcaffe/
Sun, 04 Jan 2015 03:53:48 +0000

[Image: Screen Shot 2015-01-03 at 12.57.23 PM]

I’ve spent a little time with Caffe over the holiday break, trying to understand how it might work for real-time visualization and object recognition in more natural scenes and videos. So far, I’ve implemented the following deep convolutional networks using the 1280×720 webcam on my 2014 MacBook Pro:

The image above shows the output of detection on an 8×8 grid, where brighter regions indicate higher probabilities for the class “snorkel” (which the network selected automatically as the most probable of 1000 possible classes).
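
As a rough illustration of how such a grid could be turned into a brightness map, the sketch below takes per-cell class probabilities, picks one class, and scales that class’s probability in each cell to a 0..255 brightness. It is plain Python rather than ofxCaffe code, the function name `top_class_heatmap` is hypothetical, and the selection rule (the class whose probability peaks highest in any cell) is an assumption; the post does not say exactly how the class is chosen.

```python
def top_class_heatmap(grid_probs):
    """Pick one class for the whole grid and return its per-cell brightness.

    grid_probs[r][c] is a list of class probabilities for one grid cell.
    Returns (class_index, heatmap), where heatmap[r][c] is an int in 0..255.
    """
    n_classes = len(grid_probs[0][0])
    # Assumption: choose the class with the highest probability in any cell.
    best = max(
        range(n_classes),
        key=lambda k: max(cell[k] for row in grid_probs for cell in row),
    )
    # Brighter cells mean a higher probability for the chosen class.
    heatmap = [[round(255 * cell[best]) for cell in row] for row in grid_probs]
    return best, heatmap


# Tiny 2x2 grid with 3 classes, standing in for an 8x8 grid with 1000 classes.
grid = [
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
    [[0.3, 0.4, 0.3], [0.2, 0.2, 0.6]],
]
best_class, heatmap = top_class_heatmap(grid)
```

The same function would apply unchanged to an 8×8 grid of 1000-class probability vectors; only the input size differs.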

So far I have spent some time understanding how Caffe keeps each layer’s data during a forward/backward pass, and how the deeper layers could be “visualized” in a …

The post Real-Time Object Recognition with ofxCaffe first appeared on http://archive.pkmital.com.
