Text, Image and Speech Understanding: it’s all about Learning from Data
Motaz El Saban
Senior Applied Researcher, Microsoft T&R, Cairo Lab
Associate Professor, Faculty of Computers and Information, Cairo University
Technical Focus
• Analyzing, modeling, learning and predicting in various domains to make sense of digital media content
• Early on: better analysis for video compression
• Ph.D.: automatic tracking and analysis of subcellular structures in time-lapse videos
• Post Ph.D.
– Visual object recognition (NevenVision & Google)
– Textual web rankers (Arabic) (Microsoft)
– Analyzing content on social networks
– Mobile multimedia experiences
• Automatic panoramic video construction
• Automatic video tagging
– Object/scene recognition/detection (face and facial expression recognition as special cases)
– Activity recognition in videos (RGB/RGB+D)
– Speaker recognition
Application Scenarios
• Web search (text ranking, media tagging (e.g. object detection))
• Visual similarity search (clothes/products)
• Ranking user-generated content on forums
• Searching document archives (OCR-less)
• Real-time classifiers for mobile phones (document-like content)
• Access control (face unlocking)
• Enhanced visual communication (background replacement in Skype)
• Activity recognition for Kinect
Agenda
• Ph.D. work: Microtubules
• Real-time video stitching
– Basic framework
– Active Feedback for stitching
– Exploiting frame correlation
– Exploiting mobile sensors
• Annotating mobile generated videos
– Main concept
– Matching using frame information aggregation
– Propagating tags over frames
• Object recognition
– Detection
– Segmentation
– General object/specific
– Activity in videos
– Facial expression recognition using DNNs
Ph.D. thesis: Microtubule tracking and analysis
Multi-frame tracking
MT analysis: HMMs
• One HMM per experimental condition of microtubule (MT) dynamics
• Distances between the HMMs are estimated
• The distances are embedded with multidimensional scaling (MDS) for visualization
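The slide does not say which HMM distance was used. As a minimal sketch, assuming discrete-output HMMs given as hypothetical `(pi, A, B)` tuples, the code below estimates the symmetrized Juang-Rabiner distance (a log-likelihood-based approximation of the divergence rate between two models, substituted here for illustration) from sampled sequences, then embeds the distance matrix in 2-D with classical MDS. All function names are illustrative.

```python
import numpy as np


def log_likelihood(obs, pi, A, B):
    """Log P(obs | HMM) via the scaled forward algorithm (discrete emissions)."""
    alpha = pi * B[:, obs[0]]
    log_lik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        scale = alpha.sum()
        log_lik += np.log(scale)
        alpha = alpha / scale
    return log_lik


def sample(pi, A, B, T, rng):
    """Draw an observation sequence of length T from the HMM."""
    obs = np.empty(T, dtype=int)
    state = rng.choice(len(pi), p=pi)
    for t in range(T):
        obs[t] = rng.choice(B.shape[1], p=B[state])
        state = rng.choice(len(pi), p=A[state])
    return obs


def hmm_distance(m1, m2, T=500, seed=0):
    """Symmetrized Juang-Rabiner distance, estimated from one sampled sequence per model."""
    rng = np.random.default_rng(seed)
    o1, o2 = sample(*m1, T, rng), sample(*m2, T, rng)
    d12 = (log_likelihood(o1, *m1) - log_likelihood(o1, *m2)) / T
    d21 = (log_likelihood(o2, *m2) - log_likelihood(o2, *m1)) / T
    return max(0.0, 0.5 * (d12 + d21))  # clip small negative sampling noise


def classical_mds(D, k=2):
    """Embed an n x n distance matrix into k dimensions (classical/Torgerson MDS)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * J @ (D ** 2) @ J  # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(G)
    top = np.argsort(vals)[::-1][:k]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


def embed_conditions(models, k=2):
    """One HMM per experimental condition -> pairwise distances -> k-D points."""
    n = len(models)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = hmm_distance(models[i], models[j])
    return classical_mds(D, k)
```

Nearby points in the embedding correspond to experimental conditions whose fitted dynamics are similar. With continuous (e.g. Gaussian) emissions only `log_likelihood` and `sample` would need to change.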
MT analysis: Association rules
Early Work at Microsoft…
Ranking User Content on Forums
• Forums are conversational social cyberspaces that constitute rich repositories of content and an important source of collaborative knowledge
• However, most of this knowledge is buried inside the forum infrastructure, and extracting it is complex and difficult
• This work focuses on automatically rating postings in online discussion forums to make their content easier to access
Features & Classifier
Classifier: non-linear SVM that classifies posts into three levels (High/Medium/Low)
Relevance (OnSubForumTopic, OnThreadTopic,…)
Originality (OverlapPrevious,…)
Forum-specific features (Referencing,…)
Surface features (Timeliness, Lengthiness,…)
Posting-component features (Weblinks, Questioning, …)
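The slide names the feature families but not their exact definitions. A minimal sketch of one feature per family, where every definition (and the `Post` type) is a hypothetical illustration rather than the paper's formula:

```python
import re
from dataclasses import dataclass

URL_RE = re.compile(r"https?://\S+")
WORD_RE = re.compile(r"[a-z0-9']+")


@dataclass
class Post:
    text: str
    hours_since_thread_start: float


def word_set(text):
    """Lower-cased word set with URLs stripped out."""
    return set(WORD_RE.findall(URL_RE.sub(" ", text.lower())))


def post_features(post, previous_posts, thread_title):
    """Hypothetical feature vector for one post (one example per feature family)."""
    words = word_set(post.text)
    earlier = set().union(*(word_set(p.text) for p in previous_posts))
    title = word_set(thread_title)
    return {
        # Relevance: share of thread-title words that the post mentions.
        "OnThreadTopic": len(words & title) / len(title) if title else 0.0,
        # Originality: share of the post's words already used earlier in the thread.
        "OverlapPrevious": len(words & earlier) / len(words) if words else 0.0,
        # Surface features.
        "Timeliness": post.hours_since_thread_start,
        "Lengthiness": len(post.text.split()),
        # Posting-component features.
        "Weblinks": len(URL_RE.findall(post.text)),
        "Questioning": post.text.count("?"),
    }
```

The resulting dictionaries would then be turned into fixed-order vectors and fed to a non-linear classifier such as an RBF-kernel SVM (for example scikit-learn's `SVC`), which is not shown here.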
Dataset
• Discussion threads from Slashdot
• 200 threads, each with a maximum of 200 posts, from the 14 sub-forums
• A total of 120,000 posts were scraped
• Posts on Slashdot are rated on a scale from -1 (irrelevant) to 5 (high-quality posts)
• The default rating is 1 for a registered user and 0 for an unregistered user
• Posts rated 0 were removed unless they came from a registered user
• The final dataset comprised 20,008 rated posts, which were clustered into three groups (low, medium and high) according to their rating
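The filtering rule above can be stated precisely in code. The slide does not give the boundaries between the low, medium and high groups, so the thresholds below (`LOW_MAX`, `MEDIUM_MAX`) are arbitrary placeholders, and fixed thresholds stand in for whatever clustering was actually used:

```python
from typing import NamedTuple


class RatedPost(NamedTuple):
    score: int        # Slashdot moderation score, from -1 to 5
    registered: bool  # whether the author is a registered user


# Hypothetical class boundaries; the slide does not give the actual ones.
LOW_MAX, MEDIUM_MAX = 1, 3


def keep(post):
    """Filtering rule from the slide: drop 0-rated posts unless the author is registered."""
    return post.score != 0 or post.registered


def label(score):
    if score <= LOW_MAX:
        return "low"
    if score <= MEDIUM_MAX:
        return "medium"
    return "high"


def build_dataset(posts):
    return [(post, label(post.score)) for post in posts if keep(post)]
```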
Results
            High   Medium   Low
F1-measure  0.61   0.42     0.46
Relative accuracy and F1-measure for each metric category
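The F1 values are per-class (one-vs-rest) scores. As a reminder of how each is obtained from true and predicted labels, a minimal sketch; the toy labels in any usage are made up and unrelated to the paper's data:

```python
from collections import Counter


def per_class_f1(y_true, y_pred, classes=("high", "medium", "low")):
    """One-vs-rest F1 for each class: harmonic mean of precision and recall."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but wrongly
            fn[t] += 1  # true class t was missed
    f1 = {}
    for c in classes:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1[c] = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return f1
```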
Building an Arabic Web Ranker
• Applying English rankers to Arabic documents is suboptimal
• Training an Arabic ranker from scratch suffers from a lack of sufficient training data
• Solution: start from the English ranker and fine-tune it with Arabic-specific data for the top returned results
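The slide does not describe how the fine-tuning is done. A minimal sketch under stated assumptions: the ranker is linear in its features, the English weights are the starting point, fine-tuning uses a RankNet-style pairwise logistic loss on Arabic relevance labels for a single query, and the tuned weights only re-order the top-k results. Function names are hypothetical.

```python
import numpy as np


def finetune_pairwise(w_init, X, relevance, lr=0.1, epochs=100):
    """Fine-tune linear ranking weights with a pairwise logistic (RankNet-style) loss.

    w_init: weights of the existing (English) ranker, used as the starting point.
    X: (n_docs, n_features) features of the top results for one Arabic query.
    relevance: graded relevance labels for those documents.
    """
    w = np.asarray(w_init, dtype=float).copy()
    pairs = [(i, j) for i in range(len(relevance)) for j in range(len(relevance))
             if relevance[i] > relevance[j]]
    for _ in range(epochs):
        grad = np.zeros_like(w)
        for i, j in pairs:
            diff = X[i] - X[j]
            s = w @ diff
            # Gradient of log(1 + exp(-s)) with respect to w is -diff / (1 + exp(s)).
            grad -= diff / (1.0 + np.exp(s))
        w -= lr * grad / max(len(pairs), 1)
    return w


def rerank_top_k(base_scores, X, w, k):
    """Re-order only the top-k documents of the base ranking using weights w."""
    order = np.argsort(-base_scores)
    top = order[:k]
    top = top[np.argsort(-(X[top] @ w))]
    return np.concatenate([top, order[k:]])
```

Starting from the English weights means the scarce Arabic data only has to correct the ranker, not learn it from scratch; restricting changes to the top-k keeps the rest of the base ranking untouched.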
Agenda
• Ph.D. work: Microtubules
• Real-time video stitching
– Basic framework
– Active Feedback for stitching
– Exploiting frame correlation
– Exploiting mobile sensors
• Annotating mobile generated videos
– Main concept
– Matching using frame information aggregation
– Propagating tags over frames
• Object recognition
– Detection
– Segmentation
– General object/specific
– Activity in videos
– Facial expression recognition using DNNs
• Initial thoughts/Ideas on Smarter Urban Dynamics
Real-time Stitching
Estimating geometric and photometric mapping
A novel research agenda, published over a number of articles, that resulted in a recent book chapter
Stitching Pipeline
Extract interest points → Match between frame pairs → Solve geometric transform (with RANSAC) → Photometric alignment
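The geometric step of the pipeline can be sketched in NumPy, assuming matched interest-point coordinates are already available and that the transform is a planar homography (the slide only says "geometric transform"). This is a textbook DLT + RANSAC without Hartley normalization; a production system would more likely call an optimized routine such as OpenCV's `cv2.findHomography` with the `cv2.RANSAC` flag.

```python
import numpy as np


def fit_homography(src, dst):
    """Direct linear transform: 3x3 H with dst ~ H @ src, from >= 4 (x, y) point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)  # null-space vector = solution up to scale
    return H / H[2, 2]


def project(H, pts):
    """Apply H to (n, 2) points in homogeneous coordinates."""
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]


def ransac_homography(src, dst, iters=500, thresh=2.0, seed=0):
    """Robustly fit H to matches that contain outliers (wrong interest-point matches)."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        sample = rng.choice(len(src), size=4, replace=False)
        with np.errstate(divide="ignore", invalid="ignore"):
            H = fit_homography(src[sample], dst[sample])
            err = np.linalg.norm(project(H, src) - dst, axis=1)
            inliers = err < thresh  # NaN errors from degenerate samples count as outliers
        if inliers.sum() > best.sum():
            best = inliers
    # Final algebraic least-squares refit on all inliers of the best hypothesis.
    return fit_homography(src[best], dst[best]), best
```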
Sample Resulting Frame
Active Feedback
ICME 2011
Active Feedback Results
               Precision  Recall  F1    Overlap
No feedback    0.95       0.49    0.65  73.48
With feedback  0.97       0.65    0.78  58.85
Frame Correlation Results
Method                            Recall  Precision  Average time (ms)
SIFT, no temporal information     0.37    0.46       414
SURF, no temporal information     0.32    0.48       162
SURF + motion vector estimation   0.23    0.38       128
SURF + overlapping region         0.27    0.48       149
SURF + optical flow               0.32    0.47       103
Rotational angles from 3-D accelerometers
Mobile phone axes: device coordinates and accelerations (front view)
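The slide does not show how the rotational angles are derived. A standard approach for a phone that is not otherwise accelerating is to treat the accelerometer reading as the gravity vector and recover the tilt about the device x- and y-axes from it; the sketch assumes an Android-style device frame (x to the right, y up the screen, z out of the screen), and the function names are hypothetical.

```python
import math


def tilt_from_gravity(ax, ay, az):
    """Tilt angles (radians) from a static 3-axis accelerometer reading.

    Assumes the reading is gravity only (device not accelerating) and an
    Android-style frame: x to the right, y up the screen, z out of the screen.
    """
    about_x = math.atan2(ay, az)                    # rotation about the x-axis
    about_y = math.atan2(-ax, math.hypot(ay, az))   # rotation about the y-axis
    return about_x, about_y


def angle_difference(a, b):
    """Signed difference a - b wrapped to [-pi, pi), e.g. rotation between two frames."""
    return (a - b + math.pi) % (2 * math.pi) - math.pi
```

Differences of these angles between two frames give a prior on their relative rotation, which could narrow the search when matching image features; how the paper uses them is not shown on the slide.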
Dataset sample
Overview
Key Contributions
Tag scores are aggregated over a window of L frames (k = 1, …, L), combining frame similarity Sim(f, f) with tag confidence Conf(tags) into Score(tags).
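The tag-scoring idea on this slide combines a frame similarity Sim(f, f) with tag confidences Conf(tags) over a window of L frames. Purely as a hypothetical illustration of that idea (not the paper's exact formula), the sketch below assumes each past frame is a feature vector with a `{tag: confidence}` map, uses cosine similarity for Sim, and averages the similarity-weighted votes over the last L frames:

```python
import numpy as np
from collections import defaultdict


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def propagate_tags(current_feature, history, L):
    """Score tags for the current frame from the last L annotated frames.

    history: list of (frame_feature_vector, {tag: confidence}), oldest first.
    Each past frame votes for its tags with confidence weighted by its
    similarity to the current frame; votes are averaged over the window.
    """
    if L < 1:
        raise ValueError("window size L must be >= 1")
    window = history[-L:]
    scores = defaultdict(float)
    for feature, tag_conf in window:
        sim = cosine(current_feature, feature)
        for tag, conf in tag_conf.items():
            scores[tag] += sim * conf
    return {tag: s / len(window) for tag, s in scores.items()}
```

A larger window smooths tags across more frames at the cost of reacting more slowly to scene changes, which is the trade-off the window-size results below explore.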
Results
Effect of the time-window size N for sampling rates of (a) S = 5 and (b) S = 25
Object Recognition
• Object detection
– General objects (with MSRC, ICIP 2013)
– Specific objects, e.g. face and expression recognition (ICCV 2011, ICIP 2015)
• Object segmentation (ICIP 2011, ICIP 2014)
• Instance recognition (with MSRA, ICME 2011; car recognition, ICPR 2012)
• Activity recognition
– RGB: with NU, WACV 2011, MWSCAS 2011
– RGB+D: IJCAI 2013
Activity Recognition in RGBD Videos
Skeleton joint locations and names as captured by the Kinect sensor
Trajectory Descriptor
P_t is the position of the joint at time t.
The figure shows a general displacement between P_t and P_{t+1}.
In this example, the trajectory is described by an 8-bin histogram. For each displacement, the angle and the length of the displacement are computed, and the length is added to the histogram bin of that angle.
Sample Trajectory
From the Action3D dataset, this figure shows the three projections of the Right
Hand joint when a subject is performing the action of High Arm Wave. For each
projection, HOD is used to describe the movement.
Temporal localization: Pyramid
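The descriptor described on the last three slides can be sketched in NumPy. The 8-bin, length-weighted histogram and the three 2-D projections (xy, yz, xz) come from the slides; the pyramid layout (three levels, halving the time span at each level, neighbouring segments sharing a boundary frame) and the per-histogram normalization are assumptions made for this illustration.

```python
import numpy as np


def hod_2d(points, n_bins=8):
    """Histogram of oriented displacements for a 2-D trajectory of shape (T, 2).

    Each displacement P_{t+1} - P_t adds its length to the bin of its angle.
    """
    d = np.diff(points, axis=0)
    angles = np.arctan2(d[:, 1], d[:, 0]) % (2 * np.pi)  # in [0, 2*pi)
    lengths = np.hypot(d[:, 0], d[:, 1])
    bins = np.minimum((angles / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins, weights=lengths, minlength=n_bins)
    total = hist.sum()
    return hist / total if total > 0 else hist


def hod_joint(trajectory, n_bins=8, levels=3):
    """HOD of one 3-D joint trajectory (T, 3): xy, yz and xz projections,
    computed over a temporal pyramid where level l splits time into 2**l parts."""
    T = len(trajectory)
    parts = []
    for level in range(levels):
        edges = np.linspace(0, T - 1, 2 ** level + 1).round().astype(int)
        for start, end in zip(edges[:-1], edges[1:]):
            segment = trajectory[start:end + 1]  # neighbouring segments share a frame
            for a, b in ((0, 1), (1, 2), (0, 2)):
                parts.append(hod_2d(segment[:, [a, b]], n_bins))
    return np.concatenate(parts)
```

With the defaults, each joint yields 7 segments × 3 projections × 8 bins = 168 values; descriptors of all joints would be concatenated before classification.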
Classification Accuracy Comparison
Method                              Accuracy
Recurrent Neural Network            42.5%
Hidden Markov Model                 78.97%
Action Graph on Bag of 3D Points    74.7%
Random Occupancy Patterns           86.5%
Actionlets Ensemble                 88.2%
HOD                                 91.26%
Facial Expression Recognition in the
Wild using Rich Deep Features
• Traditionally, researchers used hand-crafted features to represent images: texture, color, gradients and histograms of these (e.g. histograms of oriented gradients)
• Then came Hinton's group, with impressive results on ImageNet using deep nets fed with almost raw input
DeepNet Krizhevsky, Sutskever & Hinton
• 5 convolutional layers, 3 fully connected layers (the last one feeding a 1000-way softmax)
• Uses raw image data as input (mean-centered)
• Best results on ImageNet 2012 (~15% top-5 error)
• Trained with standard backprop: SGD with mini-batches, dropout and ReLU
• Successful because:
– More training data: 1.2M images / 1000 classes
– More computation power (GPUs)
After: ImageNet Classification with Deep Convolutional Neural Networks, Krizhevsky et al. NIPS 2012
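The cited NIPS 2012 paper trains with SGD using momentum 0.9 and weight decay 0.0005, uses ReLU non-linearities, and applies dropout with probability 0.5 in the first two fully connected layers, halving the outputs at test time. A minimal NumPy sketch of just these three ingredients (not a full network); the function names are illustrative:

```python
import numpy as np


def relu(x):
    """Rectified linear unit: max(x, 0) elementwise."""
    return np.maximum(x, 0.0)


def dropout(x, p_drop, train, rng):
    """Dropout as described in the cited paper: zero units at random during
    training, and scale activations by (1 - p_drop) at test time instead."""
    if train:
        return x * (rng.random(x.shape) >= p_drop)
    return x * (1.0 - p_drop)


def sgd_momentum_step(w, v, grad, lr, momentum=0.9, weight_decay=5e-4):
    """One mini-batch update: v <- m*v - wd*lr*w - lr*grad; w <- w + v."""
    v = momentum * v - weight_decay * lr * w - lr * grad
    return w + v, v
```

Many modern implementations use "inverted" dropout instead (scaling by 1 / (1 - p_drop) during training), which gives the same expected activations without test-time scaling.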
Deep Net as Hierarchical Feature Extractors
• Shown in the literature
After: Visualizing and Understanding Convolutional Networks, Matthew D. Zeiler and Rob Fergus, ECCV 2014
Facial Expression Recognition in the Wild using Rich Deep Features
Selected among the top 10% of papers at ICIP 2015
Idea: use deep features from the Krizhevsky net for facial expression recognition, with added domain knowledge
New Real-world Dataset
Results
Facial parts contribution
Comparative results on different datasets
AutoCaption: Automatic Caption Generation for Personal Photos
Sample Results
(a) An example with two people interacting. (b) An example with a prominent landmark. (c)
An example where the caption is personalized to the user. (d) An example of a caption when
no people are present. (e) An example of a caption for a photo with no GPS. (f) An example
of a caption with a scene classification error.
Publications
• Book Chapter
– Motaz El Saban and Ayman Kaheel, “Panoramic
video construction from mobile video streams” in
Mobile and Cloud Visual Media Computing,
Springer (under preparation).
Publications
• Abbubakrelsedik Karali, Ahmad Bassiouny and Motaz El Saban, “Facial Expression Recognition in the Wild Using Rich Deep Features”, ICIP 2015 (selected as one of the
top 10% papers in ICIP 2015).
• Amr Sharaf, Mohammad Hussein, Marwan Torki and Motaz EL Saban, “Real-time Multi-scale Action Detection From 3D Features”, WACV 2015.
• Ahmed Bassiouny and Motaz El-Saban, “Semantic segmentation as image representation for scene recognition”, ICIP 2014.
• Krishnan Ramnath, Simon Baker, Anitha Kannan, Lucy Vanderwende, Michel Galley, Yi Yang, Deva Ramanan, Motaz El-Saban, Noran Hasan, Lorenzo Torresani, Sudipta
Sinha, “Auto Caption”, WACV 2014.
• Mostafa Izz, Alaa Abd El Hakeem and Motaz El-Saban, “Graph-Based Superpixel Labeling for Enhancement of Online Video Segmentation”, ICIIP 2013.
• Mohammad Gowayyed, Marwan Torki, Mohamed Hussein, Motaz El-Saban, “Histogram of Oriented Displacements (HOD): Describing Trajectories of Human Joints for
Action Recognition”, IJCAI 2013.
• Mohamed Hussein, Marwan Torki, Mohammad Gowayyed, Motaz El-Saban, “Human Action Recognition Using a Temporal Hierarchy of Covariance Descriptors on 3D
Joint Locations”, IJCAI 2013
• Osama Khalil, Mohammad Fathy, Dina Khalil, Motaz El Saban, Pushmeet Kohli and Jamie Shotton, “Synthetic Training in Object Detection”, ICIP 2013.
• Noran Hasan, Ahmad Fathy, Tamer Deif, Ramy Shahin and Motaz El Saban, “Using Skin Segmentation to Improve Similar Product Recommendations in Online Clothing
Stores”, VISAPP 2013.
• Noureldin Laban, Motaz ElSaban, Ayman Nasr and Hoda Onsi, “Spatial cloud detection and retrieval system for satellite images”, International Journal of Advanced
Computer Science and Applications (IJACSA), December 2012.
• Ayman Kaheel, Motaz El-Saban, Mostafa Izz and Mahmoud Refaat, “Employing 3D Accelerometer Information for Fast and Reliable Image Features Matching on mobile
devices”, ICME 2012 workshop on hot topics in mobile multimedia.
• Alaa Abd El Hakeem and Motaz El-Saban, “FRPCA: Fast Robust Principal Component Analysis”, ICPR 2012.
• Meena Abd El Meseeh, Islam Badr El Deen, Mohammad Abd El Kader and Motaz El-Saban, “Combining global and local information for within-category object class
recognition”, ICPR 2012.
• NourElDin Laban, Motaz ElSaban, Ayman Nasr and Hoda Onsi, “System Refinement for Content Based Satellite Image Retrieval”, The Egyptian Journal of Remote Sensing
and Space Sciences, 2012.
• Alaa Abd El Hakeem and Motaz El-Saban, “Distortion Impact on Low-Dimensional Manifold Recovery of High-Dimensional Data”, Taibah University International
Conference on Computing and Information Technology (ICCIT 2012).
• Alaa Abd El Hakeem and Motaz El-Saban, “Face Authentication Using Graph-Based Low-Rank Representation of Face Components”, ICCV 2011 workshop on Mobile
Vision.
• Mahmoud Refaat, M. El-Saban and Ayman Kaheel, “Active Feedback for Enhancing the Construction of Panoramic Live Mobile Video Streams”, ICME 2011, full paper.
• Motaz El-Saban, Xin-Jing Wang, Noran Hasan, Mahmoud Bassiouny and Mahmoud Refaat, “Seamless annotation and enrichment of mobile captured video streams in real-time”, ICME
2011 application/industrial short paper.
• Mostafa S. Ibrahim, Motaz El Saban, "Higher order potentials with superpixel neighbourhood (HSN) for semantic image segmentation", ICIP 2011.
• Motaz El-Saban, Mostafa Izz, Ayman Kaheel and Mahmoud Refaat, "Improved optimal seam selection blending for fast video stitching of videos captured from freely moving devices",
ICIP 2011.
• Mohammad Nael, Moataz Abd El Wahab, Motaz El-Saban and Mikhail Wasfy, “Highly Efficient Human Action Recognition using compact 2DPCA-based descriptors in the Spatial and
Transform domains”, invited paper at the Midwest Symposium on Circuits and Systems (MWSCAS) 2011, special session.
• Mohammad Nael, Moataz Abd El Wahab and Motaz El-Saban, “Multi-view Human Action Recognition System Employing 2DPCA”, WACV 2011.
• Mahmoud Bassiouny and Motaz El-Saban, “Object Matching Using Feature Aggregation Over a Frame Sequence”, WACV 2011.
• Motaz El-Saban, Mostafa Izz and Ayman Kaheel, “Fast stitching of videos captured from freely moving devices by exploiting temporal redundancy”, ICIP 2010.
• Mohammad El Deeb and Motaz El-Saban, “Human age estimation using enhanced bio-inspired features (EBIF)”, ICIP 2010.
• A. Kaheel, M. El-Saban, M. Refaat and M. Izz, “Mobicast: A System for Collaborative Event Casting Using Mobile Phones”, ACM-MUM 09.
• M. El-Saban, M. Refaat, A. Kaheel and A. Abd El-Hameed, “Stitching videos streamed by mobile phones in real-time”, ACM-MM 09 (technical demonstration)
• M. Eldib, B. Abou Zaid, H. Zawbaa, M. El-Zahar, M. El-Saban, “Soccer video summarization using enhanced logo detection”, ICIP 2009.
• Waleed Magdy, Kareem Darwish and Motaz El-Saban, “Efficient Language-Independent Retrieval of Printed Documents without OCR”, in SPIRE 2009.
• Nayer Wanas, Motaz El-Saban, Heba Ashour, Waleed Ammar, "Automatic Scoring of Online Discussion Posts", CIKM Second Workshop on Information Credibility on the Web (WICOW
2008)
• M. El-Saban et al. Automated tracking and modelling of microtubule dynamics, IEEE International Symposium on Biomedical Imaging (ISBI) 2006.
• B. S. Manjunath, B. Sumengen, Z. Bi, J. Byun, M. El-Saban, D. Fedorov, N. Vu, Towards Automated Bioimage Analysis: from features to semantics, IEEE International Symposium on
Biomedical Imaging (ISBI), invited paper.
• A. Altinok, M. El-Saban et al. Activity Recognition in Microtubule Videos by Mixture of Hidden Markov Models, Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2006.
• S. Bhagavaty and M. A. El Saban. SketchIt: Basketball Video Retrieval Using Ball Motion Similarity, in Advances in Multimedia Information Processing - PCM 2004: 5th Pacific Rim
Conference on Multimedia, Tokyo, Japan.
• M. A. El Saban and B. S. Manjunath. Interactive Segmentation Using Curve Evolution and Relevance Feedback. ICIP 2004
• M. A. El Saban and B. S. Manjunath. Video Region Segmentation by Spatio-temporal watersheds. ICIP 2003.
• M. A. El Saban, S. Abd El-Azeem and M. Rashwan. A new video coding scheme based on the H.263 standard and entropy constrained vector quantization. Faculty of engineering journal,
Nov. 2000.
• M. A. El Saban, S. Abd El-Azeem and M. Rashwan. A new video coding scheme based on the H.263 standard and entropy constrained vector quantization. ICII 2000, Kuwait, Nov. 2000.
Patents
• Waleed Magdi and Motaz El-Saban, Personalized notification of live events (U.S. patent #8,881,191)
• Waleed Magdi and Motaz El-Saban, Using An Id Domain To Improve Searching (U.S. patents #8,131,720 and #8,538,964 (continuation))
• Heba Ashour, Nayer Wanas, Mostafa El Baradei and Motaz El-Saban, User Evaluation in a Collaborative Online
Forum (U.S. patent #8,893,024)
• Ayman Kaheel, Motaz El-Saban, Mohamed Shawky and Mahmoud Refaat, Sharing video data associated with the
same event (U.S. patent #8,767,081)
• Motaz El-Saban, Christopher Burges and Qiang Wu, Re-ranking top search results (U.S. patent #8,661,030)
• Motaz El-Saban, Ayman Kaheel, Mahmoud Refaat and Ahmad Abd El Hameed, Composite video generation (U.S.
patent #8,605,783)
• Ayman Kaheel, Motaz El-Saban, Mahmoud Refaat, Ahmad El Arabawy, Mostafa Baradei Using accelerometer
information for determining orientation of pictures and video images (U.S. patent pending)
• Motaz El-Saban, Xin-Jing Wang and May Sayed, Real-Time Annotation And Enrichment Of Captured Video (U.S.
patent #8,903,798)
• James Lau, Ayman Kaheel, Motaz El-Saban, Mohammad Shawky, Monica Gonzales, Ahmed El Baz, Tamer Deif and Alaa Abd El Hakeem, Using facial data for device authentication or subject identification (U.S. patent pending)
• Motaz El-Saban, Ayman Kaheel, Mohammad Shawky and James Lau, Modifying video regions using mobile device
input (U.S. patent pending)
• Pushmeet Kohli, Jamie Shotton and Motaz El Saban, Synthesizing Training Samples for Object Recognition (U.S.
patent #8,903,167)
• Alaa Abd El Hakeem and Motaz El-Saban, Dynamic update of recovered subspaces of high dimensional
• Motaz El-Saban et al., Natural language search of images and navigation
48

Contenu connexe

Tendances

Multimedia Information Retrieval: What is it, and why isn't ...
Multimedia Information Retrieval: What is it, and why isn't ...Multimedia Information Retrieval: What is it, and why isn't ...
Multimedia Information Retrieval: What is it, and why isn't ...webhostingguy
 
Learning Disentangled Representation for Robust Person Re-identification
Learning Disentangled Representation for Robust Person Re-identificationLearning Disentangled Representation for Robust Person Re-identification
Learning Disentangled Representation for Robust Person Re-identificationNAVER Engineering
 
Visual geometry with deep learning
Visual geometry with deep learningVisual geometry with deep learning
Visual geometry with deep learningNAVER Engineering
 
Deep Learning for Computer Vision (1/4): Image Analytics @ laSalle 2016
Deep Learning for Computer Vision (1/4): Image Analytics @ laSalle 2016Deep Learning for Computer Vision (1/4): Image Analytics @ laSalle 2016
Deep Learning for Computer Vision (1/4): Image Analytics @ laSalle 2016Universitat Politècnica de Catalunya
 
Kadir A_20160804_res_tea
Kadir A_20160804_res_teaKadir A_20160804_res_tea
Kadir A_20160804_res_teaKadir A Peker
 
Jia-Bin Huang's Curriculum Vitae
Jia-Bin Huang's Curriculum VitaeJia-Bin Huang's Curriculum Vitae
Jia-Bin Huang's Curriculum VitaeJia-Bin Huang
 
Synthesizing pseudo 2.5 d content from monocular videos for mixed reality
Synthesizing pseudo 2.5 d content from monocular videos for mixed realitySynthesizing pseudo 2.5 d content from monocular videos for mixed reality
Synthesizing pseudo 2.5 d content from monocular videos for mixed realityNAVER Engineering
 
Deep Learning for Computer Vision (2/4): Object Analytics @ laSalle 2016
Deep Learning for Computer Vision (2/4): Object Analytics @ laSalle 2016Deep Learning for Computer Vision (2/4): Object Analytics @ laSalle 2016
Deep Learning for Computer Vision (2/4): Object Analytics @ laSalle 2016Universitat Politècnica de Catalunya
 
HUMAN MOTION DETECTION AND TRACKING FOR VIDEO SURVEILLANCE
HUMAN MOTION DETECTION AND TRACKING FOR VIDEO SURVEILLANCEHUMAN MOTION DETECTION AND TRACKING FOR VIDEO SURVEILLANCE
HUMAN MOTION DETECTION AND TRACKING FOR VIDEO SURVEILLANCEAswinraj Manickam
 
Deep Learning: a birds eye view
Deep Learning: a birds eye viewDeep Learning: a birds eye view
Deep Learning: a birds eye viewRoelof Pieters
 
Human Photogrammetry: Foundational Techniques for Creative Practitioners
Human Photogrammetry: Foundational Techniques for Creative PractitionersHuman Photogrammetry: Foundational Techniques for Creative Practitioners
Human Photogrammetry: Foundational Techniques for Creative Practitionersijcga
 
Object tracking presentation
Object tracking  presentationObject tracking  presentation
Object tracking presentationMrsShwetaBanait1
 
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)Universitat Politècnica de Catalunya
 
Deep Learning in real world @Deep Learning Tokyo
Deep Learning in real world @Deep Learning TokyoDeep Learning in real world @Deep Learning Tokyo
Deep Learning in real world @Deep Learning TokyoPreferred Networks
 
auto-assistance system for visually impaired person
auto-assistance system for visually impaired personauto-assistance system for visually impaired person
auto-assistance system for visually impaired personshahsamkit73
 
NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutiona...
NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutiona...NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutiona...
NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutiona...Ahmed Gad
 
Deep Learning And Business Models (VNITC 2015-09-13)
Deep Learning And Business Models (VNITC 2015-09-13)Deep Learning And Business Models (VNITC 2015-09-13)
Deep Learning And Business Models (VNITC 2015-09-13)Ha Phuong
 
Promises of Deep Learning
Promises of Deep LearningPromises of Deep Learning
Promises of Deep LearningDavid Khosid
 
Introduction to Object recognition
Introduction to Object recognitionIntroduction to Object recognition
Introduction to Object recognitionAshiq Ullah
 

Tendances (20)

Multimedia Information Retrieval: What is it, and why isn't ...
Multimedia Information Retrieval: What is it, and why isn't ...Multimedia Information Retrieval: What is it, and why isn't ...
Multimedia Information Retrieval: What is it, and why isn't ...
 
Learning Disentangled Representation for Robust Person Re-identification
Learning Disentangled Representation for Robust Person Re-identificationLearning Disentangled Representation for Robust Person Re-identification
Learning Disentangled Representation for Robust Person Re-identification
 
Visual geometry with deep learning
Visual geometry with deep learningVisual geometry with deep learning
Visual geometry with deep learning
 
Deep Learning for Computer Vision (1/4): Image Analytics @ laSalle 2016
Deep Learning for Computer Vision (1/4): Image Analytics @ laSalle 2016Deep Learning for Computer Vision (1/4): Image Analytics @ laSalle 2016
Deep Learning for Computer Vision (1/4): Image Analytics @ laSalle 2016
 
Kadir A_20160804_res_tea
Kadir A_20160804_res_teaKadir A_20160804_res_tea
Kadir A_20160804_res_tea
 
Jia-Bin Huang's Curriculum Vitae
Jia-Bin Huang's Curriculum VitaeJia-Bin Huang's Curriculum Vitae
Jia-Bin Huang's Curriculum Vitae
 
Synthesizing pseudo 2.5 d content from monocular videos for mixed reality
Synthesizing pseudo 2.5 d content from monocular videos for mixed realitySynthesizing pseudo 2.5 d content from monocular videos for mixed reality
Synthesizing pseudo 2.5 d content from monocular videos for mixed reality
 
Deep Learning for Computer Vision (2/4): Object Analytics @ laSalle 2016
Deep Learning for Computer Vision (2/4): Object Analytics @ laSalle 2016Deep Learning for Computer Vision (2/4): Object Analytics @ laSalle 2016
Deep Learning for Computer Vision (2/4): Object Analytics @ laSalle 2016
 
HUMAN MOTION DETECTION AND TRACKING FOR VIDEO SURVEILLANCE
HUMAN MOTION DETECTION AND TRACKING FOR VIDEO SURVEILLANCEHUMAN MOTION DETECTION AND TRACKING FOR VIDEO SURVEILLANCE
HUMAN MOTION DETECTION AND TRACKING FOR VIDEO SURVEILLANCE
 
Deep Learning: a birds eye view
Deep Learning: a birds eye viewDeep Learning: a birds eye view
Deep Learning: a birds eye view
 
Deep Learning for Computer Vision: Video Analytics (UPC 2016)
Deep Learning for Computer Vision: Video Analytics (UPC 2016)Deep Learning for Computer Vision: Video Analytics (UPC 2016)
Deep Learning for Computer Vision: Video Analytics (UPC 2016)
 
Human Photogrammetry: Foundational Techniques for Creative Practitioners
Human Photogrammetry: Foundational Techniques for Creative PractitionersHuman Photogrammetry: Foundational Techniques for Creative Practitioners
Human Photogrammetry: Foundational Techniques for Creative Practitioners
 
Object tracking presentation
Object tracking  presentationObject tracking  presentation
Object tracking presentation
 
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
 
Deep Learning in real world @Deep Learning Tokyo
Deep Learning in real world @Deep Learning TokyoDeep Learning in real world @Deep Learning Tokyo
Deep Learning in real world @Deep Learning Tokyo
 
auto-assistance system for visually impaired person
auto-assistance system for visually impaired personauto-assistance system for visually impaired person
auto-assistance system for visually impaired person
 
NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutiona...
NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutiona...NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutiona...
NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutiona...
 
Deep Learning And Business Models (VNITC 2015-09-13)
Deep Learning And Business Models (VNITC 2015-09-13)Deep Learning And Business Models (VNITC 2015-09-13)
Deep Learning And Business Models (VNITC 2015-09-13)
 
Promises of Deep Learning
Promises of Deep LearningPromises of Deep Learning
Promises of Deep Learning
 
Introduction to Object recognition
Introduction to Object recognitionIntroduction to Object recognition
Introduction to Object recognition
 

Similaire à TechnicalBackgroundOverview

2019 cvpr paper_overview
2019 cvpr paper_overview2019 cvpr paper_overview
2019 cvpr paper_overviewLEE HOSEONG
 
2019 cvpr paper overview by Ho Seong Lee
2019 cvpr paper overview by Ho Seong Lee2019 cvpr paper overview by Ho Seong Lee
2019 cvpr paper overview by Ho Seong LeeMoazzem Hossain
 
Introduction talk to Computer Vision
Introduction talk to Computer Vision Introduction talk to Computer Vision
Introduction talk to Computer Vision Chen Sagiv
 
Real Time Object Dectection using machine learning
Real Time Object Dectection using machine learningReal Time Object Dectection using machine learning
Real Time Object Dectection using machine learningpratik pratyay
 
The Concurrent Constraint Programming Research Programmes -- Redux
The Concurrent Constraint Programming Research Programmes -- ReduxThe Concurrent Constraint Programming Research Programmes -- Redux
The Concurrent Constraint Programming Research Programmes -- ReduxPierre Schaus
 
Rapid object detection using boosted cascade of simple features
Rapid object detection using boosted  cascade of simple featuresRapid object detection using boosted  cascade of simple features
Rapid object detection using boosted cascade of simple featuresHirantha Pradeep
 
Key Frame Extraction for Salient Activity Recognition
Key Frame Extraction for Salient Activity RecognitionKey Frame Extraction for Salient Activity Recognition
Key Frame Extraction for Salient Activity RecognitionSuhas Pillai
 
HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp...
HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp...HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp...
HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp...Tulipp. Eu
 
Mirko Lucchese - Deep Image Processing
Mirko Lucchese - Deep Image ProcessingMirko Lucchese - Deep Image Processing
Mirko Lucchese - Deep Image ProcessingMeetupDataScienceRoma
 
Bayesian Network 을 활용한 예측 분석
Bayesian Network 을 활용한 예측 분석Bayesian Network 을 활용한 예측 분석
Bayesian Network 을 활용한 예측 분석datasciencekorea
 
Aplications for machine learning in IoT
Aplications for machine learning in IoTAplications for machine learning in IoT
Aplications for machine learning in IoTYashesh Shroff
 
Weave-D - 2nd Progress Evaluation Presentation
Weave-D - 2nd Progress Evaluation PresentationWeave-D - 2nd Progress Evaluation Presentation
Weave-D - 2nd Progress Evaluation Presentationlasinducharith
 
Image Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyImage Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyNUPUR YADAV
 
Fast Parallel Similarity Calculations with FPGA Hardware
Fast Parallel Similarity Calculations with FPGA HardwareFast Parallel Similarity Calculations with FPGA Hardware
Fast Parallel Similarity Calculations with FPGA HardwareTigerGraph
 
SPAR 2015 - Civil Maps Presentation by Sravan Puttagunta
SPAR 2015 - Civil Maps Presentation by Sravan PuttaguntaSPAR 2015 - Civil Maps Presentation by Sravan Puttagunta
SPAR 2015 - Civil Maps Presentation by Sravan PuttaguntaSravan Puttagunta
 
Dataset creation for Deep Learning-based Geometric Computer Vision problems
Dataset creation for Deep Learning-based Geometric Computer Vision problemsDataset creation for Deep Learning-based Geometric Computer Vision problems
Dataset creation for Deep Learning-based Geometric Computer Vision problemsPetteriTeikariPhD
 

Similaire à TechnicalBackgroundOverview (20)

2019 cvpr paper_overview
2019 cvpr paper_overview2019 cvpr paper_overview
2019 cvpr paper_overview
 
2019 cvpr paper overview by Ho Seong Lee

TechnicalBackgroundOverview

  • 1. Text, Image and Speech Understanding: it’s all about Learning from Data. Motaz El Saban, Senior Applied Researcher, Microsoft T&R, Cairo Lab; Associate Professor, Faculty of Computers and Information, Cairo University
  • 2. Technical Focus • Analyzing, modeling, learning and predicting in various domains to make sense of digital media content • Early on: better analysis for video compression • Ph.D.: automatic tracking and analysis of subcellular structures in time-lapse videos • Post Ph.D. – Visual object recognition (NevenVision & Google) – Textual web rankers (Arabic) (Microsoft) – Analyzing content on social networks – Mobile multimedia experiences • Automatic panoramic video construction • Automatic video tagging – Object/scene recognition/detection (face recognition and facial expression recognition as special cases) – Activity recognition in videos (RGB/RGB+D) – Speaker recognition
  • 3. Application Scenarios • Web search (text ranking, media tagging (e.g. object detection)) • Visual similarity search (clothes/products) • Ranking user-generated content on forums • Searching document archives (OCR-less) • Real-time classifiers for mobile phones (document-like content) • Access control (face unlocking) • Enhanced visual communication (background replacement in Skype) • Activity recognition for Kinect
  • 4. Agenda • Ph.D. work: Microtubules • Real-time video stitching – Basic framework – Active Feedback for stitching – Exploiting frame correlation – Exploiting mobile sensors • Annotating mobile-generated videos – Main concept – Matching using frame information aggregation – Propagating tags over frames • Object recognition – Detection – Segmentation – General object/specific – Activity in videos – Facial expression recognition using DNNs
  • 5. Ph.D. thesis: Microtubule tracking and analysis
  • 7. MT analysis: HMMs • One HMM for each experimental condition of the microtubules (MTs) • Distances between the HMMs are estimated • The distances are embedded with MDS (multidimensional scaling) for visualization
  • 9. Early Work at Microsoft…
  • 10. Ranking User Content on Forums • Forums are conversational social cyberspaces that constitute rich repositories of content and an important source of collaborative knowledge • However, most of this knowledge is buried inside the forum infrastructure, and extracting it is both complex and difficult • In this work, we focus on automatically rating postings in online discussion forums to make their content easier to access
  • 11. Features & Classifier • Classifier: non-linear SVM that assigns each post one of three levels (H/M/L) • Feature categories: – Relevance (OnSubForumTopic, OnThreadTopic, …) – Originality (OverlapPrevious, …) – Forum-specific features (Referencing, …) – Surface features (Timeliness, Lengthiness, …) – Posting-component features (Weblinks, Questioning, …)
  • 12. Dataset • Discussion threads from Slashdot • 200 threads with a maximum of 200 posts from the 14 sub-forums • A total of 120,000 posts were scraped • Posts on Slashdot are rated on a scale from -1 (irrelevant) to 5 (high-quality posts) • The default rating is 1 for a registered user and 0 for an unregistered user • Posts rated 0 were removed unless they were from a registered user • The final dataset was composed of 20,008 rated posts, which were clustered into three groups, namely low, medium, and high, according to their rating
  • 13. Results • F1-measure per class: High 0.61, Medium 0.42, Low 0.46 • (Chart: relative accuracy and F1-measure for each metric category)
  • 14. Building an Arabic Web Ranker • Applying English rankers to Arabic documents is suboptimal • Re-training an Arabic ranker from scratch suffers from a lack of sufficient training data • Solution: start from the English ranker and fine-tune it with Arabic-specific data on the top returned results
  • 15. Agenda • Ph.D. work: Microtubules • Real-time video stitching – Basic framework – Active Feedback for stitching – Exploiting frame correlation – Exploiting mobile sensors • Annotating mobile-generated videos – Main concept – Matching using frame information aggregation – Propagating tags over frames • Object recognition – Detection – Segmentation – General object/specific – Activity in videos – Facial expression recognition using DNNs • Initial thoughts/ideas on Smarter Urban Dynamics
  • 16. Real-time Stitching: Estimating geometric and photometric mapping • A novel research agenda, published over a number of articles, that resulted in a recent book chapter
  • 17. Stitching Pipeline: Extract interest points → Match between frame pairs → Solve geometric transform (with RANSAC) → Photometric alignment
  • 20. Active Feedback Results • No feedback: precision 0.95, recall 0.49, F1 0.65, overlap 73.48 • With feedback: precision 0.97, recall 0.65, F1 0.78, overlap 58.85
  • 21. Frame Correlation Results (recall, precision, average time in ms) • SIFT, no use of temporal information: 0.37, 0.46, 414 • SURF, no use of temporal information: 0.32, 0.48, 162 • SURF + motion vector estimation: 0.23, 0.38, 128 • SURF + overlapping region: 0.27, 0.48, 149 • SURF + optical flow: 0.32, 0.47, 103
  • 22. Rotational angles from 3-D accelerometers (figures: mobile phone axes; device-coordinate accelerations, front view)
  • 27. Results: effect of different time-window sizes N for sampling rates (a) S = 5 and (b) S = 25
  • 29. Object Recognition • Object recognition – Object detection: general objects (with MSRC, ICIP 2013); specific objects, such as face and expression recognition (ICCV 2011, ICIP 2015) – Object segmentation (ICIP 2011, ICIP 2014) – Instance recognition (with MSRA, ICME 2011; car recognition, ICPR 2012) • Activity recognition – RGB: with NU, WACV 2011, MWSCAS 2011 – RGB+D: IJCAI 2013
  • 30. Activity Recognition in RGB-D Videos (figure: skeleton joint locations and names as captured by the Kinect sensor)
  • 32. Trajectory Descriptor: P_t is the position of the joint at time t. The figure shows a general displacement between P_t and P_(t+1). For each displacement, the angle and the length of the displacement are calculated. In this example, the trajectory is described by a histogram of 8 bins.
  • 33. Sample Trajectory: From the Action3D dataset, the figure shows the three projections of the right-hand joint while a subject performs the High Arm Wave action. HOD is used to describe the movement in each projection.
  • 35. Classification Accuracy Comparison (method: accuracy) • Recurrent Neural Network: 42.5% • Hidden Markov Model: 78.97% • Action Graph on Bag of 3D Points: 74.7% • Random Occupancy Patterns: 86.5% • Actionlets Ensemble: 88.2% • HOD: 91.26%
  • 36. Facial Expression Recognition in the Wild using Rich Deep Features • Traditionally, researchers used hand-crafted features to represent images: texture, color, gradients, and histograms of these (e.g. histograms of gradients) • Then Hinton and colleagues produced impressive results on ImageNet with deep nets operating on almost raw input
  • 37. DeepNet (Krizhevsky, Sutskever & Hinton) • 5 convolutional layers and 2 fully connected hidden layers, followed by a fully connected 1000-way softmax output layer • Uses raw image data as input (mean-subtracted) • Best results on ImageNet 2012 (~15% top-5 error) • Trained with standard backpropagation using mini-batch SGD, with dropout and ReLU activations • Successful because of: – More training data (1.2M images in 1000 classes) – More computation power (GPUs). After: ImageNet Classification with Deep Convolutional Neural Networks, Krizhevsky et al., NIPS 2012
  • 38. Deep Nets as Hierarchical Feature Extractors • As shown in the literature (figure after: Visualizing and Understanding Convolutional Networks, Matthew D. Zeiler and Rob Fergus, ECCV 2014)
  • 39. Facial Expression Recognition in the Wild using Rich Deep Features • Selected among the top 10% of papers at ICIP 2015 • Idea: use deep features from the Krizhevsky net for facial expression recognition, but with added domain knowledge
  • 41. Results (figures: facial parts contribution; comparative results on different datasets)
  • 42. AutoCaption: Automatic Caption Generation for Personal Photos
  • 43. Sample Results (a) An example with two people interacting. (b) An example with a prominent landmark. (c) An example where the caption is personalized to the user. (d) An example of a caption when no people are present. (e) An example of a caption for a photo with no GPS. (f) An example of a caption with a scene classification error.
  • 44. Publications • Book Chapter – Motaz El Saban and Ayman Kaheel, “Panoramic video construction from mobile video streams” in Mobile and Cloud Visual Media Computing, Springer (under preparation).
  • 45. Publications • Abbubakrelsedik Karali, Ahmad Bassiouny and Motaz El Saban, “Facial Expression Recognition in the Wild Using Rich Deep Features”, ICIP 2015 (selected as one of the top 10% papers in ICIP 2015). • Amr Sharaf, Mohammad Hussein, Marwan Torki and Motaz El Saban, “Real-time Multi-scale Action Detection From 3D Features”, WACV 2015. • Ahmed Bassiouny and Motaz El-Saban, “Semantic segmentation as image representation for scene recognition”, ICIP 2014. • Krishnan Ramnath, Simon Baker, Anitha Kannan, Lucy Vanderwende, Michel Galley, Yi Yang, Deva Ramanan, Motaz El-Saban, Noran Hasan, Lorenzo Torresani, Sudipta Sinha, “Auto Caption”, WACV 2014. • Mostafa Izz, Alaa Abd El Hakeem and Motaz El-Saban, “Graph-Based Superpixel Labeling for Enhancement of Online Video Segmentation”, ICIIP 2013. • Mohammad Gowayyed, Marwan Torki, Mohamed Hussein, Motaz El-Saban, “Histogram of Oriented Displacements (HOD): Describing Trajectories of Human Joints for Action Recognition”, IJCAI 2013. • Mohamed Hussein, Marwan Torki, Mohammad Gowayyed, Motaz El-Saban, “Human Action Recognition Using a Temporal Hierarchy of Covariance Descriptors on 3D Joint Locations”, IJCAI 2013. • Osama Khalil, Mohammad Fathy, Dina Khalil, Motaz El Saban, Pushmeet Kohli and Jamie Shotton, “Synthetic Training in Object Detection”, ICIP 2013. • Noran Hasan, Ahmad Fathy, Tamer Deif, Ramy Shahin and Motaz El Saban, “Using Skin Segmentation to Improve Similar Product Recommendations in Online Clothing Stores”, VISAPP 2013. • Noureldin Laban, Motaz ElSaban, Ayman Nasr and Hoda Onsi, “Spatial cloud detection and retrieval system for satellite images”, International Journal of Advanced Computer Science and Applications (IJACSA), December 2012. • Ayman Kaheel, Motaz El-Saban, Mostafa Izz and Mahmoud Refaat, “Employing 3D Accelerometer Information for Fast and Reliable Image Features Matching on mobile devices”, ICME 2012 workshop on hot topics in mobile multimedia. 
• Alaa Abd El Hakeem and Motaz El-Saban, “FRPCA: Fast Robust Principal Component Analysis”, ICPR 2012. • Meena Abd El Meseeh, Islam Badr El Deen, Mohammad Abd El Kader and Motaz El-Saban, “Combining global and local information for within-category object class recognition”, ICPR 2012. • NourElDin Laban, Motaz ElSaban, Ayman Nasr and Hoda Onsi, “System Refinement for Content Based Satellite Image Retrieval”, The Egyptian Journal of Remote Sensing and Space Sciences, 2012. • Alaa Abd El Hakeem and Motaz El-Saban, “Distortion Impact on Low-Dimensional Manifold Recovery of High-Dimensional Data”, Taibah University International Conference on Computing and Information Technology (ICCIT 2012). • Alaa Abd El Hakeem and Motaz El-Saban, “Face Authentication Using Graph-Based Low-Rank Representation of Face Components”, ICCV 2011 workshop on Mobile Vision. • Mahmoud Refaat, M. El-Saban and Ayman Kaheel, “Active Feedback for Enhancing the Construction of Panoramic Live Mobile Video Streams”, ICME 2011, full paper.
  • 46. Publications • Motaz El-Saban, Xin-Jing Wang, Noran Hasan, Mahmoud Bassiouny and Mahmoud Refaat, “Seamless annotation and enrichment of mobile captured video streams in real-time”, ICME 2011 application/industrial short paper. • Mostafa S. Ibrahim, Motaz El Saban, “Higher order potentials with superpixel neighbourhood (HSN) for semantic image segmentation”, ICIP 2011. • Motaz El-Saban, Mostafa Izz, Ayman Kaheel and Mahmoud Refaat, “Improved optimal seam selection blending for fast video stitching of videos captured from freely moving devices”, ICIP 2011. • Mohammad Nael, Moataz Abd El Wahab, Motaz El-Saban and Mikhail Wasfy, “Highly Efficient Human Action Recognition using compact 2DPCA-based descriptors in the Spatial and Transform domains”, invited paper at the Midwest Symposium on Circuits and Systems (MWSCAS) 2011, special session. • Mohammad Nael, Moataz Abd El Wahab and Motaz El-Saban, “Multi-view Human Action Recognition System Employing 2DPCA”, WACV 2011. • Mahmoud Bassiouny and Motaz El-Saban, “Object Matching Using Feature Aggregation Over a Frame Sequence”, WACV 2011. • Motaz El-Saban, Mostafa Izz and Ayman Kaheel, “Fast stitching of videos captured from freely moving devices by exploiting temporal redundancy”, ICIP 2010. • Mohammad El Deeb and Motaz El-Saban, “Human age estimation using enhanced bio-inspired features (EBIF)”, ICIP 2010. • A. Kaheel, M. El-Saban, M. Refaat and M. Izz, “Mobicast: A System for Collaborative Event Casting Using Mobile Phones”, ACM-MUM 09. • M. El-Saban, M. Refaat, A. Kaheel and A. Abd El-Hameed, “Stitching videos streamed by mobile phones in real-time”, ACM-MM 09 (technical demonstration). • M. Eldib, B. Abou Zaid, H. Zawbaa, M. El-Zahar, M. El-Saban, “Soccer video summarization using enhanced logo detection”, ICIP 2009. • Waleed Magdy, Kareem Darwish and Motaz El-Saban, “Efficient Language-Independent Retrieval of Printed Documents without OCR”, in SPIRE 2009. 
• Nayer Wanas, Motaz El-Saban, Heba Ashour, Waleed Ammar, “Automatic Scoring of Online Discussion Posts”, CIKM Second Workshop on Information Credibility on the Web (WICOW 2008). • M. El-Saban et al. Automated tracking and modelling of microtubule dynamics, IEEE International Symposium on Biomedical Imaging (ISBI) 06. • B. S. Manjunath, B. Sumengen, Z. Bi, J. Byun, M. El-Saban, D. Fedorov, N. Vu, Towards Automated Bioimage Analysis: from features to semantics, IEEE International Symposium on Biomedical Imaging (ISBI), invited paper. • A. Altinok, M. El-Saban et al. Activity Recognition in Microtubule Videos by Mixture of Hidden Markov Models, Proc. International Conference on Computer Vision and Pattern Recognition (CVPR), 2006. • S. Bhagavaty and M. A. El Saban. SketchIt: Basketball Video Retrieval Using Ball Motion Similarity, in Advances in Multimedia Information Processing - PCM 2004: 5th Pacific Rim Conference on Multimedia, Tokyo, Japan. • M. A. El Saban and B. S. Manjunath. Interactive Segmentation Using Curve Evolution and Relevance Feedback. ICIP 2004. • M. A. El Saban and B. S. Manjunath. Video Region Segmentation by Spatio-temporal watersheds. ICIP 2003. • M. A. El Saban, S. Abd El-Azeem and M. Rashwan. A new video coding scheme based on the H.263 standard and entropy constrained vector quantization. Faculty of engineering journal, Nov. 2000. • M. A. El Saban, S. Abd El-Azeem and M. Rashwan. A new video coding scheme based on the H.263 standard and entropy constrained vector quantization. ICII 2000, Kuwait, Nov. 2000.
  • 47. Patents • Waleed Magdi and Motaz El-Saban, Personalized notification of live events (U.S. patent #8,881,191) • Waleed Magdi and Motaz El-Saban, Using An Id Domain To Improve Searching (U.S. granted patent #8,131,720 and #8,538,964 (continuation)) • Heba Ashour, Nayer Wanas, Mostafa El Baradei and Motaz El-Saban, User Evaluation in a Collaborative Online Forum (U.S. patent #8,893,024) • Ayman Kaheel, Motaz El-Saban, Mohamed Shawky and Mahmoud Refaat, Sharing video data associated with the same event (U.S. patent #8,767,081) • Motaz El-Saban, Christopher Burges and Qiang Wu, Re-ranking top search results (U.S. patent #8,661,030) • Motaz El-Saban, Ayman Kaheel, Mahmoud Refaat and Ahmad Abd El Hameed, Composite video generation (U.S. patent #8,605,783) • Ayman Kaheel, Motaz El-Saban, Mahmoud Refaat, Ahmad El Arabawy, Mostafa Baradei, Using accelerometer information for determining orientation of pictures and video images (U.S. patent pending) • Motaz El-Saban, Xin-Jing Wang and May Sayed, Real-Time Annotation And Enrichment Of Captured Video (U.S. patent #8,903,798) • James Lau, Ayman Kaheel, Motaz El-Saban, Mohammad Shawky, Monica Gonzales, Ahmed El Baz, Tamer Deif and Alaa Abd El Hakeem, Using facial data for device authentication or subject identification (U.S. patent pending) • Motaz El-Saban, Ayman Kaheel, Mohammad Shawky and James Lau, Modifying video regions using mobile device input (U.S. patent pending) • Pushmeet Kohli, Jamie Shotton and Motaz El Saban, Synthesizing Training Samples for Object Recognition (U.S. patent #8,903,167) • Alaa Abd El Hakeem and Motaz El-Saban, Dynamic update of recovered subspaces of high dimensional • Motaz et al., Natural language search of images and navigation
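Slide 7 (MT analysis: HMMs) describes three steps: fit one model per experimental condition, compute distances between the models, and embed the distances with MDS. The sketch below illustrates that pipeline with NumPy under several assumptions. The models are discrete-observation HMMs given as (pi, A, B) tuples. The distance is a symmetrized per-symbol log-likelihood distance estimated from sampled sequences. The slide does not say which distance was actually used, and all function names are hypothetical.

```python
import numpy as np

def sample_hmm(pi, A, B, length, rng):
    """Draw one observation sequence from a discrete HMM (pi, A, B)."""
    state = rng.choice(len(pi), p=pi)
    obs = np.empty(length, dtype=int)
    for t in range(length):
        obs[t] = rng.choice(B.shape[1], p=B[state])
        state = rng.choice(len(pi), p=A[state])
    return obs

def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | pi, A, B)."""
    alpha = pi * B[:, obs[0]]
    ll = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        ll += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return ll

def one_sided(m_a, m_b, n_seq, length, seed):
    """Average per-symbol log-likelihood drop when sequences sampled from m_a are scored by m_b."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_seq):
        obs = sample_hmm(*m_a, length, rng)
        total += log_likelihood(obs, *m_a) - log_likelihood(obs, *m_b)
    return total / (n_seq * length)

def hmm_distance(m1, m2, n_seq=10, length=100, seed=0):
    """Symmetrized distance; each direction uses its own identically seeded RNG, so D is symmetric."""
    return 0.5 * (one_sided(m1, m2, n_seq, length, seed) + one_sided(m2, m1, n_seq, length, seed))

def classical_mds(D, dim=2):
    """Classical (Torgerson) MDS of a symmetric distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(G)
    top = np.argsort(vals)[::-1][:dim]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

# Toy usage: three made-up two-state, two-symbol models (not data from the thesis).
pi = np.array([0.5, 0.5])
sticky = np.array([[0.9, 0.1], [0.1, 0.9]])
uniform = np.array([[0.5, 0.5], [0.5, 0.5]])
models = [(pi, sticky, sticky), (pi, uniform, uniform), (pi, sticky, uniform)]
D = np.array([[hmm_distance(a, b) for b in models] for a in models])
coords = classical_mds(D)  # one 2-D point per experimental condition
```

The second and third toy models both emit symbols uniformly, so their estimated distance is essentially zero and MDS places them together. This is the behavior a visualization of condition similarity should show.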
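Slide 11 lists feature names without defining them. To make a few of them concrete, the sketch below computes some surface, posting-component, relevance and originality features for a single plain-text post. The tokenizer, the Jaccard-overlap definitions and the function names are assumptions, not the definitions used in the work. The non-linear SVM that would consume these features is not shown.

```python
import re

URL_RE = re.compile(r"https?://\S+")
WORD_RE = re.compile(r"[a-z0-9']+")

def tokens(text):
    """Lower-cased set of word tokens (a deliberately crude tokenizer)."""
    return set(WORD_RE.findall(text.lower()))

def jaccard(a, b):
    """Jaccard overlap of two token sets; 0.0 when both are empty."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def post_features(post, thread_title, previous_posts):
    """Hypothetical versions of a few feature names listed on slide 11."""
    words = tokens(post)
    prev_words = set().union(*(tokens(p) for p in previous_posts))
    return {
        "Lengthiness": len(post.split()),                       # surface feature
        "Weblinks": len(URL_RE.findall(post)),                  # posting-component feature
        "Questioning": int("?" in post),                        # posting-component feature
        "OnThreadTopic": jaccard(words, tokens(thread_title)),  # relevance feature
        "OverlapPrevious": jaccard(words, prev_words),          # originality (higher = less original)
    }

features = post_features(
    "Is this the new kernel? See https://example.com for details",
    "New kernel released",
    ["The kernel looks fast"],
)
```

Each post maps to a fixed set of numeric fields, so stacking the dictionaries for all posts gives the feature matrix a classifier would be trained on.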
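Slide 14's idea is to start from the English ranker and fine-tune on the top returned results. One way to realize it is to keep the English scores and learn a small correction that re-orders only the top-k documents. The sketch below does this with a linear correction trained by a pairwise perceptron on Arabic relevance labels. The model form, the feature vectors, k, the learning rate and all names are illustrative assumptions. This is not the method described in the talk or in the related patent.

```python
def score(base, w, x):
    """English-ranker score plus a learned linear correction w·x."""
    return base + sum(wi * xi for wi, xi in zip(w, x))

def top_k(base_scores, k):
    """Indices of the k documents the English ranker scores highest."""
    return sorted(range(len(base_scores)), key=lambda i: -base_scores[i])[:k]

def rerank_top_k(base_scores, features, w, k):
    """Re-order only the top-k; documents below rank k keep the English ranker's order."""
    order = sorted(range(len(base_scores)), key=lambda i: -base_scores[i])
    top = sorted(order[:k], key=lambda i: -score(base_scores[i], w, features[i]))
    return top + order[k:]

def train_pairwise(queries, k, dim, epochs=10, lr=0.25):
    """Pairwise perceptron over each labeled query's top-k: (base_scores, features, relevance)."""
    w = [0.0] * dim
    for _ in range(epochs):
        for base, feats, rel in queries:
            top = top_k(base, k)
            for i in top:
                for j in top:
                    if rel[i] > rel[j] and score(base[i], w, feats[i]) <= score(base[j], w, feats[j]):
                        # Mis-ordered pair: move w toward x_i - x_j.
                        w = [wk + lr * (a - b) for wk, a, b in zip(w, feats[i], feats[j])]
    return w

# Toy labeled query (made-up numbers): doc 1 is most relevant but the English ranker puts it second.
base = [3.0, 2.5, 2.0, 1.0]
feats = [[0.0], [1.0], [0.0], [1.0]]
rel = [0, 2, 1, 0]
w = train_pairwise([(base, feats, rel)], k=3, dim=1)
print(w, rerank_top_k(base, feats, w, k=3))  # [0.75] [1, 0, 2, 3]
```

The correction only touches the top-k documents, so the English ranker still decides everything below rank k. This keeps the amount of Arabic training data needed small.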
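The geometric step of slide 17's stitching pipeline can be sketched as a RANSAC loop around the direct linear transform (DLT) for a homography. The sketch assumes interest points have already been extracted and matched into point pairs. It is a generic, unnormalized textbook version written with NumPy, not the optimized real-time pipeline from the papers. The threshold, iteration count and function names are assumptions. The photometric step is reduced here to a single gain factor over the overlap region, which is a deliberate simplification.

```python
import numpy as np

def homography_dlt(src, dst):
    """Direct linear transform: H (3x3) with dst ~ H @ [x, y, 1] from >= 4 point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)  # right singular vector of the smallest singular value
    return H / H[2, 2]

def project(H, pts):
    """Apply H to an (N, 2) array of points."""
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]

def ransac_homography(src, dst, n_iter=500, thresh=2.0, seed=0):
    """Keep the 4-point hypothesis with the most inliers, then refit on all of its inliers."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(src), size=4, replace=False)
        with np.errstate(divide="ignore", invalid="ignore"):
            H = homography_dlt(src[idx], dst[idx])
            err = np.linalg.norm(project(H, src) - dst, axis=1)
        inliers = err < thresh  # NaN/inf errors from degenerate samples count as outliers
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 4:
        raise ValueError("not enough inliers for a homography")
    return homography_dlt(src[best], dst[best]), best

def gain_compensation(overlap_a, overlap_b):
    """Simplified photometric step: one gain mapping B's mean intensity onto A's."""
    return float(np.mean(overlap_a) / max(float(np.mean(overlap_b)), 1e-8))

# Synthetic check: 50 exact correspondences under a known H plus 10 gross outliers.
rng = np.random.default_rng(1)
H_true = np.array([[1.0, 0.02, 30.0], [-0.01, 1.0, 5.0], [1e-4, 0.0, 1.0]])
src = rng.uniform(0, 640, size=(60, 2))
dst = project(H_true, src)
dst[:10] += rng.uniform(50, 100, size=(10, 2))
H, inliers = ransac_homography(src, dst)
```

Production code would normalize point coordinates before the DLT (Hartley normalization) for better conditioning. A library routine such as OpenCV's `cv2.findHomography` with the `cv2.RANSAC` flag would normally replace this hand-written loop.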
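As a consistency check on slide 20 (Active Feedback Results), the F1 values follow from the reported precision and recall through the harmonic mean F1 = 2PR/(P+R). The improvement comes almost entirely from recall (0.49 to 0.65), while precision stays nearly unchanged:

```latex
\begin{aligned}
F_1^{\text{no feedback}} &= \frac{2 \times 0.95 \times 0.49}{0.95 + 0.49} = \frac{0.931}{1.44} \approx 0.65,\\
F_1^{\text{with feedback}} &= \frac{2 \times 0.97 \times 0.65}{0.97 + 0.65} = \frac{1.261}{1.62} \approx 0.78.
\end{aligned}
```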
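Slide 21 compares ways of exploiting frame correlation, one of which is "SURF + overlapping region." One plausible reading is that the transform from the previous frame pair predicts where the new frames overlap, and features are only detected or matched inside that region. The NumPy sketch below follows that reading. The slide does not give the procedure, so the bounding-box approximation, the margin, equal frame sizes, the direction of the homography (frame B into frame A) and the function names are all assumptions.

```python
import numpy as np

def predicted_overlap(H_prev, width, height, margin=20):
    """Bounding box, in frame A, of frame B's corners warped by the previous homography.

    Returns (x0, y0, x1, y1), grown by `margin` pixels to absorb new motion and clipped
    to frame A. If x0 > x1 or y0 > y1, the frames are predicted not to overlap.
    """
    corners = np.array([[0, 0, 1], [width, 0, 1], [width, height, 1], [0, height, 1]], dtype=float)
    warped = corners @ H_prev.T
    warped = warped[:, :2] / warped[:, 2:3]
    x0, y0 = warped.min(axis=0) - margin
    x1, y1 = warped.max(axis=0) + margin
    return (max(x0, 0.0), max(y0, 0.0), min(x1, float(width)), min(y1, float(height)))

def keypoints_in_box(keypoints, box):
    """Keep only (x, y) keypoints inside the predicted overlap box."""
    x0, y0, x1, y1 = box
    kp = np.asarray(keypoints, dtype=float)
    mask = (kp[:, 0] >= x0) & (kp[:, 0] <= x1) & (kp[:, 1] >= y0) & (kp[:, 1] <= y1)
    return kp[mask]

# Previous pair: frame B sat 400 px to the right of frame A (pure translation).
H_prev = np.array([[1.0, 0.0, 400.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
box = predicted_overlap(H_prev, 640, 480)
kept = keypoints_in_box([[100, 50], [500, 200], [630, 470]], box)
```

Restricting detection to the predicted strip means fewer keypoints to describe and match. This is consistent with the lower average times the slide reports for the variants that use temporal information.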
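Slide 27 varies a time window of N frames and a sampling rate S when matching with frame information aggregation. The sketch below is one plausible reading. It assumes a per-frame matcher that already returns a score per candidate tag, takes S to mean "use every S-th frame," and sums scores over the last N sampled frames before choosing a tag. None of these details are stated on the slide, and the function name is hypothetical.

```python
from collections import Counter, deque

def aggregate_matches(frame_scores, window_n, sample_s):
    """Sliding-window aggregation of per-frame match scores.

    frame_scores: one dict {tag: score} per video frame.
    Every `sample_s`-th frame is used; scores over the last `window_n` sampled
    frames are summed, and the best tag is emitted for each sampled frame.
    """
    window = deque(maxlen=window_n)
    decisions = []
    for t in range(0, len(frame_scores), sample_s):
        window.append(frame_scores[t])
        total = Counter()
        for scores in window:
            total.update(scores)  # Counter.update adds the scores tag by tag
        tag = max(total.items(), key=lambda kv: kv[1])[0] if total else None
        decisions.append((t, tag))
    return decisions

# Made-up per-frame scores: the "mug" and "bottle" matches flicker from frame to frame.
frames = [
    {"mug": 0.6, "bottle": 0.4},
    {"mug": 0.1, "bottle": 0.9},
    {"mug": 0.7, "bottle": 0.3},
    {"mug": 0.2, "bottle": 0.8},
    {"mug": 0.8, "bottle": 0.2},
]
stable = aggregate_matches(frames, window_n=2, sample_s=2)
noisy = aggregate_matches(frames, window_n=1, sample_s=1)
```

With N = 1 the decision follows every flicker of the per-frame matcher. A longer window smooths it, at the cost of reacting more slowly when the object in view actually changes.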
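Slides 32 and 33 describe the Histogram of Oriented Displacements (HOD). The NumPy sketch below makes several assumptions. Each displacement P_t → P_(t+1) votes into its orientation bin with a weight equal to its length, which uses both quantities the slide says are computed. There are 8 bins per 2-D projection, and the xy, yz and xz projections are concatenated. The temporal pyramid from the IJCAI 2013 paper is omitted, and the L1 normalization and function names are assumptions.

```python
import numpy as np

def hod_2d(points, n_bins=8):
    """Length-weighted histogram of displacement angles for a (T, 2) trajectory."""
    disp = np.diff(np.asarray(points, dtype=float), axis=0)
    angles = np.arctan2(disp[:, 1], disp[:, 0]) % (2 * np.pi)  # in [0, 2*pi]
    lengths = np.hypot(disp[:, 0], disp[:, 1])
    # Clamp guards the rare case where the modulo rounds up to exactly 2*pi.
    bins = np.minimum((angles / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins, weights=lengths, minlength=n_bins)
    total = hist.sum()
    return hist / total if total > 0 else hist

def hod_3d(trajectory, n_bins=8):
    """Concatenate the HODs of the xy, yz and xz projections of a (T, 3) joint trajectory."""
    traj = np.asarray(trajectory, dtype=float)
    projections = [traj[:, [0, 1]], traj[:, [1, 2]], traj[:, [0, 2]]]
    return np.concatenate([hod_2d(p, n_bins) for p in projections])

# A joint moving straight along x: all mass lands in bin 0 of the xy and xz projections.
descriptor = hod_3d([[0, 0, 0], [1, 0, 0], [2, 0, 0]])
```

A projection with no movement (here yz) yields an all-zero histogram rather than a division by zero. The descriptor length is fixed at 3 × n_bins regardless of how many frames the action lasts.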

Editor's Notes

  1. Posts rated -1, 0, 1 or 2 were clustered as low, and posts rated 3 were considered medium, while posts rated 4 or 5 were clustered as high, since such a rating reflects that they have definitely been manually rated more than twice.
  2. Challenges: low computational power; multimedia (MM) with low resolution, limited FOV and high compression; battery constraints. Opportunities: ubiquity; collaborating for a better MM experience, whether viewing or producing; many sensors, which enable better content access (e.g. by location filtering) and fusion of modalities (e.g. in matching).
  3. The value of |Y| is theoretically 9.80665 m/s² (standard gravity); however, because of the imperfections of the 3-D accelerometer, the value needs to be calculated as |Y| = sqrt(ax^2 + ay^2 + az^2).
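Note 1, together with the dataset description on slide 12, fully specifies how Slashdot scores become training labels. The sketch below implements that filtering and bucketing, assuming each scraped post is available as a (rating, is_registered) pair. The data layout and function names are hypothetical.

```python
def keep_post(rating, is_registered):
    """Drop 0-rated posts from unregistered users (0 is their default, i.e. effectively unrated)."""
    return not (rating == 0 and not is_registered)

def bucket(rating):
    """Map a Slashdot score (-1..5) to the three classes: -1..2 low, 3 medium, 4..5 high."""
    if not -1 <= rating <= 5:
        raise ValueError(f"rating out of range: {rating}")
    if rating <= 2:
        return "low"
    if rating == 3:
        return "medium"
    return "high"

posts = [(5, True), (0, False), (0, True), (3, False), (-1, True), (2, True)]
labels = [bucket(r) for r, registered in posts if keep_post(r, registered)]
print(labels)  # ['high', 'low', 'medium', 'low', 'low']
```

A 0 from a registered user is kept because it means the post was rated down from the registered default of 1.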
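Note 3 and slide 22 concern recovering rotation from a 3-D accelerometer. The sketch below recovers tilt from a single reading taken while the phone is roughly static, so that gravity dominates the measurement. Following note 3, the gravity magnitude |Y| is computed from the reading itself instead of being assumed to be 9.80665. The angle convention (the angle between each device axis and the gravity vector) and the function names are assumptions; the slide does not state which angles were used. For a sensor that reads 9.7 on a phone lying flat, dividing by the nominal value would report a tilt of roughly 8.5 degrees instead of 0.

```python
import math

NOMINAL_G = 9.80665  # m/s^2; kept only to show the naive alternative

def gravity_magnitude(ax, ay, az):
    """|Y| = sqrt(ax^2 + ay^2 + az^2), computed from the reading itself (note 3)."""
    return math.sqrt(ax * ax + ay * ay + az * az)

def axis_angles(ax, ay, az, g=None):
    """Angles in degrees between each device axis and gravity.

    g defaults to the measured magnitude; pass NOMINAL_G to reproduce the naive version.
    """
    if g is None:
        g = gravity_magnitude(ax, ay, az)
    if g == 0.0:
        raise ValueError("zero reading: orientation is undefined")
    # Clamp the cosines to [-1, 1] so rounding error cannot push acos out of its domain.
    return tuple(math.degrees(math.acos(max(-1.0, min(1.0, a / g)))) for a in (ax, ay, az))

# A phone lying flat with a slightly miscalibrated sensor (reads 9.7 instead of 9.80665).
measured = axis_angles(0.0, 0.0, 9.7)
naive = axis_angles(0.0, 0.0, 9.7, g=NOMINAL_G)
```

Normalizing by the measured magnitude makes the angles depend only on the direction of the reading. A uniform scale error in the sensor therefore cancels out.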