Computer vision and deep learning are increasingly woven into our daily lives, from consumers to businesses, but their future direction isn't entirely clear. A one-day learning intensive with the industry's pioneering research minds, from corporations and startups alike, is exactly what IMVC 2019 provided this year, and the overall sentiment paints an exciting future for video and image processing alongside deep learning developments.
The keynotes and technical lectures dealt with the forefront of computer vision innovations, and in light of the wealth of information shared, we decided to summarize some noteworthy presentations and their key insights.
Here’s a summary of the startups and industry speaker presentations, and academic AI research papers and competition winners:
Shmuel Peleg, Co-founder and Chief Scientist, BriefCam
What you see affects what you hear, and separating speech is a tricky problem when a video shows many people speaking at once. Plenty of video recognition research is breaking barriers, but it seldom tackles audio too. Vid2speech is BriefCam's new project that isolates a speaker's voice by combining audio and video cues, using a dilated convolutional neural network in an audio-visual deep model architecture. The technology enhances the speech of a chosen subject whenever the video shows the speakers' faces. Professor Peleg demonstrated the technology on ESPN's sports talk show First Take, whose famously vocal personalities Stephen A. Smith and Max Kellerman commonly talk over each other, resulting in incoherent audio. Peleg showed Vid2speech muting one commentator while both were yelling simultaneously.
Aya Soffer, VP of AI Tech, IBM
Aya Soffer stated that audio and image recognition have been conquered by AI, surpassing human skill. But that eclipse hasn't happened for video recognition yet: human intuition can still identify the content of a video better than the best deep learning algorithms. Video comprehension is a major focus for IBM's research team, directed at enterprise applications. In collaboration with MIT, the team created a new mega video data set called Moments in Time, comprising millions of video clips, to support development efforts such as segmenting video into semantic scenes, interpolating full video scenes from select frames, recognizing objects and human actions, summarizing a video's highlights, and few-shot learning for object recognition.
Soliman Nasser, Lead Research and Development, Blink Technologies
Nasser began by explaining the geometry of the human eye before detailing Blink Technologies' eye-tracking technology. They track eye movements from RGB cameras in order to predict the gaze point and, consequently, the gaze vectors, i.e., the lines connecting each eye to the gaze point. They reconstruct a 3D model of the eye to recover its intrinsic parameters and define the optical axis of each eye, then calculate the point of gaze using triangulation. They crowdsource their data (real and synthetic) and feed it into a deep learning algorithm that predicts the gaze vector. The market is largely saturated with intrusive infrared sensors, whereas Blink Technologies' algorithms work on any RGB camera in real-world environments, with limited CPU compute and without GPUs.
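Given each eye's 3D center and optical axis, the final triangulation step Nasser described can be written as a small least-squares problem: find the 3D point closest to both gaze rays. A minimal sketch (the function name and formulation are illustrative assumptions, not Blink Technologies' actual implementation):

```python
import numpy as np

def gaze_point(eye_l, dir_l, eye_r, dir_r):
    """Least-squares triangulation of the point of gaze: the 3D point
    closest to both eyes' gaze rays. eye_* are eyeball centers, dir_*
    are gaze direction vectors (need not be unit length).
    Illustrative sketch, not Blink Technologies' API."""
    A_rows, b_rows = [], []
    for o, d in ((eye_l, dir_l), (eye_r, dir_r)):
        o = np.asarray(o, float)
        d = np.asarray(d, float)
        d = d / np.linalg.norm(d)
        # Every point p on the ray o + t*d satisfies
        # (I - d d^T) p = (I - d d^T) o: its component perpendicular
        # to d matches the ray origin's.
        P = np.eye(3) - np.outer(d, d)
        A_rows.append(P)
        b_rows.append(P @ o)
    A = np.vstack(A_rows)        # (6, 3) stacked constraints
    b = np.concatenate(b_rows)   # (6,)
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p
```

When the two rays actually intersect this returns the intersection; when measurement noise makes them skew, it returns the midpoint of their closest approach, which is the usual practical choice.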
Ofir Bibi, Director of Research, Lightricks
According to Bibi, around 50% of the pixels in a typical image belong to the sky, and to achieve the best capture, Lightricks believes the sky portion should be aesthetically perfect. The startup's research team trained a neural network to segment the sky background in still images, which is integrated into their highly popular Quickshot application. Video poses a greater challenge, and the startup developed an algorithm that automatically replaces the sky in video clips, compositing in an enhanced sky. They achieve temporal consistency in the segmentation network through a feedback loop and novel infrastructure, and the algorithm is trained by deliberately adding noise to the data, extending and dilating mask boundaries to mimic the errors seen on real video. The work was conducted by Tavi Halperin, Harel Cain and Michael Werman and will be presented at Eurographics 2019.
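The boundary-perturbing augmentation Bibi described can be sketched roughly like this: randomly dilate or erode the ground-truth sky mask so the network learns to cope with the boundary errors a per-frame segmenter makes on real video. A toy version with assumed details, not Lightricks' actual training code:

```python
import numpy as np

def perturb_sky_mask(mask, max_steps=3, rng=None):
    """Randomly dilate or erode a binary sky mask a few pixels, mimicking
    per-frame segmentation boundary errors. Wrap-around at image borders
    (from np.roll) is an accepted artifact of this toy version."""
    rng = rng or np.random.default_rng()
    m = mask.astype(bool)
    grow = rng.random() < 0.5                       # dilate or erode, 50/50
    for _ in range(int(rng.integers(1, max_steps + 1))):
        # 4-neighbourhood morphology via shifted copies of the mask
        up, down = np.roll(m, 1, axis=0), np.roll(m, -1, axis=0)
        left, right = np.roll(m, 1, axis=1), np.roll(m, -1, axis=1)
        m = (m | up | down | left | right) if grow else \
            (m & up & down & left & right)
    return m.astype(mask.dtype)
```

Each call moves the sky boundary outward or inward by one to `max_steps` pixels, so the same frame yields many slightly different training masks.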
Adham Ghazali, Co-founder and CEO, Imagry
Autonomous driving software is a complex problem. Depth is the major drawback for cameras, and LiDAR, though an active sensor, lacks high-resolution information. Even Waymo, the leading contender in the autonomous driving race, has advanced localization by building high-definition maps with heavy human labor (one hour of self-driving requires roughly 800 hours of annotation to build a high-definition map) plus SLAM for optimization. But if a map changes, the whole procedure must be redone, which poses a problem for scaling. Imagry developed a novel architecture for autonomous driving software that can localize, perceive and drive in unknown environments, learning a driving policy without heuristics. It can train on one city's roads and then drive autonomously in another city. The startup also invented a motion path-finding algorithm called Aleph* (open sourced), which takes two hours to train on a laptop for a new function, exhibited promising run-time efficiency, and compared well to the industry benchmark N-step DQN. Ghazali encouraged the industry to conduct a comparison against Monte Carlo methods.
Shmoolik Mangan, Algorithms Development Manager, VayaVision
VayaVision developed a low-level raw data fusion and perception system for autonomous vehicles, as opposed to an object-fusion architecture. They use unified algorithms, unsupervised neural networks, to detect objects from pixel-level RGB and depth sensor data, applying deep learning and computer vision algorithms to an upsampled HD 3D model of the environment. The typical autonomous-vehicle sensor set combines several sensor types, and upon sensor failure, the startup's novel redundancy mechanisms allow the vehicle to continue driving rather than become stranded on an active road, albeit at reduced speed and only as far as the nearest garage.
Jacob Gildenblat, Co-founder and CTO, DeePathology.ai
Automating cell detection requires annotating large amounts of data, which is usually highly unbalanced. To alleviate this arduous task, DeePathology.ai developed the Cell Detection Studio, a DIY tool that lets pathologists create and train deep learning cell detection algorithms on their own data sets, with the help of active learning.
Yael Pritch, Leader of Google Perception AI
The Google team is invested in putting computational photography and artificial intelligence inside mobile cameras. According to Pritch, software advancements are where mobile camera innovation happens today, as opposed to the optical hardware, which is constrained by physical form factors. Google's two new camera upgrades, Portrait Mode and Night Sight, were on display, with details of the convolutional neural networks and phase-detection autofocus that have won over conventional camera critics. Night Sight is a feature for the Pixel 1, 2 and 3 that produces a high-quality capture in low-light environments. Portrait Mode uses a machine learning network that learns to segment people in images, plus an autofocus mechanism whose extra per-pixel data is repurposed to create a high-quality depth map. The Google Pixel 2 computational camera won the DPReview award in 2017, and the Google Pixel 3 won the same award in 2018; it's the leading camera today, fueled by AI software. Everything runs on-device within the thermal and memory limitations of the Pixel 1, 2 and 3, and the Pixel software is backward compatible with all models.
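The core idea behind low-light capture of this kind is merging a burst of short, noisy exposures: zero-mean sensor noise shrinks as more frames are averaged. A minimal sketch of just the merging step (Night Sight's real pipeline also aligns frames and handles motion; this is not Google's implementation):

```python
import numpy as np

def merge_burst(frames):
    """Average a burst of noisy low-light frames. For zero-mean sensor
    noise, averaging N frames cuts the noise standard deviation by about
    sqrt(N). Toy sketch of burst merging only; alignment and motion
    rejection are deliberately omitted."""
    stack = np.stack([np.asarray(f, dtype=np.float64) for f in frames])
    return stack.mean(axis=0)
```

With a 16-frame burst this yields roughly a 4x noise reduction, which is why a phone sensor can produce usable images in near-darkness without longer (and blurrier) single exposures.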
STUDENT COMPETITION WINNERS
Graduate: 3D Point Cloud Classification and Segmentation using 3D Modified Fisher Vector Representation for Convolutional Neural Networks
By: Itzik Ben Shabat
Solving the classification problem: with the growing availability of 3D sensors, Ben-Shabat sought to tackle 3D point clouds with deep learning methods, even though the power of convolutional neural networks does not transfer directly from images to point clouds due to the inherent challenges of structure, order and sample size. To harness CNNs anyway, the group proposed a new hybrid representation that relies on point statistics.
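The hybrid representation idea (summarize the point cloud with soft statistics over a fixed grid of Gaussians, so a CNN receives a fixed-size, order-invariant input) can be sketched as follows; this is a simplified stand-in, not the paper's exact 3D Modified Fisher Vector formulation:

```python
import numpy as np

def grid_soft_stats(points, grid_size=4, sigma=0.25):
    """Map an unordered 3D point set to a fixed-size grid representation:
    place Gaussians on a uniform grid over [-1, 1]^3 and accumulate
    soft-assignment statistics (weight and mean offset) per Gaussian.
    The output shape is independent of point count and order, so a CNN
    can consume it. Simplified illustration, not the exact 3DmFV."""
    axis = np.linspace(-1, 1, grid_size)
    centers = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"),
                       axis=-1).reshape(-1, 3)           # (K, 3)
    diff = points[:, None, :] - centers[None, :, :]      # (N, K, 3)
    w = np.exp(-(diff ** 2).sum(-1) / (2 * sigma ** 2))  # (N, K)
    r = w / (w.sum(axis=1, keepdims=True) + 1e-12)       # responsibilities
    weight = r.mean(axis=0)                              # soft point count
    mean_off = (r[:, :, None] * diff).mean(axis=0)       # (K, 3)
    return np.concatenate([weight[:, None], mean_off], axis=1)  # (K, 4)
```

Because every statistic is an average over points, permuting or resampling the input cloud leaves the representation's shape (and, for permutations, its values) unchanged, which is exactly what makes it CNN-friendly.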
The work was also extended to localized geometry which was accepted to CVPR 2019.
The group trained a generative model that generates faces according to a measurement of beauty using their unsupervised beautification model.
STUDENT RESEARCH PAPERS
Spoofing in Face Recognition Systems Based on Projective Invariants and Stereo Recording
Alexander Naitsat, Technion
On Occlusion Removal in Ultrasound Imaging
Yossef Cohen, Technion
Lung Structure Enhancement in Chest Radiographs via CT based FCNN Training
Ophir Gozes, Tel Aviv University
R(D)-Constrained Adaptive Quantization for Coding and De-Noising of Medical Images
Shira Nemirovsky-Rotman, Technion
SOS Boosting for Image Deblurring Algorithms
Shahar Romem Peled, Technion
Spatio-Temporal Detection of Cumulonimbus Clouds in Infrared Satellite Images
Ron Dorfman and Etai Wagner, Technion
From Unable to Enable, How Multiplicative Beamforming Can Enable Advanced Image Processing Techniques in Portable Ultrasound Devices
Omri Soceanu, Technion
Early Detection of Cancer Using Thermal Video Analysis
Idun Barazani and Ori Bryt, Technion
Automated Assembly of Polygonal Jigsaw Puzzles
Peleg Harel, Ben Gurion University
Modeling the Retinal Cone Mosaic and Its Use for Individual Authentication
Keren Berger, Ben Gurion University
On Sweeping Patterns Classification
Avner Atias, Technion
On the Role of Geometry in Geo-Localization
Moti Kadosh, IDC Herzliya
Explorations and Lessons Learnt in Building an Autonomous Formula SAE Car From Simulations
Dean Zadok, Amir Biran and Tom Hirshberg, Technion
Network Adaptation Strategies for Learning New Classes Without Forgetting the Original Ones
Hagai Taitelbaum, Bar Ilan University
Autonomous Aero Optics Using Active Flow Control for Vision Sensor Cleaning
David Menicovich, Actasys Inc.
Decision Trees Based Image Segmentation Using Ensemble Clustering
Yaron Levi, Yezreel Valley College
From a Deep Learning Model Back to The Brain – Inferring Morphological Markers and Their Relation to Aging
Gidon Levakov, Ben Gurion University
Fractal Features and Local Phase Information in Texture Recognition
Samah Khawaled, Technion
Microscopy Cell Segmentation via Convolutional LSTM Networks
Assaf Arbelle, Ben Gurion University
Accelerated Magnetic Resonance Imaging by Generative Adversarial Neural Network
Roy Shaul and Itamar David, Ben Gurion University
Yuval Schwartz and Maayan Weitzman, Technion
ARCADE – Accurate and Rapid Camera for Depth Estimation
Shay Elmalem and Yotam Gil, Tel Aviv University
Dance Dance Revolution
Avrham Aton, Yael Ben-Gigi, Mor Avi-Aharon and Nadav Amram, Ben Gurion University
The IMVC conference is run by SagivTech (Chen and Nizan Sagiv) and Dr. Jacob Cohen and will take place next year at the same location.