S. Richter*, V. Vineet*, S. Roth and V. Koltun

Playing for Data: Ground Truth from Computer Games

European Conference on Computer Vision 2016 (ECCV 2016)

[paper]  

Recent progress in computer vision has been driven by high-capacity models trained on large datasets. Unfortunately, creating large datasets with pixel-level labels has been extremely costly due to the amount of human effort required. In this paper, we present an approach to rapidly creating pixel-accurate semantic label maps for images extracted from modern computer games. Although the source code and the internal operation of commercial games are inaccessible, we show that associations between image patches can be reconstructed from the communication between the game and the graphics hardware. This enables rapid propagation of semantic labels within and across images synthesized by the game, with no access to the source code or the content. We validate the presented approach by producing dense pixel-level semantic annotations for 25 thousand images synthesized by a photorealistic open-world computer game. Experiments on semantic segmentation datasets show that using the acquired data to supplement real-world images significantly increases accuracy and that the acquired data enables reducing the amount of hand-labeled real-world data: models trained with game data and just 1/3 of the CamVid training set outperform models trained on the complete CamVid training set.

(*-joint first authors)


A. Kundu*, V. Vineet*, and V. Koltun

Feature Space Optimization for Semantic Video Segmentation

IEEE International Conference on Computer Vision and Pattern Recognition 2016 (CVPR 2016)

[paper]  

We present an approach to long-range spatio-temporal regularization in semantic video segmentation. Temporal regularization in video is challenging because both the camera and the scene may be in motion. Thus, Euclidean distance in the space-time volume is not a good proxy for correspondence. We optimize the mapping of pixels to a Euclidean feature space so as to minimize distances between corresponding points. Structured prediction is performed by a dense CRF that operates on the optimized features. Experimental results demonstrate that the presented approach increases the accuracy and temporal consistency of semantic video segmentation.

(*-joint first authors)


R. Ranftl, V. Vineet, Q. Chen and V. Koltun

Dense Monocular Depth Estimation in Complex Dynamic Scenes

IEEE International Conference on Computer Vision and Pattern Recognition 2016 (CVPR 2016)

[paper]  

We present an approach to dense depth estimation from a single monocular camera that is moving through a dynamic scene. The approach produces a dense depth map from two consecutive frames. Moving objects are reconstructed along with the surrounding environment. Experimental results demonstrate that the proposed method outperforms prior methods for depth estimation from a monocular camera.


O. Miksik, Y. Aytar, V. Vineet, P. Perez and P. Torr

Incremental Dense Multi-modal 3D Scene Reconstruction

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2015)

[paper]  

We propose a probabilistic model that efficiently exploits the complementarity between different depth-sensing modalities for online dense scene reconstruction. In particular, we probabilistically fuse stereo and lidar depths.


J. Valentin*, V. Vineet*, M.M. Cheng*, D. Kim, J. Shotton, P. Kohli, M. Niessner, A. Criminisi, S. Izadi* and P. Torr* (*-joint first author)

SemanticPaint: Interactive 3D Labeling and Learning at your Fingertips

ACM Transactions on Graphics 2015 (TOG 2015)

[paper]   [video]   [project]

We present a new interactive and online approach to 3D scene understanding. Our system, SemanticPaint, allows users to simultaneously scan their environment, while interactively segmenting the scene simply by reaching out and touching any desired object or surface.


S. Zheng, S. Jayasumana, B. Paredes, V. Vineet, Z. Su, D. Du, C. Huang and P. Torr

Conditional Random Fields as Recurrent Neural Networks

International Conference on Computer Vision 2015 (ICCV 2015)

[paper]   [demo]   [project]

We present CRF-RNN, in which we show that a Conditional Random Field (CRF) can be expressed as a Recurrent Neural Network (RNN). This lets us plug a CRF in as part of a deep Convolutional Neural Network (CNN) and train the parameters of the whole system with back-propagation.


V. Vineet, O. Miksik, et al.

Incremental Dense Semantic Stereo Fusion for Large Scale Semantic Scene Reconstruction

IEEE International Conference on Robotics and Automation (ICRA 2015)

[paper]   [video]   [project]

We present what is, to our knowledge, the first system that can perform dense, large-scale, semantic reconstruction of a dynamic outdoor scene in (near) real time using stereo cameras.


O. Miksik, V. Vineet, et al.

The Semantic Paintbrush: Interactive 3D Mapping and Recognition in Large Outdoor Spaces

ACM Human Computer Interaction (CHI 2015)

[paper]   [video]   [project]

We present an augmented reality system for large-scale 3D reconstruction and recognition in outdoor scenes. Users physically draw (or paint) with a laser pointer directly onto the reconstruction to segment the model into objects and semantic parts, which are then used to learn to segment other parts of the 3D map during online acquisition. We demonstrate the possibility of using our system to help the visually impaired navigate through spaces.


S. Hare, S. Golodetz, A. Saffari, V. Vineet, M.M. Cheng, S. Hicks and P. Torr

Struck: Structured Output Tracking with Kernels

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI 2015)

[paper]  

We present a framework for adaptive visual object tracking based on structured output prediction. By explicitly allowing the output space to express the needs of the tracker, we are able to avoid the need for an intermediate classification step. Our method uses a kernelized structured output support vector machine (SVM), which is learned online to provide adaptive tracking.


V. Vineet, J. Warrell, and P. Torr

Filter-based Mean-Field Inference for Random Fields with Higher Order Terms and Product Label-Spaces

International Journal of Computer Vision (IJCV 2014)

[paper]   [code]

We show how several higher-order terms can be formulated such that filter-based mean-field inference remains possible. We demonstrate our techniques on joint stereo and object labeling problems, as well as on object class segmentation. In addition, for joint object-stereo labeling we provide an efficient approach to inference in product label-spaces.
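
For readers unfamiliar with mean-field inference, the following is a minimal sketch of the basic update that the filter-based method accelerates. It is not the authors' implementation: it uses a naive (unfiltered) parallel update, a simple Potts pairwise term, and no higher-order terms. The function names `softmax` and `mean_field_potts`, the `weight` parameter, and the toy chain are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def mean_field_potts(unary, neighbors, weight=1.0, iterations=10):
    """Naive mean-field inference for a pairwise Potts model.

    unary[i][l] is the cost of giving label l to variable i.
    neighbors[i] lists the variables connected to variable i.
    Returns q, where q[i][l] approximates the marginal of label l at i.
    """
    n = len(unary)
    num_labels = len(unary[0])
    # Initialise each distribution from the unary costs alone.
    q = [softmax([-c for c in unary[i]]) for i in range(n)]
    for _ in range(iterations):
        new_q = []
        for i in range(n):
            scores = []
            for l in range(num_labels):
                # Expected Potts penalty: each neighbour j disagrees
                # with label l with probability 1 - q[j][l].
                penalty = sum(weight * (1.0 - q[j][l]) for j in neighbors[i])
                scores.append(-unary[i][l] - penalty)
            new_q.append(softmax(scores))
        q = new_q  # parallel update of all variables
    return q


# Toy 3-variable chain: the middle variable has no unary preference,
# while both of its neighbours prefer label 0.
unary = [[0.0, 2.0], [1.0, 1.0], [0.0, 2.0]]
neighbors = [[1], [0, 2], [1]]
q = mean_field_potts(unary, neighbors)
```

In this toy chain the middle variable is pulled towards label 0 by its neighbours. The inner sum over neighbours is the step that a filter-based method replaces with a filtering operation when the model is densely connected.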


M. M. Cheng, S. Zheng, W. Lin, J. Warrell, V. Vineet, P. Sturgess, N. Mitra, N. Crook and P. Torr

ImageSpirit: Verbal Guided Image Parsing

ACM Transactions on Graphics (TOG 2014)

[paper]   [code & demo]

We propose a new interactive system whereby users verbally select objects and attributes of interest in an image, enabling a novel and natural interaction modality that can be used with new-generation devices (e.g. smart phones, Google Glass, Microsoft HoloLens, living room devices).


O. Miksik, V. Vineet, P. Perez, and P. Torr

Distributed ADMM-based Inference in Large-scale Random Fields

British Machine Vision Conference (BMVC 2014)

[paper]  

We propose a parallel and distributed algorithm for solving discrete labeling problems in large-scale random fields using the alternating direction method of multipliers (ADMM).


S. Zheng, M.M. Cheng, J. Warrell, P. Sturgess, V. Vineet, C. Rother and P. Torr

Dense Semantic Image Segmentation with Objects and Attributes

IEEE Computer Vision and Pattern Recognition (CVPR 2014)

[paper]   [project with code & data]

We formulate the problem of joint visual attribute and object class image segmentation as a dense multi-labelling problem, where each pixel in an image can be associated with both an object class and a set of visual attribute labels.


V. Vineet, J. Warrell and P. Torr

A Tiered Move-making Algorithm for General Non-submodular Pairwise Energies

(arXiv 2014)

[paper]   [code]

We propose a tiered move-making algorithm, an iterative method in which each move to the next configuration is based on the current labeling and an optimal tiered move. Each tiered move requires one application of the dynamic-programming-based tiered labeling method.


V. Vineet, C. Rother and P. Torr

Higher Order Priors for Joint Intrinsic Image, Objects, and Attributes Estimation

Neural Information Processing Systems (NIPS 2013)

[paper]

We explore the joint estimation of intrinsic scene properties recovered from an image, together with the estimation of the objects and attributes present in the scene. In this way, our unified framework is able to capture the correlations between intrinsic properties (reflectance, shape, illumination), objects (table, tv-monitor), and materials (wooden, plastic) in a given scene.


M. Cheng, J. Warrell, W. Lin, S. Cheng, V. Vineet, and N. Crook

Efficient Salient Region Detection with Soft Image Abstraction

IEEE International Conference on Computer Vision (ICCV 2013)

[paper]   [project page with code]

We propose a novel method to decompose an image into large scale perceptually homogeneous elements for efficient salient region detection, using a soft image abstraction representation.


V. Vineet, G. Sheasby, J. Warrell and P. Torr

PoseField: An Efficient Mean-field based Method for Joint Estimation of Human Pose, Segmentation and Depth

Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR 2013)

[paper]

We propose PoseField, a new highly efficient filter-based mean-field inference approach for jointly estimating human segmentation, pose, per-pixel body parts, and depth given stereo pairs of images.


V. Vineet, J. Warrell and P. Torr

Filter-based Mean-Field Inference for Random Fields with Higher Order Terms and Product Label-Spaces

European Conference on Computer Vision (ECCV 2012)

[paper]   [code]

We show how several higher-order terms can be formulated such that filter-based mean-field inference remains possible. We demonstrate our techniques on joint stereo and object labeling problems, as well as on object class segmentation. In addition, for joint object-stereo labeling we provide an efficient approach to inference in product label-spaces.


V. Vineet, J. Warrell and P. Torr

Improved Initialization and Gaussian Mixture Pairwise Terms for Dense Random Fields with Mean-field Inference

British Machine Vision Conference (BMVC 2012)

[paper]

We propose a SIFT-flow label transfer method and Gaussian pairwise terms to improve the generalization ability of the filter-based mean-field method.


V. Vineet, J. Warrell and P. Torr

Tiered Move Making Algorithm for General Pairwise MRFs

Computer Vision and Pattern Recognition (CVPR 2012)

[paper]   [code]

We propose a tiered move-making algorithm, an iterative method in which each move to the next configuration is based on the current labeling and an optimal tiered move. Each tiered move requires one application of the dynamic-programming-based tiered labeling method.


V. Vineet, J. Warrell, P. Sturgess and P. Torr

Learning and Inference for General Non-submodular Pairwise Energies

Rank Prize Symposium (RPS 2012)

[paper]   [code]

We propose a tiered move-making algorithm, an iterative method in which each move to the next configuration is based on the current labeling and an optimal tiered move. Each tiered move requires one application of the dynamic-programming-based tiered labeling method.


P. Harish, P. J. Narayanan, V. Vineet, and S. Patidar

Fast Minimum Spanning Tree Computation

Nvidia GPU Computing Gems (GCG), Jade Edition, Chapter 7, 2011 (GCG 2011)

[paper]   [code]

We present a minimum spanning tree algorithm on Nvidia GPUs under CUDA, as a recursive formulation of Boruvka's approach for undirected graphs.
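
As a rough illustration of Boruvka's approach, the sketch below shows a sequential, iterative version in Python. It is not the GPU implementation described above: the chapter's recursive supervertex formulation and all CUDA details are omitted, and the function name `boruvka_mst`, the union-find helper, and the toy graph are assumptions made for illustration.

```python
def boruvka_mst(num_vertices, edges):
    """Sequential Boruvka minimum spanning tree.

    edges is a list of (u, v, weight) tuples for an undirected graph.
    Returns (total_weight, mst_edges). For a disconnected graph the
    result is a minimum spanning forest.
    """
    parent = list(range(num_vertices))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    total = 0
    components = num_vertices
    while components > 1:
        # For every component, find the lightest edge leaving it.
        # Ties are broken by edge index so the choice is consistent.
        cheapest = {}
        for i, (u, v, w) in enumerate(edges):
            ru, rv = find(u), find(v)
            if ru == rv:
                continue  # edge is internal to one component
            for r in (ru, rv):
                best = cheapest.get(r)
                if best is None or (w, i) < (edges[best][2], best):
                    cheapest[r] = i
        if not cheapest:
            break  # no edge joins the remaining components
        # Add the selected edges and merge the components they connect.
        for i in set(cheapest.values()):
            u, v, w = edges[i]
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                mst.append((u, v, w))
                total += w
                components -= 1
    return total, mst


# Toy graph with 4 vertices and 5 weighted edges.
example_edges = [(0, 1, 1), (1, 2, 2), (2, 3, 3), (0, 3, 4), (0, 2, 5)]
total_weight, tree = boruvka_mst(4, example_edges)
```

Each round picks the lightest outgoing edge of every component independently, which is why the approach maps naturally onto data-parallel hardware: the per-component minimum search has no dependencies between components.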


V. Vineet, J. Warrell, L. Ladicky and P. Torr

Human Instance Segmentation from Video using Detector-based Conditional Random Fields

British Machine Vision Conference (BMVC 2011)

[paper]

We propose a method for instance based human segmentation in images and videos, extending the recent detector-based conditional random field model.


P. J. Narayanan, V. Vineet, and T. Stitch

Fast Graph-cuts for Computer Vision

GPU Computing Gems (GCG), Emerald Edition, Chapter 29, 2010 (GCG 2010)

[paper]   [code]

We present an implementation of the push-relabel algorithm for graph cuts on the GPU.


V. Vineet and P. J. Narayanan

Solving MultiLabel MRFs using Incremental alpha-expansion on the GPUs

Asian Conference on Computer Vision (ACCV 2009)

[paper]

We present an incremental alpha-expansion algorithm for high-performance multilabel MRF optimization on the GPU.


V. Vineet, P. Harish, S. Patidar and P. J. Narayanan

Fast Minimum Spanning Tree for Large Graphs on the GPU

ACM SIGGRAPH Proceedings of High Performance Graphics (HPG 2009)

[paper]   [code]

We present a minimum spanning tree algorithm on Nvidia GPUs under CUDA, as a recursive formulation of Boruvka's approach for undirected graphs.


P. Harish, V. Vineet, and P. J. Narayanan

Large Graph Algorithms for Massively Multithreaded Architecture

IIIT Tech Report, (IIIT/TR/2009/74)

[paper]   [code]   [BFS and SSSP]   [APSP using SSSP]   [Kernel Based MST]

We present implementations of breadth-first search, st-connectivity, single-source shortest path, all-pairs shortest path, minimum spanning tree, and maximum flow algorithms on commodity GPUs.


V. Vineet and P. J. Narayanan

CUDA Cuts: Fast Graph-cuts on the GPU

CVPR Workshop on Visual Computer Vision on the GPUs (CVGPU 2008)

[paper]   [code]

We present an implementation of the push-relabel algorithm for graph cuts on the GPU.