Thesis topic: "Rich Embedding Techniques to Improve Scene Understanding"
Advisors: Kevin W. Bowyer, Walter J. Scheirer. GPA: 3.95
Honorable Mention for Excellence. GPA: 4.0
Worked on a research program between the University of Notre Dame and the WVU FBI/Biometric Center of Excellence.
Research Project: Automatic classification of the eye’s orientation.
Designed a joint visual-temporal embedding method that improved temporal action segmentation performance by 6% over the state of the art on the Breakfast Actions Dataset. Presented the results of this research project at the IBM Intern Showcase and received the "Best of Show" award.
Designed and programmed web applications using Amazon Web Services. Applied the Scrum agile methodology to specify, develop, and deliver products under tight deadlines.
Liaison between the Tecnológico de Monterrey, Campus Cuernavaca, and Google. Directed group activities and coached over a thousand students in the use of Google Apps and Android programming.
Designed, developed, and maintained web applications and SCADA systems. Worked with a multidisciplinary team to translate business needs into technical specifications, and advised on the use and suitability of IT services.
S. Banerjee*, R. G. VidalMata*, Z. Wang, and W. J. Scheirer, "Report on UG^2+ Challenge Track 1: Assessing Algorithms to Improve Video Object Detection and Classification from Unconstrained Mobility Platforms," in Computer Vision and Image Understanding (CVIU).
S. Abraham, Z. Carmichael, S. Banerjee, R. G. VidalMata, A. Agrawal, M. N. A. Islam, W. J. Scheirer, and J. Cleland-Huang, "Adaptive Autonomy in Human-on-the-Loop Vision-Based Robotics Systems," in 2021 Workshop on AI Engineering (WAIN).
R. G. VidalMata, W. J. Scheirer, A. Kukleva, D. D. Cox, and H. Kuehne, "Joint Visual-Temporal Embedding for Unsupervised Learning of Actions in Untrimmed Sequences," in 2021 IEEE Winter Conference on Applications of Computer Vision (WACV).
R. G. VidalMata et al., "Bridging the Gap Between Computational Photography and Visual Recognition," in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), doi: 10.1109/TPAMI.2020.2996538.
R. G. VidalMata, S. Banerjee, W. J. Scheirer, K. Grm, and V. Struc, "UG^2: a Video Benchmark for Assessing the Impact of Image Restoration and Enhancement on Automatic Visual Recognition," in 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1597-1606, March 2018.
A. Czajka, K. W. Bowyer, M. Krumdick, and R. G. VidalMata, "Recognition of Image-Orientation-Based Iris Spoofing," in IEEE Transactions on Information Forensics and Security, vol. 12, no. 9, pp. 2184-2196, Sept. 2017. doi: 10.1109/TIFS.2017.2701332.
Manipulation detection algorithms often rely on identifying local anomalies, where manipulated regions are expected to be "sufficiently" different from the rest of the image's features.
As part of this project, we study computational photography techniques as a way to amplify the anomalies present in manipulated regions, making them easier to detect for a variety of deep-learning and traditional manipulation detection methods.
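As an illustration of this idea (a minimal sketch, not the project's actual pipeline), the following computes a high-pass noise residual, a classic forensic cue that tends to amplify splicing artifacts; the function name and file name are illustrative:

```python
import numpy as np
from scipy.ndimage import median_filter
from PIL import Image

def noise_residual_map(image_path, kernel=5):
    """Per-pixel high-pass noise residual: image minus a median-filtered
    version. Spliced regions often carry residual statistics that differ
    from the surrounding camera noise, so the map tends to highlight them."""
    img = np.asarray(Image.open(image_path).convert("L"), dtype=np.float32)
    residual = np.abs(img - median_filter(img, size=kernel))
    return residual / (residual.max() + 1e-8)  # normalize to [0, 1]

# Hypothetical usage: feed the map to any detector that scores local anomalies.
# anomaly_map = noise_residual_map("suspect.jpg")
```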
The use of small unmanned aerial vehicles (UAVs) to collect imagery in difficult or dangerous terrain offers clear advantages for time-critical tasks such as search-and-rescue missions, fire surveillance, and medical deliveries.
Employing a drone to search for a missing kayaker on a river or a child lost in the wilderness, to survey a traffic accident or a forest fire, or to track a suspect in a school shooting would not only reduce the risk to first responders but also allow a wide-scale search to be deployed quickly.
Understanding the structure of complex activities in untrimmed videos is a challenging task in the area of action recognition. One problem is that this task usually requires large amounts of hand-annotated, minute- or even hour-long video data, and annotating such data is very time consuming and cannot easily be automated or scaled.
To address this problem, we propose an approach that learns a meaningful visual and temporal embedding from the visual cues present in contiguous video frames.
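A simplified sketch in the spirit of this approach, assuming pre-computed per-frame features; the architecture, loss, and dimensions here are illustrative assumptions, not the exact model from the paper:

```python
import torch
import torch.nn as nn

class TemporalEmbedder(nn.Module):
    """Embed per-frame features; an auxiliary head predicts each frame's
    relative timestamp, encouraging a temporally smooth embedding space."""
    def __init__(self, feat_dim=64, emb_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, emb_dim),
        )
        self.time_head = nn.Linear(emb_dim, 1)

    def forward(self, x):
        z = self.encoder(x)
        t_hat = torch.sigmoid(self.time_head(z)).squeeze(-1)
        return z, t_hat

# Toy training loop on one video: features is (T, feat_dim); targets are
# frame indices normalized to [0, 1].
features = torch.randn(500, 64)                      # stand-in for real frame features
timestamps = torch.linspace(0, 1, features.size(0))  # relative time of each frame

model = TemporalEmbedder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    z, t_hat = model(features)
    loss = nn.functional.mse_loss(t_hat, timestamps)
    opt.zero_grad()
    loss.backward()
    opt.step()

# The embeddings z can then be clustered (e.g., with k-means) and the
# clusters ordered by mean predicted timestamp to produce an unsupervised
# action segmentation of the video.
```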
Can the application of enhancement algorithms as a pre-processing step improve image interpretability, either for manual analysis or for automatic visual recognition of scene content?
While there have been important advances in computational photography for restoring or enhancing the visual quality of an image, the capabilities of such techniques have not always translated into gains on visual recognition tasks.
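A minimal sketch of such a pre-processing experiment, using CLAHE as a stand-in enhancement step and an off-the-shelf ImageNet classifier; the enhancement choice, model, and file name are illustrative assumptions, not the benchmark's protocol:

```python
import cv2
import torch
from torchvision import models, transforms

def clahe_enhance(bgr):
    """Contrast-limited adaptive histogram equalization on the L channel:
    one example of a restoration step that may (or may not) help recognition."""
    l, a, b = cv2.split(cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB))
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return cv2.cvtColor(cv2.merge([clahe.apply(l), a, b]), cv2.COLOR_LAB2BGR)

preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize(256), transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

def top1(bgr):
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
    with torch.no_grad():
        return int(model(preprocess(rgb).unsqueeze(0)).argmax())

img = cv2.imread("frame.jpg")  # illustrative file name
print("raw class:", top1(img), "| enhanced class:", top1(clahe_enhance(img)))
```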
Matching the iris codes from the left and right eyes of the same person yields, on average, results essentially indistinguishable from matching iris codes of unrelated persons.
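For context, the comparison underlying this result is the standard Daugman-style fractional Hamming distance between binary iris codes; this minimal sketch assumes 2-D boolean code and occlusion-mask arrays (names and shapes are illustrative):

```python
import numpy as np

def fractional_hamming(code_a, code_b, mask_a, mask_b, max_shift=8):
    """Fractional Hamming distance over mutually unmasked bits, minimized
    over small angular shifts (column rotations of the 2-D code arrays)."""
    best = 1.0
    for s in range(-max_shift, max_shift + 1):
        b = np.roll(code_b, s, axis=1)
        m = mask_a & np.roll(mask_b, s, axis=1)  # bits valid in both codes
        n = m.sum()
        if n == 0:
            continue
        hd = np.count_nonzero((code_a ^ b) & m) / n
        best = min(best, hd)
    return best

# Genuine comparisons cluster near 0; comparisons between unrelated eyes
# (and, per the finding above, between the left and right eyes of the same
# person) cluster near 0.5.
```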
Two approaches are compared on the same data, using the same evaluation protocol: