
The AI Technologies Powering Checkout-free Retail Part 3 - Computer Vision

Written by Motilal Agrawal | May 3, 2022 3:00:00 PM

Previously, I described how deep learning and sensor fusion are important AI technologies enabling checkout-free retail. In this final blog post in the series, I will discuss the role of computer vision in identifying customers and the products they are ultimately purchasing from these stores.

What is computer vision?

Computer vision is the field of computer science that enables computers to ‘see’ just as humans do. During the 1960s, the initial focus of this field was to understand how human brains process visual information and to mimic that with computers. However, we are still far from a clear understanding of how our brains work, so a more modern definition frames computer vision as the extraction of abstract and actionable information from visual content such as images and videos.

Computer vision is multidisciplinary, overlapping with sensor technology, physics, math and statistics, neuroscience, AI, computer graphics, and computer science. It also broadly encompasses related fields such as image processing, computational photography, photogrammetry, and machine vision, whose goals are slightly different. In image processing, the goal is to produce an enhanced image from the original, for example by removing blur and noise to obtain a crisper image. A related but more recent subfield is computational photography, which aims to improve the capabilities of the camera or add new features through computational techniques; examples include selective focusing based on depth and high dynamic range photography. In photogrammetry, the goal is to extract 3D information about the objects in a scene, while machine vision applies computer vision to industrial tasks such as recognizing defective parts.

The core sub-domains of computer vision differ in the nature and type of information to be extracted from the visual content. They include image restoration, object detection, object classification, object segmentation, scene reconstruction, tracking, motion estimation, and event and activity recognition.

Why is computer vision hard?

Computer vision started out as a summer research project at MIT in the 1960s, and even after six decades of research, the task of helping computers ‘see’ remains elusive and challenging. The challenge arises from our incomplete understanding of how our brains function and from the complex nature of the interaction between light, objects, and our eyes.

Our physical world is infinitely complex in terms of shapes, appearance, and materials at the macroscopic as well as microscopic levels. This is further complicated by the physics of how light interacts with these objects, which sit at unknown orientations and are affected by shadows and occlusions from the objects around them. All of these complexities are difficult to model and recover from the only information that is accessible to computers: an array of pixels. Computer vision can thus be seen as the inverse of computer graphics: graphics renders a known 3D scene into pixels, while vision must infer the scene from the pixels. It is much more challenging to recover a higher-dimensional representation of the world from a two-dimensional array of pixels than the other way around.
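To make the "inverse of computer graphics" point concrete, here is a minimal sketch in Python. It assumes an idealized pinhole camera; the `project` function and the sample points are illustrative and not taken from any particular system. Two different 3D points on the same ray through the camera land on exactly the same pixel coordinates, so depth cannot be recovered from a single image without additional cues.

```python
def project(point, focal_length=1.0):
    """Pinhole projection of a 3D point (x, y, z) in the camera frame
    to 2D image coordinates. This is the 'graphics' direction."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


# Two distinct points on the same ray from the camera center:
# the second is simply twice as far away as the first.
near = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)

print(project(near))  # (0.25, 0.5)
print(project(far))   # (0.25, 0.5)
```

Going forward from 3D to 2D is a single division. Going backward, every point along the ray is an equally valid answer, which is why vision systems lean on extra information such as multiple views, motion, or learned priors.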

What are some of the applications of computer vision?

In spite of these challenges, computer vision has had numerous successes in the past few decades, especially in constrained environments where the lighting and/or the scene are partially known. Computer vision systems work reliably for applications such as optical character recognition for check processing, postal sorting, and vehicle license plate reading.

More recently, computer vision has also achieved impressive results in unconstrained environments. Face detection and biometric recognition commonly found in mobile devices are examples of computer vision at work. Computer vision is also actively used in organizing your photo collection on mobile devices. Social media sites use computer vision to filter videos for inappropriate content and detect copyrighted content, and also to recommend related videos.

Computer vision has great potential in augmented/mixed reality and gaming applications. It can be used to build the virtual world through scene reconstruction and to track the user's pose and gaze, thereby providing the crucial link between the virtual and physical world - welcome to the Metaverse!

Computer vision for checkout-free retail

In the e-commerce realm, computer vision plays an important role in the management and organization of the online catalog of products from their pictures. It can help find duplicate products or similar-looking products, and get the product attributes automatically from their pictures. 

Computer vision also plays an important role in physical retail. It can help detect and track people and their whereabouts, providing detailed analytics, hot spots, and insights into product placement. In addition, cameras can be used to monitor shelf planograms and detect empty shelf locations, as well as price and product placement errors.

To realize the benefits of computer vision in stores, one has to invest in placing cameras throughout the store. At Zippin, we know that the return on investment for these cameras can be further increased by using computer vision to enable a whole new checkout-free retail experience. People detection and tracking can be used to follow customers throughout the store. Techniques for activity and event recognition can help detect a customer’s interactions with products on shelves and recognize the products they are adding to their basket.
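As a rough illustration of what "detection and tracking" means in practice, the sketch below links per-frame person bounding boxes into tracks by greedily matching each box to the previous frame's box with the highest overlap (intersection over union). It is a textbook-style toy and not a description of Zippin's system: the function names, the `min_iou` threshold, and the sample boxes are all assumptions made for this example, and a production tracker would typically also handle occlusions, multiple cameras, and appearance cues.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def update_tracks(tracks, detections, next_id, min_iou=0.3):
    """Greedily match new detections to existing tracks by IoU.

    tracks maps track_id -> last known box. Unmatched detections start
    new tracks; unmatched tracks are dropped (a deliberate simplification).
    Returns the updated tracks and the next unused track id.
    """
    # Score every (track, detection) pair and visit the best overlaps first.
    pairs = sorted(
        ((iou(box, det), tid, i)
         for tid, box in tracks.items()
         for i, det in enumerate(detections)),
        reverse=True,
    )
    new_tracks, used = {}, set()
    for score, tid, i in pairs:
        if score < min_iou:
            break  # all remaining pairs overlap too little
        if tid in new_tracks or i in used:
            continue  # this track or detection is already matched
        new_tracks[tid] = detections[i]
        used.add(i)
    for i, det in enumerate(detections):
        if i not in used:
            new_tracks[next_id] = det
            next_id += 1
    return new_tracks, next_id


# Frame 1: two shoppers are detected, so two new tracks are created.
frame1 = [(0, 0, 10, 20), (50, 0, 60, 20)]
tracks, next_id = update_tracks({}, frame1, next_id=0)

# Frame 2: both shoppers moved slightly and the detector lists them
# in a different order. Each keeps the id it had in frame 1.
frame2 = [(52, 0, 62, 20), (1, 0, 11, 20)]
tracks, next_id = update_tracks(tracks, frame2, next_id)
print(tracks)  # {0: (1, 0, 11, 20), 1: (52, 0, 62, 20)}
```

The key idea is that identity is carried across frames by spatial overlap alone, which works when people move only a little between frames and do not cross paths.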

Accuracy and cost of deployment are important considerations for checkout-free retail stores. At Zippin, we have harnessed computer vision to address both, especially in combination with deep learning and sensor fusion, the two AI technologies discussed in parts one and two of this series.

At Zippin, we strongly believe that the time is ripe for rethinking traditional retail and using deep learning, sensor fusion, and computer vision synergistically to provide the best AI-powered shopping experiences. We have launched 50 AI-powered stores in stadiums, airports, and convenience stores to date, with more launching this spring and summer. We’ve served over half a million shoppers in one of our checkout-free stores, and encourage you to visit one to experience the full power of AI!