What is a Computer Vision Algorithm?

Autonomous driving, virtual reality and augmented reality are just a few of the areas of application for computer vision.

Computer vision - definition

Computer vision refers to systems that recognize objects in digital still images and video and process them accordingly. The field has developed significantly over the past twenty years: today's computer vision systems can reach an accuracy of around 99 percent on some recognition tasks and now also run on mobile devices.

To emulate the image processing performed by the visual cortex, computer vision researchers rely primarily on artificial neural networks. The breakthrough came in 1998 with Yann LeCun's LeNet-5, a seven-layer convolutional neural network that recognizes handwritten digits in digitized images with a resolution of 32x32 pixels. This approach has since been systematically extended: today's image classification systems recognize entire catalogs of objects in HD resolution and in color.
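To make the seven layers of LeNet-5 more concrete, the following minimal sketch traces how a 32x32 grayscale image shrinks as it passes through the network. It assumes the commonly cited layer sizes (5x5 convolutions without padding, 2x2 subsampling, then fully connected layers with 84 and 10 units). The helper names `conv`, `pool`, `dense` and `lenet5_shapes` are illustrative and not taken from any framework, and the code only computes shapes, it does not train anything.

```python
def conv(shape, out_channels, kernel):
    """Shape after a convolution with stride 1 and no padding."""
    _, height, width = shape
    return (out_channels, height - kernel + 1, width - kernel + 1)


def pool(shape, window):
    """Shape after non-overlapping subsampling with a square window."""
    channels, height, width = shape
    return (channels, height // window, width // window)


def dense(units):
    """Shape after a fully connected layer, written as (units, 1, 1)."""
    return (units, 1, 1)


def lenet5_shapes():
    """Trace a 32x32 grayscale image through the seven LeNet-5 layers."""
    shape = (1, 32, 32)  # (channels, height, width)
    trace = [("input", shape)]
    layers = [
        ("C1", lambda s: conv(s, 6, 5)),    # convolution, 6 feature maps
        ("S2", lambda s: pool(s, 2)),       # subsampling
        ("C3", lambda s: conv(s, 16, 5)),   # convolution, 16 feature maps
        ("S4", lambda s: pool(s, 2)),       # subsampling
        ("C5", lambda s: conv(s, 120, 5)),  # convolution, 120 feature maps
        ("F6", lambda s: dense(84)),        # fully connected
        ("output", lambda s: dense(10)),    # one unit per digit class
    ]
    for name, layer in layers:
        shape = layer(shape)
        trace.append((name, shape))
    return trace


if __name__ == "__main__":
    for name, shape in lenet5_shapes():
        print(f"{name:>6}: {shape}")
```

The trace ends with ten outputs, one per digit class, which is why the same basic design can be scaled up to larger images and more categories by adding layers and feature maps.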

In addition to neural networks, computer vision experts also rely on hybrid vision models that combine deep learning with classic machine learning algorithms.

Computer vision training data

Various public image databases are available to train computer vision models:

  • MNIST is one of the simplest and oldest data sets and contains around 70,000 handwritten digits in ten classes. A model can be trained on MNIST without difficulty, even on a laptop without hardware acceleration.

  • COCO is a large data set for tasks such as object recognition and image segmentation, with more than 330,000 images in 80 object categories.

  • ImageNet contains around 1.5 million images including labels and bounding boxes.

  • Open Images contains URLs for around nine million images, also with labels.

Google, Azure and AWS also each offer their own computer vision models that have been trained on large data sets. These can either be used as they are or adapted to your own image data sets using transfer learning, which saves a lot of time compared with building a model from scratch.
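The idea behind transfer learning can be sketched in a few lines: the pre-trained part of the model stays frozen, and only a small new classifier is trained on your own data. The example below is a toy illustration in plain Python, not any provider's API. The function `pretrained_features` is a hypothetical stand-in for a frozen, pre-trained network, and the "images" are just lists of 16 pixel intensities invented for this sketch.

```python
import math


def pretrained_features(image):
    """Hypothetical stand-in for a frozen, pre-trained backbone.

    Maps a flat list of pixel intensities to two features:
    mean brightness and mean difference between neighboring pixels.
    """
    mean = sum(image) / len(image)
    contrast = sum(abs(a - b) for a, b in zip(image, image[1:])) / (len(image) - 1)
    return [mean, contrast]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(features, labels, epochs=300, lr=0.5):
    """Train only the new classifier head (logistic regression) on frozen features."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = pred - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(image, weights, bias):
    """Classify an image as 0 (dark) or 1 (bright) using backbone plus head."""
    x = pretrained_features(image)
    score = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return 1 if score >= 0.5 else 0


def make_toy_dataset():
    """Build 20 invented 16-pixel images: label 0 is dark, label 1 is bright."""
    images, labels = [], []
    for i in range(10):
        pattern = [0.02 * ((i + j) % 10) for j in range(16)]
        images.append([0.1 + p for p in pattern])
        labels.append(0)
        images.append([0.7 + p for p in pattern])
        labels.append(1)
    return images, labels


if __name__ == "__main__":
    images, labels = make_toy_dataset()
    # The backbone is only evaluated, never updated.
    features = [pretrained_features(img) for img in images]
    weights, bias = train_head(features, labels)
    print(predict([0.2] * 16, weights, bias), predict([0.8] * 16, weights, bias))
```

In practice the frozen part would be a network trained on a large data set such as ImageNet, and the head would be trained with a deep learning framework. The division of labor is the same, though, which is why only a small amount of custom data and compute is needed.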

Computer vision - use cases

Computer vision is not perfect, but the systems are accurate enough to be used in various industries.

Automotive

Waymo - formerly Google's flagship autonomous driving project - has trained its vehicle software with data from seven million kilometers of driving. At least one accident involving a Waymo van is known to date, but the software is not believed to have been the cause.

Tesla's vehicles also offer autonomous driving functions that rely on computer vision. After a fatal accident, the vehicle software was changed so that the driver's hands must always be on the steering wheel.

Retail

Amazon relies on self-service and computer vision in its Amazon Go stores: the system detects when a customer takes products off the shelf or puts them back, and the purchases are identified and billed via a smartphone app. If the Amazon Go software misses a product, the customer gets it for free, and customers receive a credit for products that were billed incorrectly.

Healthcare

Computer vision is also used regularly in healthcare, for example to analyze x-rays and other medical images.

Financial sector

In banking, for example, computer vision is used to detect fraud or to authenticate documents.

Agriculture

Computer vision also plays a role in Agriculture 4.0, for example in monitoring arable land.

Controversial uses

Computer vision is also used for controversial purposes. Facial recognition in particular is very popular with autocracies, though not only with them. Deepfakes and bias in training data are frequently cited problem areas.

Computer vision - frameworks and models

Most deep learning frameworks offer comprehensive support for computer vision, for example the Python-based frameworks TensorFlow, PyTorch or MXNet. In addition, a number of vendors offer ready-made computer vision services and tools:

  • The image and video analysis service Amazon Rekognition can recognize objects, people, text and activities, including faces and custom labels.

  • The pre-trained Google Cloud Vision API detects objects and faces, reads printed and handwritten text and adds metadata to image catalogs. Custom image models can also be trained with Google AutoML Vision.

  • Microsoft's Computer Vision API can also recognize objects. The Azure Face API is available in the cloud or as a container at the network edge and can recognize emotions as well as faces.

  • IBM Watson Visual Recognition classifies images based on a pre-trained model and also supports transfer learning, object detection and counting. The IBM solution runs in the cloud or on iOS devices via Core ML.

  • With Matlab, MathWorks also offers an analysis package that supports image recognition based on machine learning and deep learning.

  • The Apple Vision framework recognizes faces, text and barcodes. Custom Core ML models can also be used for image classification or object recognition.

Computer vision models have developed rapidly since LeNet-5, and most of them are artificial neural networks.

Computer vision is becoming ever more precise and reliable and can already compete with the human visual cortex in many cases. Thanks to the continued development of frameworks and models and the availability of transfer learning, you no longer need a doctorate to apply computer vision. (fm)

This article is based on an article from our US sister publication InfoWorld.