Artificial Intelligence: The Next Big Thing

Introduction
Artificial Intelligence (or AI) is the 21st century’s equivalent of electricity. Some have even called it “more profound than electricity or fire”. Much has been written about AI’s potential in various industries and its capacity to transform virtually every aspect of modern life, from smarter appliances to (in the distant future) AI so advanced that it raises philosophical, even existential, questions.

Even at this early stage of AI (often called “weak” or “narrow” AI), this computer science technology is expected to add new insights, new capabilities, and new value to modern business domains, from better fraud detection in financial transactions to safer self-driving cars. One important application of AI is in the production, processing, presentation, and consumption of audio-visual data, an increasingly essential element of our modern digital lives. As a leading vendor and technology provider in the visual arts domain, we decided to explore the various techniques and applications of AI in the image and video processing field.

The Nature of AI
At its core, AI refers to a software system, or software-driven robotic system, that performs tasks normally associated with human intelligence, like reasoning, decision making, and learning. This makes it quite distinct from traditional computer programming, which is essentially a series of sequential steps that a computer executes without regard to the quality of its input. As a result, flawed input produces flawed (sometimes useless) output, giving rise to the term GIGO: garbage in, garbage out!

AI is different. The “learning” aspect of AI algorithms means results can be expected to improve as the system learns to categorize and process data. Combined with the traditional advantages of computer systems over humans (i.e., the ability to process a lot of data very quickly), this gives AI systems three potential advantages over traditional systems:

  1. An AI algorithm can be trained by humans, other AI systems, and even itself without updating the code itself, to solve progressively more complex problems;
  2. AI systems can utilize the vast reams of data produced by sensors and digital devices to produce more accurate results faster, where traditional systems and people would simply be overwhelmed; and
  3. AI systems can take on ambiguous tasks that are simply outside the capability of traditional computer systems.

Domains of AI and Their Use in Image Processing

Figure 2: The six branches of AI each add a unique characteristic in the ongoing quest to build useful AI systems. Courtesy analytic steps

Modern digital image and video processing is based on identifying and manipulating characteristics (e.g., RGB values) of screen pixels that form edges, shapes, and shades in a single frame (for images) or multiple frames (for video). AI’s various methodologies can be used to automate these tasks. To see the impact of these methodologies, let us walk through some use cases, starting with a simple one.
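To make “identifying characteristics of pixels” concrete, here is a minimal Python sketch (not taken from any particular product) that converts RGB pixels to brightness and flags a pixel as part of an edge when a neighboring pixel differs sharply. The tiny test image, the ITU-R BT.601 brightness weights, and the threshold of 50 are illustrative assumptions.

```python
# A made-up 6x6 image: a dark 2x2 square on a white background, stored as rows of (R, G, B) pixels.
WHITE = (255, 255, 255)
DARK = (30, 30, 30)
img = [[DARK if 2 <= x < 4 and 2 <= y < 4 else WHITE for x in range(6)] for y in range(6)]


def brightness(pixel):
    """Approximate perceived brightness of an (R, G, B) pixel using ITU-R BT.601 weights."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def edge_map(image, threshold=50.0):
    """Flag a pixel as an edge if any 4-connected neighbor differs in brightness by more than threshold."""
    height, width = len(image), len(image[0])
    gray = [[brightness(p) for p in row] for row in image]
    edges = []
    for y in range(height):
        row = []
        for x in range(width):
            neighbors = [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                         if 0 <= x + dx < width and 0 <= y + dy < height]
            row.append(any(abs(gray[ny][nx] - gray[y][x]) > threshold for nx, ny in neighbors))
        edges.append(row)
    return edges


# Print the edge map: '#' marks pixels that sit on a sharp brightness change.
for row in edge_map(img):
    print("".join("#" if is_edge else "." for is_edge in row))
```

Running it prints `#` marks on both sides of the dark square’s boundary. Practical edge detectors (Sobel, Canny, and so on) use smoothed gradients, but the principle of comparing neighboring pixel values is the same.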

Consider the first use case: changing the color of a shirt in a digital image. Currently, this can be done either by a person manually using a photo-editing package like Photoshop or via an automation script. However, the automation script is “dumb”: it can change the color of any object in the image but cannot tell the shirt from a chair, so significant human involvement is still necessary. Enter Machine Learning.
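The limitation is easy to see in code. The sketch below assumes a hypothetical image in which the shirt and a chair share a similar red, and recolors pixels purely by color match; the color values and the tolerance of 40 are illustrative assumptions.

```python
# Pixels are (R, G, B) tuples. The colors below are illustrative assumptions.
RED = (200, 30, 30)
CHAIR_RED = (190, 40, 35)  # a chair that happens to be a similar red
BLUE = (30, 30, 200)
WHITE = (255, 255, 255)


def recolor(image, target, new_color, tolerance=40):
    """Replace every pixel whose channels are all within `tolerance` of `target`.

    The script only compares colors; it has no idea which object a pixel belongs to.
    """
    return [[new_color if all(abs(c - t) <= tolerance for c, t in zip(pixel, target)) else pixel
             for pixel in row]
            for row in image]


# One row of a hypothetical image: shirt pixel, background, chair pixel, background.
scene = [[RED, WHITE, CHAIR_RED, WHITE]]
result = recolor(scene, RED, BLUE)
print(result)  # the chair pixel turns blue along with the shirt
```

Because the rule looks only at pixel values, the chair is recolored along with the shirt; deciding which pixels belong to the shirt still requires a person.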

Machine Learning- Let us continue with the above example of identifying a shirt in an image using machine learning (or “ML” for short). In a “standard”, or “supervised”, ML algorithm, many example images are “tagged” (labeled) with the desired information, such as which objects are shirts and where their outlines lie, and the resulting dataset is given to the ML algorithm as “training” data. Once trained, the algorithm can identify shirts and distinguish them from other objects, removing the need for a person to do the same. This is a very useful way to free up valuable human talent from repetitive, low-skill tasks.
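The train-then-predict pattern can be sketched with one of the simplest supervised learners, a nearest-centroid classifier. This is a toy, not how production image models work: those learn their own features directly from pixels, whereas here the two numeric features and all the training examples are invented for illustration.

```python
from math import dist

# Hypothetical, hand-made training data: each example is a feature vector
# (bounding-box height/width ratio, fraction of "fabric-like" pixels) plus a human-given label.
training_data = [
    ((1.2, 0.90), "shirt"),
    ((1.1, 0.85), "shirt"),
    ((1.3, 0.80), "shirt"),
    ((1.8, 0.20), "chair"),
    ((2.0, 0.30), "chair"),
    ((1.9, 0.25), "chair"),
]


def train(examples):
    """'Training' for a nearest-centroid classifier: average the feature vectors of each label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}


def predict(model, features):
    """Return the label whose centroid is closest (Euclidean distance) to the given features."""
    return min(model, key=lambda label: dist(model[label], features))


model = train(training_data)
print(predict(model, (1.25, 0.88)))  # an unseen object → shirt
```

Adding more labeled examples changes only the data passed to `train`, not the code, which is the point made earlier about training without reprogramming.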

Figure 3: Machine Learning algorithms can be trained to identify objects of increasing complexity and variety without reprogramming. They represent a significant leap forward in image processing technology. Courtesy Prasad Pai

This works well for relatively simple images, but more complex images can be too difficult for standard ML systems to handle. This difficulty can be due to many reasons, such as:

  1. Rarity- there are no similar pictures available to train the ML algorithm;
  2. Complexity- there are far too many objects, people, color gradients, and/or backgrounds with variations between them for the ML algorithm to be reliable;
  3. Niches- the image may be meaningful only in certain domains; and
  4. Quality- the image may be blurry or otherwise highly ambiguous.

When faced with such difficulties, methodologies other than standard ML are needed. Consider Fuzzy Logic systems:

Fuzzy Logic Systems- “Fuzzy” logic accounts for ambiguity by assigning it a mathematical value: instead of a statement being strictly true (1) or false (0), it can be true to any degree in between. In the context of image processing, this means assigning values between 0 and 1 to pixels that are neither totally dark, or black (1), nor totally lit up, or white (0). For example, an off-white might be assigned the value 0.1, pale yellow 0.2, dark gray 0.8, and so on.
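A fuzzy membership function for “dark” can be sketched in a few lines. The linear mapping from an 8-bit gray level (0 to 255) to a degree of darkness, and the triangular “mid gray” set, are assumptions chosen for illustration; taking off-white as gray level 230 and dark gray as 50 (also assumed) roughly reproduces the 0.1 and 0.8 values mentioned above.

```python
def dark(gray):
    """Membership in the fuzzy set 'dark': 1.0 for black (0), 0.0 for white (255), linear in between."""
    if not 0 <= gray <= 255:
        raise ValueError("gray level must be between 0 and 255")
    return 1.0 - gray / 255


def light(gray):
    """Fuzzy complement of 'dark'."""
    return 1.0 - dark(gray)


def mid_gray(gray, center=128, width=64):
    """Triangular membership: 1.0 at `center`, falling linearly to 0.0 at center +/- width."""
    return max(0.0, 1.0 - abs(gray - center) / width)


# Assumed gray levels for a few example colors.
for name, level in [("off-white", 230), ("mid gray", 128), ("dark gray", 50)]:
    print(f"{name:9s} dark={dark(level):.2f} light={light(level):.2f} mid_gray={mid_gray(level):.2f}")
```

A single pixel can belong to several fuzzy sets at once: a gray level of 128 is about half “dark”, half “light”, and fully “mid gray”.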

A quasi-continuous logical framework of this kind is incredibly useful because the real world is analog, consisting of continuous processes rather than clear-cut categories. In the world of image processing, fuzzy logic systems can simplify edge detection of objects, among other applications.
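One simple way fuzzy logic can soften edge detection is sketched below: instead of a yes/no decision, each pixel gets a degree of “edge-ness” from 0 to 1 based on its largest brightness difference to a neighbor, and an alpha-cut turns those degrees into a crisp map only at the end. This is a generic illustration, not the specific method of Katoch et al.; the ramp limits of 20 and 80 and the cut-off of 0.5 are assumptions.

```python
def ramp(value, low, high):
    """Fuzzy membership rising linearly from 0 at `low` to 1 at `high`."""
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)


def fuzzy_edges(gray, low=20.0, high=80.0):
    """Degree (0..1) to which each pixel is an edge, from its largest difference to a 4-neighbor."""
    h, w = len(gray), len(gray[0])
    result = []
    for y in range(h):
        row = []
        for x in range(w):
            diffs = [abs(gray[ny][nx] - gray[y][x])
                     for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
                     if 0 <= nx < w and 0 <= ny < h]
            row.append(ramp(max(diffs, default=0.0), low, high))
        result.append(row)
    return result


def alpha_cut(memberships, alpha=0.5):
    """Turn fuzzy degrees into a crisp edge map by keeping pixels with membership >= alpha."""
    return [[m >= alpha for m in row] for row in memberships]


# Three identical rows of gray levels: a gentle slope (10 -> 40 -> 70), then a sharp step to 200.
gray = [[10, 40, 70, 200, 210] for _ in range(3)]
degrees = fuzzy_edges(gray)
print([round(m, 2) for m in degrees[0]])
print(alpha_cut(degrees)[0])
```

The gentle slope receives a low edge degree (about 0.17), while the sharp jump from 70 to 200 receives a full 1.0, so only the real boundary survives the cut.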

Figure 4: Fuzzy Logic systems are useful in edge detection in image processing, among other applications. Courtesy Katoch et al

Sometimes, the ambiguity in an image has nothing to do with an “in-between” value but instead with a gap in the “knowledge” of the system. In other words, the object in the picture may mean nothing to the average person but hold immense meaning for a domain expert, like a doctor looking at an ultrasound or an x-ray. Other examples include geological formations, fossils, and astronomical phenomena. Such ambiguity is dealt with by creating specialized AI systems that solve complex problems in a very narrow domain. These are called “Expert Systems”.

Expert Systems- Expert Systems are more than just AI algorithms. They draw on a human expert, who serves as the source of the original knowledge; that knowledge is stored in a “knowledge base”, and an inference engine applies branching “if-then” logic to answer input queries using the knowledge stored there. This “human in the loop” model gives AI access to uniquely human knowledge and therefore makes it extremely useful as an “intelligent assistant” for complex problems within a specific domain. In other words, a cancer-specific Expert System won’t replace an oncologist but may serve as a useful assistant, answering general queries and screening lung x-rays from all over the world to select those that need scrutiny by human experts. Beyond medical science, Expert Systems have been applied to computer vision, autonomous vehicles, and a number of other fields.
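The knowledge base plus inference engine structure can be sketched as a tiny forward-chaining rule engine, one common way inference engines apply if-then rules. The rules and fact names below are hypothetical placeholders, not medical criteria, and a real system’s knowledge base would be authored with domain experts rather than hard-coded in a list.

```python
# Hypothetical knowledge base (NOT medical guidance): each rule is (required facts, conclusion).
RULES = [
    ({"opacity_detected", "opacity_larger_than_threshold"}, "suspicious_lesion"),
    ({"suspicious_lesion"}, "refer_to_radiologist"),
    ({"image_blurry"}, "refer_to_radiologist"),
]


def infer(facts, rules):
    """Forward chaining: keep applying if-then rules until no new facts can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # IF all conditions are known THEN add the conclusion.
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


findings = {"opacity_detected", "opacity_larger_than_threshold"}
derived = infer(findings, RULES)
print("refer_to_radiologist" in derived)  # flagged for a human expert
```

The engine keeps firing rules until nothing new can be derived, so the first rule’s conclusion triggers the second, flagging the x-ray for review by a human expert.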

Figure 5: Expert Systems AI has a unique “human in the loop” methodology that could unlock AI personalization and universal knowledge repositories of the future. Courtesy javatpoint

Standard ML, Fuzzy Logic, and Expert Systems all share a basic limitation, however: they cannot “think” for themselves. That is to say, they are limited by the methods and rules pre-fed into them, and as such they show very little originality. Originality is left to the domain of Neural Networks.

Neural Networks- Biological neural networks are the complex webs of interconnected neurons that make up brains, especially human brains. In the context of AI, Neural Networks (or NNs) are sets of interconnected mathematical functions arranged in “layers”, with assigned “weights” and rules of interconnectivity between them. The curious thing about NNs is the way they solve problems. By studying large training datasets, NNs have shown the ability not only to solve problems in pre-designated ways, like standard ML algorithms, but also to come up with entirely new ways to reach a result. As a result, NNs can achieve incredible feats like creating new music, composing poetry, and blending the painting styles of several artists to create original artwork. The results tend to be unpredictable, and many are nonsensical, but even the few acceptable ones point to astonishing practical possibilities for the future! In the context of commercial visual art, NNs can be used to create animations, generate 3D characters for various applications, and perform tasks like image composition. In theory, all they require is guidance about the desired end result! Neural Networks not only open up new possibilities in computer-aided graphics creation but also add a new degree of autonomy to the other great domain of activity for AI: Robotics.
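The “layers with weights” idea can be seen in a forward pass through a very small network. In a real NN the weights are learned from training data (typically with backpropagation); in this sketch they are set by hand so that a 2-input, 2-hidden-neuron, 1-output network computes XOR, which keeps the example short and deterministic. It therefore shows the structure of an NN, not its learning.

```python
import math


def sigmoid(z):
    """Smooth activation squashing any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One fully connected layer: each neuron takes a weighted sum of all inputs plus a bias, then a sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
            for neuron_weights, b in zip(weights, biases)]


# Hand-picked weights (not learned): hidden neuron 1 acts like OR, hidden neuron 2 like NAND,
# and the output neuron combines them like AND, which together give XOR.
HIDDEN_W = [[20.0, 20.0], [-20.0, -20.0]]
HIDDEN_B = [-10.0, 30.0]
OUTPUT_W = [[20.0, 20.0]]
OUTPUT_B = [-30.0]


def forward(x1, x2):
    """Pass two inputs through the hidden layer, then the output layer."""
    hidden = dense([x1, x2], HIDDEN_W, HIDDEN_B)
    return dense(hidden, OUTPUT_W, OUTPUT_B)[0]


for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(forward(a, b)))
```

XOR is a classic example because no single-layer network can compute it; the hidden layer is what makes it possible.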

Figure 6: A pastiche made by a CNN by combining the image from one artist with the style of another. Courtesy sub subroutine

Robotics- Robots, from the Czech “robota”, meaning “forced labor”, have long been a scientific and industrial reality. Modern robotics uses AI to develop robots that not only perform tasks but do so in ever more efficient ways. Today’s robots come in a wide range, including military drones, civilian autonomous vehicles, industrial robots, and even experimental miniature nanobots for scientific and medical applications. Many employ computer vision, with image processing based on AI algorithms (including neural networks), to do a variety of tasks.

As intelligent and autonomous as they are, all AI systems ultimately need to interact with human users. It is at this human-machine interface that the last of the AI methodologies, Natural Language Processing (NLP), comes into play.

Figure 7: This AI-powered robot called Jiang Lailai could be the precursor of future personal assistants, one of the several applications of robotics. Courtesy dailymail.co.uk

Natural Language Processing (NLP)- NLP is a branch of AI that seeks to simplify the way humans interact with AI systems by letting people use natural human language, which is converted on the back end into machine-understandable instructions. The process is two-way: machine output and feedback can likewise be turned back into human language. Some NLP systems are used for speech-to-text conversion, others in interactive voice response (IVR) systems, and still others in voice search. As audio and video replace text as the primary mode of human-machine interaction, NLP is set to take center stage as the enabler of better, faster internet search through smart speakers, contextual information overlays in AR and VR systems, and many other applications.
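The two-way conversion can be illustrated with a deliberately simple, keyword-based parser that turns a spoken-style request into a structured command and then describes the command back in plain English. Modern NLP systems rely on statistical and neural language models rather than keyword lists; the vocabulary, the `parse_command` and `describe` helpers, and the command format here are all hypothetical.

```python
import re

# Hypothetical vocabulary for an image-editing assistant.
COLORS = {"red", "green", "blue", "black", "white", "yellow"}
OBJECTS = {"shirt", "chair", "background"}
RECOLOR_VERBS = {"make", "change", "turn", "recolor"}


def parse_command(utterance):
    """Human language -> machine instruction: map a simple request to a dict, or None if not understood."""
    words = re.findall(r"[a-z]+", utterance.lower())
    color = next((w for w in words if w in COLORS), None)
    target = next((w for w in words if w in OBJECTS), None)
    if target and color and any(w in RECOLOR_VERBS for w in words):
        return {"action": "recolor", "object": target, "color": color}
    return None


def describe(command):
    """Machine instruction -> human language: the reverse direction of the conversion."""
    return f"Recoloring the {command['object']} {command['color']}."


cmd = parse_command("Please make the shirt blue!")
print(cmd)
print(describe(cmd))
```

Anything outside the tiny vocabulary returns `None`, which is exactly the brittleness that statistical and neural NLP models are designed to overcome.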

Figure 8: In the Hollywood movie The Avengers, Jarvis is a fictional AI assistant portrayed as using advanced forms of Neural Networks and NLP. Courtesy SyFyWire

Conclusion
When the term AI was coined in the 1950s, AI was envisioned as the future of computer software. Modern AI systems use multiple methodologies in concert to build robust systems that solve complicated problems in virtually every field. AI is slated to become an integral layer of the software stack, forming a base of intelligence in every computer application. In the space of digital image processing, it is already revolutionizing image manipulation, 2D and 3D asset creation, marketing material creation, and much more. It is exciting to think this is only the beginning.

Image processing vendors in the future will need to be more than users of image and video manipulation software. They will increasingly create value for their customers by becoming experts in AI applications for image and video in digital media, enabling smarter use of their assets. Manipal Digital Systems creates unique business value for its clients in eCommerce, Fashion, Packaging, eLearning, and Media through end-to-end software + visual graphics services. Contact us and let us help you unleash the power of AI for your business.
