Pattern and image recognition with neural networks

Алиева, Динара Галымжановна

The most popular application of neural networks is visual image recognition. Networks are now being built that let machines reliably identify symbols on paper and bank cards, signatures on official documents, and objects. These capabilities can significantly simplify human labor and increase the reliability and accuracy of work processes by reducing errors caused by the human factor.

Keywords: artificial neural networks, neurons, architecture, training.

Artificial neural networks are a mathematical model of biological neural networks, the networks of nerve cells in living organisms. As in the biological analogue, the basic element of an artificial network is the neuron; neurons are interconnected and form layers, the number of which varies with the complexity and purpose of the network.

What are ordinary neural networks? A fully connected neural network is often called a regular one. In it, the nodes between the input and output layers form hidden layers: each hidden neuron receives the outputs of the previous layer and passes its own output on to the next, and each neuron of a layer is connected to all the neurons of the previous one. Each connection carries a weight, which is adjusted during training and does not change afterwards. Each neuron also has an activation threshold; in the simplest model, its output takes one of two possible values depending on whether the weighted sum of its inputs reaches that threshold.
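The description above can be illustrated with a minimal sketch in plain Python. The weights and thresholds below are hypothetical values chosen by hand, as if training had already finished; they are not taken from any source. With these values the small network happens to compute the exclusive-or of two binary inputs.

```python
def step(value, threshold):
    """Return 1 if the weighted sum reaches the threshold, otherwise 0."""
    return 1 if value >= threshold else 0


def dense_layer(inputs, weights, thresholds):
    """Fully connected layer: every neuron is connected to every input.

    weights[j][i] is the weight of the connection from input i to neuron j.
    """
    outputs = []
    for neuron_weights, threshold in zip(weights, thresholds):
        weighted_sum = sum(w * x for w, x in zip(neuron_weights, inputs))
        outputs.append(step(weighted_sum, threshold))
    return outputs


# Hypothetical fixed parameters (assumed, not learned here).
hidden_weights = [[1.0, 1.0], [-1.0, -1.0]]
hidden_thresholds = [0.5, -1.5]
output_weights = [[1.0, 1.0]]
output_thresholds = [1.5]


def predict(inputs):
    """Pass the inputs through one hidden layer and one output neuron."""
    hidden = dense_layer(inputs, hidden_weights, hidden_thresholds)
    return dense_layer(hidden, output_weights, output_thresholds)[0]


for pair in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(pair, predict(pair))
```

A single threshold neuron cannot separate these four cases, which is one reason a hidden layer is needed even in such a small example.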

A convolutional neural network (CNN) has a special architecture designed for efficient image recognition. The idea of a CNN is based on alternating convolution and subsampling (pooling) layers, and the structure is unidirectional (feed-forward). A CNN gets its name from the convolution operation, in which each image fragment is multiplied element by element by a convolution kernel, and the products are summed and written to the corresponding position in the output image [1, p. 180]. This architecture makes recognition largely invariant to object displacement: each successive layer enlarges the «window» through which the convolution «looks», revealing larger and larger structures and patterns in the image.
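The two operations named above, convolution and pooling, can be sketched in plain Python as follows. This is an illustrative sketch under simple assumptions (no padding, stride 1, non-overlapping max pooling); the image and the kernel are made-up examples. As is usual in CNN practice, the kernel is applied without flipping.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    Each image fragment is multiplied by the kernel element by element,
    and the products are summed into one value of the output image.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling (subsampling) with a size x size window."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


# A 5x5 image with a vertical edge between the second and third columns.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical 2x2 kernel that responds to a dark-to-bright vertical edge.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = convolve2d(image, kernel)   # 4x4 map, non-zero only at the edge
pooled = max_pool(feature_map)            # 2x2 map after subsampling
print(feature_map)
print(pooled)
```

The pooled map keeps the response to the edge while halving the resolution, which is what lets deeper layers cover a larger part of the original image.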

Working with images is an important application area for deep learning. Taken together, the images from all the cameras in the world form a library of unstructured data. Neural networks, machine learning and artificial intelligence make it possible to structure this data and use it for various tasks.

Analysis is the basis of every video surveillance architecture, and its first phase is image recognition. The system then uses machine learning to recognize actions and classify them. To recognize an image, the neural network must first be trained on data. This resembles how the human brain works: we have certain knowledge, see an object, analyze it and identify it.

Neural networks are demanding with respect to the size and quality of the dataset on which they are trained. The dataset can be downloaded from open sources or compiled independently.

In practice, up to a certain limit, the more hidden layers a neural network has, the more accurately it recognizes an image. How is this implemented?

The picture is divided into small areas, down to a few pixels in size, each of which feeds an input neuron. Synapses (weighted connections) transmit signals from one layer to the next. During this process, hundreds of thousands of neurons with millions of parameters compare the incoming signals with what the network has learned from previously processed data.
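The first step described above, dividing the picture into small areas that feed the input neurons, can be sketched as follows. The function name, the 4x4 image and the patch size are hypothetical choices for illustration only.

```python
def split_into_patches(image, patch_size):
    """Cut a 2D image into non-overlapping square patches, each flattened."""
    patches = []
    for r in range(0, len(image) - patch_size + 1, patch_size):
        for c in range(0, len(image[0]) - patch_size + 1, patch_size):
            patch = [image[r + i][c + j]
                     for i in range(patch_size) for j in range(patch_size)]
            patches.append(patch)
    return patches


# Made-up 4x4 grayscale image with pixel values from 1 to 16.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

patches = split_into_patches(image, 2)
# Scale the pixels to the range (0, 1] and line them up as input-layer values.
input_vector = [pixel / 16 for patch in patches for pixel in patch]
print(patches)
print(len(input_vector))
```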

Simply put, if we ask a machine to recognize a photo of a dog, the photo is broken into small pieces, and these pieces are compared with the feature values that the network has learned from millions of existing images of dogs.

Beyond some point, adding layers leads to the network simply memorizing the training sample rather than learning. Further gains come from using different architectures.

Image recognition is perhaps the most popular application of neural networks. Regardless of the specifics of the task being solved, such a network works in stages, the most important of which are discussed below.

A variety of objects can serve as patterns to be recognized, including pictures, handwritten or printed text, sounds and much more. During training, the network is presented with samples, each with a label indicating the class it belongs to. Each sample is represented as a vector of feature values, and the set of features should make it possible to determine unambiguously which class the network is dealing with [2, p. 934].

During training, the network must learn enough features, with suitable values, to achieve good accuracy on new images, but it must not overfit, that is, «adjust» too closely to the training sample. After proper training, the neural network should be able to identify images that it did not encounter during training.
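One common way to guard against this over-adjustment is early stopping, which is not named in the text and is used here only as an illustration: the error on a held-out validation set is tracked after every epoch, and training stops once it no longer improves. The loss values and the `patience` parameter below are assumed for the example.

```python
def best_stopping_epoch(validation_losses, patience=2):
    """Return the index of the epoch whose weights should be kept.

    Training is considered finished once the validation loss has failed
    to improve for `patience` consecutive epochs.
    """
    best_epoch = 0
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # the network has started to fit the training sample only
    return best_epoch


# Made-up validation losses: they fall at first, then rise as overfitting sets in.
validation_losses = [0.90, 0.60, 0.45, 0.40, 0.43, 0.47, 0.55]
print(best_stopping_epoch(validation_losses))
```

The rising tail of the curve is the typical sign that the network is memorizing the training images rather than learning features that carry over to new ones.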

The source data for the neural network must be unambiguous and consistent; otherwise the network may assign a high probability of one object belonging to several classes.
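One simple consistency check, sketched here under the assumption that samples are stored as lists of feature values, is to look for identical samples that carry different labels. The function name and the toy data are hypothetical.

```python
from collections import defaultdict


def find_conflicting_samples(samples, labels):
    """Return samples that appear in the dataset with more than one label."""
    labels_by_sample = defaultdict(set)
    for features, label in zip(samples, labels):
        labels_by_sample[tuple(features)].add(label)
    return {features: sorted(found)
            for features, found in labels_by_sample.items()
            if len(found) > 1}


# Made-up dataset: the first and third samples are identical but labeled differently.
samples = [[0, 1], [1, 1], [0, 1]]
labels = ["cat", "dog", "dog"]

conflicts = find_conflicting_samples(samples, labels)
print(conflicts)
```

Samples reported by such a check would need to be relabeled or removed before training.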

Creating a neural network for image recognition includes:

— collection and preparation of data;

— choice of topology;

— selection of characteristics;

— selection of training parameters;

— training;

— checking the quality of training;

— adjustment;

— verbalization.

There are several architectures of artificial neural networks, some of which are used for image recognition:

Multilayer perceptron
Convolutional neural network
Recursive neural network
Recurrent neural network
Long short-term memory (LSTM) network, a type of recurrent neural network
Sequence-to-sequence model
Shallow (non-deep) neural network

In supervised learning for image recognition, the training sample comes with true answers to the question of what is shown in each picture: the class labels. The neural network takes these images as input and computes an error by comparing its output values with the true class labels. Depending on the size and nature of the discrepancy, the weights of the network are adjusted, bringing its responses closer to the true answers until the error is minimal.
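The loop of predicting, comparing with the true label and adjusting the weights can be sketched with the classic perceptron learning rule for a single neuron. This is a deliberately simple stand-in rather than the training method of any particular network; the toy dataset (logical AND) and the learning rate are assumptions made for the example.

```python
def predict(weights, bias, features):
    """Threshold neuron: output 1 if the weighted sum is non-negative."""
    total = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if total >= 0 else 0


def train(samples, labels, learning_rate=1.0, max_epochs=100):
    """Adjust the weights after every wrong answer until none remain."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for features, label in zip(samples, labels):
            # Error is the difference between the true label and the output.
            error = label - predict(weights, bias, features)
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
        if mistakes == 0:
            break  # every training sample is classified correctly
    return weights, bias


# Toy labeled sample: the class label is 1 only when both features are 1.
samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [0, 0, 0, 1]

weights, bias = train(samples, labels)
print([predict(weights, bias, s) for s in samples])
```

Networks with hidden layers use gradient-based methods instead of this rule, but the overall cycle of computing an error against the labels and correcting the weights is the same.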

In unsupervised learning, the training set has no class labels, and the neural network is tasked with finding answers that are not known in advance. The network tries to find patterns in the data on its own by extracting useful features and analyzing them. Clustering, for example, is the most common unsupervised learning task: the algorithm finds common features in the data and groups similar items together.
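Clustering can be illustrated with a small k-means sketch on one-dimensional data. K-means is a classical clustering algorithm rather than a neural network, and it is used here only to show how unlabeled items are grouped by similarity; the brightness values and starting centers are made up.

```python
def k_means_1d(values, centers, iterations=10):
    """Group numbers around the nearest center, then move each center
    to the mean of its group; repeat a fixed number of times."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)),
                          key=lambda k: abs(value - centers[k]))
            clusters[nearest].append(value)
        # An empty cluster keeps its previous center.
        centers = [sum(group) / len(group) if group else centers[k]
                   for k, group in enumerate(clusters)]
    return centers, clusters


# Made-up average brightness of six unlabeled images: three dark, three bright.
brightness = [0.10, 0.15, 0.20, 0.80, 0.85, 0.90]

centers, clusters = k_means_1d(brightness, centers=[0.0, 1.0])
print(clusters)
```

No labels are supplied at any point; the two groups emerge only from the distances between the values.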

In unsupervised learning, it is difficult to measure the accuracy of the algorithm because the data contain no «correct answers» or labels. However, labeled data can be difficult or prohibitively expensive to obtain. In such cases, giving the model free rein to find dependencies can still produce useful results.

In semi-supervised (partially supervised) learning, the training set contains both labeled and unlabeled data. This method is especially useful when labeling every object is too time-consuming. The neural network can extract information from the small labeled fraction and improve prediction accuracy compared with a model trained solely on unlabeled data.

Reinforcement learning operates on the principle of feedback: the network receives rewards for certain actions. Neural networks are used today in many fields: healthcare, aviation, the Internet, manufacturing, political science, robotics, security and data processing [3, p. 340].

The current level of technology and the breadth of existing applications show that neural networks have strong prospects for further development in areas including transport, robotics, agriculture, medicine, the Internet of Things, entertainment and security.

Neural networks can be applied not only to image and text recognition but also in many other areas. Because they are capable of learning, they can be optimized to extend their functionality.

The study of neural networks is currently one of the most promising research areas: in the future they are likely to be used almost everywhere in science and technology, because they can significantly ease work and, in some cases, protect people.

References:

1. Редько В. Г. Эволюция, нейронные сети, интеллект: Модели и концепции эволюционной кибернетики / В. Г. Редько. М.: Ленанд, 2019. С. 180–183.
2. Хайкин С. Нейронные сети: полный курс / С. Хайкин. М.: Диалектика, 2019. С. 934–938.
3. Галушкин А. И. Нейронные сети: основы теории / А. И. Галушкин. М.: РиС, 2015. С. 340–342.
