There’s a lot of hype around AI at the moment, and I am equally bored of listening to tech-bro word salads and sad that many of my friends are frightened. I was starting to freak out a bit too, so I thought I’d do a short course in AI basics – the opposite of fear, in my opinion, is understanding. Having finished that course, I thought I’d consolidate my learning and share a brief summary.
Okay, let’s be clear on what AI is. Artificial intelligence is not a countable noun; it is a collection of concepts, problems, and methods for solving them. There is no single definition of AI, and part of it is myth. Most of what is being hyped up is machine learning: a subset of artificial intelligence describing computers that improve their performance at tasks as they gain experience and data. People are calling machine learning ‘AI’ (correctly, though AI includes many other things – some of which don’t exist yet), so for the sake of social norms I will too. The key things that define it are:
· Autonomy i.e. the ability to perform tasks without constant guidance
· Adaptivity i.e. the ability to improve performance by learning from experience
Nowhere here does it say AI needs to understand the meaning of the task it is doing; it’s a bunch of mechanical processes.
AI is arguably best described as machines imitating human intelligence. Of course, there’s a big ol’ philosophical discussion here about what human intelligence actually is. That’s been on the agenda for a long time, and I’m not going to touch it in this piece of writing; I’m just going to focus on what AI is. So, what are applications of AI actually doing? Let’s start at the top.
AI developers started out teaching a computer to solve simple problems, like how to win a two-player game where there is a clear winner and loser. So, let’s take a game of noughts and crosses:
Every time a player takes a turn, they’re making a ‘transition’. How everything is positioned on the board when they start, and once they’ve made the transition, is a ‘state’ of play. The ‘state space’ is all the possible situations that transitions can lead to from the current state of play. So, you can arrange all the possible transitions, and all the possible states they lead to, in a game-tree. A game-tree includes every possible pathway of every possible transition through to winning or losing a game. It essentially lays out all the potential states of play, move by move, and crucially attaches a value to each state and transition. The value of each move is based on whether that move is part of a path that leads to winning (+1), losing (-1), or drawing (0).
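Those +1/-1/0 values can actually be computed for noughts and crosses, because the game-tree is small enough to walk in full. Here’s a minimal sketch (the function names are my own, and I’m assuming X is the player trying to maximise the value):

```python
# A toy game-tree search for noughts and crosses: the value of a state is +1 if
# X can force a win, -1 if O can, and 0 if perfect play from here is a draw.
# The board is a 9-character string, read left-to-right, top-to-bottom.

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return 'X' or 'O' if someone has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player):
    """Walk the game-tree below this state and return its value."""
    w = winner(board)
    if w == 'X':
        return 1
    if w == 'O':
        return -1
    if ' ' not in board:
        return 0  # board full, nobody won: a draw
    values = []
    for i, cell in enumerate(board):
        if cell == ' ':
            child = board[:i] + player + board[i + 1:]  # one 'transition'
            values.append(minimax(child, 'O' if player == 'X' else 'X'))
    # X picks the transition with the highest value, O the lowest
    return max(values) if player == 'X' else min(values)

# From an empty board, perfect play by both sides leads to a draw (value 0)
print(minimax(' ' * 9, 'X'))
```

Note that the machine never ‘understands’ noughts and crosses here – it just exhaustively checks which paths end where.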
The difficulty is, once you get into more complex games, like chess, the game-trees become insanely large.
So, rather than seeing the whole game-tree, the computer has to zoom in and estimate the value that any given move is likely to lead to. For example, it estimates value by giving different weightings to strategic positions. All of this relies on odds and probabilities, and it results in an estimated value, i.e. the likelihood of a specified outcome. Outside of the maths, all you need to know is that once we get out into the real world with all its complexity, the likelihood assigned to any situation, e.g. the chance of rain, is constantly updated with new data to produce an up-to-date estimate of the probability of something occurring, e.g. a 90% chance of rain in Birmingham today.
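That updating-with-new-data step is usually done with Bayes’ rule. Here’s a toy sketch – the numbers (the starting rain chance, and how well each observation predicts rain) are invented for illustration, not real weather statistics:

```python
# Updating a probability estimate as new evidence arrives, using Bayes' rule.
# All the numbers below are made up for the sake of the example.

def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """P(rain | evidence), given P(rain) and how likely the evidence is either way."""
    numerator = prior * likelihood_if_true
    evidence = numerator + (1 - prior) * likelihood_if_false
    return numerator / evidence

p_rain = 0.30                                 # starting estimate: 30% chance of rain
p_rain = bayes_update(p_rain, 0.90, 0.20)     # dark clouds observed
p_rain = bayes_update(p_rain, 0.80, 0.10)     # pressure drops
print(f"{p_rain:.0%} chance of rain")         # each observation pushes the estimate up
```

Each piece of evidence nudges the estimate; nothing about the process requires the machine to know what rain is.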
Now, let’s talk about applications of these – very basic and mechanical – processes for estimating the probability of something occurring. You can train a machine to classify objects, such as text, into two or more classes. It determines the probability of its classifications being accurate based on the initial training it receives – i.e. here is a set of data which is correctly labelled, e.g. spam emails versus emails we want to read – and it continues to refine its accuracy as humans check and correct its mistakes.
Obviously, the better and bigger the training data, and the more corrections it gets on its attempts, the more accurate the output will be, i.e. the labels and the probability of the label being correct.
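To make the spam example concrete, here’s a toy classifier in the naive Bayes style: count how often each word appears in labelled ‘spam’ vs ‘ham’ training emails, then label a new email by which class its words make more likely. The training emails are invented for illustration:

```python
# A toy supervised text classifier (a naive Bayes sketch with add-one smoothing).
import math
from collections import Counter

training = [
    ("win free prize money now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]

# count word frequencies per class from the labelled training data
counts = {"spam": Counter(), "ham": Counter()}
for text, label in training:
    counts[label].update(text.split())

def classify(text):
    """Pick the label whose training words best explain this email."""
    vocab = len(set(counts["spam"]) | set(counts["ham"]))
    scores = {}
    for label, c in counts.items():
        total = sum(c.values())
        # log-probabilities, with add-one smoothing so unseen words don't zero out;
        # class priors are equal here (2 spam, 2 ham) so they cancel out
        scores[label] = sum(
            math.log((c[w] + 1) / (total + vocab)) for w in text.split()
        )
    return max(scores, key=scores.get)

print(classify("free prize money"))     # → 'spam'
print(classify("monday team meeting"))  # → 'ham'
```

With four training emails the labels are shaky; feed it thousands of corrected examples and, as the paragraph above says, the probabilities sharpen up.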
This is called supervised learning: the computer is trained on inputs paired with correct outputs, and is then tasked with predicting the correct output – the label – for new inputs. The simplest tasks are just yes/no, but you can get more complex. There’s also unsupervised learning, which contrary to the name is not about humans not being involved; it just means providing the machine with unstructured/unlabelled data and setting it the task of grouping that data in some way – there’s no fixed output. There’s some other stuff, like reinforcement learning, where learning happens in a complex environment with slightly delayed feedback, e.g. driverless cars, and applied projects can be a mix of these, but that’s the gist.
So as supervised learning gets more complex, you might teach it to label images and predict which label is likely the right one for other images. You might teach it to find associations between data sets to predict what’s going to happen in the future. You might teach it to find similar users based on lots of past purchases and predict (or suggest) what they’ll buy next. The more data and training, the better. With unsupervised learning, there isn’t a known correct answer, so it’s harder to check for accuracy, and once the machine has done some clustering, it will need plenty of human input to label and interpret the groups.
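The clustering idea can be shown with a toy k-means sketch: the machine groups unlabelled points without ever being told what the groups mean. The data here is invented, and a human would still have to look at each cluster and decide what to call it:

```python
# A toy unsupervised clustering sketch (k-means): group points by nearness,
# with no labels anywhere in sight.

def kmeans(points, k, steps=20):
    # deterministic start for this toy: use the first and last points as centres
    centres = [points[0], points[-1]][:k]
    clusters = []
    for _ in range(steps):
        # assign each point to its nearest centre...
        clusters = [[] for _ in range(k)]
        for x, y in points:
            i = min(range(k),
                    key=lambda j: (x - centres[j][0]) ** 2 + (y - centres[j][1]) ** 2)
            clusters[i].append((x, y))
        # ...then move each centre to the middle of its cluster
        for i, cl in enumerate(clusters):
            if cl:
                centres[i] = (sum(p[0] for p in cl) / len(cl),
                              sum(p[1] for p in cl) / len(cl))
    return clusters

# two obvious blobs of points: the algorithm finds them, but can't name them
data = [(1, 1), (1.5, 2), (2, 1.2), (8, 8), (8.5, 9), (9, 8.2)]
for cluster in kmeans(data, 2):
    print(cluster)
```

The output is two groups of three points each – but whether those groups are ‘cats and dogs’ or ‘spam and ham’ is entirely down to the human interpreting them.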
Whatever you’re doing, you can see how the machine does compared to reality, whether that’s clustering datasets, predicting the right labels for data, or predicting future events.
So, let’s talk neural networks, or ‘deep learning’. Effectively there are several layers of simple processing units that can both store and process information, all connected to each other. In a traditional computer, information is processed in a central processor, which can only focus on one thing at a time and retrieves information from a separate memory; memory and processing are kept apart. In a neural network, the units (and the connections between them) both store and process the information, and capacity for this is maximised when the network runs on hardware that can process many pieces of information at the same time, like a graphics processor.
An artificial neural network model uses the same rules of prediction: each unit adds up its inputs, each multiplied by a value (a weight), and decides on an action – pass the information on as is, send a pulse to other neurons, or don’t pass it on at all. The learning happens as all the values adjust to produce the correct outputs. So, in theory, you can feed the network training data one example at a time, and each misclassification, and correction, leads to an update of the values. These layers of neurons feed into each other, holding onto information and sharing it with other neurons, to produce layers of outputs and ultimately the output of the whole network.
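A single artificial neuron learning by correction looks like this. It’s the classic perceptron example – learning the logical AND of two inputs – and the variable names are my own:

```python
# One artificial neuron: sum the weighted inputs, fire or don't, and nudge the
# values after every mistake. Here it learns the logical AND of two inputs.

def fire(weights, bias, inputs):
    """Add up each input times its value; fire (1) if the total clears the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total + bias > 0 else 0

# training data: inputs paired with the correct output (the 'label')
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias, rate = [0.0, 0.0], 0.0, 0.1
for _ in range(20):                       # pass over the training data repeatedly
    for inputs, target in examples:
        error = target - fire(weights, bias, inputs)
        # each misclassification nudges the values toward the right answer
        weights = [w + rate * error * x for w, x in zip(weights, inputs)]
        bias += rate * error

print([fire(weights, bias, i) for i, _ in examples])  # → [0, 0, 0, 1]
```

One neuron can only learn very simple patterns; layering many of them, as described above, is what makes the network as a whole powerful.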
This works in principle, but the training data set would need to be so big it’s still not realistic – especially when you think of this being done over and over for lots of applications.
So, a convolutional network detects simple features which become the basis for more abstract ones: you might feed it some pixels, the bottom layers process the raw input for things like edges and right angles, and the higher layers figure out whether it’s a chair. The bottom layers are trained on a big data set and can be reused over and over with different top layers, while the top layers can be trained with a smaller data set and supervised learning, efficiently learning from corrections.
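The feature-detection idea can be shown with one hand-made filter slid across a tiny image. In a real network the filters are learned from data rather than written by hand; this one just responds to vertical edges:

```python
# A toy convolution: slide a small filter over an image and record how strongly
# each patch matches it. This hand-made filter lights up on vertical edges.

image = [          # a tiny 5x5 'image': dark left half, bright right half
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

kernel = [         # vertical-edge detector: dark-on-the-left, bright-on-the-right
    [-1, 1],
    [-1, 1],
]

def convolve(image, kernel):
    """Slide the kernel over the image, summing element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            row.append(sum(kernel[i][j] * image[r + i][c + j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

print(convolve(image, kernel))  # strongest response exactly where dark meets bright
```

Higher layers would then take maps of features like this one and combine them into ‘corner’, ‘leg’, and eventually ‘chair’.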
In principle it can cope with something as big as the whole chess game-tree if you split the problem up and give different bits of it to different layers. It learns – updating values and prediction certainty – by passing information back down the network once the final layer is given the correct outcome.
Last bit – generative networks, like the large language models and image generators we’ve heard so much about lately. Here’s the craic. You can train one network to output images that are similar to the input images: “Here are some cat pictures. Make me a cat picture.” You train another network to separate the images the first network produced itself from the ones it was given to train on. They then compete, creating a learning environment – a feedback loop to self-train.
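Here’s a deliberately tiny cartoon of that adversarial loop, shrunk to one dimension. The ‘real data’ are numbers near 10, the generator produces numbers from a guess it can adjust, and the discriminator calls something ‘real’ if it’s close to the average real value it has seen. Real generative adversarial networks train neural networks on both sides; here the discriminator is fixed, purely to keep the cartoon short:

```python
# A one-dimensional cartoon of the generator-vs-discriminator feedback loop.
# All the numbers are invented; this only illustrates the shape of the idea.
import random

random.seed(1)
real_data = [random.gauss(10, 0.5) for _ in range(200)]

gen_mean = 0.0                                   # generator starts producing numbers near 0
disc_centre = sum(real_data) / len(real_data)    # discriminator's idea of 'real'

def discriminator(x):
    """Say 'real' if x is close to what real samples look like."""
    return abs(x - disc_centre) < 1.0

for step in range(2000):
    fake = random.gauss(gen_mean, 0.5)           # generator makes a fake
    if not discriminator(fake):
        # the fake was caught: nudge the generator toward the real data
        gen_mean += 0.01 if fake < disc_centre else -0.01

print(round(gen_mean, 1))  # the generator's output has drifted toward the real data
```

Every time the discriminator catches a fake, the generator improves a little – that’s the competition doing the training, with no human labelling each round.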
So, there you have it. That’s what an AI application is in 2023: a series of sophisticated mathematical and computational models with autonomous learning capabilities. As it stands, we ascribe meaning and purpose, and the machines are not able to decide what tasks they should be doing. Robotics is a long way off being affordable and scalable, and we’re a long way off from creating AI with generalised capabilities.
So, we’re a long way off from the Terminator – drink your iced latte in peace and think about the world of current AI applications.
I learned all of the above from one of many, many, free courses on AI available – thanks to elementsofAI.com for the free knowledge, and to Capitan Biohack for correcting bits of my thinking.