Unearthing the Layers of Machine Learning
Data is the pathway upon which the future is being built. It is a language spoken everywhere, and the ability to translate and transform it is becoming an increasingly pivotal skill. The power of data has grown immensely, and algorithms are now a company's strongest asset. To take advantage of this data, we must learn how to harness and translate it.
Data can be used in wondrous ways: what if I told you that you can use data to make an app that identifies hand-written digits? Or a program that recommends Medium stories? Or even build a search engine that revolutionizes the way we use the internet? Machine learning is the driver behind these technologies, enabling us to unlock the true potential of data.
So what exactly is machine learning? It is the science of getting a computer to learn without being explicitly programmed. As Tom Mitchell, a professor at Carnegie Mellon University, astutely states:
A program is said to learn from experience E with respect to task T and performance measure P, if its performance on T, as measured by P, improves with experience E.
If a program’s success (P) on a specific task (T) improves with experience (E), it is said to be learning. In a nutshell, machine learning uses algorithms to let a computer learn and develop a solution on its own, rather than being explicitly programmed. These algorithms adapt dynamically to the data they are given; nothing is set in stone. In traditional programming, the program adheres to the constructs set by the programmer. Now, in a sense, the program becomes the programmer. Machine learning models learn, predict, and improve. We use machine learning for database mining, for self-customizing applications, and for building complex applications that would be impossible for humans to program explicitly.
Let’s take a look at a practical application of machine learning: creating a program that can recognize whether a given image of an animal is a dog. This is not feasible to code by hand: there are 339 breeds of dog, and not every dog within a breed looks identical. We need to make a program that can learn and identify characteristics unique to dogs.
Take a look at the image below:
This animal is quite clearly a dog, and it likely took you only a few seconds to identify it. But how did you know? You have likely never seen this specific image, or even this breed, yet you know with certainty that it is a dog. We have seen so many images of dogs throughout our lives that we can now identify new dogs based on past experience. Our goal is to get the machine to do the same thing: learn from experience in order to make accurate predictions for new cases.
In order to understand how exactly machines do this, I recommend you watch this excellent video which details how machines learn:
Machine learning is a subdivision of artificial intelligence. AI is the idea of letting machines think for themselves; machine learning is the idea of letting machines think for themselves using data. Using a data set (the historical records on which the model will be trained), we train an algorithm to make predictions. Machine learning algorithms can generally be categorized as supervised or unsupervised learning. (There is reinforcement learning as well, but it is a more abstract form of machine learning in which a model learns from experience rather than from training data; reinforcement learning models the way humans learn.)
In supervised learning, we teach the computer how to do something. We provide the machine with the ‘answer key’ in the form of labels. More plainly, in a data set (x, y), where x is the input and y is the output, we directly provide the machine with y. The goal of a supervised learning algorithm is to get the program to find and understand the relationship between y and x. This relationship can then be used to predict the output for new inputs. In machine learning, we call the input (x) the feature and the output (y) the label. The relationship between x and y can be expressed as a function f, where y = f(x). The objective of our program is to approximate this mapping function, and then use it to predict the labels for future inputs.
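To make this concrete, here is a toy sketch (the data set and function names are hypothetical, invented for illustration): with a single numerical feature and two possible labels, a program can "learn" the mapping f by finding a decision threshold from the labeled examples it is given.

```python
# Toy supervised learning: learn a mapping f from labeled examples.
# Hypothetical data set: feature x is a number, label y is "small" or "large".
data = [(1, "small"), (2, "small"), (3, "small"),
        (10, "large"), (11, "large"), (12, "large")]

def learn_threshold(examples):
    """Find a split point that separates the two labels."""
    smalls = [x for x, y in examples if y == "small"]
    larges = [x for x, y in examples if y == "large"]
    # Place the threshold midway between the largest "small"
    # and the smallest "large" example.
    return (max(smalls) + min(larges)) / 2

threshold = learn_threshold(data)  # the learned mapping f, in effect

def predict(x):
    return "small" if x < threshold else "large"

print(threshold)     # 6.5
print(predict(4))    # small
print(predict(100))  # large
```

The program was never told the rule; it derived the threshold from the labeled data, which is the essence of supervised learning.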
We know the right answers, and our job is to supervise the machine until it can accurately predict the output for new data inputs.
Supervised learning algorithms can be classified as either regression or classification.
For a regression algorithm, we try to predict a continuous (i.e. numerical) output. For instance: given an image of a person, determine their age. Age is a continuous range; there is no fixed set of possible values. When dealing with a regression problem, the algorithm attempts to produce a function that closely maps to the actual data. When we plot this function, it often resembles the line of best fit through the data. There are different forms of regression, including linear, polynomial, and ridge. The form of regression chosen changes the mapping function produced and the curve fitted through the data. Data that closely follows a line calls for linear regression, while data that follows a curve may require polynomial regression. Linear regression is used quite often in statistics and finance and can help create incredibly accurate forecasting models.
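Here is a minimal linear-regression sketch from scratch (the data set is made up for illustration): it fits a line y = m·x + b to the points by ordinary least squares and then uses that line to forecast a new input.

```python
# Minimal linear regression (no libraries): fit y = m*x + b
# to a hypothetical data set by ordinary least squares.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.0, 8.1, 9.9]  # roughly y = 2x

def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept from the means.
    m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - m * mean_x
    return m, b

m, b = fit_line(xs, ys)

def predict(x):
    return m * x + b

print(round(m, 2), round(b, 2))
print(round(predict(6.0), 2))  # forecast for an unseen input
```

The fitted slope comes out close to 2, matching the pattern in the data, and the learned line can then predict outputs for inputs the model has never seen.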
For a classification algorithm, we attempt to predict a discrete value (i.e. categorical values). For instance: given an image of a person, determine if they are a child, a teen, an adult, or a senior. There are only four possible states: child, teen, adult, and senior. Each of these states can be assigned a number (since algorithms only work with numbers, every feature must be represented numerically): 0, 1, 2, 3. There are only four possible output values; there can be no 1.5 or 3.14. As with regression, there is a myriad of algorithms that can be used for classification, including neural networks, decision trees, and k-nearest neighbours. These algorithms seek to find patterns within the data and allow the machine to understand how the data is classified. Given new data, they help the program assign a numerical label to it; the program is now able to classify new data.
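A k-nearest-neighbours classifier makes this tangible. The sketch below (training data is hypothetical) uses the article's age-group example, with labels encoded as 0 = child, 1 = teen, 2 = adult, 3 = senior: to classify a new age, it finds the k closest training examples and takes a majority vote among their labels.

```python
# Tiny k-nearest-neighbours classifier from scratch.
# Hypothetical (age, group) training data: 0=child, 1=teen, 2=adult, 3=senior.
from collections import Counter

train = [(5, 0), (9, 0), (14, 1), (17, 1),
         (30, 2), (45, 2), (70, 3), (82, 3)]

def knn_predict(x, examples, k=3):
    # Sort the training points by distance to x and keep the k closest.
    nearest = sorted(examples, key=lambda pair: abs(pair[0] - x))[:k]
    # Majority vote among the labels of the k nearest neighbours.
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(knn_predict(8, train))   # 0 (child)
print(knn_predict(40, train))  # 2 (adult)
```

Note that the output is always one of the four discrete labels; unlike regression, there is no in-between value.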
In unsupervised learning, we let the model learn by itself (there is no supervision per se). We have a data set, but we don’t know what each point represents or what to do with it. The data is unlabelled and there is no direct feedback. Unsupervised learning models must decipher the relationships in the data without any help (i.e. no corresponding output is provided). The objective of unsupervised learning is to discover structure within unlabelled data points. There are a variety of algorithms, most of which are beyond the scope of this post, but the most common is clustering.
Simply put, a clustering algorithm groups similar data points together (visually, the points that lie closest to one another). Theoretically, data points in the same group share similar properties, whilst data in different groups are greatly dissimilar. Given a set of data points, an unsupervised learning algorithm groups them according to the kind of clustering algorithm used. These algorithms are significantly more intricate than those used in supervised learning and are more difficult to grasp and understand.
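The classic clustering algorithm is k-means, and a one-dimensional sketch fits in a few lines (the data set and starting centroids are made up for illustration). Notice that the points carry no labels; the algorithm alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its cluster, and the two groups emerge on their own.

```python
# Minimal k-means clustering sketch (1-D, k=2) on hypothetical unlabelled data.
points = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]

def kmeans(points, centroids, steps=10):
    clusters = [[] for _ in centroids]
    for _ in range(steps):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
            clusters[i].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

centroids, clusters = kmeans(points, centroids=[0.0, 5.0])
print(centroids)  # converges to [1.5, 10.5]
print(clusters)
```

Without ever being told which group each point belongs to, the algorithm settles on the two natural clusters in the data.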
Implementing these algorithms has gotten significantly easier thanks to libraries such as TensorFlow and scikit-learn. Writing these algorithms from scratch is no longer a necessity, but it is highly recommended that you gain an understanding of how they work and how they are derived. Understanding is perhaps the most important component not just in machine learning, but in programming and mathematics.
This article provided an overview of machine learning without delving into the technical details. It has hopefully armed you with a basic understanding of what machine learning is and what its capabilities are. Machine learning and AI are evolving at an exponential rate, and their applications are widespread, impacting everything from finance and medicine to genomics and robotics. Machine learning is becoming one of the most significant and consequential technologies in our world.
“Machine intelligence is the last invention that humanity will ever need to make.”
— Nick Bostrom, Swedish philosopher at Oxford
This post barely touched the tip of the iceberg that is machine learning. I encourage you to go and do some research of your own. There are hundreds of tutorials, but I highly recommend Andrew Ng’s introductory course on Coursera. I advise you to brush up on your high school math and revise some of the fundamental concepts, as machine learning can get very technical, very fast. You should also be able to fluently program in at least one ‘mathematical’ language, preferably Python, R, Octave, or MATLAB. This Medium post lists some great resources you can use for machine learning.
I hope you enjoyed this introduction to machine learning, where we cleared some of the haze surrounding the topic. The possibilities of machine learning are endless, and it is incredibly important to learn how this exponential technology works and how it can be applied.