Machine Learning in Splunk - Say hello to the MLTK
21/08/19 – Author: Martyn O’Connor, MBE – Professional Services Consultant at Somerford Associates
Doesn’t Splunk already have machine learning capabilities?
If you have been using Splunk for a while, you may have noticed some slightly unusual search commands, like predict and kmeans. These commands do interesting things with your data. Unsurprisingly, predict lets you forecast future trends in your data based on the past. Perhaps less obvious is what kmeans does. The name derives from k-means clustering, a mathematical technique that sorts data points into groups, or clusters, by assigning each point to the cluster whose centre it is closest to. Both of these commands are simple implementations of common machine learning algorithms, but they aren’t quite a fully fledged machine learning solution.
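To make the idea behind k-means clustering a little more concrete, here is a minimal sketch in plain Python. It is not how Splunk's kmeans command is implemented; the function name, the sample points, and the choice to seed the clusters with the first k points are all assumptions made purely for illustration.

```python
from math import dist


def k_means(points, k, iterations=10):
    """Group 2-D points into k clusters with a basic k-means loop."""
    # Assumption: seed the centroids with the first k points. Real libraries
    # use smarter (usually randomised) initialisation.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


# Invented sample data: three points near (1, 1) and three near (8, 8).
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.8, 8.1)]
labels, centroids = k_means(points, k=2)
print(labels)     # cluster number assigned to each point
print(centroids)  # centre of each cluster
```

The two steps inside the loop (assign every point to its nearest centre, then move each centre to the average of its points) are the whole algorithm; everything else is bookkeeping.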
Yeah, but real machine learning is hard… isn’t it?
In a word… yes. Machine learning has been talked about a lot in recent years, and its gradual entrance into our lives (Alexa, Siri, Netflix movie recommendations, those frighteningly relevant adverts you see when browsing the web) may have made it seem like it can’t be that hard. However, at its heart, machine learning is applied mathematics… and in some cases very hard applied mathematics at that. Until recently, doing something effective with machine learning meant you probably needed a degree in mathematics, a strong familiarity with statistical modelling, and good skills in a programming language like Python.
Essentially, these requirements meant the bar was simply too high for most people. So wouldn’t it be nice if someone could simplify it all, and take away the need for a degree in maths and skill with Python? Enter the Machine Learning Toolkit!
So what is the Machine Learning Toolkit then?
Essentially, the Machine Learning Toolkit (MLTK) is two things:
- A free extension to Splunk’s core functionality with new search commands
- An integration of recognised industry-standard machine learning algorithms
Of the two elements, I think the latter is the more important one to stress. In developing the Machine Learning Toolkit, Splunk did not try to develop its own implementations of machine learning algorithms. Instead, it built the core functionality of the toolkit upon established, open source, industry-standard algorithms and libraries such as scikit-learn. So what does that mean for you? It means you are able to harness all of the power that comes from these libraries, but you don’t have to study and learn anywhere near as much complicated information to get up and running. Compared to writing the equivalent Python by hand, getting started with machine learning in Splunk is considerably easier, as MLTK abstracts much of the complicated work away behind simple-to-use SPL commands.
With MLTK, taking your data and using it to train a model can take as little as a single line of SPL (assuming you ignore the inputlookup that first populates the data). Once the model is trained on your data, it can be used to predict values for you just as easily.
Linear Regression is a simple use case to start with, but the Machine Learning Toolkit has a great many different algorithms you can choose from.
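For a feel of the train-then-predict workflow described here, the following is a rough sketch of a one-variable linear regression in plain Python. It is only an illustration of the idea, not MLTK's or scikit-learn's implementation; the function names and the CPU and power figures are invented. In MLTK, the two steps correspond to the fit and apply search commands.

```python
def fit_line(xs, ys):
    """Train step: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Predict step: apply a previously trained model to a new value."""
    slope, intercept = model
    return slope * x + intercept


# Invented training data: CPU utilisation (%) against power draw (watts).
cpu_percent = [10.0, 20.0, 30.0, 40.0]
power_watts = [120.0, 140.0, 160.0, 180.0]

model = fit_line(cpu_percent, power_watts)  # train once on historical data
print(model)                                # learned (slope, intercept)
print(predict(model, 50.0))                 # estimate for an unseen value
```

Even this toy version needs a little statistics to write correctly, which is exactly the work the toolkit hides behind its search commands.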
Lots of algorithms? How do I know which is the right one?
Of course, machine learning algorithms come in many different shapes and sizes, and can be tailored to suit many different use cases. Luckily, the Machine Learning Toolkit comes with over 30 worked examples covering different kinds of use cases, each showing how you can apply machine learning to that problem. Each comes with examples of the SPL you might use, as well as accompanying explanations, so you’re not just learning to copy and paste, but learning to understand what the algorithm does and why it is (or perhaps isn’t) the best choice for your use case.
The Machine Learning Toolkit makes starting your journey into machine learning much easier and, hopefully, more interesting. If you’re already familiar with search in Splunk, you’ll have no problem at all getting going with the Machine Learning Toolkit. The MLTK cheat sheet and quick start guide will be useful to have nearby when you begin using it.