I wished to do some machine learning for binary classification. Binary classification is perhaps the most basic of all supervised learning problems. Unsurprisingly, Julia has many libraries for it. Today we are looking at: LIBLINEAR (linear SVMs), LIBSVM (kernel SVMs), XGBoost (extreme gradient boosting), DecisionTree (random forests), Flux (neural networks), TensorFlow (also neural networks).
In this post we are only concentrating on their ability to be used for binary classification. Most (if not all) of these do other things as well. We’ll also not really be exploring all their options (e.g. different types of kernels).
Furthermore, I’m not rigorously tuning the hyperparameters, so this can’t be considered a fair test of performance. I’m also not performing preprocessing (e.g. many classifiers like it if you standardise your features to zero mean and unit variance). You can look at this post more as showing what the code for each package looks like, roughly how long it takes, and how well it does out of the box.
It’s more of a showcase of what packages exist.
For TensorFlow and Flux, you could also treat this as a bit of a demo of how to use them to define binary classifiers, since they don’t do it out of the box.
This post, like most of my posts, is backed by a Jupyter notebook.
Feel free, encouraged even, to download and run that, or view it on GitHub, etc.
Also feel free to raise issues on that repository.
The Task: Predict if that part of the Australian Flag is Blue
This is on the mildly gnarly side of binary classification problems. The classifying regions are:
- Not linearly separable
- You can’t draw a line such that all the blue parts are on one side and all the non-blue parts are on the other.
- Not connected
- The stars, for example, are entirely separated by blue background regions
- Not convex
- Within a section of one color, you can draw a line between two points in the same colored region and have it exit that section, then re-enter.
- Unbalanced classes
- Most of the image is blue.
So it seems like a good, difficult problem.
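Because the classes are unbalanced, any accuracy figure needs a baseline to be read against. A minimal sketch, assuming Julia 1.x and using made-up labels (the 70% blue fraction is an illustration, not a measurement of the flag), of the accuracy obtained by always answering with the more common class:

```julia
using Statistics  # `mean` lives here on Julia ≥ 0.7

# Hypothetical labels: 7 blue pixels, 3 non-blue. Not taken from the real image.
labels = [true, true, true, false, true, true, false, true, false, true]

# Fraction of observations in the positive (blue) class.
blue_fraction = mean(labels)

# Accuracy of always predicting whichever class is more common.
majority_baseline(labels) = max(mean(labels), 1 - mean(labels))

println("blue fraction: ", blue_fraction)
println("majority-class baseline accuracy: ", majority_baseline(labels))
```

A classifier is only doing something useful on this task if it beats that baseline.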
Data Generation
An image of the flag gives us one datum per pixel. We’re going to sample that, just so that plotting is easier.
Input:
using Images, FileIO
Input:
img = load(download("https://upload.wikimedia.org/wikipedia/en/thumb/b/b9/Flag_of_Australia.svg/320px-Flag_of_Australia.svg.png"));
Input:
isblue(pixel) = pixel.b > pixel.r && pixel.b > pixel.g
Output:
isblue (generic function with 1 method)
Input:
colorview(Gray, .!(isblue.(img)))
Output:
Input:
const all_feature1 = Vector{Float64}()
const all_feature2 = Vector{Float64}()
const all_labels = Vector{Bool}()
@inbounds for ind in eachindex(IndexCartesian(), img)
    pixel = img[ind]
    push!(all_labels, isblue(pixel))
    push!(all_feature1, ind.I[1])
    push!(all_feature2, ind.I[2])
end
const all_features = [all_feature1'; all_feature2']
# Standard Julia convention: observations are along the final index (i.e. the columns of matrices)
Any[all_features; all_labels']
Output:
3×51200 Array{Any,2}:
1.0 2.0 3.0 4.0 … 157.0 158.0 159.0 160.0
1.0 1.0 1.0 1.0 320.0 320.0 320.0 320.0
false false false false true true true true
Normally I would do this data munging using MLDataUtils.jl, which I have blogged about before (though it might be nice to write a few more posts about it; it is a great package, and I don’t know that I’ve fully covered its capabilities).
But since I am already about to introduce 6 packages, I thought I would minimize talking about other ones.
Input:
const all_inds = shuffle(1:length(all_labels)) # on Julia ≥ 0.7, `shuffle` needs `using Random`
const test_inds = all_inds[1:end÷5] # first 20%
const train_inds = all_inds[19end÷20:end] # last 5%
const test_features = all_features[:, test_inds]
const test_labels = all_labels[test_inds]
const train_features = all_features[:, train_inds]
const train_labels = all_labels[train_inds];
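The split above can be wrapped into a small reusable function. A minimal sketch, assuming Julia 1.x; `split_obs` and its keyword names are hypothetical, the toy data stands in for the flag pixels, and the seeded generator is only there to make the shuffle repeatable:

```julia
using Random  # stdlib on Julia ≥ 0.7

# Hypothetical helper, not from the post: shuffle the observations (columns),
# take the first `test_frac` of them for testing and the last `train_frac` for training.
function split_obs(features, labels; test_frac=0.2, train_frac=0.05, rng=MersenneTwister(0))
    n = length(labels)
    order = shuffle(rng, 1:n)
    n_test = floor(Int, test_frac * n)
    n_train = floor(Int, train_frac * n)
    test = order[1:n_test]
    train = order[end-n_train+1:end]
    return (features[:, train], labels[train]), (features[:, test], labels[test])
end

# Toy data: 20 observations in columns, 2 features each.
features = [collect(1.0:20.0)'; repeat(1.0:4.0, inner=5)']
labels = isodd.(1:20)

(train_x, train_y), (test_x, test_y) = split_obs(features, labels)
println(size(train_x), " ", size(test_x))
```

Taking the two sets from opposite ends of one shuffled ordering keeps them disjoint as long as the fractions sum to less than one.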
Input:
using Plots
pyplot() # Using PyPlot, because the SVGs that GR makes kill browsers with too many paths at this scale
function plotflag(xs, ys; title="")
    scatter(xs[2,:], -xs[1,:]; zcolor=ys,
        markersize=2, markerstrokealpha=0, bg=colorant"gray", seriescolor=:blues, title=title)
end
Input:
plotflag(train_features, train_labels, title="Training Data")
Output:
Input:
plotflag(test_features, test_labels, title="Test Data")
Output:
Interface
As was discussed on the Julia Slack yesterday, there is a real problem with a lack of consistency in our ML packages right now.
So I am going to take a leaf from XKCD #927 and define a common interface of my own.
- StatsBase.fit(modeltype, features, labels) returns a model of that type that is trained on those features and labels.
- Since we are only interested in binary classification, labels will be an AbstractVector{Bool} with one entry per column of the feature matrix.
- StatsBase.fit!(model, features, labels)
- Mutating form of the above (fit is basically a constructor).
- Useful for allowing retraining/hot-starting.
- StatsBase.predict(model, features) returns a vector of estimated probabilities of the classification being true.
- One entry per column in features.
Something like this is actually in use in a bunch of places already, just not in these packages, it seems.
Some packages (LIBSVM.jl, DecisionTree.jl) use the same names, from ScikitLearnBase, but they go sideways (i.e. observations in rows, Python style).
I think the real solution to a good interface does need to be thinking more like (or using) MLDataUtils.jl, which is observation-dimension agnostic, defaulting to normal Julia practice (ObsDim.Last()).
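The difference between the two layouts is just a transpose of the feature matrix. A minimal sketch, assuming Julia 1.x; `rows_to_cols` and `count_obs_rowwise` are hypothetical names made up for illustration, not part of any of the packages mentioned:

```julia
# Observations as columns (the convention used in this post): 2 features × 3 observations.
features_cols = [1.0 2.0 3.0;
                 10.0 20.0 30.0]

# Observations as rows (the scikit-learn convention): 3 observations × 2 features.
# `permutedims` makes a real copy; `transpose` or `'` would give a lazy wrapper on Julia ≥ 0.7.
features_rows = permutedims(features_cols)

# Hypothetical adaptor: wrap a function that expects observations in rows so
# that it can be called with observations in columns.
rows_to_cols(f) = (features, args...) -> f(permutedims(features), args...)

# Stand-in for a row-oriented library call: counts observations by counting rows.
count_obs_rowwise(X) = size(X, 1)
count_obs = rows_to_cols(count_obs_rowwise)

println(size(features_rows))
println(count_obs(features_cols))
```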
Using these we can define our metrics, etc. It might be nicer to be using MLMetrics.jl to do this for us. But I’ll just do it simply here.
Input:
import StatsBase: fit!, fit, predict # on Julia ≥ 0.7, also add `using Statistics` for `mean`
classify(model, features) = predict(model, features).>0.5
accuracy(model, features, ground_truth_labels) = mean(classify(model, features) .== ground_truth_labels)
Output:
accuracy (generic function with 1 method)
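To see the interface and the metric helpers working together, here is a minimal sketch with a toy model, assuming Julia 1.x. `PriorModel` is hypothetical and deliberately useless: it ignores the features and always predicts the fraction of true labels it was trained on. To keep the sketch free of package dependencies it defines plain `fit`, `fit!` and `predict` functions rather than adding methods to the StatsBase ones, as the rest of this post does.

```julia
using Statistics  # for `mean` on Julia ≥ 0.7

# Hypothetical toy model: stores only the fraction of `true` training labels.
mutable struct PriorModel
    p_true::Float64
end

# Mutating form: retrain an existing model in place.
function fit!(model::PriorModel, features, labels::AbstractVector{Bool})
    size(features, 2) == length(labels) ||
        throw(DimensionMismatch("expected one label per column of features"))
    model.p_true = mean(labels)
    return model
end

# Constructor-like form: build a fresh model, then train it.
fit(::Type{PriorModel}, features, labels) = fit!(PriorModel(0.5), features, labels)

# One probability per column (observation) of `features`.
predict(model::PriorModel, features) = fill(model.p_true, size(features, 2))

# The same helpers as defined above, repeated so the sketch runs on its own.
classify(model, features) = predict(model, features) .> 0.5
accuracy(model, features, truth) = mean(classify(model, features) .== truth)

features = [1.0 2.0 3.0 4.0; 1.0 1.0 2.0 2.0]  # 2 features × 4 observations
labels = [true, true, true, false]

model = fit(PriorModel, features, labels)
println(predict(model, features))
println(accuracy(model, features, labels))
```

Any model type that provides those three methods can be passed to the same metric code unchanged.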
Evaluation function
Given a common interface we can write one function to evaluate them all.
It accesses our training and test data as global variables (obviously not a good idea normally).
Input:
percent(x) = @sprintf("%0.2f%%", 100*x) # on Julia ≥ 0.7, `@sprintf` needs `using Printf`
function evaluate(modeltype)
    @time model = fit(modeltype, train_features, train_labels)
    println("$modeltype Train accuracy: ", percent(accuracy(model, train_features, train_labels)))
    println("$modeltype Test accuracy: ", percent(accuracy(model, test_features, test_labels)))
    # This calls predict a second time (we already did it to report the accuracy), but predict is cheap
    plotflag(test_features, predict(model, test_features); title=string(modeltype))
end
Output:
evaluate (generic function with 1 method)
LIBLINEAR.jl
The linear SVM. Possibly the weakest classifier in modern use. It actually works OK for a lot of higher-dimensional problems: in high dimensions it is easier for things to be linearly separable.
It surprises me that the C backend was only created in 2008.
R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A Library for Large Linear Classification, Journal of Machine Learning Research 9(2008), 1871-1874. Software available at http://www.csie.ntu.edu.tw/~cjlin/liblinear
Because we are interested in getting probabilities back from predict, we are restricted to using the L2R_LR and L1R_LR solver types, which are logistic regression.
This could probably be relaxed for most applications (but might break the metric definitions above).
Input:
using LIBLINEAR
function fit(::Type{LinearModel}, features, labels; solver_type=LIBLINEAR.L2R_LR, kwargs...)
    linear_train(labels, features; solver_type=solver_type, kwargs...)
end
function predict(model::LinearModel, features)
    classes, probs = linear_predict(model, features; probability_estimates=true)
    vec(probs)
end
Output:
predict (generic function with 2 methods)
Input:
evaluate(LinearModel)
Output:
0.735264 seconds (158.51 k allocations: 8.306 MiB)
LIBLINEAR.LinearModel Train accuracy: 79.85%
LIBLINEAR.LinearModel Test accuracy: 80.88%
We can see from the plot that it is basically a gradient, of how much blue is in an area. This is as expected.
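The gradient follows from the form of the model: logistic regression passes a linear function of the coordinates through a sigmoid, so the predicted probability can only change monotonically along a single direction. A minimal sketch, assuming Julia 1.x, with made-up weights `w` and bias `b` (these are not the values LIBLINEAR fitted):

```julia
using LinearAlgebra  # for `dot`

# Logistic (sigmoid) function: maps any real score to a probability in (0, 1).
sigmoid(z) = 1 / (1 + exp(-z))

# Hypothetical parameters, chosen for illustration only.
w = [0.01, -0.005]
b = 0.2

# Modelled P(blue | x) for a single pixel coordinate x = [row, column].
p_blue(x) = sigmoid(dot(w, x) + b)

# With these weights the probability rises steadily with the row index and
# falls with the column index; it can never dip and recover along a line.
for row in (1, 80, 160)
    println("row ", row, ", column 160: ", round(p_blue([row, 160]), digits=3))
end
```

No choice of `w` and `b` can produce separate islands of high probability, which is what the stars would need.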
LIBSVM.jl
The more general SVM package. We’re here for its kernel SVM classifiers. Again I am surprised that the backend was created so recently: 2005.
Since version 2.8, it implements an SMO-type algorithm proposed in this paper: R.-E. Fan, P.-H. Chen, and C.-J. Lin. Working set selection using second order information for training SVM. Journal of Machine Learning Research 6, 1889-1918, 2005. https://www.csie.ntu.edu.tw/~cjlin/libsvm/
We’re looking at SVC in this example.
The other types of interest here would be NuSVC and LinearSVC (but we have that covered by LIBLINEAR).
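What lets a kernel SVM carve out disconnected regions is that it scores a point by its kernel similarity to training points rather than by a single linear function. A minimal sketch of the radial basis function (RBF) kernel, assuming Julia 1.x. That RBF is the kernel in use when none is specified is my assumption about the defaults, and the `gamma` value is arbitrary:

```julia
# RBF kernel: similarity decays with squared Euclidean distance.
# `gamma` controls how quickly; the default used here is an arbitrary example value.
rbf(x, y; gamma=0.5) = exp(-gamma * sum(abs2, x .- y))

a = [1.0, 2.0]
b = [1.0, 3.0]    # distance 1 from a
c = [10.0, 20.0]  # far from a

println(rbf(a, a))  # identical points have similarity 1
println(rbf(a, b))  # nearby points are fairly similar
println(rbf(a, c))  # distant points have similarity close to 0
```

Because the similarity is local, nearby training points dominate the prediction for any given pixel.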
Input:
import LIBSVM:svmtrain, SVM, svmpredict
function fit(::Type{SVM{Bool}}, features, labels; kwargs...)
    # Could use ScikitLearnBase.fit!(SVC, features, Float64.(labels)), but it doesn't take extra args the same way.
    svmtrain(features, labels; probability=true, kwargs...)
end
function predict(model::SVM{Bool}, features)
    classes, probs = svmpredict(model, features)
    probs[1,:]
end
Output:
predict (generic function with 3 methods)
Input:
evaluate(SVM{Bool})
Output:
5.028115 seconds (128.57 k allocations: 6.933 MiB)
LIBSVM.SVM{Bool} Train accuracy: 99.96%
LIBSVM.SVM{Bool} Test accuracy: 91.69%