Take Two: Amazon Machine Learning - Does it really learn?

Extrapolation

As argued in the previous post, learning is not just the ability to memorize values but also the ability to apply the gained knowledge to something new. To test that, I also looked at the predictions for the following tuples, all of which lie entirely outside the training data:

Tuples where a = 10 (and b, c in {1, ..., 7}), called “block A”
Tuples where b = 10 (and a, c in {1, ..., 7}), called “block B”
Tuples where c = 10 (and a, b in {1, ..., 7}), called “block C”
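
The three blocks above can be enumerated with a few lines of Python. This is only a sketch: it assumes that the values seen in training are the integers 1 through 7 (which matches 49 tuples per block), and the names `TRAINED`, `UNSEEN` and `extrapolation_blocks` are made up for illustration.

```python
from itertools import product

TRAINED = range(1, 8)  # assumption: training values are the integers 1..7
UNSEEN = 10            # the value that never occurs in the training data


def extrapolation_blocks():
    """Return the three blocks of (a, b, c) tuples, keyed by block name."""
    pairs = list(product(TRAINED, repeat=2))  # 7 * 7 = 49 combinations
    return {
        "A": [(UNSEEN, b, c) for b, c in pairs],
        "B": [(a, UNSEEN, c) for a, c in pairs],
        "C": [(a, b, UNSEEN) for a, b in pairs],
    }


blocks = extrapolation_blocks()
total = sum(len(tuples) for tuples in blocks.values())
print({name: len(tuples) for name, tuples in blocks.items()}, total)
```

Each block holds 49 tuples, and since no tuple contains the value 10 twice, the blocks do not overlap.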

This yields another 147 tuples (3 × 49), whose “real answers” are distributed as follows:

Block A has 7 positive answers and 42 negative answers.
Block B has the same distribution as block A (due to the commutative law).
Block C has 28 positive answers and 21 negative answers.

Querying the model for its predictions gives the following:

108 (out of 147) have been predicted correctly; 39 have been predicted wrongly. This is an error rate of 26.5%.
The tuples which have been predicted wrongly have intermediate values ranging from 6 to 35.
All wrong predictions in block C have an intermediate value of 6.
All wrong predictions in block C have a very high score of (exactly) 974135.
With the exception of the tuple (a = 10, b = 4, c = 5), the model obeys the commutative law between blocks A and B even in its wrong predictions.
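
Two of the observations above are easy to check mechanically: the error rate, and whether predictions in block A match their mirrored counterparts in block B. The following is a minimal sketch, not the code used for this post. It assumes the predictions have already been collected into a plain dict mapping (a, b, c) to a boolean label; `commutativity_violations` is a hypothetical helper, and the toy dict at the end contains invented labels purely to show the call.

```python
def error_rate(wrong, total):
    """Fraction of wrong predictions."""
    return wrong / total


def commutativity_violations(predictions, unseen=10):
    """Return block A tuples whose mirrored block B tuple was predicted differently.

    predictions: dict mapping (a, b, c) -> predicted boolean label.
    """
    violations = []
    for (a, b, c), label in predictions.items():
        if a == unseen and b != unseen:      # tuple belongs to block A
            mirrored = (b, a, c)             # its counterpart in block B
            if mirrored in predictions and predictions[mirrored] != label:
                violations.append((a, b, c))
    return sorted(violations)


print(f"{error_rate(39, 147):.1%}")  # 26.5%

# Toy input with invented labels (NOT the real model output), for illustration.
toy = {
    (10, 4, 5): True, (4, 10, 5): False,   # mirrored pair that disagrees
    (10, 1, 1): True, (1, 10, 1): True,    # mirrored pair that agrees
}
print(commutativity_violations(toy))
```

Run against the real predictions, such a check would be expected to report only the single tuple named above.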

Conclusion

Looking at the results you may come to the following conclusions:

Compared to the previous setup, memorization of the training data no longer worked as well. This could be because Amazon has changed something in how it provides the service, or because the service used a different machine learning algorithm for the new model.
With the new setup, interpolating predictions do not always work. Their error rate, however, is too low to be statistically significant. Moreover, the errors occur exactly where you would expect them: near the threshold value of the comparison. For many practical applications, this kind of error may well be acceptable.
Extrapolating predictions, on the other hand, still make clear that machine learning is not the same as knowledge: in our sample the error rate was above 20% (and thus statistically significant), even though basic properties (like the commutative law) may be retained.
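
To get a feeling for how much an error rate measured on only 147 tuples can be trusted, one option is to put a confidence interval around it. The sketch below uses the Wilson score interval at roughly 95% confidence (z = 1.96). This is a technique chosen here for illustration and not something the experiment itself relied on, and it treats the 147 predictions as independent trials, which is a simplifying assumption.

```python
from math import sqrt


def wilson_interval(errors, n, z=1.96):
    """Wilson score interval for a proportion errors / n (z = 1.96 ~ 95%)."""
    p = errors / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# 39 wrong predictions out of 147 extrapolation tuples
low, high = wilson_interval(39, 147)
print(f"error rate between {low:.3f} and {high:.3f}")
```

The interval is fairly wide for a sample of this size, which is worth keeping in mind when comparing error rates between setups.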


Nico's Blog