{"id":1141,"date":"2018-12-27T14:41:17","date_gmt":"2018-12-27T13:41:17","guid":{"rendered":"http:\/\/blog.schmoigl-online.de\/?p=1141"},"modified":"2018-12-27T14:59:14","modified_gmt":"2018-12-27T13:59:14","slug":"amazon-machine-learning-take-two","status":"publish","type":"post","link":"http:\/\/blog.schmoigl-online.de\/?p=1141","title":{"rendered":"Amazon Machine Learning &#8211; Take Two"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">Since <a href=\"http:\/\/blog.schmoigl-online.de\/?p=1055\">my last post<\/a> analyzing Amazon Machine Learning (AML), I have received quite a lot of feedback from friends and colleagues about my findings. Some of it criticized the setup of my test, especially the formula I used. As I consider this discussion valuable, I would like to use this post to re-run my analysis with a slightly different approach and see whether that changes anything significantly.<\/p>\n\n\n\n<!--more-->\n\n\n\n<h2 class=\"wp-block-heading\">Summary of Feedback<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The feedback I have received can be summarized in the following two statements:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>For the set of values with <em>a=1<\/em> and <em>a=2<\/em>, there are (in total) only seven values provided to the model (in fact there are more, but they only repeat what is already known to the model). In other words, the model has only very limited knowledge in that space. Yet my approach explicitly queried that area to evaluate the quality of predictions.<\/li><li>Modulo computation with a prime modulus creates a finite field. Performing arithmetic in such a field can become a very complex task (because it wraps higher numbers around to lower ones). 
Additionally, it does not represent typical decision behavior, where more interest usually also results in more willingness to buy a product.<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">New Setup<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">I value this kind of feedback, and it does in fact reveal certain properties which are undesirable in such an analysis. That is why I would like to repeat the analysis from my original post with the following modified setup:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>We again have <em>a<\/em>, <em>b<\/em>, and <em>c<\/em> as parameters, each ranging over <em>{1, ..., 7}<\/em>.<\/li><li>Again, we use the formula <em>a * b &#8211; c<\/em> to compute an intermediate value. Note, however, that this time no modulus calculation is involved.<\/li><li>To map the intermediate result to a binary decision, the condition <em>intermediate_result &gt; 10<\/em> is evaluated.<\/li><li>With <em>a, b, c<\/em> in <em>{1, ..., 7}<\/em>, we have a space of 7<sup>3<\/sup> = 343 tuples. Of these, 156 evaluate to true; the remaining 187 evaluate to false.<\/li><li>For later verification, we leave the following tuples out of the training data: <ul><li>a = 6, b = 6, c = 6 (intermediate result = 30)<\/li><li>a = 7, b = 2, c = 3 (intermediate result = 11)<\/li><li>a = 7, b = 2, c = 4 (intermediate result = 10)<\/li><li>a = 7, b = 2, c = 5 (intermediate result = 9)<\/li><\/ul><\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">(continued on next page)<\/p>\n\n\n\n<!--nextpage-->\n\n\n\n<h2 class=\"wp-block-heading\">Execution of Test<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">I repeated the steps from the previous post using the new setup above. Unsurprisingly, the calculated AUC value was 1.0. 
For comparability, I kept the score threshold at its default of 0.5.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">I aggregated the results in an Excel sheet, which is attached for your reference:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong><a href=\"http:\/\/blog.schmoigl-online.de\/?dl_id=9\">AML-take2.xlsx<\/a><\/strong> (62.1 KiB)<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Results<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Querying all 343 possible tuples, the model returned<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>335 correct predictions<\/li><li>and 8 <strong>wrong predictions<\/strong>, which amounts to a <strong>97.7% success rate<\/strong> or a <strong>2.3% error rate<\/strong>.<\/li><li>The <strong>wrong<\/strong> predictions are<ul><li><em>a = 2, b = 7, c = 3<\/em> (intermediate = 11, binary = true, prediction = false with a score of 418922)<\/li><li><em>a = 3, b = 4, c = 2<\/em> (intermediate = 10, binary = false, prediction = true with a score of 618853)<\/li><li><em>a = 3, b = 5, c = 5<\/em> (intermediate = 10, binary = false, prediction = true with a score of 543858)<\/li><li><em>a = 4, b = 3, c = 2<\/em> (intermediate = 10, binary = false, prediction = true with a score of 808096)<\/li><li><em>a = 4, b = 4, c = 6<\/em> (intermediate = 10, binary = false, prediction = true with a score of 696756)<\/li><li><em>a = 4, b = 4, c = 7<\/em> (intermediate = 9, binary = false, prediction = true with a score of 534369)<\/li><li><em>a = 5, b = 3, c = 5<\/em> (intermediate = 10, binary = false, prediction = true with a score of 529201)<\/li><li><em>a = 7, b = 2, c = 4<\/em> (intermediate = 10, binary = false, prediction = true with a score of 55007)<\/li><\/ul><\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">It is noteworthy that 
out of the eight wrong predictions, <strong>seven concern tuples which had been available during the training phase<\/strong>, and only one of them is one of the four test tuples (<em>a = 7, b = 2, c = 4<\/em>). Conversely, this means that three out of our four test tuples were &#8220;guessed&#8221; correctly.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Looking at the scores of the wrong predictions, there is also a pattern to observe: whilst the scores <strong>are very high for tuples which the model had seen during training<\/strong> (between 418,922 and 808,096; the maximal score value in the entire data set is 999,999.9), the one tuple which the model had not seen before has a &#8220;confidence level&#8221; that is lower by roughly a factor of 10.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Finally, looking at the intermediate values affected, you can observe that only tuples <strong>with the intermediate values 9, 10 and 11<\/strong>, i.e. those directly adjacent to the threshold of 10, are subject to wrong predictions.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">(continued on next page)<\/p>\n\n\n\n<!--nextpage-->\n\n\n\n<h2 class=\"wp-block-heading\">Extrapolation<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">As already argued in the previous post, learning is not just the capability of memorizing values, but also the capability to <strong>apply the gained knowledge to something new<\/strong>. 
To test this, I also looked at the predictions for the following tuples, which lie entirely outside the range of the training data:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Tuples where <em>a = 10<\/em> (and <em>b, c<\/em> in <em>{1, ..., 7}<\/em>), called &#8220;block A&#8221;<\/li><li>Tuples where <em>b = 10<\/em> (and <em>a, c<\/em> in <em>{1, ..., 7}<\/em>), called &#8220;block B&#8221;<\/li><li>Tuples where <em>c = 10<\/em> (and <em>a, b<\/em> in <em>{1, ..., 7}<\/em>), called &#8220;block C&#8221;<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">This yields another 3 &#215; 49 = 147 tuples, whose &#8220;real answers&#8221; are distributed like this:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Block A has 42 positive answers and 7 negative answers (only <em>b = 1<\/em> yields a negative answer, since then <em>10 &#8211; c &lt; 10<\/em>).<\/li><li>Block B has the same distribution as block A (because multiplication is commutative).<\/li><li>Block C has 15 positive answers and 34 negative answers (<em>a * b &#8211; 10 &gt; 10<\/em> requires <em>a * b &gt; 20<\/em>).<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Querying the model to predict these answers, here is what you get:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>108 (out of 147) have been predicted correctly and 39 wrongly. This is an error rate of 26.5%.<\/li><li>The tuples which have been predicted wrongly have intermediate values ranging from 6 to 35.<\/li><li>All wrong predictions in block C have an intermediate value of 6.<\/li><li>All wrong predictions in block C have a very high score of (exactly) 974135.<\/li><li>With the exception of the tuple (<em>a = 10, b = 4, c = 5<\/em>), the model respects the commutative law between blocks A and B even in its wrong predictions.<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Looking at the results, you may come to the following conclusions:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Compared to the previous setup, memorization of training data no longer worked as well. 
This could be because Amazon changed something in its service, or because the service used a different machine learning algorithm for the new model.<\/li><li>With the new setup, interpolating predictions do not always work. Their error rate (2.3%), however, is low. Moreover, the errors occur exactly where one would expect them: near the threshold value of the comparison. For many practical applications, this kind of error may well be acceptable.<\/li><li>Extrapolating predictions, on the other hand, still make clear that machine learning is not the same as knowledge: in our sample, the error rate was above 20%, more than ten times the error rate observed within the range <em>{1, ..., 7}<\/em>, even though basic properties (like the commutative law) may be retained.<\/li><\/ul>\n","protected":false},"excerpt":{"rendered":"<p>The sequel to the previous post analyzes a second setup to see whether Amazon Machine Learning really &#8220;learns&#8221; &#8211; or just repeats what it has memorized. 
<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[38],"tags":[],"class_list":["post-1141","post","type-post","status-publish","format-standard","hentry","category-machine-learning"],"_links":{"self":[{"href":"http:\/\/blog.schmoigl-online.de\/index.php?rest_route=\/wp\/v2\/posts\/1141","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/blog.schmoigl-online.de\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/blog.schmoigl-online.de\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/blog.schmoigl-online.de\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/blog.schmoigl-online.de\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1141"}],"version-history":[{"count":10,"href":"http:\/\/blog.schmoigl-online.de\/index.php?rest_route=\/wp\/v2\/posts\/1141\/revisions"}],"predecessor-version":[{"id":1152,"href":"http:\/\/blog.schmoigl-online.de\/index.php?rest_route=\/wp\/v2\/posts\/1141\/revisions\/1152"}],"wp:attachment":[{"href":"http:\/\/blog.schmoigl-online.de\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1141"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/blog.schmoigl-online.de\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1141"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/blog.schmoigl-online.de\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1141"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}