{"id":1055,"date":"2018-05-10T17:28:37","date_gmt":"2018-05-10T15:28:37","guid":{"rendered":"http:\/\/blog.schmoigl-online.de\/?p=1055"},"modified":"2018-05-22T00:44:20","modified_gmt":"2018-05-21T22:44:20","slug":"amazon-machine-learning-does-it-really-learn","status":"publish","type":"post","link":"http:\/\/blog.schmoigl-online.de\/?p=1055","title":{"rendered":"Amazon Machine Learning &#8211; Does it really learn?"},"content":{"rendered":"<p>The term &#8220;Machine Learning&#8221; today has to be in everybody&#8217;s mouth. No software company may show up on the market without having a product which does not have at least a tag attached labeled with this buzzword. One of the most famous cases is Amazon, which &#8211; for example &#8211; promotes its product called &#8220;Amazon Machine Learning&#8221;. They claim that<\/p>\n<blockquote><p>The service uses powerful algorithms to create ML models by finding patterns in your existing data. Then, Amazon Machine Learning uses these models to process new data and generate predictions for your application.<\/p><\/blockquote>\n<p>Source: <a href=\"https:\/\/aws.amazon.com\/aml\/?nc1=h_ls\">https:\/\/aws.amazon.com\/aml\/?nc1=h_ls<\/a><\/p>\n<p>But how far does this &#8220;learning&#8221; go already? If you listen to the news, you could assume that in the future, learning is no longer a task for humans anymore, but soon will only be done by machines. Let&#8217;s see how far real life really is. <!--more--><\/p>\n<h3>The Scenario<\/h3>\n<p>For doing exactly this, we want to conduct a little experiment with &#8220;Amazon Machine Learning&#8221;. The service offering permits developers and data scientist to create Machine Learning models (ML models) based on a table typically provided in the CSV file format. 
By default, the dataset provided is split so that 70% of the data is used for training the machine learning model, while the remaining 30% is used to verify the quality of the learning process (Amazon calls this an &#8220;Evaluation&#8221;). Afterwards, Amazon allows you to tweak the score threshold to adjust the trade-off between false positives and false negatives. Finally, it is possible to query the model for its predictions either in a real-time fashion (used for single requests where latency is critical) or in batch mode. You pay both for training and for requesting predictions, and the pricing model differs between real-time and batch requests.<br \/>\nThe <a href=\"https:\/\/aws.amazon.com\/aml\/?nc1=h_ls\">tutorial<\/a> describes all the necessary steps for this. Pricing can be found at the <a href=\"https:\/\/aws.amazon.com\/aml\/?nc1=h_ls\">usual Amazon Pricing pages<\/a>.<\/p>\n<p>Probably the simplest thing a computer may &#8220;learn&#8221; is mathematics: it is logical, highly predictable, and reproducible as often as you want. So, if Machine Learning gurus claim that their algorithms are capable of mastering the irrationality of human decision-making, mathematical formulas should be a piece of cake for them.<\/p>\n<p>To test the abilities of Amazon&#8217;s machine learning offering, I have set up the following test scenario:<\/p>\n<ul>\n<li>Let&#8217;s take three variables, called <strong>a<\/strong>, <strong>b<\/strong>, and <strong>c<\/strong>, each an integer ranging from 1 to 7. <\/li>\n<li>Let&#8217;s calculate an (intermediate) result based on the following formula: <em>result = (a * b &#8211; c) mod 17<\/em>, where &#8220;mod&#8221; denotes the remainder of the division (i.e. the modulo operation). So, for example, <em>5 mod 17<\/em> equals 5, <em>17 mod 17<\/em> equals 0, and <em>18 mod 17<\/em> equals 1. 
All these operations are well-known, and even trivial hardware can implement them very efficiently.<\/li>\n<li>To further simplify the case for the machine learning model, we provide a projection onto a binary field, defined like this: if the (intermediate) <em>result<\/em> (see above) is strictly greater than 9, the target is true; otherwise it is false. Let&#8217;s call this attribute the &#8220;target (value)&#8221; (adopting Amazon&#8217;s nomenclature).<\/li>\n<li>Because the modulo divisor is a prime number slightly less than 18 (which is 2*9), we have a small bias towards false. <\/li>\n<\/ul>\n<p>(continued on next page)<br \/>\n<!--nextpage--><\/p>\n<h3>How does this fit Amazon Machine Learning?<\/h3>\n<p>The most prominent use case provided by Amazon in their tutorial is a human decision-making process based on a set of categorizations. The general idea is that, based on these characteristics, a binary decision (such as &#8220;will buy a product or not&#8221;) is made. Machine learning is supposed to support, via a probabilistic approach, the identification of people willing to buy. <\/p>\n<p>In the scenario depicted above, we simulate that kind of decision: the three variables <strong>a<\/strong>, <strong>b<\/strong>, and <strong>c<\/strong> represent three characteristics on which a decision could be based. The decision itself is represented by the target value. Since the formula used above always yields the same result, the decision-making process can be considered 100% rational. <\/p>\n<h3>Generation of Input Data<\/h3>\n<p>Machine learning models require a set of data to be used for the learning process. Typically, the number of records in such training data is on the order of thousands (if not millions, cf. Big Data). To keep the scenario clear, we will provide only a little redundancy to the model. 
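<\/p>\n<p>For reference, the target computation defined in the scenario above can be written down in a few lines (a minimal sketch in Python; the function name is mine):<\/p>\n

```python
def target(a: int, b: int, c: int) -> bool:
    # Binary target: true iff (a * b - c) mod 17 is strictly greater than 9.
    # Python's % operator returns a non-negative remainder for a positive
    # divisor, so the formula also behaves as expected when a*b - c is negative.
    return (a * b - c) % 17 > 9
```

\n<p>For example, <em>(4, 4, 4)<\/em> yields an intermediate result of 12 and hence a true target, while <em>(3, 3, 3)<\/em> yields 6 and hence false.<\/p>\n<p>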
Therefore, for each combination of the variables ranging from 1 to 7, the corresponding result and target value are computed. Note, though, that some combinations with <em>a = 1<\/em> and <em>a = 2<\/em> are left out (we will come back to that a little later). Still, this number of records is not yet sufficient, which is why the same block of records (with exactly the same values) is repeated five times. Also note that the (intermediate) result is <strong>not<\/strong> part of the input data for the ML model (as this would be too easy). You may find the resulting input file attached to this post for your reference.<\/p>\n<p><img decoding=\"async\" src=\"http:\/\/blog.schmoigl-online.de\/wp-content\/plugins\/wp-downloadmanager\/images\/ext\/unknown.gif\" alt=\"\" title=\"\" style=\"vertical-align: middle;\" \/>&nbsp;&nbsp;<strong><a href=\"http:\/\/blog.schmoigl-online.de\/?dl_id=6\">input-data-for-amazon-ml.csv<\/a><\/strong> (1.9 KiB, 659 hits)<\/p>\n<h3>Generating the Machine Learning Model<\/h3>\n<p>Equipped with the input data depicted above, we can now proceed with generating the ML Datasource and the ML model, very much as described in Amazon&#8217;s official tutorial. On creating the datasource, Amazon visualizes the data distribution of the binary target attribute, stating that there are 29 falsy observations and only 11 truthy observations in the data set. As already stated above, the first 70% of the data is automatically used for the learning process; the other 30% is used to assess the learning quality. <\/p>\n<p>After less than 10 minutes of processing (wall-clock) time, the service returns a model whose AUC (area under curve) is 1.000, which indicates a &#8220;perfect, not-improvable&#8221; quality. 
This superior result is also clearly visible in the &#8220;ML model performance analysis&#8221; which Amazon visualizes for this model (the perfect match is indicated by the fact that the two curves do not overlap at all):<\/p>\n<div id=\"attachment_1062\" style=\"width: 521px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/blog.schmoigl-online.de\/wp-content\/uploads\/auc1.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-1062\" src=\"http:\/\/blog.schmoigl-online.de\/wp-content\/uploads\/auc1.png\" alt=\"\" width=\"511\" height=\"370\" class=\"size-full wp-image-1062\" srcset=\"http:\/\/blog.schmoigl-online.de\/wp-content\/uploads\/auc1.png 511w, http:\/\/blog.schmoigl-online.de\/wp-content\/uploads\/auc1-300x217.png 300w\" sizes=\"auto, (max-width: 511px) 100vw, 511px\" \/><\/a><p id=\"caption-attachment-1062\" class=\"wp-caption-text\">ML model performance analysis<\/p><\/div>\n<p>This is based on the fact that we have created a repetitive set of tuples as input data. Due to the 70:30 ratio, the evaluation was based on exactly the same values on which the model had been trained before. Therefore, setting the score threshold is quite arbitrary. To keep it identifiable, we will set it to 0.6.<\/p>\n<p>Querying some sample values like <strong>a = 3, b = 3, c = 3<\/strong> and <strong>a = 4, b = 4, c = 4<\/strong> returns the expected results. It seems that Amazon&#8217;s machine learning did the trick and has &#8220;learned&#8221; what can be learned&#8230;<\/p>\n<h3>Looking a bit closer<\/h3>\n<p>So far it looked as if the machine could master mathematical computations with ease: testing all 40 tuples that we made available in the input data also showed that the expected result equals the actual result (i.e. 
the model responded truthy if the input data was truthy, and falsy if the input data indicated that the target value was falsy).<\/p>\n<p>However, &#8220;learning&#8221; is not just the activity of reproducing memorized information &#8211; the technique of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Memoization\">memoization<\/a> has been known to computer science for decades (it came up in the late 1960s) and can easily be made persistent in a relational database management system. Instead, the most important effect is the capability to <a href=\"https:\/\/en.wikipedia.org\/wiki\/Transfer_of_learning\">transfer what has been learned<\/a>. That is where the omitted records mentioned above kick in: note that, for example, we did not provide the target value for <strong>a = 1, b = 2, c = 2<\/strong> in the input data! So, querying for that tuple (and others like it) might give an indication of how strong the &#8220;real learning effect&#8221; really is. See for yourself:<\/p>\n<table border=\"1\">\n<tr>\n<th>a<\/th>\n<th>b<\/th>\n<th>c<\/th>\n<th>intermediate result<\/th>\n<th>expected target<\/th>\n<th>actual target<\/th>\n<th>reported score<\/th>\n<\/tr>\n<tr>\n<td>1<\/td>\n<td>2<\/td>\n<td>2<\/td>\n<td>0<\/td>\n<td>false<\/td>\n<td>false<\/td>\n<td>0.328229159116745<\/td>\n<\/tr>\n<tr>\n<td>1<\/td>\n<td>2<\/td>\n<td>3<\/td>\n<td>16<\/td>\n<td>true<\/td>\n<td>false<\/td>\n<td>0.0001738734426908195<\/td>\n<\/tr>\n<tr>\n<td>2<\/td>\n<td>1<\/td>\n<td>3<\/td>\n<td>16<\/td>\n<td>true<\/td>\n<td>false<\/td>\n<td>0.0001329469378106296<\/td>\n<\/tr>\n<\/table>\n<p>The result is not very appealing: though the response for <strong>a = 1, b = 2, c = 2<\/strong> is correct, the model&#8217;s confidence (the score value) could be higher. 
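<\/p>\n<p>The &#8220;intermediate result&#8221; and &#8220;expected target&#8221; columns in the table above can be double-checked locally (a minimal sketch in Python; the function name is mine):<\/p>\n

```python
def intermediate(a: int, b: int, c: int) -> int:
    # Intermediate result as defined in the scenario: (a*b - c) mod 17.
    # Python's % yields a non-negative remainder for a positive divisor,
    # so (1*2 - 3) % 17 evaluates to 16, just like the definition above.
    return (a * b - c) % 17

# The three probe tuples from the table above:
for a, b, c in [(1, 2, 2), (1, 2, 3), (2, 1, 3)]:
    r = intermediate(a, b, c)
    print((a, b, c), r, r > 9)  # intermediate result and expected target
```

\n<p>This confirms the &#8220;expected target&#8221; column: the two tuples with intermediate result 16 should be true, yet the model predicted false for both.<\/p>\n<p>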
Moreover, the other two results are simply wrong &#8211; and the model even appears to know that (see the low score values)!<\/p>\n<p>An even more interesting result appears as soon as we leave the pre-defined range of 1 to 7 for a, b, and c:<\/p>\n<table border=\"1\">\n<tr>\n<th>a<\/th>\n<th>b<\/th>\n<th>c<\/th>\n<th>intermediate result<\/th>\n<th>expected target<\/th>\n<th>actual target<\/th>\n<\/tr>\n<tr>\n<td>8<\/td>\n<td>8<\/td>\n<td>8<\/td>\n<td>5<\/td>\n<td>false<\/td>\n<td>false<\/td>\n<\/tr>\n<tr>\n<td>8<\/td>\n<td>8<\/td>\n<td>9<\/td>\n<td>4<\/td>\n<td>false<\/td>\n<td>false<\/td>\n<\/tr>\n<tr>\n<td>8<\/td>\n<td>9<\/td>\n<td>8<\/td>\n<td>13<\/td>\n<td>true<\/td>\n<td>false<\/td>\n<\/tr>\n<tr>\n<td>12<\/td>\n<td>12<\/td>\n<td>12<\/td>\n<td>13<\/td>\n<td>true<\/td>\n<td>false<\/td>\n<\/tr>\n<tr>\n<td>44<\/td>\n<td>1<\/td>\n<td>1<\/td>\n<td>9<\/td>\n<td>false<\/td>\n<td>false<\/td>\n<\/tr>\n<tr>\n<td>45<\/td>\n<td>1<\/td>\n<td>1<\/td>\n<td>10<\/td>\n<td>true<\/td>\n<td>false<\/td>\n<\/tr>\n<tr>\n<td>1<\/td>\n<td>1<\/td>\n<td>18<\/td>\n<td>0<\/td>\n<td>false<\/td>\n<td>false<\/td>\n<\/tr>\n<tr>\n<td>1<\/td>\n<td>1<\/td>\n<td>19<\/td>\n<td>16<\/td>\n<td>true<\/td>\n<td>false<\/td>\n<\/tr>\n<\/table>\n<p>The table above points to another effect: observe that for every tuple in which at least one value lies outside the initial range, the target value <strong>false<\/strong> is returned! This observation can also be verified with all tuples of the following patterns:<\/p>\n<ul>\n<li><em>(x,x,x)<\/em> where 8 <= x <= 50<\/li>\n<li><em>(1,1,x)<\/em> where 8 <= x <= 50<\/li>\n<li><em>(x,1,1)<\/em> where 8 <= x <= 50<\/li>\n<\/ul>\n<p>Given the initial data analysis observation that the training data contains more falsy values than truthy ones, defaulting to the value <strong>false<\/strong> seems reasonable. 
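<\/p>\n<p>How often that default is wrong outside the training range can be enumerated locally: for each of the three probe patterns above, a fair share of the 43 tuples should actually be true (a minimal sketch in Python):<\/p>\n

```python
def target(a: int, b: int, c: int) -> bool:
    # Binary target from the scenario: (a*b - c) mod 17 strictly greater than 9.
    return (a * b - c) % 17 > 9

# Count how many probes of each pattern (8 <= x <= 50) should actually be true.
xs = range(8, 51)
true_xxx = sum(target(x, x, x) for x in xs)  # pattern (x,x,x)
true_11x = sum(target(1, 1, x) for x in xs)  # pattern (1,1,x)
true_x11 = sum(target(x, 1, 1) for x in xs)  # pattern (x,1,1)
print(true_xxx, true_11x, true_x11)
```

\n<p>In each pattern a substantial fraction of the probes is expected to be true, so a model that always answers <strong>false<\/strong> outside the training range is bound to be wrong there.<\/p>\n<p>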
However, I would not dare to call &#8220;defaulting to the single value whose probability is the highest&#8221; the same as &#8220;learning&#8221;.<\/p>\n<h3>Descriptive Analysis of Predictions<\/h3>\n<p>A complete list of all predictions made by the ML model for <em>(x,y,z)<\/em>, where each variable ranges from 1 to 10, can be downloaded here:<\/p>\n<p><img decoding=\"async\" src=\"http:\/\/blog.schmoigl-online.de\/wp-content\/plugins\/wp-downloadmanager\/images\/ext\/unknown.gif\" alt=\"\" title=\"\" style=\"vertical-align: middle;\" \/>&nbsp;&nbsp;<strong><a href=\"http:\/\/blog.schmoigl-online.de\/?dl_id=7\">amazon-ml-predictions-st0.6.csv<\/a><\/strong> (32.4 KiB, 655 hits)<\/p>\n<p>After removing all records that were already part of the input data, you will observe the following statistical facts:<\/p>\n<ul>\n<li>420 out of 961 records were predicted wrongly; conversely, 541 records were predicted properly. This is a &#8220;hit ratio&#8221; of 56.3%. Note that it is close to, but not equal to, the score threshold.<\/li>\n<li>Guessing based on the modulo we have provided, the expectation for always defaulting to <strong>false<\/strong> is 9\/17, or 52.9%. Thus, the advantage of the ML model is 3.4 percentage points.<\/li>\n<li>In the input data, however, we have a biased data set, which indicates a probability of 29\/40 (or 72.5%) in favor of the value <strong>false<\/strong>.<\/li>\n<li>Out of the 420 wrong answers:\n<ul>\n<li>102 are false negatives,<\/li>\n<li>318 are false positives,<\/li>\n<li>150 answers had a score of 0.05 or higher,<\/li>\n<li>57 answers even had a score of 0.95 or higher, suggesting very high confidence.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3>Sensitivity Analysis<\/h3>\n<p>To determine the influence of the score threshold, the entire analysis depicted above was repeated with a different score threshold of 0.1. 
The corresponding result is documented in the following file:<\/p>\n<p><img decoding=\"async\" src=\"http:\/\/blog.schmoigl-online.de\/wp-content\/plugins\/wp-downloadmanager\/images\/ext\/unknown.gif\" alt=\"\" title=\"\" style=\"vertical-align: middle;\" \/>&nbsp;&nbsp;<strong><a href=\"http:\/\/blog.schmoigl-online.de\/?dl_id=8\">amazon-ml-predictions-st0.1.csv<\/a><\/strong> (32.5 KiB, 699 hits)<\/p>\n<p>Looking at the data there, you may observe:<\/p>\n<ul>\n<li>All predictions for records that were already available in the input data are answered correctly.<\/li>\n<li>472 out of 961 records were predicted wrongly;<\/li>\n<li>conversely, 489 records were predicted properly.<\/li>\n<li>This is a &#8220;hit ratio&#8221; of 50.9%. Note that this is close to the probability of flipping a (fair) coin.<\/li>\n<\/ul>\n<p>If this were a test performed with humans, you would suspect that the test person was guessing randomly and without bias. <\/p>\n<p>The resulting histogram of the scores is shown in the following diagram:<\/p>\n<div id=\"attachment_1106\" style=\"width: 615px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/blog.schmoigl-online.de\/wp-content\/uploads\/scorehistogram.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-1106\" src=\"http:\/\/blog.schmoigl-online.de\/wp-content\/uploads\/scorehistogram.png\" alt=\"\" width=\"605\" height=\"340\" class=\"size-full wp-image-1106\" srcset=\"http:\/\/blog.schmoigl-online.de\/wp-content\/uploads\/scorehistogram.png 605w, http:\/\/blog.schmoigl-online.de\/wp-content\/uploads\/scorehistogram-300x169.png 300w\" sizes=\"auto, (max-width: 605px) 100vw, 605px\" \/><\/a><p id=\"caption-attachment-1106\" class=\"wp-caption-text\">Histogram of score values<\/p><\/div>\n<h3>Conclusion<\/h3>\n<p>This small analysis has shown that the default configuration of the Amazon Machine Learning service is capable of memo(r)izing simple mathematically-computed values. 
However, the term &#8220;to learn&#8221; is commonly understood differently: speakers associate with it a certain capability to transfer previously gained knowledge to a different but similar case later on. The small set of test vectors depicted above does not indicate that the Amazon Machine Learning service is capable of doing so, at least in a test based on a simple mathematical correlation.<\/p>\n<h3>Limitations<\/h3>\n<p>The analysis depicted in this post is only a spot check. Various reasons may exist why your test might provide different results. Amongst these are:<\/p>\n<ul>\n<li>Amazon might have changed the service to provide better results without notification.<\/li>\n<li>Your dataset might suit Amazon&#8217;s algorithm better.<\/li>\n<li>The setup of this test might be biased in such a way that Amazon&#8217;s service could not handle it any better.<\/li>\n<li>The execution of the scenario might have been faulty.<\/li>\n<li>This is only a one-time execution of the scenario; multiple executions of this test may provide deviating results, as deriving the ML model involves drawing at least one random number.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Machine Learning today is on everybody&#8217;s lips, and Amazon Machine Learning is said to be such an offering. Politicians and traders are chasing the &#8220;next big thing&#8221; to stay ahead. But does machine learning really deserve to be said to &#8220;learn&#8221;? 
Let&#8217;s see&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[38],"tags":[],"class_list":["post-1055","post","type-post","status-publish","format-standard","hentry","category-machine-learning"],"_links":{"self":[{"href":"http:\/\/blog.schmoigl-online.de\/index.php?rest_route=\/wp\/v2\/posts\/1055","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/blog.schmoigl-online.de\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/blog.schmoigl-online.de\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/blog.schmoigl-online.de\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/blog.schmoigl-online.de\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1055"}],"version-history":[{"count":72,"href":"http:\/\/blog.schmoigl-online.de\/index.php?rest_route=\/wp\/v2\/posts\/1055\/revisions"}],"predecessor-version":[{"id":1140,"href":"http:\/\/blog.schmoigl-online.de\/index.php?rest_route=\/wp\/v2\/posts\/1055\/revisions\/1140"}],"wp:attachment":[{"href":"http:\/\/blog.schmoigl-online.de\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1055"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/blog.schmoigl-online.de\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1055"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/blog.schmoigl-online.de\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1055"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}