The term “Machine Learning” seems to be on everybody’s lips today. Hardly any software company dares to show up on the market without at least one product tagged with this buzzword. One of the most famous cases is Amazon, which, for example, promotes a product called “Amazon Machine Learning”. They claim that
The service uses powerful algorithms to create ML models by finding patterns in your existing data. Then, Amazon Machine Learning uses these models to process new data and generate predictions for your application.
Source: https://aws.amazon.com/aml/?nc1=h_ls
But how far does this “learning” actually go? Listening to the news, you could assume that learning will soon no longer be a task for humans, but will be done by machines only. Let’s see how far along real life really is.
The Scenario
To find out, we want to conduct a little experiment with “Amazon Machine Learning”. The service lets developers and data scientists create Machine Learning models (ML models) based on a table, typically provided in CSV format. By default, the dataset is split: 70% of the data is used for training the machine learning model, and the remaining 30% is used to verify the quality of the learning process (Amazon calls this an “Evaluation”). Afterwards, Amazon lets you tweak the probability threshold to adjust the likelihood of false positives and false negatives. Finally, the model can be queried for predictions either in real time (intended for single requests where latency is critical) or in batch mode. Both training and requesting predictions are charged, and the pricing differs between real-time and batch requests.
The tutorial describes all the necessary steps for this. Pricing can be found on the usual Amazon pricing pages.
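To picture what a 70/30 split means in practice, here is a minimal Python sketch. It is a generic illustration only: how Amazon actually selects the rows for each part is not covered here, and the function name `split_dataset`, the random shuffling, and the fixed seed are assumptions made for this example.

```python
import random


def split_dataset(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of `rows` and split it into (training, evaluation) lists.

    Assumption: a seeded random shuffle; the real service may pick rows differently.
    """
    shuffled = list(rows)  # copy, so the caller's data stays untouched
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)  # size of the training part, rounded down
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    training, evaluation = split_dataset(range(343))
    print(len(training), "rows for training,", len(evaluation), "rows for evaluation")
```

The evaluation part is never shown to the model during training, which is what makes it usable for judging the quality of the predictions.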
Arguably the simplest thing a computer can “learn” is mathematics: it is logical, highly predictable, and reproducible as often as you want. So, since Machine Learning gurus claim that their algorithms are capable of mastering the irrationality of human decision making, mathematical formulas should be a piece of cake for them.
To test the abilities of Amazon’s machine learning offering, I have set up the following test scenario:
- Let’s take three variables, called a, b, and c, each an integer ranging from 1 to 7.
- Let’s calculate an (intermediate) result based on the following formula: result = (a * b - c) mod 17, where “mod” denotes the remainder of the division (i.e. the modulo operation). So, for example, 5 mod 17 equals 5, 17 mod 17 equals 0, and 18 mod 17 equals 1. All these operations are very well known, and even trivial hardware can implement them very efficiently.
- To further simplify the case for the machine learning model, we project the result onto a binary field, defined like this: if the (intermediate) result (see above) is strictly greater than 9, the value is true; otherwise it is false. Let’s call this attribute the “target (value)” (adopting Amazon’s nomenclature).
- Because the modulo divisor is a prime number and slightly less than 18 (which is 2*9), we have a small bias towards false: of the 17 possible remainders 0 to 16, ten (0 to 9) map to false and only seven (10 to 16) map to true.
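The scenario above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the script actually used for the experiment: enumerating all 343 combinations (rather than sampling rows), the column names, and the helper names `target`, `build_rows`, and `to_csv` are all made up for this example. Note also that a * b - c can be negative (for example 1 * 1 - 7 = -6); Python’s `%` operator then still yields a value between 0 and 16, whereas other languages may return a negative remainder.

```python
import csv
import io
from itertools import product


def target(a: int, b: int, c: int) -> bool:
    """Binary target: True if (a * b - c) mod 17 is strictly greater than 9."""
    # Assumption: floored modulo as in Python, so the result is always in 0..16,
    # even when a * b - c is negative.
    return (a * b - c) % 17 > 9


def build_rows():
    """Enumerate all 7 * 7 * 7 = 343 combinations of a, b, c in 1..7."""
    return [
        (a, b, c, int(target(a, b, c)))
        for a, b, c in product(range(1, 8), repeat=3)
    ]


def to_csv(rows) -> str:
    """Render the rows as CSV text with a header line (hypothetical column names)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["a", "b", "c", "target"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = build_rows()
    positives = sum(row[3] for row in rows)
    print(f"{len(rows)} rows, {positives} of them with target = 1")
```

Counting the rows with target = 1 is a quick way to check how strong the bias towards false really is for the chosen value ranges.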