In 1953, a Donald Duck comic strip entitled “Flip Decision”, written by Carl Barks, proposed a pseudo-philosophy called flipism. The premise is simple: for every crossroad in life requiring a decision, assign each choice a face of a coin (heads or tails), toss the coin, and then make the decision based on the outcome of the flip. As the inventor of the philosophy, Professor Batty proclaimed: “Life is but a gamble! Let flipism chart your ramble!”

Now, in the comic, flipism unfortunately didn’t work out well for Donald: a coin flip for each decision resulted in a series of mishaps. Ironically though, in order to deal out some comeuppance, Donald did manage to chase down the charlatan Professor Batty, finding the fraud behind the right door based on a coin flip. So perhaps the philosophy does hold some merit (or, more likely, this merely demonstrates the power of the author).

Though I don’t necessarily advocate living a life based on coin flips, it turns out that coin flips, and the underlying statistical principles that govern them, are particularly effective when applied to some problems regularly faced in data science.

Imagine that you have trained a machine learning model, say, for predicting click-through rates of an ad on a webpage, given some user context. You use some information about your user: which country they came from, their demographic information, the landing page they came from, and a set of other features. If they were engaged on your platform, you could also use features based on what they did there to improve the model’s performance. What is the simplest benchmark you could use to know whether your model is performant? Remarkably, any trained model in this scenario would have to be benchmarked against a simple coin flip. If we assign heads to a click and tails to a non-click, each with probability 50%, then tossing a coin to predict ad clicks gives us a random classifier.

Now the objective of any trained model is simple: it has to at least beat this random predictor. This is why it is important to measure the accuracy of the model and compare it to the accuracy of the random predictor (which would be, on average, 50%). We can go even further: suppose we know that the historical base click-through rate of the ad is, say, 30%. Then, using the coin method above, we simulate a biased coin with a 30% probability of heads, tails otherwise. Any prediction model would now have to beat this new random-predictor benchmark. Note that accuracy is not the only measure of performance; others, such as the false positive rate, precision, and recall, can be used too. This approach applies to any binary classification scenario, and the randomized predictor serves as a simple sanity check: if a model cannot beat the benchmark, it is time to go back to the drawing board. Conversely, if it does beat the benchmark, we can quantify by how much, assured that the model is performing better than random chance.

## Counting large numbers

Counting has always been a standard focus of computer science since the early days of computing. One notable and important problem is the counting of large numbers, and the related problem of determining the size (cardinality) of a large set of items, a problem faced by every modern database system. Suppose you have a limited amount of memory to work with, but have to count really large numbers. More concretely, consider the scenario where you are limited to a 32-bit register and need to count to numbers larger than 2³² − 1 = 4,294,967,295. Such a scenario is prevalent in high-speed network routers, where counting needs to happen in a very short time window on the fast but expensive static random-access memory (SRAM).
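The coin-flip benchmark described above can be sketched in a few lines of Python. This is only an illustrative simulation, not the author's code: the 30% base rate mirrors the historical click-through rate in the text, and all names and sample sizes here are made up for the demo.

```python
import random

random.seed(42)  # fixed seed so the demo is reproducible

# Hypothetical ground truth: clicks arrive at a 30% base rate,
# standing in for the historical click-through rate mentioned above.
BASE_CTR = 0.30
N = 100_000
actual = [random.random() < BASE_CTR for _ in range(N)]

def coin_predictor(p_heads, n):
    """Predict 'click' (heads) with probability p_heads, 'no click' otherwise."""
    return [random.random() < p_heads for _ in range(n)]

def accuracy(predicted, truth):
    """Fraction of predictions that match the ground truth."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)

# Fair coin: the simplest possible benchmark (~50% accuracy on average).
fair = accuracy(coin_predictor(0.5, N), actual)

# Biased coin matched to the base rate: a stronger benchmark
# (~0.3*0.3 + 0.7*0.7 = 58% accuracy on average).
biased = accuracy(coin_predictor(BASE_CTR, N), actual)

print(f"fair coin accuracy:   {fair:.3f}")
print(f"biased coin accuracy: {biased:.3f}")
```

Any trained classifier for this task should comfortably exceed the biased-coin number; if it cannot, that is the "back to the drawing board" signal the text describes.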