The differences between machine learning, data mining, and statistics

December 10, 2012

Topic

Statistics / data science

From machine learning to data mining. From statistics to probability. A lot of it seems similar, so what are the differences? Statistician William Briggs explains in an FAQ.

What’s the difference between machine learning, deep learning, big data, statistics, decision & risk analysis, probability, fuzzy logic, and all the rest?

None, except for terminology, specific goals, and culture. They are all branches of probability, which is to say the understanding and sometime quantification of uncertainty. Probability itself is an extension of logic.

I was surprised he didn’t throw data science into the mix, but you could and the document would pretty much be the same.

7 Comments

somebody — December 10, 2012 at 5:13 am

There are branches of machine learning that do not use statistics, e.g. rule induction.
Jeroen Janssen — December 10, 2012 at 5:22 am

I’m sorry, but that link is entirely wrong about fuzzy logic being the same as probability theory, which makes me highly suspicious about the rest of the explanation. From my comment on the linked site:

“Isn’t fuzzy logic different than probability?

No. It sometimes has, like mathematics, many-valued “truths” (but so can probability models), but the theory itself is also evaluated with standard logic like probability. Fuzzy logic in practical applications makes statements of uncertainty or of things which are not certain, and that makes it probability. Fuzzy logic is one of the many rediscoveries of probability, but the best in the sense of possessing a cuddly slogan. Doesn’t fuzzy logic sound cute? Meow.”

I’m sorry, but this is completely wrong. Fuzzy logic has nothing to do with probability. The easiest way to see this is to consider the following example. Suppose someone gives you two drinks: drink one has a 10% chance of being poison and drink two is poisonous to degree 0.1. Which one would you rather drink? I would think drink number two, since it is poisonous to degree 0.1 and thus not very poisonous at all. Drink number one, on the other hand, has a 10% chance of being poison, hence if you are unlucky enough, you drink poison and die.

The point here is that in probability events are either true or false, i.e. if P(X=x) = p, then X is equal to x in p*100% of the cases. So if someone is big with probability 0.2, then 20% of people are big. In fuzzy logic on the other hand, if F(x) = f, then X is equal to x to degree f in all the cases. Note that I don’t use F(X=x) here since it does not make sense as there are no random variables in fuzzy logic. A fuzzy set is just a mapping from a single object (not a variable) to the degree to which the object corresponds to the concept. If someone is big to degree 0.2, he is not very big. Unlike probability, this does not state anything about the population. In other words, fuzzy logic does not express uncertainty, but rather allows one to calculate with vague quantities, which is something completely different from probability theory.
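The distinction drawn in the comment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `sample_poison_events` and `fuzzy_poison_degrees`, the seed, and the number of drinks are hypothetical and do not come from the thread. Under the probabilistic reading each drink is either poison or not, and roughly 10% of them are; under the fuzzy reading every single drink is poisonous to degree 0.1.

```python
import random


def sample_poison_events(p_poison, n_drinks, seed=0):
    """Probabilistic reading: each drink is crisply poison (1) or harmless (0).

    The seed and the sampling scheme are assumptions made for illustration.
    """
    rng = random.Random(seed)
    return [1 if rng.random() < p_poison else 0 for _ in range(n_drinks)]


def fuzzy_poison_degrees(degree, n_drinks):
    """Fuzzy reading: every drink has the same fixed membership degree."""
    return [degree] * n_drinks


outcomes = sample_poison_events(0.1, 10_000)
degrees = fuzzy_poison_degrees(0.1, 10_000)

# Probability: only the crisp values 0 and 1 ever occur; the 0.1 describes
# how often the value 1 shows up across the population of drinks.
print("distinct probabilistic outcomes:", sorted(set(outcomes)))
print("share of poisonous drinks:", sum(outcomes) / len(outcomes))

# Fuzzy membership: the 0.1 describes each individual drink, with no
# statement about frequencies in a population.
print("distinct fuzzy degrees:", sorted(set(degrees)))
```

The share of poisonous drinks in the first list varies with the seed and sample size, while the second list contains the value 0.1 for every drink by construction.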
- Michele Filannino — December 10, 2012 at 8:39 am
  
  Jeroen Janssen is absolutely right! By the way, I don’t like the abstract view of the disciplines presented in the short answer.
- Evan Savage — December 10, 2012 at 10:41 am
  
  While I agree that the theoretical concepts of fuzzy set membership and probability are distinct, I think there’s some practical validity to the original author’s point.
  
  If I say “drink two is poisonous to degree 0.1”, how did I get that value? Perhaps I looked up known medical cases of “drink two”-related toxicity, or maybe I calculated LD50 on some hapless rats. Either way, it really is all just semantic/notational layers over the base concept of probability.
  
  Same applies to machine learning. Yes, rule induction exists; it’s just not really used anywhere. What do successful systems for spam filtering, personalized recommendations, and language translation have in common? Underneath the covers, they all manipulate probabilities.
  
  So: in theory, these areas may be somewhat distinct; in practice, it all ultimately boils down to ever more creative uses of probability.
  - Jeroen Janssen — December 11, 2012 at 5:29 am
    
    No, fuzzy logic does not “boil down to probability”. Let me try to re-explain. A lot of the confusion surrounding probability versus fuzzy logic most likely stems from the fact that originally fuzzy set theory was presented using the Zadeh connectives max for the disjunction, min for the conjunction and 1-x for the negation. There are a lot of other potential connectives, however, giving rise to different many-valued logics (in the sense of mathematical logics) that cannot be interpreted as probability, but can be interpreted as encoding vagueness.
    
    For example, consider Lukasiewicz logic. In this logic p ^ ~p is not necessarily false and p v ~p is not necessarily true. This runs counter to a probabilistic interpretation of this logic, since in probability theory p ^ ~p cannot be true. A person is either tall or not tall, with a probability degree for both, but when combining the two propositions, the resulting statement that the person is “tall and not tall” must be false, i.e. have probability 0. Allowing p ^ ~p =/= 0 is consistent with an interpretation of Lukasiewicz logic as encoding vagueness: we allow objects to be compatible w.r.t. a concept to a certain degree. In this way a person can be both “tall” and “not tall” to two given degrees.
    
    Concluding, since we can define many-valued logics that cannot be interpreted as encoding probability, but can be interpreted as encoding vagueness, fuzzy logic does not boil down to probability theory.
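The argument in the reply above can be checked numerically. The sketch below is an illustration, not the commenter's own formulation: it assumes the min/max/1-x connectives mentioned in the reply (these coincide with the lattice connectives of Lukasiewicz logic), and all function names, the degree 0.25, and the eight-element sample space are hypothetical. For contrast it also includes the Lukasiewicz strong conjunction max(0, a + b - 1), under which "p and not p" does evaluate to 0.

```python
def fuzzy_not(a):
    """Standard negation: 1 - a."""
    return 1 - a


def fuzzy_and(a, b):
    """min conjunction (Zadeh connective, assumed reading of '^')."""
    return min(a, b)


def fuzzy_or(a, b):
    """max disjunction (Zadeh connective, assumed reading of 'v')."""
    return max(a, b)


def strong_and(a, b):
    """Lukasiewicz strong conjunction, included only for contrast."""
    return max(0.0, a + b - 1)


def prob(event, sample_space):
    """Probability of an event under a uniform finite sample space."""
    return len(event) / len(sample_space)


# Fuzzy side: a person who is "tall" to degree 0.25.
tall_degree = 0.25
print("tall and not tall (min):", fuzzy_and(tall_degree, fuzzy_not(tall_degree)))
print("tall or not tall (max):", fuzzy_or(tall_degree, fuzzy_not(tall_degree)))
print("tall and not tall (strong):", strong_and(tall_degree, fuzzy_not(tall_degree)))

# Probability side: 2 of 8 equally likely people are tall, so P(tall) = 0.25.
sample_space = set(range(8))
tall = {0, 1}
not_tall = sample_space - tall
print("P(tall and not tall):", prob(tall & not_tall, sample_space))
print("P(tall or not tall):", prob(tall | not_tall, sample_space))
```

With the min/max connectives, "tall and not tall" keeps a nonzero degree and "tall or not tall" falls short of 1, whereas the probabilities of the corresponding events are forced to 0 and 1 regardless of P(tall).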
Tom Schenk — December 10, 2012 at 11:13 am

Eh, I’m in the camp that thinks he’s playing fast and loose here. I’ll take a different approach by offering an analogous statement: “what’s the difference between philosophy and all of the natural sciences? None, except for terminology, specific goals, and culture.” True, but that forgets that those are crucial aspects of each discipline. I think William makes the same generalization error.
zyxo — December 23, 2012 at 9:56 am

I invite you to look at a blog post I wrote more than a year ago (http://zyxo.wordpress.com/2011/07/10/is-reading-a-newspaper-data-mining/).
Merry Xmas and happy New Year!
