Machine learning provides statistical tools to ________ data.
Explanation:
1. Developers Don’t Know Statistics
Developers don’t know statistics, and this is a huge problem.
Programmers don’t need to know or use statistical methods to develop software. Software engineering and computer science degrees generally don’t include courses on statistics, let alone on advanced statistical tests. As a result, machine learning practitioners who come from a computer science or developer background commonly neither know nor value statistical methods.
This is a problem given the pervasive use of statistical methods and statistical thinking in the preparation of data, evaluation of learned models, and all other steps in a predictive modeling project.
2. Practitioners Study The Wrong Stats
Machine learning practitioners often cotton on to the need for skills in statistics.
This might start with a need to better interpret descriptive statistics or data visualizations, and it may progress to a need for more sophisticated hypothesis tests. The problem is that they don’t seek out the specific statistical information they need.
Instead, they try to read through a textbook on statistics or work through the material for an undergraduate statistics course.
This approach is slow, it’s boring, and it covers a breadth and depth of material on statistics that is beyond the needs of the machine learning practitioner.
3. Practitioners Study Stats The Wrong Way
It’s worse than this.
Regardless of the medium used to learn statistics, be it books, videos, or course material, machine learning practitioners study statistics the wrong way.
Because the material is intended for undergraduate students who need to pass a test, it focuses on theory, proofs, and derivations. This is great for testing students but terrible for practitioners who need results.
Practitioners need methods that clearly state when they are appropriate, along with instructions on how to interpret the results.
They need code examples that they can use immediately on their project.
A Better Way into Statistics
It frustrates me to see practitioner after practitioner dive into statistics textbooks and online courses designed for undergraduate students, only to give up.
The bottom-up approach is hard, especially if you already have a full-time job.
Statistics is not only important to machine learning; it is also a lot of fun, or it can be when approached the right way.
I want to help you see the field the way I see it: as just another set of tools we can harness on our journey toward machine learning mastery.