In data science, many methods solve similar types of problems, but how do you choose the right one?
I must admit I love tools. As a hobby, I like building and fixing things, and I have acquired many tools over the years. Some might wonder: how many hammers do you need, and how are they different? For some jobs, a specialized tool reduces the effort tremendously and is much more effective than a standard one. The same can be said for algorithms and methods in data science.
If you have been educated as a data scientist, understand the underlying math, and have experience using the methods, you will know which situations call for which methods. But if you are just starting your journey in data science or are a citizen data scientist (e.g. your day job is as a marketer), the large toolbox of methods can be overwhelming. I'm an engineer by education and wanted some structure to help me (and hopefully others) organize my toolbox so I have a place to start. I found a couple of excellent resources that helped me frame my understanding: the Udacity Nanodegree "Predictive Analytics for Business" and the book "The Field Guide to Data Science" by Booz Allen Hamilton are two among the many useful resources I ran across. Building on those and other sources, I created this Methodology Framework spreadsheet.
How to use the Methodology Framework
I've simplified this to make it easier to use, and I don't pretend to understand all the math behind the data science algorithms. This is a pragmatic approach to understanding which tools to use for the business problem you are trying to tackle.
Read the Data Science Methodology Framework from left to right.
Goal - Start with the type of business problem you are trying to tackle and your goal for the analysis. Note that sometimes your problem will have multiple goals; break those apart when looking at this Methodology Framework.
Describe - the typical goals data analysts tackle: pulling data together to describe the status of the business and to draw some insights from it.
Discover - dives a little deeper into the patterns in the data to define groups (cohorts) and understand which fields are important for differentiating groups.
Predict - uses the data to predict outcomes, such as how something will be classified or what the future values will be.
Advise - recommends a course of action.
Problem Characteristic - Next, look at the characteristics of the problem. Do you need a certain type of output? Does the data type limit your options?
Method / Process - Finally, this is the recommended method or process to use for the business problem / goal.
Notes - Links and details to give you more info about the suggested method or process.
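For readers who think better in code, the left-to-right reading described above can be sketched as a small lookup in Python. This is only an illustration under my own assumptions, not the spreadsheet itself: the `FrameworkRow` class, the `suggest_methods` function, and the three example rows are hypothetical and were chosen just to show the Goal, Problem Characteristic, Method / Process, Notes structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameworkRow:
    """One row of the framework, read left to right (hypothetical structure)."""
    goal: str
    problem_characteristic: str
    method: str
    notes: str = ""


# Illustrative rows only; these are NOT copied from the actual spreadsheet.
ROWS = [
    FrameworkRow("Predict", "numeric outcome", "linear regression"),
    FrameworkRow("Predict", "categorical outcome", "logistic regression"),
    FrameworkRow("Discover", "no predefined groups", "k-means clustering"),
]


def suggest_methods(goal, problem_characteristic=None):
    """Filter by goal first, then optionally by problem characteristic."""
    matches = [row for row in ROWS if row.goal.lower() == goal.lower()]
    if problem_characteristic is not None:
        wanted = problem_characteristic.lower()
        matches = [
            row for row in matches
            if row.problem_characteristic.lower() == wanted
        ]
    return [row.method for row in matches]


if __name__ == "__main__":
    # Start with the goal alone, then narrow by the problem characteristic.
    print(suggest_methods("Predict"))
    print(suggest_methods("Predict", "numeric outcome"))
```

If a problem has multiple goals, the same idea applies: call the lookup once per goal rather than trying to combine them in a single query.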
As mentioned, this Methodology Framework is built on the work of others much smarter than me and is meant to help people on a learning journey similar to mine. Please help keep it up to date by adding to it and updating it.
(In the comments please share suggestions on content updates and ideas on how to improve this)