Cathy O’Neil on when there’s enough data to justify a data scientist in the workplace:
Too much to fit on an Excel spreadsheet. And it’s not just how much, it’s really about how high quality the data is; the best is for it to be clean and for it to not be public, or at least not generally used for the purpose that your business uses it for.
Even data that does fit in Excel can be examined more closely. Then again, if you only have that much data, your data scientist will get bored quickly.
I do not disagree with the article, but I think it’s worth mentioning that actuaries have been doing this kind of work for decades, albeit in a somewhat specialized industry (insurance). What I like about the message is that you do not have to be in the insurance business to appreciate the need for people with strong data skills, who require huge quantities of data, special tools, and programming expertise to bring value to an organization. Often, these skills blur the line with traditional IT work. I think that line is sometimes drawn too sharply, which can lead to expectations that IT can fill the “data scientist” roles. In practice, I think it is best to separate these areas organizationally but align them functionally.