Big Data Will Need 1.5 Million Data Scientists

Big Data efforts have a problem: there aren’t enough people out there who know how to take advantage of the data. Consulting firm McKinsey projects “a need for 1.5 million additional managers and analysts in the United States who can ask the right questions and consume the results of the analysis of Big Data effectively.”

So what makes a good data scientist? The Wall Street Journal asked Hilary Mason, chief scientist for the URL shortening service bit.ly. She described three key characteristics:

They can take a data set and model it mathematically and understand the math required to build those models; they can actually do that, which means they have the engineering skills; they are someone who can find insights and tell stories from their data. That means asking the right questions, and that is usually the hardest piece.

Turning data into usable information is the toughest part of data science. Gathering data and putting it into charts is straightforward enough, but drawing conclusions from it and forging a plan for the future requires real brain power.

Since data science is a relatively new discipline, it isn’t often taught at the university level. So companies have to grow their own talent, and that’s not easy. Finding even one person with computer and database smarts who also has a strong business sense is a challenge. Finding 1.5 million of them will be really hard.

Comments

  1. BY Fred Bosick says:

    If companies come to the realization that technical people are as important as business people and pay accordingly, there won’t be a staffing problem.

    People other than Lumbergh deserve to have a Porsche.

    • BY Rob Schoenfeld says:

      Amen. People with MBAs and CPAs are good at balancing the books, but I don’t think that makes them the only choice for senior executive positions.

  2. BY ConfusedCountry says:

    If McKinsey says we need 1.5 million Data Scientists they must be right. According to my calculations I estimated 1.478 Million Data Scientists will be needed, but those guys at McKinsey are really smart and know best. I’d love to see their brilliant calculations so I can see exactly where I went wrong in my estimate.

    • BY MobileMath says:

      LOL! Wow ConfusedCountry, at least your calculations are a lot more accurate than mine. I estimated 1.468 Million. I must’ve forgotten to carry a 1, or something. :-)

  3. BY James Green says:

    So what companies are hiring? And are they willing to train American computer scientists on the job?

  4. BY Bubi says:

    When will people understand that companies such as McKinsey and KPMG and all these “consulting” businesses are not worth 2 cents?

    Stop paying attention to their crap.

  5. BY David says:

    This industry will have to bend on accepting only engineers and mathematicians. There are plenty of us from SA and DBA backgrounds that have exhaustive experience dealing with millions of data points, and who have the practical background and creativity necessary to ask interesting questions.
    Just do what every IT field does at its inception – look for smart, creative people that you would like to work with, and train them.

    This insistence that everybody come in the door with college calculus is not helping.

  6. BY Kim Scott says:

    Hilary Mason is not wrong, and I appreciate light being shed on a troubling issue. However, this issue is not limited to “Big Data”. Today we have granulated technology to its max, and students graduating from college, or even professionals who have spent 10 years in the field focusing on a subset of new technologies alone, have no idea how to think outside the box or see the big picture.

    And to make matters worse, the same individuals feel as if they are not accountable to ask the probing questions required to come up with great solutions. The only requirement is to get the job done, with minimal effort.

    And then, just when we thought it could not get worse, IT upper management supports this new corporate structure of technical silos. Anyone in the field for more than 15 years should have recognized the signs.

    So, as far as “Big Data” goes, yep, I can see how there is a lot of concern. This problem has crept into the day-to-day business of IT, and MDM and large DW projects are having the same issues. No one should be surprised.

    Kim Scott,
    kimscottit@optimum.net

  7. BY Pete says:

    These jobs can easily be filled, but whining about worker shortages is more fun.

  8. BY vincentg64 says:

    Can some of these jobs be performed by bots (computer programs)?

  9. BY Jörg G. Beyer says:

    I think the hype about Big Data is not thought out. Why collect tons of data in the first place and only then think about what data is relevant? Now companies need to pay experts just to manage these tons of data.

    Big Data is not the gold nugget, but the river where you can look for gold. “The more the merrier” does not apply to Big Data. More data means slower and more expensive processes. Businesses will gain more if they decide in advance what data is needed. Then you don’t need to hire experts only to deal with mostly irrelevant data.

    For an article on this topic, see my blog: http://bit.ly/KMqz1G

  10. BY jake says:

    The problem with “Data Scientist” is that it means different things to different people.

    E.g., in want ads, “Data Scientist” often means recently-minted computer “science” Ph.D. with very little real-world experience in coaxing information out of data. The ad is typically describing a production-programming job, with, oh yah / by the way – the candidate should know something about statistical modeling / machine learning. Shouldn’t real scientists focus on the development of algorithms and prototyping new products / algorithms instead of trying to be also-ran software engineers?
