4 Interview Questions for Newcomers to Hadoop

The Hadoop market reached $256 million in vendor revenue during 2012 and is forecast to grow to nearly $1.7 billion in 2017, according to Wikibon, an open source advisory community based in Marlborough, Mass. So veteran developers, architects and data warehousing specialists are spending every spare moment learning the framework for storage and large-scale processing of data sets.

If you’re new to Hadoop and are interviewing for a Hadoop-heavy job, be ready to describe your hands-on experience with the framework, advises Jobin Chacko, senior associate, recruitment, for Synechron, an IT solutions firm based in New York City.

Chacko’s job is to determine whether a candidate has amassed enough practical experience to navigate the data and stringent security requirements of the financial services industry. Here are some of the questions he asks newcomers to Hadoop.

Have you worked on a go-live project or a prototype?

  • What Most People Say: “I’ve dabbled with Hadoop in my spare time.”
  • What You Should Say: “I had considerable experience as a data warehouse architect before taking classes to learn Hadoop. Then, to make sure I was ready to handle big data sets, I pulled massive amounts of historical data from the New York Stock Exchange and used the sample database to hone my analytical skills. I also used the data to create programs in MapReduce. You can see samples of my work by visiting my website.”
  • Why You Should Say It: If you’re going to hone your skills in a simulated environment, make sure it emulates what you’ll find in the real world, says Chacko. Real jobs require you to handle big, heavy data sets.
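For newcomers wondering what a practice MapReduce program on stock data might look like, here is a minimal sketch in plain Python that imitates the map, shuffle and reduce phases on a single machine. It is not Hadoop code and not Chacko’s example: the function names are hypothetical, and the sample rows are made-up placeholders rather than real New York Stock Exchange data. The job finds the highest closing price per symbol.

```python
from collections import defaultdict

# Hypothetical rows of (symbol, date, closing price).
# Illustrative placeholder values only, not real NYSE data.
RECORDS = [
    ("AAA", "2012-01-03", 10.0),
    ("BBB", "2012-01-03", 20.5),
    ("AAA", "2012-01-04", 12.5),
    ("BBB", "2012-01-04", 19.0),
]


def map_record(record):
    """Map phase: emit one (key, value) pair per input record."""
    symbol, _date, close = record
    yield symbol, close


def shuffle(pairs):
    """Shuffle phase: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_max(symbol, closes):
    """Reduce phase: collapse one key's values to a single result."""
    return symbol, max(closes)


def run_job(records):
    """Run map, shuffle and reduce in sequence on one machine."""
    mapped = (pair for record in records for pair in map_record(record))
    grouped = shuffle(mapped)
    return dict(reduce_max(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(RECORDS))
```

On a real cluster the framework runs the map and reduce functions in parallel across nodes and handles the shuffle itself; the sketch only shows the shape of the logic a candidate would be expected to explain.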

How many nodes can be in one cluster?

  • What Most People Say: “I would say no more than two to three nodes.”
  • What You Should Say: “Hadoop scales out nicely, so the load really depends on the structure and data warehouse configuration. Hadoop can easily handle 10 to 50 nodes.”
  • Why You Should Say It: Inspire confidence by showing that you understand Hadoop’s clusters and how to coordinate the parallel processing of data using MapReduce. Also, be sure to highlight your previous experience working with large data sets, even if it didn’t involve Hadoop.

Which NoSQL databases have you worked with?

  • What Most People Say: “I’ve worked with Cassandra.”
  • What You Should Say: “There are four categories of NoSQL databases. The first is key-value stores. I’ve used Redis, primarily when working with semi-structured data. The second is column-family stores. I’ve used Cassandra when I needed scalability and high availability. The third is document databases. When I’ve needed to store and access semi-structured documents in formats like JSON, I’ve used CouchDB. Finally, there are graph databases like InfiniteGraph.”
  • Why You Should Say It: Sometimes, professionals are told to work with an open source database simply because it’s cheap. Unfortunately, those professionals aren’t ready for prime time, because they have no idea why they’re using it or which NoSQL database is most efficient for processing large quantities of structured, semi-structured or unstructured data.
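To make the four categories concrete, the sketch below models each one with ordinary Python data structures. It only illustrates the shape of the data, not how Redis, Cassandra, CouchDB or a graph database is actually queried; every key, name and value is a hypothetical placeholder.

```python
import json

# 1. Key-value store: an opaque value fetched by its key.
key_value = {"quote:AAA:2012-01-03": "10.0"}

# 2. Column-family store: row key -> column name -> value.
#    Each row may hold a different, sparse set of columns.
column_family = {
    "AAA": {"close:2012-01-03": 10.0, "close:2012-01-04": 12.5},
    "BBB": {"close:2012-01-03": 20.5},
}

# 3. Document store: a self-describing, nested JSON document.
document = json.loads(
    '{"_id": "AAA", "name": "Example Corp", "closes": [10.0, 12.5]}'
)

# 4. Graph store: nodes joined by labeled edges.
edges = [("fund:1", "HOLDS", "AAA"), ("fund:1", "HOLDS", "BBB")]


def holdings(owner):
    """Return the targets of every HOLDS edge leaving `owner`."""
    return [dst for src, label, dst in edges
            if src == owner and label == "HOLDS"]


if __name__ == "__main__":
    print(key_value["quote:AAA:2012-01-03"])
    print(sorted(column_family["AAA"]))
    print(document["closes"])
    print(holdings("fund:1"))
```

Being able to say which of these shapes fits a given data set is the kind of reasoning the answer above is meant to demonstrate.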

Which tool have you used for monitoring nodes and clusters?

  • What Most People Say: “I haven’t used one.”
  • What You Should Say: “I’ve used Nagios for monitoring servers and switches. And I’ve used Ganglia for monitoring the entire grid.”
  • Why You Should Say It: “There are approximately 59 tools that can be used with Hadoop,” explains Chacko. “And not all of them can be used at the same time.”

“An experienced IT professional may think they’re qualified because they’ve worked with NoSQL or other databases,” says Chacko. “But when you start asking questions, you realize that they really don’t have hands-on experience with some of the most common tools. In fact, some of them don’t know Hadoop at all.”

Comments

  1. BY Neal says:

    Basically Chacko says to lie :) Be realistic. These things really started in 2010-2011. Any app developer with, say, 5 years of experience can adapt to big data tech easily. Not rocket science. Job advertisements are crazy. They want an experienced candidate on spanking-new technology :)

  2. BY Rick K. says:

    I’d never heard of Hadoop until I had to start looking for a new job a few months ago. I was hoping the article would give me a little bit more background information, but it confused me more.

    I think I agree with the previous poster: we’re essentially advised to lie to the interviewer.

    One of the downsides of working for a large company’s IT department is that one ends up being over-specialized, often seeing little to none of the new software available, due to contract or proprietary limitations.

  3. BY Steve says:

    Point 1: The title of this article is “4 Interview Questions for Newcomers to Hadoop”.

    Point 2: This is moronic: “If you’re new to Hadoop and are interviewing for a Hadoop-heavy job, be ready to describe your hands-on experience with the framework”.

    Given points 1 and 2, it seems to me that if you are a newcomer to Hadoop, that means you are applying for an entry-level job with an employer that is hiring for entry-level Hadoop. For an entry-level position, you should not be expected to answer any technical questions or be expected to know anything about the platform, and you CERTAINLY shouldn’t be expected to have any EXPERIENCE with the framework to describe.

  4. BY Sid says:

    the correct answer to “how many nodes can be in one cluster” is “how much power/cooling do you have and/or are willing to procure?” bonus points for “oh, BTW – how much RAM are you willing to buy for the name node?” I built/deployed my 1st 11 node, 22TB cluster in late 2009 & was embarrassed that it was that small…

  5. BY al says:

    All three comments are spot on IMO. There’s a chicken/egg problem here: how do you acquire experience if being hired requires experience? The stock market example is nice… but my interviews suggest employers want experience in their industry, using their preferred toolsets.

    As always… there is a part of the decision-space that is left out: technical knowledge is only one requirement. There is domain knowledge (e.g., ops, security, network), there is statistics/analytics knowledge, there is business knowledge. Too many companies, IMO, expect shrink-wrapped employees fresh off the “we have all skills” assembly line. They do not see the growth path. The value function is very myopic.
