
Hadoop interview questions

When a company contacted me about the possibility of joining their Hadoop group, I naturally took to the web to figure out exactly what kinds of questions are asked in Hadoop-related interviews. Unfortunately, my point of contact was a recruiter who had limited knowledge of the technical side of the group. As it turned out, the position did not have a job description yet, which was odd and made it hard to know what to brush up on, especially since I had not coded for more than 6 months.

The recruiter was very nice. I was impressed that he genuinely wanted to help me out, which surprised me: I had been looking for a job for more than 3 months, and no one else had treated me with as much respect as he did. He mentioned that it would be nice to know JavaScript and PHP. I always try to be honest up front, so I admitted that I had never used JavaScript or PHP with Hadoop. He said that was okay since I would be coming in at entry level and would get orientation.

Then the big day came. Someone else called me for the interview. He was one of the Hadoop engineers at the company and was a nice guy. To my surprise, they were really looking for a Hadoop developer with Java experience. I had worked with Java, Hadoop and MapReduce for my thesis and was somewhat comfortable with them, but I was rusty since I had not written a single line of code in 6 months. Talk about being caught with your pants down.

These are the questions I was asked. I am recalling them from memory, so this is a close approximation. He started off with the easier questions:

(1) What is the main difference between Java and C++?
(2) What do you understand about Object Oriented Programming (OOP)? Use Java examples.
(3) What are the main differences between versions 1.5 and version 1.6 of Java?
(4) Describe what happens to a MapReduce job from submission to output.
(5) How would you tackle counting words in several text documents?
(6) As a follow-up to 5, how would you modify that solution to count only the number of unique words in all the documents?
(7) How would you tackle calculating the number of unique visitors for each hour by mining a huge Apache log? You can use post processing on the output of the MapReduce job.
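
For question 7, here is a minimal sketch of one possible approach, not necessarily what the interviewer had in mind. It is plain Java standing in for a MapReduce job so it runs without a cluster: the "map" step pulls the client address and the hour out of each log line, and the "reduce" step counts the distinct addresses seen in each hour. The class name `UniqueVisitorsPerHour`, the regular expression, and the choice to treat one IP address as one visitor are all my assumptions, and the lines are assumed to be in Apache's Common Log Format.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical plain-Java stand-in for a MapReduce job; no Hadoop classes are used.
class UniqueVisitorsPerHour {

    // Assumes Common Log Format, e.g.
    // 10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /a HTTP/1.0" 200 2326
    // Group 1 is the client address, group 2 is the date plus hour ("10/Oct/2000:13").
    private static final Pattern LINE =
            Pattern.compile("^(\\S+) \\S+ \\S+ \\[([^:]+:\\d{2})");

    static Map<String, Integer> count(List<String> lines) {
        // "Map" step: emit (hour, address). Grouping by hour mimics the shuffle.
        Map<String, Set<String>> addressesByHour = new TreeMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (!m.find()) {
                continue; // skip lines that do not look like log entries
            }
            addressesByHour
                    .computeIfAbsent(m.group(2), hour -> new HashSet<>())
                    .add(m.group(1));
        }

        // "Reduce" step: the number of distinct addresses per hour.
        Map<String, Integer> uniqueVisitors = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : addressesByHour.entrySet()) {
            uniqueVisitors.put(e.getKey(), e.getValue().size());
        }
        return uniqueVisitors;
    }
}
```

Holding a set of addresses per hour in memory is fine for a sketch, but on a truly huge log a real job would more likely emit (hour, address) pairs, let the framework de-duplicate them, and count in a second pass or in post-processing.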

These are the questions I can recall. I will add more if I think of them. I remember being asked a couple of Java questions and drawing a blank. They were very simple questions, but I was caught off guard. If I had prepared properly for this interview, I would likely have done much better.

Unfortunately I did not get this job. I am happily employed now but working with Hadoop is one of my passions and I would take a Hadoop job in a heartbeat!

Feel free to add questions in the comments section.


Category: Technology

About the Author: Tinashe blogs mostly about technology. His interests include cloud computing, Hadoop, data mining and pretty much any exciting new technology that can change the world.

Comments (6)


  1. Andre says:

    So how would you answer question number 6?
    What is the most elegant way to count the unique words in a particular document when several documents in a directory are processed?

    regards
    andre

    • Tinashe says:

      Without thinking about efficiency for now, question 6 is almost solved by the solution to question 5 (my bad for the typo, which initially said 4).

      The output to question 5 would be something like:

      (the,2)
      (lazy,4)
      (dog,7)
      (jumped,6)
      (over,5)

      So basically we already have the list of unique words. To get the number of unique words, we need a count of the keys, i.e. the words.
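
To make that concrete, here is a minimal sketch in plain Java rather than a real Hadoop job. It assumes the word-count output fits in one in-memory map, and the class and method names (`UniqueWordCountSketch`, `wordCount`, `uniqueWords`) are made up for illustration. The first method plays the role of the word-count job from question 5, and the second simply counts its keys.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java stand-in for the word-count job; no Hadoop classes are used.
class UniqueWordCountSketch {

    // Question 5: emit (word, 1) for every token and sum the ones per word.
    static Map<String, Integer> wordCount(List<String> documents) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String document : documents) {
            // Assumption: words are lower-cased and split on non-word characters.
            for (String token : document.toLowerCase().split("\\W+")) {
                if (token.isEmpty()) {
                    continue; // split can yield an empty leading token
                }
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Question 6: every key in the output is one unique word, so count the keys.
    static int uniqueWords(Map<String, Integer> counts) {
        return counts.size();
    }
}
```

On a real cluster the key count would probably come from a second small job or a job counter rather than from `size()` on a map, but the idea is the same.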

      As for how to do this for each document, I can't think of the most elegant way. But off the top of my head, instead of just keeping track of individual words, we also need to keep track of the document each word came from.

      So, what I would do is make the document part of the key:

      (doc1,the,4)
      (doc1,dog,7)
      (doc2,the,5)
      (doc2,dog,3)
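
A rough sketch of that idea, again in plain Java instead of Hadoop: the composite key is built by joining the document name and the word with a tab, which assumes document names never contain a tab. The names `PerDocumentUniqueWords`, `countByDocAndWord` and `uniqueWordsPerDoc` are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java stand-in; no Hadoop classes are used.
class PerDocumentUniqueWords {

    // First pass: count occurrences under the composite key "doc<TAB>word".
    // Input maps a document name to its text.
    static Map<String, Integer> countByDocAndWord(Map<String, String> documents) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, String> e : documents.entrySet()) {
            for (String token : e.getValue().toLowerCase().split("\\W+")) {
                if (token.isEmpty()) {
                    continue; // split can yield an empty leading token
                }
                counts.merge(e.getKey() + "\t" + token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Second pass: each composite key is one unique (doc, word) pair,
    // so counting keys per document gives the unique words per document.
    static Map<String, Integer> uniqueWordsPerDoc(Map<String, Integer> counts) {
        Map<String, Integer> unique = new TreeMap<>();
        for (String key : counts.keySet()) {
            String doc = key.substring(0, key.indexOf('\t'));
            unique.merge(doc, 1, Integer::sum);
        }
        return unique;
    }
}
```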

      Hopefully this helps you a bit.

  2. I too have a dream to
    join the CIA, but let's see what happens.

  3. JDK 1.5 features are:
    1. Generics were added.
    2. Language support was improved.
    3. Annotations support.
    4. Static import.
    5. Autoboxing.
    And JDK 1.6 features:

    1. Java kernel update.
    2. Support for older Windows versions.

    I have blogged about Hadoop here.
    Learn Basics of HDFS in Hadoop
