Hadoop interview questions
When a company contacted me about the possibility of joining their Hadoop group, I naturally took to the web to figure out exactly what kinds of questions are asked in Hadoop-related interviews. Unfortunately, my point of contact was a recruiter who had limited knowledge of the technical aspects of the group. As it turned out, the job did not have a job description yet, which was very odd and made it very difficult to know what to brush up on, seeing as I had not coded in more than 6 months.
The recruiter was very nice. I was impressed that he genuinely wanted to help me out, which surprised me: I had been looking for a job for more than 3 months, and no one else had treated me with as much respect. He mentioned that it would be nice to know JavaScript and PHP. I always try to be honest up front, so I admitted that I had never written JavaScript or PHP for Hadoop. He said that was okay, since I would be coming in at entry level and would get orientation.
Then the big day came. Someone else called me for the interview: one of the Hadoop engineers at the company, and a nice guy. To my surprise, they were really looking for a Hadoop developer with Java experience. I had worked with Java/Hadoop/MapReduce for my thesis and was somewhat comfortable with it, but I was rusty, as I had not written a single line of code in 6 months. Talk about being caught with your pants down.
These are the questions I was asked. I am recalling them from memory, so they are close approximations. He started off with easier questions:
(1) What is the main difference between Java and C++?
(2) What do you understand about Object Oriented Programming (OOP)? Use Java examples.
(3) What are the main differences between versions 1.5 and 1.6 of Java?
(4) Describe what happens to a MapReduce job from submission to output.
(5) How would you tackle counting words in several text documents?
(6) As a follow-up to 5, how would you modify that solution to count only the number of unique words across all the documents?
(7) How would you tackle calculating the number of unique visitors for each hour by mining a huge Apache log? You can use post-processing on the output of the MapReduce job.
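Question 7 can be answered along the lines of its own hint: a MapReduce job that emits each distinct (hour, visitor) pair exactly once, followed by a post-processing step that counts the pairs under each hour. The sketch below simulates both steps in plain Java on a single machine rather than using the Hadoop API. It assumes the log is in Apache Common Log Format and that a "visitor" is identified by client IP address; the class and method names are hypothetical, and it needs Java 9 or later.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UniqueVisitorsPerHour {
    // Assumes Apache Common Log Format, e.g.
    // 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /a.gif HTTP/1.0" 200 2326
    // Group 1 = client IP (assumed visitor id), group 2 = date plus hour ("10/Oct/2000:13").
    private static final Pattern LINE =
            Pattern.compile("^(\\S+) \\S+ \\S+ \\[([^:]+:\\d{2}):");

    // Map phase: turn one log line into the composite key "hour<TAB>ip" (null if it doesn't parse).
    static String map(String line) {
        Matcher m = LINE.matcher(line);
        return m.find() ? m.group(2) + "\t" + m.group(1) : null;
    }

    // Shuffle + reduce phase: identical keys are grouped, and the reducer writes each
    // distinct (hour, ip) key exactly once. A sorted set models that here.
    static Set<String> distinctHourIpPairs(List<String> logLines) {
        Set<String> keys = new TreeSet<>();
        for (String line : logLines) {
            String key = map(line);
            if (key != null) {
                keys.add(key);
            }
        }
        return keys;
    }

    // Post-processing on the job output: count the distinct IPs recorded under each hour.
    // (Hours sort as strings here, not chronologically.)
    static Map<String, Integer> uniqueVisitorsPerHour(List<String> logLines) {
        Map<String, Integer> perHour = new TreeMap<>();
        for (String key : distinctHourIpPairs(logLines)) {
            String hour = key.substring(0, key.indexOf('\t'));
            perHour.merge(hour, 1, Integer::sum);
        }
        return perHour;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
                "10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] \"GET /a HTTP/1.0\" 200 100",
                "10.0.0.1 - - [10/Oct/2000:13:58:01 -0700] \"GET /b HTTP/1.0\" 200 100",
                "10.0.0.2 - - [10/Oct/2000:13:59:59 -0700] \"GET /a HTTP/1.0\" 200 100",
                "10.0.0.1 - - [10/Oct/2000:14:00:05 -0700] \"GET /c HTTP/1.0\" 200 100");
        // prints {10/Oct/2000:13=2, 10/Oct/2000:14=1}
        System.out.println(uniqueVisitorsPerHour(log));
    }
}
```

Deduplicating in the job and counting afterwards keeps the reducer trivial; the post-processing step only has to scan a list of distinct pairs, which is far smaller than the raw log.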
These are the questions I can recall. I will add more if I think of them. I remember being asked a couple of other Java questions and drawing a blank. They were very simple questions, but I was caught off guard. If I had not prepared for this interview at all, I would likely have done much better.
Unfortunately, I did not get this job. I am happily employed now, but working with Hadoop is one of my passions, and I would take a Hadoop job in a heartbeat!
Feel free to add questions in the comments section.



So how would you answer question number 6?
What is the most elegant way to count the unique words in each document when several documents in a directory are processed?
Regards,
andre
Setting efficiency aside for now, question 6 is almost solved by the solution to question 5 (my bad for the typo, which initially said 4).
The output of question 5 would be something like:
(the,2)
(lazy,4)
(dog,7)
(jumped,6)
(over,5)
So basically we already have the list of unique words. To get the number of unique words, we just need to count the keys, i.e. the words.
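Questions 5 and 6 can be sketched without a cluster by simulating the map, shuffle and reduce phases in plain Java. This is a minimal single-machine sketch, not Hadoop API code; the class and method names are hypothetical, and it assumes words are lowercased and split on non-word characters (Java 9 or later).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCount {
    // Map phase: split a document into lowercase words and emit (word, 1) for each one.
    static List<Map.Entry<String, Integer>> map(String document) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : document.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) { // split can yield a leading empty string
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle + reduce phase: group the pairs by word and sum the 1s.
    static Map<String, Integer> wordCount(List<String> documents) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String doc : documents) {
            for (Map.Entry<String, Integer> pair : map(doc)) {
                counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
            }
        }
        return counts;
    }

    // Question 6: the number of unique words is simply the number of keys in the output.
    static int uniqueWordCount(List<String> documents) {
        return wordCount(documents).size();
    }

    public static void main(String[] args) {
        List<String> docs = List.of("The lazy dog", "the dog jumped over the dog");
        System.out.println(wordCount(docs));       // {dog=3, jumped=1, lazy=1, over=1, the=3}
        System.out.println(uniqueWordCount(docs)); // 5
    }
}
```

On a real cluster, each reduced key is written once, so the equivalent of `uniqueWordCount` can be a post-processing step such as counting the lines of the job output.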
As for doing this per document, I can’t think of the most elegant way. But off the top of my head, instead of keeping track of individual words only, we also need to keep track of the document each word came from.
So what I would do is make the document part of the key, i.e. key on the (document, word) pair:
(doc1,the,4)
(doc1,dog,7)
(doc2,the,5)
(doc2,dog,3)
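The (document, word) keying above can be sketched in plain Java as well: the job counts occurrences per composite key, and a post-processing step counts how many keys belong to each document. This is a single-machine simulation, not Hadoop code; the class and method names are hypothetical, and it assumes document names never contain a tab, since the composite key is a tab-joined string.

```java
import java.util.Map;
import java.util.TreeMap;

public class UniqueWordsPerDocument {
    // Map + reduce: key on the composite "document<TAB>word" and count occurrences,
    // producing records like (doc1, the, 4).
    static Map<String, Integer> countByDocAndWord(Map<String, String> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            for (String word : doc.getValue().toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    counts.merge(doc.getKey() + "\t" + word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Post-processing: the number of unique words in a document is the number of keys for it.
    static Map<String, Integer> uniqueWordsPerDocument(Map<String, String> docs) {
        Map<String, Integer> perDoc = new TreeMap<>();
        for (String key : countByDocAndWord(docs).keySet()) {
            String docName = key.substring(0, key.indexOf('\t'));
            perDoc.merge(docName, 1, Integer::sum);
        }
        return perDoc;
    }

    public static void main(String[] args) {
        Map<String, String> docs = Map.of(
                "doc1", "the dog saw the other dog",
                "doc2", "the dog");
        System.out.println(uniqueWordsPerDocument(docs)); // {doc1=4, doc2=2}
    }
}
```

A tab-joined string is the simplest composite key for a sketch; a dedicated key type holding the document and the word separately would be the more robust choice.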
Hopefully this helps a bit.
Hi
I have compiled a list of Hadoop questions:
http://hadoop-interview-questions.blogspot.com/
Aman
Please find 60 Hadoop interview questions and answers at the following link:
http://www.pappupass.com/class/index.php/technical-topic/hadoop-interview-question
JDK 1.5 features:
1. Generics
2. Language improvements such as the enhanced for loop, varargs and enums
3. Annotations
4. Static import
5. Autoboxing
JDK 1.6 features:
1. Java Kernel (introduced in Java 6 Update 10)
2. Support for older Windows versions
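The JDK 1.5 language features listed above can all be seen in one small class. This is an illustrative sketch (the class, enum and method names are hypothetical) that uses only language features available since JDK 1.5.

```java
import static java.lang.Math.max; // static import (Java 5)

import java.util.ArrayList;
import java.util.List;

public class Java5Features {
    // Enum type (Java 5)
    enum Level { LOW, HIGH }

    // Varargs (Java 5): accepts one or more int arguments.
    static int largest(int first, int... rest) {
        int result = first;
        for (int n : rest) {            // enhanced for loop (Java 5)
            result = max(result, n);    // max comes from the static import
        }
        return result;
    }

    // Generics (Java 5): the list is typed, so no casts are needed when reading from it.
    static int sum(List<Integer> numbers) {
        int total = 0;
        for (int n : numbers) {         // auto-unboxing Integer -> int
            total += n;
        }
        return total;
    }

    @Override                           // annotation (Java 5)
    public String toString() {
        return "Java5Features";
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<Integer>();
        numbers.add(3);                 // autoboxing int -> Integer
        numbers.add(4);
        System.out.println(sum(numbers));     // prints 7
        System.out.println(largest(2, 9, 5)); // prints 9
        System.out.println(Level.HIGH);       // prints HIGH
    }
}
```

One concrete 1.5 vs 1.6 difference that shows up in code: javac 1.6 accepts `@Override` on a method that implements an interface method, while javac 1.5 rejects it. Here `@Override` is on `toString`, which overrides a superclass method, so it is valid in both.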
I have blogged about Hadoop here:
Learn Basics of HDFS in Hadoop