For the benefit of the community, please feel free to add or request anything that hasn't been covered. Please remember that this is a beginner's guide, not expert-level documentation.
- /Flume: contains notes and examples of Apache Flume
- /Hive: contains notes and examples of Apache Hive
- /MySQL: code samples containing pieces to create a database, create a table, and load data in MySQL
- /Sqoop: contains notes and examples of import/export using Sqoop
- /spark: contains notes, documentation, and sample examples of Spark APIs
- /exam: sample CCA-175 exam questions and solutions (in the solution branch)
- /problem1: complex data structure handling using Hive (exposure to Hive, CREATE TABLE, LOAD, named_struct, struct)
- /problem2: stock data analysis (exposure to JSON file handling, SparkSQL, map, reduce, filter, join, groupByKey, keyBy, UDFs, etc.)
- /problem3: MovieLens database analysis
- /problem4: Lahman's baseball database analysis
- /problem5: Hortonworks certification sample (10 tasks in total)
- /Tweeter: Twitter data analysis
- /problem6: retail database sample exercises