This is our first project in big data. In this project we maintain student details for an institute, such as whether a student has taken a demo of a course and, of those who took the demo, how many prefer to join the course. We also worked on several other use cases. The data flow of the project is as follows. First we downloaded the data from data hub. After downloading, we loaded the data from the local file system (LFS) to HDFS, then specified the HDFS path in the Hive database to load the data, and then worked on some use cases.
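The load step and one of the use cases described above can be sketched in HiveQL as follows. This is a minimal sketch, not the project's actual script: the database name `institute`, the table `students_text`, its columns, the comma delimiter, and the HDFS path are all assumptions made for illustration, since the real schema is not given here. It also assumes the file has already been copied into HDFS, for example with `hdfs dfs -put`.

```sql
-- Hypothetical database, table, columns, and path (not taken from the project).
CREATE DATABASE IF NOT EXISTS institute;
USE institute;

-- Plain text table matching an assumed comma-separated input file.
CREATE TABLE IF NOT EXISTS students_text (
  student_id INT,
  name       STRING,
  course     STRING,
  demo_taken BOOLEAN,
  joined     BOOLEAN
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Move a file that is already in HDFS into the table's storage location.
LOAD DATA INPATH '/user/hive/input/students.csv' INTO TABLE students_text;

-- Example use case: per course, how many students took the demo
-- and how many of those went on to join.
SELECT course,
       COUNT(*) AS demo_count,
       SUM(CASE WHEN joined THEN 1 ELSE 0 END) AS joined_count
FROM students_text
WHERE demo_taken
GROUP BY course;
```

Note that `LOAD DATA INPATH` moves the source file rather than copying it. If the file should stay where it is, an external table with a `LOCATION` clause pointing at the HDFS directory would be an alternative.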
Hive, Sqoop, HDFS, MySQL
Collected the data and selected the format. Loaded the data from LFS into a text table in Hive. Data flow: MySQL -> LFS -> HDFS -> Hive. Copied the data from the text table into an ORC table. Enabled ACID properties using ORC. Performed CRUD operations on the database using Hive in CDH5. Used partitioning and bucketing for performance tuning. Implemented the use cases on the data. Implemented queries on the database using Hive. Deployed the project details on GitHub.
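The ORC, ACID, partitioning, and bucketing steps listed above might look roughly like the HiveQL below. It is a sketch under stated assumptions, not the project's real code: the tables `students_text` and `students_orc`, their columns, the bucket count, and the literal IDs are hypothetical. It assumes a text table named `students_text` with these columns already exists, and that the metastore has the transaction compactor configured. The `SET` values are the session properties that Hive 1.x generally needs for transactional tables; the right values may differ by cluster.

```sql
-- Session settings commonly required for ACID tables in Hive 1.x.
SET hive.support.concurrency=true;
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- Hypothetical ORC table: partitioned by course, bucketed by student_id.
-- In this Hive generation, transactional tables must be bucketed and stored as ORC.
CREATE TABLE IF NOT EXISTS students_orc (
  student_id INT,
  name       STRING,
  demo_taken BOOLEAN,
  joined     BOOLEAN
)
PARTITIONED BY (course STRING)
CLUSTERED BY (student_id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- Copy rows from the text table; the partition column goes last in the SELECT.
INSERT INTO TABLE students_orc PARTITION (course)
SELECT student_id, name, demo_taken, joined, course
FROM students_text;

-- CRUD examples (the IDs are made up for illustration).
UPDATE students_orc SET joined = true WHERE student_id = 101;
DELETE FROM students_orc WHERE student_id = 102;
SELECT * FROM students_orc WHERE course = 'hadoop';
```

Partitioning by `course` lets queries that filter on a course read only that partition's directory, while bucketing by `student_id` spreads rows evenly across a fixed number of files. Partition and bucketing columns cannot be changed by `UPDATE`, which is why the example updates `joined` instead.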
This project uses the following license: MIT License