Ansible playbook to deploy Cloudera Hadoop components to the cluster
Updated Sep 8, 2018 - Shell
Docker image for Cloudera Hadoop components (CDH5)
A quick and dirty CDH cluster skeleton using Docker for Testing
Getting Started with Hadoop and Big Data
💂‍♂️ Hadoop/MapReduce Streaming
Spark Benchmark suite to evaluate cluster configuration and compare the performance with other big data frameworks.
Otto-von-Guericke Universität Magdeburg - Big Data SoSe 2017
This is my final project for the Data Engineer Expert course at Naya College.
This repository contains the TF-IDF score calculation for the documents in the Canterbury dataset for a user-given search query.
The goal of this programming assignment is to compute the PageRanks of an input set of hyperlinked Wikipedia documents using Hadoop MapReduce. The PageRank score of a web page serves as an indicator of the importance of the page. Many web search engines (e.g., Google) use PageRank scores in some form to rank user-submitted queries. The goals of …
This project creates a small local Hadoop cluster using Cloudera CDH and CentOS.
Data processing using Docker containers, Kafka, Spark, and Hadoop
This project analyzes 10 years of U.S. domestic airline data (~3GB) using Hadoop (Cloudera) and Hive for data processing. Power BI dashboards visualize key metrics like delays, on-time rates, air time, and diversions. The solution includes Hive queries, DAX measures, HDFS ingestion scripts, and year-wise insights with recommendations.
Chatbot for HipChat (cloud or on-premise) that enables you to talk to your Cloudera Manager
Keyword network builder based on TF-IDF, using the Hadoop platform
A qualification project for teaching as an assistant at SLC in the COMP6579001 Big Data Processing course.