ISBN : 978-81-945989-4-7
Category : Academic
Catalogue : Computer
ID : SB19983
Paperback : 850.00
eBook : 250.00
Pages : 350
Language : English
Big data analytics has emerged as a revolution in the field of information technology. An organization's ability to stay agile is what gives it a competitive edge over its competitors. Data harvesting and data analytics enable an organization to identify new opportunities, which in turn lead to more efficient operations, smarter business moves and higher business turnover. All these issues are addressed by big data analytics and its initiatives.

Chapter 7 deals with the Flume model and the configuration of an Apache Flume agent. The topics in Flume that are more difficult to comprehend and assimilate, such as interceptors, channel selectors, event serializers and sink processors, are explored in greater detail with suitable working examples. The salient feature of Chapter 7 is its case studies on fetching Twitter data and storing it in HDFS, and on fetching data from the sequence generator and netcat sources. The chapter concludes with the installation of Flume and hands-on lab sessions with Apache Flume.

In the big data era, Apache Spark emerged to address the high latency associated with the MapReduce model, which it does using its own specialized fundamental data structure, the Resilient Distributed Dataset (RDD). The RDD is key to the high performance of Apache Spark. The in-memory computation supported by Spark makes it a framework of choice for implementing machine learning algorithms, graph algorithms and similar workloads, which would otherwise incur high latency under the MapReduce processing paradigm. Chapter 8 focuses on the different components of Apache Spark and on Spark deployment architectures. Programming with RDDs is covered with special emphasis on Spark functional programming, RDD transformations and actions, chaining RDD transformations and Spark lazy evaluation. A step-by-step procedure for installing Spark over Hadoop is described. The main highlight of the chapter is the setting up of a virtual multi-node cluster using VMware Workstation 15.0 and the configuration of master and slave nod