HDFS (38%)
- Describe the function of all Hadoop Daemons
- Describe the normal operation of an Apache Hadoop cluster, both in data storage and in data processing.
- Identify current features of computing systems that motivate a system like Apache Hadoop.
- Classify major goals of HDFS Design
- Given a scenario, identify an appropriate use case for HDFS Federation
- Identify the components and daemons of an HDFS HA-Quorum cluster
- Analyze the role of HDFS security (Kerberos)
- Determine the best data serialization choice for a given scenario
- Describe file read and write paths
- Identify the commands to manipulate files in the Hadoop File System Shell (see the sketch after this list).
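
A minimal Java sketch of the read/write paths and common File System Shell operations, using the org.apache.hadoop.fs.FileSystem API; the paths below are placeholders, and the roughly equivalent `hadoop fs` commands are noted in comments.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsShellSketch {
    public static void main(String[] args) throws Exception {
        // Loads core-site.xml / hdfs-site.xml from the classpath; fs.defaultFS
        // decides which NameNode the client contacts on every read and write.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path dir  = new Path("/user/example/demo");   // placeholder path
        Path file = new Path(dir, "greeting.txt");

        // hadoop fs -mkdir -p /user/example/demo
        fs.mkdirs(dir);

        // Write path: the client asks the NameNode for target DataNodes, then
        // streams blocks through a DataNode pipeline. Roughly: hadoop fs -put
        FSDataOutputStream out = fs.create(file, true);
        out.writeBytes("hello, hdfs\n");
        out.close();

        // hadoop fs -ls /user/example/demo
        for (FileStatus status : fs.listStatus(dir)) {
            System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
        }

        // Read path: the client fetches block locations from the NameNode and
        // reads directly from DataNodes. Roughly: hadoop fs -cat
        BufferedReader reader =
            new BufferedReader(new InputStreamReader(fs.open(file)));
        System.out.println(reader.readLine());
        reader.close();

        // hadoop fs -rm -r /user/example/demo
        fs.delete(dir, true);
        fs.close();
    }
}
```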

MapReduce (10%)
- Understand how to deploy MapReduce v1 (MRv1)
- Understand how to deploy MapReduce v2 (MRv2 / YARN); a deployment sketch follows this list
- Understand the basic design strategy for MapReduce v2 (MRv2)
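
One practical difference between the two deployments shows up in client configuration: under MRv2 the same job API submits to YARN's ResourceManager, while under MRv1 the client talks to a JobTracker. A minimal identity-job driver is sketched below; the property value and job name are illustrative, and on a real cluster they would normally come from mapred-site.xml rather than being set in code.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FrameworkSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // MRv2: the job is submitted to YARN (ResourceManager + NodeManagers).
        // An MRv1 deployment instead points clients at a JobTracker via
        // mapred.job.tracker; setting this to "local" runs the job in one JVM.
        conf.set("mapreduce.framework.name", "yarn");

        Job job = Job.getInstance(conf, "identity-sketch");
        job.setJarByClass(FrameworkSketch.class);
        job.setMapperClass(Mapper.class);     // identity mapper
        job.setReducerClass(Reducer.class);   // identity reducer
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```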

Hadoop Cluster Planning (12%)
- Identify the principal points to consider in choosing the hardware and operating systems to host an Apache Hadoop cluster.
- Analyze the choices in selecting an OS
- Understand kernel tuning and disk swapping
- Given a scenario and workload pattern, identify a hardware configuration appropriate to the scenario
- Cluster sizing: given a scenario and frequency of execution, identify the specifics for the workload, including CPU, memory, storage, and disk I/O (a worked sizing sketch follows this list)
- Disk Sizing and Configuration, including JBOD versus RAID, SANs, virtualization, and disk sizing requirements in a cluster
- Network Topologies: understand network usage in Hadoop (for both HDFS and MapReduce) and propose or identify key network design components for a given scenario
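
A worked sizing sketch for the cluster-sizing bullet above. All of the inputs (data volume, growth rate, replication factor 3, roughly 25% of raw disk reserved for intermediate MapReduce output, 12 x 4 TB JBOD per worker) are illustrative assumptions, not recommendations.

```java
public class SizingSketch {
    public static void main(String[] args) {
        double dataTb           = 300;    // data to store today, TB (assumption)
        double monthlyGrowthTb  = 20;     // ingest per month, TB (assumption)
        int    months           = 12;     // planning horizon
        int    replication      = 3;      // dfs.replication default
        double tempFraction     = 0.25;   // reserve ~25% of raw disk for shuffle/temp
        double rawDiskPerNodeTb = 12 * 4; // e.g. 12 x 4 TB JBOD per worker (assumption)

        // Raw HDFS need = (initial data + growth) x replication factor.
        double totalDataTb = dataTb + monthlyGrowthTb * months;
        double rawHdfsTb   = totalDataTb * replication;

        // Only part of each node's raw disk is usable for HDFS blocks; the rest
        // is kept for MapReduce intermediate output, logs, and the OS.
        double usablePerNodeTb = rawDiskPerNodeTb * (1.0 - tempFraction);
        int nodes = (int) Math.ceil(rawHdfsTb / usablePerNodeTb);

        System.out.printf("Total data after %d months: %.0f TB%n", months, totalDataTb);
        System.out.printf("Raw HDFS capacity needed:   %.0f TB%n", rawHdfsTb);
        System.out.printf("Worker nodes required:      %d%n", nodes);
    }
}
```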

Hadoop Cluster Installation and Administration (17%)
- Given a scenario, identify how the cluster will handle disk and machine failures.
- Analyze a logging configuration and logging configuration file format.
- Understand the basics of Hadoop metrics and cluster health monitoring.
- Identify the function and purpose of available tools for cluster monitoring.
- Identify the function and purpose of available tools for managing the Apache Hadoop file system (see the example after this list).
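
As a small illustration of programmatic file system management and health monitoring, the sketch below reads aggregate capacity figures through the FileSystem API, similar to the summary shown by the NameNode web UI or `hdfs dfsadmin -report`; the 80% warning threshold and the output format are arbitrary assumptions.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;

public class CapacityCheck {
    public static void main(String[] args) throws Exception {
        // Uses fs.defaultFS from core-site.xml on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Aggregate capacity across live DataNodes, as also reported by
        // the NameNode web UI and "hdfs dfsadmin -report".
        FsStatus status = fs.getStatus();
        double tb = 1024.0 * 1024 * 1024 * 1024;

        double capacity  = status.getCapacity()  / tb;
        double used      = status.getUsed()      / tb;
        double remaining = status.getRemaining() / tb;
        double usedPct   = 100.0 * status.getUsed() / status.getCapacity();

        System.out.printf("Capacity : %8.2f TB%n", capacity);
        System.out.printf("Used     : %8.2f TB (%.1f%%)%n", used, usedPct);
        System.out.printf("Remaining: %8.2f TB%n", remaining);

        // Arbitrary warning threshold for a monitoring hook (assumption).
        if (usedPct > 80.0) {
            System.err.println("WARNING: cluster is more than 80% full");
        }
        fs.close();
    }
}
```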

Resource Management (6%)
- Understand the overall design goals of each of Hadoop’s schedulers.
- Given a scenario, determine how the FIFO Scheduler allocates cluster resources.
- Given a scenario, determine how the Fair Scheduler allocates cluster resources.
- Given a scenario, determine how the Capacity Scheduler allocates cluster resources (an example follows this list).
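
Scheduler behaviour is largely server-side configuration (fair-scheduler.xml, capacity-scheduler.xml), but the client side can be sketched: the snippet below tags a job with a queue name, which is how the Fair and Capacity Schedulers decide where its resource requests land. The queue name "analytics" is an assumption, and the driver is deliberately incomplete (it only shows queue placement, not a full job).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class QueueSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // MRv2 / YARN: route the job's resource requests to a specific queue.
        // The Capacity Scheduler gives each queue a configured share of the
        // cluster; the Fair Scheduler balances running applications within and
        // across queues (historically called "pools"); the FIFO Scheduler
        // simply serves jobs in submission order.
        conf.set("mapreduce.job.queuename", "analytics");  // queue name is an assumption

        // Equivalent MRv1 property: mapred.job.queue.name

        Job job = Job.getInstance(conf, "queue-sketch");
        // ... set mapper/reducer classes and input/output paths as usual ...
        System.out.println("Submitting to queue: "
                + job.getConfiguration().get("mapreduce.job.queuename"));
    }
}
```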

Monitoring and Logging (12%)
- Understand the functions and features of Hadoop’s metric collection abilities
- Analyze the NameNode and JobTracker Web UIs
- Interpret a log4j configuration (see the sketch after this list)
- Understand how to monitor the Hadoop Daemons
- Identify and monitor CPU usage on master nodes
- Describe how to monitor swap and memory allocation on all nodes
- Identify how to view and manage Hadoop’s log files
- Interpret a log file
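
For the log4j-related bullets, the sketch below uses the log4j 1.x API that Hadoop daemons are configured with through conf/log4j.properties. The logger names are examples; on a running daemon the same change is usually made with `hadoop daemonlog -setlevel <host:port> <logger> <level>` or the daemon's /logLevel servlet rather than in code.

```java
import org.apache.log4j.Level;
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;

public class LogLevelSketch {
    public static void main(String[] args) {
        // conf/log4j.properties sets the defaults, e.g.:
        //   hadoop.root.logger=INFO,console
        //   log4j.rootLogger=${hadoop.root.logger}, EventCounter
        // The calls below override those levels for the current JVM only.

        // Root logger: everything not matched by a more specific logger.
        Logger root = LogManager.getRootLogger();
        System.out.println("root level: " + root.getLevel());

        // Raise verbosity for one subsystem (logger name is an example).
        Logger hdfs = Logger.getLogger("org.apache.hadoop.hdfs");
        hdfs.setLevel(Level.DEBUG);

        // Quieten a chatty subsystem (logger name is an example).
        Logger rpc = Logger.getLogger("org.apache.hadoop.ipc");
        rpc.setLevel(Level.WARN);

        System.out.println("hdfs level: " + hdfs.getLevel());
        System.out.println("ipc  level: " + rpc.getLevel());
    }
}
```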

The Hadoop Ecosystem (5%)
- Understand Ecosystem projects and what you need to do to deploy them on a cluster.