Cloudera Administrator Training for Apache Hadoop
Course Outline:
1. Hadoop and HDFS
Why Hadoop?
HDFS
MapReduce
Hive, Pig, HBase, and Other Ecosystem Projects
2. Planning Your Hadoop Cluster
General Planning Considerations
Choosing the Right Hardware
Node Topologies
Choosing the Right Software
3. Deploying Your Cluster
Installing Hadoop
Using SCM Express for Easy Installation
Typical Configuration Parameters
Configuring Rack Awareness
Using Configuration Management Tools
4. Managing and Scheduling Jobs
Starting and Stopping MapReduce Jobs
FIFO Scheduler
Fair Scheduler
5. Cluster Maintenance
Checking HDFS with fsck
Copying Data with distcp
Rebalancing Cluster Nodes
Adding and Removing Cluster Nodes
Backup and Restore
Upgrading and Migrating
NameNode Metadata
6. Cluster Monitoring, Troubleshooting, and Optimizing
Hadoop Log Files
Using the NameNode and JobTracker Web UIs
Interpreting Job Logs
Monitoring with Ganglia
Other Monitoring Tools
General Optimization Tips
Benchmarking Your Cluster
7. Populating HDFS from External Sources
Using Sqoop
Using Flume
Best Practices for Data Ingestion
8. Installing and Managing Other Hadoop Projects
Hive
Pig
HBase
Metastore