Expertise in setting up fully distributed multi-node Hadoop clusters with Apache and Cloudera distributions.
Administration of Hadoop clusters, including handling performance-management requests based on the available sample datasets, and capacity planning of clusters from the available data.
Strong Linux administration skills, including tuning nodes according to the behavior of the application jobs run by end users.
Good knowledge of installing, configuring, and using Hadoop ecosystem components such as MapReduce, Oozie, Hive, Sqoop, Pig, Flume, ZooKeeper, and Kafka, as well as NameNode recovery and HDFS High Availability, using Cloudera Manager and Ambari.
Extensive experience in administration, configuration management, monitoring, and debugging of Hadoop clusters.
Good knowledge of importing and exporting structured and unstructured data from various sources, such as RDBMS, event logs, and message queues, into HDFS using tools such as Sqoop and Flume.
Hands-on experience in resolving complex technical issues such as node recovery and maintenance of Hadoop configuration files across cluster nodes.
Familiar with high availability, backup and recovery (BAR), and disaster recovery (DR) strategies and principles for Cloudera BDR clusters.
Experience in setting up and configuring multi-node Hadoop clusters on various Linux platforms.
Able to integrate different Hadoop distributions such as CDH, Hortonworks, and Apache Hadoop.
Experience with CDH components such as HDFS, Sqoop, Sqoop2, Pig, Hive, ZooKeeper, HBase, Oozie, Impala, and Hue.
Experience with MapReduce (MRv1), YARN (MRv2), and Spark.
Experience with cluster maintenance tasks such as adding, removing, and rebalancing nodes, using Cloudera Manager and Apache Hadoop's own tools.
Configuring High Availability (HA) using Cloudera Manager, including High Availability for other CDH components.