
Sr Hadoop Admin / Architect Resume


Charlotte, NC

SUMMARY

  • Overall 9 years of IT experience, including 6.5 years of experience administering the Hadoop ecosystem.
  • Expertise in big data technologies such as Cloudera Manager, Cloudera Director, Pig, Hive, HBase, Phoenix, Oozie, ZooKeeper, Sqoop, Storm, Flume, Impala, Tez, Kafka, and Spark, with hands-on experience writing MapReduce/YARN and Spark/Scala jobs.
  • Hands-on experience migrating data from on-premises systems to Azure cloud storage accounts: ADLS Gen1/Gen2, Blob Storage, Site Recovery, Geo-Replication, etc.
  • Expertise in end-to-end setup including network groups, subnets, firewall, DNS, platform setup, security setup, monitoring setup, etc.
  • Hands-on experience installing, upgrading, configuring, supporting, and managing enterprise versions of Cloudera Manager and CDH services.
  • Hands-on experience with Hadoop cluster design, implementation, configuration, administration, debugging, and performance tuning.
  • Good hands-on experience with the distributed processing frameworks of Hadoop 1 (JobTracker, TaskTracker) and Hadoop 2 (ResourceManager, NodeManager, ApplicationMaster, YARN Child).
  • More than 2 years of experience with Tableau BI reporting tools and Tableau dashboard development.
  • Hands-on experience configuring automation systems for hands-off management of clusters via Puppet.
  • Good experience writing Bash and Python scripts to automate day-to-day tasks and rotation of multiple web-server logs (a minimal log-rotation sketch appears after this list).
  • Experience monitoring Hadoop clusters using tools like Nagios and Cloudera Manager.
  • Expertise in Phoenix-Hive integration setup and HBase-Hive mapping with different storage types.
  • Expertise in cluster security and monitoring setup, including LDAP, Sentry, and Kerberos.
  • Deep knowledge of Cloudera Manager, including all types of service upgrades, external database integration, and recommissioning and decommissioning of nodes.
  • Deep knowledge of data migration, file system checks, disaster recovery planning, NameNode backup, troubleshooting inconsistencies, and data replication.
  • Expert knowledge in setting up HDFS High Availability using the Quorum Journal Manager and shared storage.
  • Experience designing both time-driven and data-driven automated workflows using Oozie.
  • Experience migrating data between Hadoop and relational database systems using Sqoop.
  • Extensive work experience in ETL processes consisting of data sourcing, data transformation, mapping, and loading of data from multiple source systems into a data warehouse.
  • Strong foundation in programming and debugging; developed modules that met client requirements and targets.
  • Expertise in Hadoop administration tasks such as managing clusters and reviewing Hadoop log files.
  • Extensive experience developing test cases and performing unit and integration testing, using source code management tools such as Git, SVN, and Perforce.
  • Good experience with MapReduce performance optimization techniques for effective utilization of cluster resources.
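
For illustration, a minimal Bash log-rotation sketch of the kind referenced above; the directory paths and retention window are hypothetical examples, not values from any specific engagement.

    #!/usr/bin/env bash
    # Compress yesterday's web-server logs into an archive directory and purge
    # archives older than a retention window (all paths and values are examples).
    LOG_DIR=/var/log/httpd
    ARCHIVE_DIR=/var/log/httpd/archive
    RETENTION_DAYS=30

    mkdir -p "$ARCHIVE_DIR"
    find "$LOG_DIR" -maxdepth 1 -name '*.log' -mtime +0 -print0 |
      while IFS= read -r -d '' f; do
        # archive a compressed copy, then truncate the live log in place
        gzip -c "$f" > "$ARCHIVE_DIR/$(basename "$f").$(date +%F).gz" && : > "$f"
      done
    find "$ARCHIVE_DIR" -name '*.gz' -mtime +"$RETENTION_DAYS" -delete

A script like this would typically run daily from cron on each web server.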

TECHNICAL SKILLS

Business tools: Tableau 8.x/9.x, MicroStrategy, Business Objects XI R2, Informatica PowerCenter, OLAP/OLTP, dimensional modeling, data modeling, Microsoft SQL Server, PHP, Pentaho, Talend.

Big data tools: Hadoop, Cloudera Manager, MapReduce 1.0/2.0, Pig, Hive, HBase, Sqoop, Oozie, ZooKeeper, Avro, Kafka, Spark, Flume, Storm, Impala, Scala, Mahout, Hue.

Databases: DB2, MySQL, MS Access, MS SQL Server, Teradata, Vertica, Aster nCluster, SSAS, Oracle, PrestoDB, Oracle Essbase.

Operating systems: Mac OS, UNIX, Linux (various versions, including RHEL 5.x/6.x and CentOS), Windows XP/Vista/2003/7/8/8.1/10.

Version control tools: Git, SVN, Perforce, Mercurial.

PROFESSIONAL EXPERIENCE

Confidential - Charlotte, NC

Sr Hadoop Admin / Architect

Responsibilities:

  • Worked on user access management, container creation, tenant creation, and troubleshooting platform-related issues.
  • Architected the complete data migration project from on-premises to Azure Cloud.
  • Installed and configured production and development clusters on Azure Cloud with the latest Cloudera version, CM/CDH 6.3.3.
  • Worked with developers to migrate code from older CDH, Spark, and Syncsort versions to the latest releases.
  • Worked on the overall security setup for the platform, including restricting network traffic in tier 2 and tier 3, proxy server setup, LDAP-Kerberos-AD integration, SSSD, SASL authentication, and data encryption at rest and in transit using TLS/SSL certificates (self-signed and CA-signed).
  • Configured and deployed all security updates on the platform, including user authentication and authorization for all CDH and CM services.
  • Managed and created roles in Sentry and ACLs for HDFS directories.
  • Worked on daily BDR schedules and snapshots of HDFS directories for disaster recovery purposes.
  • Worked on Cloudera Director setup to automate server provisioning and deploy the CDH cluster in one go.
  • Upgraded all on-premises environments from CDH 5.15 to CDH 6.3.2 and moved data from Azure storage accounts (ADLS Gen1/Gen2, Blob Storage, cold storage).
  • Installed and restored all external PostgreSQL databases, including the Navigator Audit and Metadata databases.
  • Created shell and Ansible scripts to automate platform activities such as user creation/deletion, database creation/deletion, cluster utilization reports, user access reports, SMTP alerts, etc.
  • Enabled HA for NameNode, ResourceManager, YARN configuration, and Hive Metastore Server.
  • Worked on Splunk, Flume, and Kafka integration projects to ingest syslogs from the Splunk heavy forwarders into HDFS.
  • Set up quotas and replication factors for user/group directories to keep disk usage and cluster resource consumption under control using HDFS quotas and dynamic resource pools (see the quota and snapshot sketch after this list).
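
A minimal sketch of the HDFS quota and snapshot commands involved; the directory names and limits are illustrative only.

    # cap the number of objects and the raw space a user directory may consume
    hdfs dfsadmin -setQuota 1000000 /user/jdoe
    hdfs dfsadmin -setSpaceQuota 10t /user/jdoe
    hdfs dfs -count -q -h /user/jdoe                 # verify quotas and current usage

    # enable snapshots on a data directory and take a dated snapshot for BDR/recovery
    hdfs dfsadmin -allowSnapshot /data/projects
    hdfs dfs -createSnapshot /data/projects daily-$(date +%F)
    hdfs dfs -ls /data/projects/.snapshot            # list available snapshots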

Confidential - Philadelphia, PA

Sr Hadoop Administrator

Responsibilities:

  • Installed and managed a Hadoop production cluster of 50+ nodes with a storage capacity of 10 PB, using Cloudera Manager and CDH services version 5.13.0.
  • Worked on setting up a data lake for Xfinity Mobile data, covering data ingestion, landing zone, staging zone, ETL frameworks, and analytics.
  • Worked on architecture design, data modeling, and implementation of the big data platform and analytic applications for the Confidential product Xfinity Mobile.
  • Managed all user tickets related to cluster issues and resolved them within SLA.
  • Worked with data delivery teams to set up new Hadoop users, which includes creating Linux users, setting up Kerberos principals, ACL implementation, service accounts, etc. (see the onboarding sketch after this list).
  • Expertise in installing, configuring, and managing Red Hat Linux 4, 5, and 6 and CentOS Linux 6.5.
  • Developed shell and Python scripts to automate jobs, metadata backups, log purges, etc.
  • Enabled HA for NameNode, ResourceManager, YARN configuration, and Hive Metastore Server.
  • Worked on Flume-Kafka and Kafka-Spark integration to store live events and logs in HDFS.
  • Set up automated processes to analyze system and Hadoop log files for predefined errors and send alerts to the appropriate groups.
  • Configured ZooKeeper to provide high availability and cluster service coordination.
  • Set up quotas and replication factors for user/group directories to keep disk usage and cluster resource consumption under control using HDFS quotas and dynamic resource pools.
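
A minimal sketch of the new-user onboarding steps described above; the user name, group, realm, and keytab path are hypothetical.

    # create the Linux account and a matching Kerberos principal in the KDC
    useradd -m -g hadoopusers jdoe
    kadmin -p admin/admin -q "addprinc -randkey jdoe@EXAMPLE.COM"
    kadmin -p admin/admin -q "xst -k /etc/security/keytabs/jdoe.keytab jdoe@EXAMPLE.COM"
    chown jdoe: /etc/security/keytabs/jdoe.keytab
    chmod 400 /etc/security/keytabs/jdoe.keytab

    # provision the user's HDFS home directory
    sudo -u hdfs hdfs dfs -mkdir -p /user/jdoe
    sudo -u hdfs hdfs dfs -chown jdoe:hadoopusers /user/jdoe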

Confidential - Menlo Park, CA

Sr Hadoop Administrator

Responsibilities:

  • Implemented a new cluster from scratch and performed live data migration from the old cluster to the newly built one without affecting any running production jobs.
  • Excellent understanding of Hadoop cluster security; implemented a secure Hadoop cluster using Kerberos, Sentry, and LDAP.
  • Hands-on experience installing, configuring, and using Hadoop ecosystem components such as Hive, Impala, ZooKeeper, Sqoop, Flume, Kafka, Hue, and Spark on the Cloudera Hadoop distribution.
  • In-depth understanding of Hadoop architecture and its components, such as HDFS, NameNode, ResourceManager, NodeManager, Container, YARN Child, and ApplicationMaster, as well as MapReduce concepts such as combiners and partitioners.
  • Expertise in installing, configuring, and managing Red Hat Linux 4, 5, and 6 and CentOS Linux 6.5.
  • Maintained 100+ node Hadoop clusters on CDH 5.8 using Cloudera Manager.
  • Set up Kerberos principals on the KDC server, tested HDFS, Hive, Pig, and MapReduce access for new users, and created keytabs for service IDs using keytab scripts.
  • Imported and exported data between RDBMS and Hive/HDFS using Sqoop (see the Sqoop sketch after this list).
  • Worked on file system management, monitoring, and capacity planning.
  • User account management, such as adding, removing, or updating user account information and resetting passwords.
  • Configured ZooKeeper to provide high availability and cluster service coordination.
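
A minimal Sqoop sketch of the import/export pattern above; the JDBC URL, credentials, and table names are illustrative.

    # pull a MySQL table into a Hive table
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username etl_user -P \
      --table orders \
      --hive-import --hive-table staging.orders \
      --num-mappers 4

    # push aggregated results from HDFS back to the RDBMS
    sqoop export \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username etl_user -P \
      --table order_summary \
      --export-dir /user/hive/warehouse/staging.db/order_summary \
      --input-fields-terminated-by '\001'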

Confidential - Denver, Colorado

Sr Hadoop Administrator

Responsibilities:

  • Involved in the end-to-end process of Hadoop cluster setup, including installation, configuration, and monitoring of the Hadoop cluster.
  • Created HBase data replication among the production environments.
  • Developed the monitoring setup for all Cloudera Manager/CDH services.
  • Worked on cluster security using LDAP, Sentry, and Kerberos.
  • Worked on capacity planning and performance tuning for the cluster.
  • Upgraded all Cloudera Manager and CDH services to the latest 5.x version for the production and development environments.
  • Imported and exported data between RDBMS and HDFS using Sqoop.
  • Worked on different POCs, such as breaking down the Apache Phoenix source code to achieve Hive-Phoenix integration, and Hive-HBase mapping with different storage types and formats, including Base64, MD5, binary, ASCII, UTF, etc.
  • Wrote Hive/Pig/Impala UDFs to pre-process the data for analysis.
  • Developed Oozie workflows for scheduling and orchestrating the ETL process.
  • Created mapping documents with business rules between the Hadoop source and reporting tools such as Tableau, Microsoft SQL Server, PHP, etc.
  • Set up dependencies between Hadoop jobs and ETL jobs.
  • Created and maintained user accounts, profiles, security, disk space, and process monitoring graphs.
  • Generated reports using the Tableau report designer, MS Excel, and Microsoft SQL Server.
  • Worked on minor and major upgrades and on commissioning and decommissioning of DataNodes on the Hadoop cluster (see the decommissioning sketch after this list).
  • Set up high availability for the major production cluster and designed automatic failover control using ZooKeeper and quorum journal nodes.
  • Troubleshot production-level issues in the cluster and its functionality.
  • Designed and implemented a MapReduce-based large-scale parallel relation-learning system; set up and benchmarked Hadoop/HBase clusters for internal use.
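
A minimal sketch of a graceful DataNode decommission; the host name and excludes-file path are examples and depend on the dfs.hosts.exclude setting.

    # add the node to the HDFS excludes file and tell the NameNode to re-read it
    echo "worker05.example.com" >> /etc/hadoop/conf/dfs.exclude
    hdfs dfsadmin -refreshNodes

    # watch the report until the node shows "Decommissioned", then stop its daemons
    hdfs dfsadmin -report | grep -A 3 "Decommission"

    # NodeManagers are excluded the same way via the YARN excludes file
    yarn rmadmin -refreshNodes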

Confidential - Chicago, Illinois

Sr. Hadoop Administrator

Responsibilities:

  • Responsible for cluster maintenance, troubleshooting, managing data backups, and reviewing log files in multiple clusters.
  • Installed and configured Spark ecosystem components (Spark SQL, Spark Streaming, MLlib, GraphX).
  • Performed Cloudera Hadoop installation and configuration of multiple nodes using Cloudera Manager and CDH 4.x/5.x.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Provided security for the Hadoop cluster using Active Directory/LDAP and TLS/SSL.
  • Built, tuned, and maintained HiveQL and Pig scripts for user reporting.
  • Developed Pig code for loading, filtering, and storing the data.
  • Worked on commissioning, decommissioning, and recovery of DataNodes, NameNodes, RegionServers, and Master servers.
  • Provided timely and reliable support for all production and development environments: deploy, upgrade, operate, and troubleshoot. Involved in loading data from UNIX to HDFS.
  • Implemented Kerberos to authenticate all services in the Hadoop cluster.
  • Involved in creating Hive tables, loading data, and writing Hive queries (see the Hive sketch after this list).
  • Understood the business requirements and technical requirements.
  • Designed and published workbooks and dashboards using Tableau Dashboard/Server 6.x/7.x.
  • Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
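
A minimal Hive sketch of the create/load/query cycle mentioned above, run through the Hive CLI; the table name and HDFS path are illustrative.

    hive -e "
      CREATE TABLE IF NOT EXISTS web_logs (ts STRING, host STRING, url STRING, status INT)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
      LOAD DATA INPATH '/landing/web_logs/2016-01-01' INTO TABLE web_logs;
      SELECT status, COUNT(*) AS hits FROM web_logs GROUP BY status;
    "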

Confidential - Dallas, Texas

Sr. Hadoop Administrator

Responsibilities:

  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and log files.
  • Set up NameNode high availability for the major production cluster and designed automatic failover control using ZooKeeper and quorum journal nodes (see the failover sketch after this list).
  • Implemented the Kerberos authentication protocol for the production cluster.
  • Monitored systems and services; handled architecture design and implementation of the Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
  • Implemented a generic export framework for moving data between HDFS and RDBMS.
  • Ran Hadoop jobs to process millions of text records for batch and online processes using tuned/modified SQL.
  • Worked on loading data from the Linux file system to HDFS.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Responsible for processing unstructured data using Pig and Hive.
  • Added nodes to the clusters and decommissioned nodes for maintenance.
  • Extensive experience in managing and reviewing Hadoop log files.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
  • Created WebI reports with multiple data providers and synchronized the data using Merge Dimensions.
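
A minimal sketch of checking NameNode HA state and performing a manual failover; nn1/nn2 are the NameNode IDs defined in hdfs-site.xml (illustrative).

    hdfs haadmin -getServiceState nn1      # e.g. "active"
    hdfs haadmin -getServiceState nn2      # e.g. "standby"
    hdfs haadmin -failover nn1 nn2         # promote nn2 to active
    hdfs dfsadmin -safemode get            # confirm the cluster is serving normally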

Confidential - Palo Alto, CA

Hadoop Administrator / Developer

Responsibilities:

  • Monitored cluster health status on a daily basis and optimized/tuned system performance-related configuration parameters.
  • Involved in the design and development of the data warehouse.
  • Performed operating system installation and Hadoop version updates using automation tools.
  • Responsible for architecting Hadoop clusters and translating functional and technical requirements into detailed architecture and design.
  • Involved in loading data from the UNIX file system to HDFS.
  • Extensively used Informatica PowerCenter 9.5/8.6.1 as the ETL tool for the project.
  • Gathered requirements from the client for the ETL objects implementation; designed jobs to FTP the data, using the FTP stage, from flat-file source systems onto the Informatica UNIX server.
  • Interacted with business users on a regular basis to consolidate and analyze the requirements and presented them with design results.
  • Involved in data visualization; provided the files required by the team by analyzing the data in Hive, and developed Pig scripts for advanced analytics on the data.
  • Created many user-defined routines, functions, and before/after subroutines that helped implement some of the complex logical solutions.
  • Improved performance using various performance tuning strategies.
  • Migrated jobs from development to test and production environments.
  • Used shell scripts for loading, unloading, validation, and record auditing purposes (see the audit sketch after this list).
  • Used Aster UDFs to unload data from staging tables and client data for SCD, which resided on the Aster database.
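
A minimal shell sketch of the load-and-audit pattern referenced above; the file names and paths are hypothetical.

    #!/usr/bin/env bash
    # Push a flat file into HDFS and verify source vs. target record counts.
    SRC=/data/incoming/customers.dat
    TGT=/landing/customers/customers.dat

    src_count=$(wc -l < "$SRC")
    hdfs dfs -put -f "$SRC" "$TGT"
    tgt_count=$(hdfs dfs -cat "$TGT" | wc -l)

    if [ "$src_count" -eq "$tgt_count" ]; then
      echo "AUDIT OK: $src_count records loaded"
    else
      echo "AUDIT FAILED: source=$src_count target=$tgt_count" >&2
      exit 1
    fi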

Confidential 

Java / ETL Developer

Responsibilities:

  • Involved in requirements analysis, design, development, and testing.
  • Involved in developing the Group portal and Member portal applications.
  • Developed customized reports and performed unit testing using JUnit.
  • Used Java 1.7, Spring, Hibernate, and Oracle to build the product suite.
  • Managed the evaluation of ETL and OLAP tools and recommended the most suitable solutions depending on business needs.
  • Created external tables with proper partitions for efficiency and loaded the structured data in HDFS resulting from MR jobs.
  • Involved in moving all log files generated from various sources to HDFS for further processing.
  • Responsible for building projects in deployable files (WAR files and JAR files).
  • Coded Java Servlets to control and maintain the session state and handle user requests.
  • Wrote complex SQL queries and stored procedures.
  • Involved in the development and testing phases of the project following Agile methodology.
  • Monitored Hadoop scripts that take input from HDFS and load the data into Hive.
  • Verified software errors and interacted with developers to resolve the technical issues.
  • Used Maven to build the J2EE application.
  • Involved in maintenance of different applications.
