Sr/Lead Hadoop Platform Engineer Resume
SUMMARY
- Around 10 years of professional IT experience in all phases of the Software Development Life Cycle, including hands-on experience in Data Analytics.
- 7 years of experience installing, configuring and testing Hadoop ecosystem components across all major distributions.
- 7+ years of comprehensive experience as a Techno-Functional Hadoop Data Analyst in the Finance, Insurance, Healthcare and E-Commerce sectors.
- Excellent at writing business and system specifications and designing and developing Use Case, Activity, Interaction (Sequence & Collaboration) and Class Diagrams.
- Capable of processing huge amounts of structured, semi-structured and unstructured data.
- Experience with SequenceFile, Avro, ORC, Parquet, HAR and JSON formats and compression.
- Expertise in Hive Query Language, debugging Hive issues, Hive security and Hadoop security.
- Very good understanding of NoSQL databases such as MongoDB, Cassandra and HBase.
- Used Maven and Ant for build automation. Skilled in creating workflows using Oozie for cron-style jobs.
- Knowledge of the ETL tool Talend for designing workflows.
- Experienced in writing custom UDFs and UDAFs to extend Hive and Pig core functionality.
- Good at HBase-related architecture design, such as batch data analysis and near real-time data systems.
- Experience in using various network protocols like HTTP, UDP, POP, FTP, TCP/IP, and SMTP.
- Familiar with various Hadoop distributions like Cloudera, Hortonworks, MapR, Pivotal and Apache.
- Experience in managing Hadoop clusters using Cloudera Manager and Pivotal Command Center.
- Installing and monitoring Hadoop cluster resources using Grafana, Ganglia and Nagios.
- Expertise in cluster coordination services through ZooKeeper.
- Experience in designing and implementing secure Hadoop clusters using Kerberos.
- Managing cluster resources by implementing the Fair Scheduler and Capacity Scheduler.
- Working knowledge of ETL application architecture, including data ingestion/transformation pipeline design, data modeling and data mining, machine learning, and advanced data processing.
- Experience in conducting GAP analysis, SWOT analysis, feasibility analysis, ROI and Business Process Engineering using Business Analysis tools.
- Experienced in implementing Puppet, Salt and Chef, and used JIRA for bug and issue tracking.
- Used the HBase bulk-load API to load generated HFiles into HBase for faster access to a large customer base without a performance penalty (a minimal sketch follows this summary).
- Excellent analytical, problem solving, communication and interpersonal skills with ability to interact with individuals at all levels and can work as a part of a team as well as independently.
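The HBase bulk-load flow mentioned above, as a minimal Python sketch that drives HBase's standard ImportTsv and completebulkload tools; the table name, column mapping and HDFS paths are hypothetical placeholders, not the actual project values.

```python
import subprocess

# Hypothetical table, column mapping and HDFS paths; adjust to the real layout.
TABLE = "customers"
HDFS_INPUT = "/data/incoming/customers.tsv"
HFILE_OUTPUT = "/tmp/bulkload/customers_hfiles"

def run(cmd):
    """Run a command and fail loudly if it returns a non-zero exit code."""
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Step 1: generate HFiles from the TSV input instead of writing through the region servers.
run([
    "hbase", "org.apache.hadoop.hbase.mapreduce.ImportTsv",
    "-Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:email",
    "-Dimporttsv.bulk.output=" + HFILE_OUTPUT,
    TABLE, HDFS_INPUT,
])

# Step 2: hand the finished HFiles directly to the region servers (the bulk load itself).
run(["hbase", "completebulkload", HFILE_OUTPUT, TABLE])
```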
TECHNICAL SKILLS
Languages: C, Java, J2EE, JavaScript, SQL, R, HTML, XML, Shell, Python
Hadoop Eco System: HDFS, MapReduce, YARN, Hadoop Streaming, Pig, Hive, Impala, Oozie, Mahout, Zookeeper, Sqoop, Flume, Avro, HAWQ, Apache Sentry, Ganglia, Hue, Nagios, PCC, MRUnit, Cloudera Manager, Medusa, Spotlight, Kafka, Puppet, Splunk, Salt, Git, Kerberos, Pepperdata, Automic, Apache Phoenix, xMatters, Druid, Tableau, Ranger, Atlas
Databases: MySQL, Oracle, MS Access, SQL Server, Greenplum HD
NoSQL: MongoDB, HBase
OS, Tools & Methodologies: Windows, UNIX, Linux (Ubuntu, Fedora), Mac OS, MS Office 2010, NetBeans, JIRA, Jenkins, Eclipse, Adobe Professional, Rational Rose (RR), MS Visio, Agile, Waterfall, Scrum, RUP, RAD
Hadoop Distributions: MapR, Cloudera, Pivotal HD, Hortonworks (HDP), Apache
PROFESSIONAL EXPERIENCE
Sr/Lead Hadoop Platform Engineer
Confidential
Responsibilities:
- Led a team of 8 (4 onshore, 4 offshore) to onboard 12 LOBs into our support model as part of the L1/L2 initiative.
- Performed thorough analysis with application teams to determine why jobs were missing SLAs and proposed optimized solutions.
- Developed a Python script to alert the DE leads who own Automic workflows when business SLAs are missed (a minimal sketch follows this job entry).
- Used Medusa, Spotlight and xMatters for alerting and monitoring of Hadoop workflows.
- Leveraged Pepperdata to provide end users with useful insights for tuning their jobs.
- Worked with various teams to do Hive performance tuning.
- Worked with LOB heads to identify space-consuming directories through the Druid SLA dashboard and maintained cluster health.
- Developed custom dashboards in Tableau and Superset for monitoring purposes.
- Involved in capacity planning of the cluster and the LOBs' edge nodes.
- Used Atlas to create tag-based policies and implement them in Ranger for sensitive data access in Hive.
- Part of the migration from on-prem to GCP.
- Involved in the job deployment cycle from Dev -> Test -> Prod.
- Imposed strict user quotas on the cluster (a quota sketch also follows this job entry).
Environment: HDP 2.6.5.0, Hadoop YARN architecture, Ambari 2.6.2.2, Spark2, MySQL, Oracle, Atlas, Ranger, Tableau, Superset, Hive 1.2.1000, Hive LLAP, Medusa, Pepperdata, Spotlight.
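A minimal Python sketch of the kind of Automic SLA-alert script described above; the CSV export layout, SLA threshold, addresses and SMTP relay are assumptions for illustration, not the actual implementation.

```python
import csv
import smtplib
from datetime import datetime
from email.message import EmailMessage

SLA_MINUTES = 120                   # assumed business SLA per workflow
SMTP_RELAY = "smtp.example.com"     # hypothetical mail relay
REPORT = "automic_runs.csv"         # assumed export: workflow,owner_email,start,end

def runtime_minutes(start, end):
    """Runtime of one workflow execution in minutes."""
    fmt = "%Y-%m-%d %H:%M:%S"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds() / 60

def alert(owner, workflow, minutes):
    """Mail the owning DE lead about an SLA breach."""
    msg = EmailMessage()
    msg["Subject"] = f"SLA breach: {workflow} ran {minutes:.0f} min"
    msg["From"] = "hadoop-platform@example.com"
    msg["To"] = owner
    msg.set_content(f"Workflow {workflow} exceeded the {SLA_MINUTES}-minute business SLA.")
    with smtplib.SMTP(SMTP_RELAY) as smtp:
        smtp.send_message(msg)

with open(REPORT, newline="") as fh:
    for row in csv.DictReader(fh):
        minutes = runtime_minutes(row["start"], row["end"])
        if minutes > SLA_MINUTES:
            alert(row["owner_email"], row["workflow"], minutes)
```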
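And a minimal sketch of enforcing the user quotas mentioned above with the standard hdfs dfsadmin commands; the directories and limits are hypothetical.

```python
import subprocess

# Hypothetical per-LOB directories with (name quota, space quota) limits.
QUOTAS = {
    "/user/lob_finance":  ("2000000", "50t"),
    "/user/lob_payments": ("1000000", "20t"),
}

for path, (name_quota, space_quota) in QUOTAS.items():
    # Cap the number of files and directories under the path.
    subprocess.run(["hdfs", "dfsadmin", "-setQuota", name_quota, path], check=True)
    # Cap the raw space (replication included) the path may consume.
    subprocess.run(["hdfs", "dfsadmin", "-setSpaceQuota", space_quota, path], check=True)
```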
Sr. Hadoop Administrator
Confidential
Responsibilities:
- Led 3 other Hadoop Admins; responsible for onboarding applications onto the Hadoop infrastructure across Dev and Production environments and ensuring access assets are created.
- Involved in Hadoop cluster capacity planning and expansion with Managers, Directors, VPs, business users and vendors across the Dataservices and Data Nursery teams.
- Played a part in onboarding DAS and Dataplane services.
- Resolved tickets/escalations/incidents created in ServiceNow and JIRA through root cause analysis, in adherence to SLA, quality, process and security standards, to meet business requirements.
- Implemented KDC high availability using kpropd.
- Administered two HDP clusters (Development and Production) consisting of 20 nodes and 40 nodes.
- Implemented high availability for Ranger KMS and Ranger Admin.
- Automated a Hive statistics collection script to support the CBO for better query performance (a minimal sketch follows this job entry).
- Supported Administration of Talend and Kyvos as part of BI architecture on Hadoop.
- Expert in tuning YARN, Hive, Tez, MapReduce, Spark and HDFS parameters for better cluster performance.
- Set up an AWS POC environment to evaluate Big Data tools that could fit our environment and to test Tez and MapReduce job performance.
- Led the upgrade of Ambari and the HDP stack from 2.6.0 to 2.6.5; created a plan document for the major upgrade to 3.0 scheduled in Q3.
- Implemented a disaster recovery cluster, with data transferred from the active cluster via DistCp twice daily (a minimal sketch also follows this job entry).
- Responsible for implementing Kerberos: creating service principals, user accounts and keytabs, and syncing with LDAP groups.
- Regular ongoing cluster maintenance, health checks, commissioning DataNodes and balancing DataNodes.
- Performed knowledge transfer for offshore Hadoop L1 support, including documentation on environments, monitoring requirements, access and the communication process.
Environment: HDP 2.6.1/2.6.5, Hadoop YARN architecture, Ambari 2.5.1/2.6.2, Spark2, MySQL, Ranger, Talend, Hive 1.4.2, Hive LLAP, Kyvos 5.0/5.5, Zeppelin.
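A minimal Python sketch of the kind of Hive statistics automation described above, driving ANALYZE TABLE through beeline; the JDBC URL and table list are placeholders.

```python
import subprocess

JDBC_URL = "jdbc:hive2://hiveserver2.example.com:10000/default"   # placeholder URL
TABLES = ["sales.orders", "sales.order_items"]                    # hypothetical tables

def beeline(sql):
    """Run a single HiveQL statement through beeline."""
    subprocess.run(["beeline", "-u", JDBC_URL, "-e", sql], check=True)

for table in TABLES:
    # Table-level stats (row counts, sizes) that the cost-based optimizer reads.
    beeline(f"ANALYZE TABLE {table} COMPUTE STATISTICS")
    # Column-level stats (NDV, min/max) for better join ordering.
    beeline(f"ANALYZE TABLE {table} COMPUTE STATISTICS FOR COLUMNS")
```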
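And a hedged sketch of the twice-daily DistCp replication to the disaster recovery cluster; the NameNode endpoints and paths are assumptions, and in practice the loop would be scheduled (for example via cron or Oozie) rather than run ad hoc.

```python
import subprocess

# Hypothetical NameNode endpoints and replicated paths.
ACTIVE_NN = "hdfs://active-nn.example.com:8020"
DR_NN = "hdfs://dr-nn.example.com:8020"
PATHS = ["/data/warehouse", "/data/landing"]

for path in PATHS:
    # -update copies only changed files; -delete removes files gone from the source,
    # keeping the DR copy a mirror of the active cluster.
    subprocess.run([
        "hadoop", "distcp", "-update", "-delete",
        ACTIVE_NN + path, DR_NN + path,
    ], check=True)
```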
Senior Hadoop Administrator
Confidential, Phoenix, AZ
Responsibilities:
- Performed patching and upgrades on all 3 clusters (Development, Test and Production).
- Administered a 700+ node MapR cluster.
- Debugged user issues across Hadoop ecosystem tools such as Hive, Spark, HBase, Kafka, Oozie and Zookeeper.
- Good knowledge of Warden, node labels, MCS, CLDB, storage pools, volumes, NFS and snapshots, which are part of the MapR architecture.
- Implemented custom scripts in Puppet to replicate the properties of different sets of nodes onto future nodes.
- Implemented high availability for the ResourceManager and Spark History Server.
- Handled daily alarms such as core, inode, volume and disk failures.
- Commissioned and decommissioned queues as per the business requirements.
- Integrated the cluster with third party tools like Jethro, Zeppelin and Dr. Elephant.
- Performed cluster upgrade from MapR 5.2 to 5.2.2.
Environment: Cloak Hive, MapR 5.2.2, Spark 2.1, MySQL, MCS, Kafka, HBase 1.1.1, Flume 1.6, Hive 1.2.
Hadoop Analyst
Confidential
Responsibilities:
- Migrated data from MySQL through ETL into the Hadoop data lake, performing QC checks and distribution checks in the process.
- Led the PayPal project (Xoom, Paydiant, Venmo, Braintree) and designed the workflow for data migration.
- Identified KPIs to build analytics over the data lake; involved in client meetings to identify and resolve various issues.
- Performed Kerberos integration with Hadoop; created principals and keytabs for various services.
- Authorized users to access only appropriate data in HDFS using Apache Sentry.
- Created Hive external tables on data residing in HDFS and converted those tables to Parquet for other teams to work with (a minimal sketch follows this job entry).
- Wrote shell scripts to remove empty folders generated in HDFS, automate daily table creation in Hive and manage HDFS Trash (a Python equivalent of the cleanup logic is also sketched after this job entry).
- Automated a UC4 workflow to trigger batch jobs that load incremental data into existing tables.
- Collaborated with application teams to install operating system and Hadoop updates, patches and version upgrades when required.
Environment: Hive, Hadoop YARN architecture, HDP 2.5.5, UC4, Spark SQL, MySQL, Ambari, Ranger.
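A minimal sketch of the external-table-to-Parquet conversion described above, with the HiveQL issued through beeline from Python; the database, table and HDFS location names are hypothetical and the target databases are assumed to exist.

```python
import subprocess

JDBC_URL = "jdbc:hive2://hiveserver2.example.com:10000/default"   # placeholder URL

# Point an external table at the raw files already landed in HDFS.
CREATE_EXTERNAL = """
CREATE EXTERNAL TABLE IF NOT EXISTS staging.transactions (
  txn_id STRING,
  amount DOUBLE,
  txn_ts STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
LOCATION '/data/landing/transactions'
"""

# Materialize a columnar Parquet copy for downstream teams.
CONVERT_TO_PARQUET = """
CREATE TABLE IF NOT EXISTS curated.transactions
STORED AS PARQUET
AS SELECT * FROM staging.transactions
"""

for statement in (CREATE_EXTERNAL, CONVERT_TO_PARQUET):
    subprocess.run(["beeline", "-u", JDBC_URL, "-e", statement], check=True)
```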
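And a Python equivalent of the HDFS cleanup logic described above (the original was a shell script); the base directory is a placeholder, and the empty-folder check relies on the standard hdfs dfs -ls / -count / -rmdir / -expunge commands.

```python
import subprocess

BASE_DIR = "/data/landing"   # hypothetical staging area

def hdfs(*args, capture=False):
    """Run an hdfs dfs command, optionally capturing its stdout."""
    return subprocess.run(["hdfs", "dfs", *args], check=True,
                          capture_output=capture, text=True)

# Walk the immediate subdirectories and drop the ones that are empty.
for line in hdfs("-ls", BASE_DIR, capture=True).stdout.splitlines():
    if not line.startswith("d"):
        continue                              # skip files and the "Found N items" header
    path = line.split()[-1]
    counts = hdfs("-count", path, capture=True).stdout.split()
    dir_count, file_count = int(counts[0]), int(counts[1])
    if dir_count == 1 and file_count == 0:    # only the directory itself, no contents
        hdfs("-rmdir", path)

# Force-expire trash so the reclaimed space is actually freed.
hdfs("-expunge")
```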