Sr. Hadoop Developer Resume
Nashville, TN
SUMMARY:
- 12+ years of extensive IT experience with multinational clients, including 4+ years of recent experience in the Big Data/Hadoop ecosystem.
- Hands-on experience working with Apache Hadoop ecosystem components such as MapReduce, Hive, Pig, Sqoop, Spark, Flume, Confidential, Kafka, Oozie and Zookeeper.
- Excellent knowledge on Hadoop Components such as HDFS, MapReduce and YARN programming paradigm.
- Experience installing, configuring, supporting and managing Big Data workloads and the underlying Hadoop cluster infrastructure.
- Experience in analyzing data using HiveQL, Pig Latin and extending HIVE and PIG core functionality by using custom UDFs.
- Proficient in Relational Database Management Systems (RDBMS).
- Extensive working knowledge of partitioned tables, UDFs, performance tuning and compression-related properties in Hive.
- Good understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases like Confidential.
- Hands-on experience using Amazon Web Services such as EC2, EMR, Redshift, DynamoDB and S3.
- Hands-on experience using Apache Kafka to track data ingestion into the Hadoop cluster and implementing custom Kafka encoders for custom input formats to load data into Kafka partitions.
- Experience using Spark Streaming to ingest data from multiple data sources into HDFS.
- Hands-on experience with stream processing, including Storm and Spark Streaming.
- Knowledge in job work-flow scheduling and monitoring tools like Oozie.
- Experience in analyzing data using Confidential and custom MapReduce programs in Java.
- Proficient in importing and exporting data between HDFS and relational database systems using Sqoop.
- Excellent knowledge in data transformations using MapReduce, HIVE and Pig scripts for different file formats.
- Extensive experience in Spark/Scala, MapReduce MRv1 and MapReduce MRv2 (YARN).
- Involved in importing streaming data into HDFS using Flume and analyzing it using Pig and Hive.
- Experience in using Flume for aggregating log data from web servers and dumping into HDFS.
- Experience in scheduling and monitoring Oozie workflows for parallel execution of jobs.
- Proficient in Core Java, Servlets, Hibernate, JDBC and Web Services.
- Experience in all Phases of Software Development Life Cycle (Analysis, Design, Development, Testing and Maintenance) using Waterfall and Agile methodologies.
- Experience using SequenceFile, Avro and Parquet file formats; managing and reviewing Hadoop log files.
- Experience in Developing and maintaining applications on the AWS platform.
- Hands on experience in working with RESTful web services using JAX-RS and SOAP web services using JAX-WS.
TECHNICAL SKILLS:
Big Data/Hadoop Framework: Spark, HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, Zookeeper, Flume, Confidential, Amazon AWS (EMR)
Databases: Cassandra, MySQL, Oracle
Languages: Java, Scala 2.11.0, Python 2.7/3.x, Pig Latin, HiveQL
Web Technologies: JSP, Servlets, JavaBeans, JDBC, AWT
Operating Systems: CentOS, Ubuntu, Macintosh, Windows 10, Windows 2000/NT/XP
Front-End: HTML/HTML5, CSS3, JavaScript/jQuery
Development Tools: Microsoft SQL Studio, DbVisualizer, Eclipse, IntelliJ IDEA, MySQL Workbench, PyCharm, Sublime Text, PL/SQL Developer
Reporting Tool: Tableau, SAP Business Objects
Office Tools: Microsoft Office Suite
Development Methodologies: Agile/Scrum, Waterfall
Other skills: Machine Learning, Internet of Things.
PROFESSIONAL EXPERIENCE:
Sr. Hadoop Developer
Confidential, Nashville, TN
Responsibilities:
- Developed PySpark code to read data from Hive, group the fields and generate XML files; enhanced the PySpark code to write the generated XML files to a directory and zip them into CDAs (a minimal sketch follows this list).
- Implemented a REST call to submit the generated CDAs to the vendor website; implemented Impyla to support JDBC/ODBC connections for HiveServer2.
- Enhanced the PySpark code to replace Spark with Impyla and installed Impyla on the edge node.
- Evaluated performance of the Spark application by comparing cluster deploy mode with local mode.
- Experimented with test OID submissions to the vendor website.
- Explored StreamSets Data Collector and implemented it for ingestion into Hadoop.
- Created a StreamSets pipeline to parse XML files and convert them into a format that is fed to Solr.
- Built a data validation dashboard in Solr to display message records. Wrote a shell script to run the Sqoop job for bulk data ingestion from Oracle into Hive.
- Created Hive tables for the ingested data and scheduled an Oozie job for the Sqoop ingestion.
- Worked with the JSON file format for StreamSets and with the Oozie workflow engine to manage interdependent Hadoop jobs and automate several types of Hadoop jobs.
- Wrote shell scripts to dump data from MySQL to HDFS.
- Wrote Scala scripts to load processed data into DataStax Cassandra 4.8.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS (a streaming sketch follows at the end of this section).
- Enhanced and optimized the product's Spark code to aggregate, group and run data mining tasks using the Spark framework.
- Worked on Maven 3.3.9 for building and managing Java-based projects. Hands-on experience with Linux and HDFS shell commands. Worked on Kafka for message queuing solutions.
- Developed unit test cases for Mapper, Reducer and Driver classes using MRUnit.
- Loaded and transformed large sets of structured, semi-structured and unstructured data in various formats such as text, zip, XML and JSON.
- Generated Java APIs for retrieval and analysis on NoSQL databases such as Confidential and Cassandra. Developed batch scripts to fetch data from AWS S3 storage and perform the required transformations in Scala using the Spark framework.
- Wrote a Confidential client program in Java and web services.
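The PySpark-to-XML step above can be illustrated with a minimal sketch: it reads a Hive table, groups records per patient and writes one XML file per group into a staging directory. The table, column names and output path (clinical.clinical_docs, patient_id, /tmp/cda_xml) are illustrative assumptions, not details taken from the project.

    import os
    import xml.etree.ElementTree as ET

    from pyspark.sql import SparkSession, functions as F

    spark = (SparkSession.builder
             .appName("hive-to-xml")
             .enableHiveSupport()      # read Hive-managed tables through the metastore
             .getOrCreate())

    out_dir = "/tmp/cda_xml"           # assumed staging directory before zipping into CDAs
    os.makedirs(out_dir, exist_ok=True)

    grouped = (spark.table("clinical.clinical_docs")   # assumed Hive database.table
               .groupBy("patient_id")
               .agg(F.collect_list(F.struct("code", "value")).alias("observations")))

    # Stream grouped rows to the driver and emit one XML document per patient.
    for row in grouped.toLocalIterator():
        root = ET.Element("ClinicalDocument", id=str(row["patient_id"]))
        for obs in row["observations"]:
            ET.SubElement(root, "observation", code=str(obs["code"])).text = str(obs["value"])
        ET.ElementTree(root).write(os.path.join(out_dir, "%s.xml" % row["patient_id"]))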
Environment: Sqoop, StreamSets, Scala, Impyla, PySpark, Solr, Oozie, Hive, Impala
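A hedged sketch of the Kafka-to-HDFS streaming path mentioned in this section, written with PySpark Structured Streaming. The broker addresses, topic name and HDFS paths are placeholders, and the spark-sql-kafka connector is assumed to be supplied at submit time.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")  # assumed brokers
              .option("subscribe", "ingest-topic")                             # assumed topic
              .option("startingOffsets", "latest")
              .load()
              .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "timestamp"))

    query = (events.writeStream
             .format("parquet")                                   # land the stream on HDFS as Parquet
             .option("path", "hdfs:///data/streams/events")       # assumed landing path
             .option("checkpointLocation", "hdfs:///checkpoints/events")
             .trigger(processingTime="1 minute")                  # micro-batch every minute
             .start())

    query.awaitTermination()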
Hadoop Developer
Confidential, Bethpage, NY
Responsibilities:
- Involved in architecture design, technical landscape design and validation.
- Performed data loading into ORC-format Hive tables (see the PySpark sketch after this list).
- Developed programs to perform data transformations using Pig, Spark and Python.
- Involved in environment set-up for the off-shore team.
- Involved in historical data transfer from SAP HANA to Hadoop platform using Sqoop tool.
- Excellent programming skills at a higher level of abstraction using Scala, Java and Python.
- Involved in data flow design on the Hadoop platform and migration from SAP HANA.
- Developed Spark code using Scala for faster data processing, with extensive use of Scala alongside Spark.
- Developed programs and scripts using Python and Unix shell.
- Implemented changes for Sqoop to retrieve data from SAP HANA to Hadoop environment.
- Involved in creation of Hive databases and tables.
- Performed coding, unit and integration testing.
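A minimal PySpark sketch of the ORC Hive-table loading mentioned above. The landing path, database and table names (hdfs:///landing/sap_hana_extract, analytics.sales_history) are assumptions for illustration only.

    from pyspark.sql import SparkSession, functions as F

    spark = (SparkSession.builder
             .appName("load-orc-hive")
             .enableHiveSupport()
             .getOrCreate())

    # Assumed CSV landing zone for data extracted from SAP HANA.
    raw = spark.read.option("header", "true").csv("hdfs:///landing/sap_hana_extract/")

    cleaned = (raw.dropDuplicates()
                  .withColumn("load_date", F.current_date()))

    (cleaned.write
            .mode("append")
            .format("orc")                                # Hive table stored as ORC
            .partitionBy("load_date")
            .saveAsTable("analytics.sales_history"))      # assumed Hive database.table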
Environment: Hortonworks 2.5, Spark 2.1, Hive 1.2, Sqoop 1.6, Scala 2.11, Oozie 4.2, Git, HDFS, shell script.
Sr. Hadoop Administrator
Confidential, NJ
Responsibilities:
- Working for Confidential in the banking and financial services domain.
- Providing hardware architectural guidance, planning and estimating cluster capacity, and creating roadmaps for Hadoop cluster deployment.
- Working on Hortonworks and Cloudera cluster setups.
- Automated repetitive tasks using Python.
- Knowledge of shell and Python programming.
- Installing, configuring, maintaining and troubleshooting standalone systems.
- Configuring and updating parameters on servers using Python (an automation sketch follows this list).
- Analyzed shell/Python script code during migrations.
- Troubleshooting issues such as job failures and performance problems.
- Hadoop user administration using Sentry.
- Upgraded CDH from 5.2 to 5.3.
- Responsible for managing data coming from different sources.
- Knowledge of snapshots.
- User administration via LDAP and Kerberos.
- Involved in Hadoop Cluster environment administration that includes adding and removing cluster nodes, cluster capacity planning, performance tuning, cluster Monitoring, Troubleshooting.
- Adding new nodes to an existing cluster, recovering from a Name Node failure.
- Commissioned and decommissioned nodes on a running cluster.
- Supported MapReduce programs running on the cluster.
- Configured the Fair Scheduler to provide service-level agreements for multiple users of a cluster.
- Managing nodes on Hadoop cluster connectivity and security
- Experienced in managing and reviewing Hadoop log files
- Maintained backups for the NameNode.
- Knowledge of NameNode recovery from previous backups.
- Importing and exporting data into HDFS using Sqoop
- In-depth conceptual and functional understanding of MapReduce and Hadoop ecosystem infrastructure (both MRv1 and MRv2).
- Deployed and scaled out multi-node Hadoop cluster components including MapReduce, Pig, Hive and Confidential.
- Designed and implemented workflow and coordinator jobs using Oozie.
- Functional knowledge of Flume and Sqoop.
- Experience in troubleshooting, optimization and performance tuning.
- Experience in change management.
- Followed the functional spec analysis and developed ETL pipelines; developed MapReduce/Pig applications to transform data available in HDFS.
- Generated reporting data using Pig/Hive to serve the business team's ad-hoc requests.
- Imported data into Hive from RDBMS sources, processed it, and wrote the resulting data to HDFS.
- Cluster management, troubleshooting, and sharing best practices with the team.
- Scheduled jobs using Oozie workflows.
- Built a Hadoop cluster from scratch using a "start small and scale quickly" approach.
- Well versed with the security issues like Quotas, RBAC, ACL, setuid and sticky bit.
- Used Kerberos, LDAP and Ranger for access and identity management.
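A sketch of the kind of Python automation used for updating a parameter across cluster nodes, as referenced above. The host list, config file and sed expression are assumptions, and passwordless SSH for the admin account is taken for granted.

    import subprocess

    HOSTS = ["datanode01", "datanode02", "datanode03"]        # assumed node names
    CONF = "/etc/hadoop/conf/hadoop-env.sh"                   # assumed config file
    CHANGE = "s/HADOOP_HEAPSIZE=.*/HADOOP_HEAPSIZE=4096/"     # example: bump daemon heap size

    for host in HOSTS:
        # Back up the file, apply the edit in place, then print the changed line to verify.
        remote = ("sudo cp {c} {c}.bak && sudo sed -i '{s}' {c} && grep HADOOP_HEAPSIZE {c}"
                  .format(c=CONF, s=CHANGE))
        result = subprocess.run(["ssh", host, remote], capture_output=True, text=True)
        status = "OK" if result.returncode == 0 else "FAILED: " + result.stderr.strip()
        print(host, status, result.stdout.strip())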
Jr. Hadoop Administrator
Confidential, Baltimore, MD
Responsibilities:
- Installed and managed Hortonworks HDP data lake (DL) and data warehouse (DW) components.
- Worked on the Hortonworks Hadoop distribution (HDP 2.6.0.2.2), which managed services such as HDFS, MapReduce2, Tez, Hive, Pig, Confidential, Sqoop, Flume, Spark, Ambari Metrics, ZooKeeper, Falcon and Oozie across 4 clusters ranging from LAB, DEV and QA to PROD, containing nearly 350+ nodes with 7 PB of data.
- Monitor Hadoop cluster connectivity and security on Ambari monitoring system.
- Led the installation, configuration and deployment of product software on new edge nodes that connect to the Hadoop cluster for data acquisition.
- Rendered L3/L4 support services for BI users, Developers and Tableau team through Jira ticketing system.
- One of the key engineers in Aetna's HDP web engineering team, Confidential engineering ISE.
- Managed and reviewed log files as part of administration for troubleshooting purposes. Communicated and escalated issues appropriately for tickets raised by users in the Jira ticketing system.
- Worked closely with System Administrators, BI analysts, developers, and key business leaders to establish SLAs and acceptable performance metrics for the Hadoop as a service offering.
- Performance Tuning and ETL, Agile Software Deployment, Team Building & Leadership, Engineering Management.
- Administered Hortonworks Ambari and Apache Hadoop on Red Hat and CentOS as data storage, retrieval and processing systems.
- Set up Kerberos principals in the KDC server, tested HDFS, Hive, Pig and MapReduce access for new users, and created keytabs for service IDs using keytab scripts.
- Performed a major upgrade in the production environment from HDP 2.3 to HDP 2.6. As an admin, followed standard backup policies to ensure high availability of the cluster.
- Monitored multiple Hadoop clusters environments using Ganglia and Nagios. Monitored workload, job performance and capacity planning using Ambari.
- Installed the OLAP software AtScale on its designated edge node server.
- Implemented a dual data center setup for all Cassandra clusters. Performed complex system analysis to improve ETL performance and identified highly critical batch jobs to prioritize.
- Conducted cluster sizing, tuning, and performance benchmarking on a multi-tenant OpenStack platform to achieve the desired performance metrics.
- Good knowledge of providing solutions to users who encountered Java exceptions and errors while running data models in SAS and R scripts. Good understanding of forest data models.
- Worked on data ingestion systems to pull data with Sqoop from traditional RDBMS platforms such as Oracle, MySQL and Teradata into the Hadoop cluster using automated ingestion scripts, and stored data in NoSQL databases such as Confidential and Cassandra (see the ingestion sketch after this list).
- Provided security and authentication with Kerberos, which works by issuing Kerberos tickets to users.
- Good troubleshooting skills across the Hadoop stack components, ETL services, and Hue and RStudio, which provide GUIs for developers and business users for day-to-day activities.
- Created queues and allocated cluster resources to prioritize Hive jobs.
- Implemented SFTP/SCP for projects to transfer data from external servers to cluster servers. Experienced in managing and reviewing log files. Involved in scheduling the Oozie workflow engine to run multiple Hive, Sqoop and Pig jobs.
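The automated RDBMS-to-Hive ingestion mentioned above, sketched as a small Python wrapper around sqoop import. The JDBC URL, credentials file and table list are placeholders, and the real scripts may well have been plain shell.

    import subprocess

    JDBC_URL = "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB"   # assumed source database
    TABLES = ["CLAIMS", "MEMBERS", "PROVIDERS"]                 # assumed source tables

    for table in TABLES:
        cmd = [
            "sqoop", "import",
            "--connect", JDBC_URL,
            "--username", "etl_user",
            "--password-file", "hdfs:///user/etl/.ora_pwd",     # keep credentials off the command line
            "--table", table,
            "--hive-import",                                    # load straight into Hive
            "--hive-database", "staging",
            "--hive-table", table.lower(),
            "--num-mappers", "4",
        ]
        print("Ingesting", table)
        subprocess.run(cmd, check=True)                         # fail fast if any table errors out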
Environment: CDH 5.4.3 and 4.x, Cloudera Manager CM 5.1.1, HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Flume, Zookeeper, Chef, Red Hat/CentOS 6.5, Control-M.
Spark/Hadoop Consultant
Confidential, Rensselaer, NY
Responsibilities:
- Designed the entire architecture of the data pipeline for analysis.
- Worked on Sqoop jobs to import data from Oracle and bring into HDFS.
- Wrote Scala scripts to load processed data into DataStax Cassandra 4.8.
- Performance tuning of Spark and Sqoop jobs.
- Developed parser and loader MapReduce applications to retrieve data from HDFS and store it in Confidential and Hive.
- Wrote a MapReduce job to compare two TSV files and save the processed output into Oracle.
- Hands-on design and development of an application using Hive (UDFs).
- Responsible for writing HiveQL queries to analyze data in the Hive warehouse.
- Provided support to data analysts in running Pig and Hive queries.
- Migrated the Ab Initio process to Hadoop using Pig and Hive.
- Created partitioned tables in Hive (see the example after this list).
- Created reports using Tableau on HiveServer2.
- Worked on data modelling for dimension and fact tables in the Hive warehouse.
- Scheduled jobs through the Walgreens EBS internal scheduling system.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
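An example of the partitioned Hive tables mentioned above, created and loaded through spark.sql. The warehouse and staging database, table and column names are illustrative assumptions, and the snippet is written against current PySpark rather than the Spark 1.4 listed in the environment.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("partitioned-hive-table")
             .enableHiveSupport()
             .getOrCreate())

    # Allow fully dynamic partition values on insert.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    spark.sql("""
        CREATE TABLE IF NOT EXISTS warehouse.claims_fact (
            claim_id     STRING,
            member_id    STRING,
            claim_amount DOUBLE
        )
        PARTITIONED BY (claim_month STRING)
        STORED AS ORC
    """)

    spark.sql("""
        INSERT OVERWRITE TABLE warehouse.claims_fact PARTITION (claim_month)
        SELECT claim_id, member_id, claim_amount,
               substr(claim_date, 1, 7) AS claim_month
        FROM staging.claims_raw
    """)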
Environment: Hortonworks Data Platform 2.3.4, Hadoop 2.7, Spark 1.4.1, Scala 2.10, SBT 0.13, Sqoop 1.4.6, MapReduce, HDFS, Pig, Hive 0.13, Java, Oracle 11g, DataStax Cassandra 4.8, CentOS, Windows, Python 3.0