Sr. Hadoop Developer Lead Resume

Kansas City, MO

PROFESSIONAL SUMMARY

  • 8+ years of professional IT experience in analysis, testing, documentation, deployment, integration, and maintenance of web-based and client/server applications.
  • Qualified Hadoop developer with experience in Hadoop, database management system architecture, core Java, and testing and implementing Big Data solutions.
  • Good experience in developing and implementing big data solutions and data mining applications on Hadoop using HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, Kafka, NiFi, Storm, Spark, Oozie, and ZooKeeper.
  • Experience in developing MapReduce programs using Apache Hadoop to analyze big data as per requirements.
  • Experience in importing and exporting data between HDFS and relational database systems using Sqoop and Flume.
  • Good understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode, and of the MapReduce programming paradigm.
  • Knowledge of the architecture and functionality of NoSQL databases such as HBase, Cassandra, and MongoDB.
  • Expertise in real-time data ingestion into HBase and Hive using Storm.
  • Good experience in loading unstructured data into HDFS using Flume/Kafka.
  • Excellent experience in dealing with Compression Codecs like Snappy, Gzip.
  • Expertise in managing and reviewing Hadoop log files.
  • Hands-on experience in in-memory data processing with Apache Spark.
  • Extensively used ETL methodology to support data extraction, transformation, and loading using Hadoop.
  • Implemented Spark using Scala and SparkSQL for faster testing and processing of data.
  • Excellent communication, interpersonal, and problem-solving skills; a very good team player with a strong positive attitude.
  • Strong programming experience in creating procedures, functions and packages and other database objects using SQL & PL/SQL.
  • Involved in designing and deploying a multitude of applications using almost all of the AWS stack (EC2, S3, Lambda, Glue).

TECHNICAL SKILLS

Hadoop Environment: Hortonworks, Cloudera Hadoop Distributions.

Hadoop/Big Data: Hadoop 1.x/2.x (YARN), HDFS, MapReduce, Spark, Hive, ZooKeeper, Oozie, Pig, Sqoop, Flume, Kafka, Parquet, Athena, Amazon DynamoDB, Amazon S3, AWS Lambda, API Gateway, CloudWatch, Redshift Spectrum, Amazon EC2, AWS Glue.

Development Tools: Eclipse, SQL Developer, Microsoft Suite (Word, Excel, PowerPoint, Access), Reports 6i/10g, XML Publisher

Programming/Scripting Languages: Java, SQL, Unix Shell Scripting, Python.

Databases: Oracle 11g/10g/9i, MySQL, PL/SQL, SQL Server 2005/2008, DB2

NoSQL Databases: HBase, Cassandra, MongoDB, Postgres.

ETL: Informatica, Talend, Pentaho.

Web Tools: HTML, JavaScript, XML, XSL, DOM

Methodologies: Agile/Scrum, Waterfall

Operating Systems: Windows 98/2000/XP/Vista/7/8/10, Macintosh, Unix, Linux, and Solaris.

Monitoring & Reporting Tools: Custom shell reports

PROFESSIONAL EXPERIENCE

Sr. Hadoop Developer Lead

Confidential, Kansas City, MO

Responsibilities:

  • Coordinate between the business and the offshore team.
  • Work closely with the business and analytics team in gathering the system requirements.
  • Supported code/design analysis, strategy development and project planning.
  • Build workload management and resource management capabilities to handle multiple jobs competing for resources on the Hadoop cluster. Build memory and compute management pools for applications using YARN. Build queueing and throttling processes for jobs running on Spark, Hive, and Impala using Python.
  • Collaborate with Hadoop admins on configuration changes for the Big Data technology stack (Spark, Hive, Impala), installation of open source systems and packages, performance tuning of common services and the Hadoop cluster, and code migration from lower to higher environments.
  • Build a real-time streaming process to log data quality issues using Kafka and Spark Streaming.
  • Perform parsing, cleaning, join, and map functions to integrate data from metadata Excel spreadsheets provided by the DST Data Management Team.
  • Define configuration metadata for control validation, delta, and other core services required for ingestion into the data lake.
  • Perform semantic builds to load data into Data Marts, and build alerts with reference data to run compliance tests.
  • Migrate different applications to the EDP platform and retrofit them to use core services such as data quality, job/plan orchestration, and the auditing/logging framework.
  • Create Spark based template for business rules processing and integration with surrogate key and CDC key process for loading into the Retirement Data Mart.
  • Develop technical documents and handbooks to accurately represent application design and code.
  • Build, test, and support the DST Enterprise Data Platform (an estimated 30 core services).
  • Developing MapReduce programs to parse the raw data and store the refined data in tables.
  • Ingesting, analyzing, and processing data and storing results in HDFS and Hive/HBase using Sqoop.
  • Responsible for managing data from various sources and their metadata using Hive.
  • Working with Hive partitioning and bucketing to improve query performance on data from different kinds of data sources.
  • Worked extensively in creating MapReduce jobs to power data for search and aggregation.
  • Altered existing Scala programs to enhance performance and obtain partitioned results.
  • Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
  • Developing Spark code in Scala and Spark-SQL environment for faster testing and processing of data.
  • Developed Spark code to process data from different sources and store it in Hive/HBase (data is pre-processed and stored in HDFS using NiFi before Spark consumption).
  • Exporting analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Involved in loading data into the Cassandra NoSQL database.
  • Working with Oozie to automate job flow and coordination in the cluster.
  • Developing AWS Glue jobs for the entire ETL process and triggering them from AWS Lambda functions based on events.
  • Deployed a static website using Amazon S3 for hosting, Lambda functions as backend server processing, DynamoDB for database and API Gateway for RESTful endpoints.
  • Created API using API Gateway console and set up API Gateway method requests.
  • Connected to AWS EC2 using SSH and ran spark-submit jobs.
  • Set up an integration response for API Gateway requests, returned from the backend (DynamoDB) to the client using Lambda functions.

Environment: Cloudera, Hadoop 0.20.2, Hive, HBase, Apache Sqoop, Scala, Pig, Spark, NiFi, Oozie, Cassandra, Cloudera Manager, Git, Linux/Unix, NoSQL, Shell Scripting, AWS Glue, AWS Lambda, DynamoDB.

Hadoop Developer

Confidential

Responsibilities:

  • Built scalable distributed data solutions using the Hadoop ecosystem.
  • Responsible for installation and configuration of Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster.
  • Created Hive tables and loaded and analyzed data using Hive queries.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
  • Developed workflows by scheduling Hive processes for log file data streamed into HDFS using Flume.
  • Developed Shell, Perl, and Python scripts to automate and provide control flow to Pig scripts.
  • Developed various complex Mapper and Reducer transformations for Pentaho MapReduce jobs.
  • Performed batch processing of data sources using Apache Spark.
  • Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
  • Worked with Hadoop ecosystem components such as Hive, Impala, HBase, Pig, Sqoop, Spark, and Kafka using Pentaho.
  • Developed RESTful APIs using Python and Java and generated model scoring results in JSON.
  • Set up automated processes to archive and clean unwanted data on the cluster, in particular on HDFS and the local file system.
  • Performed major and minor upgrades in the production environment and followed standard backup policies to ensure high availability of the cluster.
  • Involved in designing and developing the Data Models in MongoDB based on the data from existing Customer Central Application.
  • Worked on Amazon Web Services (AWS) Cloud services like S3, EBS, EMR, RDS, VPC, and IAM.
  • Developed an API using AWS Lambda to manage servers and run code in AWS.
  • Set up databases in AWS using RDS and storage using S3 buckets, and configured instance backups to S3.
  • Worked with both MapReduce 1 (Job Tracker) and MapReduce 2 (YARN).
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
  • Experienced in working on DevOps/Agile operations processes and tools (code review, unit test automation, build and release automation, environment, service, incident, and change management).
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Involved in efficiently collecting and aggregating large amounts of streaming log data into the Hadoop cluster using Apache Flume.
  • Used Kafka to read log data and send it to the respective Storm bolts through their respective streams.
  • Wrote custom shell scripts to automate repetitive tasks on the cluster.
  • Worked with Git for version control and JIRA for project tracking.
  • Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of other sources.
  • Responsible for checking developed code into Harvest for release management as part of CI/CD.
  • Experienced in setting up Hortonworks clusters and installing all ecosystem components through Ambari and manually from the command line.
  • Migrated ETL processes from Oracle to Hive to test ease of data manipulation.
  • Effectively used Sqoop to transfer data between databases and HDFS.
  • Extracted the data from SQL Server into HDFS using Sqoop.
  • Understood and designed partitioning and bucketing in Hive.
  • Scheduled and managed jobs on a Hadoop cluster using Oozie workflows.
  • Analyzed the data by performing Hive SQL queries (HiveQL or HQL) and running Pig Latin scripts to study customer behavior.
  • Commissioned and decommissioned DataNodes in case of problems.
  • Developed and ran MapReduce jobs on YARN and Hadoop clusters to produce daily and monthly reports as per users' needs.
  • Studied the Functional Design Document (MD050) and created the Technical Design Document (MD070).
  • Designed and developed an error-handling package to store errors in custom tables using PL/SQL in Oracle HRMS APIs.
  • Worked on the Order Import Interface Program to move data from the legacy system to the new system.
  • Designed and documented MD070 and CV040 documents, and developed conversion routines, staging tables, and PL/SQL APIs for converting Employees, Assignments, Jobs, and Payroll.
  • Designed Training Documents for users.
  • Worked on performance tuning and testing of the Oracle PL/SQL database Packages, procedures and triggers of various applications.
  • Created S3 buckets, managed S3 bucket policies, and used S3 and Glacier for storage and backup on AWS.
  • Conducted POC to implement serverless cloud architecture using AWS Lambda, Glue and S3.

Environment: Hortonworks, Oozie, Hive, Pig, Sqoop, Perl, Spark, HBase, AWS EMR, AWS Lambda, ZooKeeper, Scala, Python, Grafana, Ganglia, NoSQL, MongoDB, Shell Scripting, Pentaho, Linux/Unix, Storm, Kafka, MySQL, Oracle, SQL Server, Git, Oracle Applications R12 (OM, INV, BOM, HR), SQL*Loader, PL/SQL, TOAD, PuTTY, WinSCP, Amazon S3.

Technical Consultant

Confidential

Responsibilities:

  • Involved in development of reports based on business requirements and registered them in Oracle applications.
  • Developed PL/SQL queries, Triggers, Cursors, and Stored Procedures as a part of Application development.
  • For inbound messages, ran the API to load data from the interface tables into the original tables once the data was inserted, and scheduled this task.
  • Created a new process flow for Outbound PO Acknowledgement as needed.
  • Handled errors and exceptions in the XML Gateway map and Oracle process.
  • Created Sales Orders and Purchase Orders from Oracle Receivables and Purchasing modules.
  • Set up the trading partner with seeded transaction details and supplier details for both inbound and outbound messages.
  • Set up the mailing and EDI details of the receiver and receiving server.
  • Customized the mapping files and DTD files and copied them to the server using FTP.
  • Customized the PO Requisition workflow as per client requirements.
  • Customized the Journal Batch workflow as per client requirements.
  • Responsible for creating a large number of custom reports for Oracle Financials in a short period of time in order to meet an aggressive deadline.
  • Defined value sets, concurrent programs, and request sets, and registered reports, PL/SQL procedures, and packages.
  • Involved in the development of interfaces/conversions to import data from the legacy system into Oracle Applications base tables.
  • Worked on UTL packages and SQL*Loader for the interfaces and conversions.
  • Worked on the RDBMS system using PL/SQL to create packages, procedures, functions, and triggers as per the business requirements.
  • Wrote complex packages, procedures, and functions, and executed concurrent programs from PL/SQL.
  • Created SRs in MetaLink as per client requirements and performed setups as advised by Oracle Support.
  • Developed interface programs to upload legacy data into Oracle Apps base tables.

Environment: Oracle 9i, Oracle Applications Release 12.1 (AP, AR, GL, OM, INV), Oracle XML Gateway, Oracle Workflows, PL/SQL, UNIX Shell scripting, UNIX, Business Objects, Windows NT

Education: Bachelor’s in Technology from Jawaharlal Nehru Technological University, India.