Senior Hadoop Developer Resume

Overland Park, KS

SUMMARY:

  • Senior Hadoop Developer with 7+ years of programming and software development experience, with skills in data analysis, design, development, testing, and deployment of software systems from development to production in Big Data and Java technologies.
  • Experience in Big Data and Hadoop ecosystem tools including Pig, Hive, Sqoop, Oozie, ZooKeeper, PySpark, and Flume.
  • Well versed in installation, configuration, supporting and managing of Big Data and underlying infrastructure of Hadoop Cluster.
  • Expert in creating Pig Latin scripts and UDFs in Java for efficient data analysis.
  • Expert in creating Hive queries and UDFs in Java for efficient data analysis.
  • Knowledge of Hadoop Gen2 Federation, High Availability, and YARN architecture.
  • Expert in using Sqoop to fetch data from different systems into HDFS for analysis and to export it back to the source systems for further processing.
  • Used HBase alongside Pig/Hive when required for real-time, low-latency queries.
  • Hands-on experience with operating system internals in multithreaded environments using Inter-Process Communication, deep knowledge of UNIX operating system internals, and working knowledge of Linux distributions and High Performance Computing (HPC).
  • Experience with Oozie Workflow Engine in running workflow jobs with actions that run Hadoop Map/Reduce and Pig jobs.
  • Excellent knowledge of Hadoop ecosystem architecture and components such as the Hadoop Distributed File System (HDFS), MRv1, MRv2, JobTracker, TaskTracker, NameNode, DataNode, ResourceManager, NodeManager, and MapReduce programming.
  • Good experience in optimizing MapReduce algorithms using Mappers, Reducers, Combiners, and Partitioners to deliver the best results for large datasets.
  • Experience in designing, developing, publishing, and scheduling reports using Microsoft Power BI.
  • Used Flume to ingest real-time streaming data.
  • Good understanding of NoSQL Databases.
  • Worked on Windows and UNIX/Linux platforms with technologies such as SQL, PL/SQL, XML, HTML, CSS, JavaScript, and Core Java.
  • Experience in Hadoop administration activities such as installation and configuration of clusters using Apache, Cloudera and AWS.
  • Experience in using IDEs like Eclipse and NetBeans.
  • Extensive programming experience in developing web-based applications using Core Java, J2EE, JSP, and JDBC.
  • Experience in deploying applications in Web/Application Servers like Tomcat, WebLogic and Oracle Application Servers.
  • Strong knowledge of Software Development Life Cycle and expertise in detailed design documentation.
  • Extensive experience with Waterfall and Agile Scrum Methodologies.
  • Experience in developing logging standards and mechanisms based on Log4j.
  • Developed UML Diagrams for Object Oriented Design: Use Cases, Sequence Diagrams and Class Diagrams using Visual.
  • Working knowledge of databases such as Oracle 8i/9i/10g and Microsoft SQL Server.
  • Experienced in creating product documentation and presentations.
  • Strong expertise in the MapReduce programming model with XML, JSON, and CSV file formats.
  • Extensive experience with CVS, ClearCase, Git, and SVN for source control.
  • Ability to perform at a high level, meet deadlines, and adapt to ever-changing priorities.
  • Understanding of Data warehouse and ETL tools.
  • Highly proficient in Object Oriented Programming concepts.
  • Ability to blend technical expertise with strong conceptual, business, and analytical skills to provide quality solutions, results-oriented problem solving, and leadership.

TECHNICAL SKILLS:

Hadoop/Big Data Technologies: Hadoop 2.x, HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, Cloudera, Hortonworks, MapR, Oozie, Avro, YARN, Storm and Zookeeper.

Programming Languages: Java, C, C++, SQL, PL/SQL

Scripting/Web Technologies: JavaScript, HTML, XML, Shell Scripting, CSS, JSON.

Databases: Oracle 9i/10g/11g, MySQL, SQL Server and NoSQL

Operating Systems: Linux, UNIX and Windows.

Java IDE: Eclipse and NetBeans.

Visualization Tools: Crystal Reports, Tableau, Microsoft Power BI

PROFESSIONAL EXPERIENCE:

Confidential, Overland Park, KS

Senior Hadoop Developer

Responsibilities:

  • Worked on setting up a daily ingestion pipeline for Sqooping the records from SQL Server to Hadoop.
  • Automated the creation of Hive table schemas for several SQL Server tables in Hadoop.
  • Automated the process to perform Sqoop only when delta records are found in the daily ingestion process.
  • Improved the time taken by the daily ingestion process by validating and Sqooping only the table data with delta records.
  • Worked on tuning the performance of Hive views using map-side joins where a big table can be joined with a small table.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Created big flat tables by developing views that fetch a single record with all the necessary columns for the business requirement; also handled the daily delta records, updates, and deletes on each record.
  • Replicated the views in the SQL Server to Hadoop and built the process to run on daily scheduled basis using Spark SQL to load the data in dynamically partitioned external tables.
  • Improved the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, and Spark on YARN.
  • Used the big flat tables as a data source for visualizations and for generating reports using Power BI.
  • Published the business reports to the Power BI Report Server and scheduled them to refresh the data automatically on a daily basis.
  • Developed data validation reports that visualize the time taken and validate the ETL run.
  • Analyze large and critical datasets using Hortonworks, HDFS, HBase, MapReduce, Hive, Hive UDF, Pig, Sqoop, Spark.
  • Created the stored procedures in SQL Server for automatically updating the process flags corresponding to the primary keys after the Sqoop is completed.
  • Used Cisco Tidal for scheduling the shell scripts, calling the procedures, and sending email alerts.
  • Creating Hive tables, dynamic partitions, buckets for sampling, and working on them using HiveQL.
  • Stored the data in tabular formats using Hive tables and Hive SerDes.
  • Created Oozie workflows to run multiple Hive and Pig jobs.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Responsible for running a process that reads data from scanned documents and performs data cleansing by scrubbing PHI and sensitive data fields and encrypting them per business requirements.
  • Prepared large data sets for running R models.
  • Used Python to develop Spark code that runs the Hive views and loads the data warehouse tables on a daily basis.
  • Also used Python for several file operations, searching keywords to find the most-used prescriptions and generating calculated fields.
  • Worked on data profiling across 11 different databases and gathered useful columns for the business corresponding to open claims.
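The delta-driven Sqoop ingestion described above can be sketched as follows. This is a minimal illustration, not the production pipeline: the connection string, table name, and audit column (`updated_at`) are hypothetical assumptions.

```python
# Sketch of a delta-aware Sqoop ingestion step. All names (host, database,
# columns) are illustrative assumptions, not the actual project configuration.

def has_delta(previous_count: int, current_count: int) -> bool:
    """Trigger Sqoop only when the source row count changed since the last run."""
    return current_count != previous_count

def build_sqoop_import(table: str, last_value: str) -> list:
    """Build an incremental Sqoop import command for one SQL Server table."""
    return [
        "sqoop", "import",
        "--connect", "jdbc:sqlserver://dbhost:1433;database=claims",  # hypothetical
        "--table", table,
        "--incremental", "lastmodified",   # pull only delta records
        "--check-column", "updated_at",    # assumed audit/timestamp column
        "--last-value", last_value,
        "--target-dir", "/data/raw/" + table,
        "-m", "4",                          # 4 parallel mappers
    ]

cmd = build_sqoop_import("open_claims", "2018-01-01 00:00:00")
```

A scheduler (such as Tidal, as above) would run the row-count check first and invoke the generated command only when `has_delta` returns true.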

Environment: Hadoop, HDFS, Pig, Sqoop, Oozie, MapReduce, Hortonworks, Power BI, Cisco Tidal, NoSQL, HBase, Shell Scripting, Ubuntu, Linux Red Hat, SQL Server, GIT, Spark, Zeppelin, Jupyter Notebooks, Teradata SQL

Confidential, Chicago, IL

Senior Hadoop Developer

Responsibilities:

  • Developed and maintained the ingestion process to ingest data from SQL Server to Hadoop using shell scripts.
  • Involved in copying data from old cluster to new development cluster.
  • Automated the Hadoop ETL process using Oozie workflows.
  • Responsible for Setup of Oozie workflow in the new development cluster.
  • Designed and implemented an efficient way of running concurrent/parallel workflows using Oozie to reduce ETL processing time, which helped deliver the project on time.
  • Re-designed the code to run in Map-Reduce using Hadoop streaming with Python.
  • Exported Hadoop ETL data to the relational databases using Sqoop for visualization as input data to the Tableau dashboards.
  • Created Hive partitioned external tables for easy access from Tableau using the partition columns.
  • Worked with different kinds of compression techniques, such as zip and tar, to save storage and optimize data transfer over the network.
  • Developed custom aggregate functions and business-required column values using Python.
  • Used Pig on larger data sets in the deduping process and stored the data into HDFS.
  • Creating Hive tables, dynamic partitions, buckets for sampling, and working on them using HiveQL.
  • Used Pig for parsing the Notes data, stored it in HDFS, and then exported it to an Oracle database using Sqoop.
  • Stored the data in tabular formats using Hive tables and Hive SerDes.
  • Implemented a script to transmit information from Oracle to HDFS using Sqoop.
  • Worked on tuning the performance of Hive and Pig queries.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Installed the Oozie workflow engine to run multiple Hive jobs, Hadoop MapReduce jobs with Python streaming, and Pig jobs.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Assisted the QA team in writing Hive queries to test data accuracy and in designing test cases.
  • Involved in the debugging of major defects for quicker resolutions.
  • Automated the onboarding of new hospitals data in Hadoop using Oozie workflows.
  • Responsible for delivering the weekly and monthly processed data for QA Team.
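The Map-Reduce re-design with Hadoop streaming and Python mentioned above generally follows the mapper/reducer shape below. The count-per-key aggregation is a generic sketch under assumed CSV input, not the project's actual business logic.

```python
from itertools import groupby

def mapper(lines):
    """Map phase: emit (key, 1) per record. Under Hadoop streaming this
    would read stdin; here it takes an iterable for local testing."""
    for line in lines:
        key = line.strip().split(",")[0]  # assumes the first CSV field is the key
        yield key, 1

def reducer(pairs):
    """Reduce phase: sum counts per key. Hadoop streaming delivers the
    mapper output sorted by key, which groupby relies on."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)

# Locally the phases chain with a sort in between, mimicking the shuffle:
records = ["a,1", "b,2", "a,3"]
result = dict(reducer(sorted(mapper(records))))
```

On a cluster, the same two functions would live in separate scripts passed to `hadoop jar hadoop-streaming.jar -mapper ... -reducer ...`, with the framework supplying the sort/shuffle step.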

Environment: Hadoop, HDFS, Pig, Sqoop, Spark, MapReduce, MapR, Snappy, Zookeeper, NoSQL, HBase, Shell Scripting, Ubuntu, Linux.

Confidential, Chicago, IL

Hadoop Developer

Responsibilities:

  • Worked on analyzing and writing Hadoop MapReduce jobs using the Java API, Pig, and Hive.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in loading data from edge node to HDFS using shell scripting.
  • Worked on installing cluster, commissioning & decommissioning of DataNode, NameNode high availability, capacity planning, and slots configuration.
  • Created HBase tables to store variable data formats of PII data coming from different portfolios.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Worked with different kinds of compression techniques, such as LZO and Snappy, to save storage and optimize data transfer over the network.
  • Analyze large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDF, Pig, Sqoop, Zookeeper, & Spark.
  • Developed custom aggregate functions using Spark SQL and performed interactive querying.
  • Used Pig to store the data into HBase.
  • Creating Hive tables, dynamic partitions, buckets for sampling, and working on them using HiveQL.
  • Used Pig to parse the data and store it in Avro format.
  • Stored the data in tabular formats using Hive tables and Hive SerDes.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Worked with NoSQL databases like HBase in creating HBase tables to load large sets of semi structured data coming from various sources.
  • Implemented a script to transmit information from Oracle to HBase using Sqoop.
  • Worked on tuning the performance of Pig queries.
  • Involved in writing the shell scripts for exporting log files to Hadoop cluster through automated process.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Implemented MapReduce programs to handle semi-structured and unstructured data such as XML, JSON, and sequence files for log files.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
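Handling semi-structured records such as the JSON log files above typically starts with flattening each JSON line into the fixed columns a Hive table expects. The field names (`ts`, `level`, `msg`) below are hypothetical, chosen only to illustrate the pattern.

```python
import json

def flatten_log(line):
    """Flatten one JSON log line into a (timestamp, level, message) row.
    Field names are assumptions for illustration; real log schemas differ."""
    rec = json.loads(line)
    # Missing fields fall back to defaults so one malformed record
    # does not break the whole mapper.
    return (rec.get("ts", ""), rec.get("level", "INFO"), rec.get("msg", ""))

rows = [flatten_log(l) for l in [
    '{"ts": "2015-06-01T10:00:00", "level": "ERROR", "msg": "timeout"}',
    '{"ts": "2015-06-01T10:00:05", "msg": "retry ok"}',
]]
```

Inside a MapReduce job, the same function would run per input line in the mapper, emitting tab-separated rows for downstream Hive/Pig consumption.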

Environment: Hadoop, HDFS, Pig, Sqoop, Oozie, MapReduce, Cloudera, Snappy, Zookeeper, NoSQL, HBase, Shell Scripting, Ubuntu, Linux Red Hat.

Confidential, Rochester, MN

Hadoop Developer

Responsibilities:

  • Worked on writing transformer/mapping Map-Reduce pipelines using Java.
  • Handled structured and unstructured data and applied ETL processes.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke MapReduce jobs in the backend.
  • Involved in loading data into HBase using HBase Shell, HBase Client API, Pig and Sqoop.
  • Designed and implemented Incremental Imports into Hive tables.
  • Worked in Loading and transforming large sets of structured, semi structured and unstructured data.
  • Extensively used Pig for data cleansing.
  • Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
  • Wrote Hive jobs to parse the logs and structure them in a tabular format to facilitate effective querying of the log data.
  • Worked extensively with Sqoop for importing and exporting data between HDFS and relational database systems, and for loading data into HDFS.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Experienced in managing and reviewing the Hadoop log files.
  • Migrated ETL jobs to Pig scripts to perform transformations, joins, and some pre-aggregations before storing the data in HDFS.
  • Implemented the workflows using Apache Oozie framework to automate tasks.
  • Worked with Avro Data Serialization system to work with JSON data formats.
  • Worked on different file formats like Sequence files, XML files and Map files using Map Reduce Programs.
  • Developed scripts and automated data management from end to end and sync up between all the clusters.
  • Involved in Setup and benchmark of Hadoop /HBase clusters for internal use.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Pig scripts.
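Creating Hive tables for log data, as described above, usually means emitting a partitioned external-table DDL. The helper below generates such a statement; the table, columns, and HDFS location are illustrative assumptions, not the actual schemas.

```python
def hive_external_table_ddl(table, columns, partition_col, location):
    """Generate HiveQL for a partitioned external table over HDFS data.
    All names passed in are hypothetical examples, not real schemas."""
    cols = ",\n  ".join("{} {}".format(name, ctype)
                        for name, ctype in columns.items())
    return (
        "CREATE EXTERNAL TABLE IF NOT EXISTS {} (\n  {}\n)\n".format(table, cols)
        + "PARTITIONED BY ({} STRING)\n".format(partition_col)
        + "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'\n"
        + "LOCATION '{}';".format(location)
    )

ddl = hive_external_table_ddl(
    "web_logs",
    {"ts": "STRING", "url": "STRING", "status": "INT"},
    "load_date",
    "/data/logs/web_logs")
```

Incremental imports then only need an `ALTER TABLE ... ADD PARTITION` per load date, which keeps daily loads cheap and lets queries prune by partition column.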

Environment: Hadoop, Big Data, HDFS, MapReduce, Sqoop, Oozie, Pig, Hive, HBase, Flume, Linux, Java, Eclipse, Cassandra, Cloudera Hadoop Distribution, PL/SQL, Windows, UNIX Shell Scripting

Confidential, St. Louis, MO

Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Written multiple Map Reduce programs in Java for Data Analysis.
  • Wrote MapReduce jobs using Pig Latin and the Java API.
  • Performed performance tuning and troubleshooting of Map Reduce jobs by analyzing and reviewing Hadoop log files.
  • Developed pig scripts for analyzing large data sets in the HDFS.
  • Collected the logs from the physical machines and the OpenStack controller and integrated into HDFS using Flume.
  • Designed and presented a plan for a POC on Impala.
  • Experienced in migrating HiveQL to Impala to minimize query response time.
  • Knowledge of handling Hive queries using Spark SQL integrated with the Spark environment.
  • Implemented Avro and Parquet data formats for Apache Hive computations to handle custom business requirements.
  • Responsible for creating Hive tables, loading the structured data resulted from Map Reduce jobs into the tables and writing hive queries to further analyze the logs to identify issues and behavioral patterns.
  • Worked on Sequence files, RC files, Map side joins, bucketing, partitioning for Hive performance enhancement and storage improvement.
  • Implemented Daily jobs that automate parallel tasks of loading the data into HDFS using Autosys and Oozie coordinator jobs.
  • Performed streaming of data into Apache Ignite by setting up a cache for efficient data analysis.
  • Responsible for performing extensive data validation using Hive.
  • Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases to compare with historical data.
  • Used Kafka to load data into HDFS and move data into NoSQL databases (Cassandra).
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Involved in submitting and tracking Map Reduce jobs using Job Tracker.
  • Involved in creating Oozie workflow and Coordinator jobs to kick off the jobs on time for data availability.
  • Used Pig as ETL tool to do transformations, event joins, filter and some pre-aggregations.
  • Responsible for cleansing the data from source systems using Ab Initio components such as Join, Dedup Sorted, Denormalize, Normalize, Reformat, Filter by Expression, and Rollup.
  • Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
  • Implemented Hive Generic UDF's to implement business logic.
  • Implemented test scripts to support test driven development and continuous integration.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
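Business logic like the Hive UDFs above can also be exercised through Hive's TRANSFORM streaming interface with a Python script. The claim-flagging rule below is a generic, hypothetical sketch of that pattern, not the project's actual UDF logic.

```python
def classify_claim(amount, status):
    """Hypothetical business rule: flag large open claims for manual review."""
    if status == "OPEN" and amount > 10000:
        return "REVIEW"
    return "AUTO"

def transform(lines):
    """Hive TRANSFORM protocol: rows arrive and leave as tab-separated text."""
    for line in lines:
        claim_id, amount, status = line.rstrip("\n").split("\t")
        yield "{}\t{}".format(claim_id, classify_claim(float(amount), status))

# In Hive this script would run via:
#   ADD FILE classify.py;
#   SELECT TRANSFORM (claim_id, amount, status) USING 'python classify.py' ...
sample = list(transform(["c1\t20000\tOPEN", "c2\t500\tOPEN"]))
```

A native Java `GenericUDF` would be faster (no per-row process I/O), but the streaming form keeps the rule testable outside the cluster.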

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Flume, Oozie, Java, Linux, Teradata, Zookeeper, Autosys, HBase, Cassandra, Apache Ignite

Confidential

Implementation Engineer

Responsibilities:

  • Responsible for gathering the requirements of customizations requested by the clients.
  • Developed PL/SQL stored procedures, Functions, database triggers and created packages to access the database from front end screens.
  • Developed client-specific customization reports using Oracle SQL/PL/SQL code for Crystal Reports.
  • Design and Development of business required reports using the Crystal Reports tool.
  • Automation of the product reports for the specified authorized users.
  • Documentation of reports and the users with the report parameters and inputs while scheduling the reports.
  • Worked on Performance Tuning for fine tuning the reports and work flow procedures.
  • Created indexes on large data tables to improve performance when using the data for report generation or when moving the data to a different database.
  • Creation of Oracle External Tables, Views, Triggers, Procedures and Directories as per the requirement.
  • Testing and Integrating the Provisioning adapters with the Clients applications.
  • Scheduling the monthly, weekly, daily, hourly reports after the month end bill generation.
  • Scheduled business process actions such as disconnections, reconnections, and renewals as per the client's wishes, based on the customer's status.
  • Knowledge of designing and scheduling Broadcaster reports for audit purposes.
  • Configured the monthly Statement of Account reports to be sent to the customer's personal email through the application.
  • Loaded data into the relocation database using the Toad import utility and transferred data using the Export/Import utility.
  • Helped in transferring the data during migration of the application to the latest version.
  • Helped users correct data, tracking each issue in a ticketing system and documenting the reason for the error and the correction process.
  • Responsible for sending the periodic reports to the clients on the status of the issues reported.
  • Responsible in assisting various clients to install the patch releases and the product upgrades.
  • Identifying the alert mechanism in case of any service failures.
  • Helped the team by coordinating with different teams during the bill generations which usually happens during every month end.
  • Responsible for coordinating with various clients in analyzing and addressing the problems reported by them as per service level agreement.
  • Responsible for providing assistance in configuring complex business rules.

Environment: Oracle 10g, Crystal Reports, Windows Server 2006, Toad, SQL Developer, SQL*Loader, .NET application monitoring, UNIX/Linux.
