
Hadoop Data Engineer Resume


IL

SUMMARY

  • Senior Software Engineer with 8+ years of professional IT experience, including 5+ years of expertise in the Big Data ecosystem covering ingestion, storage, querying, processing and analysis of big data.
  • Extensive experience working in key industry verticals including Banking, Finance, Insurance, Healthcare and Retail.
  • Excellent exposure to understanding Big Data business requirements and delivering Hadoop-based solutions.
  • Strong interest in developing and implementing new concepts in distributed computing technologies, Hadoop, MapReduce and NoSQL databases.
  • Well versed in installing, configuring, supporting and managing Hadoop clusters and the underlying Big Data infrastructure.
  • Significant expertise in implementing real-time Big Data systems using Hadoop ecosystem tools such as Hadoop MapReduce, Spark, HDFS, Hive, Pig, HBase, Pentaho, ZooKeeper, YARN, Sqoop, Kafka, Scala, Oozie and Flume.
  • Exposure to different Hadoop distributions such as Cloudera, Hortonworks and Apache.
  • Experience using the Amazon Web Services (AWS) cloud; performed export and import of data into S3 and the Amazon Redshift database. Working knowledge of Azure using Data Factory, Azure Resource Manager, etc.
  • Expertise in designing scalable data stores using the Apache Cassandra NoSQL database.
  • Extensive experience in performing data processing and analysis using HiveQL, Pig Latin, custom MapReduce programs in Java, and Scala and Python scripts in Spark and Spark SQL.
  • Working knowledge of both Hadoop v1 and v2.
  • Experience in importing and exporting data from different databases such as MySQL, Oracle, Teradata and DB2 into HDFS and vice versa using Sqoop.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance (see the sketch following this list).
  • Experience in managing and reviewing Hadoop Log files.
  • Experience with Oozie Workflow Engine in running workflow jobs with actions that run Hadoop Map/Reduce and Pig jobs.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala.
  • Profound experience in creating real-time data streaming solutions using Apache Spark / Spark Streaming and Kafka.
  • Experience includes requirements gathering/analysis, design, development, versioning, integration, documentation, testing, build and deployment.
  • Knowledge of ETL methods for data extraction, transformation and loading in corporate-wide ETL Solutions and Data warehouse tools for reporting and data analysis.
  • Successfully loaded files to Hive and HDFS from HBase
  • Experience in supporting analysts by administering and configuring HIVE
  • Experience in providing support to data analyst in running Pig and Hive queries.
  • Knowledge of Elasticsearch.
  • Worked on a POC using EMR (Elastic MapReduce).
  • Knowledge of graph database management systems such as Neo4j.
  • Good knowledge of Hadoop cluster architecture and cluster monitoring.
  • In-depth understanding of Data Structures and Algorithms.
  • Excellent shell scripting skills in Unix/Linux.
  • Involved in fixing bugs and unit testing with test cases using JUnit.
  • Excellent Java development skills using J2EE, J2SE, Junit, JSP, JDBC
  • Implemented unit testing using JUnit and system testing during the projects.
  • Good experience in designing jobs and transformations and loading data sequentially and in parallel for initial and incremental loads.
  • Always worked in teams and was appreciated for collaboration and problem-solving competencies.
  • Excellent verbal and written communication skills.
  • Well versed with project lifecycle documentation with specialization in development of robust reusable designs and patterns.
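To illustrate the Hive table design noted above, the sketch below creates a partitioned, bucketed external table over existing HDFS files through the HiveServer2 JDBC driver. The database, table, column and host names are hypothetical placeholders, not taken from any specific engagement.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveTableSetup {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC endpoint; host, port, database and credentials are placeholders.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hive-host:10000/sales_db", "hive_user", "");
             Statement stmt = conn.createStatement()) {

            // External table: Hive owns only the metadata, so dropping the table
            // leaves the files under LOCATION untouched.
            stmt.execute(
                "CREATE EXTERNAL TABLE IF NOT EXISTS transactions ("
              + "  txn_id STRING, customer_id STRING, amount DOUBLE) "
              + "PARTITIONED BY (data_date STRING) "            // enables partition pruning
              + "CLUSTERED BY (customer_id) INTO 32 BUCKETS "   // bucketing for joins/sampling
              + "STORED AS PARQUET "
              + "LOCATION '/data/raw/transactions'");
        }
    }
}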

TECHNICAL SKILLS

Big Data Technologies: Hadoop, MapReduce, Spark, HDFS, HBase, Cassandra, Sqoop, Pig, Hive, Oozie, ZooKeeper, YARN, Flume, Kafka, Avro, Parquet, Cloudera

Operating Systems: Windows, Linux, Unix

Programming/Scripting Languages: C, Java, Python, Shell, R, SQL, JavaScript, Perl

Cloud: AWS EC2, S3, Redshift

Databases: AWS Redshift, MongoDB, Teradata, Oracle, DB2

Business Intelligence: Business Objects, Tableau

Tools & Utilities: Eclipse, NetBeans, Git, SVN, WinSCP, PuTTY, Autosys

Web Technologies: HTML5, CSS, XML, JavaScript

PROFESSIONAL EXPERIENCE

Confidential, IL

Hadoop Data Engineer

Responsibilities:

  • Designed the process and the required environment to load the data into the On Premise Data Lake and a business specified subset to the AWS Redshift Database
  • Worked on the Data Ingestion framework to load data from the Edge Node to the appropriate directories in HDFS
  • Built Python scripts with Spark at the core to perform end-to-end data load and transformation; key components included reconciliation for file validity, data-quality checks, and custom rules such as filtering and transformation of data per business requirements.
  • Explored Spark to improve the performance and optimize the existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and YARN.
  • Developed a Shell Script Master Wrapper to wrap the different code sets of the project, reducing multiple jobs to a single job schedule in Control M
  • Designed the shell script to be configurable to plug in and plug out reusable code components based on configuration parameters resulting in a reusable packaged framework
  • Incorporated logical exit points, error logging and restartability in the shell script for a robust support and troubleshooting framework.
  • Created Hive tables dynamically partitioned on data date to make the raw data available to business users as a structured interface, storing files as Parquet with Snappy compression (see the sketch after this list).
  • Loaded a subset of the data to AWS S3 and then to the AWS Redshift database to provide an RDBMS-like experience for business users.
  • Worked with the infrastructure team to understand AWS and get an EC2 instance running to push data from S3 to Redshift.
  • Developed a historical load strategy to load history data from the source system, gaining experience with Sqoop for loads from Teradata to HDFS and Snowball for loads to AWS Redshift.
  • Designed numerous MapReduce jobs in Java for data cleaning.
  • Worked on analyzing data and writing Hadoop MapReduce jobs using the Java API, Pig and Hive.
  • Developed Java MapReduce program to extract the required information from the semi-structured files.
  • Created internal and external tables in Hive and merged the data sets using Hive joins.
  • Responsible for data extraction and data integration from different data sources into the Hadoop data lake by creating ETL pipelines using Spark, MapReduce and Hive.
  • Implemented scripts in Spark to import and export data from Cassandra, Teradata and Hadoop.
  • Used Flume to stream the log data from servers.
  • Used Apache Kafka to gather log data and feed it into HDFS.
  • Built a POC Spark Streaming application for real-time analytics.
  • Played interchangeable roles in the spirit of Agile, with a primary role as developer and a secondary role as tester working on system testing and UAT.
  • Leveraged the company's DevOps process for CI/CD (Continuous Integration / Continuous Deployment) by storing code in the GitHub repository and using Jenkins and U-Deploy utilities to promote to the QA and Prod environments.
  • Active participant in all other housekeeping activities such as metadata registry, design documentation, test cases, user stories for the Agile story board, and all team activities/sessions.
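For illustration, the sketch below shows the core of that load-and-transform flow using Spark's Java API (the project scripts themselves were written in Python): read the landed files, apply a simple data-quality filter, and persist them as a Snappy-compressed Parquet Hive table dynamically partitioned on the data date. Paths, database, table and column names are hypothetical.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class RawToCurated {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("raw-to-curated")
                .enableHiveSupport()               // register the table in the Hive metastore
                .getOrCreate();

        // Read the files landed on HDFS by the ingestion framework (hypothetical path).
        Dataset<Row> raw = spark.read()
                .option("header", "true")
                .csv("hdfs:///data/landing/transactions/");

        // Data-quality rule: keep only records with a valid amount and data date.
        Dataset<Row> clean = raw.filter("amount IS NOT NULL AND data_date IS NOT NULL");

        // Store as Snappy-compressed Parquet, partitioned on the data date;
        // the 'curated' database is assumed to already exist in the metastore.
        clean.write()
             .mode(SaveMode.Append)
             .option("compression", "snappy")
             .format("parquet")
             .partitionBy("data_date")
             .saveAsTable("curated.transactions");

        spark.stop();
    }
}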

Environment: Hadoop, Spark, MapReduce, Java, HDFS, HBase, Hive, CDH5, Python, Sqoop, UNIX, AWS EC2, S3, Redshift, GitHub

Confidential, NC

Hadoop Data Engineer

Responsibilities:

  • In addition to the developer role, acted as a systems integrator, capturing requirements for the project and getting involved with the infrastructure setup of EC2 instances to install Mongo drivers for extracting data from MongoDB.
  • Gained a detailed understanding of NoSQL databases with Mongo and of the concept of key-value pair storage.
  • Researched different ways to parse JSON data, the output format of the MongoDB extract.
  • Developed the process to extract data from Mongo on an EC2 instance and SFTP it to the Hadoop data lake.
  • Created a Pig script to parse the JSON data and added a custom UDF to handle conversion of some hexadecimal fields to ASCII (see the sketch after this list).
  • Developed MapReduce programs to cleanse the data in HDFS, making it suitable for ingestion into the Hive schema for analysis, and to perform business-specific transformations such as conversion of data fields, validation of data and other business logic.
  • Loaded the final parsed and transformed data into Hive structures, creating Avro files partitioned on the load date timestamp.
  • Implemented Partitioning, Bucketing in Hive for better organization of the data.
  • Implemented a view based strategy to segregate sensitive data and assign different access roles to different views
  • Created a parallel branch to load the same data to Teradata using Sqoop utilities.
  • Used Flume and Kafka to load the log data from multiple sources into HDFS.
  • Created a Tableau report on the Teradata solution to provide business with their day to day audit reporting needs
  • Worked with the System testers to manually query the data and do a source to target comparison to ensure data integrity
  • Prepared test plans and wrote test cases.
  • Worked with the business team to get them access to Hive and the Tableau reports along with UAT for the project.
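As an example of the custom UDF mentioned above, here is a minimal sketch of a Java Pig EvalFunc that decodes a hex-encoded field to ASCII text; the class name and field handling are illustrative, not the production code.

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Pig UDF: converts a hex-encoded string (e.g. "48656c6c6f") to its ASCII text.
public class HexToAscii extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;                      // pass nulls through unchanged
        }
        String hex = input.get(0).toString();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i + 1 < hex.length(); i += 2) {
            out.append((char) Integer.parseInt(hex.substring(i, i + 2), 16));
        }
        return out.toString();
    }
}

In the Pig script, such a UDF would be registered with REGISTER and applied per record, e.g. GENERATE HexToAscii(raw_field).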

Environment: Hadoop, Hive, Pig, MongoDB, UNIX, MapReduce, Kafka, Flume, CDH, SQL, Teradata, Tableau, AWS EC2

Confidential, TX

Hadoop/ Java developer

Responsibilities:

  • All the fact and dimension tables were imported from SQL Server into Hadoop using Sqoop.
  • Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
  • Involved in extracting the customer's big data from various data sources into Hadoop HDFS; this included data from mainframes, databases and log data from servers.
  • Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into the Hive schema for analysis (see the sketch after this list).
  • The Hive tables created as per requirement were managed or external tables defined with appropriate static and dynamic partitions, intended for efficiency.
  • Implemented Partitioning, Bucketing in Hive for better organization of the data.
  • Actively involved in Hadoop administration tasks.
  • Developed Python UDFs in Pig and Hive.
  • Used Apache Kafka to gather log data and feed it into HDFS.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive and Sqoop, as well as system-specific jobs.
  • Installed and configured various components of Hadoop ecosystem and maintained their integrity.
  • Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
  • Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller.
  • Developed Java MapReduce programs to encapsulate transformations.
  • Developed Oozie workflows and sub-workflows to orchestrate the Sqoop scripts, Pig scripts and Hive queries; the Oozie workflows are scheduled through Autosys.
  • Worked with DevOps team in Hadoop cluster planning and installation.
  • Participated in performance tuning at the database, transformation and job levels.
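The cleansing jobs referenced above follow a map-only pattern along these lines; the sketch below is illustrative, with a hypothetical pipe-delimited source and field count. Malformed rows are counted and dropped, and valid rows are re-emitted tab-separated for the downstream Hive schema.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map-only cleansing: drop malformed rows, trim fields, re-emit as tab-separated.
public class CleanseMapper extends Mapper<LongWritable, Text, NullWritable, Text> {

    private static final int EXPECTED_FIELDS = 5;   // hypothetical schema width
    private final Text out = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\\|", -1);   // pipe-delimited source
        if (fields.length != EXPECTED_FIELDS) {
            context.getCounter("cleanse", "malformed").increment(1);
            return;                                             // skip bad records
        }
        StringBuilder row = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) row.append('\t');
            row.append(fields[i].trim());
        }
        out.set(row.toString());
        context.write(NullWritable.get(), out);
    }
}

With job.setNumReduceTasks(0) in the driver, the mapper output is written straight to HDFS, ready to be exposed through a Hive external table.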

Environment: Hadoop, HDFS, Map Reduce, Sqoop, Hive, Pig, Oozie, NDM, Cassandra, SVN, CDH4, Cloudera Manager, MySQL, Informatica, Eclipse

Confidential, OK

Java/J2EE Developer

Responsibilities:

  • Involved in various Software Development Life Cycle (SDLC) phases of the project, such as requirements gathering, development and enhancements, using Agile methodologies.
  • Developed the user interface using Spring MVC, JSP, JSTL, JavaScript, custom tags, jQuery, HTML and CSS.
  • Used Spring MVC for implementing the Web layer of the application. This includes developing Controllers, Views and Validators.
  • Developed the service and domain layer using Spring Framework modules like Core-IOC, AOP.
  • Developed the Application Framework using Java, Spring, Hibernate and Log4J.
  • Created DB tables, functions and joins and wrote prepared statements using SQL.
  • Configured the Hibernate session factory in applicationContext.xml to integrate Hibernate with Spring.
  • Configured applicationContext.xml in Spring to set up communication between operations and their corresponding handlers.
  • Developed Spring REST controllers to handle JSON data and wrote DAOs and services to process it (see the sketch after this list).
  • Created DB tables, functions and joins and wrote prepared statements using PL/SQL.
  • Consumed and created REST web services using Spring and Apache CXF.
  • Used Apache Camel as the integration framework.
  • Developed MySQL stored procedures and triggers using SQL in order to calculate and update the tables to implement business logic.
  • Used Perl scripts to automate deployments to the application server.
  • Used Maven to build the application and deployed it on JBoss Application Server.
  • Used IntelliJ for development and JBoss Application Server for deploying the web application.
  • Performed usability testing for the application using JUnit Test.
  • Monitored the error logs using log4j.
  • Developed JUnit testing framework for Unit level testing.
  • Implemented Spring JMS message listeners with JMS queues for consumption of Asynchronous requests.
  • Used AOP concepts such as aspects, join points, advice, pointcuts, target objects and AOP proxies.
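A minimal sketch of the kind of Spring REST controller described above; the Account entity, AccountService and endpoint paths are hypothetical, and JSON (de)serialization is left to Spring's HTTP message converters (Jackson on the classpath). @Controller plus @ResponseBody is used so the sketch also fits pre-Spring-4 codebases.

import java.util.List;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.ResponseBody;

// Hypothetical REST controller: returns and accepts JSON via Spring's message converters.
@Controller
@RequestMapping("/accounts")
public class AccountController {

    @Autowired
    private AccountService accountService;   // hypothetical service/DAO layer

    @RequestMapping(method = RequestMethod.GET)
    @ResponseBody
    public List<Account> listAccounts() {
        return accountService.findAll();
    }

    @RequestMapping(value = "/{id}", method = RequestMethod.GET)
    @ResponseBody
    public Account getAccount(@PathVariable("id") long id) {
        return accountService.findById(id);
    }

    @RequestMapping(method = RequestMethod.POST)
    @ResponseBody
    public Account createAccount(@RequestBody Account account) {
        return accountService.save(account);
    }
}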

Environment: JDK 1.6, HTML, JSP, Spring, JBoss, Log4j, Perl, TortoiseSVN, Hibernate, SOAP web services, Maven, SOAP UI, Eclipse Kepler, JavaScript, XML, MySQL v5
