
Hadoop Data Engineer Resume


IL

SUMMARY

  • Senior Software Engineer with 8+ years of professional IT experience, including 5+ years of expertise in the Big Data ecosystem covering ingestion, storage, querying, processing, and analysis of big data.
  • Extensive experience working in various key industry verticals including Banking, Finance, Insurance, Healthcare, and Retail.
  • Excellent exposure to understanding Big Data business requirements and providing Hadoop-based solutions for them.
  • Strong interest in developing and implementing new concepts in distributed computing technologies, Hadoop, MapReduce, and NoSQL databases.
  • Well versed in installation, configuration, support, and management of Big Data and the underlying infrastructure of a Hadoop cluster.
  • Significant expertise in implementing real-time Big Data systems using Hadoop ecosystem tools such as Hadoop MapReduce, Spark, HDFS, Hive, Pig, HBase, Pentaho, ZooKeeper, YARN, Sqoop, Kafka, Scala, Oozie, and Flume.
  • Exposure to working on different Big Data distributions such as Cloudera, Hortonworks, and Apache.
  • Experience in using the Amazon Web Services (AWS) cloud; performed export and import of data into S3 and the Amazon Redshift database. Working knowledge of Azure using Data Factory, Azure Resource Manager, etc.
  • Expertise in designing scalable data stores using the Apache Cassandra NoSQL database.
  • Extensive experience in performing data processing and analysis using HiveQL, Pig Latin, custom MapReduce programs in Java and Scala, and Python scripts in Spark and Spark SQL.
  • Working knowledge of both Hadoop v1 and v2.
  • Experience in importing and exporting data between HDFS and different databases such as MySQL, Oracle, Teradata, and DB2 using Sqoop.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
  • Experience in managing and reviewing Hadoop Log files.
  • Experience with the Oozie workflow engine in running workflow jobs with actions that run Hadoop MapReduce and Pig jobs.
  • Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
  • Profound experience in creating real-time data streaming solutions using Apache Spark/Spark Streaming and Kafka.
  • Experience includes requirements gathering/analysis, design, development, versioning, integration, documentation, testing, build, and deployment.
  • Knowledge of ETL methods for data extraction, transformation and loading in corporate-wide ETL Solutions and Data warehouse tools for reporting and data analysis.
  • Successfully loaded files to Hive and HDFS from HBase.
  • Experience in supporting analysts by administering and configuring Hive.
  • Experience in providing support to data analyst in running Pig and Hive queries.
  • Knowledge of Elasticsearch.
  • Worked on a POC using EMR (Elastic MapReduce).
  • Knowledge of graph database management systems such as Neo4j.
  • Good knowledge of Hadoop cluster architecture and monitoring of the cluster.
  • In-depth understanding of Data Structures and Algorithms.
  • Excellent shell scripting skills in Unix/Linux.
  • Involved in fixing bugs and unit testing with test cases using JUnit (an illustrative sketch follows this summary).
  • Excellent Java development skills using J2EE, J2SE, JUnit, JSP, and JDBC.
  • Implemented unit testing using JUnit and system testing during the projects.
  • Good experience in designing jobs and transformations and loading data sequentially and in parallel for initial and incremental loads.
  • Always worked in teams and was appreciated for collaboration and problem-solving competencies.
  • Excellent verbal and written communication skills.
  • Well versed with project lifecycle documentation, with specialization in the development of robust, reusable designs and patterns.
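
As a small illustration of the JUnit-based unit testing mentioned above, the following is a minimal sketch of a JUnit 4 test case. The DateKeyUtilTest class and its toPartitionKey helper are hypothetical names invented for this example, not details taken from the resume.

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class DateKeyUtilTest {

    // Hypothetical helper of the kind unit-tested in these projects:
    // converts a "yyyy-MM-dd" data date into the integer partition key yyyyMMdd.
    static int toPartitionKey(String dataDate) {
        return Integer.parseInt(dataDate.replace("-", ""));
    }

    @Test
    public void convertsDataDateToPartitionKey() {
        assertEquals(20160131, toPartitionKey("2016-01-31"));
    }

    @Test(expected = NumberFormatException.class)
    public void rejectsNonNumericInput() {
        toPartitionKey("not-a-date");
    }
}
```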

TECHNICAL SKILLS

Big Data Technologies: Hadoop, MapReduce, Spark, HDFS, HBase, Cassandra, Sqoop, Pig, Hive, Oozie, ZooKeeper, YARN, Flume, Kafka, Avro, Parquet, Cloudera

Operating Systems: Windows, Linux, Unix

Programming/Scripting Languages: C, Java, Python, Shell, R, SQL, JavaScript, Perl

Cloud: AWS EC2, S3, Redshift

Database: AWS Redshift, MongoDB, Teradata, Oracle, DB2

Business Intelligence: Business Objects, Tableau

Tools & Utilities: Eclipse, NetBeans, Git, SVN, WinSCP, PuTTY, Autosys

Web Technologies: HTML5, CSS, XML, JavaScript

PROFESSIONAL EXPERIENCE

Confidential, IL

Hadoop Data Engineer

Responsibilities:

  • Designed the process and the required environment to load the data into the on-premise data lake and a business-specified subset into the AWS Redshift database.
  • Worked on the data ingestion framework to load data from the edge node to the appropriate directories in HDFS.
  • Built scripts in Python running Spark at the core to perform end-to-end data load and transformation; key components included reconciliation for file validity, data quality checks, and implementation of custom rules such as filtering and transformation of data per business requirements.
  • Explored Spark for improving the performance and optimization of the existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and YARN.
  • Developed a master wrapper shell script to wrap the different code sets of the project, reducing multiple jobs to a single job schedule in Control-M.
  • Designed the shell script to be configurable, plugging reusable code components in and out based on configuration parameters, resulting in a reusable packaged framework.
  • Incorporated logical exit points, error logging, and restartability in the shell script for a robust support and troubleshooting framework.
  • Created Hive tables dynamically partitioned on data date to make the raw data available to business users as a structured interface, storing files as Parquet with Snappy compression.
  • Loaded a subset of the data to AWS S3 and then to the AWS Redshift database to provide an RDBMS-like business user experience.
  • Worked with the infrastructure team to understand AWS and get an EC2 instance running to push data from S3 to Redshift.
  • Developed a historical load strategy to load history data from the source system, gaining experience with Sqoop for loads from Teradata to HDFS and Snowball for loads to AWS Redshift.
  • Designed numerous MapReduce jobs in Java for data cleaning.
  • Worked on analyzing and writing Hadoop MapReduce jobs using the Java API, Pig, and Hive.
  • Developed a Java MapReduce program to extract the required information from semi-structured files (an illustrative sketch follows this list).
  • Created internal and external tables in Hive and merged the data sets using Hive joins.
  • Responsible for data extraction and data integration from different data sources into the Hadoop data lake by creating ETL pipelines using Spark, MapReduce, and Hive.
  • Implemented scripts in Spark to import and export data between Cassandra, Teradata, and Hadoop.
  • Used Flume to stream the log data from servers.
  • Used Apache Kafka to gather log data and feed it into HDFS.
  • Built a POC Spark Streaming application for real-time analytics.
  • Played fungible roles in the spirit of Agile, with a primary role as developer and a secondary role as tester working on system testing and UAT.
  • Leveraged the company's DevOps process for CI/CD (Continuous Integration/Continuous Deployment) by storing code in the GitHub code repository and using Jenkins and uDeploy utilities to promote to QA and Prod environments.
  • Active participant in all other housekeeping activities such as metadata registry, design documentation, test cases, user stories for the Agile story board, and all team activities/sessions.
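
As a small illustration of the Java MapReduce extraction work described above, the following is a minimal, map-only sketch that keeps well-formed records and emits only the fields needed downstream. The pipe-delimited layout, the field positions, and the class names are hypothetical assumptions for this example, not details taken from the project.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ExtractFieldsJob {

    // Map-only job: drop malformed lines and emit a few selected columns
    // (hypothetical positions 0, 2, and 5 of a pipe-delimited record).
    public static class ExtractMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {

        private final Text out = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] cols = value.toString().split("\\|", -1);
            if (cols.length < 6) {
                return;                        // skip records missing expected columns
            }
            out.set(cols[0] + "\t" + cols[2] + "\t" + cols[5]);
            context.write(out, NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "extract-fields");
        job.setJarByClass(ExtractFieldsJob.class);
        job.setMapperClass(ExtractMapper.class);
        job.setNumReduceTasks(0);              // map-only: no shuffle or reduce phase
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```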

Environment: Hadoop, Spark, MapReduce, Java, HDFS, HBase, Hive, CDH5, Python, Sqoop, UNIX, AWS EC2, S3, Redshift, GitHub

Confidential, NC

Hadoop Data Engineer

Responsibilities:

  • In addition to the developer role, played the role of a systems integrator, capturing requirements for the project and getting involved with the infrastructure setup of EC2 instances to install Mongo drivers to extract data from MongoDB.
  • Gained a detailed understanding of NoSQL databases with Mongo and understood the concept of key-value pair storage.
  • Researched different ways to parse JSON data, the MongoDB extract output.
  • Developed the process to extract data from Mongo on an EC2 instance and SFTP it to the Hadoop data lake.
  • Created a Pig script to parse the JSON data and added a custom UDF to handle conversion of some hexadecimal fields to ASCII (an illustrative sketch follows this list).
  • Developed MapReduce programs to cleanse the data in HDFS to make it suitable for ingestion into the Hive schema for analysis and to perform business-specific transformations such as conversion of data fields, validation of data, and other business logic.
  • Loaded the final parsed and transformed data into Hive structures, creating Avro files partitioned on the load date timestamp.
  • Implemented partitioning and bucketing in Hive for better organization of the data.
  • Implemented a view-based strategy to segregate sensitive data and assign different access roles to different views.
  • Created a parallel branch to load the same data into Teradata using Sqoop utilities.
  • Experience in using Flume and Kafka to load the log data from multiple sources into HDFS.
  • Created a Tableau report on the Teradata solution to provide the business with their day-to-day audit reporting needs.
  • Worked with the system testers to manually query the data and do a source-to-target comparison to ensure data integrity.
  • Prepared test plans and wrote test cases.
  • Worked with the business team to give them access to Hive and the Tableau reports, along with UAT for the project.
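
As a small illustration of the custom Pig UDF mentioned above, the following is a minimal sketch of a Pig EvalFunc that converts a hex-encoded string field to ASCII. The class name and the assumption that the field arrives as a plain hex string (two hex digits per character) are illustrative, not taken from the project.

```java
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Pig UDF: converts a hex-encoded string (e.g. "48656c6c6f") to its ASCII text ("Hello").
public class HexToAscii extends EvalFunc<String> {

    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;                         // pass nulls through unchanged
        }
        String hex = input.get(0).toString().trim();
        StringBuilder ascii = new StringBuilder();
        for (int i = 0; i + 1 < hex.length(); i += 2) {
            ascii.append((char) Integer.parseInt(hex.substring(i, i + 2), 16));
        }
        return ascii.toString();
    }
}
```

In Pig Latin, such a UDF would be packaged into a jar, registered with REGISTER, and applied inside a FOREACH ... GENERATE, in the usual way for EvalFunc UDFs.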

Environment: Hadoop, Hive, Pig, MongoDB, UNIX, MapReduce, Kafka, Flume, CDH, SQL, Teradata, Tableau, AWS EC2

Confidential, TX

Hadoop/ Java developer

Responsibilities:

  • All the fact and dimension tables were imported from SQL Server into Hadoop using Sqoop.
  • Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
  • Involved in extracting customers' big data from various data sources into Hadoop HDFS, including data from mainframes, databases, and log data from servers.
  • Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into the Hive schema for analysis.
  • The Hive tables created per requirements were managed or external tables defined with appropriate static and dynamic partitions, intended for efficiency.
  • Implemented partitioning and bucketing in Hive for better organization of the data.
  • Actively involved in Hadoop administration tasks.
  • Developed Python UDFs in Pig and Hive.
  • Used Apache Kafka to gather log data and feed it into HDFS (an illustrative sketch follows this list).
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
  • Installed and configured various components of the Hadoop ecosystem and maintained their integrity.
  • Implemented the Fair Scheduler on the JobTracker to allocate a fair amount of resources to small jobs.
  • Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller.
  • Developed Java MapReduce programs to encapsulate transformations.
  • Developed Oozie workflows and sub-workflows to orchestrate the Sqoop scripts, Pig scripts, and Hive queries; the Oozie workflows were scheduled through Autosys.
  • Worked with the DevOps team in Hadoop cluster planning and installation.
  • Participated in performance tuning on the database side, in transformations, and at the job level.
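
As a small illustration of the Kafka-based log collection mentioned above, the following is a minimal sketch of a Java producer that publishes log lines to a topic, from which a downstream consumer or connector would land them in HDFS. The broker address and topic name are hypothetical placeholders, not values from the project.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class LogProducer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // hypothetical broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        // try-with-resources closes the producer and flushes pending records
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String logLine = "2016-01-01T00:00:00 INFO sample log line";
            // "server-logs" is a hypothetical topic name
            producer.send(new ProducerRecord<>("server-logs", logLine));
        }
    }
}
```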

Environment: Hadoop, HDFS, Map Reduce, Sqoop, Hive, Pig, Oozie, NDM, Cassandra, SVN, CDH4, Cloudera Manager, MySQL, Informatica, Eclipse

Confidential, OK

Java/J2EE Developer

Responsibilities:

  • Involved in various Software Development Life Cycle (SDLC) phases of the project, such as requirements gathering, development, and enhancements, using Agile methodologies.
  • Developed the user interface using Spring MVC, JSP, JSTL, JavaScript, custom tags, jQuery, HTML, and CSS.
  • Used Spring MVC for implementing the web layer of the application, including developing controllers, views, and validators.
  • Developed the service and domain layers using Spring Framework modules such as Core (IoC) and AOP.
  • Developed the application framework using Java, Spring, Hibernate, and Log4j.
  • Created DB tables, functions, and joins, and wrote prepared statements using SQL.
  • Configured the Hibernate session factory in applicationContext.xml to integrate Hibernate with Spring.
  • Configured applicationContext.xml in Spring to set up communication between operations and their corresponding handlers.
  • Developed Spring REST controllers to handle JSON data and wrote DAOs and services to handle the data (an illustrative sketch follows this list).
  • Created DB tables, functions, and joins, and wrote prepared statements using PL/SQL.
  • Consumed and created REST web services using Spring and Apache CXF.
  • Used Apache Camel as the integration framework.
  • Developed MySQL stored procedures and triggers using SQL to calculate and update the tables and implement business logic.
  • Used Perl scripts to automate deployments to the application server.
  • Used Maven to build the application and deployed it on the JBoss Application Server.
  • Used IntelliJ for development and the JBoss Application Server for deploying the web application.
  • Performed testing for the application using JUnit tests.
  • Monitored the error logs using Log4j.
  • Developed a JUnit testing framework for unit-level testing.
  • Implemented Spring JMS message listeners with JMS queues for consumption of asynchronous requests.
  • Used AOP concepts such as aspect, join point, advice, pointcut, and target object, as well as AOP proxies.
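
As a small illustration of the Spring MVC REST controllers mentioned above, the following is a minimal sketch in the Spring 3 style of that era (@Controller plus @ResponseBody, with a Jackson message converter on the classpath for JSON). The /accounts resource, the Account DTO, and the placeholder data are hypothetical, not taken from the project.

```java
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.ResponseBody;

@Controller
@RequestMapping("/accounts")                           // hypothetical resource path
public class AccountController {

    // Minimal DTO; Jackson serializes its public fields to JSON
    public static class Account {
        public long id;
        public String name;

        public Account(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // GET /accounts/{id} -> JSON body such as {"id":42,"name":"sample"}
    @RequestMapping(value = "/{id}", method = RequestMethod.GET)
    @ResponseBody
    public Account getAccount(@PathVariable("id") long id) {
        return new Account(id, "sample");              // placeholder data for illustration
    }
}
```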

Environment: JDK 1.6, HTML, JSP, Spring, JBoss, Log4j, Perl, TortoiseSVN, Hibernate, SOAP web services, Maven, SoapUI, Eclipse Kepler, JavaScript, XML, MySQL v5
