Data Engineer Resume

Plano, TX

SUMMARY

  • Around 7 years of experience in the design and deployment of enterprise applications, web applications, and client-server systems, with web programming in Java and big data technologies.
  • 3+ years of Hadoop experience in the design and development of big data applications involving ecosystem tools and technologies such as Apache Hadoop MapReduce, HDFS, Hive, Impala, Pig, Oozie, Sqoop, Flume, Kafka, Storm, and Spark.
  • Expertise in developing solutions around NoSQL databases such as HBase and MongoDB.
  • Experience with all major Hadoop distributions, including Cloudera, Hortonworks, and MapR.
  • Excellent understanding of Hadoop architecture, including MapReduce MRv1 and MRv2 (YARN).
  • Developed multiple MapReduce programs to process large volumes of semi-structured and unstructured data files using different MapReduce design patterns.
  • Good knowledge of Amazon AWS services such as EMR and EC2, which provide fast and efficient processing of big data.
  • Knowledge of installing, configuring, and testing Hadoop ecosystem components.
  • Strong experience in writing MapReduce jobs in Java.
  • Experience with performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins when writing MapReduce jobs.
  • Excellent understanding of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Worked extensively with semi-structured data (fixed-length and delimited files) for data sanitization, report generation, and standardization.
  • Developed UDFs in Java as needed for use with Pig and Hive queries (a minimal sketch follows this list).
  • Worked with ZooKeeper and Oozie operational services for coordinating the cluster and scheduling workflows.
  • Proficient with big data ingestion tools such as Flume and Sqoop.
  • Experience in importing and exporting data between HDFS and Relational Database Management systems using Sqoop.
  • Experience in handling continuous streaming data using Flume and memory channels.
  • Experience with Puppet, Chef, Perl and Python.
  • Good knowledge of executing Spark SQL queries against data in Hive.
  • Experienced in monitoring Hadoop cluster using Cloudera Manager and Web UI.
  • Extensive experience working with web technologies such as HTML, CSS, XML, JSON, and jQuery.
  • Experienced with build tools such as Maven and Ant and continuous integration tools such as Jenkins.
  • Implemented a real-time system with Kafka, Storm, and ZooKeeper.
  • Extensive experience with SQL, PL/SQL, and Oracle, PostgreSQL, and DB2 database concepts.
  • Performed load testing on PostgreSQL using JMeter.
  • Experience with version control tools such as SVN and Git (GitHub), JIRA for issue tracking, and Crucible for code reviews.
  • Strong problem-solving and analytical skills, with the ability to make balanced, independent decisions.
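
A minimal sketch of the kind of Java UDF for Hive mentioned above, written against the classic org.apache.hadoop.hive.ql.exec.UDF base class; the package, class name, and column semantics are illustrative, not taken from the projects described here.

```java
package com.example.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * Illustrative Hive UDF: trims and upper-cases a free-text column before it
 * is used in downstream joins. Registered in Hive with, for example:
 *   ADD JAR udfs.jar;
 *   CREATE TEMPORARY FUNCTION clean_text AS 'com.example.udf.CleanText';
 */
public class CleanText extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;                 // let NULLs pass through unchanged
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}
```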

TECHNICAL SKILLS

Hadoop Core Services: HDFS, MapReduce, Spark, YARN, S3

Hadoop Distribution: Hortonworks, Cloudera, MapR

NoSQL Databases: HBase, Cassandra

Hadoop Data Ecosystem Services: Hive, Pig, Sqoop, Flume, Kafka, Storm

Hadoop Operational Services: Zookeeper, Oozie

Log Management: Splunk, Log Logic, ESP

Cloud Computing Tools: Amazon AWS

Languages: Java, Python, SQL, PL/SQL, Pig Latin, HiveQL, Impala, Unix Shell Scripting

Java & J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts, JMS, EJB

Databases: Oracle, MySQL, PostgreSQL, Teradata

Operating Systems: UNIX, Windows, LINUX

Build Tools: Jenkins, Maven, ANT

Development Tools: Microsoft SQL Studio, Toad, Eclipse, NetBeans

Development Methodologies: Agile/Scrum, Agile/Kanban, Waterfall

ETL Tools: Informatica

PROFESSIONAL EXPERIENCE

Confidential, Plano, TX

Data Engineer

Responsibilities:

  • Collaborated with internal/client BAs to understand requirements and architect a dataflow system.
  • Developed complete end-to-end big data processing in the Hadoop ecosystem using Python and Scala.
  • Used Spark API 2.3.1 over Amazon EMR to perform analytics on data in Impala 2.3.1.
  • Implemented extensive Hive queries and created views for ad hoc and business processing.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster processing of data.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop 3.0 using Spark Context, Spark SQL, and DataFrames.
  • Created Hive schemas using performance techniques such as partitioning and bucketing.
  • Wrote extensive Hive queries to transform the data consumed by downstream models.
  • Involved in gathering product business and functional requirements, updating user comments in JIRA, and maintaining documentation in Confluence.
  • Imported and exported data from sources such as HDFS, HBase, and PostgreSQL databases to Spark RDDs (see the sketch after this list).
  • Developed Spark scripts using Scala shell commands as per the requirement.
  • Developed numerous MapReduce jobs in Scala 2.3.1.x for data cleansing and analyzing data in Hive.
  • Loaded and extracted data using Python 3 from PostgreSQL into S3.
  • Used JIRA for project tracking, bug tracking, and project management; performed the roles and responsibilities of Scrum Master.
  • Involved in Scrum calls, grooming, and demo meetings.
  • Performed load/performance testing on the PostgreSQL database using JMeter.
  • Involved in writing expensive queries in PostgreSQL for load testing.
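
A minimal Spark (Java API) sketch of the pattern described above: reading a PostgreSQL table over JDBC into a DataFrame, transforming it with Spark SQL, and writing partitioned output to S3. The JDBC URL, credentials, table, column names, and bucket path are placeholders, and the PostgreSQL JDBC driver would need to be on the classpath.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class PostgresToS3 {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("postgres-to-s3")
                .getOrCreate();

        // Read a PostgreSQL table into a DataFrame over JDBC (placeholder connection details)
        Dataset<Row> orders = spark.read()
                .format("jdbc")
                .option("url", "jdbc:postgresql://db-host:5432/sales")
                .option("dbtable", "public.orders")
                .option("user", "etl_user")
                .option("password", "****")
                .load();

        // Ad hoc transformation with Spark SQL
        orders.createOrReplaceTempView("orders");
        Dataset<Row> daily = spark.sql(
                "SELECT order_date, region, SUM(amount) AS total "
              + "FROM orders GROUP BY order_date, region");

        // Write partitioned Parquet output to S3 (placeholder bucket)
        daily.write()
             .mode("overwrite")
             .partitionBy("order_date")
             .parquet("s3://example-bucket/curated/daily_orders/");

        spark.stop();
    }
}
```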

Environment: Amazon EMR 5.12.0, S3, Spark, Scala, Python, HDFS, Pig, Hive, HBase, PostgreSQL, Java, Linux, JMeter, JIRA.

Confidential, Sunnyvale, CA

Sr. Hadoop Developer

Responsibilities:

  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Developed multiple Kafka producers and consumers from scratch to implement the organization's requirements.
  • Responsible for creating, modifying, and deleting topics (Kafka queues) as required, with varying configurations for replication factors and partitions.
  • Developed code to write canonical-model JSON records from various input sources to Kafka queues (a producer sketch follows this list).
  • Configured, deployed, and maintained a single-node ZooKeeper cluster in the DEV environment.
  • Used Spark Streaming APIs to perform the transformations and actions needed to build the common learner data model, which consumes data from Kafka in near real time and persists it into HBase.
  • Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory usage.
  • Optimized existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
  • Worked extensively with Sqoop for importing metadata from MySQL.
  • Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Developed Hive queries to process the data and generate the data cubes for visualization.
  • Implemented schema extraction for Parquet and Avro file formats in Hive.
  • Used reporting tools such as Tableau to connect with Hive for generating daily data reports, and Elasticsearch for indexing.
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
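
A minimal sketch of a Kafka producer along the lines described above, publishing a canonical-model JSON record to a topic. Broker addresses, the topic name, the key, and the payload fields are illustrative assumptions.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

/** Illustrative producer that writes canonical-model JSON records to Kafka. */
public class CanonicalRecordProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");  // placeholder brokers
        props.put("acks", "all");                                     // wait for full replication
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String json = "{\"eventId\":\"42\",\"type\":\"learner\",\"ts\":1520000000}";
            // Key by entity id so records for the same entity land in the same partition
            producer.send(new ProducerRecord<>("learner-events", "42", json));
            producer.flush();
        }
    }
}
```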

Environment: Cloudera CDH 5.10, Spark, Scala, Kafka, HDFS, Pig, Hive, HBase, MySQL, Java, Linux, Tableau, Sqoop

Confidential, Frankfort, KY

Sr. Hadoop Developer

Responsibilities:

  • Developed simple and complex MapReduce programs in Java for data transformations and analysis on different data formats.
  • Built enterprise platforms, coding in Python and handling DevOps with Chef.
  • Performed day-to-day Hadoop operations using ecosystem tools (HDFS, MapReduce, HBase, and Hive), including operation, deployment, and debugging of job issues.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Developed Spark code and Spark SQL/Streaming for faster testing and processing of data.
  • Developed MapReduce programs that filter out bad and unnecessary records and find unique records based on different criteria.
  • Developed a secondary-sort implementation to get sorted values on the reduce side and improve MapReduce performance (see the sketch after this list).
  • Implemented custom data types, InputFormat, RecordReader, OutputFormat, and RecordWriter classes for MapReduce computations to handle custom business requirements.
  • Implemented MapReduce programs to classify data records into different categories based on record type.
  • Used Hortonworks Ambari to monitor the health of the cluster.
  • Worked on SequenceFiles, RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
  • Developed Spark scripts using Scala shell commands as per the requirement.
  • Implemented daily cron jobs that automate parallel tasks of loading data into HDFS and pre-processing it with Pig, using Oozie coordinator jobs.
  • Responsible for performing extensive data validation using Hive.
  • Worked with Sqoop import and export functionality to handle large data transfers between the Oracle database and HDFS.
  • Supported tuple processing and writing data with Storm through the Storm-Kafka connectors.
  • Worked on tuning Hive and Pig scripts to improve performance.
  • Involved in submitting and tracking MapReduce jobs using JobTracker.
  • Involved in creating Oozie workflow and coordinator jobs to kick off jobs on time and on data availability.
  • Used Pig as an ETL tool to do transformations, event joins, filtering, and some pre-aggregations.
  • Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
  • Planned, implemented, and managed Splunk for log management and analytics.
  • Implemented Hive generic UDFs to implement business logic.
  • Implemented JUnit and MRUnit test scripts to support test-driven development and continuous integration.
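
A compact sketch of the secondary-sort pattern referenced above: a composite key carries the natural key plus the field to sort on, while the partitioner and grouping comparator look only at the natural key, so each reduce() call receives its values already ordered. The field names are illustrative; the classes would be wired in with job.setPartitionerClass(...) and job.setGroupingComparatorClass(...), with the key's compareTo providing the sort order.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Partitioner;

/** Illustrative composite key for the secondary-sort pattern. */
public class CompositeKey implements WritableComparable<CompositeKey> {
    private String accountId;   // natural key: controls partitioning and grouping
    private long   timestamp;   // secondary key: controls sort order within a group

    public void write(DataOutput out) throws IOException {
        out.writeUTF(accountId);
        out.writeLong(timestamp);
    }

    public void readFields(DataInput in) throws IOException {
        accountId = in.readUTF();
        timestamp = in.readLong();
    }

    // Full ordering: natural key first, then timestamp
    public int compareTo(CompositeKey other) {
        int cmp = accountId.compareTo(other.accountId);
        return cmp != 0 ? cmp : Long.compare(timestamp, other.timestamp);
    }

    /** Route records by natural key only, so one account stays on one reducer. */
    public static class NaturalKeyPartitioner extends Partitioner<CompositeKey, Object> {
        @Override
        public int getPartition(CompositeKey key, Object value, int numPartitions) {
            return (key.accountId.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    /** Group by natural key only, so all timestamps of an account reach one reduce() call. */
    public static class NaturalKeyGroupingComparator extends WritableComparator {
        protected NaturalKeyGroupingComparator() {
            super(CompositeKey.class, true);
        }
        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            return ((CompositeKey) a).accountId.compareTo(((CompositeKey) b).accountId);
        }
    }
}
```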

Environment: Hortonworks Distribution, MapReduce, Spark, Scala, HDFS, Pig, Hive, Oozie, Splunk, Kafka, Java, Linux, Teradata

Confidential

Sr. Hadoop Developer

Responsibilities:

  • Importing and exporting data into HDFS and Hive from different RDBMS using Sqoop
  • Experienced in defining job flows to run multiple Map Reduce and Pig jobs using Oozie
  • Importing log files using Flume into HDFS and load into Hive and Impala tables to query data
  • Monitoring the running MapReduce programs on the cluster.
  • Participated in development/implementation of Cloudera Hadoop environment.
  • Experience in managing Hadoop clusters using the Cloudera Manager tool.
  • Responsible for loading data from UNIX file systems to HDFS
  • Used Pig-Hive integration with HCatalog; wrote multiple Hive UDFs for complex queries
  • Involved in writing APIs to read HBase tables, cleanse data, and write to another HBase table
  • Created multiple Hive and Impala tables, implemented Partitioning, Dynamic Partitioning and Buckets in Hive for efficient data access
  • Written multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats including XML, JSON, CSV, and other compressed file formats
  • Experienced in running batch processes using Pig scripts and developed Pig UDFs for data manipulation according to business requirements
  • Experienced in writing programs using the HBase Client API (a minimal sketch follows this list)
  • Involved in loading data into HBase using HBase Shell, HBase Client API, Pig and Sqoop
  • Experienced in design, development, tuning and maintenance of NoSQL database
  • Written MapReduce programs in Python with the Hadoop Streaming API
  • Developed unit test cases for Hadoop Map Reduce jobs with MRUnit
  • Continuously monitored and managed the Hadoop cluster using Cloudera manager and Web UI.
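
A minimal HBase Client API sketch of the write/read pattern described above, using the newer Connection/Table API (the CDH4-era HTable API is analogous); the table, column family, row key, and value are illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

/** Illustrative HBase client usage: write one cell, then read it back. */
public class HBaseClientExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();   // picks up hbase-site.xml
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("customer_profile"))) {

            // Write: row key + column family:qualifier -> value
            Put put = new Put(Bytes.toBytes("cust#1001"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("email"),
                          Bytes.toBytes("user@example.com"));
            table.put(put);

            // Read the same row back
            Result result = table.get(new Get(Bytes.toBytes("cust#1001")));
            byte[] email = result.getValue(Bytes.toBytes("d"), Bytes.toBytes("email"));
            System.out.println(Bytes.toString(email));
        }
    }
}
```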

Environment: Cloudera CDH4, MapReduce, HDFS, HBase, Hive, Impala, Pig, Java, SQL, Sqoop, Oozie, UNIX, Maven, Eclipse

Confidential

Java Developer

Responsibilities:

  • Gathered business requirements and wrote functional specifications and detailed design documents
  • Extensively used Core Java, Servlets, JSP and XML
  • Designed the logical and physical data model, generated DDL scripts, and wrote DML scripts for Oracle 9i database
  • Implemented an enterprise logging service (ELS) using JMS and Apache CXF.
  • Designed and developed automated deployment and scaling processes based on Vagrant and Chef for a wide range of server types and application tiers, including Elasticsearch and Zend PHP clusters.
  • Developed unit test cases and used JUnit for unit testing of the application.
  • Implemented a framework component to consume the ELS service.
  • Involved in designing user screens and validations using HTML, jQuery, Ext JS, and JSP as per user requirements.
  • Implemented a JMS producer and consumer using Mule ESB (a plain-JMS sketch follows this list).
  • Wrote SQL queries, stored procedures, and triggers to perform back-end database operations.
  • Sent email alerts to the support team using BMC.
  • Created low-level design documents for the ELS service.
  • Worked closely with QA, business, and architecture teams to resolve defects quickly and meet deadlines.
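
A plain javax.jms producer sketch illustrating the producer side of the messaging work described above (the project's version was wired through Mule ESB); the JNDI names, queue name, and message payload are illustrative assumptions.

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;

/** Illustrative JMS producer that publishes a log-event message to a queue. */
public class LogEventProducer {
    public static void main(String[] args) throws Exception {
        InitialContext ctx = new InitialContext();   // assumes jndi.properties on the classpath
        ConnectionFactory factory = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory");
        Queue queue = (Queue) ctx.lookup("jms/LoggingQueue");

        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);

            TextMessage message =
                    session.createTextMessage("{\"level\":\"INFO\",\"msg\":\"order placed\"}");
            producer.send(message);
        } finally {
            connection.close();
        }
    }
}
```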

Environment: Java, Spring Core, JMS, Web Services, JDK, SVN, Maven, Mule ESB, Chef, JUnit, WAS7, jQuery, Ajax, SAX.

Confidential

Java Developer

Responsibilities:

  • Designed, developed, maintained, tested, and troubleshot Java and PL/SQL programs in support of payroll employees.
  • Developed documentation for new and existing programs and designed specific enhancements to the application.
  • Implemented the web layer using JSF and ICEfaces.
  • Implemented the business layer using Spring MVC.
  • Implemented report retrieval based on start date using HQL.
  • Implemented session management using SessionFactory in Hibernate.
  • Developed the DOs and DAOs using Hibernate (a DAO sketch follows this list).
  • Implemented a SOAP web service to validate zip codes using Apache Axis.
  • Wrote complex queries, PL/SQL stored procedures, functions, and packages to implement business rules.
  • Wrote a PL/SQL program to send email to a group from the backend.
  • Developed scripts triggered monthly to produce the current monthly analysis.
  • Scheduled jobs to be triggered on a specific day and time.
  • Modified SQL statements to increase overall performance as part of basic performance tuning and exception handling.
  • Extensively used Log4j for logging.
  • Performed unit testing in all environments.
  • Used Subversion as the version control system.
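
A minimal sketch of the SessionFactory/HQL pattern described above: a DAO that fetches report rows on or after a given start date. The ReportEntry entity and its startDate property are illustrative and would need to be mapped in the Hibernate configuration along with hibernate.cfg.xml.

```java
import java.util.Date;
import java.util.List;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

/** Illustrative DAO built on a shared SessionFactory and an HQL date filter. */
public class ReportDao {
    // One SessionFactory per application, built from hibernate.cfg.xml
    private static final SessionFactory sessionFactory =
            new Configuration().configure().buildSessionFactory();

    /** Returns report entries whose start date is on or after the given date. */
    public List<?> findReportsSince(Date startDate) {
        Session session = sessionFactory.openSession();
        try {
            return session
                    .createQuery("from ReportEntry r where r.startDate >= :start")
                    .setParameter("start", startDate)
                    .list();
        } finally {
            session.close();
        }
    }
}
```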

Environment: Java, Servlets, JSP, Hibernate, Junit Testing, Oracle DB, SQL, Jasper Reports, iReport, Maven, Jenkins.
