Data Engineer Resume

Plano, TX

SUMMARY

  • Around 7 years of experience in the design and deployment of enterprise applications, web applications, and client-server systems, with web programming in Java and big data technologies.
  • 3+ years of Hadoop experience in the design and development of big data applications involving ecosystem tools and technologies such as Apache Hadoop MapReduce, HDFS, Hive, Impala, Pig, Oozie, Sqoop, Flume, Kafka, Storm, and Spark.
  • Expertise in developing solutions around NoSQL databases such as HBase and MongoDB.
  • Experience with all major Hadoop distributions, including Cloudera, Hortonworks, and MapR.
  • Excellent understanding of Hadoop architecture, including MapReduce MRv1 and MRv2 (YARN).
  • Developed multiple MapReduce programs to process large volumes of semi-structured and unstructured data files using different MapReduce design patterns.
  • Good knowledge of Amazon AWS services such as EMR and EC2, which provide fast and efficient processing of big data.
  • Knowledge of installing, configuring, and testing Hadoop ecosystem components.
  • Strong experience in writing MapReduce jobs in Java.
  • Experience with performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins when writing MapReduce jobs.
  • Excellent understanding of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Worked extensively with semi-structured data (fixed-length and delimited files) for data sanitization, report generation, and standardization.
  • Developed UDFs in Java as needed for use with Pig and Hive queries (a minimal sketch follows this list).
  • Worked with ZooKeeper and Oozie operational services for coordinating the cluster and scheduling workflows.
  • Proficient with big data ingestion tools such as Flume and Sqoop.
  • Experience in importing and exporting data between HDFS and Relational Database Management systems using Sqoop.
  • Experience in handling continuous streaming data using Flume and memory channels.
  • Experience with Puppet, Chef, Perl and Python.
  • Good knowledge of executing Spark SQL queries against data in Hive.
  • Experienced in monitoring Hadoop cluster using Cloudera Manager and Web UI.
  • Extensive experience working with web technologies such as HTML, CSS, XML, JSON, and jQuery.
  • Experienced with build tools such as Maven and Ant and continuous integration tools such as Jenkins.
  • Implemented a real-time system with Kafka, Storm, and ZooKeeper.
  • Extensive experience with SQL, PL/SQL, and Oracle, PostgreSQL, and DB2 database concepts.
  • Performed load testing on PostgreSQL using JMeter.
  • Experience with version control tools such as SVN and Git (GitHub), JIRA for issue tracking, and Crucible for code reviews.
  • Strong problem-solving and analytical skills, with the ability to make balanced, independent decisions.
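
A minimal sketch of the kind of Java UDF for Hive mentioned above, written against the classic org.apache.hadoop.hive.ql.exec.UDF base class; the package, class name, and column semantics are illustrative, not taken from the projects described here.

```java
package com.example.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * Illustrative Hive UDF: trims and upper-cases a free-text column before it
 * is used in downstream joins. Registered in Hive with, for example:
 *   ADD JAR udfs.jar;
 *   CREATE TEMPORARY FUNCTION clean_text AS 'com.example.udf.CleanText';
 */
public class CleanText extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;                 // let NULLs pass through unchanged
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}
```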

TECHNICAL SKILLS

Hadoop Core Services: HDFS, MapReduce, Spark, YARN, S3

Hadoop Distribution: Hortonworks, Cloudera, MapR

NoSQL Databases: HBase, Cassandra

Hadoop Data Ecosystem Services: Hive, Pig, Sqoop, Flume, Kafka, Storm

Hadoop Operational Services: Zookeeper, Oozie

Log Management: Splunk, Log Logic, ESP

Cloud Computing Tools: Amazon AWS

Languages: Java, Python, SQL, PL/SQL, Pig Latin, HiveQL, Impala, Unix Shell Scripting

Java & J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts, JMS, EJB

Databases: Oracle, MySQL, PostgreSQL, Teradata

Operating Systems: UNIX, Windows, LINUX

Build Tools: Jenkins, Maven, ANT

Development Tools: Microsoft SQL Studio, Toad, Eclipse, NetBeans

Development Methodologies: Agile/Scrum, Agile/Kanban, Waterfall

ETL Tools: Informatica

PROFESSIONAL EXPERIENCE

Confidential, Plano, TX

Data Engineer

Responsibilities:

  • Collaborated with internal/client BAs to understand requirements and architect a dataflow system.
  • Developed complete end-to-end big data processing in the Hadoop ecosystem using Python and Scala.
  • Used Spark API 2.3.1 over Amazon EMR to perform analytics on data in Impala 2.3.1.
  • Implemented extensive Hive queries and created views for ad hoc and business processing.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster processing of data.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop 3.0 using Spark Context, Spark SQL, and DataFrames.
  • Created Hive schemas using performance techniques such as partitioning and bucketing.
  • Wrote extensive Hive queries to transform the data consumed by downstream models.
  • Involved in gathering product business and functional requirements, updating user comments in JIRA, and maintaining documentation in Confluence.
  • Imported and exported data from sources such as HDFS, HBase, and PostgreSQL databases to Spark RDDs (see the sketch after this list).
  • Developed Spark scripts using Scala shell commands as per the requirement.
  • Developed numerous MapReduce jobs in Scala 2.3.1.x for data cleansing and analyzing data in Hive.
  • Loaded and extracted data using Python 3 from PostgreSQL into S3.
  • Used JIRA for project tracking, bug tracking, and project management; performed the roles and responsibilities of Scrum Master.
  • Involved in Scrum calls, grooming, and demo meetings.
  • Performed load/performance testing on the PostgreSQL database using JMeter.
  • Involved in writing expensive queries in PostgreSQL for load testing.
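
A minimal Spark (Java API) sketch of the pattern described above: reading a PostgreSQL table over JDBC into a DataFrame, transforming it with Spark SQL, and writing partitioned output to S3. The JDBC URL, credentials, table, column names, and bucket path are placeholders, and the PostgreSQL JDBC driver would need to be on the classpath.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class PostgresToS3 {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("postgres-to-s3")
                .getOrCreate();

        // Read a PostgreSQL table into a DataFrame over JDBC (placeholder connection details)
        Dataset<Row> orders = spark.read()
                .format("jdbc")
                .option("url", "jdbc:postgresql://db-host:5432/sales")
                .option("dbtable", "public.orders")
                .option("user", "etl_user")
                .option("password", "****")
                .load();

        // Ad hoc transformation with Spark SQL
        orders.createOrReplaceTempView("orders");
        Dataset<Row> daily = spark.sql(
                "SELECT order_date, region, SUM(amount) AS total "
              + "FROM orders GROUP BY order_date, region");

        // Write partitioned Parquet output to S3 (placeholder bucket)
        daily.write()
             .mode("overwrite")
             .partitionBy("order_date")
             .parquet("s3://example-bucket/curated/daily_orders/");

        spark.stop();
    }
}
```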

Environment: Amazon EMR 5.12.0, S3, Spark, Scala, Python, HDFS, Pig, Hive, HBase, PostgreSQL, Java, Linux, JMeter, JIRA.

Confidential, Sunnyvale, CA

Sr. Hadoop Developer

Responsibilities:

  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Developed multiple Kafka producers and consumers from scratch to implement the organization's requirements.
  • Responsible for creating, modifying, and deleting topics (Kafka queues) as required, with varying configurations for replication factors and partitions.
  • Developed code to write canonical-model JSON records from various input sources to Kafka queues (a producer sketch follows this list).
  • Configured, deployed, and maintained a single-node ZooKeeper cluster in the DEV environment.
  • Used Spark Streaming APIs to perform the transformations and actions needed to build the common learner data model, which consumes data from Kafka in near real time and persists it into HBase.
  • Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory usage.
  • Optimized existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
  • Worked extensively with Sqoop for importing metadata from MySQL.
  • Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Developed Hive queries to process the data and generate the data cubes for visualization.
  • Implemented schema extraction for Parquet and Avro file formats in Hive.
  • Used reporting tools such as Tableau to connect with Hive for generating daily data reports, and Elasticsearch for indexing.
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
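
A minimal sketch of a Kafka producer along the lines described above, publishing a canonical-model JSON record to a topic. Broker addresses, the topic name, the key, and the payload fields are illustrative assumptions.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

/** Illustrative producer that writes canonical-model JSON records to Kafka. */
public class CanonicalRecordProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");  // placeholder brokers
        props.put("acks", "all");                                     // wait for full replication
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String json = "{\"eventId\":\"42\",\"type\":\"learner\",\"ts\":1520000000}";
            // Key by entity id so records for the same entity land in the same partition
            producer.send(new ProducerRecord<>("learner-events", "42", json));
            producer.flush();
        }
    }
}
```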

Environment: Cloudera CDH 5.10, Spark, Scala, Kafka, HDFS, Pig, Hive, HBase, MySQL, Java, Linux, Tableau, Sqoop

Confidential, Frankfort, KY

Sr. Hadoop Developer

Responsibilities:

  • Developed simple and complex MapReduce programs in Java for data transformations and analysis on different data formats.
  • Built enterprise platforms, coding in Python and handling DevOps with Chef.
  • Performed day-to-day Hadoop operations using ecosystem tools (HDFS, MapReduce, HBase, and Hive), including operation, deployment, and debugging of job issues.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Developed Spark code and Spark SQL/Streaming for faster testing and processing of data.
  • Developed MapReduce programs that filter out bad and unnecessary records and find unique records based on different criteria.
  • Developed a secondary-sort implementation to get sorted values on the reduce side and improve MapReduce performance (see the sketch after this list).
  • Implemented custom data types, InputFormat, RecordReader, OutputFormat, and RecordWriter classes for MapReduce computations to handle custom business requirements.
  • Implemented MapReduce programs to classify data records into different categories based on record type.
  • Used Hortonworks Ambari to monitor the health of the cluster.
  • Worked on SequenceFiles, RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
  • Developed Spark scripts using Scala shell commands as per the requirement.
  • Implemented daily cron jobs that automate parallel tasks of loading data into HDFS and pre-processing it with Pig, using Oozie coordinator jobs.
  • Responsible for performing extensive data validation using Hive.
  • Worked with Sqoop import and export functionality to handle large data transfers between the Oracle database and HDFS.
  • Supported tuple processing and writing data with Storm through the Storm-Kafka connectors.
  • Worked on tuning Hive and Pig scripts to improve performance.
  • Involved in submitting and tracking MapReduce jobs using JobTracker.
  • Involved in creating Oozie workflow and coordinator jobs to kick off jobs on time and on data availability.
  • Used Pig as an ETL tool to do transformations, event joins, filtering, and some pre-aggregations.
  • Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
  • Planned, implemented, and managed Splunk for log management and analytics.
  • Implemented Hive generic UDFs to implement business logic.
  • Implemented JUnit and MRUnit test scripts to support test-driven development and continuous integration.
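
A compact sketch of the secondary-sort pattern referenced above: a composite key carries the natural key plus the field to sort on, while the partitioner and grouping comparator look only at the natural key, so each reduce() call receives its values already ordered. The field names are illustrative; the classes would be wired in with job.setPartitionerClass(...) and job.setGroupingComparatorClass(...), with the key's compareTo providing the sort order.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Partitioner;

/** Illustrative composite key for the secondary-sort pattern. */
public class CompositeKey implements WritableComparable<CompositeKey> {
    private String accountId;   // natural key: controls partitioning and grouping
    private long   timestamp;   // secondary key: controls sort order within a group

    public void write(DataOutput out) throws IOException {
        out.writeUTF(accountId);
        out.writeLong(timestamp);
    }

    public void readFields(DataInput in) throws IOException {
        accountId = in.readUTF();
        timestamp = in.readLong();
    }

    // Full ordering: natural key first, then timestamp
    public int compareTo(CompositeKey other) {
        int cmp = accountId.compareTo(other.accountId);
        return cmp != 0 ? cmp : Long.compare(timestamp, other.timestamp);
    }

    /** Route records by natural key only, so one account stays on one reducer. */
    public static class NaturalKeyPartitioner extends Partitioner<CompositeKey, Object> {
        @Override
        public int getPartition(CompositeKey key, Object value, int numPartitions) {
            return (key.accountId.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    /** Group by natural key only, so all timestamps of an account reach one reduce() call. */
    public static class NaturalKeyGroupingComparator extends WritableComparator {
        protected NaturalKeyGroupingComparator() {
            super(CompositeKey.class, true);
        }
        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            return ((CompositeKey) a).accountId.compareTo(((CompositeKey) b).accountId);
        }
    }
}
```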

Environment: Hortonworks Distribution, MapReduce, Spark, Scala, HDFS, Pig, Hive, Oozie, Splunk, Kafka, Java, Linux, Teradata

Confidential

Sr. Hadoop Developer

Responsibilities:

  • Importing and exporting data into HDFS and Hive from different RDBMS using Sqoop
  • Experienced in defining job flows to run multiple Map Reduce and Pig jobs using Oozie
  • Importing log files using Flume into HDFS and load into Hive and Impala tables to query data
  • Monitoring the running MapReduce programs on the cluster.
  • Participated in development/implementation of Cloudera Hadoop environment.
  • Experience in managing Hadoop clusters using the Cloudera Manager tool.
  • Responsible for loading data from UNIX file systems to HDFS
  • Used Pig-Hive integration with HCatalog; wrote multiple Hive UDFs for complex queries
  • Involved in writing APIs to read HBase tables, cleanse data, and write to another HBase table
  • Created multiple Hive and Impala tables, implemented Partitioning, Dynamic Partitioning and Buckets in Hive for efficient data access
  • Written multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats including XML, JSON, CSV, and other compressed file formats
  • Experienced in running batch processes using Pig scripts and developed Pig UDFs for data manipulation according to business requirements
  • Experienced in writing programs using the HBase Client API (a minimal sketch follows this list)
  • Involved in loading data into HBase using HBase Shell, HBase Client API, Pig and Sqoop
  • Experienced in design, development, tuning and maintenance of NoSQL database
  • Written MapReduce programs in Python with the Hadoop Streaming API
  • Developed unit test cases for Hadoop Map Reduce jobs with MRUnit
  • Continuously monitored and managed the Hadoop cluster using Cloudera manager and Web UI.
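
A minimal HBase Client API sketch of the write/read pattern described above, using the newer Connection/Table API (the CDH4-era HTable API is analogous); the table, column family, row key, and value are illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

/** Illustrative HBase client usage: write one cell, then read it back. */
public class HBaseClientExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();   // picks up hbase-site.xml
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("customer_profile"))) {

            // Write: row key + column family:qualifier -> value
            Put put = new Put(Bytes.toBytes("cust#1001"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("email"),
                          Bytes.toBytes("user@example.com"));
            table.put(put);

            // Read the same row back
            Result result = table.get(new Get(Bytes.toBytes("cust#1001")));
            byte[] email = result.getValue(Bytes.toBytes("d"), Bytes.toBytes("email"));
            System.out.println(Bytes.toString(email));
        }
    }
}
```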

Environment: Cloudera CDH4, MapReduce, HDFS, HBase, Hive, Impala, Pig, Java, SQL, Sqoop, Oozie, UNIX, Maven, Eclipse

Confidential

Java Developer

Responsibilities:

  • Gathered business requirements and wrote functional specifications and detailed design documents
  • Extensively used Core Java, Servlets, JSP and XML
  • Designed the logical and physical data model, generated DDL scripts, and wrote DML scripts for Oracle 9i database
  • Implemented an enterprise logging service (ELS) using JMS and Apache CXF.
  • Designed and developed automated deployment and scaling processes based on Vagrant and Chef for a wide range of server types and application tiers, including Elasticsearch and Zend PHP clusters.
  • Developed unit test cases and used JUnit for unit testing of the application.
  • Implemented a framework component to consume the ELS service.
  • Involved in designing user screens and validations using HTML, jQuery, Ext JS, and JSP as per user requirements.
  • Implemented a JMS producer and consumer using Mule ESB (a plain-JMS sketch follows this list).
  • Wrote SQL queries, stored procedures, and triggers to perform back-end database operations.
  • Sent email alerts to the support team using BMC.
  • Created low-level design documents for the ELS service.
  • Worked closely with QA, business, and architecture teams to resolve defects quickly and meet deadlines.
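
A plain javax.jms producer sketch illustrating the producer side of the messaging work described above (the project's version was wired through Mule ESB); the JNDI names, queue name, and message payload are illustrative assumptions.

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;

/** Illustrative JMS producer that publishes a log-event message to a queue. */
public class LogEventProducer {
    public static void main(String[] args) throws Exception {
        InitialContext ctx = new InitialContext();   // assumes jndi.properties on the classpath
        ConnectionFactory factory = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory");
        Queue queue = (Queue) ctx.lookup("jms/LoggingQueue");

        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);

            TextMessage message =
                    session.createTextMessage("{\"level\":\"INFO\",\"msg\":\"order placed\"}");
            producer.send(message);
        } finally {
            connection.close();
        }
    }
}
```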

Environment: Java, Spring Core, JMS, Web Services, JDK, SVN, Maven, Mule ESB, Chef, JUnit, WAS7, jQuery, Ajax, SAX.

Confidential

Java Developer

Responsibilities:

  • Designed, developed, maintained, tested, and troubleshot Java and PL/SQL programs in support of payroll employees.
  • Developed documentation for new and existing programs and designed specific enhancements to the application.
  • Implemented the web layer using JSF and ICEfaces.
  • Implemented the business layer using Spring MVC.
  • Implemented report retrieval based on start date using HQL.
  • Implemented session management using SessionFactory in Hibernate.
  • Developed the DOs and DAOs using Hibernate (a DAO sketch follows this list).
  • Implemented a SOAP web service to validate zip codes using Apache Axis.
  • Wrote complex queries, PL/SQL stored procedures, functions, and packages to implement business rules.
  • Wrote a PL/SQL program to send email to a group from the backend.
  • Developed scripts triggered monthly to produce the current monthly analysis.
  • Scheduled jobs to be triggered on a specific day and time.
  • Modified SQL statements to increase overall performance as part of basic performance tuning and exception handling.
  • Extensively used Log4j for logging.
  • Performed unit testing in all environments.
  • Used Subversion as the version control system.
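
A minimal sketch of the SessionFactory/HQL pattern described above: a DAO that fetches report rows on or after a given start date. The ReportEntry entity and its startDate property are illustrative and would need to be mapped in the Hibernate configuration along with hibernate.cfg.xml.

```java
import java.util.Date;
import java.util.List;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

/** Illustrative DAO built on a shared SessionFactory and an HQL date filter. */
public class ReportDao {
    // One SessionFactory per application, built from hibernate.cfg.xml
    private static final SessionFactory sessionFactory =
            new Configuration().configure().buildSessionFactory();

    /** Returns report entries whose start date is on or after the given date. */
    public List<?> findReportsSince(Date startDate) {
        Session session = sessionFactory.openSession();
        try {
            return session
                    .createQuery("from ReportEntry r where r.startDate >= :start")
                    .setParameter("start", startDate)
                    .list();
        } finally {
            session.close();
        }
    }
}
```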

Environment: Java, Servlets, JSP, Hibernate, Junit Testing, Oracle DB, SQL, Jasper Reports, iReport, Maven, Jenkins.
