Data Engineer Resume
Phoenix, AZ
SUMMARY:
- Professional experience in the IT industry as a software engineer with a background in design, development, and testing of applications.
- Worked in various domains including Media, Finance, Manufacturing, and E-commerce
- Dedicated professional Data Engineer with a solid background in the Hadoop ecosystem, including HDFS, MapReduce, Spark, Hive, Kafka, Pig, Sqoop, and Zookeeper
- Have a deep understanding of workload management, schedulers, scalability and distributed platform architectures
- Proficient in Spark programming with Scala and Python for high-volume data processing
- Experience in collecting, processing, and aggregating large amounts of streaming data using Kafka and Spark Streaming
- Experience in writing Pig Latin scripts and HiveQL Queries for preprocessing and analyzing large volumes of data
- Proficient in writing MapReduce programs with Java for data processing in Hadoop
- Experience in importing and exporting bulk data between HDFS/Hive/HBase and RDBMS using Sqoop
- Experience in working with RDBMS including Oracle and MySQL
- Experience in developing scalable solutions using NoSQL databases including Cassandra and HBase
- Knowledge of data serialization and familiarity with data formats including SequenceFile, Avro, Parquet, XML, and JSON
- Experience with commercial Hadoop distributions including Hortonworks HDP, Cloudera CDH, and MapR
- Experience in working with AWS services such as EC2/EMR/S3
- Involved in Hadoop cluster administration & performance tuning
- Experience in all phases of the data warehouse life cycle, including requirement analysis, design, coding, testing, and deployment
- Strong in Core Java, Data Structures and Algorithms, and Object-Oriented Design
- Experience in unit testing with JUnit, ScalaTest, and Python unittest
- Familiar with various web development technologies including JavaScript, Bootstrap, Ajax, JQuery, Node.js, AngularJS, Hibernate, and Spring
- Familiar with software development tools like Git, SVN, JIRA and Jenkins.
- Exposure to various software development methodologies like Agile and Waterfall.
- A good team player and self-motivated learner who can work independently in a fast-paced, multitasking environment
TECHNICAL SKILLS:
Hadoop/Spark Ecosystem: Hadoop 2.x, MapReduce, Spark 2.x, Pig 0.12, Hive 0.14, Sqoop 1.4.6, Kafka 0.9.x, Yarn, Mesos, Zookeeper 3.4.x
Programming Language: Java, Scala, Python, SQL, Unix/Bash shell, JavaScript, HTML, CSS, XML
Web Development Framework: JQuery, Ajax, AngularJS, Bootstrap, Hibernate, Spring
Database: Oracle 10g, MySQL 5.x, HBase 0.98, Cassandra 2.1.x
Operating System: Linux, Mac OS, Windows
Cloud Platform: Amazon Web Services EC2/EMR/S3
Environment & Tools: Git/Github, Agile/Scrum, SVN, JIRA, Jenkins
IDE: IntelliJ IDEA, Eclipse, Visual Studio Code
PROFESSIONAL EXPERIENCE:
Confidential, Phoenix, AZ
Data Engineer
Responsibilities:
- Designed, developed, implemented, tested, and maintained data ingestion and integrated new enterprise ETL pipelines for the Amex Email Marketing Department using Kafka, batch processing, PySpark, Spark Streaming, and Hive
- Developed Kafka producers and consumers using Java and Python to move data from various data sources to different departments
- Developed Spark Streaming programs in Python to process real-time data from Kafka together with batch processing
- Utilized Hive as the data warehouse to store structured data.
- Validated Avro schemas of data from different data sources
- Stored processed data in Hive tables for the Machine Learning team to analyze
- Used Git for version control and JIRA for project tracking
- Involved in reviewing Functional requirements and designing solutions
- Documented systems process and procedures for future s
- Involved in requirements gathering, design, development, and testing
- Used shell scripts for administration, maintenance and troubleshooting
- Involved in story-driven Agile development methodology and actively participated in daily Scrum meetings
Environment: Hadoop 2.x, HDFS, Kafka, Spark 2.x, Spark Streaming, Hive, Avro, Java, Python 3.5, Python unittest, Maven, Jenkins, Git, JIRA
Confidential, Sunnyvale, CA
Data Engineer
Responsibilities:
- Involved in ETL processes including data processing and data storage.
- Used Spark with Scala for batch data processing
- Designed, developed, and implemented Kafka streaming in Scala, including producers and consumers.
- Processed data between different Kafka topics in Avro format
- Utilized Sqoop to import and export data between Oracle database and HDFS
- Configured the Sqoop incremental import job to import updated input data
- Converted raw data to serialized data formats such as Avro to reduce data processing time and increase data transfer efficiency over the network
- Involved in application performance tuning and troubleshooting
- Collaborated and tracked work using Git and JIRA
- Actively participated in and provided constructive feedback during daily stand-up meetings and weekly iteration review meetings
Environment: Hadoop 2.x, Kafka, HDFS, Sqoop 1.4.6, Spark 2.x, Scala, ScalaTest, Jenkins, Git, JIRA, Agile
Confidential, Piscataway, NJ
Data Engineer
Responsibilities:
- Designed, developed, implemented, tested, and maintained data ingestion and integration ETL pipelines using Kafka, batch processing, Spark Streaming, and Cassandra
- Developed Kafka consumers that efficiently ingested data from various data sources
- Developed Spark Streaming programs to process real-time data from Kafka with both stateless and stateful transformations
- Developed Spark programs with Scala and applied principles of functional programming to do batch processing
- Utilized Spark SQL with the DataFrame API for efficient structured data processing.
- Built a Cassandra data model based on different requirements
- Stored both the raw data and processed results in Cassandra for future decision support and BI analytics
- Configured ZooKeeper to coordinate and support Kafka, Spark, Cassandra and HDFS
- Deployed services on AWS and utilized Lambda functions to trigger the data pipeline.
- Performed unit testing using ScalaTest
- Used Git for version control and JIRA for project tracking
Environment: Hadoop 2.x, HDFS, Kafka 0.9.x, Spark 2.x, Spark Streaming, Spark SQL, Cassandra 2.1.x, Zookeeper 3.4.x, ScalaTest, AWS, Git, JIRA
Confidential, Chicago, IL
Hadoop Developer
Responsibilities:
- Involved in ETL processes including data processing and data storage.
- Used Spark with Scala for batch data processing and stored the output in HBase for scalable storage and fast queries
- Designed and created Hive tables and worked on performance optimizations such as partitioning and bucketing in Hive
- Implemented custom Hive UDFs and analyzed large data sets using HiveQL queries
- Migrated MapReduce jobs and Hive queries to Spark transformations and actions to improve performance
- Utilized Sqoop to import and export data between Oracle database and HDFS
- Configured the Sqoop incremental import job to import updated input data
- Converted raw data to serialized data formats such as Avro and Parquet to reduce data processing time and increase data transfer efficiency over the network
- Involved in application performance tuning and troubleshooting
- Collaborated and tracked work using Git and JIRA
- Actively participated in and provided constructive feedback during daily stand-up meetings and weekly iteration review meetings
Environment: Hadoop 2.x, MapReduce, HDFS, Sqoop 1.4.6, Hive 0.14, Spark 1.4.x, Scala, HBase 0.98, Git, JIRA, Agile
Confidential, Shenyang, CN
Java Developer
Responsibilities:
- Developed user interface using HTML, CSS3 and JavaScript for the presentation tier
- Used JSP and JavaScript for encapsulating presentation for sales module
- Developed a Controller Servlet to handle all requests and MySQL database access.
- Involved in integration with Spring and developing ORM using Hibernate
- Installed and configured Apache Tomcat
- Deployed the application and supported and maintained its regular operation on the server.
Environment: Java, Servlet 3.0, JSP 2.2, HTML, CSS3, JavaScript, Spring MVC, Hibernate 4.0, Apache Tomcat 7.0, MySQL 5.1.54, Eclipse
Confidential, Shenyang, CN
Java Developer
Responsibilities:
- Designed and coded application components with JSP, Servlet and AJAX.
- Implemented data persistence using JDBC for database connectivity and Hibernate for object-relational mapping.
- Designed the logical and physical data models and generated DDL and DML scripts.
- Designed the user interface and used JavaScript for input validation.
- Wrote MySQL queries, stored procedures and database triggers as required on the database objects.
