
Data Engineer Resume


Phoenix, AZ

SUMMARY:

  • Professional experience in the IT industry as a software engineer, with a background in the design, development, and testing of applications
  • Worked in various domains, including media, finance, manufacturing, and e-commerce
  • Dedicated Data Engineer with a solid background in the Hadoop ecosystem, including HDFS, MapReduce, Spark, Hive, Kafka, Pig, Sqoop, and ZooKeeper
  • Deep understanding of workload management, schedulers, scalability, and distributed platform architectures
  • Proficient in Spark programming with Scala and Python for high-volume data processing
  • Experience in collecting, processing, and aggregating large amounts of streaming data using Kafka and Spark Streaming
  • Experience in writing Pig Latin scripts and HiveQL queries to preprocess and analyze large volumes of data
  • Proficient in writing MapReduce programs with Java for data processing in Hadoop
  • Experience in using Sqoop to import and export bulk data between RDBMS and HDFS/Hive/HBase
  • Experience working with relational databases, including Oracle and MySQL
  • Experience in developing scalable solutions using NoSQL databases, including Cassandra and HBase
  • Knowledge of data serialization; familiar with data formats including SequenceFile, Avro, Parquet, XML, and JSON
  • Experience with commercial Hadoop distributions, including Hortonworks HDP, Cloudera CDH, and MapR
  • Experience working with AWS services such as EC2, EMR, and S3
  • Involved in Hadoop cluster administration & performance tuning
  • Experience in all phases of the data warehouse life cycle, including requirements analysis, design, coding, testing, and deployment
  • Strong in core Java, data structures and algorithms, and object-oriented design
  • Experience in unit testing with JUnit, ScalaTest, and Python unittest
  • Familiar with various web development technologies, including JavaScript, Bootstrap, Ajax, jQuery, Node.js, AngularJS, Hibernate, and Spring
  • Familiar with software development tools such as Git, SVN, JIRA, and Jenkins
  • Exposure to software development methodologies such as Agile and Waterfall
  • A good team player who can also work independently in a fast-paced, multitasking environment; self-motivated learner

TECHNICAL SKILLS:

Hadoop/Spark Ecosystem: Hadoop 2.x, MapReduce, Spark 2.x, Pig 0.12, Hive 0.14, Sqoop 1.4.6, Kafka 0.9.x, YARN, Mesos, ZooKeeper 3.4.x

Programming Languages: Java, Scala, Python, SQL, Unix/Bash shell, JavaScript, HTML, CSS, XML

Web Development Frameworks: jQuery, Ajax, AngularJS, Bootstrap, Hibernate, Spring

Databases: Oracle 10g, MySQL 5.x, HBase 0.98, Cassandra 2.1.x

Operating Systems: Linux, Mac OS, Windows

Cloud Platform: Amazon Web Services (EC2, EMR, S3)

Environment & Tools: Git/GitHub, Agile/Scrum, SVN, JIRA, Jenkins

IDEs: IntelliJ IDEA, Eclipse, Visual Studio Code

PROFESSIONAL EXPERIENCE:

Confidential, Phoenix, AZ

Data Engineer

Responsibilities:

  • Designed, developed, implemented, tested, and maintained data ingestion and integration for the new enterprise ETL pipelines of the Amex Email Marketing Department, using Kafka, batch processing, PySpark, Spark Streaming, and Hive
  • Developed Kafka producers and consumers using Java and Python to move data from various data sources to different departments
  • Developed Spark Streaming programs in Python to process real-time data from Kafka alongside batch processing
  • Used Hive as the data warehouse to store structured data
  • Validated data against Avro schemas from different data sources
  • Stored processed data in Hive tables for the machine learning team to analyze
  • Used Git for version control and JIRA for project tracking
  • Involved in reviewing functional requirements and designing solutions
  • Documented system processes and procedures for future reference
  • Involved in requirements gathering, design, development, and testing
  • Used shell scripts for administration, maintenance and troubleshooting
  • Involved in story-driven Agile development methodology and actively participated in daily Scrum meetings

Environment: Hadoop 2.x, HDFS, Kafka, Spark 2.x, Spark Streaming, Hive, Avro, Java, Python 3.5, Python unittest, Maven, Jenkins, Git, JIRA

Confidential, Sunnyvale, CA

Data Engineer

Responsibilities:

  • Involved in ETL processes, including data processing and data storage
  • Used Spark with Scala for batch data processing
  • Designed, developed, and implemented Kafka streaming in Scala, including producers and consumers
  • Processed Avro-encoded data across different Kafka topics
  • Used Sqoop to import and export data between the Oracle database and HDFS
  • Configured Sqoop incremental import jobs to import updated input data
  • Converted raw data into serialized data formats such as Avro to reduce data processing time and improve data transfer efficiency over the network
  • Involved in application performance tuning and troubleshooting
  • Collaborated and tracked work using Git and JIRA
  • Actively participated and provided constructive feedback in daily stand-up meetings and weekly iteration review meetings

Environment: Hadoop 2.x, Kafka, HDFS, Sqoop 1.4.6, Spark 2.x, Scala, ScalaTest, Jenkins, Git, JIRA, Agile

Confidential, Piscataway, NJ

Data Engineer

Responsibilities:

  • Designed, developed, implemented, tested, and maintained data ingestion and integration ETL pipelines using Kafka, batch processing, Spark Streaming, and Cassandra
  • Developed Kafka consumers to efficiently ingest data from various data sources
  • Developed Spark Streaming programs to process real-time data from Kafka, applying both stateless and stateful transformations
  • Developed Spark programs in Scala and applied functional programming principles for batch processing
  • Used Spark SQL with the DataFrame API for efficient structured data processing
  • Built Cassandra data models based on different requirements
  • Stored both raw data and processed results in Cassandra for future decision support and BI analytics
  • Configured ZooKeeper to coordinate and support Kafka, Spark, Cassandra and HDFS
  • Deployed services on AWS and used Lambda functions to trigger the data pipeline
  • Performed unit testing using ScalaTest
  • Used Git for version control and JIRA for project tracking

Environment: Hadoop 2.x, HDFS, Kafka 0.9.x, Spark 2.x, Spark Streaming, Spark SQL, Cassandra 2.1.x, Zookeeper 3.4.x, ScalaTest, AWS, Git, JIRA

Confidential, Chicago, IL

Hadoop Developer

Responsibilities:

  • Involved in ETL processes, including data processing and data storage
  • Used Spark with Scala for batch data processing and stored the output in HBase for scalable storage and fast queries
  • Designed and created Hive tables and applied performance optimizations such as partitioning and bucketing
  • Implemented custom Hive UDFs and analyzed large data sets by running HiveQL queries
  • Migrated MapReduce jobs and Hive queries to Spark transformations and actions to improve performance
  • Used Sqoop to import and export data between the Oracle database and HDFS
  • Configured Sqoop incremental import jobs to import updated input data
  • Converted raw data into data formats such as Avro and Parquet to reduce data processing time and improve data transfer efficiency over the network
  • Involved in application performance tuning and troubleshooting
  • Collaborated and tracked work using Git and JIRA
  • Actively participated and provided constructive feedback in daily stand-up meetings and weekly iteration review meetings

Environment: Hadoop 2.x, MapReduce, HDFS, Sqoop 1.4.6, Hive 0.14, Spark 1.4.x, Scala, HBase 0.98, Git, JIRA, Agile

Confidential, Shenyang, CN

Java Developer

Responsibilities:

  • Developed the user interface for the presentation tier using HTML, CSS3, and JavaScript
  • Used JSP and JavaScript to encapsulate the presentation logic for the sales module
  • Developed a controller servlet to handle all requests and MySQL database access
  • Involved in Spring integration and in developing the ORM layer with Hibernate
  • Installed and configured Apache Tomcat
  • Deployed the application and supported and maintained its regular operation on the server

Environment: Java, Servlet 3.0, JSP 2.2, HTML, CSS3, JavaScript, Spring MVC, Hibernate 4.0, Apache Tomcat 7.0, MySQL 5.1.54, Eclipse

Confidential, Shenyang, CN

Java Developer

Responsibilities:

  • Designed and coded application components with JSP, Servlet and AJAX.
  • Implemented data persistence using JDBC for database connectivity and Hibernate for database/Java object mapping
  • Designed the logical and physical data models and generated DDL and DML scripts
  • Designed the user interface and used JavaScript for client-side validation
  • Wrote MySQL queries, stored procedures and database triggers as required on the database objects.
