
Data Engineer Resume

Sunnyvale, CA


  • 7+ years of overall experience in enterprise application development across diverse industries, including hands-on experience with Big Data ecosystem technologies and 4.5 years of comprehensive experience as a Hadoop, Big Data & Analytics developer.
  • Experienced in processing Big Data on the Apache Hadoop framework using MapReduce programs.
  • Experienced in installation, configuration, supporting and monitoring Hadoop clusters using Apache, Cloudera distributions and AWS.
  • Experienced in using Pig, Hive, Sqoop, Oozie, ZooKeeper, HBase, MapR and Cloudera Manager.
  • Imported and exported data using Sqoop from RDBMS to HDFS and vice versa.
  • Application development using Java, RDBMS, and Linux shell scripting.
  • Extended Hive and Pig core functionality by writing custom UDFs, UDAFs, and UDTFs.
  • Experienced in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
  • Familiar with the Java Virtual Machine (JVM) and multi-threaded processing.
  • Worked on NoSQL databases including HBase, Cassandra, and MongoDB.
  • Experienced in job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
  • Experienced in designing, developing and implementing connectivity products that allow efficient exchange of data between the core database engine and the Hadoop ecosystem
  • Experienced in Data warehousing and using ETL tools like Informatica
  • Experienced in SQL tuning techniques
  • Extensive experience in middle-tier development using J2EE technologies such as JDBC, JNDI, JSP, Servlets, JSF, Struts, Spring, Hibernate, and EJB.
  • Possess excellent technical skills, consistently delivered ahead of schedule, and bring strong interpersonal and communication skills.
  • In-depth understanding of Data Structure and Algorithms
  • Expert in developing ETL functionalities using Hadoop technologies
  • Experience with software development processes and models: Agile, Waterfall, and Scrum.
  • Reliable, hardworking, dedicated team player and customer-oriented problem solver who works well under pressure and with minimal supervision.
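The custom Hive UDF bullet above can be illustrated with a minimal sketch. In Hive itself the class would extend org.apache.hadoop.hive.ql.udf.generic.GenericUDF and be registered with CREATE FUNCTION (the hive-exec dependency is omitted here); this plain-Java version shows only the null-safe evaluate logic, and the mask_email-style rule is a hypothetical example, not a function from the resume:

```java
// Minimal, self-contained sketch of the core logic of a custom Hive UDF.
// In Hive proper this would extend GenericUDF (hive-exec dependency omitted);
// the class name and masking rule are illustrative assumptions.
public class MaskEmailUdf {
    // Hive UDFs must be null-safe: a NULL input yields a NULL output.
    public static String evaluate(String email) {
        if (email == null) return null;
        int at = email.indexOf('@');
        if (at <= 0) return email; // not an email; pass through unchanged
        // Keep the first character and the domain; mask the rest of the local part.
        return email.charAt(0) + "***" + email.substring(at);
    }

    public static void main(String[] args) {
        System.out.println(MaskEmailUdf.evaluate("jdoe@example.com")); // j***@example.com
    }
}
```

Once registered, such a function would be invoked from HiveQL as, for example, SELECT mask_email(email) FROM users.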


Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, MongoDB, Cassandra, Kafka, Oozie, ZooKeeper, Spark, Azkaban

Java & J2EE: Core Java, Servlets, JSP, JDBC, JNDI, Java Beans

IDEs: Eclipse

Frameworks: MVC, Struts, Hibernate, Spring

Programming languages: C, C++, Java, Python, Ant scripts, Linux shell scripts

Databases: Oracle 11g/10g/9i, MySQL, DB2, MS-SQL Server

Web Servers: WebLogic, WebSphere, Apache Tomcat

Web Technologies: HTML, XML, JavaScript, AJAX, SOAP, WSDL

ETL Tools: Informatica

Versioning Tools: ClearCase, SVN, Git, Bitbucket


Confidential, Sunnyvale, CA

Data Engineer


  • Contributing to the data warehouse design and data preparation by implementing a solid, robust, extensible design that supports key business flows.
  • Responsible for migration of traditional ETL system to scalable distributed data solutions using Hadoop technologies with GDPR compliance.
  • Designing a data lake system to bring data under a single umbrella.
  • Performs impact analysis of new change requests and gap analysis of the As-Is and To-Be systems.
  • Developed generic custom Hive UDF.
  • Defined and scheduled Hadoop jobs in Azkaban.
  • Migrated Pig scripts to Spark or Hive code to improve job performance.
  • Responsible for end-to-end White Box testing of Hadoop sessions written using Hive queries.
  • Designing, integrating and documenting technical components for seamless data extraction and analysis on big data platform.
  • Designs and develops applications using Java, SQL, and various scripting languages to perform data warehousing activities such as data mining and report generation that assist in business decisions.

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Java, Scala, SQL, Yarn, Azkaban, Kafka, Presto, Teradata
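The Azkaban scheduling work above can be sketched with Azkaban's classic `.job` property files, where each job declares its type, command, and upstream dependencies; the job names and commands below are illustrative assumptions, not the actual flow:

```properties
# aggregate.job -- runs a (hypothetical) Hive script
type=command
command=hive -f daily_aggregate.hql
```

```properties
# export.job -- runs only after aggregate.job succeeds
type=command
dependencies=aggregate
command=sh export_to_warehouse.sh
```

Packaging these `.job` files into a zip and uploading it defines a flow whose DAG Azkaban derives from the `dependencies` entries.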

Confidential, San Jose, CA

Data Engineer


  • Responsible for building scalable distributed data solutions using Hadoop.
  • Migrated business functionality from Informatica to the Big Data Hadoop ecosystem and reduced metrics execution time from 24 hours to 6 hours.
  • Developed Hive queries along with generic UDFs and UDAFs to process large data sets.
  • Automated the data import and export process to relational databases using Sqoop.
  • Developed a generic lookup using a caching mechanism to eliminate small-table joins, leading to a 20% reduction in query execution time.
  • Developed ETL functionalities using Hadoop technologies.
  • Developed automated data pipelines and monitoring system.
  • Developed data quality scripts to perform anomaly checks in the data.
  • Responsible for setting up and maintaining Development, QA and Production environment.
  • Developed an automated tool to create Hive tables in various file formats such as Avro, Parquet, ORC, and Text by reading Oracle source definitions.
  • Developed mechanism to monitor health of the services and respond accordingly to any warning or failure conditions.

Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, MapR, Sqoop, Oozie, ZooKeeper, Informatica, PL/SQL, MySQL, Windows, Tidal
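The cached generic lookup above (eliminating small-table joins) is essentially a map-side/broadcast join: the small dimension table is loaded into memory once and probed per fact row instead of being shuffled. A minimal plain-Java sketch, with illustrative table and field names rather than the project's actual ones:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a caching lookup that replaces a small-table join:
// the dimension rows are cached in a HashMap and probed per fact row.
public class CachedLookup {
    private final Map<String, String> cache = new HashMap<>();

    // Load the small dimension table once (in Hive this would be a table read).
    public CachedLookup(Map<String, String> dimensionRows) {
        cache.putAll(dimensionRows);
    }

    // Enrich each fact row ("id,amount") with the cached dimension value.
    public List<String> enrich(List<String> factRows) {
        List<String> out = new ArrayList<>();
        for (String row : factRows) {
            String key = row.split(",")[0];
            out.add(row + "," + cache.getOrDefault(key, "UNKNOWN"));
        }
        return out;
    }
}
```

In Hive a similar effect comes from map joins (e.g. the hive.auto.convert.join setting), which broadcast the small table to each mapper.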

Confidential, Milpitas, CA

Java Developer


  • Implemented automated mechanism to save log information to external system.
  • Loaded data into the Hadoop cluster from dynamically generated files using Flume.
  • Developed Sqoop scripts to import data from RDBMS to HDFS.
  • Exported the detail data to relational databases using Sqoop for visualization and report generation.

Environment: Hadoop, MapReduce, YARN, Sqoop, HDFS, Hive, Pig, Oozie, Hbase, Java, Oracle, CentOS, Eclipse, Maven, Informatica BDE
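The Sqoop import/export bullets above follow Sqoop's standard CLI shape; a hedged sketch, with the host, credentials, table, and path names as placeholders rather than the actual values:

```shell
# Import an RDBMS table into HDFS with 4 parallel mappers (names illustrative).
sqoop import \
  --connect jdbc:oracle:thin:@//db-host:1521/ORCL \
  --username etl_user -P \
  --table ORDERS \
  --target-dir /data/raw/orders \
  --num-mappers 4

# Export aggregated results back to the RDBMS for reporting.
sqoop export \
  --connect jdbc:oracle:thin:@//db-host:1521/ORCL \
  --username etl_user -P \
  --table ORDER_SUMMARY \
  --export-dir /data/out/order_summary
```

Sqoop translates each invocation into a MapReduce job whose mappers read or write partitions of the table in parallel.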

Confidential, Ardmore, PA

Java Developer, Intern


  • Involved in developing user interface using JavaScript and CSS.
  • Fixed bugs/defects identified by QA during preproduction testing.
  • Wrote JUnit/TestNG test cases to perform sanity checks prior to production builds.
  • Used multithreading, synchronization, caching, and memory management.
  • Applied Java/J2EE application development skills with object-oriented analysis, and was extensively involved throughout the Software Development Life Cycle (SDLC).

Environment: Oracle 10g, Java, J2EE, Struts, Servlets, JDBC, HTML, XML, SQL, JUnit, Tomcat 6, Maven, Eclipse
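The multithreading/synchronization/caching bullet above can be sketched as a small memoizing cache guarded by an intrinsic lock; the class and method names are illustrative, not taken from the original project:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a thread-safe memoizing cache: compute-if-absent runs under the
// object's intrinsic lock so concurrent threads never corrupt the map.
public class SynchronizedCache {
    private final Map<String, Integer> cache = new HashMap<>();

    // Return the cached value, computing it on first access (here: string length).
    public synchronized int lengthOf(String key) {
        return cache.computeIfAbsent(key, String::length);
    }

    public synchronized int size() {
        return cache.size();
    }
}
```

The `synchronized` keyword serializes all access, which is safe but coarse; java.util.concurrent.ConcurrentHashMap would allow finer-grained concurrency.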


Research Assistant


  • Responsible for maintaining Hadoop clusters
  • Leading a team of Confidential.
  • Enhanced and maintained the website www.computingportal.org using the Drupal CMS, MySQL, and PHP; enhanced the frontend features using HTML, JavaScript, and CSS.
  • Participated actively in issue resolution.
  • Created a mathematical application in Maple to assist the department's research work.
  • Responsible for cluster maintenance: adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups. Involved in loading data from the UNIX file system into HDFS.

Environment: DRUPAL, MySQL, PHP, Maple, Cloudera, UNIX, JavaScript, HDFS


Programmer Analyst


  • Analyzed the client's requirements and the airlines domain in order to deliver the required functionality in full, without gaps.
  • Involved in analyzing, designing, and developing the web app using the most suitable technology.
  • Unit tested components and debugged to fix any bugs/defects.
  • Delivered presentations/walkthroughs and provided documentation to familiarize the customer with the application.

Environment: Java, J2EE, SoapUI, HSQL, MySQL, Maven, ClearCase, UNIX, Groovy 2.0, SpringSource Tool Suite, Grails, Oracle 9i
