
Data Engineer Resume

Sunnyvale, CA


  • 7+ years of overall experience in enterprise application development across diverse industries, including hands-on experience with Big Data ecosystem technologies and 4.5 years of comprehensive experience as a Hadoop, Big Data & Analytics developer.
  • Experienced in processing Big Data on the Apache Hadoop framework using MapReduce programs.
  • Experienced in installation, configuration, supporting and monitoring Hadoop clusters using Apache, Cloudera distributions and AWS.
  • Experienced in using Pig, Hive, Sqoop, Oozie, ZooKeeper, HBase, MapR and Cloudera Manager.
  • Imported and exported data using Sqoop from RDBMS to HDFS and vice versa.
  • Application development using Java, RDBMS, and Linux shell scripting.
  • Extended Hive and Pig core functionality by writing custom UDFs, UDAFs, and UDTFs.
  • Experienced in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
  • Familiar with the Java Virtual Machine (JVM) and multi-threaded processing.
  • Worked on NoSQL databases including HBase, Cassandra, and MongoDB.
  • Experienced in job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
  • Experienced in designing, developing and implementing connectivity products that allow efficient exchange of data between the core database engine and the Hadoop ecosystem
  • Experienced in Data warehousing and using ETL tools like Informatica
  • Experienced in SQL tuning techniques
  • Extensive experience in middle-tier development using J2EE technologies such as JDBC, JNDI, JSP, Servlets, JSF, Struts, Spring, Hibernate, and EJB.
  • Possess excellent technical skills, consistently delivered ahead of schedule, and bring strong interpersonal and communication skills.
  • In-depth understanding of Data Structure and Algorithms
  • Expert in developing ETL functionalities using Hadoop technologies
  • Experience with software development processes and models: Agile, Waterfall, and Scrum.
  • Reliable, hardworking, dedicated team player and customer-oriented problem solver who works well under pressure and with minimal supervision.
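The custom Hive UDF bullet above can be illustrated with a minimal sketch. In Hive itself the class would extend org.apache.hadoop.hive.ql.udf.generic.GenericUDF and be registered with CREATE FUNCTION (the hive-exec dependency is omitted here); this plain-Java version shows only the null-safe evaluate logic, and the mask_email-style rule is a hypothetical example, not a function from the resume:

```java
// Minimal, self-contained sketch of the core logic of a custom Hive UDF.
// In Hive proper this would extend GenericUDF (hive-exec dependency omitted);
// the class name and masking rule are illustrative assumptions.
public class MaskEmailUdf {
    // Hive UDFs must be null-safe: a NULL input yields a NULL output.
    public static String evaluate(String email) {
        if (email == null) return null;
        int at = email.indexOf('@');
        if (at <= 0) return email; // not an email; pass through unchanged
        // Keep the first character and the domain; mask the rest of the local part.
        return email.charAt(0) + "***" + email.substring(at);
    }

    public static void main(String[] args) {
        System.out.println(MaskEmailUdf.evaluate("jdoe@example.com")); // j***@example.com
    }
}
```

Once registered, such a function would be invoked from HiveQL as, for example, SELECT mask_email(email) FROM users.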


Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, MongoDB, Cassandra, Kafka, Oozie, ZooKeeper, Spark, Azkaban

Java & J2EE: Core Java, Servlets, JSP, JDBC, JNDI, Java Beans

IDEs: Eclipse

Frameworks: MVC, Struts, Hibernate, Spring

Programming languages: C, C++, Java, Python, Ant scripts, Linux shell scripts

Databases: Oracle 11g/10g/9i, MySQL, DB2, MS-SQL Server

Web Servers: WebLogic, WebSphere, Apache Tomcat

Web Technologies: HTML, XML, JavaScript, AJAX, SOAP, WSDL

ETL Tools: Informatica

Versioning Tools: ClearCase, SVN, Git, Bitbucket


Confidential, Sunnyvale, CA

Data Engineer


  • Contributing to the data warehouse design and data preparation by implementing a solid, robust, extensible design that supports key business flows.
  • Responsible for migration of traditional ETL system to scalable distributed data solutions using Hadoop technologies with GDPR compliance.
  • Designing a data lake system to bring data under a single umbrella.
  • Performs impact analysis of new change requests and gap analysis of the As-Is and To-Be systems.
  • Developed generic custom Hive UDF.
  • Defined and scheduled Hadoop jobs in Azkaban.
  • Migrated Pig scripts to Spark or Hive code to improve job performance.
  • Responsible for end-to-end White Box testing of Hadoop sessions written using Hive queries.
  • Designing, integrating and documenting technical components for seamless data extraction and analysis on big data platform.
  • Designs and develops applications using Java, SQL, and various scripting languages to perform data warehousing activities such as data mining and report generation that assist in business decisions.

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Java, Scala, SQL, Yarn, Azkaban, Kafka, Presto, Teradata
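The Azkaban scheduling work above can be sketched with Azkaban's classic `.job` property files, where each job declares its type, command, and upstream dependencies; the job names and commands below are illustrative assumptions, not the actual flow:

```properties
# aggregate.job -- runs a (hypothetical) Hive script
type=command
command=hive -f daily_aggregate.hql
```

```properties
# export.job -- runs only after aggregate.job succeeds
type=command
dependencies=aggregate
command=sh export_to_warehouse.sh
```

Packaging these `.job` files into a zip and uploading it defines a flow whose DAG Azkaban derives from the `dependencies` entries.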

Confidential, San Jose, CA

Data Engineer


  • Responsible for building scalable distributed data solutions using Hadoop.
  • Migrated business functionality from Informatica to the Big Data Hadoop ecosystem and reduced metrics execution time from 24 hours to 6 hours.
  • Developed Hive queries along with generic UDFs and UDAFs to process large data sets.
  • Automated the data import and export process to relational databases using Sqoop.
  • Developed a generic lookup using a caching mechanism to eliminate small-table joins, leading to a 20% reduction in query execution time.
  • Developed ETL functionalities using Hadoop technologies.
  • Developed automated data pipelines and monitoring system.
  • Developed data quality scripts to perform anomaly checks in the data.
  • Responsible for setting up and maintaining Development, QA and Production environment.
  • Developed an automated tool to create Hive tables in various file formats such as Avro, Parquet, ORC, and Text by reading Oracle source definitions.
  • Developed mechanism to monitor health of the services and respond accordingly to any warning or failure conditions.

Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, MapR, Sqoop, Oozie, ZooKeeper, Informatica, PL/SQL, MySQL, Windows, Tidal
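The cached generic lookup above (eliminating small-table joins) is essentially a map-side/broadcast join: the small dimension table is loaded into memory once and probed per fact row instead of being shuffled. A minimal plain-Java sketch, with illustrative table and field names rather than the project's actual ones:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a caching lookup that replaces a small-table join:
// the dimension rows are cached in a HashMap and probed per fact row.
public class CachedLookup {
    private final Map<String, String> cache = new HashMap<>();

    // Load the small dimension table once (in Hive this would be a table read).
    public CachedLookup(Map<String, String> dimensionRows) {
        cache.putAll(dimensionRows);
    }

    // Enrich each fact row ("id,amount") with the cached dimension value.
    public List<String> enrich(List<String> factRows) {
        List<String> out = new ArrayList<>();
        for (String row : factRows) {
            String key = row.split(",")[0];
            out.add(row + "," + cache.getOrDefault(key, "UNKNOWN"));
        }
        return out;
    }
}
```

In Hive a similar effect comes from map joins (e.g. the hive.auto.convert.join setting), which broadcast the small table to each mapper.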

Confidential, Milpitas, CA

Java Developer


  • Implemented automated mechanism to save log information to external system.
  • Loaded data into the Hadoop cluster from dynamically generated files using Flume.
  • Developed Sqoop scripts to import data from RDBMS to HDFS.
  • Exported the detail data to relational databases using Sqoop for visualization and report generation.

Environment: Hadoop, MapReduce, YARN, Sqoop, HDFS, Hive, Pig, Oozie, Hbase, Java, Oracle, CentOS, Eclipse, Maven, Informatica BDE
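The Sqoop import/export bullets above follow Sqoop's standard CLI shape; a hedged sketch, with the host, credentials, table, and path names as placeholders rather than the actual values:

```shell
# Import an RDBMS table into HDFS with 4 parallel mappers (names illustrative).
sqoop import \
  --connect jdbc:oracle:thin:@//db-host:1521/ORCL \
  --username etl_user -P \
  --table ORDERS \
  --target-dir /data/raw/orders \
  --num-mappers 4

# Export aggregated results back to the RDBMS for reporting.
sqoop export \
  --connect jdbc:oracle:thin:@//db-host:1521/ORCL \
  --username etl_user -P \
  --table ORDER_SUMMARY \
  --export-dir /data/out/order_summary
```

Sqoop translates each invocation into a MapReduce job whose mappers read or write partitions of the table in parallel.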

Confidential, Ardmore, PA

Java Developer, Intern


  • Involved in developing user interface using JavaScript and CSS.
  • Fixed bugs/defects identified by QA during preproduction testing.
  • Wrote JUnit/TestNG test cases to perform sanity checks prior to production builds.
  • Used multithreading, synchronization, caching, and memory management.
  • Applied Java/J2EE application development skills with object-oriented analysis, and was extensively involved throughout the Software Development Life Cycle (SDLC).

Environment: Oracle 10g, Java, J2EE, Struts, Servlets, JDBC, HTML, XML, SQL, JUnit, Tomcat 6, Maven, Eclipse
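The multithreading/synchronization/caching bullet above can be sketched as a small memoizing cache guarded by an intrinsic lock; the class and method names are illustrative, not taken from the original project:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a thread-safe memoizing cache: compute-if-absent runs under the
// object's intrinsic lock so concurrent threads never corrupt the map.
public class SynchronizedCache {
    private final Map<String, Integer> cache = new HashMap<>();

    // Return the cached value, computing it on first access (here: string length).
    public synchronized int lengthOf(String key) {
        return cache.computeIfAbsent(key, String::length);
    }

    public synchronized int size() {
        return cache.size();
    }
}
```

The `synchronized` keyword serializes all access, which is safe but coarse; java.util.concurrent.ConcurrentHashMap would allow finer-grained concurrency.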


Research Assistant


  • Responsible for maintaining Hadoop clusters
  • Leading a team of Confidential.
  • Enhanced and maintained the website www.computingportal.org using the Drupal CMS, MySQL, and PHP; enhanced the frontend features using HTML, JavaScript, and CSS.
  • Participated actively in issue resolution.
  • Created a mathematical application in Maple to assist the department's research work.
  • Responsible for cluster maintenance: adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups. Involved in loading data from the UNIX file system into HDFS.

Environment: DRUPAL, MySQL, PHP, Maple, Cloudera, UNIX, JavaScript, HDFS


Programmer Analyst


  • Analyzed the client's requirements and the airlines domain in order to deliver the required functionality in full, without gaps.
  • Involved in analyzing, designing, and developing the web app using the most suitable technology.
  • Unit tested components and debugged to fix any bugs/defects.
  • Delivered presentations/walkthroughs and provided documentation to familiarize the customer with the application.

Environment: Java, J2EE, SoapUI, HSQL, MySQL, Maven, ClearCase, UNIX, Groovy 2.0, SpringSource Tool Suite, Grails, Oracle 9i
