Data Engineer Resume
PROFESSIONAL SUMMARY:
- 6+ years of IT experience in application development with Python, Spark, and their ecosystem technologies.
- 2+ years of dedicated experience with Spark 2.1/2.2, Hadoop 2.7, and its components, including HDFS, MapReduce, Hive 1.1, and HBase.
- Good knowledge of Hadoop development and components such as HDFS, JobTracker, TaskTracker, DataNode, NameNode, and MapReduce concepts.
- Experience in analyzing data using HiveQL, HBase, and custom MapReduce programs written in Java 8.
- Involved in project planning, setting implementation standards, and designing Hadoop-based applications.
- Wrote MapReduce programs with custom logic, based on requirements, for Spark-based data processing applications on AWS EMR.
- Installed Spark and analyzed HDFS data, caching datasets in memory to perform a wide variety of complex computations interactively.
- Experience in importing and exporting data in different formats between RDBMS databases and HDFS/HBase.
- Replaced existing MapReduce jobs and Hive scripts with Spark DataFrame transformations and actions for faster data analysis.
- Experience using Oracle, SQL Server, and MySQL databases, including writing and tuning SQL queries.
- Used different Spark modules such as Spark Core, RDDs, DataFrames, and Spark SQL.
- Converted various Hive queries into the required Spark transformations and actions (see the sketch after this list).
- In-depth understanding of data structures, algorithms, and optimization. Strong knowledge of the software development life cycle and expertise in detailed design documentation.
- Good experience working in an Agile development environment, including Scrum methodology, using JIRA.
- Fast learner with strong analytical, communication, and interpersonal skills; interested in problem solving and troubleshooting. Self-motivated, excellent team player committed to well-documented design. Strong business knowledge and the ability to work under pressure.
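To illustrate the Hive-to-Spark conversions noted above, here is a minimal PySpark sketch of rewriting a Hive aggregation as DataFrame transformations and actions; the table and column names are hypothetical placeholders, not from an actual project.

    # A Hive query such as:
    #   SELECT state, COUNT(*) AS claim_count FROM claims
    #   WHERE status = 'OPEN' GROUP BY state;
    # expressed as Spark DataFrame transformations and actions.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("hive-to-spark").enableHiveSupport().getOrCreate()

    claims = spark.table("claims")                      # hypothetical Hive table
    open_counts = (claims
                   .filter(F.col("status") == "OPEN")   # transformation (lazy)
                   .groupBy("state")
                   .agg(F.count("*").alias("claim_count")))
    open_counts.cache()                                 # keep in memory for interactive reuse
    open_counts.show()                                  # action triggers execution

Transformations stay lazy until an action such as show() or count() runs, which is what makes in-memory caching effective for repeated interactive analysis.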
TECHNICAL SKILLS:
Languages: Python, Java, Shell Scripting
Big Data Technologies: Spark, Hadoop, MapReduce, Hive, HBase
Databases: PostgreSQL, Oracle, MySQL, SQL Server
Application Tools: PyCharm, Eclipse, IntelliJ
AWS Services: EMR, ECS, IAM, S3, RDS, Route 53, EC2, CloudWatch, Lambda
Operating systems: Linux, macOS, Windows
PROFESSIONAL EXPERIENCE:
Data Engineer
Confidential
Responsibilities:
- Participated in gathering and analyzing requirements and in designing technical documents for business requirements.
- Developed pipelines and executed PySpark jobs on AWS EMR for data validation between Snowflake and AWS S3 (see the sketch following this list).
- Delivered 100+ large Auto Finance datasets from Snowflake to the data lake for Global Finance consumers. Wrote PySpark jobs to monitor delivery of all these datasets every day, and created PagerDuty incidents for delivery failures.
- Optimized SnowSQL queries for loading data from source to destination tables.
- Scheduled and monitored ETL jobs using Apache Airflow (see the DAG sketch at the end of this section). Enhanced the DAG onboarding process by dockerizing the service and deploying it with AWS services (ECS, EC2, ALB, Route 53).
- Integrated the Flask application with JWT and SSO for authentication and authorization.
- Updated the application in Python to meet the data compliance policies of the respective line of business.
- Wrote pytest cases and documentation for new onboarding modules, and provided support for teams using Airflow.
- Involved in writing MapReduce programs.
- Worked autonomously within a team of data analysts to analyze, review, update, edit, clean, translate, and ensure the accuracy of customer data.
- Involved in data pipeline and ETL development and testing, across the different phases of big data projects: data acquisition, processing, monitoring, and serving via dashboards.
- Wrote test cases for the application using the unittest and pytest libraries, and supported it through debugging, bug fixing, and production support.
- Monitored system health logs and responded to any warning or failure conditions.
- Performed end-to-end system testing of the product, wrote test cases, and fixed the issues found.
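A minimal sketch of the Snowflake-to-S3 validation described above, assuming the Snowflake Spark connector is available on the EMR cluster; the connection options, table, and bucket names are placeholders rather than the actual project configuration.

    # Count-based validation between a Snowflake table and its Parquet copy in S3.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("snowflake-s3-validation").getOrCreate()

    sf_options = {
        "sfURL": "<account>.snowflakecomputing.com",  # placeholder credentials
        "sfUser": "<user>",
        "sfPassword": "<password>",
        "sfDatabase": "FINANCE",
        "sfSchema": "PUBLIC",
        "sfWarehouse": "ETL_WH",
    }

    source_df = (spark.read.format("net.snowflake.spark.snowflake")
                 .options(**sf_options)
                 .option("dbtable", "AUTO_FINANCE_DATASET")   # hypothetical table
                 .load())
    target_df = spark.read.parquet("s3://data-lake-bucket/auto_finance_dataset/")

    # A mismatch here is the kind of failure that raised a PagerDuty incident.
    src_count, tgt_count = source_df.count(), target_df.count()
    if src_count != tgt_count:
        raise ValueError(f"Row count mismatch: snowflake={src_count}, s3={tgt_count}")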
Environment: Python, Spark, AWS ECS, AWS S3, AWS EMR, Snowflake, AWS Lambda, AWS RDS, AWS EC2, PyCharm, Jenkins, Docker, Apache Airflow, PostgreSQL.
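And a minimal Apache Airflow sketch of the ETL scheduling mentioned in this section; the dag_id, schedule, and spark-submit command are illustrative placeholders, not the production setup.

    # Daily DAG that submits the PySpark validation job (Airflow 2.x style).
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    default_args = {
        "owner": "data-eng",
        "retries": 2,
        "retry_delay": timedelta(minutes=10),
    }

    with DAG(
        dag_id="snowflake_to_s3_delivery",
        default_args=default_args,
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",   # run the delivery check once a day
        catchup=False,
    ) as dag:
        run_validation = BashOperator(
            task_id="run_validation",
            bash_command="spark-submit validate_snowflake_s3.py",  # placeholder command
        )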
Hadoop Developer
Confidential
Responsibilities:
- Hands-on experience installing, configuring, and using Apache Hadoop ecosystem components such as HDFS, MapReduce, Hive, and HBase, working with JSON data.
- Improved the performance and optimization of existing algorithms in Hadoop using Java.
- Read data from the file system into Apache Spark RDDs.
- Good understanding of real-time data processing using Apache Spark.
- Ingested data using Sqoop from various RDBMSs, such as Oracle, MySQL, and Microsoft SQL Server, into HDFS.
- Mainly worked on Hive queries to categorize data for different claims; reviewed peers' Hive table creation, data loading, and queries.
- Wrote custom Hive UDFs in Java where the required functionality was too complex for built-in functions.
- Implemented partitioning, dynamic partitioning, and bucketing in Hive.
- Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and bucketing (see the sketch after this list).
- Responsible for managing test data coming from different sources.
- Involved in loading data from the UNIX file system to HDFS. Experienced in managing and reviewing Hadoop log files.
- Created and maintained technical documentation for launching Hadoop clusters and executing Hive queries.
- Created Hive tables to store processed results in a tabular format, with partitioning and bucketing by state to handle structured data.
- Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
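A minimal sketch of the external-table and dynamic-partitioning pattern above, issued through Spark SQL to keep the examples in Python; the table names, columns, and HDFS path are placeholders. Bucketed tables use the same DDL with an added CLUSTERED BY ... INTO n BUCKETS clause.

    # Partitioned Hive external table created and loaded via Spark SQL.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder.appName("hive-external-tables")
             .enableHiveSupport().getOrCreate())

    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS claims_ext (
            claim_id BIGINT,
            amount   DOUBLE
        )
        PARTITIONED BY (state STRING)
        STORED AS ORC
        LOCATION 'hdfs:///data/claims_ext'
    """)

    # Dynamic partitioning: Hive derives each row's state partition from the data.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
        INSERT OVERWRITE TABLE claims_ext PARTITION (state)
        SELECT claim_id, amount, state FROM claims_staging
    """)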
Environment: HDFS, Hive, HBase, JDBC, Spark, MapReduce, Oracle 11g, SQL Server, Spark SQL, Spark on YARN, UNIX, Java 7, MySQL, Linux, shell scripting, Agile methodologies, IntelliJ, Eclipse.
Software Engineer
Confidential
Responsibilities:
- Deployed J2EE application on Windows and Linux using JBoss application server.
- Used Git for version control and Quality Control for defect tracking.
- Developed a J2EE-based web application by implementing the Spring MVC framework.
- Involved in requirement meetings with both project managers and technical managers.
- Involved in front-end development using JSP, HTML, and CSS.
- Implemented JavaScript for client-side validations.
- Designed and developed a validation framework for field validations in the Struts framework.
- Designed and developed new J2EE components, such as value objects and servlets, and bean components, such as session beans, to incorporate business-level validations.
- Developed application code using Core Java and J2EE (Servlets, XML) in the Eclipse IDE.
- Deployed the application on Oracle WebLogic Server.
- Implemented multithreading concepts in Java classes to avoid deadlocks.
- Used a MySQL database to store data and executed SQL queries on the backend.
- Prepared and maintained the test environment.
- Tested the application before it went live to production.
- Used SVN and Git as source repositories for version control. Documented and communicated test results to the team lead on a daily basis.
Environment: SVN, Git, J2EE, XML, Servlets, SQL, MySQL, Struts, JavaScript, Spring MVC framework, Linux, JBoss, JSP, HTML, CSS, Oracle.