Sr. Data Engineer Resume
Bentonville, AR
SUMMARY
- Over 8 years of Big Data experience building highly scalable data analytics applications.
- Strong experience working with Hadoop ecosystem components such as HDFS, MapReduce, Spark, HBase, Oozie, Hive, Sqoop, Pig, Flume, and Kafka.
- Good hands-on experience working with various Hadoop distributions such as Cloudera (CDH), Hortonworks (HDP), and Amazon EMR.
- Good understanding of Distributed Systems architecture and design principles behind Parallel Computing.
- Expertise in developing production-ready Spark applications utilizing the Spark Core, DataFrames, Spark SQL, Spark ML, and Spark Streaming APIs (a brief PySpark sketch follows this summary).
- Strong experience troubleshooting Spark application failures and fine-tuning Spark applications and Hive queries for better performance.
- Worked extensively on Hive for building complex data analytics applications.
- Strong experience writing complex MapReduce jobs, including development of custom InputFormats and custom RecordReaders.
- Sound knowledge of map-side joins, reduce-side joins, shuffle and sort, distributed cache, compression techniques, and multiple Hadoop input and output formats.
- Worked with Apache NiFi to automate and manage the flow of data between systems.
- Good experience working with AWS cloud services such as S3, EMR, Redshift, Glue, and Athena.
- Deep understanding of performance tuning and partitioning for building scalable data lakes.
- Worked on building real-time data workflows using Kafka, Spark Streaming, and HBase.
- Extensive knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB.
- Solid experience working with CSV, text, SequenceFile, Avro, Parquet, ORC, and JSON data formats.
- Extensive experience performing ETL on structured and semi-structured data using Pig Latin scripts.
- Designed and implemented Hive and Pig UDFs in Java for evaluating, filtering, loading, and storing data.
- Experience connecting various Hadoop sources such as Hive, Impala, and Phoenix to Tableau for reporting.
- Good knowledge of core programming concepts such as algorithms, data structures, and collections.
- Developed core modules in large cross-platform applications using Java, JSP, Servlets, Hibernate, RESTful web services, JDBC, JavaScript, XML, and HTML.
- Extensive experience developing and deploying applications using WebLogic, Apache Tomcat, and JBoss.
- Development experience with RDBMS, including writing SQL queries, views, stored procedures, and triggers.
- Strong understanding of the Software Development Lifecycle (SDLC) and various methodologies (Waterfall, Agile).
- Worked in fast-paced Agile environments as well as traditional waterfall-based development environments.
- Passionate about working with and gaining expertise in a variety of cutting-edge Big Data technologies.
- Ability to adapt quickly to evolving technology, strong sense of responsibility and accomplishment.
- Eager to update my knowledge base constantly and learn new skills according to business needs.
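To make the Spark DataFrame and Spark SQL experience above concrete, here is a minimal, hypothetical PySpark sketch; the dataset path, table, and column names (orders, order_ts, amount) are illustrative only and not taken from any specific project.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Local session for illustration; cluster settings would differ in production.
spark = SparkSession.builder.appName("orders-analytics").getOrCreate()

# Hypothetical Parquet dataset loaded into a DataFrame.
orders = spark.read.parquet("s3://example-bucket/orders/")

# DataFrame API: total revenue per day.
daily_revenue = (
    orders
    .withColumn("order_date", F.to_date("order_ts"))
    .groupBy("order_date")
    .agg(F.sum("amount").alias("revenue"))
)

# Equivalent Spark SQL over a temporary view.
orders.createOrReplaceTempView("orders")
daily_revenue_sql = spark.sql("""
    SELECT to_date(order_ts) AS order_date, SUM(amount) AS revenue
    FROM orders
    GROUP BY to_date(order_ts)
""")

daily_revenue.show()
```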
PROFESSIONAL EXPERIENCE
Sr. Data Engineer
Confidential, Bentonville, AR
Responsibilities:
- Worked on building a centralized data lake on AWS utilizing primary services such as S3, EMR, Redshift, and Athena.
- Worked on migrating datasets and ETL workloads from on-premises systems to AWS cloud services.
- Built a series of Spark applications and Hive scripts to produce various analytical datasets needed by digital marketing teams.
- Worked extensively on building and automating data ingestion pipelines and moving terabytes of data from existing data warehouses to the cloud.
- Worked extensively on fine-tuning Spark applications and providing production support for various pipelines running in production.
- Worked closely with business and data science teams to ensure all requirements were translated accurately into the data pipelines.
- Worked on the full spectrum of data engineering pipelines: data ingestion, data transformation, and data analysis/consumption.
- Worked on automating infrastructure setup, including launching and terminating EMR clusters.
- Created Hive external tables on top of datasets loaded into S3 buckets and wrote various Hive scripts to produce a series of aggregated datasets for downstream analysis.
- Built a real-time streaming pipeline utilizing Kafka, Spark Streaming, and Redshift (a minimal sketch follows this section).
- Worked on creating Kafka producers using the Kafka Java Producer API to connect to external REST live-stream applications and produce messages to Kafka topics.
Environment: AWS S3, EMR, Redshift, Athena, Glue, Spark, Scala, Python, Java, Hive, Kafka
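Below is a minimal sketch of the kind of real-time pipeline described above: Kafka events read with Spark Structured Streaming and staged for the warehouse. The broker address, topic name, event schema, and S3 paths are placeholders; loading the staged files into Redshift (for example via COPY) is only noted, not shown, and the spark-sql-kafka connector package is assumed to be available on the cluster.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("clickstream-streaming").getOrCreate()

# Illustrative schema of the incoming JSON events.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("user_id", StringType()),
    StructField("event_type", StringType()),
    StructField("amount", DoubleType()),
])

# Read a stream from a hypothetical Kafka topic.
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder brokers
    .option("subscribe", "clickstream-events")           # placeholder topic
    .load()
)

# Parse the Kafka message value from JSON into columns.
events = (
    raw.selectExpr("CAST(value AS STRING) AS json")
    .select(F.from_json("json", event_schema).alias("e"))
    .select("e.*")
)

# Write micro-batches to an S3 staging area; a separate Redshift COPY
# step would load the staged files into the warehouse.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3://example-bucket/staging/clickstream/")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/clickstream/")
    .start()
)
query.awaitTermination()
```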
Data Analytics Engineer
Confidential, NY
Responsibilities:
- Performed analytics on data in AWS S3 using Spark, applying transformations and actions per business requirements.
- Developed and managed ETL jobs and data pipelines.
- Tuned Hive jobs by applying partitioning, bucketing, and optimized joins on Hive tables.
- Performed data transformations and analytics on a large dataset using Spark.
- Worked on building end-to-end data pipelines using Spark and storing the final data in the data warehouse.
- Involved in designing Hive schemas and developing normalized and denormalized data models.
- Tuned and optimized Spark jobs through partitioning/bucketing and driver and executor memory management (see the sketch after this section).
- Developed Hive queries and used Sqoop to move data from RDBMS sources to the data lake staging area.
- Optimized Hive queries using various file formats such as Parquet, JSON, and Avro.
- Wrote shell scripts to export log files to the Hadoop cluster through an automated process.
- Involved in the planning process of iterations under the Agile Scrum methodology.
- Applied partitioning and bucketing concepts in Hive and designed both managed and external tables.
- Worked closely with the data science team to understand requirements clearly and created Hive tables on HDFS.
- Automated end-to-end data processing pipelines and scheduled various data workflows.
- Scheduled Spark jobs using Oozie workflows on the Hadoop cluster.
- Actively participated in code reviews and meetings, and troubleshot and resolved technical issues.
Environment: Spark, AWS S3, Python, Sqoop, Hive, Kafka, Hadoop, HDFS, Agile, Unix
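A short, hypothetical PySpark sketch of the tuning pattern referenced above: setting driver/executor memory and writing Parquet output partitioned by a key so Hive queries can prune partitions. The memory values, paths, and the sale_date column are placeholders.

```python
from pyspark.sql import SparkSession

# Example memory settings; real values depend on cluster size and workload.
spark = (
    SparkSession.builder
    .appName("sales-etl")
    .config("spark.executor.memory", "4g")
    .config("spark.driver.memory", "2g")
    .getOrCreate()
)

# Hypothetical staging dataset from the data lake.
sales = spark.read.parquet("s3://example-bucket/staging/sales/")

# Repartition on the partition key to avoid many small files, then write
# Parquet partitioned by date so downstream Hive queries can prune partitions.
(
    sales
    .repartition("sale_date")
    .write
    .mode("overwrite")
    .partitionBy("sale_date")
    .parquet("s3://example-bucket/curated/sales/")
)
```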
Hadoop Developer
Confidential, Patskala, Ohio
Responsibilities:
- Designed and developed data integration and engineering workflows on big data technologies and platforms: Hadoop, Spark, MapReduce, Hive, and HBase.
- Worked in an Agile methodology and actively participated in standup calls and PI planning.
- Involved in requirements gathering and prepared design documents.
- Involved in importing data into HDFS and Hive using Sqoop, creating Hive tables, loading data, and writing Hive queries.
- Developed Hive queries and used Sqoop to move data from RDBMS sources to the Hadoop staging area.
- Handled importing of data from various data sources, performed transformations using Hive, and loaded data into the data lake.
- Experienced in handling large datasets using partitioning and Spark in-memory capabilities.
- Processed data stored in data lake and created external tables using Hive and developed scripts to ingest and repair tables that can be reused across the project.
- Developed dataflows and data processing jobs using SQL (Spark SQL and DataFrames); see the sketch after this section.
- Designed and developed MapReduce (Hive) programs to analyze and evaluate multiple solutions, considering cost factors across the business as well as operational impact.
- Involved in the planning process of iterations under the Agile Scrum methodology.
- Worked on Hive Metastore backups and on partitioning and bucketing techniques in Hive to improve performance.
- Scheduled Spark jobs using Oozie workflows on the Hadoop cluster and generated detailed design documentation for source-to-target transformations.
Environment: Spark, Python, Sqoop, Hive, Hadoop, SQL, HBase, MapReduce, HDFS, Oozie, Agile
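An illustrative Spark SQL and DataFrame sketch of the kind of processing described above: reading a Hive staging table (as populated by Sqoop) and aggregating it into a curated table. The database, table, and column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hive support lets Spark read tables registered in the Hive Metastore.
spark = (
    SparkSession.builder
    .appName("staging-aggregation")
    .enableHiveSupport()
    .getOrCreate()
)

# Spark SQL over a hypothetical Hive staging table (loaded via Sqoop).
txns = spark.sql(
    "SELECT txn_id, customer_id, amount, txn_date FROM staging.transactions"
)

# DataFrame transformations: monthly totals per customer.
monthly_totals = (
    txns
    .withColumn("txn_month", F.date_format("txn_date", "yyyy-MM"))
    .groupBy("customer_id", "txn_month")
    .agg(F.sum("amount").alias("total_amount"))
)

# Persist the result as a curated Hive table for downstream use.
monthly_totals.write.mode("overwrite").saveAsTable("curated.customer_monthly_totals")
```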
SQL Developer
Confidential
Responsibilities:
- Interacted with end users (clients) to gather and model business requirements.
- Wrote SQL queries using DDL, DML, joins, aggregations, and window functions (a small example follows this section).
- Created database objects such as tables, views, sequences, synonyms, stored procedures, functions, and triggers.
- Developed database triggers to enforce data integrity and additional referential integrity.
- Created indexes on table columns to improve query performance.
- Developed and modified SQL code to deliver new enhancements or resolve problems per customer requirements.
- Analyzed tables and indexes for performance tuning.
- Knowledge of normalization, data warehousing, data transfer, documentation, preventive maintenance, code review, automation, stored procedures, and triggers.
- Coordinated with the front-end design team to provide the necessary stored procedures, packages, and data, and wrote complex ad hoc queries as requested by users and management.
- Good knowledge of indexes and query optimization for large data volumes; troubleshooting and problem solving using Python scripts and other programming languages.
- Performed SQL database sharding and indexing procedures as required to handle heavy traffic loads.
- Performed database tuning, simplified complex SQL statements for readability, and made queries more efficient.
- Created dashboards with interactive views, trends, and drill-downs.
Environment: MySQL, SQL Server, Power BI, ETL tools, PostgreSQL
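A small example of the window-function style of query mentioned above, run from Python with MySQL Connector/Python. This assumes MySQL 8.0+ (for window functions); the connection details, the orders table, and its columns are placeholders.

```python
import mysql.connector  # assumes the mysql-connector-python package is installed

# Placeholder connection parameters.
conn = mysql.connector.connect(
    host="localhost", user="report_user", password="secret", database="sales_db"
)

# Rank each customer's orders by amount within that customer.
query = """
    SELECT customer_id,
           order_id,
           order_total,
           RANK() OVER (PARTITION BY customer_id ORDER BY order_total DESC) AS order_rank
    FROM orders
"""

cursor = conn.cursor()
cursor.execute(query)
for customer_id, order_id, order_total, order_rank in cursor:
    print(customer_id, order_id, order_total, order_rank)

cursor.close()
conn.close()
```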
Python Developer
Confidential
Responsibilities:
- Used a test-driven approach for developing the application and implemented unit tests using the Python unittest framework.
- Successfully migrated the Django database from SQLite to MySQL and then to PostgreSQL with complete data integrity.
- Worked on report writing using SQL Server Reporting Services (SSRS), creating various types of reports such as table, matrix, and chart reports, as well as web reporting by customizing URL access.
- Performed API testing using the Postman tool for request methods such as GET, POST, PUT, and DELETE on each URL to check responses and error handling.
- Created Python and Bash tools to increase the efficiency of the retail management application system and operations, including data conversion scripts and AMQP/RabbitMQ, REST, JSON, and CRUD scripts for API integration.
- Debugged and troubleshot web applications, using Git as a version control tool to collaborate and coordinate with team members.
- Developed and executed various MySQL database queries from Python using the Python MySQL connector and MySQL database packages.
- Designed and maintained databases using Python and developed a Python-based RESTful API (web service) using SQLAlchemy and PostgreSQL (a brief sketch follows this section).
- Created a web application using Python scripting for data processing, MySQL for the database, and HTML, CSS, jQuery, and Highcharts for data visualization of the served pages.
- Generated property lists for every application dynamically using Python modules such as math, glob, random, itertools, functools, NumPy, Matplotlib, Seaborn, and pandas.
- Added navigation, pagination, and column filtering, as well as adding and removing desired columns in views, utilizing Python-based GUI components.
Environment: Python, MySQL, HTML, PostgreSQL, Postman, jQuery, CSS, AMQP/RabbitMQ, REST, JSON, CRUD scripts
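A minimal sketch of how a SQLAlchemy model and query helper behind such a RESTful service might look (SQLAlchemy 1.4+ style). The Product model, connection string, and endpoint shape are hypothetical, and a PostgreSQL driver such as psycopg2 is assumed; the web framework serving the endpoint is not shown.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

# Hypothetical table backing one resource of the RESTful API.
class Product(Base):
    __tablename__ = "products"
    id = Column(Integer, primary_key=True)
    name = Column(String(100), nullable=False)
    category = Column(String(50))

# Placeholder PostgreSQL connection string; real credentials would come from config.
engine = create_engine("postgresql://user:password@localhost:5432/appdb")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

def list_products(category=None):
    """Query helper a REST endpoint might call to serve GET /products."""
    session = Session()
    try:
        query = session.query(Product)
        if category:
            query = query.filter(Product.category == category)
        return [{"id": p.id, "name": p.name, "category": p.category} for p in query.all()]
    finally:
        session.close()
```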