
Sr. Data Engineer Resume


Bentonville, AR

SUMMARY

  • Over 8 years of Big Data experience building highly scalable data analytics applications.
  • Strong experience working with Hadoop ecosystem components such as HDFS, MapReduce, Spark, HBase, Oozie, Hive, Sqoop, Pig, Flume, and Kafka.
  • Good hands-on experience working with various Hadoop distributions, including Cloudera (CDH), Hortonworks (HDP), and Amazon EMR.
  • Good understanding of distributed systems architecture and the design principles behind parallel computing.
  • Expertise in developing production-ready Spark applications utilizing the Spark Core, DataFrames, Spark SQL, Spark ML, and Spark Streaming APIs (see the sketch after this list).
  • Strong experience troubleshooting failures in Spark applications and fine-tuning Spark applications and Hive queries for better performance.
  • Worked extensively on Hive for building complex data analytical applications.
  • Strong experience writing complex MapReduce jobs, including the development of custom InputFormats and custom RecordReaders.
  • Sound knowledge of map-side joins, reduce-side joins, shuffle and sort, distributed cache, compression techniques, and multiple Hadoop input and output formats.
  • Worked with Apache NiFi to automate and manage the flow of data between systems.
  • Good experience working with AWS Cloud services such as S3, EMR, Redshift, Glue, and Athena.
  • Deep understanding of performance tuning and partitioning for building scalable data lakes.
  • Worked on building real-time data workflows using Kafka, Spark Streaming, and HBase.
  • Extensive knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB.
  • Solid experience working with CSV, text, SequenceFile, Avro, Parquet, ORC, and JSON data formats.
  • Extensive experience performing ETL on structured and semi-structured data using Pig Latin scripts.
  • Designed and implemented Hive and Pig UDFs in Java for evaluating, filtering, loading, and storing data.
  • Experience connecting various Hadoop sources such as Hive, Impala, and Phoenix to Tableau for reporting.
  • Good knowledge of core programming concepts such as algorithms, data structures, and collections.
  • Developed core modules in large cross-platform applications using Java, JSP, Servlets, Hibernate, RESTful services, JDBC, JavaScript, XML, and HTML.
  • Extensive experience developing and deploying applications on WebLogic, Apache Tomcat, and JBoss.
  • Development experience with RDBMS, including writing SQL queries, views, stored procedures, triggers, etc.
  • Strong understanding of Software Development Lifecycle (SDLC) and various methodologies (Waterfall, Agile).
  • Worked in fast-paced Agile environments as well as traditional Waterfall-based development environments.
  • Passionate about working and gaining more expertise on a variety of cutting-edge Big Data technologies.
  • Ability to adapt quickly to evolving technology, strong sense of responsibility and accomplishment.
  • Eager to update my knowledge base constantly and learn new skills according to business needs.
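
Illustrative sketch for the Spark bullet above: a minimal PySpark job that reads raw data, aggregates it with the DataFrame API, and writes partitioned Parquet. This is not project code; the paths, column names, and job name are placeholders.

    # Minimal PySpark sketch; all paths, columns, and names are illustrative.
    from pyspark.sql import SparkSession, functions as F

    spark = (SparkSession.builder
             .appName("daily-sales-aggregation")   # hypothetical job name
             .enableHiveSupport()
             .getOrCreate())

    # Read raw CSV data from S3 (placeholder bucket) with a header row.
    raw = (spark.read
           .option("header", "true")
           .csv("s3://example-bucket/raw/sales/"))

    # Aggregate daily totals per store using Spark SQL functions.
    daily = (raw
             .withColumn("sale_date", F.to_date("sale_ts"))
             .groupBy("sale_date", "store_id")
             .agg(F.sum("amount").alias("total_amount"),
                  F.countDistinct("order_id").alias("order_count")))

    # Write partitioned Parquet so downstream Hive/Athena queries can prune partitions.
    (daily.write
     .mode("overwrite")
     .partitionBy("sale_date")
     .parquet("s3://example-bucket/curated/daily_sales/"))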

PROFESSIONAL EXPERIENCE

Sr. Data Engineer

Confidential, Bentonville, AR

Responsibilities:

  • Worked on building a centralized data lake on AWS utilizing primary services such as S3, EMR, Redshift, and Athena.
  • Worked on migrating datasets and ETL workloads from on-premises systems to AWS Cloud services.
  • Built a series of Spark applications and Hive scripts to produce the analytical datasets needed by digital marketing teams.
  • Worked extensively on building and automating data ingestion pipelines and moving terabytes of data from existing data warehouses to the cloud.
  • Worked extensively on fine-tuning Spark applications and providing production support for the pipelines running in production.
  • Worked closely with business and data science teams and ensured all requirements were translated accurately into our data pipelines.
  • Worked on a full spectrum of data engineering pipelines: data ingestion, data transformations and data analysis/consumption.
  • Worked on automating infrastructure setup, including launching and terminating EMR clusters.
  • Created Hive external tables on top of datasets loaded into S3 buckets and wrote various Hive scripts to produce a series of aggregated datasets for downstream analysis.
  • Built a real-time streaming pipeline utilizing Kafka, Spark Streaming, and Redshift.
  • Created Kafka producers using the Kafka Java Producer API to connect to external REST live-stream applications and publish messages to Kafka topics (see the sketch after this list).
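
A hedged sketch of the Kafka producer pattern from the last bullet. The project used the Kafka Java Producer API; this version uses Python with the kafka-python client for brevity, and the endpoint URL, topic name, and broker addresses are placeholders.

    # Publish events from a line-delimited REST live stream to a Kafka topic.
    import json
    import requests
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=["broker1:9092", "broker2:9092"],       # placeholder brokers
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    with requests.get("https://example.com/live-stream", stream=True, timeout=30) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if line:                                              # skip keep-alive lines
                producer.send("clickstream-events", value=json.loads(line))  # hypothetical topic

    producer.flush()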

Environment: AWS S3, EMR, Redshift, Athena, Glue, Spark, Scala, Python, Java, Hive, Kafka

Data Analytics Engineer

Confidential, NY

Responsibilities:

  • Performed analytics on data stored in AWS S3 using Spark, applying transformations and actions per business requirements.
  • Developed and managed ETL jobs and data pipelines.
  • Tuned Hive jobs by applying partitioning, bucketing, and optimized joins on Hive tables.
  • Performed data transformations and analytics on a large dataset using Spark.
  • Worked on building end-to-end data pipelines using Spark and storing the final data in a data warehouse.
  • Involved in designing Hive schemas and developing normalized and denormalized data models.
  • Tuned and optimized Spark jobs through partitioning/bucketing and driver and executor memory management (see the sketch after this list).
  • Developed Hive queries and Sqoop jobs to move data from RDBMS sources to the data lake staging area.
  • Optimized Hive queries using file formats such as Parquet, JSON, and Avro.
  • Wrote shell scripts to export log files to the Hadoop cluster through an automated process.
  • Involved in the planning process of iterations under the Agile Scrum methodology.
  • Experience with partitioning and bucketing concepts in Hive, designing both managed and external tables.
  • Worked closely with the data science team to understand requirements clearly and create Hive tables on HDFS.
  • Automated end-to-end data processing pipelines and scheduled various data workflows.
  • Scheduled Spark jobs using Oozie workflows in the Hadoop cluster.
  • Actively participated in code reviews and meetings, troubleshooting and resolving technical issues.
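
Illustrative sketch for the Spark tuning bullet above: executor/driver memory and shuffle-partition settings, plus a partitioned and bucketed Hive table write. The memory values, table names, and columns are placeholders, not the project's actual settings.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("claims-transform")                    # hypothetical job name
             .config("spark.executor.memory", "8g")          # executor memory tuning
             .config("spark.driver.memory", "4g")            # driver memory tuning
             .config("spark.sql.shuffle.partitions", "400")  # shuffle parallelism
             .enableHiveSupport()
             .getOrCreate())

    # Staged data loaded earlier (e.g., via Sqoop) into a Hive staging database.
    df = spark.table("staging.claims_raw")

    # Partition by year and bucket/sort by the join key so joins on member_id shuffle less.
    (df.write
     .mode("overwrite")
     .partitionBy("claim_year")
     .bucketBy(32, "member_id")
     .sortBy("member_id")
     .saveAsTable("analytics.claims_curated"))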

Environment: Spark, AWS S3, Python, Sqoop, Hive, Kafka, Hadoop, HDFS, Agile, Unix

Hadoop Developer

Confidential, Patskala, Ohio

Responsibilities:

  • Designed and developed data integration/engineering workflows on big data technologies and platforms: Hadoop, Spark, MapReduce, Hive, and HBase.
  • Worked in an Agile methodology and actively participated in stand-up calls and PI planning.
  • Involved in requirements gathering and prepared design documents.
  • Imported data into HDFS and Hive using Sqoop; created Hive tables, loaded them with data, and wrote Hive queries.
  • Developed Hive queries and Sqoop jobs to move data from RDBMS sources to the Hadoop staging area.
  • Handled importing of data from various data sources, performed transformations using Hive, and loaded the data into the data lake.
  • Experienced in handling large datasets using partitioning and Spark's in-memory capabilities.
  • Processed data stored in the data lake, created external tables using Hive, and developed reusable scripts to ingest data and repair tables across the project (see the sketch after this list).
  • Developed data flows and processing logic using SQL (Spark SQL and DataFrames).
  • Designed and developed MapReduce (Hive) programs to analyze and evaluate multiple solutions, considering cost factors across the business as well as operational impact.
  • Involved in the planning process of iterations under the Agile Scrum methodology.
  • Worked on Hive metastore backups and on partitioning and bucketing techniques in Hive to improve performance.
  • Scheduled Spark jobs using Oozie workflows in the Hadoop cluster and generated detailed design documentation for the source-to-target transformations.
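
A minimal sketch of the reusable ingest-and-repair scripts mentioned above: it creates a partitioned external Hive table over a data lake path and runs MSCK REPAIR TABLE to register partitions written outside Hive. The database, table, columns, and location are illustrative.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("external-table-repair")
             .enableHiveSupport()
             .getOrCreate())

    def create_and_repair(db, table, location):
        """Create a partitioned external Hive table over an HDFS path and sync its partitions."""
        spark.sql(f"""
            CREATE EXTERNAL TABLE IF NOT EXISTS {db}.{table} (
                event_id STRING,
                event_ts TIMESTAMP,
                payload  STRING
            )
            PARTITIONED BY (event_date STRING)
            STORED AS PARQUET
            LOCATION '{location}'
        """)
        # Register partition directories that were written directly to the data lake.
        spark.sql(f"MSCK REPAIR TABLE {db}.{table}")

    create_and_repair("datalake", "web_events", "hdfs:///data/lake/web_events")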

Environment: Spark, Python, Sqoop, Hive, Hadoop, SQL, HBase, MapReduce, HDFS, Oozie, Agile

SQL Developer

Confidential

Responsibilities:

  • Interacted with end users (clients) to gather and model business requirements.
  • Expertise in writing SQL queries using DDL, DML, joins, aggregations, and window functions.
  • Created database objects such as tables, views, sequences, synonyms, stored procedures, functions, and triggers.
  • Developed database triggers to enforce data integrity and additional referential integrity.
  • Created indexes on table columns to improve query performance (see the sketch after this list).
  • Developed and modified SQL code to deliver enhancements and resolve problems per customer requirements.
  • Analyzed tables and indexes for performance tuning.
  • Knowledge of normalization, data warehousing, data transfer, documentation, preventive maintenance, code review, automation, stored procedures, and triggers.
  • Coordinated with the front-end design team to provide the necessary stored procedures, packages, and data, and wrote complex ad hoc queries as requested by users and management.
  • Good knowledge of indexes and query optimization for large data volumes; troubleshooting and problem-solving using Python scripts and other programming languages.
  • Performed SQL database sharding and indexing procedures as required to handle heavy traffic loads.
  • Performed database tuning, simplified complex SQL statements for readability, and made queries more efficient.
  • Created dashboards with interactive views, trends, and drill-downs.
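
Illustrative sketch for the indexing and window-function bullets above, run from Python via MySQL Connector/Python (assuming MySQL 8+ for window functions). The connection details, table, and columns are invented for the example.

    import mysql.connector

    conn = mysql.connector.connect(
        host="localhost", user="report_user", password="***", database="sales"  # placeholders
    )
    cur = conn.cursor()

    # Index the columns used in filters and joins to avoid full table scans.
    cur.execute("CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date)")

    # Window function: rank each customer's orders by amount within a calendar month.
    cur.execute("""
        SELECT customer_id,
               order_id,
               amount,
               RANK() OVER (PARTITION BY customer_id, DATE_FORMAT(order_date, '%Y-%m')
                            ORDER BY amount DESC) AS amount_rank
        FROM orders
        WHERE order_date >= '2019-01-01'
    """)
    for row in cur.fetchall():
        print(row)

    cur.close()
    conn.close()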

Environment: MySQL, SQL Server, Power BI, ETL tools, PostgreSQL

Python Developer

Confidential

Responsibilities:

  • Used a test-driven approach to develop the application and implemented unit tests using the Python unittest framework.
  • Successfully migrated the Django database from SQLite to MySQL and then to PostgreSQL with complete data integrity.
  • Worked on report writing using SQL Server Reporting Services (SSRS), creating various types of reports such as table, matrix, and chart reports, as well as web reporting by customizing URL access.
  • Performed API testing using the Postman tool, exercising GET, POST, PUT, and DELETE request methods against each URL to check responses and error handling.
  • Created Python and Bash tools to increase the efficiency of the retail management application system and operations, including data conversion scripts and AMQP/RabbitMQ, REST, JSON, and CRUD scripts for API integration.
  • Debugged and troubleshot web applications, using Git as the version-control tool to collaborate and coordinate with team members.
  • Developed and executed various MySQL database queries from Python using the MySQL Connector/Python database package.
  • Designed and maintained databases using Python and developed a Python-based RESTful API using SQLAlchemy and PostgreSQL (see the sketch after this list).
  • Created a web application using Python scripting for data processing, MySQL for the database, and HTML, CSS, jQuery, and Highcharts for data visualization of the served pages.
  • Dynamically generated property lists for every application using Python modules such as math, glob, random, itertools, functools, NumPy, Matplotlib, Seaborn, and pandas.
  • Added navigation, pagination, column filtering, and the ability to add and remove columns from views using Python-based GUI components.
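
A hedged sketch of a Python REST API backed by SQLAlchemy and PostgreSQL, in the spirit of the bullet above. Flask is assumed for the web layer, and the model, route, and connection string are illustrative rather than the original application's.

    from flask import Flask, jsonify
    from sqlalchemy import Column, Integer, String, create_engine
    from sqlalchemy.orm import declarative_base, sessionmaker

    app = Flask(__name__)
    engine = create_engine("postgresql://app_user:***@localhost:5432/appdb")  # placeholder DSN
    Session = sessionmaker(bind=engine)
    Base = declarative_base()

    class Product(Base):
        __tablename__ = "products"
        id = Column(Integer, primary_key=True)
        name = Column(String, nullable=False)
        price_cents = Column(Integer, nullable=False)

    Base.metadata.create_all(engine)  # create the table if it does not already exist

    @app.route("/products", methods=["GET"])
    def list_products():
        # Return all products as JSON, closing the session when done.
        session = Session()
        try:
            rows = session.query(Product).all()
            return jsonify([{"id": p.id, "name": p.name, "price_cents": p.price_cents}
                            for p in rows])
        finally:
            session.close()

    if __name__ == "__main__":
        app.run(port=5000)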

Environment: Python, MySQL, HTML, PostgreSQL, Postman, jQuery, CSS, AMQP/RabbitMQ, REST, JSON, CRUD scripts
