Sr. Data Engineer Resume
Bentonville, AR
SUMMARY
- Over 8 years of Big Data experience building highly scalable data analytics applications.
- Strong experience working with Hadoop ecosystem components such as HDFS, MapReduce, Spark, HBase, Oozie, Hive, Sqoop, Pig, Flume, and Kafka.
- Good hands-on experience working with various Hadoop distributions such as Cloudera (CDH), Hortonworks (HDP), and Amazon EMR.
- Good understanding of Distributed Systems architecture and design principles behind Parallel Computing.
- Expertise in developing production-ready Spark applications utilizing the Spark Core, DataFrames, Spark SQL, Spark ML, and Spark Streaming APIs (a brief PySpark sketch follows this summary).
- Strong experience troubleshooting Spark application failures and fine-tuning Spark applications and Hive queries for better performance.
- Worked extensively on Hive for building complex data analytics applications.
- Strong experience writing complex MapReduce jobs, including development of custom InputFormats and custom RecordReaders.
- Sound knowledge of map-side joins, reduce-side joins, shuffle and sort, distributed cache, compression techniques, and multiple Hadoop input and output formats.
- Worked with Apache NiFi to automate and manage the flow of data between systems.
- Good experience working with AWS cloud services such as S3, EMR, Redshift, Glue, and Athena.
- Deep understanding of performance tuning and partitioning for building scalable data lakes.
- Worked on building real-time data workflows using Kafka, Spark Streaming, and HBase.
- Extensive knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB.
- Solid experience working with CSV, text, SequenceFile, Avro, Parquet, ORC, and JSON data formats.
- Extensive experience performing ETL on structured and semi-structured data using Pig Latin scripts.
- Designed and implemented Hive and Pig UDFs in Java for evaluating, filtering, loading, and storing data.
- Experience connecting various Hadoop sources such as Hive, Impala, and Phoenix to Tableau for reporting.
- Good knowledge of core programming concepts such as algorithms, data structures, and collections.
- Developed core modules in large cross-platform applications using Java, JSP, Servlets, Hibernate, RESTful web services, JDBC, JavaScript, XML, and HTML.
- Extensive experience developing and deploying applications using WebLogic, Apache Tomcat, and JBoss.
- Development experience with RDBMS, including writing SQL queries, views, stored procedures, and triggers.
- Strong understanding of the Software Development Lifecycle (SDLC) and various methodologies (Waterfall, Agile).
- Worked in fast-paced Agile environments as well as traditional waterfall-based development environments.
- Passionate about working with and gaining expertise in a variety of cutting-edge Big Data technologies.
- Ability to adapt quickly to evolving technology, strong sense of responsibility and accomplishment.
- Eager to update my knowledge base constantly and learn new skills according to business needs.
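To make the Spark DataFrame and Spark SQL experience above concrete, here is a minimal, hypothetical PySpark sketch; the dataset path, table, and column names (orders, order_ts, amount) are illustrative only and not taken from any specific project.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Local session for illustration; cluster settings would differ in production.
spark = SparkSession.builder.appName("orders-analytics").getOrCreate()

# Hypothetical Parquet dataset loaded into a DataFrame.
orders = spark.read.parquet("s3://example-bucket/orders/")

# DataFrame API: total revenue per day.
daily_revenue = (
    orders
    .withColumn("order_date", F.to_date("order_ts"))
    .groupBy("order_date")
    .agg(F.sum("amount").alias("revenue"))
)

# Equivalent Spark SQL over a temporary view.
orders.createOrReplaceTempView("orders")
daily_revenue_sql = spark.sql("""
    SELECT to_date(order_ts) AS order_date, SUM(amount) AS revenue
    FROM orders
    GROUP BY to_date(order_ts)
""")

daily_revenue.show()
```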
PROFESSIONAL EXPERIENCE
Sr. Data Engineer
Confidential, Bentonville, AR
Responsibilities:
- Worked on building a centralized data lake on AWS utilizing primary services such as S3, EMR, Redshift, and Athena.
- Worked on migrating datasets and ETL workloads from on-premises systems to AWS cloud services.
- Built a series of Spark applications and Hive scripts to produce various analytical datasets needed by digital marketing teams.
- Worked extensively on building and automating data ingestion pipelines and moving terabytes of data from existing data warehouses to the cloud.
- Worked extensively on fine-tuning Spark applications and providing production support for various pipelines running in production.
- Worked closely with business and data science teams to ensure all requirements were translated accurately into the data pipelines.
- Worked on the full spectrum of data engineering pipelines: data ingestion, data transformation, and data analysis/consumption.
- Worked on automating infrastructure setup, including launching and terminating EMR clusters.
- Created Hive external tables on top of datasets loaded into S3 buckets and wrote various Hive scripts to produce a series of aggregated datasets for downstream analysis.
- Built a real-time streaming pipeline utilizing Kafka, Spark Streaming, and Redshift (a minimal sketch follows this section).
- Worked on creating Kafka producers using the Kafka Java Producer API to connect to external REST live-stream applications and produce messages to Kafka topics.
Environment: AWS S3, EMR, Redshift, Athena, Glue, Spark, Scala, Python, Java, Hive, Kafka
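Below is a minimal sketch of the kind of real-time pipeline described above: Kafka events read with Spark Structured Streaming and staged for the warehouse. The broker address, topic name, event schema, and S3 paths are placeholders; loading the staged files into Redshift (for example via COPY) is only noted, not shown, and the spark-sql-kafka connector package is assumed to be available on the cluster.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("clickstream-streaming").getOrCreate()

# Illustrative schema of the incoming JSON events.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("user_id", StringType()),
    StructField("event_type", StringType()),
    StructField("amount", DoubleType()),
])

# Read a stream from a hypothetical Kafka topic.
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder brokers
    .option("subscribe", "clickstream-events")           # placeholder topic
    .load()
)

# Parse the Kafka message value from JSON into columns.
events = (
    raw.selectExpr("CAST(value AS STRING) AS json")
    .select(F.from_json("json", event_schema).alias("e"))
    .select("e.*")
)

# Write micro-batches to an S3 staging area; a separate Redshift COPY
# step would load the staged files into the warehouse.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3://example-bucket/staging/clickstream/")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/clickstream/")
    .start()
)
query.awaitTermination()
```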
Data Analytics Engineer
Confidential, NY
Responsibilities:
- Performed analytics on data in AWS S3 using Spark, applying transformations and actions per business requirements.
- Developed and managed ETL jobs and data pipelines.
- Tuned Hive jobs by applying partitioning, bucketing, and optimized joins on Hive tables.
- Performed data transformations and analytics on a large dataset using Spark.
- Worked on building end-to-end data pipelines using Spark and storing the final data in the data warehouse.
- Involved in designing Hive schemas and developing normalized and denormalized data models.
- Tuned and optimized Spark jobs through partitioning/bucketing and driver and executor memory management (see the sketch after this section).
- Developed Hive queries and used Sqoop to move data from RDBMS sources to the data lake staging area.
- Optimized Hive queries using various file formats such as Parquet, JSON, and Avro.
- Wrote shell scripts to export log files to the Hadoop cluster through an automated process.
- Involved in the planning process of iterations under the Agile Scrum methodology.
- Applied partitioning and bucketing concepts in Hive and designed both managed and external tables.
- Worked closely with the data science team to understand requirements clearly and created Hive tables on HDFS.
- Automated end-to-end data processing pipelines and scheduled various data workflows.
- Scheduled Spark jobs using Oozie workflows on the Hadoop cluster.
- Actively participated in code reviews and meetings, and troubleshot and resolved technical issues.
Environment: Spark, AWS S3, Python, Sqoop, Hive, Kafka, Hadoop, HDFS, Agile, Unix
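A short, hypothetical PySpark sketch of the tuning pattern referenced above: setting driver/executor memory and writing Parquet output partitioned by a key so Hive queries can prune partitions. The memory values, paths, and the sale_date column are placeholders.

```python
from pyspark.sql import SparkSession

# Example memory settings; real values depend on cluster size and workload.
spark = (
    SparkSession.builder
    .appName("sales-etl")
    .config("spark.executor.memory", "4g")
    .config("spark.driver.memory", "2g")
    .getOrCreate()
)

# Hypothetical staging dataset from the data lake.
sales = spark.read.parquet("s3://example-bucket/staging/sales/")

# Repartition on the partition key to avoid many small files, then write
# Parquet partitioned by date so downstream Hive queries can prune partitions.
(
    sales
    .repartition("sale_date")
    .write
    .mode("overwrite")
    .partitionBy("sale_date")
    .parquet("s3://example-bucket/curated/sales/")
)
```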
Hadoop Developer
Confidential, Patskala, Ohio
Responsibilities:
- Designed and developed data integration and engineering workflows on big data technologies and platforms: Hadoop, Spark, MapReduce, Hive, and HBase.
- Worked in an Agile methodology and actively participated in standup calls and PI planning.
- Involved in requirements gathering and prepared design documents.
- Involved in importing data into HDFS and Hive using Sqoop, creating Hive tables, loading data, and writing Hive queries.
- Developed Hive queries and used Sqoop to move data from RDBMS sources to the Hadoop staging area.
- Handled importing of data from various data sources, performed transformations using Hive, and loaded data into the data lake.
- Experienced in handling large datasets using partitioning and Spark in-memory capabilities.
- Processed data stored in data lake and created external tables using Hive and developed scripts to ingest and repair tables that can be reused across the project.
- Developed dataflows and data processing jobs using SQL (Spark SQL and DataFrames); see the sketch after this section.
- Designed and developed MapReduce (Hive) programs to analyze and evaluate multiple solutions, considering cost factors across the business as well as operational impact.
- Involved in the planning process of iterations under the Agile Scrum methodology.
- Worked on Hive Metastore backups and on partitioning and bucketing techniques in Hive to improve performance.
- Scheduled Spark jobs using Oozie workflows on the Hadoop cluster and generated detailed design documentation for source-to-target transformations.
Environment: Spark, Python, Sqoop, Hive, Hadoop, SQL, HBase, MapReduce, HDFS, Oozie, Agile
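An illustrative Spark SQL and DataFrame sketch of the kind of processing described above: reading a Hive staging table (as populated by Sqoop) and aggregating it into a curated table. The database, table, and column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hive support lets Spark read tables registered in the Hive Metastore.
spark = (
    SparkSession.builder
    .appName("staging-aggregation")
    .enableHiveSupport()
    .getOrCreate()
)

# Spark SQL over a hypothetical Hive staging table (loaded via Sqoop).
txns = spark.sql(
    "SELECT txn_id, customer_id, amount, txn_date FROM staging.transactions"
)

# DataFrame transformations: monthly totals per customer.
monthly_totals = (
    txns
    .withColumn("txn_month", F.date_format("txn_date", "yyyy-MM"))
    .groupBy("customer_id", "txn_month")
    .agg(F.sum("amount").alias("total_amount"))
)

# Persist the result as a curated Hive table for downstream use.
monthly_totals.write.mode("overwrite").saveAsTable("curated.customer_monthly_totals")
```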
SQL Developer
Confidential
Responsibilities:
- Interacted with end users (clients) to gather and model business requirements.
- Wrote SQL queries using DDL, DML, joins, aggregations, and window functions (a small example follows this section).
- Created database objects such as tables, views, sequences, synonyms, stored procedures, functions, and triggers.
- Developed database triggers to enforce data integrity and additional referential integrity.
- Created indexes on table columns to improve query performance.
- Developed and modified SQL code to deliver new enhancements or resolve problems per customer requirements.
- Analyzed tables and indexes for performance tuning.
- Knowledge of normalization, data warehousing, data transfer, documentation, preventive maintenance, code review, automation, stored procedures, and triggers.
- Coordinated with the front-end design team to provide the necessary stored procedures, packages, and data, and wrote complex ad hoc queries as requested by users and management.
- Good knowledge of indexes and query optimization for large data volumes; troubleshooting and problem solving using Python scripts and other programming languages.
- Performed SQL database sharding and indexing procedures as required to handle heavy traffic loads.
- Performed database tuning, simplified complex SQL statements for readability, and made queries more efficient.
- Created dashboards with interactive views, trends, and drill-downs.
Environment: MySQL, SQL Server, Power BI, ETL tools, PostgreSQL
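A small example of the window-function style of query mentioned above, run from Python with MySQL Connector/Python. This assumes MySQL 8.0+ (for window functions); the connection details, the orders table, and its columns are placeholders.

```python
import mysql.connector  # assumes the mysql-connector-python package is installed

# Placeholder connection parameters.
conn = mysql.connector.connect(
    host="localhost", user="report_user", password="secret", database="sales_db"
)

# Rank each customer's orders by amount within that customer.
query = """
    SELECT customer_id,
           order_id,
           order_total,
           RANK() OVER (PARTITION BY customer_id ORDER BY order_total DESC) AS order_rank
    FROM orders
"""

cursor = conn.cursor()
cursor.execute(query)
for customer_id, order_id, order_total, order_rank in cursor:
    print(customer_id, order_id, order_total, order_rank)

cursor.close()
conn.close()
```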
Python Developer
Confidential
Responsibilities:
- Used a test-driven approach for developing the application and implemented unit tests using the Python unittest framework.
- Successfully migrated the Django database from SQLite to MySQL and then to PostgreSQL with complete data integrity.
- Worked on report writing using SQL Server Reporting Services (SSRS), creating various types of reports such as table, matrix, and chart reports, as well as web reporting by customizing URL access.
- Performed API testing using the Postman tool for request methods such as GET, POST, PUT, and DELETE on each URL to check responses and error handling.
- Created Python and Bash tools to increase the efficiency of the retail management application system and operations, including data conversion scripts and AMQP/RabbitMQ, REST, JSON, and CRUD scripts for API integration.
- Debugged and troubleshot web applications, using Git as a version control tool to collaborate and coordinate with team members.
- Developed and executed various MySQL database queries from Python using the Python MySQL connector and MySQL database packages.
- Designed and maintained databases using Python and developed a Python-based RESTful API (web service) using SQLAlchemy and PostgreSQL (a brief sketch follows this section).
- Created a web application using Python scripting for data processing, MySQL for the database, and HTML, CSS, jQuery, and Highcharts for data visualization of the served pages.
- Generated property lists for every application dynamically using Python modules such as math, glob, random, itertools, functools, NumPy, Matplotlib, Seaborn, and pandas.
- Added navigation, pagination, and column filtering, as well as adding and removing desired columns in views, utilizing Python-based GUI components.
Environment: Python, MySQL, HTML, PostgreSQL, Postman, jQuery, CSS, AMQP/RabbitMQ, REST, JSON, CRUD scripts
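A minimal sketch of how a SQLAlchemy model and query helper behind such a RESTful service might look (SQLAlchemy 1.4+ style). The Product model, connection string, and endpoint shape are hypothetical, and a PostgreSQL driver such as psycopg2 is assumed; the web framework serving the endpoint is not shown.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

# Hypothetical table backing one resource of the RESTful API.
class Product(Base):
    __tablename__ = "products"
    id = Column(Integer, primary_key=True)
    name = Column(String(100), nullable=False)
    category = Column(String(50))

# Placeholder PostgreSQL connection string; real credentials would come from config.
engine = create_engine("postgresql://user:password@localhost:5432/appdb")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

def list_products(category=None):
    """Query helper a REST endpoint might call to serve GET /products."""
    session = Session()
    try:
        query = session.query(Product)
        if category:
            query = query.filter(Product.category == category)
        return [{"id": p.id, "name": p.name, "category": p.category} for p in query.all()]
    finally:
        session.close()
```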