
Sr. Big Data Engineer Resume


Boston, MA

SUMMARY

  • Over 10 years of solid work experience in the Data Engineering field, with skills in analyzing, designing, developing, testing, and deploying various software applications.
  • Over 4 years of working experience in ETL development.
  • Excellent working experience in Scrum/Agile framework and Waterfall project execution methodologies.
  • Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing (a minimal sketch follows this list).
  • Hands-on working experience with cloud technologies such as AWS and MS Azure (Azure Synapse, ADF, Blob Storage, Azure Databricks).
  • Good understanding and hands-on experience with AWS S3, EC2 and Redshift.
  • Experience in data management and implementation of Big Data applications using Spark and Hadoop frameworks.
  • Excellent working experience and sound knowledge of the Informatica and Talend ETL tools. Expertise in reusability, parameterization, workflow design, and designing and developing ETL mappings and scripts.
  • Good understanding of ETL specifications and able to build ETL applications such as mappings on a daily basis.
  • Expertise in UNIX shell scripting.
  • Extensive knowledge of RDBMS concepts, PL/SQL, stored procedures, and normal forms.
  • Strong experience and knowledge of NoSQL databases such as MongoDB, HBase, Azure SQL DB and Cassandra.
  • Experience in migrating data using Sqoop from HDFS and Hive to relational database systems and vice versa, according to client requirements.
  • Experience with RDBMS like SQL Server, MySQL, Oracle and data warehouses like Teradata and Netezza.
  • Expertise in setting up load strategies and dynamically passing parameters to mappings and workflows in Informatica, as well as to workflows and data flows in SAP BusinessObjects Data Services.
  • Demonstrated ability to lead projects from planning through completion under fast paced and time sensitive environments.
  • Excellent knowledge of planning, estimation, project coordination and leadership in managing large scale projects.
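
A minimal PySpark Structured Streaming sketch of the Kafka integration noted above; the broker address, topic name, payload schema, and output paths are hypothetical placeholders, and the spark-sql-kafka connector is assumed to be available on the cluster.

    # Kafka -> Spark Structured Streaming -> HDFS sketch (placeholders throughout).
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

    # Assumed payload schema for the incoming JSON events.
    schema = StructType([
        StructField("event_id", StringType()),
        StructField("amount", DoubleType()),
    ])

    # Read the Kafka topic as a streaming DataFrame and parse the JSON value.
    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
        .option("subscribe", "events")                       # placeholder topic
        .load()
        .select(from_json(col("value").cast("string"), schema).alias("e"))
        .select("e.*")
    )

    # Persist the parsed stream to HDFS as Parquet with checkpointing.
    (events.writeStream.format("parquet")
           .option("path", "/data/events")                       # placeholder path
           .option("checkpointLocation", "/checkpoints/events")  # placeholder path
           .start())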

TECHNICAL SKILLS

Big Data & Hadoop Ecosystem: MapReduce, Spark 2.3, HBase 1.2, Hive 2.3, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hadoop 3.0

ETL Tools: Informatica 10.1/9.6.1 (PowerCenter/PowerMart: Designer, Workflow Manager, Workflow Monitor, Server Manager, PowerConnect), Talend, IDQ, TOS, TIS

NoSQL DB: HBase, Azure SQL DB, Cassandra 3.11, Big Table

Reporting Tools: Power BI, Tableau and Crystal Reports 9

Cloud Platforms: AWS (EC2, S3, Redshift), MS Azure (Azure Synapse, ADF, Blob Storage, Azure Databricks), GCP (BigQuery, Google SDK)

Programming Languages: PySpark, Python, SQL, PL/SQL, UNIX shell Scripting, AWK

RDBMS: Microsoft SQL Server 2017, Teradata 15.0, Oracle 12c, and MS Access

Operating Systems: Microsoft Windows 8 and 10, UNIX and Linux.

Methodologies: Agile, RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Waterfall Model.

PROFESSIONAL EXPERIENCE

Confidential - Boston, MA

Sr. Big Data Engineer

Responsibilities:

  • Working as a Big Data Engineer, collaborating with other Product Engineering team members to develop, test, and support data-related initiatives.
  • Assisted in leading the plan, build, and run states within the Enterprise Analytics Team.
  • Led the estimation, reviewed the estimates, identified complexities, and communicated them to all the stakeholders.
  • Engaged in solving and supporting real business issues using knowledge of the Hadoop Distributed File System and open-source frameworks.
  • Responsible for data governance rules and standards to maintain the consistency of business element names across the different data layers.
  • Built data pipelines that enable faster, better, data-informed decision-making within the business.
  • Identified data within different data stores, such as tables, files, folders, and documents, to create a dataset in a pipeline using Azure HDInsight.
  • Performed detailed analysis of business problems and technical environments and used this analysis to design the solution and maintain the data architecture.
  • Migrated the on-premises environment to the cloud using MS Azure.
  • Worked on migration of data from on-premises SQL Server to cloud databases (Azure Synapse Analytics (DW) and Azure SQL DB).
  • Performed data flow transformations using the Data Flow activity.
  • Performed ongoing monitoring, automation, and refinement of data engineering solutions.
  • Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks (a minimal sketch follows this list).
  • Developed mapping document to map columns from source to target.
  • Created Azure Data Factory (ADF) pipelines using Azure PolyBase and Azure Blob Storage.
  • Performed ETL using Azure Databricks.
  • Wrote UNIX shell scripts to support and automate the ETL process.
  • Worked on Python scripting to automate script generation; performed data curation using Azure Databricks.
  • Used data integration to manage data with speed and scalability using the Apache Spark engine in Azure Databricks.
  • Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
  • Designed efficient and robust Hadoop solutions for performance improvement and end-user experiences.
  • Worked in a Hadoop ecosystem implementation/administration, installing software patches along with system upgrades and configuration.
  • Performed data transformations in Hive and used partitions and buckets for performance improvements.
  • Continuously monitored and managed data pipeline (CI/CD) performance alongside applications from a single console with Azure Monitor.
  • Ingested data into HDFS using Sqoop and scheduled an incremental load to HDFS.
  • Worked with Hadoop infrastructure to store data in HDFS and used Hive SQL to migrate the underlying SQL codebase to Azure.
  • Extensively involved in writing PL/SQL, stored procedures, functions and packages.
  • Created partitioned tables in Hive, designed a data warehouse using Hive external tables, and created Hive queries for analysis.
  • Performed data scrubbing and processing with Apache NiFi, and used it for workflow automation and coordination.
  • Developed Simple to complex streaming jobs using Python and Hive.
  • Optimized Hive queries to extract the customer information from HDFS.
  • Involved in scheduling Oozie workflow engine to run multiple Hive jobs.
  • Analyzed partitioned and bucketed data using Hive and computed various metrics for reporting.
  • Built Azure Data Warehouse Table Data sets for Power BI Reports.
  • Working on BI reporting with AtScale OLAP for Big Data.
  • Developed customized classes for serialization and deserialization in Hadoop.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
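
A minimal sketch of the kind of PySpark transformation run in Azure Databricks and orchestrated from ADF, as referenced in the list above; the storage URI, column names, and target table are hypothetical placeholders.

    # Databricks-style PySpark curation sketch (placeholders throughout).
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, to_date

    spark = SparkSession.builder.appName("adf-databricks-sketch").getOrCreate()

    # Read raw data landed in Blob Storage (wasbs URI is a placeholder).
    raw = spark.read.parquet("wasbs://raw@storageacct.blob.core.windows.net/orders/")

    # Example curation: type the date column and drop rows missing the key.
    curated = (
        raw.withColumn("order_date", to_date(col("order_date")))
           .filter(col("order_id").isNotNull())
    )

    # Persist to a partitioned table for downstream Synapse / Power BI use.
    (curated.write.mode("overwrite")
            .partitionBy("order_date")
            .saveAsTable("curated.orders"))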

Environment: Hadoop, Spark, Kafka, Azure Databricks, ADF, Python, PySpark, HDFS, ETL, Agile & Scrum

Confidential - New York, NY

Big Data Engineer

Responsibilities:

  • As a Big Data Engineer, participated in Agile Scrum meetings to help manage and organize a team of developers, with regular code review sessions.
  • Participated in code reviews, enhancement discussions, maintenance of existing pipelines and systems, and testing and bug-fix activities on an ongoing basis.
  • Worked closely with the business analysts to convert business requirements into technical requirements and prepared low- and high-level documentation.
  • Used AWS Cloud with Infrastructure Provisioning / Configuration.
  • Worked on Spark, improving the performance and optimization of the existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Developed ETL processes in AWS Glue to migrate data from external sources like S3 (ORC/Parquet/text files) into AWS Redshift (a minimal sketch follows this list).
  • Worked on ingesting data through cleansing and transformations, leveraging AWS Lambda, AWS Glue, and Step Functions.
  • Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for huge volumes of data.
  • Involved in daily Scrum meetings to discuss the development/progress and was active in making Scrum meetings more productive.
  • Worked in Python to build data pipelines after the data was loaded from Kafka.
  • Used Kafka streams to configure Spark Streaming to get information and then store it in HDFS.
  • Worked on loading data into Spark RDDs and performing advanced procedures such as text analytics using Spark's in-memory computation capabilities to generate the output response.
  • Implemented usage of Amazon EMR for processing Big Data across a Hadoop Cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3)
  • Created AWS Lambda functions and assigned IAM roles to schedule Python scripts using CloudWatch triggers to support the infrastructure needs (SQS, EventBridge, SNS).
  • Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala and Python.
  • Integrated Kafka-Spark streaming for high efficiency throughput and reliability.
  • Developed a Python script to hit REST APIs and extract data to AWS S3.
  • Conducted ETL data integration, cleansing, and transformations using AWS Glue Spark scripts.
  • Developed Oozie workflows for scheduling and orchestrating the ETL process.
  • Developed ETL mappings using different transform components.
  • Worked on functions in Lambda that aggregate data from incoming events and then stored the resulting data in Amazon DynamoDB.
  • Deployed the project on Amazon EMR with S3 connectivity, with S3 serving as backup storage.
  • Designed and developed ETL jobs to extract data from Oracle and load it into a data mart in Redshift.
  • Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
  • Used JSON schemas to define table and column mappings from S3 data to Redshift.
  • Connected Redshift to Tableau to create dynamic dashboards for the analytics team.
  • Used Jira to track issues and change management.
  • Involved in creating Jenkins jobs for CI/CD using GIT, Maven and Bash scripting.
  • Coordinated in all testing phases and worked closely with the performance testing team to create a baseline for the new application.
  • Assisted application development teams during application design and development for highly complex and critical data projects.
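
A minimal AWS Glue (PySpark) sketch of the S3-to-Redshift load described in the list above; the S3 paths, catalog connection name, and table names are hypothetical placeholders.

    # Glue job sketch: read Parquet from S3, filter, load into Redshift.
    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())

    # Read Parquet files landed in S3 into a DynamicFrame (placeholder path).
    source = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": ["s3://raw-bucket/orders/"]},
        format="parquet",
    )

    # Drop records missing the business key before loading.
    cleaned = source.filter(lambda rec: rec["order_id"] is not None)

    # Load into Redshift through a Glue catalog connection (placeholder names).
    glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=cleaned,
        catalog_connection="redshift-conn",
        connection_options={"dbtable": "public.orders", "database": "analytics"},
        redshift_tmp_dir="s3://temp-bucket/redshift/",
    )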

Environment: Spark 3.3, AWS S3, Redshift, Glue, EMR, IAM, EC2, Tableau, Jenkins, Jira, Python, Kafka, Agile.

Confidential - Richmond, VA

Data Engineer

Responsibilities:

  • As a Data Engineer, I was responsible for building a data lake as a cloud-based solution in AWS using Apache Spark and Hadoop, which was the objective of this project.
  • Involved in Agile methodologies, daily Scrum meetings, and Sprint planning.
  • Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
  • Used AWS Cloud and On-Premise environments with Infrastructure Provisioning/ Configuration.
  • Used EMR (Elastic MapReduce) to perform big data operations in AWS.
  • Used the Agile Scrum methodology across the different phases of the software development life cycle.
  • Worked on AWS Redshift and RDS to implement models and data, and designed and implemented near-real-time ETL and analytics using Redshift.
  • Designed and customized data models for a data warehouse supporting data from multiple sources in real time.
  • Designed ETL strategies for load balancing and exception handling, and designed processes that can handle high data volumes.
  • Worked on ETL migration services by developing and deploying AWS Lambda functions to generate a serverless data pipeline that writes to the Glue Catalog and can be queried from Athena.
  • Contributed to the development of key data integration and advanced analytics solutions leveraging Apache Hadoop.
  • Wrote complex Hive queries to extract data from heterogeneous sources (data lake) and persist the data into HDFS.
  • Developed Big Data solutions focused on pattern matching and predictive modeling.
  • Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
  • Developed a data pipeline using Kafka, HBase, Spark, and Hive to ingest, transform, and analyze customer behavioral data.
  • Developed Spark jobs and Hive jobs to summarize and transform data (a minimal sketch follows this list).
  • Developed a reconciliation process to make sure the Elasticsearch index document count matches the source records.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Implemented Sqoop to move the data from Oracle to Hadoop and load it back in Parquet format.
  • Developed incremental- and complete-load Python processes to ingest data into Elasticsearch from an Oracle database.
  • Used Hive to analyze data ingested into HBase by using Hive-HBase integration and computed various metrics for reporting on the dashboard.
  • Created Hive external tables to stage data and then moved the data from staging to the main tables.
  • Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
  • Loaded data from HBase into Spark RDDs and implemented in-memory computation to generate the output response.
  • Continuously tuned Hive UDFs and queries for faster performance by employing partitioning and bucketing.
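
A minimal sketch of a Spark job that summarizes staged Hive data into a reporting table, in the spirit of the Spark and Hive jobs above; the database and table names are hypothetical placeholders.

    # Spark-on-Hive summarization sketch (placeholder databases and tables).
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("hive-summary-sketch")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Aggregate customer behavioral events staged in a Hive external table.
    summary = spark.sql("""
        SELECT customer_id,
               COUNT(*)        AS event_count,
               MAX(event_time) AS last_seen
        FROM staging.customer_events
        GROUP BY customer_id
    """)

    # Overwrite the main reporting table with the refreshed summary.
    summary.write.mode("overwrite").saveAsTable("reporting.customer_activity")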

Environment: Hadoop 2.7, Spark 2.7, Hive, Sqoop 1.4.6, AWS, HBase, Kafka 2.6.2, Python 3.6, HDFS, Elasticsearch, Agile Methodology

Confidential - Arlington, VA

ETL/Informatica Developer

Responsibilities:

  • Designed for analytics queries rather than transaction processing.
  • Took part in the SDLC (Software Development Life Cycle) phases of requirements, analysis, design, testing, and deployment for Informatica PowerCenter.
  • Took part in Informatica Cloud integration with Amazon Redshift.
  • Implemented ETL as a commit-intensive process, using a separate queue with a small number of slots to mitigate this issue.
  • Involved in the development of Informatica mappings and tuned them for better performance.
  • Worked with various transformations such as Expression, Aggregator, Update Strategy, Lookup, Filter, Router, Joiner, and Sequence Generator in Informatica for new requirements.
  • Created a queue dedicated to ETL processes.
  • Configured this queue with a small number of slots (5 or fewer) using Amazon Redshift.
  • Performed incident resolution using the ALM system and production support, handling production failures and fixing them within SLA.
  • Created and modified Informatica workflows and mappings (PowerCenter and Cloud), and was also involved in unit testing, internal quality analysis procedures, and reviews.
  • Validated and fine-tuned the ETL logic coded into existing PowerCenter mappings, leading to improved performance.
  • Performed bulk ETL data loads using AWS Redshift.
  • Wrote basic UNIX shell scripts and PL/SQL packages and procedures.
  • Involved in performance tuning of the mappings, sessions, and SQL queries.
  • Created and modified Informatica workflows and mappings.
  • Used different control flow elements such as the Foreach Loop container, Sequence container, Execute SQL task, and Send Mail task.
  • Created joblets in Talend for processes that can be reused in most of the jobs in a project, such as Start Job and Commit Job.
  • Used UNLOAD to extract large result sets from Redshift (a minimal sketch follows this list).
  • Used event handling to send e-mail on error events at the time of transformation.
  • Used the logging feature for analysis purposes.
  • Performed database and log backups and restorations, and defined backup strategies and backup schedules.
  • Improved the performance of SQL Server queries using query plans, covering indexes, and indexed views, and by rebuilding and reorganizing indexes.
  • Performed tuning of SQL queries and stored procedures using SQL Profiler and Index Tuning Wizard.
  • Used Amazon Redshift Spectrum for ad hoc ETL processing.
  • Troubleshooting performance issues and fine-tuning queries and stored procedures.
  • Defined Indexes, Views, Constraints and Triggers to implement business rules.
  • Involved in writing complex T-SQL queries.
  • Backed up master and system databases and restored them.
  • Developed Stored Procedures and Functions to implement necessary business logic for interface and reports.
  • Involved in testing and debugging stored procedures.
  • Wrote the DAX statements for the cube.
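
A minimal Python illustration of pulling a large result set out of Redshift with UNLOAD, as mentioned in the list above; the cluster endpoint, credentials, bucket, and IAM role ARN are hypothetical placeholders.

    # Run a Redshift UNLOAD to export a large result set to S3 (placeholders throughout).
    import psycopg2

    conn = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439,
        dbname="analytics",
        user="etl_user",
        password="********",
    )
    conn.autocommit = True

    unload_sql = """
        UNLOAD ('SELECT * FROM public.orders WHERE order_date >= ''2020-01-01''')
        TO 's3://unload-bucket/orders/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-unload'
        FORMAT AS PARQUET;
    """

    with conn.cursor() as cur:
        cur.execute(unload_sql)   # Redshift writes the result set to S3 in parallel
    conn.close()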

Environment: Informatica Power center 10, PL/SQL, UNIX shell scripting, SQL Server, Visual Studio, SSIS, SSRS, Talend, AWS

Confidential

Informatica Developer

Responsibilities:

  • Actively participated in understanding business requirements, analysis and designing ETL process.
  • Effectively applied all the business requirements and transformed the business rules into mappings.
  • Developed Mappings between source systems and Warehouse components.
  • Used Informatica designer to create complex mappings using different transformations to move data to a Data Warehouse.
  • Developed extract logic mappings and configured sessions.
  • Extensively used the Filter and Expression transformations on the source database data to filter out invalid data.
  • Extensively used ETL to load data from flat files, both fixed-width and delimited, as well as from the relational database, which was Oracle (a minimal sketch follows this list).
  • Worked on debugging, troubleshooting, and documentation of the data warehouse.
  • Created reusable transformations and Mapplets to use in multiple mappings.
  • Handled the performance tuning of Informatica mappings.
  • Developed Shell Scripts as per requirement.
  • Prepared PL/SQL scripts for data loading into the warehouse and mart.
  • Fixed SQL errors within the deadline.
  • Made appropriate changes to schedules when jobs were delayed.
  • Self-reviewed unit test cases and integration test cases for all the assigned modules.
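
A minimal Python illustration (not the Informatica implementation) of reading the two flat-file layouts mentioned above, fixed-width and delimited, before staging them for the warehouse load; the file names, column widths, and column names are hypothetical placeholders.

    # Parse fixed-width and pipe-delimited extracts with pandas (placeholders throughout).
    import pandas as pd

    # Fixed-width extract: column boundaries come from the file specification.
    fixed = pd.read_fwf(
        "customers_fixed.dat",
        colspecs=[(0, 10), (10, 40), (40, 48)],
        names=["customer_id", "customer_name", "dob"],
    )

    # Pipe-delimited extract.
    delimited = pd.read_csv(
        "orders_delimited.dat",
        sep="|",
        names=["order_id", "customer_id", "amount"],
    )

    # Basic scrubbing before staging: drop rows missing the business key.
    fixed = fixed.dropna(subset=["customer_id"])
    delimited = delimited.dropna(subset=["order_id"])

    print(len(fixed), len(delimited))   # quick row-count check against the source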

Environment: Informatica Power Center 8.6, Windows XP, Oracle 10g, UNIX/LINUX, SQL Server.
