Sr. Big Data Engineer Resume
Boston, MA
SUMMARY
- Over 10 years of solid work experience in the Data Engineering field, with skills in analysis, design, development, testing, and deployment of various software applications.
- Over 4 years of working experience in ETL development.
- Excellent working experience in Scrum/Agile framework and Waterfall project execution methodologies.
- Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing.
- Hands-on working experience with cloud technologies such as AWS and MS Azure (Azure Synapse, ADF, Blob Storage, Azure Databricks).
- Good understanding and hands-on experience with AWS S3, EC2, and Redshift.
- Experience in data management and implementation of Big Data applications using Spark and Hadoop frameworks.
- Excellent working experience and sound knowledge of the Informatica and Talend ETL tools. Expertise in reusability, parameterization, workflow design, and designing and developing ETL mappings and scripts.
- Good at understanding ETL specifications and building ETL applications such as mappings on a daily basis.
- Expertise in UNIX shell scripting
- Extensive Knowledge of RDBMS concepts, PL/SQL, Stored Procedure and Normal Forms.
- Strong experience and knowledge of NoSQL databases such as MongoDB, HBase, Azure SQL DB and Cassandra.
- Experience in migrating data using Sqoop from HDFS and Hive to relational database systems and vice versa, according to client requirements.
- Experience with RDBMS like SQL Server, MySQL, Oracle and data warehouses like Teradata and Netezza.
- Expertise in setting up load strategy and dynamically passing parameters to mappings and workflows in Informatica, and to workflows and data flows in the SAP BusinessObjects Data Services integration tools.
- Demonstrated ability to lead projects from planning through completion under fast paced and time sensitive environments.
- Excellent knowledge of planning, estimation, project coordination and leadership in managing large scale projects.
TECHNICAL SKILLS
Big Data & Hadoop Ecosystem: MapReduce, Spark 2.3, HBase 1.2, Hive 2.3, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hadoop 3.0
ETL Tools: Informatica 10.1/9.6.1 (PowerCenter/PowerMart) (Designer, Workflow Manager, Workflow Monitor, Server Manager, Power Connect), Talend, IDQ, TOS, TIS.
NoSQL DB: HBase, Azure SQL DB, Cassandra 3.11, Big Table
Reporting Tools: Power BI, Tableau and Crystal Reports 9
Cloud Platforms: AWS (EC2, S3, Redshift), MS Azure (Azure Synapse, ADF, Blob Storage, Azure Databricks), GCP (BigQuery, Google SDK).
Programming Languages: PySpark, Python, SQL, PL/SQL, UNIX shell Scripting, AWK
RDBMS: Microsoft SQL Server 2017, Teradata 15.0, Oracle 12c, and MS Access
Operating Systems: Microsoft Windows 8 and 10, UNIX and Linux.
Methodologies: Agile, RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Waterfall Model.
PROFESSIONAL EXPERIENCE
Confidential - Boston, MA
Sr. Big Data Engineer
Responsibilities:
- Working as a Big Data Engineer, collaborating with other Product Engineering team members to develop, test, and support data-related initiatives.
- Assisted in leading the plan, build, and run states within the Enterprise Analytics Team.
- Led the estimation, reviewed the estimates, identified the complexities, and communicated them to all the stakeholders.
- Engaged in solving and supporting real business issues with Hadoop Distributed File System and open-source framework knowledge.
- Responsible for data governance rules and standards to maintain the consistency of business element names in the different data layers.
- Built the data pipelines that enable faster, better, data-informed decision-making within the business.
- Identified data within different data stores, such as tables, files, folders, and documents to create a dataset in pipeline using Azure HDInsight.
- Performed detailed analysis of business problems and technical environments and used this data in designing the solution and maintaining data architecture.
- Migrated the on-premises environment to the cloud using MS Azure.
- Worked on migration of data from On-prem SQL server to Cloud databases (Azure Synapse Analytics (DW) & Azure SQL DB).
- Performed data flow transformations using the Data Flow activity.
- Performed ongoing monitoring, automation, and refinement of data engineering solutions.
- Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks.
- Developed mapping document to map columns from source to target.
- Created Azure Data Factory (ADF) pipelines using Azure PolyBase and Azure Blob Storage.
- Performed ETL using Azure Databricks.
- Wrote UNIX shell scripts to support and automate the ETL process.
- Worked on Python scripting to automate the generation of scripts; data curation was done using Azure Databricks.
- Used data integration to manage data with speed and scalability using the Apache Spark engine in Azure Databricks.
- Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
- Designed efficient and robust Hadoop solutions for performance improvement and end-user experiences.
- Worked in a Hadoop ecosystem implementation/administration, installing software patches along with system upgrades and configuration.
- Performed Data transformations in Hive and used partitions, buckets for performance improvements.
- Continuously monitor and manage data pipeline (CI/CD) performance alongside applications from a single console with Azure Monitor.
- Ingested data into HDFS using Sqoop and scheduled an incremental load to HDFS.
- Worked with Hadoop infrastructure to store data in HDFS and used Hive SQL to migrate the underlying SQL codebase to Azure.
- Extensively involved in writing PL/SQL, stored procedures, functions and packages.
- Created partitioned tables in Hive, designed a data warehouse using Hive external tables, and created Hive queries for analysis.
- Performed data scrubbing and processing with Apache NiFi for workflow automation and coordination.
- Developed Simple to complex streaming jobs using Python and Hive.
- Optimized Hive queries to extract the customer information from HDFS.
- Involved in scheduling the Oozie workflow engine to run multiple Hive jobs.
- Analyzed the partitioned and bucketed data using Hive and computed various metrics for reporting.
- Built Azure Data Warehouse Table Data sets for Power BI Reports.
- Working on BI reporting with AtScale OLAP for Big Data.
- Developed customized classes for serialization and De-serialization in Hadoop.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
Environment: Hadoop, Spark, Kafka, Azure Databricks, ADF, Python, PySpark, HDFS, ETL, Agile & Scrum meetings
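The source-to-target mapping work described in this section can be illustrated with a small validation helper. This is a minimal sketch with hypothetical column names, not the project's actual mapping document:

```python
# Sketch of a source-to-target column mapping check, the kind of validation
# done alongside a mapping document. All column names are hypothetical.

def validate_mapping(mapping, source_columns, target_columns):
    """Return a list of problems found in a source->target column mapping."""
    problems = []
    for src, tgt in mapping.items():
        if src not in source_columns:
            problems.append(f"unknown source column: {src}")
        if tgt not in target_columns:
            problems.append(f"unknown target column: {tgt}")
    # Flag target columns that no source column ever populates.
    unmapped = set(target_columns) - set(mapping.values())
    for col in sorted(unmapped):
        problems.append(f"target column never populated: {col}")
    return problems
```

Running such a check before deploying a pipeline catches mapping-document drift early, e.g. a renamed source column that would otherwise surface only as a load failure.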
Confidential - New York, NY
Big Data Engineer
Responsibilities:
- As a Big Data Engineer, involved in Agile Scrum meetings to help manage and organize a team of developers, with regular code review sessions.
- Participated in Code Reviews, Enhancement discussion, maintenance of existing pipelines & systems, testing and bug-fix activities on-going basis.
- Worked closely with the business analysts to convert the business requirements into technical requirements, and prepared low- and high-level documentation.
- Used AWS Cloud with Infrastructure Provisioning / Configuration.
- Worked on Spark, improving the performance and optimization of the existing algorithms in Hadoop using Spark Context, Spark SQL, Data Frames, and Pair RDDs.
- Developed ETL Processes in AWS Glue to migrate data from external sources like S3, ORC/Parquet/Text Files into AWS Redshift.
- Worked on Ingesting data by going through cleansing and transformations and leveraging AWS Lambda, AWS Glue and Step Functions.
- Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL database for huge volume of data.
- Involved in daily Scrum meetings to discuss the development progress and was active in making Scrum meetings more productive.
- Seamlessly worked in Python to build data pipelines after the data got loaded from Kafka.
- Used Kafka Streams to configure Spark Streaming to get information and then store it in HDFS.
- Worked on loading data into Spark RDDs and performed advanced procedures like text analytics, using the in-memory data computation capabilities of Spark to generate the output response.
- Implemented usage of Amazon EMR for processing Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
- Created AWS Lambda functions and assigned IAM roles to schedule Python scripts using CloudWatch triggers to support the infrastructure needs (SQS, EventBridge, SNS).
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala and Python.
- Integrated Kafka-Spark streaming for high efficiency throughput and reliability.
- Developed a Python script to hit REST APIs and extract data to AWS S3.
- Conducted ETL data integration, cleansing, and transformations using AWS Glue Spark scripts.
- Developed Oozie workflows for scheduling and orchestrating the ETL process.
- Developed ETL mappings using different transform components.
- Worked on functions in Lambda that aggregate the data from incoming events and then store the result data in Amazon DynamoDB.
- Deployed the project on Amazon EMR with S3 connectivity for setting up backup storage.
- Designed and developed ETL jobs to extract data from Oracle and load it into a data mart in Redshift.
- Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
- Used JSON schema to define table and column mapping from S3 data to Redshift
- Connected Redshift to Tableau for creating dynamic dashboard for analytics team
- Used JIRA to track issues and Change Management
- Involved in creating Jenkins jobs for CI/CD using GIT, Maven and Bash scripting.
- Coordinated in all testing phases and worked closely with the performance testing team to create a baseline for the new application.
- Assisting application development teams during application design and development for highly complex and critical data projects.
Environment: Spark 3.3, AWS S3, Redshift, Glue, EMR, IAM, EC2, Tableau, Jenkins, Jira, Python, Kafka, Agile.
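One bullet in this section describes Lambda functions that aggregate incoming events before storing results in DynamoDB. Below is a minimal sketch of that pattern with hypothetical event fields; the DynamoDB write is indicated only in a comment so the aggregation logic stays runnable without AWS credentials:

```python
# Hypothetical sketch of a Lambda handler that aggregates incoming event
# records before persistence. Field names ("key", "amount") are examples.
from collections import defaultdict

def aggregate_events(records):
    """Sum the 'amount' field of incoming records per 'key'."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["key"]] += rec["amount"]
    return dict(totals)

def handler(event, context=None):
    totals = aggregate_events(event.get("Records", []))
    # In the real function, each total would then be written to DynamoDB, e.g.
    # boto3.resource("dynamodb").Table("totals").put_item(Item={...})
    return totals
```

Aggregating in the function before writing keeps DynamoDB write throughput proportional to the number of distinct keys rather than the number of raw events.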
Confidential - Richmond, VA
Data Engineer
Responsibilities:
- As a Data Engineer, I was responsible for building a data lake as a cloud-based solution in AWS using Apache Spark and Hadoop.
- Involved in Agile methodologies, daily Scrum meetings, Sprint planning.
- Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
- Used AWS Cloud and On-Premise environments with Infrastructure Provisioning/ Configuration.
- Used EMR (Elastic MapReduce) to perform big data operations in AWS.
- Used the Agile Scrum methodology to build the different phases of the software development life cycle.
- Worked on AWS Redshift and RDS for implementing models and data on RDS and Redshift and designed and implemented Near Real Time ETL and Analytics using Redshift.
- Designed and customized data models for a data warehouse supporting data from multiple sources in real time.
- Designed ETL strategies for load balance, exception handling and design processes that can satisfy high data volumes.
- Worked on ETL migration services by developing and deploying AWS Lambda functions to generate a serverless data pipeline that can be written to the Glue Catalog and queried from Athena.
- Contributed to the development of key data integration and advanced analytics solutions leveraging Apache Hadoop.
- Wrote complex Hive queries to extract data from heterogeneous sources (data lake) and persist the data into HDFS.
- Developed Big Data solutions focused on pattern matching and predictive modeling.
- Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
- Developed a data pipeline using Kafka, HBase, Spark, and Hive to ingest, transform, and analyze customer behavioral data.
- Developed Spark jobs and Hive Jobs to summarize and transform data.
- Developed a reconciliation process to make sure the Elasticsearch index document count matches the source records.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
- Implemented Sqoop to transfer the data from Oracle to Hadoop and load it back in Parquet format.
- Developed incremental- and complete-load Python processes to ingest data into Elasticsearch from an Oracle database.
- Used Hive to analyze data ingested into HBase via Hive-HBase integration and computed various metrics for reporting on the dashboard.
- Created Hive external tables to stage data and then moved the data from staging to the main tables.
- Pulled the data from the data lake (HDFS) and massaged the data with various RDD transformations.
- Loaded the data through HBase into Spark RDDs and implemented in-memory data computation to generate the output response.
- Continuously tuned Hive UDFs for faster queries by employing partitioning and bucketing.
Environment: Hadoop 2.7, Spark 2.7, Hive, Sqoop 1.4.6, AWS, HBase, Kafka 2.6.2, Python 3.6, HDFS, Elastic Search & Agile Methodology
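The Sqoop work in this section (Oracle to Hadoop in Parquet format, with incremental loads) can be sketched as a small command builder. The connection string, table, and check column below are placeholders, not values from the actual project:

```python
# Sketch of assembling a Sqoop incremental import command of the kind used
# to move Oracle tables into HDFS as Parquet. All identifiers are examples.

def sqoop_import_cmd(jdbc_url, table, target_dir, check_column, last_value):
    """Build a 'sqoop import' command for an incremental append load."""
    parts = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--as-parquetfile",          # write the target files in Parquet format
        "--incremental", "append",   # only import rows newer than last_value
        "--check-column", check_column,
        "--last-value", str(last_value),
    ]
    return " ".join(parts)
```

In practice the scheduler records the new high-water mark after each run and feeds it back as `--last-value` for the next incremental load.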
Confidential - Arlington, VA
ETL/Informatica Developer
Responsibilities:
- Designed for analytics queries rather than transaction processing.
- Part of the SDLC (Software Development Life Cycle): requirements, analysis, design, testing, and deployment of Informatica PowerCenter.
- Part of Informatica cloud integration with Amazon Redshift.
- Implemented ETL as a commit-intensive process, having a separate queue with a small number of slots to mitigate this issue.
- Involved in the development of Informatica mappings, which were also tuned for better performance.
- Worked with various transformations such as Expression, Aggregator, Update Strategy, Look Up, Filter, Router, Joiner and Sequence generator in Informatica for new requirement.
- Created a queue dedicated to ETL processes.
- Configured this queue with a small number of slots (5 or fewer) using Amazon Redshift.
- Incident resolution using the ALM system, and production support: handling production failures and fixing them within SLA.
- Created/modified Informatica workflows and mappings (PowerCenter and Cloud); also involved in unit testing, internal quality analysis procedures, and reviews.
- Validated and fine-tuned the ETL logic coded into existing PowerCenter mappings, leading to improved performance.
- Loaded data in bulk ETL using AWS Redshift.
- Wrote basic UNIX shell scripts and PL/SQL packages and procedures.
- Involved in performance tuning of the mappings, sessions, and SQL queries.
- Creating/modifying Informatica Workflows and Mappings.
- Used different control flow elements like the For Each Loop container, Sequence container, Execute SQL task, and Send Mail task.
- Created joblets in Talend for the processes that can be used in most of the jobs in a project, such as Start Job and Commit Job.
- Used UNLOAD to extract large result sets.
- Used event handling to send e-mail on error events at the time of transformation.
- Used the logging feature for analysis purposes.
- Database and Log Backup, Restoration, Backup Strategies, Scheduling Backups.
- Improved the performance of SQL Server queries using query plans, covering indexes, indexed views, and by rebuilding and reorganizing the indexes.
- Performed tuning of SQL queries and stored procedures using SQL Profiler and Index Tuning Wizard.
- Used Amazon Redshift Spectrum for ad hoc ETL processing.
- Troubleshooting performance issues and fine-tuning queries and stored procedures.
- Defined Indexes, Views, Constraints and Triggers to implement business rules.
- Involved in writing complex T-SQL queries.
- Backing up master & system databases and restoring them.
- Developed Stored Procedures and Functions to implement necessary business logic for interface and reports.
- Involved in testing and debugging stored procedures.
- Wrote the DAX statements for the cube.
Environment: Informatica Power center 10, PL/SQL, UNIX shell scripting, SQL Server, Visual Studio, SSIS, SSRS, Talend, AWS
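The dedicated Redshift ETL queue described above (a commit-intensive workload isolated in a queue with 5 or fewer slots) can be sketched as a WLM (workload management) configuration fragment. The user group name is a placeholder:

```python
# Sketch of a Redshift WLM configuration that isolates ETL in a small queue,
# per the "5 or fewer slots" strategy above. "etl_users" is hypothetical.
import json

def etl_wlm_config(slots=5):
    """Return a WLM JSON string with a dedicated low-concurrency ETL queue."""
    if slots > 5:
        raise ValueError("ETL queue was kept at 5 or fewer slots")
    return json.dumps([
        # Queue 1: routes queries from the ETL user group; few slots so that
        # commit-heavy loads do not starve interactive queries of memory.
        {"user_group": ["etl_users"], "query_concurrency": slots},
        # Default queue for everything else.
        {"query_concurrency": 15},
    ])
```

Keeping ETL concurrency low gives each load more memory per slot and limits contention on Redshift's serialized commit queue.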
Confidential
Informatica Developer
Responsibilities:
- Actively participated in understanding business requirements, analysis and designing ETL process.
- Effectively applied all the business requirements and transformed the business rules into mappings.
- Developed Mappings between source systems and Warehouse components.
- Used Informatica designer to create complex mappings using different transformations to move data to a Data Warehouse.
- Developed extract logic mappings and configured sessions.
- Extensively used the Filter and Expression transformations on the source database to filter out invalid data.
- Extensively used ETL to load data from flat files, involving both fixed-width and delimited files, as well as from the relational database, which was Oracle.
- Worked on debugging, troubleshooting, and documentation of the data warehouse.
- Created reusable transformations and Mapplets to use in multiple mappings.
- Handled the performance tuning of Informatica mappings.
- Developed Shell Scripts as per requirement.
- Prepared PL/SQL scripts for data loading into Warehouse and Mart.
- Fixed SQL errors within teh deadline.
- Made appropriate changes to schedules when some jobs were delayed.
- Self-reviewed unit test cases and integration test cases of all the assigned modules.
Environment: Informatica Power Center 8.6, Windows XP, Oracle 10g, UNIX/LINUX, SQL Server.
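The flat-file loads in this section covered both fixed-width and delimited layouts. A fixed-width layout can be sketched as a small parser; the field names and widths below are illustrative only, not the project's actual file spec:

```python
# Minimal sketch of parsing one line of a fixed-width flat file, the kind of
# source loaded into the warehouse here. Layout values are hypothetical.

def parse_fixed_width(line, layout):
    """layout: list of (field_name, width) tuples, in file order."""
    record, pos = {}, 0
    for name, width in layout:
        record[name] = line[pos:pos + width].strip()
        pos += width
    return record
```

Delimited files need only `line.split(delimiter)` instead; keeping the layout as data (rather than hard-coded slices) lets one loader handle many file specs.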