Sr. Data Engineer Resume

New York, NY

SUMMARY

  • 8+ years of experience in Data Engineering and Data Analysis/Modeling, including planning, analysis, design, development, testing, and implementation.
  • Hands-on experience developing UDFs, DataFrames, and SQL queries in Spark SQL (a minimal sketch follows this summary).
  • Experience in integrating Kafka with Spark streaming for high speed data processing.
  • Good understanding and exposure to Python programming.
  • Hands-on experience writing ad-hoc queries to move data from HDFS to Hive and analyzing the data using HiveQL.
  • Strong Experience in migrating data using Sqoop from HDFS to Relational Database Systems and vice-versa.
  • Good experience working with Azure Blob and Data Lake Storage and loading data into Azure Synapse Analytics (SQL DW).
  • Good experience in Technical consulting and end-to-end delivery with data modeling, data governance.
  • Excellent experience creating cloud-based solutions and architectures using Amazon Web Services and Microsoft Azure.
  • Experience in conducting Joint Application Development (JAD) sessions with SMEs, Stakeholders and other project team members for requirements gathering and analysis.
  • Excellent proficiency in Software Development Life cycle (SDLC), Agile/Scrum and waterfall methodologies.
  • Extensive experience using ER modeling tools such as Erwin, ER/Studio, and PowerDesigner.
  • Experience writing build scripts using shell scripts and Maven, and using continuous integration (CI) tools like Jenkins.
  • Experienced in creating shell scripts to push data loads from various sources from the edge nodes onto the HDFS.
  • Good Understanding of Hadoop architecture and underlying framework including storage management.
  • Experience in assisting the Deployment team in setting up Hadoop cluster and services.
  • Experience in storage, querying, processing, and analysis of big data using Hadoop ecosystem components.
  • Experienced working with JIRA for project management, GIT for source code management, Jenkins for continuous integration and Crucible for code reviews.
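
A minimal sketch of the Spark SQL UDF and DataFrame work called out above; the sample data, column names, and masking logic are illustrative assumptions, not code from the engagements described.

```python
# Minimal sketch: define a Python UDF, then use it from both the DataFrame API
# and Spark SQL. The customer data and the masking rule are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-sketch").getOrCreate()

# Hypothetical input DataFrame.
df = spark.createDataFrame(
    [(1, "alice@example.com"), (2, "bob@example.com")],
    ["customer_id", "email"],
)

def mask_email(email):
    """Keep the first character of the local part, mask the rest."""
    local, _, domain = email.partition("@")
    return local[:1] + "***@" + domain

# Register once for the DataFrame API and once for SQL.
mask_email_udf = udf(mask_email, StringType())
spark.udf.register("mask_email", mask_email, StringType())

# DataFrame API usage.
df.withColumn("masked_email", mask_email_udf("email")).show()

# Spark SQL usage against a temporary view.
df.createOrReplaceTempView("customers")
spark.sql("SELECT customer_id, mask_email(email) AS masked_email FROM customers").show()
```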

TECHNICAL SKILLS

Hadoop/Spark Ecosystem: Hadoop, CDH, MapReduce, Hive/Impala, YARN, Kafka, Sqoop, Spark, Spark SQL, PySpark, Kusto, Scala

Databases: SQL Server, PostgreSQL, HBase, Snowflake, Cassandra, MongoDB, Azure SQL Database, DynamoDB.

BI Tools: Business Objects XI, Tableau 9.1, Power BI

Query Languages: SQL, PL/SQL, T-SQL

Scripting Languages: Unix Shell, Python, PySpark

Cloud Computing: AWS (Amazon Redshift, Athena, Glue, AWS Lambda, EMR, S3), MS Azure (Azure Blob Storage, Azure Data Factory, Azure Synapse)

Data Modeling Tools: Erwin 9.7/9.6, ER/Studio v17, and PowerDesigner.

Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Agile, Waterfall Model

Operating System: Windows and Linux

PROFESSIONAL EXPERIENCE

Confidential

Sr. Data Engineer

Responsibilities:

  • As a Senior Data Engineer, analyzed the current production state of the application and determined the impact of new implementations on existing business processes.
  • Assisted in leading the plan, build, and run states within the Enterprise Analytics Team.
  • Installed and configured various components of Hadoop ecosystem and maintained their integrity on Cloudera.
  • Extensively involved in Cluster Capacity planning, Hardware planning, Installation, Performance Tuning of the Hadoop Cluster.
  • Followed agile methodology and Scrum process.
  • Designed and configured Azure cloud relational servers and databases by analyzing current and future business requirements.
  • Performed data quality analyses and applied business rules in all layers of the data extraction, transformation, and loading process.
  • Responsible for data governance rules and standards to maintain consistency of business element names across the different data layers.
  • Wrote UDFs in Scala and PySpark to meet specific business requirements.
  • Designed and implemented end-to-end data solutions (storage, integration, processing, and visualization) in Azure.
  • Exported the analyzed data to the relational databases using Sqoop.
  • Extracted and loaded data into the Data Lake environment (MS Azure) using Sqoop; the data was accessed by business users.
  • Involved in creating Azure Data Factory pipelines.
  • Built the data pipelines that will enable faster, better, data-informed decision-making within the business.
  • Analyzed, designed, and built modern data solutions using Azure PaaS services to support data visualization.
  • Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process data using the SQL activity.
  • Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data between sources such as Azure SQL, Blob Storage, Azure SQL Data Warehouse, and a write-back tool, in both directions.
  • Developed business intelligence solutions using SQL Server Data Tools 2019 and loaded data into SQL Server and Azure cloud databases.
  • Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, uncovering insights into customer usage patterns (a minimal sketch follows this list).
  • Used Kusto Explorer for log analytics and faster query response, and created alerts using Kusto Query Language.
  • Worked on creating tabular models on Azure analysis services for meeting business reporting requirements.
  • Responsible for estimating the cluster size, monitoring and troubleshooting of the Spark Databricks cluster.
  • Developed dashboards and visualizations to help business users analyze data as well as providing data insight to upper management with a focus on Power BI.
  • Used JIRA to track issues and Change Management
  • Involved in creating Jenkins jobs for CI/CD using GIT and Maven.
  • Involved in daily Scrum meetings to discuss the development/progress and was active in making scrum meetings more productive.
  • Worked extensively on Spark using Python in testing and development environments.
  • Executed tasks for upgrading cluster on the staging platform before doing it on production cluster.
  • Worked with the production support team to provide support for issues with the CDH cluster and the data ingestion platform.
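
A minimal sketch of the PySpark extraction, transformation, and aggregation work described in this role; the storage account, container paths, and column names are assumptions for illustration only.

```python
# Minimal sketch: read two file formats, align columns, and aggregate daily
# usage per feature. Paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("usage-aggregation-sketch").getOrCreate()

# Extract from multiple file formats (hypothetical ADLS Gen2 paths).
parquet_events = spark.read.parquet("abfss://raw@examplestore.dfs.core.windows.net/events/")
csv_events = spark.read.option("header", True).csv("abfss://raw@examplestore.dfs.core.windows.net/legacy_events/")

# Align on a common set of columns and combine.
common_cols = ["user_id", "feature", "event_time"]
events = parquet_events.select(common_cols).unionByName(csv_events.select(common_cols))

# Transform and aggregate: daily active users and event counts per feature.
daily_usage = (
    events
    .withColumn("event_date", F.to_date("event_time"))
    .groupBy("event_date", "feature")
    .agg(
        F.countDistinct("user_id").alias("active_users"),
        F.count("*").alias("events"),
    )
)

# Load the curated aggregate for downstream reporting (e.g. Power BI).
daily_usage.write.mode("overwrite").parquet("abfss://curated@examplestore.dfs.core.windows.net/daily_usage/")
```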

Tools: Hadoop 3.3, Agile/Scrum, Spark 3.1, Azure, Python 3.9, PySpark 3.0, SQL, NiFi, XML, Kafka 3.0, SQL Server 2019, CDH, Sqoop 1.4.7, Power BI

Confidential

Sr. Data Engineer

Responsibilities:

  • As a Data Engineer, helped developers automatically and safely build and deploy software into production multiple times a day while maintaining compliance in a highly regulated financial industry.
  • Interacted with business partners, Business Analysts and product owner to understand requirements and build scalable distributed data solutions using Hadoop ecosystem.
  • Worked with data governance team on data protection regulation projects.
  • Installed and configured Hadoop Ecosystem components.
  • Installed and configured Hive and wrote Hive UDFs.
  • Provided a streamlined developer experience for delivering small serverless applications on a Lambda-based platform to solve business problems.
  • Implemented a serverless architecture using API Gateway, Lambda, and DynamoDB (a minimal sketch follows this list).
  • Integrated AWS DynamoDB with AWS Lambda to store item values and back them up to DynamoDB.
  • Deployed AWS Lambda code from Amazon S3 buckets.
  • Created a Lambda deployment function and configured it to receive events from the S3 bucket.
  • Designed the data models used in data-intensive AWS Lambda applications aimed at complex analysis and analytical reports for end-to-end traceability.
  • Created AWS Lambda functions using Python for deployment management in AWS.
  • Designed, investigated, and implemented public-facing websites on Amazon Web Services and integrated them with other application infrastructure.
  • Created AWS Lambda functions and API Gateway endpoints so that data submitted via API Gateway is accessible to the Lambda functions.
  • Designed and developed ETL processes in AWS Glue to migrate data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift.
  • Automated Datadog Dashboards with the stack through Terraform Scripts.
  • Created external tables with partitions using Hive, AWS Athena and Redshift.
  • Developed the PySpark code for AWS Glue jobs and for EMR.
  • Wrote AWS Glue scripts to extract, transform, and load the data.
  • Integrated Lambda with SQS and DynamoDB using Step Functions to iterate through lists of messages and update their status in a DynamoDB table.
  • Designed and Developed ETL jobs to extract data from Salesforce replica and load it in data mart in Redshift.
  • Responsible for Designing Logical and Physical data modelling for various data sources on Redshift.
  • Managed multiple Kubernetes clusters in a production environment.
  • Participated in daily stand-ups, bi-weekly scrums, and PI planning.
  • Managed individual project priorities, deadlines and deliverables.
  • Connected to Amazon Redshift through Tableau to extract live data for real time analysis.
  • Wrote reports using Tableau Desktop to extract data for analysis using filters based on the business use case.
  • Created Tableau dashboards, datasets, data sources and worksheets.
  • Wrote and deployed data applications within a streaming architecture to ingest, transform, and store data to be used in Machine Learning Models, developing prototypes quickly.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
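
A minimal sketch of the serverless pattern described in this role (S3 event notifications handled by a Lambda function that writes to DynamoDB); the table name and key schema are assumptions for illustration only.

```python
# Minimal sketch: Lambda handler that records S3 object metadata in DynamoDB.
# The "ObjectAudit" table and its "object_key" partition key are hypothetical.
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("ObjectAudit")  # hypothetical table name

def lambda_handler(event, context):
    records = event.get("Records", [])
    for record in records:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        size = record["s3"]["object"].get("size", 0)

        # Persist one audit item per uploaded object.
        table.put_item(
            Item={
                "object_key": key,          # assumed partition key
                "bucket": bucket,
                "size_bytes": size,
                "event_time": record["eventTime"],
            }
        )
    return {"statusCode": 200, "processed": len(records)}
```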

Tools: Hadoop 3.0, Hive, DynamoDB, AWS Lambda, Amazon S3, AWS Glue, Redshift, Terraform, Athena, PySpark, SQS, Salesforce, Kubernetes, ETL, Tableau

Confidential - New York, NY

Sr. Data Modeler

Responsibilities:

  • As a Data Modeler, understood the high-level design choices and the defined technical standards for software coding, tools, and platforms, and ensured adherence to them.
  • Used Agile Methodology of Data Warehouse development using Kanbanize.
  • Analyzed business requirements and built logical data models describing all the data and the relationships between the data.
  • Migrated SQL Server Database to Microsoft Azure SQL Database.
  • Suggested implementing multitasking for the existing Hive architecture in Hadoop and also suggested UI customizations in Hadoop.
  • Integrated data from multiple sources, including HDFS, into the Hive data warehouse.
  • Involved in planning, defining, and designing the database using ER/Studio based on business requirements, and provided documentation.
  • Designed both 3NF data models for ODS/OLTP systems and dimensional data models using Star and Snowflake schemas.
  • Translated business and data requirements into logical data models in support of enterprise data models, operational data structures, and analytical systems.
  • Partnered with DBAs to transform logical data models into physical database designs while optimizing the performance and maintainability of the physical database.
  • Worked with Data Management to establish governance processes around metadata to ensure an integrated definition of data for enterprise information, and to ensure the accuracy, validity, and reusability of metadata.
  • Developed Full life cycle of Data Lake, Data Warehouse with Big data technologies like Spark and Hadoop.
  • Applied all phases of the Software Development Life Cycle, including requirements definition, analysis, review of design and development, and integration and testing of the solution in the operational environment.
  • Worked on Power BI Embedded in Azure to integrate reports into the application.
  • Created the data schema and architecture of the data warehouse for standardized data storage and access.
  • Used data profiling automation to uncover the characteristics of the data and the relationships between data sources before any data-driven development began (a minimal sketch follows this list).
  • Used Azure reporting services to upload and download reports
  • Developed test scripts for testing sourced data and their validation and transformation when persisting in data stores that are physical representations of the data models
  • Designed and documented Use Cases, Activity Diagrams, Sequence Diagrams, OOD (Object Oriented Design) using UML and Visio.
  • Completed enhancements for MDM (Master Data Management) and suggested an implementation approach for hybrid MDM.
  • Designed processes and jobs to source data from Mainframe sources to HDFS staging zone.
  • Led database-level tuning and optimization in support of application development teams on an ad-hoc basis.
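
A minimal sketch of the kind of data profiling automation referenced in this role, written with pandas; the source file and its columns are assumptions for illustration only.

```python
# Minimal sketch: profile each column of a source extract (type, null rate,
# distinct count, sample value) before modeling it. The CSV is hypothetical.
import pandas as pd

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Return one summary row per column of the input DataFrame."""
    rows = []
    for col in df.columns:
        series = df[col]
        rows.append({
            "column": col,
            "dtype": str(series.dtype),
            "null_pct": round(series.isna().mean() * 100, 2),
            "distinct": series.nunique(dropna=True),
            "sample": series.dropna().iloc[0] if series.notna().any() else None,
        })
    return pd.DataFrame(rows)

if __name__ == "__main__":
    source = pd.read_csv("customer_extract.csv")  # hypothetical source extract
    print(profile(source).to_string(index=False))
```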

Tools: ER/Studio 9.0, Hive, Hadoop, MDM, MS Azure, HDFS, PL/SQL, SQL Server, Azure SQL DB, UNIX

Confidential - Bellevue, WA

Data Analyst

Responsibilities:

  • Worked extensively in a Data Analyst role to review business requirements and compose source-to-target data mapping documents.
  • Extensively used Agile methodology, with daily scrums to discuss project-related information.
  • Used Python and its libraries to manipulate data for data loading and extraction.
  • Connected to AWS Redshift through Tableau to extract live data for real time analysis.
  • Developed Data mapping, Transformation and Cleansing rules for the Data Management involving OLTP and OLAP.
  • Created logical and physical data models on relational (OLTP) and star-schema fact and dimension tables using Erwin.
  • Resolved AML-related issues to ensure adoption of standards and guidelines in the organization.
  • Resolved day-to-day issues and worked with the users and the testing team toward resolution of issues and fraud-incident-related tickets.
  • Used Erwin tool to develop a Conceptual Model based on business requirements analysis.
  • Used SQL Server Integration Services (SSIS) for extraction, transformation, and loading of data into the target system from multiple sources.
  • Worked on normalization techniques. Normalized the data into 3rd Normal Form (3NF).
  • Generated various reports using SQL Server Reporting Services (SSRS) for business analysts and the management team.
  • Involved in Data profiling in order to detect and correct inaccurate data and maintain the data quality.
  • Extensively used Star Schema methodologies in building and designing the logical data model into dimensional models.
  • Updated Python scripts to match training data with our database stored in AWS Cloud.
  • Normalized the database based on the newly developed model to put it into 3NF for the data warehouse.
  • Performed in-depth data quality analysis and profiling of the input data of the business units and new businesses within Charles Schwab scheduled to come on board the AML applications, to assess data quality issues.
  • Involved in extensive data validation by writing several complex SQL queries.
  • Performed data cleaning and data manipulation activities using the NZSQL utility.
  • Designed the data marts using Ralph Kimball's dimensional data mart modeling methodology with Erwin.
  • Implemented Snowflake schema to ensure no redundancy in the database.
  • Worked with the MDM systems team on technical aspects and report generation.
  • Extracted large volumes of data from AWS Redshift and the Elasticsearch engine using SQL queries to create reports.
  • Worked on data governance, data quality, and data lineage establishment processes.
  • Executed change management processes surrounding new releases of SAS functionality
  • Worked in importing and cleansing of data from various sources.
  • Performed data cleaning, feature scaling, and feature engineering using Python packages (a minimal sketch follows this list).
  • Created and developed the stored procedures, triggers to handle complex business rules, history data and audit analysis.
  • Worked on Unit Testing for three reports and created SQL Test Scripts for each report as required
  • Used Informatica to extract, transform, and load source data from transaction systems.
  • Created PL/SQL packages and Database Triggers and developed user procedures and prepared user manuals for the new programs.
  • Created various types of data visualizations using Python and Tableau.
  • Wrote and executed unit, system, integration, and UAT scripts in a data warehouse project.
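
A minimal sketch of the data cleaning, feature scaling, and feature engineering work mentioned in this role, using pandas and scikit-learn; the file, columns, and cleaning rules are assumptions for illustration only.

```python
# Minimal sketch: clean a hypothetical transactions extract, derive a simple
# feature, and standardize a numeric column for downstream models.
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("transactions.csv")  # hypothetical extract

# Cleaning: drop exact duplicates, normalize text, impute missing amounts.
df = df.drop_duplicates()
df["merchant"] = df["merchant"].str.strip().str.upper()
df["amount"] = df["amount"].fillna(df["amount"].median())

# Feature engineering: derive a month feature from the transaction date.
df["txn_date"] = pd.to_datetime(df["txn_date"], errors="coerce")
df["txn_month"] = df["txn_date"].dt.month

# Feature scaling: standardize the amount column.
scaler = StandardScaler()
df["amount_scaled"] = scaler.fit_transform(df[["amount"]]).ravel()

print(df.head())
```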

Tools: Erwin 9.0, Agile, OLTP, OLAP, SSIS, SSRS, AWS, MDM, SAS, SQL, PL/SQL.
