Sr. Data Engineer Resume
Charlotte, NC
SUMMARY
- 8+ years of hands-on IT experience in the analysis, design, development, testing, and implementation of ETL (Informatica) and Business Intelligence solutions using Data Warehouse/Data Mart design, SQL Server, MSBI, Power BI, and Azure data engineering.
- Extensively worked on system analysis, design, development, testing, and implementation of projects; capable of handling responsibilities independently as well as working as a proactive team member.
- Excellent knowledge of Software Development Life Cycle (SDLC) methodologies such as Agile, Scrum, and Waterfall, as well as project management methodologies.
- Worked on Data Virtualization using Teiid and Spark, RDF graph Data, Solr Search and Fuzzy Algorithm.
- Experience in designing, developing, and deploying Business Intelligence solutions using SSIS, SSRS, SSAS, Power BI.
- Hands-on expertise with AWS databases such as RDS (Aurora), Redshift, DynamoDB, and ElastiCache (Memcached & Redis).
- Responsible for designing and building a data lake using Hadoop and its ecosystem components.
- Built a data warehouse on SQL Server & Azure Database.
- Working experience in creating real-time data streaming solutions using Apache Spark/Spark Streaming and Kafka; built Spark DataFrames using Python.
- Experience with ETL workflow management tools such as Apache Airflow, including significant experience writing Python scripts to implement workflows (an illustrative Airflow sketch follows this summary).
- Hands-on experience working with Azure Data Lake Analytics to analyze structured and unstructured data from various sources.
- Experienced working with various Azure services, such as Data Lake, to store and analyze data.
- Experience in developing OLAP cubes using SQL Server Analysis Services (SSAS); defined data sources, data source views, dimensions, measures, hierarchies, attributes, and calculations using Multidimensional Expressions (MDX), perspectives, and roles.
- Extensive experience in Dynamic SQL, Records, Arrays and Exception handling, data sharing, Data Caching, Data Pipelining. Complex processing using nested Arrays and Collections.
- Built and published Power BI reports utilizing complex calculated fields, table calculations, filters, and parameters.
- Designed and developed matrix and tabular reports with drill-down and drill-through using SSRS.
- Involved in migration of legacy data by creating various SSIS packages.
- Expert in data extraction, transformation, and loading (ETL) using tools such as SQL Server Integration Services (SSIS), DTS, Bulk Insert, UNIX shell scripting, SQL, PL/SQL, SQL*Loader, and BCP.
- Expertise in developing parameterized, chart, graph, linked, dashboard, and scorecard reports on SSAS cubes using MDX, as well as drill-down, drill-through, and cascading reports using SSRS.
- Experience handling heterogeneous data sources and databases (Oracle, Teradata) as well as CSV and XML files using SSIS.
- Hands-on real-time experience utilizing databases such as MongoDB and MySQL.
- Extensively developed Complex mappings using various transformations such as Unconnected/ Connected lookups, Router, Filter, Expression, Aggregator, Joiner, Update Strategy, Union and more.
- Extensive experience in writing UNIX shell scripts and automation of the ETL processes using UNIX shell scripting.
- Experience with the ETL tool Informatica in designing and developing complex mappings, mapplets, transformations, workflows, and worklets, and in scheduling workflows and sessions.
- Experience in integrating databases such as MongoDB and MySQL with web pages built in HTML, PHP, and CSS to update, insert, delete, and retrieve data with simple ad-hoc queries.
- Developed heavy load Spark Batch processing on top of Hadoop for massive parallel computing.
- Strong analytical, problem-solving, communication, learning and team skills.
- Experience in using automation scheduling tools such as AutoSys and Control-M.
- Developed Spark RDD and Spark DataFrame API for Distributed Data Processing.
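To make the Airflow experience above concrete, here is a minimal, illustrative DAG sketch; the DAG id, schedule, and task callables are hypothetical placeholders rather than a specific production workflow.

```python
# Minimal Apache Airflow DAG sketch (illustrative; task names and schedule are assumptions).
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Placeholder: pull source data (e.g., from a database or S3) and stage it.
    print("extracting source data")


def transform(**context):
    # Placeholder: apply cleansing and business rules before loading.
    print("transforming staged data")


def load(**context):
    # Placeholder: load curated data into the warehouse/data mart.
    print("loading into target")


default_args = {
    "owner": "data-engineering",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="sample_etl_pipeline",      # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    default_args=default_args,
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Linear dependency: extract -> transform -> load.
    t_extract >> t_transform >> t_load
```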
PROFESSIONAL EXPERIENCE
Sr. Data Engineer
Confidential, Charlotte, NC
Responsibilities:
- Data migration from Oracle and Netezza to Hive.
- Worked alongside a team of developers in designing, developing, and implementation of BI solutions for various projects.
- Involved in extensive data validation by writing several complex SQL queries, contributed to data modeling, and participated in back-end testing and resolving data quality issues.
- Designed and deployed multi-tier applications using AWS services (EC2, Route 53, S3, RDS, DynamoDB, SNS, SQS, IAM), focusing on high availability, fault tolerance, and auto-scaling via AWS CloudFormation.
- Used AWS utilities such as EMR, S3, and CloudWatch to run and monitor Hadoop and Spark jobs on AWS.
- Built big data e-commerce solutions to improve business processes, using Kinesis Agent to ingest data, Kinesis Data Streams to stream data in real time, and Lambda to process the data, populating DynamoDB and sending email notifications (an illustrative sketch of this pattern follows this job entry).
- Improved data warehousing and visualization using serverless services such as AWS Glue over S3 data lakes for the data catalog.
- Contributed to the design and implementation of AWS infrastructure by launching and configuring EC2, S3, IAM, VPC, security groups, Auto Scaling, and Elastic Load Balancers (ELBs) using Terraform and Ansible.
- Worked on an ETL pipeline to source these tables and deliver the calculated ratio data from AWS to data marts (SQL Server) and the Credit Edge server.
- Maintained AWS Data Pipeline as a web service to process and move data between Amazon S3, Amazon EMR, and Amazon RDS resources.
- Implemented rapid-provisioning and life-cycle management for Linux using Amazon EC2, Python Boto3 and custom Bash scripts.
- Developed Tableau data visualizations using cross map, scatter plots, geographic map, heat maps, combination charts, page trails, and density chart.
- Fed the analyzed data into Tableau to show regression, trend, and forecast views in the dashboards.
- Performed data development across all data stages: ingestion, exploration, preparation, training, and consumption.
- Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for analysis.
- Developed several complex SQL Scripts.
- Implemented PySpark and Spark SQL for testing and processing of data; worked on migrating Oracle queries and Alteryx workflows into PySpark transformations (an illustrative PySpark sketch follows this job entry).
- Developed PySpark applications using DataFrames and the Spark SQL API for faster data processing.
- Created RDDs and DataFrames for the required data and performed transformations using Spark RDDs and Spark SQL.
- Exported the analyzed data to the relational databases using Sqoop, to further visualize and generate reports for the BI team.
- Transformed and analyzed data using PySpark and Hive based on ETL mappings.
- Developed PySpark programs, created DataFrames, and worked on transformations.
- Validated Sqoop jobs and shell scripts, and performed data validation to check that data was loaded correctly without any discrepancies.
- Performed migration and testing of static data and transaction data from one core system to another.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the Data science team.
- Coordinated with the data science team in creating PySpark jobs.
- Developed model and mapping for fact loading from various dimension tables.
Environment: Spark, Spark Streaming, Apache Kafka, Apache NiFi, Hive, AWS, ETL, Linux, Tableau, PySpark, Teradata, Pig, Sqoop, Hadoop, Oozie, MongoDB, Scala, Python, GIT.
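An illustrative sketch of the Kinesis-to-DynamoDB processing pattern described in this job entry; the table name, SNS topic ARN, and record schema are assumptions for demonstration only.

```python
# AWS Lambda handler sketch: process Kinesis records into DynamoDB and notify by email via SNS.
# Table name, topic ARN, and payload shape are hypothetical placeholders.
import base64
import json
from decimal import Decimal

import boto3

dynamodb = boto3.resource("dynamodb")
sns = boto3.client("sns")

TABLE_NAME = "ecommerce_events"  # hypothetical table name
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:order-alerts"  # hypothetical topic


def lambda_handler(event, context):
    table = dynamodb.Table(TABLE_NAME)
    processed = 0

    for record in event.get("Records", []):
        # Kinesis delivers each payload base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        # DynamoDB rejects Python floats, so parse numbers as Decimal.
        item = json.loads(payload, parse_float=Decimal)

        # Persist the event; assumes the JSON already carries the table's key attributes.
        table.put_item(Item=item)
        processed += 1

    # Email subscribers on the SNS topic receive this summary notification.
    sns.publish(
        TopicArn=TOPIC_ARN,
        Subject="Kinesis batch processed",
        Message="Processed {} records into {}".format(processed, TABLE_NAME),
    )
    return {"processed": processed}
```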
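An illustrative PySpark sketch of the kind of Oracle-query-to-DataFrame migration mentioned above; the S3 paths, table, and column names are hypothetical.

```python
# PySpark sketch: re-expressing a SQL-style aggregation as DataFrame transformations.
# Paths, tables, and columns are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("oracle_to_pyspark_migration").getOrCreate()

# Source data previously queried in Oracle, here read from a staged extract.
orders = spark.read.parquet("s3://example-bucket/staged/orders/")  # hypothetical path

monthly_totals = (
    orders
    .filter(F.col("status") == "COMPLETE")
    .withColumn("order_month", F.date_trunc("month", F.col("order_ts")))
    .groupBy("customer_id", "order_month")
    .agg(
        F.sum("amount").alias("total_amount"),
        F.count("*").alias("order_count"),
    )
)

# Equivalent Spark SQL using a temporary view.
orders.createOrReplaceTempView("orders")
monthly_totals_sql = spark.sql("""
    SELECT customer_id,
           date_trunc('month', order_ts) AS order_month,
           SUM(amount) AS total_amount,
           COUNT(*)    AS order_count
    FROM orders
    WHERE status = 'COMPLETE'
    GROUP BY customer_id, date_trunc('month', order_ts)
""")

# Persist the curated output for downstream marts/reporting.
monthly_totals.write.mode("overwrite").parquet("s3://example-bucket/curated/monthly_totals/")
```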
Sr. Data Engineer
Confidential, Pataskala, Ohio
Responsibilities:
- Worked closely with the project team to gather business requirements and interacted with business analysts to translate business requirements into technical specifications.
- Conducted independent data analysis and gap analysis, wrote mid-level SQL queries with interpretation, and generated graphical reports per specifications.
- Extensive experience in text analytics, generating data visualizations using Python and R, and creating dashboards using tools such as Tableau.
- Performed data analysis on a dataset of more than 100,000 rows using RStudio and generated financial report analyses using the ggplot2 and lattice packages.
- Predicted the next quarter's net profit using linear regression, which helped the company plan its expansion by estimating the budget for the next year.
- Designed and implemented data integration modules for Extract/Transform/Load (ETL) functions.
- Involved in data warehouse design.
- Worked wif internal architects in the development of current and target state data architectures.
- Worked with project team representatives on logical and physical ER/Studio data models.
- Involved in defining the source to target data mappings, business rules and data definitions.
- Responsible for defining the key identifiers for each mapping/interface.
- Worked on data modeling, data mapping, data cleansing, and data visualization.
- Under the supervision of a Sr. Data Scientist, performed data transformation for rescaling.
- Used SQL queries on the internal database and performed CRUD operations to maintain the database for data tracking purposes
- Gathered requirements and created use cases, use case diagrams, and activity diagrams using MS Visio.
- Performed gap analysis to check the compatibility of the existing system infrastructure with the new business requirements.
- Worked with the Hadoop ecosystem, covering HDFS, HBase, YARN, and MapReduce.
- Worked with the Oozie workflow engine to run workflow jobs with actions that launch Hadoop MapReduce, Hive, and Spark jobs.
- Performed data mapping and data design (data modeling) to integrate data across multiple databases into the EDW.
- Responsible for the design and development of advanced Python programs to prepare, transform, and harmonize data sets in preparation for modeling.
- Hands-on experience with Hadoop/Big Data technologies for the storage, querying, processing, and analysis of data.
- Developed Spark/Scala and Python code for a regular expression (regex) project in a Hadoop/Hive environment for big data sources.
- Automated the monthly data validation process to check the data for nulls and duplicates, and created reports and metrics to share with business teams.
- Used clustering techniques such as K-means to identify outliers and to classify unlabeled data (an illustrative sketch follows this section).
- Implemented classification algorithms such as logistic regression, k-nearest neighbors (k-NN), and random forests to predict customer churn and customer interface.
- Performed data visualization, designed dashboards, and generated complex reports, including charts, summaries, and graphs, to communicate findings to the team and stakeholders.
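An illustrative sketch of K-means-based outlier identification as referenced above; the features are synthetic and the 98th-percentile distance threshold is an assumption, not a project-specific choice.

```python
# Sketch of K-means-based outlier flagging (illustrative; features and threshold are assumptions).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric features, e.g. monthly spend and support calls per customer.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 2))

# Rescale features so no single column dominates the distance metric.
X_scaled = StandardScaler().fit_transform(X)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X_scaled)

# Distance of each point to its assigned cluster centre.
centres = kmeans.cluster_centers_[kmeans.labels_]
distances = np.linalg.norm(X_scaled - centres, axis=1)

# Flag the farthest ~2% of points as candidate outliers.
threshold = np.quantile(distances, 0.98)
outliers = np.where(distances > threshold)[0]
print(f"{len(outliers)} candidate outliers out of {len(X)} rows")
```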
Big Data Developer
Confidential, Indianapolis, IN
Responsibilities:
- Handled the release management of Azure Data Lake Analytics (Cosmos) for smoother production deployment.
- Migrated the SQL database to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlled and granted database access; and migrated on-premises databases to Azure Data Lake Store using Azure Data Factory.
- Worked with Hadoop infrastructure to store data in HDFS and used Hive SQL to migrate the underlying SQL codebase to Azure.
- Worked in Azure Storage Explorer to manage data - Azure blobs, files and used ARM templates.
- Wrote SQL queries and stored procedures, created Power BI reports, and published the data to region and service owners.
- Created ingestion jobs in Data Studio and scheduled the jobs using feed-based and frequency-based triggers.
- Worked on a Scala codebase for Apache Spark, performing actions and transformations on RDDs, DataFrames, and Datasets using Spark SQL and Spark Streaming contexts.
- Optimized Hive tables using techniques such as partitioning and bucketing to provide better performance for HiveQL queries.
- Developed Spark Scala notebooks to perform data cleaning and transformation on various tables.
- Implemented PySpark jobs using DataFrames and temporary-table SQL for faster data processing (an illustrative PySpark sketch follows this job entry).
- Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, Hive and Sqoop.
- Developed simple and complex Map Reduce programs in Python for Data Analysis on different data formats
- Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data
- Performed Data transformations in Hive and used partitions, buckets for performance improvements.
- Worked on automating administrative tasks such as creating maintenance plans, jobs, and alerts using SQL Server Agent, and monitored and troubleshot job failures.
Environment: Azure, Power BI, HDFS, Data Studio, SQL, Hive, Pig, Scala, shell scripting, Linux, Agile methodology.
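An illustrative PySpark sketch of temporary-view SQL combined with a partitioned, bucketed Hive table write, in the spirit of the bullets above; the storage path, database, table, and column names are hypothetical.

```python
# PySpark sketch: temporary-view SQL plus a partitioned, bucketed Hive table
# (database, table, columns, and storage path are hypothetical).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive_partition_bucket_demo")
    .enableHiveSupport()   # needed so spark.sql/saveAsTable hit the Hive metastore
    .getOrCreate()
)

# Hypothetical raw landing zone in Azure Data Lake Storage.
events = spark.read.json("abfss://raw@examplelake.dfs.core.windows.net/events/")

# Clean and reshape with temporary-table SQL.
events.createOrReplaceTempView("raw_events")
cleaned = spark.sql("""
    SELECT event_id,
           lower(trim(event_type)) AS event_type,
           to_date(event_ts)       AS event_date,
           user_id
    FROM raw_events
    WHERE event_id IS NOT NULL
""")

# Write as a Hive table partitioned by date and bucketed by user for faster HiveQL queries.
(
    cleaned.write
    .mode("overwrite")
    .partitionBy("event_date")
    .bucketBy(16, "user_id")
    .sortBy("user_id")
    .saveAsTable("analytics.events_cleaned")   # hypothetical database.table
)
```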
Python Developer
Confidential
Responsibilities:
- Used Test driven approach for developing the application and implemented the unit tests using Python Unit test framework.
- Successfully migrated the Django database from SQLite to MySQL and then to PostgreSQL with complete data integrity.
- Worked on report writing using SQL Server Reporting Services (SSRS) and in creating various types of reports like table, matrix, and chart report, web reporting by customizing URL Access.
- Developed views and templates with Python and Django's view controller and templating language to create a user-friendly website interface (an illustrative Django sketch follows this section).
- Performed API testing by utilizing POSTMAN tool for various request methods such as GET, POST, PUT, and DELETE on each URL to check responses and error handling.
- Created Python and Bash tools to increase the efficiency of the retail management application system and operations, including data conversion scripts and AMQP/RabbitMQ, REST, JSON, and CRUD scripts for API integration.
- Performed debugging and troubleshooting the web applications using Git as a version-controlling tool to collaborate and coordinate wif the team members.
- Developed and executed various MySQL database queries from Python using the Python MySQL connector and MySQL database packages.
- Designed and maintained databases using Python and developed a Python-based API (RESTful web service) using SQLAlchemy and PostgreSQL.
- Created a web application using Python scripting for data processing, MySQL for the database, and HTML, CSS, jQuery, and Highcharts for data visualization of the served pages.
- Generated property lists for every application dynamically using Python modules such as math, glob, random, itertools, functools, NumPy, Matplotlib, seaborn, and pandas.
- Added navigation, pagination, column filtering, and the ability to add and remove desired columns in views utilizing Python-based GUI components.
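An illustrative Django sketch of a function-based view with simple filtering and pagination rendered through a template, in the spirit of the bullets above; the app, model, and template names are hypothetical.

```python
# Django sketch: view with filtering/pagination plus URL wiring
# (model, template, and URL names are hypothetical).

# views.py
from django.core.paginator import Paginator
from django.shortcuts import render

from .models import Product  # hypothetical model


def product_list(request):
    """List products with simple filtering and pagination."""
    products = Product.objects.all().order_by("name")

    # Optional ?q= filter from the query string.
    query = request.GET.get("q")
    if query:
        products = products.filter(name__icontains=query)

    # 25 rows per page; ?page= selects the page.
    page = Paginator(products, 25).get_page(request.GET.get("page"))

    return render(request, "products/list.html", {"page": page, "query": query})


# urls.py
from django.urls import path
# from .views import product_list

urlpatterns = [
    path("products/", product_list, name="product-list"),
]
```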
Data Engineer
Confidential
Responsibilities:
- Developed an end-to-end ETL data pipeline that takes data from the source and loads it into the RDBMS using Spark.
- Worked on a live Hadoop cluster running the Cloudera Distribution Platform (CDH 5.9) as well as cloud-deployed persistent AWS EMR clusters, and configured the clusters.
- Designed SSIS (ETL) Packages to extract data from various heterogeneous data sources such as Access database, Excel spreadsheet and flat files into SQL Server and maintain the data.
- Developed data load functions that read the schema of the input data and load the data into a table.
- Wrote Scala applications that run on an Amazon EMR cluster, fetch data from the Amazon S3/one-lake location, and queue it in Amazon SQS (Simple Queue Service).
- Worked with Spark SQL to analyze and apply transformations on DataFrames created from the SQS queue, load them into database tables, and query them.
- Used Amazon S3 to persist the transformed Spark DataFrames in S3 buckets and as a data lake for the data pipeline running on Spark and MapReduce (an illustrative sketch follows this list).
- Developed logging functions in Scala that store pipeline logs in Amazon S3 buckets.
- Developed email reconciliation reports for ETL loads in the Spark framework using Scala and Python libraries.
- Expertise in building PySpark and Scala applications for interactive analysis, batch processing, and stream processing.
- Proficient in writing Spark scripts in Python, Scala, and SQL for development and analysis.
- Analyzed the SQL scripts and designed the solution to implement using Scala.
- Developed analytical components using Scala, Spark, and Spark Streaming.
- Configured Spark executor memory to speed up Spark jobs, developed unit tests for PySpark jobs, and performed tuning by analyzing Spark logs and job metrics.
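An illustrative PySpark sketch of an S3-backed batch step with explicit executor sizing, in line with the bullets above; bucket paths, columns, and memory values are assumptions, not the actual job configuration.

```python
# PySpark sketch: S3-backed batch transform with explicit executor sizing
# (bucket paths, columns, and memory settings are illustrative assumptions).
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("s3_batch_transform")
    .config("spark.executor.memory", "6g")          # example value, tuned from job metrics
    .config("spark.executor.cores", "4")
    .config("spark.sql.shuffle.partitions", "200")
    .getOrCreate()
)

# Read raw records from the S3 data lake (path is hypothetical).
raw = spark.read.option("header", True).csv("s3a://example-datalake/raw/transactions/")

transformed = (
    raw
    .withColumn("amount", F.col("amount").cast("double"))
    .filter(F.col("amount") > 0)
    .withColumn("txn_date", F.to_date("txn_ts"))
)

# Persist the transformed DataFrame back to S3 as partitioned Parquet.
(
    transformed.write
    .mode("overwrite")
    .partitionBy("txn_date")
    .parquet("s3a://example-datalake/curated/transactions/")
)
```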