Sr. Data Engineer Resume
SUMMARY
- 7+ years of professional IT experience, including around 5 years of Big Data expertise with the Hadoop framework, covering analysis, design, development, documentation, deployment, and integration using SQL and Big Data technologies.
- Experience implementing Big Data analytics, cloud data engineering, data warehouse/data mart, data visualization, reporting, data quality, and data virtualization solutions.
- Proven track record as a Data Engineer working on Amazon cloud services, Big Data/Hadoop applications, and product development.
- Experience designing conceptual, logical, and physical data models using the Erwin and ER/Studio data modeling tools.
- Well versed in Big Data on AWS cloud services, i.e., EC2, S3, Glue, Athena, DynamoDB, and Redshift.
- Experience with job/workflow scheduling and monitoring tools like Oozie, AWS Data Pipeline, and Autosys.
- Defined and deployed monitoring, metrics, and logging systems on AWS.
- Experience creating and running Docker images with multiple microservices.
- Good experience deploying, managing, and developing with MongoDB clusters.
- Docker container orchestration using ECS, ALB, and Lambda.
- Experience with Unix/Linux systems, shell scripting, and building data pipelines.
- Migrated on-premises applications to the Azure cloud.
- Experience in detailed system design using use-case analysis and functional analysis, modeling programs with class, sequence, activity, and state diagrams in UML and Rational Rose.
- Experience with cloud databases and data warehouses (SQL Azure and Confidential Redshift/RDS).
- Proficiency in multiple databases, including MongoDB, Cassandra, MySQL, Oracle, and MS SQL Server.
- Played a key role in migrating Cassandra and Hadoop clusters to AWS and defined different read/write strategies.
- Strong SQL development skills, including writing stored procedures, triggers, views, and user-defined functions.
- Expert in developing SSIS/DTS packages to extract, transform, and load (ETL) data into data warehouses/data marts from heterogeneous sources.
- Good understanding of software development methodologies, including Agile (Scrum).
- Expertise in developing reports and dashboards using a range of Tableau visualizations.
- Hands-on experience with programming languages such as Java, Python, R, and SAS.
- Experience with Hadoop ecosystem components such as HDFS, YARN, MapReduce, Spark, Pig, Sqoop, Hive, Impala, HBase, and Kafka, plus cron-based scheduling.
- Expert in creating Hive UDFs in Java to analyze data sets with complex aggregation requirements.
- Experience developing ETL applications on large volumes of data using MapReduce, Spark (Scala), PySpark, Spark SQL, and Pig (a brief PySpark sketch follows this list).
- Experience using Sqoop to import and export data between RDBMSs and HDFS/Hive.
- Experience on MS SQL Server, including SSRS, SSIS, and T-SQL.
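A minimal, illustrative PySpark sketch of the Hive-backed ETL pattern summarized above; all table and column names (raw.orders, analytics.daily_order_summary, and so on) are hypothetical placeholders rather than actual project artifacts.

```python
# Minimal PySpark ETL sketch: read a raw Hive table, aggregate, and write
# the result back as a partitioned Hive table. All names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("orders-daily-aggregate")
    .enableHiveSupport()          # required to read/write Hive tables
    .getOrCreate()
)

orders = spark.table("raw.orders")  # hypothetical source table

daily = (
    orders
    .where(F.col("status") == "COMPLETE")
    .groupBy("order_date", "region")
    .agg(
        F.sum("amount").alias("total_amount"),
        F.countDistinct("customer_id").alias("unique_customers"),
    )
)

# Persist as a partitioned table so downstream reports can prune by date
daily.write.mode("overwrite").partitionBy("order_date") \
    .saveAsTable("analytics.daily_order_summary")
```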
TECHNICAL SKILLS
Modeling Tools: IBM Infosphere, SQL Power Architect, Oracle Designer, Erwin 9.6/9.5, ER/Studio 9.7, Sybase Power Designer.
Database Tools: Oracle 12c/11g, MS Access, Microsoft SQL Server 2014/2012, Teradata 15/14, PostgreSQL, Netezza.
Big Data Technologies: Hadoop, HDFS 2, Hive, Pig, HBase, Sqoop, Flume.
Cloud Platform: AWS, EC2, S3, SQS, Azure.
Operating System: Windows, Dos, Unix, Linux.
BI Tools: SSIS, SSRS, SSAS.
Reporting Tools: Business Objects, Crystal Reports.
Tools & Software: TOAD, MS Office, BTEQ, Teradata SQL Assistant.
ETL Tools: Pentaho, Informatica PowerCenter 9.6, SAP BusinessObjects XI R3.1/XI R2, Web Intelligence.
Other Tools: SQL*Plus, SQL*Loader, MS Project, MS Visio; have also worked with C++, UNIX, and PL/SQL.
PROFESSIONAL EXPERIENCE
Confidential
Sr. Data Engineer
Responsibilities:
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Designed and implemented a Big Data analytics architecture, transferring data from Oracle.
- Created DDLs for tables and executed them to create warehouse tables for ETL data loads.
- Implemented logical and physical relational database models and maintained database objects in the data model using Erwin.
- Designed, implemented, and maintained database schemas, entity-relationship diagrams, data models, tables, stored procedures, functions, triggers, constraints, clustered and non-clustered indexes, table partitioning, views, rules, defaults, and complex SQL statements to meet business requirements and enhance performance.
- Developed data pipelines using Flume, Sqoop, Pig, Java MapReduce, and Spark to ingest customer behavioral data and purchase histories into HDFS for analysis.
- Designed data marts using Ralph Kimball's dimensional data mart modeling methodology in Erwin.
- Exported analyzed and processed data to the RDBMS using Sqoop for visualization and report generation by the BI team.
- Created on-demand tables over S3 files using Lambda functions and AWS Glue with Python and PySpark.
- Worked on designing, building, deploying, and maintaining MongoDB.
- Designed SSIS packages to bring data from existing OLTP databases into the new data warehouse using transformations and tasks such as Sequence Containers, Script, For Loop and Foreach Loop Containers, Execute SQL/Package, Send Mail, File System, Conditional Split, Data Conversion, Derived Column, Lookup, Merge Join, Union All, and OLE DB and Excel sources and destinations across multiple data flow tasks.
- Developed an ETL framework using Spark and Hive (including daily runs, error handling, and logging) to turn raw data into useful data sets.
- Coordinated with the team and developed a framework to generate daily ad hoc reports and extracts from enterprise data, automated with Oozie.
- Improved the performance of SSIS packages by implementing parallel execution, removing unnecessary sorts, and using optimized queries and stored procedures.
- Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Developed a POC pipeline to compare performance and efficiency between an AWS EMR Spark cluster and Cloud Dataflow on GCP.
- Configured and managed data sources, data source views, cubes, dimensions, mining structures, roles, hierarchies, and usage-based aggregations with SSAS.
- Maintained and tuned existing cubes using SSAS and Power BI.
- Worked on cloud deployments using Maven, Docker, and Jenkins.
- Coordinated with the Data Science team to design and implement advanced analytical models on the Hadoop cluster over large data sets.
- Created monitors, alarms, notifications, and logs for Lambda functions, Glue jobs, and EC2 hosts using CloudWatch.
- Used AWS Glue for data transformation, validation, and cleansing.
- Used Python (Boto3) to configure AWS services such as Glue, EC2, and S3 (see the sketch after this list).
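A hedged sketch of the Boto3 + AWS Glue pattern behind the on-demand S3 tables described above; the crawler name, IAM role ARN, bucket, and job name are hypothetical placeholders, not actual project resources.

```python
# Sketch: register new S3 data as a Glue catalog table via a crawler, then
# start an existing PySpark Glue job. All resource names are hypothetical.
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Crawl an S3 prefix so its files appear as a queryable catalog table
glue.create_crawler(
    Name="orders-crawler",                                   # hypothetical
    Role="arn:aws:iam::123456789012:role/GlueServiceRole",   # hypothetical
    DatabaseName="analytics",
    Targets={"S3Targets": [{"Path": "s3://example-bucket/orders/"}]},
)
glue.start_crawler(Name="orders-crawler")

# Kick off a previously deployed PySpark Glue job with runtime arguments
response = glue.start_job_run(
    JobName="orders-etl",                                    # hypothetical
    Arguments={"--run_date": "2020-01-01"},
)
print(response["JobRunId"])
```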
Environment: Erwin 9.6, Oracle 12c, MS Office, SQL, SQL*Loader, PL/SQL, DB2, SharePoint, Talend, Redshift, SQL Server, Hadoop, Spark, AWS.
Confidential
Data Engineer
Responsibilities:
- Wrote scripts and an indexing strategy for a migration from SQL Server and MySQL databases to Confidential Redshift.
- Implemented software enhancements to port legacy software systems to the Spark and Hadoop ecosystems on the Azure cloud.
- Used Pig as an ETL tool to perform transformations, event joins, filters, and pre-aggregations before storing the data in HDFS.
- Performed relational and dimensional data modeling to create logical and physical database designs and ER diagrams, with all related entities and relationships defined per the rules provided by the business manager, using ER/Studio.
- Analyzed existing systems and proposed process and system improvements, including adoption of modern scheduling tools like Airflow and migration of legacy systems into an enterprise data lake built on the Azure cloud (a minimal Airflow sketch follows this list).
- Designed and implemented sharding and indexing strategies for MongoDB servers.
- Optimized Pig scripts and performed user interface analysis, performance tuning, and analysis.
- Used Hive to analyze partitioned and bucketed data and compute various metrics for dashboard reporting.
- Implemented the import and export of data using XML and SSIS.
- Involved in planning, defining, and designing the database with ER/Studio based on business requirements, and provided documentation.
- Migrated on-premises applications to the Azure cloud.
- Used SSIS to build automated multi-dimensional cubes.
- Wrote indexing and data distribution strategies optimized for sub-second query response.
- Developed a statistical model using artificial neural networks for ranking the students to better assist the admission process.
- Used Power BI and SSRS to produce parameter-driven, matrix, sub-report, drill-down, and drill-through reports and dashboards, integrated report hyperlinks to external applications, and made dashboards available in web clients and mobile apps.
- Designed data marts following star schema and snowflake schema methodologies, using industry-leading data modeling tools like ER/Studio.
- Performed data cleaning and preparation on XML files.
- Automated data cleaning and preparation in Python (robotic process automation).
- Prepared and uploaded SSRS reports; managed database and SSRS permissions.
- Developed SQL queries using stored procedures, common table expressions (CTEs), and temporary tables to support SSRS and Power BI reports.
- Built analytical dashboards to track the student records and GPAs across the board.
- Used deep learning frameworks such as MXNet, Caffe2, TensorFlow, Theano, CNTK, and Keras to help clients build deep learning models.
- Participated in requirements meetings and data mapping sessions to understand business needs.
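A minimal Airflow DAG sketch illustrating the daily scheduling approach referenced above; the DAG id, task callables, and schedule are hypothetical, shown only to indicate the shape of the work.

```python
# Minimal Airflow 2.x DAG sketch: a daily extract -> load flow with retries.
# DAG id, callables, and schedule are hypothetical placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Placeholder: pull the day's data from the legacy source system
    print("extracting for", context["ds"])


def load(**context):
    # Placeholder: land transformed data in the enterprise data lake
    print("loading for", context["ds"])


default_args = {"retries": 2, "retry_delay": timedelta(minutes=5)}

with DAG(
    dag_id="legacy_feed_daily",          # hypothetical
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    default_args=default_args,
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task
```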
Environment: ER/Studio, AWS, OLTP, Teradata, Sqoop, MongoDB, MySQL, HDFS, Linux, shell scripts, SSIS, SSAS, HBase, Azure, MDM.
Confidential
Data Engineer
Responsibilities:
- Designed and built multi-terabyte, full end-to-end data warehouse infrastructure from the ground up on Confidential Redshift, handling millions of records every day.
- Worked on Big Data on AWS cloud services, i.e., EC2, S3, EMR, and DynamoDB.
- Managed security groups on AWS, focusing on high availability, fault tolerance, and auto scaling using Terraform templates, along with continuous integration and continuous deployment using AWS Lambda and AWS CodePipeline.
- Developed SSRS reports and SSIS packages to extract, transform, and load data from various source systems.
- Implemented and managed ETL solutions and automated operational processes.
- Optimized and tuned the Redshift environment, enabling queries to perform up to 100x faster for Tableau and SAS Visual Analytics.
- Defined facts and dimensions and designed data marts using Ralph Kimball's dimensional data mart modeling methodology in Erwin.
- Created entity-relationship diagrams (ERDs), functional diagrams, and data flow diagrams; enforced referential integrity constraints; and created logical and physical models using Erwin.
- Created ad hoc queries and reports to support business decisions using SQL Server Reporting Services (SSRS).
- Analyzed existing application programs and tuned SQL queries using execution plans, Query Analyzer, SQL Profiler, and Database Engine Tuning Advisor to enhance performance.
- Forward-engineered logical models into physical models using Erwin and deployed the resulting data models to the enterprise data warehouse.
- Wrote various data normalization jobs for new data ingested into Redshift (see the load-job sketch after this list).
- Created various complex SSIS/ETL packages to extract, transform, and load data.
- Advanced knowledge of Confidential Redshift and MPP database concepts.
- Migrated on-premises database structures to the Confidential Redshift data warehouse.
- Performed ETL and data validation using SQL Server Integration Services.
- Defined and deployed monitoring, metrics, and logging systems on AWS.
- Implemented Workload Management (WLM) in Redshift to prioritize basic dashboard queries over more complex, longer-running ad hoc queries, providing a more reliable and faster reporting interface with sub-second response for basic queries.
- Published interactive data visualization dashboards, reports, and workbooks on Tableau and SAS Visual Analytics.
- Used Hive SQL, Presto SQL, and Spark SQL for ETL jobs, choosing the right technology for each job.
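A hedged sketch of the kind of Redshift load/normalization job referenced above, assuming psycopg2 and an S3-staged Parquet feed; the cluster endpoint, IAM role, and table names are hypothetical placeholders.

```python
# Sketch: COPY staged S3 files into a Redshift staging table, then apply a
# delete-then-insert upsert into the target. All identifiers are hypothetical.
import psycopg2

conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="warehouse",
    user="etl_user",
    password="...",  # elided; in practice pull from a secrets manager
)

with conn, conn.cursor() as cur:
    # Bulk-load the day's staged Parquet files from S3
    cur.execute("""
        COPY staging.orders
        FROM 's3://example-bucket/orders/2020-01-01/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
        FORMAT AS PARQUET;
    """)
    # Delete-then-insert upsert, a common Redshift pattern in lieu of MERGE
    cur.execute("""
        DELETE FROM analytics.orders
        USING staging.orders s
        WHERE analytics.orders.order_id = s.order_id;
    """)
    cur.execute("INSERT INTO analytics.orders SELECT * FROM staging.orders;")
```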
Environment: SQL Server, Erwin, Oracle, Redshift, Informatica, RDS, NoSQL, Snowflake Schema, MySQL, PostgreSQL.
Confidential
Data Analyst
Responsibilities:
- Developed stored procedures in MS SQL to fetch data from different servers using FTP and processed the files to update tables.
- Designed logical and physical data models for various data sources on Confidential Redshift.
- Performed logical and physical data modeling (including reverse engineering) using the Erwin data modeling tool.
- Created a dimensional model for the reporting system by identifying the required dimensions and facts using Erwin.
- Designed and developed ETL jobs to extract data from a Salesforce replica and load it into a Redshift data mart.
- Performed performance tuning and worked with stored procedures, views, triggers, cursors, PIVOT/UNPIVOT functions, and CTEs.
- Developed and delivered dynamic reporting solutions using SSRS.
- Used Erwin extensively for data modeling; created staging and target models for the enterprise data warehouse.
- Applied normalization/denormalization techniques for optimum performance in relational and dimensional database environments.
- Resolved data type inconsistencies between the source systems and the target system using mapping documents and SQL analysis of the databases.
- Worked on ETL testing and used the SSIS Tester automated tool for unit and integration testing.
- Designed and created an SSIS/ETL framework from the ground up.
- Created new tables, sequences, views, procedures, cursors, and triggers for database development.
- Created an ETL pipeline using Spark and Hive to ingest data from multiple sources (see the sketch after this list).
- Used SAP transactions in the SAP SD module for handling the client's customers and generating sales reports.
- Created reports using SQL Server Reporting Services (SSRS) for customized and ad hoc queries.
- Coordinated with clients directly to get data from different databases.
- Worked on MS SQL Server, including SSRS, SSIS, and T-SQL.
- Designed and developed schema data models.
- Documented business workflows for stakeholder review.
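A brief sketch of the Spark-plus-Hive multi-source ingestion pattern described above; the JDBC connection string, credentials, paths, and table names are hypothetical, and the appropriate JDBC driver jar is assumed to be on the Spark classpath.

```python
# Sketch: ingest a relational table over JDBC plus flat files from HDFS,
# landing both in Hive staging tables. All identifiers are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("multi-source-ingest")
    .enableHiveSupport()
    .getOrCreate()
)

# Source 1: relational table over JDBC (hypothetical SQL Server source)
customers = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://src-host:1433;databaseName=crm")
    .option("dbtable", "dbo.customers")
    .option("user", "etl_user")
    .option("password", "...")   # elided; use a credential store in practice
    .load()
)

# Source 2: delimited files dropped in an HDFS landing zone
events = spark.read.option("header", True).csv("hdfs:///landing/events/")

# Land both feeds in Hive for downstream transformation
customers.write.mode("overwrite").saveAsTable("staging.customers")
events.write.mode("append").saveAsTable("staging.events")
```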
Environment: ER/Studio, SQL Server 2008, SSIS, Oracle, Business Objects XI, Rational Rose, DataStage, MS Visio, SQL, Crystal Reports 9