Sr. Data Engineer Resume
Highland Park, NJ
SUMMARY
- Overall 8 years of experience across Big Data, Python, and data warehousing technologies.
- Experience with all stages of the SDLC and the Agile development model, from requirements gathering through deployment and production support.
- Hands-on experience with NoSQL databases, including HBase and its integration with Hadoop clusters.
- Developed shell and Python scripts to address production issues.
- Designed and developed an automation framework using Python and shell scripting.
- Experience with AWS EC2, configuring servers for Auto Scaling and Elastic Load Balancing.
- Expertise in using version control systems such as Git.
- Created Azure SQL databases and performed monitoring and restores of Azure SQL databases.
- Excellent knowledge of and extensive experience with NoSQL databases (HBase).
- Performed migration of Microsoft SQL Server to Azure SQL Database.
- Hands-on experience in data modeling with star and snowflake schemas.
- Experience in implementing Big Data analytics, cloud data engineering, data warehouse/data mart, data visualization, reporting, data quality, and data virtualization solutions.
- Experience in designing the Conceptual, Logical and Physical data modeling using Erwin and E/R Studio Data modeling tools.
- Experience working with Azure Blob Storage, Azure Data Lake, Azure Data Factory, Azure SQL, Azure SQL Data warehouse, Azure Analytics, Azure HDInsight, Azure Databricks.
- Involved in building Data Models and Dimensional Modeling with 3NF, Star and Snowflake schemas for OLAP and Operational data store (ODS) applications.
- Performed complex data analysis and provided critical reports to support various departments.
- Good knowledge of stored procedures, functions, etc. using SQL and PL/SQL.
- Worked with multiple Hadoop distributions, including Hortonworks, Cloudera, and MapR.
- Experience in end-to-end implementation of data lake projects.
- Extensive experience developing ETL programs for data extraction, transformation, and loading using SSIS and Informatica PowerCenter.
- Expert in data ingestion tools such as Sqoop, Flume, and Kafka.
- Experience writing data cleansing jobs in Spark, MapReduce, and Pig.
- Experience in developing pipelines in Spark using Scala and Python.
- Experience in implementing optimization techniques in Hive and Spark.
- Experience working with cloud tools like Amazon Web Services and Azure.
- Experience in developing SQL and PL/SQL scripts.
- Development experience in Microsoft Azure, providing data movement and scheduling functionality for cloud-based technologies such as Azure Blob Storage and Azure SQL Database.
- Expert in developing SSIS/DTS packages to extract, transform, and load (ETL) data into data warehouses/data marts from heterogeneous sources.
- Expertise in developing reports and dashboards using a range of Tableau visualizations.
- Experience in using Sqoop for importing and exporting data between RDBMS and HDFS/Hive (an illustrative sketch follows this summary).
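The Sqoop-based ingestion noted above was typically driven from Python automation scripts. Below is a minimal, hedged sketch of such a wrapper; the JDBC connection string, credentials file, table names, and target paths are hypothetical placeholders rather than details of any actual engagement.

```python
"""Illustrative sketch only: a small Python wrapper that builds and runs a
Sqoop import from a MySQL table into HDFS and Hive. All connection details,
tables, and paths below are hypothetical placeholders."""
import subprocess


def sqoop_import(table: str, target_dir: str, hive_table: str) -> None:
    # Assemble the sqoop import command; --hive-import loads the data into
    # the named Hive table after landing it in the HDFS target directory.
    cmd = [
        "sqoop", "import",
        "--connect", "jdbc:mysql://db-host.example.com:3306/sales",  # hypothetical source
        "--username", "etl_user",
        "--password-file", "/user/etl/.sqoop_pwd",  # keeps credentials off the command line
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", "4",
        "--hive-import",
        "--hive-table", hive_table,
    ]
    subprocess.run(cmd, check=True)  # raise if sqoop exits non-zero


if __name__ == "__main__":
    sqoop_import("orders", "/data/staging/orders", "staging.orders")
```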
TECHNICAL SKILLS
Hadoop Ecosystem: MapReduce, Spark 2.3, HBase 1.2, Hive 2.3, Pig 0.17, Solr 7.2, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, StreamSets, Neo4j, Hadoop 3.0, Apache NiFi 1.6, Cassandra 3.11
BI Tools: Tableau 10, Tableau server 10, Tableau Reader 10, SAP Business Objects, Crystal Reports
RDBMS: Microsoft SQL Server 2017, Teradata 15.0, Oracle 12c, MS Access, MySQL, DB2, Hive, Microsoft Azure SQL Database
Operating Systems: Microsoft Windows Vista, 7, 8, and 10, UNIX, and Linux.
Methodologies: Agile, RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Waterfall Model.
Programming Languages: SQL, PL/SQL, UNIX shell Scripting, PERL, AWK, SED
Other Tools: TOAD, BTEQ, MS-Office suite (Word, Excel, Project and Outlook).
Cloud Management: AWS (Amazon Web Services), MS Azure
PROFESSIONAL EXPERIENCE
Confidential - Highland Park, NJ
Sr. Data Engineer
Responsibilities:
- Working as a Data Engineer, extracted and loaded data into a data lake environment (MS Azure) using Sqoop; the data was then accessed by business users.
- Developed understanding of key business, product and user questions.
- Followed agile methodology for the entire project.
- Installed and configured Hadoop ecosystem components such as HBase, Flume, Pig, and Sqoop.
- Worked in Azure environment for development and deployment of Custom Hadoop Applications.
- Used Python to implement Data Processing pipelines.
- Involved in Big Data requirements analysis and in designing and developing solutions for ETL and Business Intelligence platforms.
- Worked on reading multiple data formats on HDFS using Python.
- Implemented Spark using Python (PySpark) and Spark SQL for faster testing and processing of data (see the illustrative sketch at the end of this section).
- Created dimensional models based on star and snowflake schemas and designed them using Erwin.
- Identified data within different data stores, such as tables, files, folders, and documents, to create a dataset in the pipeline using Azure HDInsight and NiFi.
- Continuously monitored and managed data pipeline (CI/CD) performance alongside applications from a single console with Azure Monitor.
- Analyzed partitioned and bucketed data using Hive and computed various metrics for reporting.
- Improved the performance of DataStage jobs by tuning them, changing job designs, and revising SQL queries.
- Built Azure Data Warehouse Table Data sets for Power BI Reports.
- Developed customized classes for serialization and deserialization in Hadoop.
- Analyzed SQL scripts and designed solutions to implement them in Python.
- Worked with NoSQL databases like HBase, creating tables to load large sets of semi-structured data coming from source systems.
- Developed Shell, Perl and Python scripts to automate and provide Control flow to Pig scripts.
- Involved in all phases of data mining, data collection, data cleaning, developing models, validation and visualization.
- Performed data profiling and transformation on the raw data using Pig and Python.
- Created Hive external tables to stage data and then moved the data from staging to the main tables. Designed and built a data discovery platform for a large system integrator using Azure HDInsight components.
- Created a dashboard to monitor and manage data flow using NiFi.
- Designed ETL strategies for load balancing and exception handling, and designed processes that can handle high data volumes.
- Developed data warehouse model in Snowflake for over 100 datasets.
- Designed and implemented a fully operational production grade large scale data solution on Snowflake Data Warehouse.
- Worked with ETL tools to migrate data from various OLTP and OLAP databases to the data mart.
- Imported and exported data between MySQL and HDFS using Sqoop and managed data coming from different sources.
- Used Reverse Engineering approach to redefine entities, relationships and attributes in the data model.
- Implemented Kafka producers, created custom partitions, configured brokers, and implemented high-level consumers to build out the data platform.
- Automated data flow between nodes using Apache NiFi.
- Used Azure data factory and data Catalog to ingest and maintain data sources.
- Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster. Defined business objectives comprehensively through discussions with business stakeholders and functional analysts and by participating in requirements collection sessions.
Environment: ETL, Kafka 1.0.1, Hadoop 3.0, HDFS, Agile, MS Azure, Apache NiFi, PySpark, Spark 2.3, SQL, Python, Erwin 9.8, CI/CD, Hive 2.3, NoSQL, HBase 1.2, Pig 0.17, OLTP, MySQL, Sqoop 1.4, OLAP
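A minimal sketch of the PySpark and Spark SQL pattern referenced above: staged data is read from the data lake, reporting metrics are computed with Spark SQL, and the result is written to a partitioned Hive table. The storage path, database, table, and column names are hypothetical placeholders, not details of the actual engagement.

```python
"""Minimal PySpark sketch, assuming hypothetical paths, databases, and columns."""
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("staging-to-reporting")
    .enableHiveSupport()  # required to read from and write to Hive tables
    .getOrCreate()
)

# Read semi-structured source data landed in the data lake (hypothetical path).
orders = spark.read.json("abfss://datalake@account.dfs.core.windows.net/raw/orders/")
orders.createOrReplaceTempView("orders_stg")

# Compute daily reporting metrics with Spark SQL.
daily_metrics = spark.sql("""
    SELECT order_date,
           region,
           COUNT(*)    AS order_count,
           SUM(amount) AS total_amount
    FROM orders_stg
    GROUP BY order_date, region
""")

# Write into a Hive table partitioned by order_date for efficient reporting queries.
(daily_metrics.write
    .mode("overwrite")
    .partitionBy("order_date")
    .saveAsTable("reporting.daily_order_metrics"))
```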
Confidential - Nashville, TN
Data Engineer
Responsibilities:
- Worked as a Data Engineer to review business requirements and compose source-to-target data mapping documents.
- Worked within the Agile methodology, collaboratively driving the team's success in mitigating infrastructure security risks.
- Participated in JAD meetings to gather the requirements and understand the end users' systems.
- Architected, Designed and Developed Business applications and Data marts for reporting.
- Worked with SMEs, conducted JAD sessions, and documented the requirements using UML and use case diagrams.
- Performed data profiling and transformation on the raw data using Pig and Python.
- Configured Apache Mahout Engine.
- Installed and configured Hive, wrote Hive UDFs, and set up cluster coordination services through ZooKeeper.
- Involved in different phases of the development life cycle, including analysis, design, coding, unit testing, integration testing, review, and release, per the business requirements.
- Developed Big Data solutions focused on pattern matching and predictive modeling.
- Worked on Amazon Redshift and AWS as a solution to load data, create data models, and run BI on it.
- Used a forward engineering approach for designing and creating databases for the OLAP model.
- Developed various operational drill-through and drill-down reports using SSRS.
- Used advanced T-SQL features to design and tune T-SQL that interfaces with the database.
- Generated periodic reports based on the statistical analysis of the data using SQL Server Reporting Services (SSRS).
- Designed both 3NF data models for OLTP systems and dimensional data models using star and snowflake schemas.
- Developed a framework for converting existing PowerCenter mappings to PySpark (Python and Spark) jobs.
- Generated various presentable reports and documentation using report designer and pinned reports in Erwin.
- Established the workflow process and created workflow diagrams using Microsoft Visio.
- Designed the data marts using the Ralph Kimball's Dimensional Data Mart modeling methodology using Erwin.
- Created the cubes with Star Schemas using facts and dimensions through SQL Server Analysis Services (SSAS).
- Used SSIS to create reports, customized reports, on-demand reports, and ad-hoc reports, and was involved in analyzing multi-dimensional reports in SSRS.
- Designed OLTP system environment and maintained documentation of Metadata.
- Created PL/SQL packages and Database Triggers and developed user procedures and prepared user manuals for the new programs.
- Optimized PySpark jobs to run on a Kubernetes cluster for faster data processing.
- Developed Shell, Perl and Python scripts to automate and provide Control flow to Pig scripts.
- Loaded and transformed large sets of structured, semi structured and unstructured data using Hadoop/Big Data concepts.
- Implemented Forward engineering to create tables, views and SQL scripts and mapping documents.
- Worked on AWS S3 bucket integration for application and development projects.
- Created HBase tables to store various data formats of PII data coming from different portfolios.
- Implemented Kafka high-level consumers to get data from Kafka partitions and move it into HDFS (see the illustrative sketch at the end of this section).
- Worked with MDM systems team with respect to technical aspects and generating reports.
- Worked on AWS Redshift and RDS, implementing models and data on both services.
- Developed Hive and MapReduce tools to design and manage HDFS data blocks and data distribution methods.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig, Hive, and Sqoop.
Environment: Apache Mahout 0.14, Hadoop 3.0, HBase 2.2, Flume 1.9, Sqoop, Agile, Hive 2.3, AWS, Amazon Redshift, T-SQL, SSRS, OLAP, OLTP, PL/SQL, MDM, SSIS, Python, Pig, PySpark
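A hedged illustration of the Kafka-to-HDFS consumer work noted in this section, using the kafka-python and hdfs client libraries. The broker address, topic, consumer group, NameNode URL, and target directory are hypothetical placeholders.

```python
"""Sketch of a Kafka consumer that batches records and lands them in HDFS.
All endpoints, topics, and paths are hypothetical placeholders."""
from datetime import datetime

from hdfs import InsecureClient  # pip install hdfs
from kafka import KafkaConsumer  # pip install kafka-python

BATCH_SIZE = 1000

consumer = KafkaConsumer(
    "clickstream",  # hypothetical topic
    bootstrap_servers="kafka-broker.example.com:9092",
    group_id="hdfs-loader",
    auto_offset_reset="earliest",
    enable_auto_commit=False,
    value_deserializer=lambda v: v.decode("utf-8"),
)
hdfs_client = InsecureClient("http://namenode.example.com:9870", user="etl")

batch = []
for record in consumer:
    batch.append(record.value)
    if len(batch) >= BATCH_SIZE:
        # Write each batch to a new file so no HDFS append is needed.
        path = f"/data/raw/clickstream/{datetime.utcnow():%Y%m%d_%H%M%S%f}.json"
        hdfs_client.write(path, data="\n".join(batch), encoding="utf-8")
        consumer.commit()  # commit offsets only after a successful write
        batch = []
```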
Confidential - Ashburn, VA
Data Analyst/Data Modeler
Responsibilities:
- Worked with Business users during requirements gathering and prepared Conceptual, Logical and Physical Data Models.
- Supported business analysis and marketing campaign analytics with data mining, data processing, and investigation to answer complex business questions.
- Developed scripts that automated DDL and DML statements used in creations of databases, tables, constraints, and updates.
- Planned and defined system requirements to Use Case, Use Case Scenario and Use Case Narrative using the UML (Unified Modeling Language) methodologies.
- Participated in JAD sessions involving the discussion of various reporting needs.
- Reverse engineered the existing data marts and identified the data elements (in the source systems), dimensions, facts, and measures required for reports.
- Conducted design discussions and meetings to arrive at the appropriate data warehouse design at the lowest level of grain for each of the dimensions involved.
- Created Entity Relationship Diagrams (ERD), Functional diagrams, Data flow diagrams and enforced referential integrity constraints.
- Involved in designing and developing SQL server objects such as Tables, Views, Indexes (Clustered and Non-Clustered), Stored Procedures and Functions in Transact-SQL.
- Designed a star schema for sales data involving shared (conformed) dimensions for other subject areas using Erwin Data Modeler (see the illustrative DDL sketch at the end of this section).
- Created and maintained the Logical Data Model (LDM) for the project, including documentation of all entities, attributes, data relationships, primary and foreign key structures, allowed values, codes, business rules, glossary terms, etc.
- Ensured the feasibility of the logical and physical design models.
- Worked on snowflaking the dimensions to remove redundancy.
- Wrote PL/SQL statements, stored procedures, and triggers in DB2 for extracting as well as writing data.
- Defined facts, dimensions and designed the data marts using the Ralph Kimball's Dimensional Data Mart modeling methodology using Erwin.
- Involved in Data profiling and performed Data Analysis based on the requirements, which helped in catching many Sourcing Issues upfront.
- Developed Data mapping, Data Governance, Transformation and Cleansing rules for the Data Management involving OLTP, ODS and OLAP.
- Normalized the database based on the newly developed model to bring the data warehouse into 3NF.
- Created an SSIS package for daily email subscriptions using the ODBC driver and a PostgreSQL database.
- Constructed complex SQL queries with sub-queries, inline views as per the functional needs in the Business Requirements Document (BRD).
Environment: PL/SQL, Erwin 8.5, MS SQL 2012, OLTP, ODS, OLAP, SSIS, Transact-SQL, Teradata SQL Assistant
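An illustrative sketch of the star-schema design work described in this section: a conformed date dimension shared with other subject areas, a sales fact table, and a supporting non-clustered index, created on SQL Server via pyodbc. The server, database, credentials, and column names are hypothetical placeholders.

```python
"""Sketch only: creates a simple star schema on SQL Server via pyodbc.
Server, database, credentials, and columns are hypothetical placeholders."""
import pyodbc

DDL = """
CREATE TABLE dbo.DimDate (
    DateKey      INT         NOT NULL PRIMARY KEY CLUSTERED,  -- e.g. 20240131
    FullDate     DATE        NOT NULL,
    CalendarYear SMALLINT    NOT NULL,
    MonthName    VARCHAR(10) NOT NULL
);

CREATE TABLE dbo.FactSales (
    SalesKey    BIGINT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    DateKey     INT           NOT NULL REFERENCES dbo.DimDate (DateKey),
    ProductKey  INT           NOT NULL,  -- FK to a conformed product dimension
    SalesAmount DECIMAL(18,2) NOT NULL,
    Quantity    INT           NOT NULL
);

-- Non-clustered index to support drill-down reporting by date.
CREATE NONCLUSTERED INDEX IX_FactSales_DateKey ON dbo.FactSales (DateKey);
"""

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sqlserver.example.com;DATABASE=SalesDM;"
    "UID=modeler;PWD=***"  # placeholder credentials
)
cursor = conn.cursor()
cursor.execute(DDL)  # SQL Server accepts the multi-statement batch
conn.commit()
```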