Sr. Data Engineer Resume
Eden Prairie, MN
SUMMARY
- Over 7 years of experience in IT with legacy database systems and data model implementation and maintenance.
- Experience in designing data pipelines that capture data from streaming web sources as well as RDBMS sources.
- Good experience in Data Modeling and Data Analysis; proficient in gathering business requirements and handling requirements management.
- Experience in developing conceptual and logical models and physical database designs for OLTP and OLAP systems using ER/Studio, Erwin and Sybase Power Designer.
- Good knowledge of Software Development Life Cycle (SDLC) methodologies like Waterfall and Agile.
- Effective use of the open-source languages Python and Scala.
- Experience in building solutions on the Azure stack (ADW, Data Factory, CosmosDB, Event Hub, Stream Analytics).
- Experience in data management and implementation of Big Data applications using Spark and Hadoop frameworks.
- Highly proficient with AWS resources such as EMR, S3 buckets, EC2 instances, RDS and others.
- Expert in writing and optimizing SQL queries in Oracle and SQL Server.
- Responsible for data cleansing and transformation of time series data with PySpark.
- Working knowledge of importing and exporting data into and out of HDFS and Hive using Sqoop.
- Excellent experience with Big Data technologies (e.g., Hadoop, BigQuery, Hive, HBase).
- Solid in-depth understanding of information security concepts, data modeling and RDBMS concepts.
- Experience in data analysis using Hive and Pig Latin.
- Expertise in Data Analysis, Data Validation, Data Cleansing, Data Verification and identifying data mismatch.
- Extensive experience in development of T-SQL, Oracle PL/SQL Scripts, Stored Procedures and Triggers for business logic implementation.
- Expertise in SQL Server Analysis Services (SSAS) and SQL Server Reporting Services (SSRS) tools.
- Excellent experience in troubleshooting test scripts, SQL queries, ETL jobs, data warehouse/data mart/data store models.
- Experience in Data transformation, Data mapping from source to target database schemas, Data Cleansing procedures.
- Experience in writing and executing unit, system, integration and UAT scripts in data warehouse projects.
- Involved in analysis, development and migration of Stored Procedures, Triggers, Views and other related database objects.
- Strong experience in using Excel and MS Access to dump data and analyze it based on business needs.
- Performed extensive data profiling and analysis to detect and correct inaccurate data in the databases and to track data quality.
TECHNICAL SKILLS
Big Data Tools: Hadoop 3.0, HDFS, Hive 2.3, Pig 0.17, Scala, HBase 1.2, Sqoop 1.4, Kafka 1.1, Oozie 4.3
Data Modeling Tools: Erwin 9.8/9.7, Sybase Power Designer, Oracle Designer, ER/Studio V17
Database Tools: Oracle 12c/11g, Teradata 15/14 and
ETL Tools: SSIS, Informatica v10.
Reporting Tools: SSRS, Tableau, Crystal Reports
Project Execution Methodologies: JAD, Agile, SDLC.
Programming Languages: SQL, T-SQL, Python, UNIX shell scripting, PL/SQL.
Operating Systems: Microsoft Windows 10/8/7, UNIX, and Linux
PROFESSIONAL EXPERIENCE
Confidential - Eden Prairie, MN
Sr. Data Engineer
Responsibilities:
- Worked on importing and exporting data from Oracle and Teradata into HDFS and Hive using Sqoop.
- Created data pipelines in the cloud extensively using Azure Data Factory.
- Involved with all the phases of Software Development Life Cycle (SDLC) methodologies throughout the project life cycle.
- Migrated SQL databases to Azure Data Lake, Azure SQL Database, Databricks and Azure SQL Data Warehouse, and controlled and granted database access.
- Resolved recurring issues by conducting and participating in JAD sessions with the users, modelers and developers.
- Built the model on the Azure platform, using Python and Spark for model development and Dash by Plotly for visualizations.
- Created dimensional model for the reporting system by identifying required dimensions and facts using Erwin.
- Exported analyzed data to HDFS using Sqoop for generating reports.
- Developed solutions using Spark SQL, Spark Streaming and Kafka to process web feeds and server logs.
- Worked with the PySpark framework.
- Used Oozie to automate data processing and data loading into the Hadoop Distributed File System.
- Used Sqoop to move data between HDFS and various RDBMS sources.
- Involved in running Hadoop streaming jobs to process terabytes of text data.
- Involved in the design of the data warehouse using star schema methodology and converted data from various sources to SQL tables.
- Implemented an Azure cloud solution using HDInsight, Event Hubs, CosmosDB, Cognitive Services and Key Vault.
- Implemented best income logic using Pig scripts and joins to transform data to AutoZone custom formats.
- Imported bulk data into HBase using MapReduce programs.
- Involved in normalization/de-normalization, normal forms and database design methodology.
- Maintained PL/SQL objects like packages, triggers, procedures etc.
- Transferred data from HDFS to relational database systems and vice versa using Sqoop and Azure Data Factory.
- Involved in several facets of MDM implementations including Data Profiling, Metadata acquisition and data migration.
- Worked with ETL team members to implement data acquisition logic and resolve data defects.
Environment: Erwin 9.8, Oracle 12c, Teradata 15, HDFS, Hive 2.3, Sqoop 1.4, Azure Data Factory, Spark SQL, Kafka 1.1, Oozie 4.3, Hadoop 3.0, CosmosDB, Pig 0.17, MapReduce, HBase 1.2, PL/SQL, MDM, ETL, Python.
Confidential - Franklin Lakes, NJ
Data Engineer
Responsibilities:
- Translated business concepts into XML vocabularies by designing XML Schemas with UML
- Involved in Agile methodologies, daily scrum meetings and planning sessions.
- Used AWS's secure global infrastructure and its range of security features to secure data in the cloud.
- Communicated and presented default customer profiles along with reports built using Python.
- Worked extensively on ER Studio in several projects in both OLAP and OLTP applications.
- Created and ran Sqoop jobs with incremental load to populate Hive external tables.
- Extracted the data from Oracle into HDFS using Sqoop.
- Created customized reports using OLAP tools such as Crystal Reports for business use.
- Involved in managing and reviewing Hadoop log files.
- Worked on Data Mining and data validation to ensure the accuracy of the data between the warehouse and source systems.
- Assisted in developing testing procedures, test cases and User Acceptance Testing (UAT).
- Wrote MapReduce jobs to discover trends in data usage by users.
- Wrote Hive queries for data analysis to meet the business requirements.
- Exported data using Sqoop from HDFS to Teradata on a regular basis.
- Worked on Normalization and De-normalization concepts and design methodologies.
- Wrote T-SQL statements for data retrieval and was involved in performance tuning of T-SQL queries and Stored Procedures.
- Transformed structured, semi-structured and unstructured data and loaded it into HBase.
- Participated in the creation of Business Objects Universes using complex and advanced database features.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Reviewed Complex ETL Mappings and Sessions based on business user requirements.
- Used Excel sheets, flat files and CSV files to generate ad hoc Tableau reports.
Environment: ER Studio, XML, Python, OLAP, OLTP, AWS, HDFS, Sqoop, Hadoop, MapReduce, Teradata, T-SQL, HBase, Hive, ETL.
Confidential - Rockville, MD
Data Analyst/Data Engineer
Responsibilities:
- Used Python to preprocess data and attempt to find insights.
- Connected to AWS Redshift through Tableau to extract live data for real-time analysis.
- Worked at conceptual/logical/physical data model level using Erwin according to requirements.
- Created customized reports using OLAP tools such as Crystal Reports for business use.
- Defined the Primary Keys (PKs) and Foreign Keys (FKs) for the entities and created dimensional model star and snowflake schemas using Kimball methodology.
- Worked on debugging and identifying the unexpected real-time issues in the production server SSIS packages.
- Involved in loading data from UNIX file system to HDFS.
- Created multiple automated reports and dashboards sourced from data warehouse using Tableau/SSRS
- Designed SSIS Packages to transfer data from flat files to SQL database tables.
- Developed the batch program in PL/SQL for OLTP processing and used Unix shell scripts run from crontab.
- Developed and maintained data dictionary to create metadata reports for technical and business purpose.
- Worked on importing and cleansing high-volume data from various sources such as Teradata, Oracle and flat files.
- Worked on the Informatica utilities Source Analyzer, Warehouse Designer, Mapping Designer, Mapplet Designer and Transformation Developer.
- Wrote complex SQL queries for validating the data against different kinds of reports generated by Business Objects XIR.
- Created various PL/SQL stored procedures for dropping and recreating indexes on target tables.
- Created data masking mappings to mask the sensitive data between production and test environment.
- Collected, analyzed and interpreted complex data for reporting and/or performance trend analysis.
- Wrote T-SQL procedures/Views to facilitate data access for reporting needs
Environment: Erwin, Redshift, Python, OLAP, UNIX, HDFS, PL/SQL, OLTP, SQL, SSRS, SSIS, Tableau, Teradata, Oracle, Informatica, T-SQL.
Confidential - Sparks, NV
Data Analyst/Data Modeler
Responsibilities:
- Performed data analysis and profiling of source data to better understand the sources.
- Created ER diagrams using Power Designer modeling tool for the relational and dimensional data modeling.
- Involved in writing, testing, and implementing triggers, stored procedures and functions at Database level using PL/SQL.
- Used a forward engineering approach for designing and creating databases for the OLAP model.
- Ensured the feasibility of the logical and physical design models.
- Provided investigation and root cause analysis support for operational issues.
- Designed a star schema for the detailed data marts and plan data marts involving conformed dimensions.
- Created tables, views, sequences, indexes, constraints and generated SQL scripts for implementing physical data model.
- Created ETL jobs and custom transfer components to move data from Oracle source systems to SQL Server using SSIS.
- Worked on snowflaking the dimensions to remove redundancy.
- Designed and Developed Oracle PL/SQL and Shell Scripts, Data Import/Export, Data Conversions and Data Cleansing.
- Created or modified the T-SQL queries as per the business requirements.
- Used the Teradata utilities FastLoad, MultiLoad and TPump to load data.
- Generated Sub-Reports, Cross-tab, Conditional, Drill down reports, Drill through reports and Parameterized reports using SSRS.
- Used data profiling tools and techniques to ensure data quality for data requirements.
- Used Normalization methods up to 3NF and De-normalization techniques for effective performance in OLTP systems.
- Designed and produced client reports using Access, Tableau and SAS.
Environment: Power Designer, OLAP, PL/SQL, ETL, Oracle, T-SQL, Teradata, SSRS, SQL, SAS, Access, Tableau, OLTP.
