Sr. Big Data Engineer Resume
Merrimack, NH
SUMMARY:
- Over 8 years of experience as a Sr. Big Data Engineer with skills in analysis, design, development, testing, and deployment of various software applications.
- In-depth knowledge of the software development lifecycle (SDLC), including Waterfall, Iterative/Incremental, and Agile/Scrum methodologies.
- Excellent knowledge of and working experience with big data tools such as Hadoop, Azure, Data Lake, and AWS Redshift.
- Experienced working with Big Data ecosystem components such as HBase, Sqoop, Oozie, Hive, and Pig on the Cloudera Hadoop distribution.
- Skilled in writing Hive join queries to fetch information from multiple tables and in writing multiple jobs to collect output from Hive.
- Good knowledge of implementing various data processing techniques using Apache HBase for handling data and formatting it as required.
- Extensive experience using ER modeling tools such as Erwin and ER/Studio, as well as Teradata.
- Excellent experience in system analysis, ER/dimensional modeling, data design, and implementing RDBMS-specific features.
- Hands-on experience in configuring and working with Flume to load data from multiple sources directly into HDFS.
- Strong experience in migrating data warehouses and databases into Hadoop/NoSQL platforms.
- Experience in Performance tuning of Informatica (sources, mappings, targets and sessions) and tuning the SQL queries.
- Experience in Dimensional Data Modeling, Star/Snowflake schema, FACT and Dimension tables.
- Extensive experience in technical consulting and end-to-end delivery, covering data modeling and data governance.
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka.
- Well experienced in normalization and de-normalization techniques for optimum performance in relational and dimensional database environments.
- Experienced working with Apache NiFi as an ETL tool for batch and real-time processing.
- Extensive experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
- Skilled in performing data validation and transformation using Python and Hadoop Streaming (see the sketch at the end of this summary).
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Responsible for providing primary leadership in designing, leading, and directing all aspects of UAT testing for the Oracle data warehouse.
- Involved in data acquisition, data pre-processing, and data exploration for a telecommunications project in Scala.
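A minimal sketch of the Python/Hadoop Streaming validation and transformation mentioned above; the comma-delimited input layout (record id, event date, amount) and the validation rules are hypothetical.

```python
#!/usr/bin/env python3
"""Hadoop Streaming mapper: validate and normalize delimited records (sketch)."""
import sys

def is_valid(fields):
    """Keep rows that have three fields, a non-empty id, and a numeric amount."""
    if len(fields) != 3 or not fields[0]:
        return False
    try:
        float(fields[2])
        return True
    except ValueError:
        return False

for line in sys.stdin:
    fields = [f.strip() for f in line.rstrip("\n").split(",")]
    if is_valid(fields):
        # Emit tab-delimited output so downstream Hive/Pig steps can consume it.
        print("\t".join(fields))
```

Such a script would be passed to the Hadoop Streaming jar as the mapper, with the cleaned output landing in HDFS for downstream Hive or Pig processing.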
TECHNICAL SKILLS:
Big Data & Hadoop Ecosystem: Hadoop 3.0, HBase 1.2, Hive 2.3, Pig 0.17, Apache Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, StreamSets, Neo4j.
Data Modeling Tools: Erwin R9.7, Rational System Architect, IBM InfoSphere Data Architect, ER/Studio v16 and Oracle 12c.
RDBMS: Microsoft SQL Server 2017, Teradata 15.0, Oracle 12c, and MS Access
BI Tools: Tableau 10, Tableau server 10, Tableau Reader 10, SAP Business Objects, Crystal Reports.
Project Execution Methodologies: Agile, Ralph Kimball and Bill Inmon's data warehousing methodologies, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD)
Packages: Microsoft Office 2019, Microsoft Project, SAP and Microsoft Visio 2019, SharePoint Portal Server
IDEs: Eclipse, RAD, WSAD, NetBeans.
Operating Systems: Microsoft Windows 7/8 and 10, UNIX, and Linux.
Version Tool: VSS, SVN, CVS.
PROFESSIONAL EXPERIENCE:
Confidential, Merrimack, NH
Sr. Big Data Engineer
Responsibilities:
- Worked with Big Data Hadoop Ecosystem in ingestion, storage, querying, processing and analysis of big data and conventional RDBMS.
- Used the Agile Scrum methodology through the different phases of the software development lifecycle.
- Built a data warehouse on the Azure platform using Azure Databricks and Data Factory.
- Installed, Configured and Maintained Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX and a variety of portfolios.
- Implemented Kafka high-level consumers to get data from Kafka partitions and move it into HDFS.
- Developed conceptual and logical data models and transformed them into physical schemas using Erwin.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python (see the PySpark sketch after this section).
- Worked on Normalization and De-Normalization techniques for both OLTP and OLAP systems.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Worked with NoSQL databases like HBase in creating HBase tables to load large sets of semi-structured data coming from various sources.
- Used SQL Profiler for monitoring and troubleshooting performance issues in T-SQL code and stored procedures.
- Wrote DDL and DML statements for creating and altering tables and for converting character values into numeric values.
- Created several types of data visualizations using Python and Tableau.
- Extracted data from Oracle and uploaded it to Teradata tables using the Teradata utilities FastLoad and MultiLoad.
- Involved in Design and Development of technical specifications using Hadoop technologies.
- Developed PL/SQL programs, stored procedures for data loading and data validations.
- Worked on Cube structure optimization for MDM query performance in Analysis Services (SSAS).
- Wrote Python scripts to parse XML documents and load the data in database.
- Applied normalization techniques, normalizing the data into Third Normal Form (3NF).
- Designed Star and Snowflake Data Models for Enterprise Data Warehouse using Erwin.
- Performed data profiling and transformation on the raw data using Pig and Python.
- Developed standardized Python/Scala code and practices for ingesting data from numerous database systems.
- Managed real-time data processing and real-time data ingestion into MongoDB and Hive using Storm.
- Designed and deployed scalable, highly available, and fault tolerant systems on Azure.
- Used Microsoft Excel tools like pivot tables, graphs, charts, solver to perform quantitative analysis.
- Used Windows Azure SQL Reporting Services to create reports with tables, charts, and maps.
- Initiated Use Case Analysis using UML, which provided the framework for potential use case deliverables and their inter-relationships.
- Created external tables pointing to HBase to access tables with a huge number of columns.
- Developed Pig scripts to transform the data into a structured format and automated them through Oozie coordinators.
- Created various types of reports such as drill down & drill through reports, Matrix reports, Sub reports and Charts using SQL Server Reporting Services (SSRS).
- Developed various QlikView data models by extracting and using data from source files such as Excel, flat files, and big data sources.
Environment: Agile, Azure, Hadoop 3.0, Pig 0.17, Zookeeper 3.4, Sqoop 1.4, MapReduce, Scala 2.12, Spark 2.4, HBase 1.4, Erwin 9.7, Hive 2.3, Python 3.7, NoSQL, PL/SQL, MDM
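Below is a minimal PySpark sketch of the Hive/SQL-to-Spark conversion work referenced in this section; the sales table, its columns, and the target table name are hypothetical.

```python
"""Sketch: convert a Hive aggregation query into Spark DataFrame transformations."""
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-to-spark-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Original HiveQL-style query:
#   SELECT region, SUM(amount) AS total_amount
#   FROM sales
#   WHERE sale_date >= '2019-01-01'
#   GROUP BY region;

# Equivalent DataFrame transformations:
sales = spark.table("sales")
totals = (sales
          .filter(F.col("sale_date") >= "2019-01-01")
          .groupBy("region")
          .agg(F.sum("amount").alias("total_amount")))

# Persist the result back to the warehouse for downstream reporting.
totals.write.mode("overwrite").saveAsTable("sales_totals_by_region")
```

Expressing the query as DataFrame transformations keeps the same logic while letting Spark plan and execute it instead of Hive's MapReduce engine.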
Confidential, Bentonville, AR
Data Engineer
Responsibilities:
- Worked as a Sr. Data Engineer responsible for building scalable distributed data solutions using Hadoop.
- Involved in all phases of SDLC and participated in daily scrum meetings with cross teams.
- Extensively used Agile methodology as the Organization Standard to implement the data Models.
- Worked on AWS Redshift and RDS, implementing data models and loading data on both platforms.
- Worked on migrating tables from RDBMS into Hive tables using Sqoop and later generated visualizations using Tableau.
- Designed Data Flow Diagrams, E/R Diagrams and enforced all referential integrity constraints.
- Created SSIS packages for data importing, cleansing, and parsing; extracted, cleaned, and validated the data.
- Automated recurring reports using SQL and Python and visualized them on BI platforms such as Tableau (see the sketch after this section).
- Developed an automation system to handle data-related tasks using SQL, Python, Salesforce, and AWS.
- Extensively used ETL load scripts to manipulate, concatenate and clean source data.
- Involved in Normalization and De-Normalization of existing tables for faster query retrieval.
- Collected large amounts of log data using Apache Flume and aggregated it in HDFS using Pig/Hive for further analysis.
- Used Python to manipulate data for data loading and extraction and worked with Python libraries.
- Gathered and documented requirements for a QlikView application from users.
- Wrote Hive queries for analyzing data in the Hive warehouse using Hive Query Language (HiveQL).
- Worked on MongoDB and HBase, databases that differ from classic relational databases.
- Involved in several facets of MDM implementations including Data Profiling, Metadata acquisition and data migration.
- Involved in Dimensional modeling (Star Schema) of the Data warehouse and used Erwin to design the business process, dimensions and measured facts.
- Wrote complex SQL queries for implementing business rules and transformations.
- Created the logical data model from the conceptual model and converted it into the physical database design using Erwin.
- Installed and configured Pig for ETL jobs and wrote Pig scripts using regular expressions for data cleaning.
- Interacted with business stakeholders, gathering requirements and managing delivery, covering the entire Tableau development lifecycle.
- Involved in database development, creating PL/SQL functions, procedures, packages, cursors, error handling, and views.
- Managed real-time data processing and real-time data ingestion into MongoDB and Hive using Storm.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Performed data analysis in Hive by creating tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Developed Oozie workflows for daily incremental loads, which get data from Teradata and import it into Hive tables.
Environment: Agile, AWS, Sqoop 1.4, Tableau, SSIS, SQL, Python 3.7, Apache Flume 1.8, Pig 0.17, Hive 2.3, Mongo DB, MDM, HDFS, Oozie 4.3, HBase 1.4, MapReduce, Teradata
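A sketch of the recurring SQL/Python reporting automation mentioned above, producing a CSV extract that a Tableau workbook could refresh from; the connection string, query, tables, and output path are hypothetical.

```python
"""Sketch: run a recurring SQL report and write a CSV extract for Tableau."""
from datetime import date

import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection; in practice credentials would come from a secrets store.
engine = create_engine("postgresql://report_user:***@db-host:5432/analytics")

QUERY = """
    SELECT order_date, region, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM orders
    WHERE order_date >= CURRENT_DATE - INTERVAL '7 days'
    GROUP BY order_date, region
    ORDER BY order_date, region
"""

def run_report():
    # read_sql executes the query and returns the result set as a DataFrame.
    df = pd.read_sql(QUERY, engine)
    out_path = f"/data/reports/weekly_orders_{date.today():%Y%m%d}.csv"
    df.to_csv(out_path, index=False)
    return out_path

if __name__ == "__main__":
    print(f"Report written to {run_report()}")
```

A job scheduler (cron, Oozie, or similar) would run the script on the reporting cadence, and the dated CSV keeps a history of extracts.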
Confidential, Plano, TX
Data Analyst/Data Engineer
Responsibilities:
- Worked as a Sr. Data Analyst/Data Engineer to review business requirements and compose source-to-target data mapping documents.
- Worked on Amazon Redshift and AWS as a solution to load data and create data models.
- Implemented Forward engineering to create tables, views and SQL scripts and mapping documents.
- Involved in logical and physical designs and transformed logical models into physical implementations for Oracle and Teradata.
- Gathered requirements and performed data mapping to understand the key information by creating tables.
- Worked on data integration and workflow applications on the SSIS platform and was responsible for testing all new packages.
- Extensively used cursors, ref cursors, user-defined object types, and records in PL/SQL programming.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Involved in unit testing and system testing of OLAP report functionality and in validating the data displayed in the reports.
- Involved in migration of data from existing RDBMS (Oracle and SQL Server) to Hadoop using Sqoop for data processing.
- Integrated NoSQL databases like HBase with MapReduce to move bulk amounts of data into HBase.
- Analyzed and gathered user requirements and created the necessary documentation for data migration.
- Performed reverse engineering of the dashboard requirements to model the required data marts.
- Evaluated data mining request requirements and help develop the queries for the requests.
- Enforced referential integrity in the OLTP data model for consistent relationship between tables and efficient database design.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting (see the sketch after this section).
- Involved with Transact-SQL (T-SQL) coding, writing queries, cursors, functions, views, and triggers.
- Worked on all data management activities for the project, including data sources and data migration.
- Monitored data quality and ensured data integrity was maintained for the effective functioning of the department.
Environment: AWS, Amazon Redshift, SQL, Oracle 11g, Teradata r12, SSIS, PL/SQL, Pig 0.15, Sqoop, OLAP, RDBMS, NoSQL, HBase 1.8, MapReduce, OLTP, Hive 2.1, T-SQL
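A sketch of the partitioned and bucketed Hive analysis noted above, assuming the PyHive client for connectivity; the table definition, partition value, bucket count, and metrics are hypothetical.

```python
"""Sketch: analyze a partitioned, bucketed Hive table and compute reporting metrics."""
from pyhive import hive

conn = hive.Connection(host="hive-server", port=10000, database="analytics")
cursor = conn.cursor()

# Table partitioned by load_date and bucketed by customer_id.
cursor.execute("""
    CREATE TABLE IF NOT EXISTS txn_history (
        customer_id BIGINT,
        amount      DOUBLE,
        channel     STRING
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# Metric query restricted to a single partition so only that data is scanned.
cursor.execute("""
    SELECT channel,
           COUNT(DISTINCT customer_id) AS customers,
           AVG(amount)                 AS avg_amount
    FROM txn_history
    WHERE load_date = '2018-06-30'
    GROUP BY channel
""")
for channel, customers, avg_amount in cursor.fetchall():
    print(channel, customers, avg_amount)
```

Filtering on the partition column is what keeps the scan limited to one day's data, while bucketing by customer_id helps joins and sampling on that key.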
Confidential
Data Analyst/Data Modeler
Responsibilities:
- Extensively involved in a Data Modeler/Analyst role, reviewing business requirements and composing source-to-target data mapping documents.
- Conducted JAD sessions, wrote meeting minutes, and documented the requirements.
- Worked with Business users during requirements gathering and prepared Conceptual, Logical and Physical Data Models.
- Normalized the database based on the newly developed model to bring the tables into 3NF for the data warehouse.
- Collected requirements from business users and analyzed them.
- Used SQL Server Integration Services (SSIS) for extracting, transforming, and loading data into the target system from multiple sources.
- Created Use Case Diagrams using UML to define the functional requirements of the application.
- Involved in data transfer, creating tables from various source tables and coding PL/SQL stored procedures and packages.
- Designed and developed cubes using SQL Server Analysis Services (SSAS) in Microsoft Visual Studio.
- Developed SQL queries to fetch complex data from different tables in remote databases using joins, database links, and bulk collects (see the sketch after this section).
- Performed Data Analysis for building the reports and building Enterprise Data warehouse (EDW).
- Designed and developed OLTP and OLAP models for the reporting requirements using ER/Studio.
- Optimized and updated UML Models (Visio) and Relational Data Models for various applications.
- Used SQL Server and T-SQL in constructing tables and applying normalization and de-normalization techniques to database tables.
- Designed the data marts in dimensional data modeling using star and snowflake schemas.
- Redefined many attributes and relationships in the reverse engineered model and cleansed unwanted tables/columns as part of data analysis responsibilities.
- Used ER/Studio for effective model management, sharing, dividing, and reusing model information and designs for productivity improvement.
- Worked on the reporting requirements and was involved in generating reports for the data model using Crystal Reports.
- Involved in the creation, maintenance of Data Warehouse and repositories containing Metadata.
Environment: 3NF, SQL, SSIS, PL/SQL, SSAS, Microsoft Visual Studio 2012, OLTP, OLAP, ER/Studio v15, T-SQL
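A sketch of the kind of multi-table join over a database link mentioned above, executed from Python with cx_Oracle and fetched in batches; the connection details, database link, and table names are hypothetical (the bulk collects themselves belong in the PL/SQL code described earlier).

```python
"""Sketch: run a join query across a remote database link and fetch in batches."""
import cx_Oracle

QUERY = """
    SELECT c.customer_id, c.customer_name, SUM(o.order_total) AS lifetime_value
    FROM customers c
    JOIN orders@sales_dblink o       -- remote table reached via database link
      ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.customer_name
"""

conn = cx_Oracle.connect(user="report_user", password="***", dsn="db-host/ORCLPDB1")
cursor = conn.cursor()
cursor.arraysize = 1000  # fetch rows in large batches to reduce round trips
cursor.execute(QUERY)

while True:
    rows = cursor.fetchmany()
    if not rows:
        break
    for customer_id, customer_name, lifetime_value in rows:
        print(customer_id, customer_name, lifetime_value)
```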