Sr. Big Data Engineer Resume
Merrimack, NH
SUMMARY:
- Over 8 years of experience as a Sr. Big Data Engineer with skills in analysis, design, development, testing, and deployment of various software applications.
- In-depth knowledge of the software development lifecycle (SDLC), including Waterfall, Iterative/Incremental, and Agile/Scrum methodologies.
- Excellent knowledge of and working experience with big data tools such as Hadoop, Azure, Data Lake, and AWS Redshift.
- Experienced working with Big Data ecosystem components such as HBase, Sqoop, Oozie, Hive, and Pig on the Cloudera Hadoop distribution.
- Skilled in writing Hive join queries to fetch information from multiple tables and in writing multiple jobs to collect output from Hive.
- Good knowledge of implementing various data processing techniques using Apache HBase for handling data and formatting it as required.
- Extensive experience using ER modeling tools such as Erwin and ER/Studio, as well as Teradata.
- Excellent experience in system analysis, ER/dimensional modeling, data design, and implementing RDBMS-specific features.
- Hands-on experience in configuring and working with Flume to load data from multiple sources directly into HDFS.
- Strong experience in migrating data warehouses and databases into Hadoop/NoSQL platforms.
- Experience in Performance tuning of Informatica (sources, mappings, targets and sessions) and tuning the SQL queries.
- Experience in Dimensional Data Modeling, Star/Snowflake schema, FACT and Dimension tables.
- Extensive experience in technical consulting and end-to-end delivery, covering data modeling and data governance.
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka.
- Well experienced in normalization and de-normalization techniques for optimum performance in relational and dimensional database environments.
- Experienced working with Apache NiFi as an ETL tool for batch and real-time processing.
- Extensive experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
- Skilled in performing data validation and transformation using Python and Hadoop Streaming (see the sketch at the end of this summary).
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Responsible for providing primary leadership in designing, leading, and directing all aspects of UAT testing for the Oracle data warehouse.
- Involved in data acquisition, data pre-processing, and data exploration for a telecommunications project in Scala.
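A minimal sketch of the Python/Hadoop Streaming validation and transformation mentioned above; the comma-delimited input layout (record id, event date, amount) and the validation rules are hypothetical.

```python
#!/usr/bin/env python3
"""Hadoop Streaming mapper: validate and normalize delimited records (sketch)."""
import sys

def is_valid(fields):
    """Keep rows that have three fields, a non-empty id, and a numeric amount."""
    if len(fields) != 3 or not fields[0]:
        return False
    try:
        float(fields[2])
        return True
    except ValueError:
        return False

for line in sys.stdin:
    fields = [f.strip() for f in line.rstrip("\n").split(",")]
    if is_valid(fields):
        # Emit tab-delimited output so downstream Hive/Pig steps can consume it.
        print("\t".join(fields))
```

Such a script would be passed to the Hadoop Streaming jar as the mapper, with the cleaned output landing in HDFS for downstream Hive or Pig processing.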
TECHNICAL SKILLS:
Big Data & Hadoop Ecosystem: Hadoop 3.0, HBase 1.2, Hive 2.3, Pig 0.17, Apache Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, StreamSets, Neo4j.
Data Modeling Tools: Erwin R9.7, Rational System Architect, IBM InfoSphere Data Architect, ER/Studio v16 and Oracle 12c.
RDBMS: Microsoft SQL Server 2017, Teradata 15.0, Oracle 12c, and MS Access
BI Tools: Tableau 10, Tableau server 10, Tableau Reader 10, SAP Business Objects, Crystal Reports.
Project Execution Methodologies: Agile, Ralph Kimball and Bill Inmon's data warehousing methodologies, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD)
Packages: Microsoft Office 2019, Microsoft Project, SAP and Microsoft Visio 2019, SharePoint Portal Server
IDEs: Eclipse, RAD, WSAD, NetBeans.
Operating Systems: Microsoft Windows 7/8 and 10, UNIX, and Linux.
Version Tool: VSS, SVN, CVS.
PROFESSIONAL EXPERIENCE:
Confidential, Merrimack, NH
Sr. Big Data Engineer
Responsibilities:
- Worked with Big Data Hadoop Ecosystem in ingestion, storage, querying, processing and analysis of big data and conventional RDBMS.
- Used the Agile Scrum methodology through the different phases of the software development lifecycle.
- Built a data warehouse on the Azure platform using Azure Databricks and Data Factory.
- Installed, Configured and Maintained Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX and a variety of portfolios.
- Implemented Kafka high-level consumers to get data from Kafka partitions and move it into HDFS.
- Developed conceptual and logical data models and transformed them into physical schemas using Erwin.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python (see the PySpark sketch after this section).
- Worked on Normalization and De-Normalization techniques for both OLTP and OLAP systems.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Worked with NoSQL databases like HBase in creating HBase tables to load large sets of semi-structured data coming from various sources.
- Used SQL Profiler for monitoring and troubleshooting performance issues in T-SQL code and stored procedures.
- Wrote DDL and DML statements for creating and altering tables and for converting character values into numeric values.
- Created several types of data visualizations using Python and Tableau.
- Extracted data from Oracle and uploaded it to Teradata tables using the Teradata utilities FastLoad and MultiLoad.
- Involved in Design and Development of technical specifications using Hadoop technologies.
- Developed PL/SQL programs, stored procedures for data loading and data validations.
- Worked on Cube structure optimization for MDM query performance in Analysis Services (SSAS).
- Wrote Python scripts to parse XML documents and load the data in database.
- Applied normalization techniques, normalizing the data into Third Normal Form (3NF).
- Designed Star and Snowflake Data Models for Enterprise Data Warehouse using Erwin.
- Performed data profiling and transformation on the raw data using Pig and Python.
- Developed standardized Python/Scala code and practices for ingesting data from numerous database systems.
- Managed real-time data processing and real-time data ingestion into MongoDB and Hive using Storm.
- Designed and deployed scalable, highly available, and fault tolerant systems on Azure.
- Used Microsoft Excel tools like pivot tables, graphs, charts, solver to perform quantitative analysis.
- Used Windows Azure SQL Reporting Services to create reports with tables, charts, and maps.
- Initiated Use Case Analysis using UML, which provided the framework for potential use case deliverables and their inter-relationships.
- Created external tables pointing to HBase to access tables with a huge number of columns.
- Developed Pig scripts to transform the data into a structured format and automated them through Oozie coordinators.
- Created various types of reports such as drill down & drill through reports, Matrix reports, Sub reports and Charts using SQL Server Reporting Services (SSRS).
- Developed various QlikView data models by extracting and using data from source files such as Excel, flat files, and big data sources.
Environment: Agile, Azure, Hadoop 3.0, Pig 0.17, Zookeeper 3.4, Sqoop 1.4, MapReduce, Scala 2.12, Spark 2.4, HBase 1.4, Erwin 9.7, Hive 2.3, Python 3.7, NoSQL, PL/SQL, MDM
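Below is a minimal PySpark sketch of the Hive/SQL-to-Spark conversion work referenced in this section; the sales table, its columns, and the target table name are hypothetical.

```python
"""Sketch: convert a Hive aggregation query into Spark DataFrame transformations."""
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-to-spark-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Original HiveQL-style query:
#   SELECT region, SUM(amount) AS total_amount
#   FROM sales
#   WHERE sale_date >= '2019-01-01'
#   GROUP BY region;

# Equivalent DataFrame transformations:
sales = spark.table("sales")
totals = (sales
          .filter(F.col("sale_date") >= "2019-01-01")
          .groupBy("region")
          .agg(F.sum("amount").alias("total_amount")))

# Persist the result back to the warehouse for downstream reporting.
totals.write.mode("overwrite").saveAsTable("sales_totals_by_region")
```

Expressing the query as DataFrame transformations keeps the same logic while letting Spark plan and execute it instead of Hive's MapReduce engine.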
Confidential, Bentonville, AR
Data Engineer
Responsibilities:
- Worked as a Sr. Data Engineer responsible for building scalable distributed data solutions using Hadoop.
- Involved in all phases of SDLC and participated in daily scrum meetings with cross teams.
- Extensively used Agile methodology as the Organization Standard to implement the data Models.
- Worked on AWS Redshift and RDS, implementing data models and loading data on both platforms.
- Worked on migrating tables from RDBMS into Hive tables using Sqoop and later generated visualizations using Tableau.
- Designed Data Flow Diagrams, E/R Diagrams and enforced all referential integrity constraints.
- Created SSIS packages for data importing, cleansing, and parsing; extracted, cleaned, and validated the data.
- Automated recurring reports using SQL and Python and visualized them on BI platforms such as Tableau (see the sketch after this section).
- Developed an automation system to handle data-related tasks using SQL, Python, Salesforce, and AWS.
- Extensively used ETL load scripts to manipulate, concatenate and clean source data.
- Involved in Normalization and De-Normalization of existing tables for faster query retrieval.
- Collected large amounts of log data using Apache Flume and aggregated it in HDFS using Pig/Hive for further analysis.
- Used Python to manipulate data for data loading and extraction and worked with Python libraries.
- Gathered and documented requirements for a QlikView application from users.
- Wrote Hive queries for analyzing data in the Hive warehouse using Hive Query Language (HiveQL).
- Worked on MongoDB and HBase, databases that differ from classic relational databases.
- Involved in several facets of MDM implementations including Data Profiling, Metadata acquisition and data migration.
- Involved in Dimensional modeling (Star Schema) of the Data warehouse and used Erwin to design the business process, dimensions and measured facts.
- Wrote complex SQL queries for implementing business rules and transformations.
- Created the logical data model from the conceptual model and converted it into the physical database design using Erwin.
- Installed and configured Pig for ETL jobs and wrote Pig scripts using regular expressions for data cleaning.
- Interacted with business stakeholders, gathering requirements and managing delivery, covering the entire Tableau development lifecycle.
- Involved in database development, creating PL/SQL functions, procedures, packages, cursors, error handling, and views.
- Managed real-time data processing and real-time data ingestion into MongoDB and Hive using Storm.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Performed data analysis in Hive by creating tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Developed Oozie workflows for daily incremental loads, which get data from Teradata and import it into Hive tables.
Environment: Agile, AWS, Sqoop 1.4, Tableau, SSIS, SQL, Python 3.7, Apache Flume 1.8, Pig 0.17, Hive 2.3, Mongo DB, MDM, HDFS, Oozie 4.3, HBase 1.4, MapReduce, Teradata
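A sketch of the recurring SQL/Python reporting automation mentioned above, producing a CSV extract that a Tableau workbook could refresh from; the connection string, query, tables, and output path are hypothetical.

```python
"""Sketch: run a recurring SQL report and write a CSV extract for Tableau."""
from datetime import date

import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection; in practice credentials would come from a secrets store.
engine = create_engine("postgresql://report_user:***@db-host:5432/analytics")

QUERY = """
    SELECT order_date, region, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM orders
    WHERE order_date >= CURRENT_DATE - INTERVAL '7 days'
    GROUP BY order_date, region
    ORDER BY order_date, region
"""

def run_report():
    # read_sql executes the query and returns the result set as a DataFrame.
    df = pd.read_sql(QUERY, engine)
    out_path = f"/data/reports/weekly_orders_{date.today():%Y%m%d}.csv"
    df.to_csv(out_path, index=False)
    return out_path

if __name__ == "__main__":
    print(f"Report written to {run_report()}")
```

A job scheduler (cron, Oozie, or similar) would run the script on the reporting cadence, and the dated CSV keeps a history of extracts.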
Confidential, Plano, TX
Data Analyst/Data Engineer
Responsibilities:
- Worked as a Sr. Data Analyst/Data Engineer to review business requirements and compose source-to-target data mapping documents.
- Worked on Amazon Redshift and AWS as a solution to load data and create data models.
- Implemented Forward engineering to create tables, views and SQL scripts and mapping documents.
- Involved in logical and physical designs and transformed logical models into physical implementations for Oracle and Teradata.
- Gathered requirements and performed data mapping to understand the key information by creating tables.
- Worked on data integration and workflow applications on the SSIS platform and was responsible for testing all new packages.
- Extensively used cursors, ref cursors, user-defined object types, and records in PL/SQL programming.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Involved in unit testing and system testing of OLAP report functionality and in validating the data displayed in the reports.
- Involved in migration of data from existing RDBMS (Oracle and SQL Server) to Hadoop using Sqoop for data processing.
- Integrated NoSQL databases like HBase with MapReduce to move bulk amounts of data into HBase.
- Analyzed and gathered user requirements and created the necessary documentation for data migration.
- Performed reverse engineering of the dashboard requirements to model the required data marts.
- Evaluated data mining request requirements and help develop the queries for the requests.
- Enforced referential integrity in the OLTP data model for consistent relationship between tables and efficient database design.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting (see the sketch after this section).
- Involved with Transact-SQL (T-SQL) coding, writing queries, cursors, functions, views, and triggers.
- Worked on all data management activities for the project, including data sources and data migration.
- Monitored data quality and ensured data integrity was maintained for the effective functioning of the department.
Environment: AWS, Amazon Redshift, SQL, Oracle 11g, Teradata r12, SSIS, PL/SQL, Pig 0.15, Sqoop, OLAP, RDBMS, NoSQL, HBase 1.8, MapReduce, OLTP, Hive 2.1, T-SQL
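A sketch of the partitioned and bucketed Hive analysis noted above, assuming the PyHive client for connectivity; the table definition, partition value, bucket count, and metrics are hypothetical.

```python
"""Sketch: analyze a partitioned, bucketed Hive table and compute reporting metrics."""
from pyhive import hive

conn = hive.Connection(host="hive-server", port=10000, database="analytics")
cursor = conn.cursor()

# Table partitioned by load_date and bucketed by customer_id.
cursor.execute("""
    CREATE TABLE IF NOT EXISTS txn_history (
        customer_id BIGINT,
        amount      DOUBLE,
        channel     STRING
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# Metric query restricted to a single partition so only that data is scanned.
cursor.execute("""
    SELECT channel,
           COUNT(DISTINCT customer_id) AS customers,
           AVG(amount)                 AS avg_amount
    FROM txn_history
    WHERE load_date = '2018-06-30'
    GROUP BY channel
""")
for channel, customers, avg_amount in cursor.fetchall():
    print(channel, customers, avg_amount)
```

Filtering on the partition column is what keeps the scan limited to one day's data, while bucketing by customer_id helps joins and sampling on that key.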
Confidential
Data Analyst/Data Modeler
Responsibilities:
- Extensively involved in a Data Modeler/Analyst role, reviewing business requirements and composing source-to-target data mapping documents.
- Conducted JAD sessions, wrote meeting minutes, and documented the requirements.
- Worked with Business users during requirements gathering and prepared Conceptual, Logical and Physical Data Models.
- Normalized the database based on the newly developed model to bring the tables into 3NF for the data warehouse.
- Collected requirements from business users and analyzed them.
- Used SQL Server Integration Services (SSIS) for extracting, transforming, and loading data into the target system from multiple sources.
- Created Use Case Diagrams using UML to define the functional requirements of the application.
- Involved in data transfer, creating tables from various source tables and coding PL/SQL stored procedures and packages.
- Designed and developed cubes using SQL Server Analysis Services (SSAS) in Microsoft Visual Studio.
- Developed SQL queries to fetch complex data from different tables in remote databases using joins, database links, and bulk collects (see the sketch after this section).
- Performed Data Analysis for building the reports and building Enterprise Data warehouse (EDW).
- Designed and developed OLTP and OLAP models for the reporting requirements using ER/Studio.
- Optimized and updated UML Models (Visio) and Relational Data Models for various applications.
- Used SQL Server and T-SQL in constructing tables and applying normalization and de-normalization techniques to database tables.
- Designed the data marts in dimensional data modeling using star and snowflake schemas.
- Redefined many attributes and relationships in the reverse engineered model and cleansed unwanted tables/columns as part of data analysis responsibilities.
- Used ER/Studio for effective model management, sharing, dividing, and reusing model information and designs for productivity improvement.
- Worked on the reporting requirements and was involved in generating reports for the data model using Crystal Reports.
- Involved in the creation, maintenance of Data Warehouse and repositories containing Metadata.
Environment: 3NF, SQL, SSIS, PL/SQL, SSAS, Microsoft Visual Studio 2012, OLTP, OLAP, ER/Studio v15, T-SQL
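A sketch of the kind of multi-table join over a database link mentioned above, executed from Python with cx_Oracle and fetched in batches; the connection details, database link, and table names are hypothetical (the bulk collects themselves belong in the PL/SQL code described earlier).

```python
"""Sketch: run a join query across a remote database link and fetch in batches."""
import cx_Oracle

QUERY = """
    SELECT c.customer_id, c.customer_name, SUM(o.order_total) AS lifetime_value
    FROM customers c
    JOIN orders@sales_dblink o       -- remote table reached via database link
      ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.customer_name
"""

conn = cx_Oracle.connect(user="report_user", password="***", dsn="db-host/ORCLPDB1")
cursor = conn.cursor()
cursor.arraysize = 1000  # fetch rows in large batches to reduce round trips
cursor.execute(QUERY)

while True:
    rows = cursor.fetchmany()
    if not rows:
        break
    for customer_id, customer_name, lifetime_value in rows:
        print(customer_id, customer_name, lifetime_value)
```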