Sr. Big Data Engineer Resume
Shelton, CT
SUMMARY:
- 8+ years of experience as a Sr. Big Data Engineer, with skills in the analysis, design, development, testing, and deployment of various software applications.
- Extensive experience in technical consulting and end-to-end delivery, covering data modeling, data governance, and the design, development, and implementation of solutions.
- Proficient in developing Entity-Relationship diagrams and Star/Snowflake schema designs.
- Skilled in writing Hive join queries to fetch information from multiple tables and writing multiple jobs to collect output from Hive.
- Excellent experience in system analysis, ER/dimensional modeling, data design, and implementing RDBMS-specific features.
- Hands-on experience in importing, cleaning, transforming, and validating data, and drawing conclusions from the data for decision-making purposes.
- Strong expertise in Amazon AWS services including EC2, DynamoDB, S3, and Kinesis.
- Experience with data modeling and design of both OLTP and OLAP systems.
- Experienced with Big Data ecosystem components such as HBase, Sqoop, ZooKeeper, Oozie, Hive, and Pig on the Cloudera Hadoop distribution.
- Experience in performance tuning of Informatica (sources, mappings, targets, and sessions) and tuning SQL queries.
- In-depth knowledge of the software development life cycle (SDLC): Waterfall, Iterative and Incremental, RUP, evolutionary prototyping, and Agile/Scrum methodologies.
- Good knowledge of implementing various data processing techniques using Apache HBase to handle and format data as required.
- Hands on experience in configuring and working with Flume to load the data from multiple sources directly into HDFS.
- Proficient in creating dashboards/reports using reporting tools such as Tableau and QlikView.
- Involved in data acquisition, data pre-processing, and data exploration for a telecommunications project in Scala.
- Strong experience in migrating data warehouses and databases into Hadoop/NoSQL platforms.
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and ZooKeeper (see the sketch after this list).
- Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
- Experience in writing Storm topologies that accept events from Kafka producers and emit them into Cassandra.
- Highly proficient in data validation and transformation using Python and Hadoop Streaming.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Experience in Dimensional Data Modeling, Star/Snowflake schema, FACT & Dimension tables.
- Excellent experience in installing and running various Oozie workflows and automating parallel job executions.
- Experience working with Microsoft Server tools like SSAS, SSIS and in generating on-demand scheduled reports using SQL Server Reporting Services (SSRS).
- Well experienced in normalization and de-normalization techniques for optimal performance in relational and dimensional database environments.
- Responsible for providing primary leadership in designing, leading, and directing all aspects of UAT testing for the Oracle data warehouse.
- Excellent understanding of Microsoft BI toolset including Excel, Power BI, SQL Server Analysis Services, Visio, Access.
- Experienced with Apache NiFi as an ETL tool for batch and real-time processing.
- Extensive experience in using ER modeling tools such as Erwin and ER/Studio, as well as Teradata and BTEQ.
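The following is a minimal sketch of the Kafka-to-Cassandra messaging pattern noted above, assuming the kafka-python and DataStax cassandra-driver packages; the topic, keyspace, and table names are hypothetical placeholders:

```python
# Minimal sketch of the Kafka-to-Cassandra integration pattern.
# Assumes the kafka-python and DataStax cassandra-driver packages;
# the topic, keyspace, and table names are hypothetical placeholders.
import json

from kafka import KafkaConsumer
from cassandra.cluster import Cluster

consumer = KafkaConsumer(
    "telecom-events",                          # hypothetical topic
    bootstrap_servers=["localhost:9092"],
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("telecom")           # hypothetical keyspace

insert_stmt = session.prepare(
    "INSERT INTO events (event_id, event_type, payload) VALUES (?, ?, ?)"
)

# Consume indefinitely, writing each event into Cassandra.
for message in consumer:
    event = message.value
    session.execute(insert_stmt, (event["id"], event["type"], json.dumps(event)))
```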
TECHNICAL SKILLS:
Big Data & Hadoop Ecosystem: Hadoop 3.0, HBase 1.2, Hive 2.3, Pig 0.17, Apache Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, StreamSets, Neo4j.
Data Modeling Tools: Erwin R9.7, Rational System Architect, IBM InfoSphere Data Architect, ER/Studio v16, and Oracle 12c.
Databases: Oracle 12c, DB2, SQL Server.
RDBMS: Microsoft SQL Server 2017, Teradata 15.0, Oracle 12c, and MS Access
BI Tools: Tableau 10, Tableau server 10, Tableau Reader 10, SAP Business Objects, Crystal Reports.
Project Execution Methodologies: Agile, Ralph Kimball and Bill Inmon data warehousing methodologies, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD)
Packages: Microsoft Office 2019, Microsoft Project, SAP, Microsoft Visio 2019, SharePoint Portal Server
IDEs: Eclipse, RAD, WSAD, NetBeans.
Operating Systems: Microsoft Windows 7/8 and 10, UNIX, and Linux.
Version Control Tools: VSS, SVN, CVS.
PROFESSIONAL EXPERIENCE:
Confidential - Shelton, CT
Sr. Big Data Engineer
Responsibilities:
- As a Sr. Big Data Engineer, responsible for developing, troubleshooting, and implementing programs.
- Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs.
- Installed and configured Big Data ecosystem components such as HBase, Flume, Pig, and Sqoop.
- Primarily involved in the data migration process using Azure, integrating with a Bitbucket repository.
- Participated in daily Scrum meetings and gave daily status reports.
- Involved in all phases of the Software Development Life Cycle (SDLC), following Agile/Scrum methodology.
- Designed and implemented scalable Cloud Data and Analytical architecture solutions for various public and private cloud platforms using Azure.
- Extensively used Pig for data cleansing using Pig scripts and Embedded Pig scripts.
- Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using the CDH4 distribution.
- Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting on the dashboard (see the sketch after this list).
- Worked with NoSQL databases like HBase, creating tables to load large sets of semi-structured data coming from source systems.
- Created and maintained the metadata (data dictionary) for the data models.
- Built Hadoop solutions for big data problems using MR1 and MR2 on YARN.
- Worked on Cassandra, retrieving data from Cassandra clusters to run queries.
- Worked on Apache NiFi as an ETL tool for batch and real-time processing.
- Performed reverse engineering on the existing data model to understand the data flow and business flow.
- Performed data profiling and analysis, applied various data cleansing rules, designed data standards, and designed the relational models.
- Involved in the creation of Microsoft Azure Cloud SQL Servers and Replication Servers.
- Involved in manipulating, cleansing, and processing data using Excel and SQL; responsible for loading, extracting, and validating client data.
- Implemented monitoring and established best practices around the usage of Elasticsearch.
- Utilized Integration Services (SSIS) to produce a Data Mapping and Data Mart for reporting.
- Extracted and loaded data into a Data Lake environment (MS Azure) using Sqoop; the data was then accessed by business users.
- Worked with MDM systems team with respect to technical aspects and generating reports.
- Designed and Developed Oracle PL/SQL and Shell Scripts, Data Import/Export, Data Conversions and Data Cleansing.
- Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard.
- Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
- Implemented highly scalable and reliable distributed data design using NoSQL/Cassandra technology.
- Extracted files from Cassandra through Sqoop and placed in HDFS for further processing.
- Created Hive tables on top of the loaded data and wrote Hive queries for ad-hoc analysis.
- Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
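The following is a minimal sketch of the partitioned-Hive metric queries referenced above, assuming the PyHive package; the host, database, table, and column names are hypothetical placeholders:

```python
# Minimal sketch: query a partitioned/bucketed Hive table for dashboard
# metrics. Assumes the PyHive package; the host, database, table, and
# column names are hypothetical placeholders.
from pyhive import hive

conn = hive.Connection(host="hive-server", port=10000, database="analytics")
cursor = conn.cursor()

# Filtering on the partition column (event_date) lets Hive prune
# partitions so only the relevant slices of the table are scanned.
cursor.execute(
    """
    SELECT event_date,
           COUNT(*)                AS total_events,
           COUNT(DISTINCT user_id) AS unique_users
    FROM web_events
    WHERE event_date >= '2018-01-01'
    GROUP BY event_date
    """
)
for event_date, total_events, unique_users in cursor.fetchall():
    print(event_date, total_events, unique_users)
```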
Environment: Azure, Pig 0.17, Hadoop 3.0, Agile, HBase 1.2, Flume 1.8, Sqoop 1.4, Data Migration, MDM, Oracle 12c, PL/SQL, Hive 2.3, HDFS, NoSQL, SQL, Elasticsearch, Cassandra 3.0, Apache NiFi 1.6, ETL, MongoDB
Confidential - Arlington, VA
Sr. Data Engineer
Responsibilities:
- As a Sr. Data Engineer, responsible for building scalable distributed data solutions using Hadoop.
- Involved in all phases of SDLC and participated in daily scrum meetings with cross teams.
- Used an Agile methodology for data warehouse development, tracked with Kanbanize.
- Installed and configured Hive, wrote Hive UDFs, and provided cluster coordination services through ZooKeeper.
- Used Sqoop to import and export data into the Hadoop Distributed File System (HDFS) for further processing.
- Imported and exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Loaded multiple NoSQL databases including MongoDB, HBase, and Cassandra.
- Extensively worked on shell scripts for running SAS programs in batch mode on UNIX.
- Worked on partitioning Hive tables and running scripts in parallel to reduce script runtimes.
- Worked with the analysis teams and management teams and supported them based on their requirements.
- Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
- Provided technical support during delivery of MDM (Master Data Management) components.
- Implemented partitioning, dynamic partitions, and buckets in Hive for performance benefits and to organize data in a logical fashion.
- Implemented Python scripts to parse XML documents and load the data into databases (see the sketch after this list).
- Performed transformations, cleaning and filtering on imported data using Hive, MapReduce, and loaded final data into HDFS.
- Worked on configuring and managing disaster recovery and backup for Cassandra data.
- Implemented Kafka high-level consumers to get data from Kafka partitions and move it into HDFS.
- Wrote Hadoop jobs for analyzing data using Hive and Pig, accessing text-format files, SequenceFiles, and Parquet files.
- Utilized Hadoop, Hive, and SQL technologies and moved data sets to production for use by business teams in making business decisions.
- Designed and Developed Real time Stream processing Application using Kafka, Scala and Hive to perform Streaming ETL and apply Machine Learning.
- Worked closely with business analyst for requirement gathering and translating into technical documentation.
- Implemented a proof of concept deploying this product in Amazon Web Services AWS.
- Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
- Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
- Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive.
- Used Excel sheets, flat files, and CSV files to generate ad-hoc Tableau reports.
- Involved in reports development using reporting tools like Tableau.
- Wrote SQL scripts to run ad-hoc queries, stored procedures, and triggers, and prepared reports for management.
- Designed and developed Big Data analytic solutions on a Hadoop-based platform and engaged clients in technical discussions.
- Developed MapReduce modules for machine learning & predictive analytics in Hadoop.
- Created and executed SQL scripts to validate, verify and compare the source data to target table data.
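The following is a minimal sketch of the XML-parsing load scripts mentioned above, using only the Python standard library; sqlite3 stands in for the actual target database, and the XML layout and table schema are hypothetical examples:

```python
# Minimal sketch: parse an XML document and load its records into a
# database. Uses only the Python standard library; sqlite3 stands in
# for the actual target database, and the XML layout and table schema
# are hypothetical examples.
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect("records.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS customers (id TEXT PRIMARY KEY, name TEXT, city TEXT)"
)

tree = ET.parse("customers.xml")               # hypothetical input file
for node in tree.getroot().findall("customer"):
    conn.execute(
        "INSERT OR REPLACE INTO customers (id, name, city) VALUES (?, ?, ?)",
        (node.get("id"), node.findtext("name"), node.findtext("city")),
    )
conn.commit()
conn.close()
```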
Environment: Hadoop 3.0, Oracle 12c, Apache Hive 2.3, HDFS, AWS, Sqoop 1.4, SQL, ETL, MapReduce, Tableau, Agile, Scala, Kafka 1.1, SAS, HBase 1.2, MongoDB, Cassandra 3.0.
Confidential - Medford, Oregon
Data Analyst/Data Engineer
Responsibilities:
- Worked with the analysis teams and management teams and supported them based on their requirements.
- Generated PL/SQL scripts for data manipulation, validation and materialized views for remote instances.
- Conducted JAD sessions with stakeholders and software development team to analyze the feasibility of needs.
- Implemented the Big Data solution using Hadoop, Hive, and Informatica to pull/load the data into HDFS.
- Performed Data Analysis and Data Manipulation of source data from SQL Server and other data structures to support the business organization.
- Involved in all phases of data mining, data collection, data cleaning, developing models, validation and visualization.
- Designed and developed Big Data analytic solutions on a Hadoop-based platform and engaged clients in technical discussions.
- Designed and developed ETL jobs to extract data from a Salesforce replica and load it into a data mart in Redshift (see the sketch after this list).
- Designed both 3NF data models for ODS/OLTP systems and dimensional data models using Star and Snowflake schemas.
- Implemented Forward engineering to create tables, views and SQL scripts and mapping documents.
- Wrote PL/SQL statements, stored procedures, and triggers in DB2 for extracting as well as writing data.
- Extensively used SAS procedures such as MEANS and FREQ, along with other statistical calculations, for data validation.
- Optimized and updated UML Models (Visio) and Relational Data Models for various applications.
- Worked with Business Analyst during requirements gathering and business analysis to prepare high level Logical Data Models and Physical Data Models.
- Conducted design discussions and meetings to arrive at the appropriate data mart using the Kimball methodology.
- Worked extensively on data quality (running data profiling and examining profile outcomes) and metadata management.
- Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
- Developed and implemented logical and physical data models using the enterprise modeling tool Erwin.
- Reverse-engineered the data model from database instances and scripts.
- Worked on data mapping and data mediation between the source data table and target data tables using MS Excel.
- Performed data extraction, data analysis, data manipulation and prepared various production and ad-hoc reports to support cost optimization initiatives and strategies.
- Gathered business requirements from the users and transformed and implemented into database schemas.
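The following is a minimal sketch of the Redshift load step in the ETL jobs above, assuming psycopg2 and the standard Redshift COPY-from-S3 pattern; the cluster endpoint, credentials, table, bucket, and IAM role are all hypothetical placeholders:

```python
# Minimal sketch: load extracted Salesforce data into a Redshift data
# mart using the standard COPY-from-S3 pattern. Assumes psycopg2; the
# cluster endpoint, credentials, table, bucket, and IAM role are all
# hypothetical placeholders.
import psycopg2

conn = psycopg2.connect(
    host="example-cluster.redshift.amazonaws.com",  # hypothetical endpoint
    port=5439,
    dbname="datamart",
    user="etl_user",
    password="change-me",
)
with conn, conn.cursor() as cur:
    # COPY is Redshift's bulk-load path: it reads the staged extract
    # directly from S3 in parallel across the cluster's slices.
    cur.execute(
        """
        COPY sales_mart.opportunities
        FROM 's3://example-bucket/salesforce/opportunities/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-copy'
        FORMAT AS CSV
        IGNOREHEADER 1;
        """
    )
```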
Environment: Erwin 9.2, Hadoop 2.3, ETL, PL/SQL, OLTP, Apache Hive 2.1, Informatica, SAS, HDFS, NoSQL, ODS, MS Excel 2016.
Confidential - Peoria, IL
Data Analyst/Data Modeler
Responsibilities:
- As a Sr. Data Modeler/Data Analyst, I was responsible for all data-related aspects of the project.
- Supported business analysis and marketing campaign analytics with data mining, data processing, and investigation to answer complex business questions.
- Designed the ER diagrams, logical model, and physical database per business requirements using Erwin.
- Assisted the project with analytical techniques including data modeling, data mining, and regression.
- Developed scripts that automated the DDL and DML statements used in the creation of databases, tables, constraints, and updates (see the sketch after this list).
- Performed Data Analysis on both source data and target data after transfer to Data Warehouse.
- Developed data mapping, transformation, and cleansing rules for Master Data Management involving OLTP and OLAP.
- Designed and developed the data dictionary and metadata of the models and maintained them.
- Involved in extensive data validation using SQL queries and back-end testing.
- Provided PL/SQL queries to developers as source queries to identify the data, along with the logic for assignments.
- Designed the data marts in dimensional data modeling using star and snowflake schemas.
- Wrote SQL queries using joins, grouping, nested sub-queries, and aggregation depending on data needed from various relational customer databases.
- Analyzed and presented the gathered information in graphical format for ease of use by business managers.
- Worked on Unit Testing for three reports and created SQL Test Scripts for each report as required.
- Performed GAP analysis to analyze the difference between the system capabilities and business requirements.
- Used reverse engineering to connect to existing database and create graphical representation (E-R diagram).
- Developed Data Migration and Cleansing rules for the Integration Architecture using OLTP.
- Designed the Redshift data model and performed Redshift performance improvements/analysis.
- Implemented Forward engineering to create tables, views and SQL scripts and mapping documents.
- Developed complex PL/SQL procedures and packages using views and SQL joins.
- Performed various ad-hoc analyses by extracting data from multiple source systems and creating comprehensive reports for end users.
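The following is a minimal sketch of the kind of DDL/DML automation script described above; sqlite3 stands in for the target RDBMS, and the schema is a hypothetical example:

```python
# Minimal sketch: automate DDL (table creation with constraints) and
# DML (inserts and updates) from one script. sqlite3 stands in for the
# target RDBMS, and the schema is a hypothetical example.
import sqlite3

DDL = """
CREATE TABLE IF NOT EXISTS orders (
    order_id  INTEGER PRIMARY KEY,
    customer  TEXT NOT NULL,
    status    TEXT NOT NULL DEFAULT 'NEW',
    amount    REAL CHECK (amount >= 0)
);
"""

DML = [
    "INSERT INTO orders (order_id, customer, amount) VALUES (1, 'Acme', 250.0)",
    "UPDATE orders SET status = 'SHIPPED' WHERE order_id = 1",
]

with sqlite3.connect("warehouse.db") as conn:
    conn.executescript(DDL)         # apply the DDL in one pass
    for statement in DML:
        conn.execute(statement)     # apply the DML statements in order
```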
Environment: E/R Diagrams, SQL, PL/SQL, OLAP, OLTP, Metadata
Confidential
Data Analyst
Responsibilities:
- Worked with Data Analysts to understand Business logic and User Requirements.
- Worked closely with cross-functional data warehouse team members to import data into SQL Server and connected to SQL Server to prepare spreadsheets.
- Created reports for the Data Analysis using SQL Server Reporting Services.
- Created VLOOKUP functions in MS Excel for searching data in large spreadsheets.
- Created SQL queries to simplify migration progress reports and analyses.
- Developed Stored Procedures in SQL Server to consolidate common DML transactions such as insert, update and delete from the database.
- Used SQL Server and MS Excel on a daily basis to manipulate data for business intelligence reporting needs.
- Validated data to check for proper conversion; performed data cleansing to identify unnecessary data, and data profiling for accuracy, completeness, and consistency.
- Used Python and Tableau to analyze the number of products per customer and sales in a category for sales optimization (see the sketch after this list).
- Designed data reports in Excel, for easy sharing, and used SSRS for report deliverables to aid in statistical data analysis and decision making.
- Created reports from OLAP sources, including subreports, bar charts, and matrix reports, using SSRS.
- Used Excel and PowerPoint on various projects as needed for presentations and summarization of data to provide insight on key business decisions.
- Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
- Involved in extensive data validation by writing several complex SQL queries; involved in back-end testing and worked through data quality issues.
- Collected, analyzed, and interpreted complex data for reporting and/or performance trend analysis.
- Performed data manipulation using MS Excel pivot tables and produced various charts for creating mock reports.
- Responsible for analyzing business requirements and developing reports using PowerPoint and Excel to provide data analysis solutions to business clients.
- Designed and developed weekly and monthly reports using MS Excel techniques (charts, graphs, pivot tables) and PowerPoint presentations.
- Performed Data analysis and Data profiling using complex SQL on various sources systems.
- Provided on-demand ad-hoc reports used to assist in long and short-term budgeting and planning.
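The following is a minimal sketch of the Python analysis feeding the Tableau work above, assuming pandas; the file name and column names (customer, category, product, amount) are hypothetical placeholders:

```python
# Minimal sketch: compute products per customer and sales per category
# from a sales extract, as inputs to Tableau reporting. Assumes pandas;
# the file name and column names are hypothetical placeholders.
import pandas as pd

sales = pd.read_csv("sales_extract.csv")       # hypothetical extract

# Number of distinct products purchased by each customer.
products_per_customer = sales.groupby("customer")["product"].nunique()

# Total sales amount within each product category.
sales_by_category = sales.groupby("category")["amount"].sum()

# Write the aggregates out for Tableau to pick up.
products_per_customer.to_csv("products_per_customer.csv")
sales_by_category.to_csv("sales_by_category.csv")
```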
Environment: SQL Server, SSRS, Business Intelligence, MS Excel 2010, OLAP, OLTP, Tableau.