
Sr. Big Data Engineer Resume


Shelton, CT

SUMMARY:

  • 8+ years of experience as a Sr. Big Data Engineer, with skills in analysis, design, development, testing, and deployment of various software applications.
  • Extensive experience in technical consulting and end-to-end delivery, covering data modeling, data governance, and the design, development, and implementation of solutions.
  • Proficient in developing Entity-Relationship diagrams and Star/Snowflake schema designs.
  • Skilled in writing Hive join queries to fetch information from multiple tables and in writing multiple jobs to collect output from Hive.
  • Excellent experience in system analysis, ER/dimensional modeling, data design, and implementing RDBMS-specific features.
  • Hands on experience in importing, cleaning, transforming, and validating data and making conclusions from the data for decision-making purposes.
  • Strong expertise in Amazon AWS services including EC2, DynamoDB, S3, and Kinesis.
  • Experience with data modeling and design of both OLTP and OLAP systems.
  • Experienced in working with Big Data ecosystem components such as HBase, Sqoop, Zookeeper, Oozie, Hive, and Pig on the Cloudera Hadoop distribution.
  • Experience in Performance tuning of Informatica (sources, mappings, targets and sessions) and tuning the SQL queries.
  • In-depth knowledge of the software development life cycle (SDLC), including Waterfall, Iterative and Incremental, RUP, evolutionary prototyping, and Agile/Scrum methodologies.
  • Good knowledge in implementing various data processing techniques using Apache HBase for handling the data and formatting it as required.
  • Hands on experience in configuring and working with Flume to load the data from multiple sources directly into HDFS.
  • Proficient in creating dashboards/reports using reporting tools such as Tableau and QlikView.
  • Involved in data acquisition, data pre-processing, and data exploration for a telecommunication project in Scala.
  • Strong experience in migrating data warehouses and databases into Hadoop/NoSQL platforms.
  • Implemented a distributed messaging queue integrated with Cassandra using Apache Kafka and Zookeeper.
  • Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
  • Experience in writing Storm topology to accept the events from Kafka producer and emit into Cassandra DB.
  • Skilled in performing data validation and transformation using Python and Hadoop streaming (see the mapper sketch after this list).
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Experience in Dimensional Data Modeling, Star/Snowflake schema, FACT & Dimension tables.
  • Excellent experience in installing and running various Oozie workflows and automating parallel job executions.
  • Experience working with Microsoft Server tools like SSAS, SSIS and in generating on-demand scheduled reports using SQL Server Reporting Services (SSRS).
  • Well experienced in normalization and de-normalization techniques for optimum performance in relational and dimensional database environments.
  • Responsible for providing primary leadership in designing, leading, and directing all aspects of UAT testing for the Oracle data warehouse.
  • Excellent understanding of Microsoft BI toolset including Excel, Power BI, SQL Server Analysis Services, Visio, Access.
  • Experienced in working with Apache NiFi as an ETL tool for batch and real-time processing.
  • Extensive experience using ER modeling tools such as Erwin and ER/Studio, as well as Teradata and BTEQ.
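
A minimal sketch of the kind of Python mapper used with Hadoop streaming for data validation, as referenced above; the delimiter, field count, and validation rule are illustrative assumptions rather than details from a specific engagement.

```python
#!/usr/bin/env python
"""Hadoop streaming mapper: validate delimited records and emit clean key/value pairs.

Illustrative invocation:
  hadoop jar hadoop-streaming.jar -input /data/raw -output /data/clean \
      -mapper validate_mapper.py -reducer aggregate_reducer.py
"""
import sys

EXPECTED_FIELDS = 5  # assumed record width for this sketch


def main():
    for line in sys.stdin:
        fields = line.rstrip("\n").split("|")
        # Drop malformed records and records with an empty key column.
        if len(fields) != EXPECTED_FIELDS or not fields[0]:
            continue
        # Emit key<TAB>value so the framework can sort and group by key.
        sys.stdout.write(fields[0] + "\t" + "|".join(fields[1:]) + "\n")


if __name__ == "__main__":
    main()
```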

TECHNICAL SKILLS:

Big Data & Hadoop Ecosystem: Hadoop 3.0, HBase 1.2, Hive 2.3, Pig 0.17, Apache Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, StreamSets, Neo4j.

Data Modeling Tools: Erwin R9.7, Rational System Architect, IBM InfoSphere Data Architect, ER/Studio v16 and Oracle 12c.

Databases: Oracle 12c, DB2, SQL Server.

RDBMS: Microsoft SQL Server 2017, Teradata 15.0, Oracle 12c, and MS Access

BI Tools: Tableau 10, Tableau server 10, Tableau Reader 10, SAP Business Objects, Crystal Reports.

Project Execution Methodologies: Agile, Ralph Kimball and Bill Inmon's data warehousing methodologies, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD)

Packages: Microsoft Office 2019, Microsoft Project, SAP, Microsoft Visio 2019, and SharePoint Portal Server

IDEs: Eclipse, RAD, WSAD, NetBeans.

Operating Systems: Microsoft Windows 7/8 and 10, UNIX, and Linux.

Version Control Tools: VSS, SVN, CVS.

PROFESSIONAL EXPERIENCE:

Confidential - Shelton, CT

Sr. Big Data Engineer

Responsibilities:

  • As a Sr. Big Data Engineer, responsible for developing, troubleshooting, and implementing programs.
  • Developed Oozie workflows to schedule and run multiple Hive and Pig jobs.
  • Installed and configured Big Data ecosystem components such as HBase, Flume, Pig, and Sqoop.
  • Primarily involved in the data migration process using Azure, integrating with a Bitbucket repository.
  • Participated in daily Scrum meetings and gave daily status reports.
  • Involved in all the phases of the Software Development Life Cycle (SDLC) methodologies such as Agile-SCRUM.
  • Designed and implemented scalable Cloud Data and Analytical architecture solutions for various public and private cloud platforms using Azure.
  • Extensively used Pig for data cleansing using Pig scripts and Embedded Pig scripts.
  • Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using the CDH4 distribution.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard (a query sketch follows this list).
  • Worked with NoSQL databases like HBase in creating tables to load large sets of semi structured data coming from source systems.
  • Created and maintained the metadata (data dictionary) for the data models.
  • Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
  • Worked on Cassandra for retrieving data from Cassandra clusters to run queries.
  • Worked on Apache NiFi as an ETL tool for batch processing and real-time processing.
  • Involved in reverse engineering the existing data model to understand the data flow and business flow.
  • Performed data profiling and analysis, applied various data cleansing rules, defined data standards, and designed the relational models.
  • Involved in the creation of Microsoft Azure Cloud SQL Servers and Replication Servers.
  • Manipulated, cleansed, and processed data using Excel and SQL; responsible for loading, extracting, and validating client data.
  • Implemented monitoring and established best practices around usage of Elasticsearch.
  • Utilized Integration Services (SSIS) to produce a Data Mapping and Data Mart for reporting.
  • Extracted and loaded data into Data Lake environment (MS Azure) by using Sqoop which was accessed by business users.
  • Worked with MDM systems team with respect to technical aspects and generating reports.
  • Designed and Developed Oracle PL/SQL and Shell Scripts, Data Import/Export, Data Conversions and Data Cleansing.
  • Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard.
  • Developed the code for importing and exporting data into HDFS and Hive using Sqoop (a driver sketch follows the environment line below).
  • Implemented highly scalable and reliable distributed data design using NoSQL/Cassandra technology.
  • Extracted files from Cassandra through Sqoop and placed in HDFS for further processing.
  • Created Hive tables on top of the loaded data and wrote Hive queries for ad-hoc analysis.
  • Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
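
A minimal sketch of the kind of Hive query used to compute dashboard metrics from a partitioned table, assuming the PyHive client; the HiveServer2 endpoint, database, table, and column names are illustrative placeholders.

```python
from pyhive import hive  # assumes the PyHive package is available

# Placeholder HiveServer2 connection details.
conn = hive.Connection(host="hiveserver2.example.com", port=10000,
                       username="etl_user", database="analytics")
cur = conn.cursor()

# Restricting on the partition column (event_date) lets Hive prune partitions,
# so only one day's data is scanned for the dashboard metric.
cur.execute("""
    SELECT event_type, COUNT(*) AS event_count
    FROM events
    WHERE event_date = '2018-06-01'
    GROUP BY event_type
""")
for event_type, event_count in cur.fetchall():
    print(event_type, event_count)
```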

Environment: Azure, Pig 0.17, Hadoop 3.0, Agile, HBase 1.2, Flume 1.8, Sqoop 1.4, Data Migration, MDM, Oracle 12c, PL/SQL, Hive 2.3, HDFS, NoSQL, SQL, Elasticsearch, Cassandra 3.0, Apache NiFi 1.6, ETL, MongoDB
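
A hedged sketch of the Sqoop-based import mentioned above, wrapped in a small Python driver; the JDBC URL, credentials path, and table names are placeholders rather than values from the engagement.

```python
import subprocess

# Placeholder connection details; in practice these would come from a secured config.
sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:oracle:thin:@dbhost.example.com:1521/ORCL",
    "--username", "etl_user",
    "--password-file", "/user/etl_user/.sqoop.pwd",
    "--table", "CUSTOMER_ORDERS",
    "--hive-import",                     # load directly into a Hive table
    "--hive-table", "staging.customer_orders",
    "--num-mappers", "4",                # parallel import tasks
]
subprocess.run(sqoop_cmd, check=True)
```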

Confidential - Arlington, VA

Sr. Data Engineer

Responsibilities:

  • As a Sr. Data Engineer, responsible for building scalable distributed data solutions using Hadoop.
  • Involved in all phases of SDLC and participated in daily scrum meetings with cross teams.
  • Used Agile Methodology of Data Warehouse development using Kanbanize.
  • Installed and configured Hive, wrote Hive UDFs, and set up cluster coordination services through Zookeeper.
  • Used Sqoop to import and export data into Hadoop distributed file system for further processing.
  • Imported and exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Loaded multiple NOSQL databases including MongoDB, HBase and Cassandra.
  • Extensively worked on Shell scripts for running SAS programs in batch mode on UNIX.
  • Worked on partitioning Hive tables and running scripts parallel to reduce run time of the scripts.
  • Worked with the analysis teams and management teams and supported them based on their requirements.
  • Implemented Installation and configuration of multi-node cluster on Cloud using Amazon Web Services (AWS) on EC2.
  • Provided technical support during delivery of MDM (Master Data Management) components.
  • Implemented Partitioning, Dynamic Partitions and Buckets in Hive for increasing performance benefit and helping in organizing data in a logical fashion.
  • Implemented Python scripts to parse XML documents and load the data into databases.
  • Performed transformations, cleaning and filtering on imported data using Hive, MapReduce, and loaded final data into HDFS.
  • Worked on configuring and managing disaster recovery and backup on Cassandra Data.
  • Implemented Kafka high-level consumers to get data from Kafka partitions and move it into HDFS (see the consumer sketch after this list).
  • Wrote Hadoop jobs for analyzing data using Hive and Pig, accessing text-format files, sequence files, and Parquet files.
  • Utilized Hadoop, Hive and SQL technologies and moved data sets to production to be utilized by business teams to make business decisions.
  • Designed and Developed Real time Stream processing Application using Kafka, Scala and Hive to perform Streaming ETL and apply Machine Learning.
  • Worked closely with business analysts on requirement gathering and translated requirements into technical documentation.
  • Implemented a proof of concept by deploying the product in Amazon Web Services (AWS).
  • Pulled data from the data lake (HDFS) and massaged it with various RDD transformations (a PySpark sketch follows the environment line below).
  • Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
  • Wrote Hive join queries to fetch information from multiple tables and multiple MapReduce jobs to collect output from Hive.
  • Used Excel sheets, flat files, and CSV files to generate Tableau ad-hoc reports.
  • Involved in reports development using reporting tools like Tableau.
  • Wrote SQL scripts to run ad-hoc queries, Stored Procedures & Triggers and prepare reports to the management.
  • Designed and developed Big Data analytic solutions on a Hadoop-based platform and engaged clients in technical discussions.
  • Developed MapReduce modules for machine learning & predictive analytics in Hadoop.
  • Created and executed SQL scripts to validate, verify and compare the source data to target table data.
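
A minimal sketch of a high-level Kafka consumer that batches messages and stages them into HDFS, assuming the kafka-python client and the hdfs dfs CLI on the path; the topic, brokers, batch size, and paths are illustrative assumptions.

```python
import subprocess
import tempfile
from kafka import KafkaConsumer  # assumes the kafka-python package

BATCH_SIZE = 1000  # illustrative flush threshold

consumer = KafkaConsumer(
    "clickstream-events",                   # placeholder topic
    bootstrap_servers=["broker1:9092"],     # placeholder broker list
    group_id="hdfs-loader",
    auto_offset_reset="earliest",
)

buffer = []
for message in consumer:
    buffer.append(message.value.decode("utf-8"))
    if len(buffer) >= BATCH_SIZE:
        # Write the batch to a local temp file, then push it into HDFS.
        with tempfile.NamedTemporaryFile("w", delete=False, suffix=".txt") as tmp:
            tmp.write("\n".join(buffer) + "\n")
            local_path = tmp.name
        subprocess.run(
            ["hdfs", "dfs", "-put", local_path, "/data/raw/clickstream/"],
            check=True,
        )
        buffer = []
```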

Environment: Hadoop 3.0, Oracle 12c, Apache Hive 2.3, HDFS, AWS, Sqoop 1.4, SQL, ETL, MapReduce, Tableau, Agile, Scala, Kafka 1.1, SAS, HBase 1.2, MongoDB, Cassandra 3.0.
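
A minimal sketch of pulling data from the HDFS data lake and reshaping it with RDD transformations, as referenced above, assuming PySpark; the path and record layout are illustrative placeholders.

```python
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("hdfs-rdd-transformations")
sc = SparkContext(conf=conf)

# Placeholder data-lake path; records are assumed to be comma-delimited.
raw = sc.textFile("hdfs:///data/lake/transactions/*.csv")

curated = (
    raw.map(lambda line: line.split(","))
       .filter(lambda fields: len(fields) == 4)   # drop malformed rows
       .map(lambda f: (f[1], float(f[3])))        # (customer_id, amount)
       .reduceByKey(lambda a, b: a + b)           # total amount per customer
)

curated.saveAsTextFile("hdfs:///data/lake/curated/customer_totals")
```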

Confidential - Medford, Oregon

Data Analyst/Data Engineer

Responsibilities:

  • Worked with the analysis teams and management teams and supported them based on their requirements.
  • Generated PL/SQL scripts for data manipulation, validation and materialized views for remote instances.
  • Conducted JAD sessions with stakeholders and software development team to analyze the feasibility of needs.
  • Implemented the Big Data solution using Hadoop, Hive, and Informatica to pull/load the data into HDFS.
  • Performed Data Analysis and Data Manipulation of source data from SQL Server and other data structures to support the business organization.
  • Involved in all phases of data mining, data collection, data cleaning, developing models, validation and visualization.
  • Designed and developed Big Data analytic solutions on a Hadoop-based platform and engaged clients in technical discussions.
  • Designed and developed ETL jobs to extract data from the Salesforce replica and load it into the data mart in Redshift (a load sketch follows this list).
  • Designed both 3NF data models for ODS and OLTP systems and dimensional data models using Star and Snowflake schemas.
  • Implemented Forward engineering to create tables, views and SQL scripts and mapping documents.
  • Wrote PL/SQL statements, stored procedures, and triggers in DB2 for extracting as well as writing data.
  • Extensively used SAS procedures like means, frequency and other statistical calculations for Data validation.
  • Optimized and updated UML Models (Visio) and Relational Data Models for various applications.
  • Worked with Business Analyst during requirements gathering and business analysis to prepare high level Logical Data Models and Physical Data Models.
  • Conducted design discussions and meetings to arrive at the appropriate Data Mart design using the Kimball methodology.
  • Worked extensively on data quality (running data profiling and examining profile outcomes) and metadata management.
  • Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
  • Developed and implemented logical and physical data models using the enterprise modeling tool Erwin.
  • Reverse engineered the data model from database instances and scripts.
  • Worked on data mapping and data mediation between the source data table and target data tables using MS Excel.
  • Performed data extraction, data analysis, data manipulation and prepared various production and ad-hoc reports to support cost optimization initiatives and strategies.
  • Gathered business requirements from users and transformed and implemented them into database schemas.
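
A hedged sketch of the load step of the Salesforce-replica-to-Redshift ETL mentioned above, assuming psycopg2 and an extract already staged in S3; the cluster endpoint, IAM role, bucket, and table names are placeholders.

```python
import psycopg2  # assumes the psycopg2 driver is installed

# Placeholder Redshift connection details.
conn = psycopg2.connect(
    host="datamart.example.redshift.amazonaws.com",
    port=5439,
    dbname="salesmart",
    user="etl_user",
    password="********",
)
conn.autocommit = True
cur = conn.cursor()

# COPY is Redshift's bulk-load path: it reads the staged Salesforce extract
# from S3 in parallel across the cluster slices.
cur.execute("""
    COPY sales_mart.opportunities
    FROM 's3://example-etl-bucket/salesforce/opportunities/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    FORMAT AS CSV
    IGNOREHEADER 1;
""")
cur.close()
conn.close()
```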

Environment: Erwin 9.2, Hadoop 2.3, ETL, PL/SQL, OLTP, Apache Hive 2.1, Informatica, SAS, HDFS, NoSQL, ODS, MS Excel 2016.

Confidential - Peoria, IL

Data Analyst/Data Modeler

Responsibilities:

  • As a Sr. Data Modeler/Data Analyst, I was responsible for all data-related aspects of the project.
  • Supported business analysis and marketing campaign analytics with data mining, data processing, and investigation to answer complex business questions.
  • Designed the ER diagrams, logical model, and physical database per business requirements using Erwin.
  • Assisted project with analytical techniques including data modeling, data mining techniques, regression.
  • Developed scripts that automated DDL and DML statements used in the creation of databases, tables, constraints, and updates.
  • Performed Data Analysis on both source data and target data after transfer to Data Warehouse.
  • Developed Data mapping Transformation and Cleansing rules for the Master Data Management involving OLTP and OLAP.
  • Designed and developed the data dictionary and metadata for the models and maintained them.
  • Involved in extensive data validation using SQL queries and back-end testing.
  • Provided PL/SQL queries to developers as source queries to identify the data and supplied the logic for assignments.
  • Designed the data marts in dimensional data modeling using star and snowflake schemas.
  • Wrote SQL queries using joins, grouping, nested sub-queries, and aggregation, depending on the data needed from various relational customer databases (an illustration follows this list).
  • Analyzed and presented the gathered information in graphical format for the ease of business managers.
  • Worked on Unit Testing for three reports and created SQL Test Scripts for each report as required.
  • Performed GAP analysis to analyze the difference between the system capabilities and business requirements.
  • Used reverse engineering to connect to existing database and create graphical representation (E-R diagram).
  • Developed Data Migration and Cleansing rules for the Integration Architecture using OLTP.
  • Designed the Redshift data model and worked on Redshift performance improvements and analysis.
  • Implemented Forward engineering to create tables, views and SQL scripts and mapping documents.
  • Developed complex PL/SQL procedures and packages using views and SQL joins.
  • Performed various ad-hoc analyses by extracting data from multiple source systems and creating comprehensive reports for end users.
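
A minimal, self-contained illustration of the join/grouping/nested sub-query pattern referenced above, using an in-memory SQLite database with made-up customer and order tables; all names and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Made-up customer/order tables to demonstrate the query shape.
cur.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'EAST'), (2, 'WEST'), (3, 'EAST');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 90.0),
                              (12, 2, 400.0), (13, 3, 35.0);
""")

# Join plus grouping, with a nested sub-query that keeps only customers whose
# total spend exceeds the overall average order amount.
cur.execute("""
    SELECT c.region,
           COUNT(DISTINCT c.customer_id) AS customers,
           SUM(o.amount) AS revenue
    FROM customers c
    JOIN orders o ON o.customer_id = c.customer_id
    WHERE c.customer_id IN (
        SELECT customer_id FROM orders
        GROUP BY customer_id
        HAVING SUM(amount) > (SELECT AVG(amount) FROM orders)
    )
    GROUP BY c.region
""")
print(cur.fetchall())  # e.g. [('EAST', 1, 340.0), ('WEST', 1, 400.0)]
```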

Environment: E/R Diagrams, SQL, PL/SQL, OLAP, OLTP, Metadata

Confidential

Data Analyst

Responsibilities:

  • Worked with Data Analysts to understand Business logic and User Requirements.
  • Closely worked with cross functional Data warehouse members to import data into SQL Server and connected to SQL Server to prepare spreadsheets.
  • Created reports for the Data Analysis using SQL Server Reporting Services.
  • Created VLOOKUP functions in MS Excel for searching data in large spreadsheets.
  • Created SQL queries to simplify migration progress reports and analyses.
  • Developed Stored Procedures in SQL Server to consolidate common DML transactions such as insert, update and delete from the database.
  • Used SQL Server and MS Excel on a daily basis to manipulate the data for business intelligence reporting needs.
  • Validated data to check for proper conversion; performed data cleansing to identify and remove unnecessary data, and data profiling for accuracy, completeness, and consistency.
  • Used Python and Tableau to analyze the number of products per customer and sales by category for sales optimization (a pandas sketch follows this list).
  • Designed data reports in Excel, for easy sharing, and used SSRS for report deliverables to aid in statistical data analysis and decision making.
  • Created reports from OLAP sources, including sub-reports, bar charts, and matrix reports, using SSRS.
  • Used Excel and PowerPoint on various projects as needed for presentations and summarization of data to provide insight on key business decisions.
  • Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
  • Involved in extensive data validation by writing several complex SQL queries and Involved in back-end testing and worked with data quality issues.
  • Collected, analyzed, and interpreted complex data for reporting and/or performance trend analysis.
  • Performed Data Manipulation using MS Excel Pivot Sheets and produced various charts for creating the mock reports.
  • Responsible for analyzing business requirements and developing Reports using PowerPoint, Excel to provide data analysis solutions to business clients.
  • Designed and developed weekly and monthly reports using MS Excel techniques (charts, graphs, pivot tables) and PowerPoint presentations.
  • Performed Data analysis and Data profiling using complex SQL on various sources systems.
  • Provided on-demand ad-hoc reports used to assist in long and short-term budgeting and planning.
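
A minimal sketch of the products-per-customer and sales-by-category analysis referenced above, assuming pandas and an illustrative extract of the transaction data; the column names and values are hypothetical.

```python
import pandas as pd

# Illustrative extract of the transaction data used for the analysis.
df = pd.DataFrame({
    "customer_id": [101, 101, 102, 102, 103],
    "product_id":  ["A", "B", "A", "C", "B"],
    "category":    ["Toys", "Books", "Toys", "Games", "Books"],
    "sales":       [20.0, 15.0, 20.0, 40.0, 15.0],
})

# Distinct products purchased per customer.
products_per_customer = df.groupby("customer_id")["product_id"].nunique()

# Total sales by category, sorted for the report.
sales_by_category = df.groupby("category")["sales"].sum().sort_values(ascending=False)

print(products_per_customer)
print(sales_by_category)
```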

Environment: SQL Server, SSRS, Business Intelligence, MS Excel 2010, OLAP, OLTP, Tableau.
