
Sr. Big Data Engineer Resume


Redmond, WA

SUMMARY:

  • 8+ years of experience as a Big Data Engineer/Data Engineer and Data Analyst, including designing, developing, and implementing data models for enterprise-level applications and systems.
  • Strong knowledge of Software Development Life Cycle (SDLC) and expertise in detailed design documentation.
  • Experience in cloud development and architecture on Amazon AWS and Redshift, with basic experience on Azure.
  • Working experience with Hortonworks, Cloudera, and Amazon AWS distributions in CentOS and RHEL Linux environments.
  • Proficient in gathering business requirements and handling requirements management.
  • Experience with the Big Data Hadoop ecosystem for ingestion, storage, querying, processing, and analysis of big data.
  • Extensive experience using ER modeling tools such as Erwin and ER/Studio, along with Teradata and MDM.
  • Excellent Knowledge of Ralph Kimball and Bill Inmon's approaches to Data Warehousing.
  • Experience in analyzing data using Big Data Ecosystem including HDFS, Hive, HBase, Zookeeper, PIG, Sqoop, and Flume.
  • Strong knowledge of Data Warehouse concepts including Star Schema, Snowflake Schema, and Fact and Dimension tables.
  • Experience in migrating data using Sqoop between HDFS/Hive and relational database systems.
  • Experience in developing MapReduce jobs for data cleaning and data manipulation as required by the business.
  • Hands-on experience writing ad-hoc queries for moving data from HDFS to Hive and analyzing the data using HiveQL.
  • Expertise in Data Analysis, Data Validation, Data Cleansing, Data Verification, and identifying data mismatches.
  • Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
  • Good experience working with analysis tools like Tableau for regression analysis, pie charts, and bar graphs.
  • Hands-on experience with Normalization (1NF, 2NF, 3NF, and BCNF) and De-normalization techniques for effective and optimal performance in OLTP and OLAP environments.
  • Good knowledge of using Apache NiFi to automate data movement between different Hadoop systems.
  • Extensive experience using ETL & Reporting tools such as SQL Server Integration Services (SSIS) and SQL Server Reporting Services (SSRS).
  • Strong experience and knowledge of NoSQL databases such as MongoDB and Cassandra.
  • Experience in Teradata SQL queries, Teradata Indexes, and utilities such as MultiLoad, TPump, FastLoad, and FastExport.
  • Extensive experience in performing ETL on structured, semi-structured data using Pig Latin Scripts.
  • Experience in development and support of Oracle SQL, PL/SQL, and T-SQL queries.
  • Extensive experience using Excel pivot tables to run and analyze result data sets, along with UNIX scripting.
  • Experience with Object Oriented Analysis and Design (OOAD) using UML, Rational Unified Process (RUP), Rational Rose and MS Visio.
  • Good Understanding and experience in Data Mining Techniques like Classification, Clustering, Regression and Optimization.
  • An excellent team player and technically strong engineer, able to work with business users, project managers, team leads, architects, and peers, maintaining a healthy project environment.

TECHNICAL SKILLS:

Big Data & Hadoop Ecosystem: Hadoop 3.0, HBase 1.2, Hive 2.3, Pig 0.17, Solr 7.2, Apache Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, StreamSets, Neo4j, Cassandra 3.11

Data Modeling Tools: Erwin R9.7, Rational System Architect, IBM InfoSphere Data Architect, ER/Studio v16, and Oracle 12c

Databases: Oracle 12c, DB2, SQL Server.

RDBMS: Confidential SQL Server 2017, Teradata 15.0, Oracle 12c, and MS Access

BI Tools: Tableau 10, Tableau server 10, Tableau Reader 10, SAP Business Objects, Crystal Reports

Project Execution Methodologies: Agile, Ralph Kimball and Bill Inmon data warehousing methodologies, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD)

Packages: Confidential Office 2019, Confidential Project, SAP, Confidential Visio 2019, and SharePoint Portal Server

Version Tool: VSS, SVN, CVS.

Operating Systems: Confidential Windows 7/8 and 10, UNIX, and Linux.

PROFESSIONAL EXPERIENCE:

Confidential, Redmond, WA

Sr. Big Data Engineer

Responsibilities:

  • As a Big Data Engineer, worked in Agile Scrum methodology to help manage and organize a team of 4 developers with regular code review sessions.
  • Participated in code reviews, enhancement discussions, maintenance of existing pipelines and systems, testing, and bug-fix activities on an ongoing basis.
  • Worked closely with business analysts to convert Business Requirements into Technical Requirements and prepared low- and high-level documentation.
  • Interacted with ETL Team to understand Ingestion of data from ETL to Azure Data Lake to develop Predictive analytics.
  • Built a prototype Azure Data Lake application that accesses 3rd party data services via Web Services.
  • Integrated Apache Storm with Kafka to perform web analytics and to move clickstream data from Kafka to HDFS.
  • Created various documents such as the Source-to-Target Data Mapping document, Unit Test Cases, and the Data Migration document.
  • Imported data from structured data source into HDFS using Sqoop incremental imports.
  • Created Hive tables, partitions and implemented incremental imports to perform ad-hoc queries on structured data.
  • Worked with Azure ExpressRoute to create private connections between Azure datacenters and on-premises and co-location infrastructure.
  • Improved the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, and Spark on YARN.
  • Built a Data Sync job on Windows Azure to synchronize data from SQL Server 2012 databases to SQL Azure.
  • Developed SQL scripts using Spark for handling different data sets and verified their performance against MapReduce jobs.
  • Converted MapReduce programs into Spark transformations using Spark RDDs in Scala and Python (a minimal PySpark sketch follows this list).
  • Supported MapReduce programs running on the cluster and wrote MapReduce jobs using the Java API.
  • Wrote complex SQL and PL/SQL queries for stored procedures.
  • Used Cloudera Manager for installation and management of Hadoop Cluster.
  • Developed data pipelines using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Involved in converting HiveQL into Spark transformations using Spark RDD and through Scala programming.
  • Integrated Kafka with Spark Streaming for high throughput and reliability.
  • Tuned Hive and Pig scripts to improve performance and resolved performance issues in both.
  • Created Azure Event Hubs for application instrumentation and for user experience and workflow processing.
  • Implemented Security in Web Applications using Azure and Deployed Web Applications to Azure.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
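
The following is a minimal PySpark sketch of the kind of MapReduce-to-Spark conversion described above. The HDFS paths and the tab-separated record layout are illustrative assumptions, not values from the actual project.

```python
# Minimal PySpark sketch: a MapReduce-style aggregation rewritten as RDD
# transformations. Paths and record layout are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("clickstream-counts").getOrCreate()
sc = spark.sparkContext

# Raw clickstream records landed in HDFS (assumed layout: user_id \t page \t timestamp).
lines = sc.textFile("hdfs:///data/clickstream/raw/")  # hypothetical path

page_counts = (
    lines.map(lambda line: line.split("\t"))
         .filter(lambda fields: len(fields) == 3)   # drop malformed rows
         .map(lambda fields: (fields[1], 1))        # the "map" step: (page, 1)
         .reduceByKey(lambda a, b: a + b)           # the "reduce" step
)

page_counts.saveAsTextFile("hdfs:///data/clickstream/page_counts/")  # hypothetical path
spark.stop()
```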

Environment: Agile, Hive, MS SQL Server 2012, Sqoop, Azure Data Lake, Storm, Kafka, HDFS, AWS, Data Mapping, Hadoop, YARN, MapReduce, RDBMS, Data Lake, Python, Scala, DynamoDB, Flume, Pig

Confidential, Research Triangle Park, NC

Sr. Big Data Engineer

Responsibilities:

  • As a Sr. Big Data Engineer, provided technical expertise and aptitude in Hadoop technologies as they relate to the development of analytics.
  • Involved in all phases of SDLC using Agile and participated in daily scrum meetings with cross teams.
  • Used Spark and Hive to implement the transformations needed to join daily ingested data to historic data.
  • Built workflows and sub-workflows for calling Sqoop, Spark, and Impala tasklets.
  • Implemented the automated workflows for all the jobs using Oozie.
  • Created data integration and technical solutions for Azure Data Lake for providing analytics and reports for improving marketing strategies.
  • Conducted JAD sessions with management, vendors, users and other stakeholders for open and pending issues to develop specifications.
  • Worked with Python to develop analytical jobs using the PySpark API of Spark (see the sketch after this list).
  • Loaded data into Hive Tables from Hadoop Distributed File System (HDFS) to provide SQL-like access on Hadoop data.
  • Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
  • Developed OLTP system by designing Logical and eventually Physical Data Model from the Conceptual Data Model.
  • Used Erwin tool to develop a Conceptual Model based on business requirements analysis.
  • Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
  • Designed the data marts using the Ralph Kimball's Dimensional Data Mart modeling methodology using Erwin.
  • Worked with MDM systems team with respect to technical aspects and generating reports.
  • Used Apache Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Build and maintain SQL scripts, Indexes, and complex queries for data analysis and extraction.
  • Worked on normalization techniques, normalized the data into 3rd Normal Form (3NF).
  • Configured Azure SQL database with Azure storage Explorer and with SQL server.
  • Created FastExport, MultiLoad, TPump, and FastLoad scripts for extracting data from various production systems.
  • Worked on Normalization and De-Normalization techniques for both OLTP and OLAP systems.
  • Implemented Forward Engineering by using DDL scripts and generating indexing strategies to develop the logical data model using Erwin.
  • Optimized and updated UML Models (Visio) and Relational Data Models for various applications.
  • Implemented Kafka High level consumers to get data from Kafka partitions and move into HDFS.
  • Developed numerous MapReduce jobs in Scala for Data Cleansing and Analyzing Data in Impala.
  • Rendered and delivered reports in desired formats by using reporting tools such as Tableau.
  • Collected large amounts of log data using Apache Flume and aggregated it using Pig/Hive in HDFS for further analysis.
  • Worked on configuring and managing disaster recovery and backup on Cassandra Data.
  • Generated various reports using SQL Server Reporting Services (SSRS) for business analysts and the management team.
  • Created PL/SQL packages and Database Triggers and developed user procedures and prepared user manuals for the new programs.
  • Worked with NoSQL databases like HBase, creating tables to load large sets of semi-structured data coming from source systems.
  • Designed and Developed Oracle and UNIX Shell Scripts for Data Import/Export and Data Conversions.
  • Worked on forward and reverse engineering the DDL for the SQL Server, and Teradata environments.
  • Created linked services to connect to Azure Storage, on-premises SQL Server, and Azure HDInsight.
  • Designed and implemented business intelligence to support sales and operations functions to increase customer satisfaction.
  • Developed and implemented different Pig UDFs to write ad-hoc and scheduled reports as required by the Business team.
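
Below is a minimal PySpark sketch of the kind of analytical job described above: joining daily ingested data to a historic Hive table and writing a partitioned result. The database, table, and column names are illustrative assumptions.

```python
# Minimal PySpark sketch of an analytical job: join daily ingested data to a
# historic Hive table and append a partitioned result. Names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("daily-to-historic-join")
         .enableHiveSupport()
         .getOrCreate())

daily = spark.table("staging.daily_transactions")      # hypothetical table
historic = spark.table("warehouse.customer_history")   # hypothetical table

enriched = (daily.join(historic, on="customer_id", how="left")  # assumed join key
                 .withColumn("load_date", F.current_date()))

(enriched.write
         .mode("append")
         .partitionBy("load_date")
         .saveAsTable("warehouse.daily_enriched"))      # hypothetical target table

spark.stop()
```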

Environment: Hadoop 3.0, Agile, Azure, Hive 2.3, HDFS, Oozie 4.3, Pig 0.17, Sqoop 1.4, Erwin 9.7, NoSQL, MDM, MapReduce, Kafka 2.1, Scala 2.12, Apache Flume 1.8, Cassandra 3.11, PL/SQL, HBase 1.2

Confidential, Bellevue, WA

Big Data Engineer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop components.
  • Responsible for understanding business needs, requirement gathering, analyzing functional requirements.
  • Designed and developed Big Data Analytics platform that supports next-gen analytics.
  • Processed large datasets using Spark transformations and actions on RDDs.
  • Extensively performed real-time streaming jobs using Spark Streaming to analyze large amounts of data arriving from Kafka over regular window time intervals (a minimal streaming sketch follows this list).
  • Used Spark SQL to perform complex data manipulations and to work with large amounts of structured and semi-structured data stored in a cluster using DataFrames/Datasets.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs in Python.
  • Performed efficient and effective joins, tuned and optimized Spark applications for better performance.
  • Wrote many HiveQL queries and extended Hive functionality by writing custom UDFs, UDAFs, UDTFs to process large amounts of data sitting on HDFS.
  • Imported data from relational databases to HDFS/Hive, performed operations and exported the results back using Sqoop.
  • Created PL/SQL packages and Database Triggers and developed user procedures and prepared user manuals for the new programs.
  • Designed the schema, configured and deployed AWS Redshift for optimal storage and fast retrieval of data.
  • Involved in the migration of data from existing RDBMS (Oracle and SQL Server) to Hadoop using Sqoop for processing.
  • Resolved the data related issues such as assessing data quality, data consolidation, evaluating existing data sources.
  • Created data pipelines using Processor Groups and multiple processors in Apache NiFi for flat files and RDBMS sources as part of a POC.
  • Implemented Spark using Python Spark SQL for faster testing and processing of data.
  • Designed ETL data flows that extract, transform, and load data while optimizing SSIS performance.
  • Developed packages and stored procedures which form the part of the daily batch process.
  • Analyzed the Business information requirements and examined the OLAP source systems to identify the measures, dimensions and facts required for the reports.
  • Worked on Unit Testing for three reports and created SQL Test Scripts for each report as required.
  • Involved in data validation and fixing discrepancies, working in coordination with the Data Integration team.
  • Worked with OLTP to find the daily transactions, the types of transactions that occurred, and the amount of resources used.
  • Developed Pig scripts to transform the data into a structured format and automated them through Oozie coordinators.
  • Extensively used Pig for data cleansing using Pig scripts and embedded Pig scripts.
  • Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Designed and deployed scalable, highly available, and fault tolerant systems on AWS.
  • Executed change management processes surrounding new releases of SAS functionality.
  • Worked with Business Analysts to understand the user requirements, layout, and look of the interactive dashboard to be developed in Tableau.
  • Created the cubes with Star Schemas using facts and dimensions through SQL Server Analysis Services (SSAS).
  • Created Hive external tables to stage data and then moved the data from staging to main tables.
  • Designed the data marts in dimensional data modeling using star and snowflake schemas.
  • Performed data analysis of the source data coming from point of sales systems (POS) and legacy systems.
  • Developed complex stored procedures using T-SQL to generate Ad-hoc reports within SQL Server reporting services.
  • Developed and maintained sales reporting using MS Excel queries, SQL in Teradata, and MS Access.
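
The sketch below shows one way the windowed Kafka analysis described above could look with Spark Structured Streaming in Python. The broker address, topic name, and console sink are illustrative assumptions, and the spark-sql-kafka connector package is assumed to be available on the cluster.

```python
# Minimal Spark Structured Streaming sketch: windowed counts over a Kafka topic.
# Broker, topic, and sink are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-window-counts").getOrCreate()

events = (spark.readStream
               .format("kafka")
               .option("kafka.bootstrap.servers", "broker1:9092")  # hypothetical broker
               .option("subscribe", "clickstream")                 # hypothetical topic
               .load())

# Count events per 5-minute window, keyed on the Kafka message key.
counts = (events
          .withColumn("key", F.col("key").cast("string"))
          .withWatermark("timestamp", "10 minutes")
          .groupBy(F.window("timestamp", "5 minutes"), "key")
          .count())

query = (counts.writeStream
               .outputMode("update")
               .format("console")          # stand-in sink for the sketch
               .option("truncate", "false")
               .start())
query.awaitTermination()
```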

Environment: Hadoop 3.0, Agile, Sqoop 1.4, PL/SQL, AWS, RDBMS, ETL, OLAP, SQL, Pig 0.17, Oozie 4.3, Hive 2.3, T-SQL, MS Excel, MS Access

Confidential, Merrimack, NH

Data Modeler

Responsibilities:

  • Heavily involved in a Data Modeler role to review business requirements and compose source-to-target data mapping documents.
  • Participated in JAD sessions with business users, sponsors, and subject matter experts to understand the business requirement document.
  • Translated business requirements into detailed, production-level technical specifications and new features, and created conceptual models.
  • Created logical data model from the conceptual model and its conversion into the physical database design using Erwin.
  • Reverse Engineered DB2 databases and then forward engineered them to Teradata using Erwin.
  • Analyzed, retrieved and aggregated data from multiple datasets to perform data mapping.
  • Developed scripts that automated the DDL and DML statements used in the creation of databases, tables, constraints, and updates (a minimal sketch follows this list).
  • Involved in extensive data analysis on Teradata and Oracle systems, querying and writing in SQL and Toad.
  • Established the process workflow and created workflow diagrams using Confidential Visio.
  • Conducted user interviews, gathered requirements, and analyzed the requirements using Rational Rose and Requisite Pro (RUP).
  • Worked on the reporting requirements and was involved in generating reports for the data model using Crystal Reports.
  • Worked on data governance, data quality, and data lineage establishment processes.
  • Created, managed, and modified logical and physical data models using a variety of data modeling philosophies and techniques, including Inmon and Kimball.
  • Worked on Normalization and De-Normalization techniques for both OLTP and OLAP systems.
  • Documented a whole process of working with Tableau Desktop, installing Tableau Server and evaluating Business Requirements.
  • Developed SQL Queries to fetch complex data from different tables in remote databases using joins, database links and bulk collects.
  • Involved in extracting, cleansing, transforming, integrating, and loading data into different Data Marts using DataStage Designer.
  • Performed forward engineering of data models, reverse engineering of existing data models, and updates to the data models.
  • Facilitated in developing testing procedures, test cases and User Acceptance Testing (UAT).
  • Dealt with different data sources ranging from flat files, Excel, Oracle, and SQL Server.
  • Used external loaders such as MultiLoad, TPump, and FastLoad to load data into the Teradata database across analysis, development, testing, implementation, and deployment.
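
As a rough illustration of the DDL-automation scripts mentioned above, the Python sketch below renders CREATE TABLE statements from a simple column specification. The specification format and naming conventions are hypothetical.

```python
# Minimal sketch of a DDL-generation script. The table specification and
# naming conventions are illustrative assumptions.
TABLE_SPEC = {
    "customer_dim": [
        ("customer_id", "INTEGER NOT NULL"),
        ("customer_name", "VARCHAR(200)"),
        ("effective_date", "DATE"),
    ],
}

def build_create_table(table_name, columns):
    """Render a CREATE TABLE statement with a primary key on the first column."""
    col_lines = [f"    {name} {datatype}" for name, datatype in columns]
    col_lines.append(f"    PRIMARY KEY ({columns[0][0]})")
    return f"CREATE TABLE {table_name} (\n" + ",\n".join(col_lines) + "\n);"

if __name__ == "__main__":
    for table, cols in TABLE_SPEC.items():
        print(build_create_table(table, cols))
```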

Environment: DB2, Teradata r14, SQL, Toad, Oracle 11g, Confidential Visio 2016, Rational Rose

Confidential, Malvern, PA

Data Analyst/Data Modeler

Responsibilities:

  • Worked as a Data Analyst/Data Modeler to understand business logic and user requirements.
  • Conducted data modeling JAD sessions and communicated data-related standards.
  • Closely worked with cross functional Data warehouse members to import data into SQL Server and connected to SQL Server to prepare spreadsheets.
  • Designed and implemented basic PL/SQL queries for testing and report/data validation.
  • Developed stored procedures, SQL joins, and SQL queries for data retrieval and analysis.
  • Used forward engineering approach for designing and creating databases for OLAP model.
  • Used SQL Server Integration Services (SSIS) for extraction, transformation, and loading of data into the target system from multiple sources.
  • Developed optimized stored procedures, T-SQL queries, User Defined Functions (UDF), Cursors, Views and Triggers, SQL Joins and other statements for reporting.
  • Conducted data mining and data modeling in coordination with finance manager.
  • Involved in logical and physical design and transformed logical models into physical implementations for Oracle and Teradata.
  • Used advanced features of T-SQL to design and tune T-SQL that interfaces with the database.
  • Applied conditional formatting in SSRS to highlight key areas in the report data.
  • Developed the required data warehouse model using Star schema for the generalized model.
  • Enforced referential integrity in the OLTP data model for consistent relationship between tables and efficient database design.
  • Used E/R Studio for effective model management of sharing, dividing and reusing model information and design for productivity improvement.
  • Created Use Case Diagrams, Activity Diagrams, Sequence Diagrams in Rational Rose.
  • Designed the data marts in dimensional data modeling using snowflake schemas.
  • Created Project Plan documents, Software Requirement Documents, Environment Configuration and UML diagrams.
  • Wrote Python scripts to parse XML documents and load the data into the database (a minimal sketch follows this list).
  • Gathered requirements and performed data mapping to understand the key information by creating tables.
  • Performed Gap Analysis on existing data models and helped in controlling the gaps identified.
  • Involved in the creation, maintenance of Data Warehouse and repositories containing Metadata.
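
The following is a minimal sketch of a Python script that parses an XML document and loads the records into a database, as mentioned above. The XML element names are assumed, and sqlite3 stands in for the actual target database.

```python
# Minimal sketch: parse an XML document and load records into a database.
# Element names are assumed; sqlite3 stands in for the real target database.
import sqlite3
import xml.etree.ElementTree as ET

def load_orders(xml_path, db_path="orders.db"):
    tree = ET.parse(xml_path)
    rows = [
        (order.findtext("order_id"), order.findtext("customer"), order.findtext("amount"))
        for order in tree.getroot().iter("order")   # hypothetical element name
    ]
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS orders (order_id TEXT, customer TEXT, amount TEXT)"
        )
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

if __name__ == "__main__":
    load_orders("orders.xml")   # hypothetical input file
```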

Environment: SQL, PL/SQL, SSIS, T-SQL, Teradata r12, Oracle 11g, E/R Studio v15, XML, Python

Confidential, Louisville, KY

Data Analyst

Responsibilities:

  • Worked with Data Analyst for requirements gathering, business analysis and project coordination.
  • Designed and developed complex SQL scripts in the SQL Server database to create tables for Tableau reporting.
  • Established a process for data review and remediation of data quality issues.
  • Generated SQL scripts to extract relevant data and developed SSIS packages for data migration.
  • Created T-SQL statements (select, insert, update, delete) and stored procedures.
  • Worked on performance tuning of the database, including indexes and optimizing SQL statements.
  • Created complex queries to automate the data profiling process needed to define the structure of the pre-staging and staging areas (a profiling sketch follows this list).
  • Worked on generating and documenting metadata while designing OLTP and OLAP system environments.
  • Created PL/SQL procedures to support business functionality such as bidding and allocation of inventory to shippers.
  • Generated data dictionary reports for publishing on the internal site and granting access to different users.
  • Worked on data verification and validation to ensure the data generated according to the requirements is appropriate and consistent.
  • Produced various types of reports using SQL Server Reporting Services (SSRS).
  • Worked on Data Mining and data validation to ensure the accuracy of the data between the warehouse and source systems.
  • Analyzed and built proofs of concept to convert SAS reports into Tableau or to use SAS datasets in Tableau.
  • Acted as liaison between Business Intelligence and Business User groups to relay change requests.
  • Extensively worked on flat files and mainframe files and was involved in the creation of UNIX shell scripts.
  • Created data flow, process documents and ad-hoc reports to derive requirements for existing system enhancements.
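
As a rough sketch of the kind of data-profiling automation mentioned above, the Python snippet below computes row counts and per-column null counts for a staging table over ODBC. The connection string, schema, and table names are illustrative assumptions.

```python
# Minimal data-profiling sketch: row counts and per-column null counts for a
# staging table. Connection string, schema, and table names are hypothetical.
import pyodbc

CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
            "SERVER=stage-db;DATABASE=Staging;Trusted_Connection=yes")  # assumed

def profile_table(cursor, schema, table):
    cursor.execute(f"SELECT COUNT(*) FROM [{schema}].[{table}]")
    row_count = cursor.fetchone()[0]
    cursor.execute(
        "SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS "
        "WHERE TABLE_SCHEMA = ? AND TABLE_NAME = ?", schema, table)
    for (column,) in cursor.fetchall():
        cursor.execute(
            f"SELECT COUNT(*) FROM [{schema}].[{table}] WHERE [{column}] IS NULL")
        nulls = cursor.fetchone()[0]
        print(f"{schema}.{table}.{column}: rows={row_count}, nulls={nulls}")

if __name__ == "__main__":
    with pyodbc.connect(CONN_STR) as conn:
        profile_table(conn.cursor(), "stg", "customer")   # hypothetical table
```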

Environment: SQL, SSIS, T/SQL, PL/SQL, SSRS, SAS, Tableau, UNIX
