
Sr. Big Data Engineer Resume


Mt Laurel, NJ

SUMMARY:

  • Over 9 years of professional experience in Software Systems Development and Business Systems, including designing and developing solutions as a Big Data Engineer/Data Modeler/Data Analyst.
  • Good experience in all phases of the SDLC; participated in daily scrum meetings with cross-functional teams.
  • Excellent experience in designing and developing data integration and migration solutions on MS Azure.
  • Excellent understanding of and hands-on experience with AWS, including S3 and EC2.
  • Expert in building Enterprise Data Warehouses and data warehouse appliances from scratch using both the Kimball and Inmon approaches.
  • Experience importing and exporting data using stream-processing platforms like Flume and Kafka.
  • Good experience in using SSIS and SSRS to create and manage reports for an organization.
  • Proficient in designing and implementing data structures and in using common business intelligence tools for data analysis.
  • Extensive experience writing Storm topologies that accept events from Kafka producers and emit them into Cassandra.
  • Excellent working knowledge of data modeling tools like Erwin, PowerDesigner, and ER/Studio.
  • Proficient working experience with big data tools like Hadoop, Azure Data Lake, and AWS Redshift.
  • Strong experience in Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export.
  • Excellent technical and analytical skills, with a clear understanding of design goals for OLTP development and dimensional modeling for OLAP.
  • Strong experience in migrating data warehouses and databases into Hadoop/NoSQL platforms.
  • Experience designing and developing Oracle PL/SQL and shell scripts for data conversion and data cleansing.
  • Experience participating in requirements sessions to gather requirements alongside business analysts and product owners.
  • Experience designing components using UML: Use Case, Class, Sequence, Deployment, and Component diagrams for the requirements.
  • Experience implementing a log producer in Scala that watches application logs and forwards incremental log records (a brief sketch follows this list).
  • Extensive experience writing UNIX shell scripts and automating ETL processes with UNIX shell scripting.
  • Strong experience using Excel and MS Access to load data and analyze it based on business needs.
  • Experience in designing the Data Mart and creation of Cubes.
  • Experience in Data transformation, Data mapping from source to target database schemas, Data Cleansing procedures.
  • Performed extensive data profiling and analysis to detect and correct inaccurate data in the databases and to track data quality.
  • Experience in Performance Tuning and query optimization techniques in transactional and Data Warehouse Environments.
  • Experience in using SSIS in solving complex business problems.
  • Proficient in writing DDL, DML commands using SQL developer and Toad.
  • Expertise in performing User Acceptance Testing (UAT) and conducting end user training sessions.
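
The following is a minimal sketch of the kind of Scala log producer described above: it tails an application log file and publishes newly appended lines to a Kafka topic. The file path, broker address, and topic name are placeholders, and the standard Kafka producer client is assumed; the actual project implementation may have differed.

    import java.io.RandomAccessFile
    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    // Sketch of an incremental log producer (path, broker, and topic are placeholders).
    object LogProducer {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "localhost:9092")
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        val producer = new KafkaProducer[String, String](props)

        // Tail the application log and forward only newly appended lines.
        val log = new RandomAccessFile("/var/log/app/app.log", "r")
        var offset = log.length()            // start at the end: incremental records only
        while (true) {
          if (log.length() > offset) {
            log.seek(offset)
            var line = log.readLine()
            while (line != null) {
              producer.send(new ProducerRecord[String, String]("app-logs", line))
              line = log.readLine()
            }
            offset = log.getFilePointer
          }
          Thread.sleep(1000)                 // poll interval
        }
      }
    }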

TECHNICAL SKILLS:

Big Data & Hadoop Ecosystem: Hadoop 3.0, HBase 1.2, Hive 2.3, Pig 0.17, Solr 7.2, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cassandra 3.11

Data Modeling Tools: Erwin r9.7, Rational System Architect, IBM InfoSphere Data Architect, ER/Studio v16

BI Tools: Tableau 10, SAP Business Objects, Crystal Reports

Methodologies: Agile, SDLC, Ralph Kimball data warehousing methodology, Joint Application Development (JAD)

RDBMS: Microsoft SQL Server 2017, Teradata 15.0, Oracle 12c, and MS Access

Operating Systems: Microsoft Windows 7/8 and 10, UNIX, and Linux.

Packages: Microsoft Office 2019, Microsoft Project, SAP, Microsoft Visio 2019, SharePoint Portal Server

Version Tool: VSS, SVN, CVS.

OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9

Cloud Platform: AWS, Azure, Google Cloud, CloudStack/OpenStack

Programming Languages: SQL, PL/SQL, UNIX shell Scripting, PERL, AWK, SED

Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2.

ETL/Data warehouse Tools: Informatica 9.6/9.1, SAP Business Objects XIR3.1/XIR2, Talend, Tableau, and Pentaho.

PROFESSIONAL EXPERIENCE:

Confidential - Mt Laurel, NJ

Sr. Big Data Engineer

Responsibilities:

  • As a Big Data Engineer, developed Big Data analytics solutions on a Hadoop-based platform and engaged clients in technical discussions.
  • Responsible for building scalable distributed data solutions using Big Data technologies like Apache Hadoop, Shell Scripting, Hive.
  • Worked with Microsoft Azure cloud services, Storage Accounts, Azure data storage, and Azure Data Factory.
  • Used an Agile methodology for data warehouse development, tracked in Kanbanize.
  • Exported event weblogs to HDFS by creating an HDFS sink which deposits the weblogs directly in HDFS.
  • Wrote Hadoop jobs for analyzing data using Hive and Pig, accessing text, sequence, and Parquet file formats.
  • Integrated Oozie with Pig, Hive, and Sqoop and developed Oozie workflow for scheduling and orchestrating the ETL process within the Cloudera Hadoop.
  • Collaborated with other data modeling team members to ensure design consistency and integrity.
  • Worked with Sqoop to import and export data between HDFS/Hive and databases such as MySQL and Oracle.
  • Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
  • Worked on a POC to perform sentiment analysis of Twitter data using Spark Streaming.
  • Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
  • Developed complete end-to-end Big Data processing in the Hadoop ecosystem.
  • Developed customized classes for serialization and deserialization in Hadoop.
  • Worked closely with the SSIS and SSRS developers to explain the complex data transformation logic.
  • Worked closely with business analysts on requirement gathering and translating requirements into technical documentation.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Created and worked Sqoop jobs with incremental load to populate Hive External tables.
  • Worked in MongoDB and UNIX environments to clean up and group non-SQL data and create analysis reports.
  • Continuously tuned Hive UDFs and queries for faster execution by employing partitioning and bucketing.
  • Created external tables pointing to HBase to access tables with a huge number of columns.
  • Extensively used Pig for data cleansing using Pig scripts and Embedded Pig scripts.
  • Worked on Cassandra, retrieving data from Cassandra clusters to run queries.
  • Extensively used Erwin for developing data model using star schema methodologies.
  • Maintained MySQL database creation, set up users, and maintained backups of cluster metadata databases.
  • Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
  • Imported millions of structured records from relational databases using Sqoop, processed them with Spark, and stored the data in HDFS in CSV format.
  • Developed, planned, and migrated servers, relational (SQL) databases, and websites to Microsoft Azure.
  • Used Spark Streaming with Scala to receive real-time data from Kafka and store the streamed data in HDFS and in NoSQL databases such as HBase and Cassandra (a brief sketch follows this list).
  • Documented the requirements, including the available code to be implemented using Spark, Hive, HDFS, HBase, and Elasticsearch.
  • Generated multiple ad-hoc Python tools and scripts to facilitate map generation and data manipulation.
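
Below is a minimal sketch of the Kafka-to-HDFS streaming ingestion mentioned above, written in Scala with Spark Structured Streaming. The broker address, topic name, and HDFS paths are placeholders, and the exact API used on the project may have differed (for example, DStream-based Spark Streaming).

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    // Sketch: consume a Kafka topic and persist the raw events to HDFS.
    object KafkaToHdfs {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("kafka-to-hdfs")
          .getOrCreate()

        // Read the event stream from Kafka (broker and topic are placeholders).
        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "weblogs")
          .load()
          .select(col("value").cast("string").as("value"))   // keep the message payload as text

        // Persist the stream to HDFS as text files, with checkpointing for fault tolerance.
        events.writeStream
          .format("text")
          .option("path", "hdfs:///data/weblogs/raw")
          .option("checkpointLocation", "hdfs:///checkpoints/weblogs")
          .start()
          .awaitTermination()
      }
    }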

Environment: Big Data, Hadoop 3.0, Agile, Hive 2.3, HDFS, Oracle 12c, HBase 1.2, Flume 1.8, Pig 0.17, Oozie 4.3, SSIS, SSRS, SQL, PL/SQL, Cassandra 3.11, MongoDB, ETL, Sqoop

Confidential - Lowell, AR

Data Engineer

Responsibilities:

  • Worked as a Data Engineer on several Hadoop Ecosystem components like HBase, Sqoop, Oozie, Hive and Pig with Cloudera Hadoop distribution.
  • Worked on managing and reviewing Hadoop log files. Tested and reported defects in an Agile Methodology perspective.
  • Worked on migrating Pig scripts to Spark and Spark SQL to improve performance (a representative sketch follows this list).
  • Extensively involved in writing Oracle PL/SQL stored procedures, functions, and packages.
  • Loaded data from different sources (databases and files) into Hive using the Talend tool.
  • Worked with NoSQL databases like HBase, creating tables to load large sets of semi-structured data coming from source systems.
  • Interviewed business users to gather requirements and documented the requirements.
  • Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
  • Imported and exported data into HDFS and Hive using Sqoop and Flume.
  • Used Pattern matching algorithms to recognize the customer across different sources and built risk profiles for each customer using Hive and stored the results in HBase.
  • Implemented a proof of concept deploying this product in Amazon Web Services (AWS).
  • Developed and maintained stored procedures, implemented changes to database design including tables.
  • Ingested data from various sources and processed data at rest utilizing Big Data technologies such as Hadoop, HBase, and Hive.
  • Developed advanced PL/SQL packages, procedures, triggers, functions, indexes, and collections to implement business logic using SQL Navigator.
  • Worked with AWS to implement client-side encryption, as DynamoDB did not support encryption at rest at the time.
  • Provided thought leadership for the architecture and design of Big Data analytics solutions for customers; actively drove Proof of Concept (POC) and Proof of Technology (POT) evaluations to implement a Big Data solution.
  • Created integration relational 3NF models that functionally relate to other subject areas, and determined the corresponding transformation rules in the Functional Specification Document.
  • Involved in reports development using reporting tools like Tableau.
  • Loaded and transformed huge sets of structured, semi-structured, and unstructured data.
  • Developed and implemented logical and physical data models using the enterprise modeling tool Erwin.
  • Created Hive queries and tables that helped the line of business identify trends by applying strategies on historical data before promoting them to production.
  • Developed Pig scripts to parse the raw data, populate staging tables and store the refined data in partitioned DB2 tables for Business analysis.
  • Designed and developed cubes using SQL Server Analysis Services (SSAS) using Microsoft Visual Studio.
  • Performed performance tuning of OLTP and Data warehouse environments using SQL.
  • Created data structures to store the dimensions in a way that supports efficient retrieval, deletion, and insertion of data.
  • Used Hive to analyze data ingested into HBase via Hive-HBase integration and computed various metrics for reporting on the dashboard.
  • Implemented referential integrity using primary key and foreign key relationships.
  • Developed staging jobs using data from different sources such as flat files, Excel files, and Oracle databases.
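
As one illustration of the Pig-to-Spark migration mentioned above, the sketch below expresses a typical Pig LOAD / FILTER / GROUP / GENERATE chain as Spark SQL in Scala. The database, table, column, and path names are hypothetical placeholders, not the project's actual objects.

    import org.apache.spark.sql.SparkSession

    // Sketch: a Pig-style filter/group/aggregate rewritten as Spark SQL over a Hive table.
    object PigToSparkSql {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("pig-to-spark-sql")
          .enableHiveSupport()          // existing Hive tables remain the source of record
          .getOrCreate()

        // Equivalent of a Pig LOAD / FILTER / GROUP / FOREACH ... GENERATE chain.
        val dailyCounts = spark.sql(
          """SELECT event_date, event_type, COUNT(*) AS event_count
            |FROM staging.web_events
            |WHERE event_type IS NOT NULL
            |GROUP BY event_date, event_type""".stripMargin)

        // Write the refined output to a partitioned location for downstream loads.
        dailyCounts.write
          .mode("overwrite")
          .partitionBy("event_date")
          .parquet("hdfs:///data/refined/web_event_counts")
      }
    }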

Environment: HBase, Oozie 4.3, Hive 2.3, Sqoop 1.4, SDLC, OLTP, SSAS, SQL, Oracle 12c, PL/SQL, ETL, AWS, Flume

Confidential - Hartford, CT

Data Analyst/Data Engineer

Responsibilities:

  • Worked as a Sr. Data Analyst/Data Engineer to review business requirements and compose source-to-target data mapping documents.
  • Developed workflows in Oozie to automate the tasks of loading the data into HDFS and pre-processing it with Pig.
  • Designed HBase schemas based on the requirements and handled HBase data migration and validation.
  • Involved in migration of data from existing RDBMSs (Oracle and SQL Server) to Hadoop using Sqoop for processing the data.
  • Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
  • Connected to Amazon Redshift through Tableau to extract live data for real time analysis.
  • Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract the data from weblogs and store it in HDFS.
  • Imported the data from different sources like HDFS/HBase into Spark RDDs and developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Installed and configured Apache Hadoop on multiple nodes on AWS EC2.
  • Worked on moving all log files generated from various sources to HDFS for further processing.
  • Wrote Hive queries with Scala scripts to analyze data according to business requirements (a brief sketch follows this list).
  • Generated metadata and created Talend jobs and mappings to load the data warehouse and data lake.
  • Translated business requirements into working logical and physical data models for OLTP & OLAP systems.
  • Optimized the performance of queries with modification in T-SQL queries, established joins and created clustered indexes.
  • Created HBase tables to store various formats of data coming from different sources.
  • Created the system serving as a single source of truth on the Hadoop file system (HDFS), while enabling transparent data movement and access at various layers.
  • Wrote Hive queries to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
  • Developed SAS macros for data cleaning, reporting, and support of routine processing.
  • Embedded SQL queries in Excel and used Excel functions to calculate parameters such as standard deviation.
  • Performed data analysis and statistical analysis and generated reports and listings using SAS/SQL, SAS/ACCESS, SAS/EXCEL, pivot tables, and graphs.
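
A minimal sketch of the Hive-with-Scala log analysis referenced above, using Spark SQL with Hive support. The table name, HDFS location, and regular expression are placeholders chosen for illustration; the doubled backslashes are needed because the pattern passes through the SQL parser before reaching regexp_extract.

    import org.apache.spark.sql.SparkSession

    // Sketch: expose raw weblog lines as a Hive external table and query them in tabular form.
    object WeblogAnalysis {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-with-scala")
          .enableHiveSupport()
          .getOrCreate()

        // External Hive table over the raw log files in HDFS (location is a placeholder).
        spark.sql(
          """CREATE EXTERNAL TABLE IF NOT EXISTS weblog_raw (line STRING)
            |STORED AS TEXTFILE
            |LOCATION 'hdfs:///data/weblogs/raw'""".stripMargin)

        // Parse the HTTP status code out of each line and aggregate hits per status.
        spark.sql(
          """SELECT regexp_extract(line, ' (\\d{3}) ', 1) AS status,
            |       COUNT(*) AS hits
            |FROM weblog_raw
            |GROUP BY regexp_extract(line, ' (\\d{3}) ', 1)
            |ORDER BY hits DESC""".stripMargin).show()
      }
    }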

Environment: Erwin 9.5, SAS, SQL, HBase, Scala, T-SQL, AWS, Oozie, Hive 1.9, HDFS, PL/SQL, Excel.

Confidential

Data Analyst/Data Modeler

Responsibilities:

  • Worked on translating the business requirements into detailed, production-level technical specifications.
  • Involved in regular interactions with Business Analysts and participated in data modeling JAD sessions.
  • Developed data mapping documents between Legacy, Production, and User Interface Systems.
  • Involved in the creation, maintenance of Data Warehouse and repositories containing Metadata.
  • Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms.
  • Participated in meetings & JAD sessions to gather and collect requirements from the business end users.
  • Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
  • Designed both 3NF Data models and dimensional Data models using Star and Snowflake schemas.
  • Involved in Normalization/Denormalization techniques for optimum performance in relational and dimensional database environments.
  • Developed Master data management strategies for storing reference data.
  • Worked with Data Stewards and Business analysts to gather requirements for MDM Project.
  • Designed logical, physical, relational and dimensional data models.
  • Designed the Data Marts in dimensional data modeling using star and snowflake schemas.
  • Involved in the complete SSIS life cycle: creating SSIS packages and building, deploying, and executing the packages in all environments.
  • Worked with data compliance and data governance teams to maintain data models, metadata, and data dictionaries, and to define source fields and their definitions.
  • Created Mappings, Mapplets, Sessions, and Workflows to replace the existing stored procedures.
  • Handled performance requirements for databases in OLTP and OLAP models.
  • Implemented Forward engineering to create tables, views and SQL scripts and mapping documents.
  • Created Schema objects like Indexes, Views, and Sequences, triggers, grants, roles, Snapshots.
  • Identified the entities and relationship between the entities to develop Conceptual Model using ER/Studio.
  • Developed the data warehouse model (Kimball methodology) with several data marts and conformed dimensions for the proposed model in the project.
  • Worked on attributes and relationships in the reverse engineered model to remove unwanted tables and columns.
  • Created data masking mappings to mask the sensitive data between production and test environment.

Environment: ER/Studio v14, PL/SQL, SQL, OLTP, OLAP, 3NF, SSIS

Confidential

Data Analyst

Responsibilities:

  • Worked with Data Analysts to understand Business logic and User Requirements.
  • Closely worked with cross functional Data warehouse members to import data into SQL Server and connected to SQL Server to prepare spreadsheets.
  • Created reports for the Data Analysis using SQL Server Reporting Services.
  • Created VLOOKUP functions in MS Excel for searching data in large spreadsheets.
  • Created SQL queries to simplify migration progress reports and analyses.
  • Wrote SQL queries using joins, grouping, nested sub-queries, and aggregation depending on data needed from various relational customer databases.
  • Developed Stored Procedures in SQL Server to consolidate common DML transactions such as insert, update and delete from the database.
  • Developed reporting and various dashboards across all areas of the client's business to help analyze the data.
  • Cleansed and manipulated data by sub-setting, sorting, and pivoting on need basis.
  • Used SQL Server and MS Excel on daily basis to manipulate the data for business intelligence reporting needs.
  • Developed the stored procedures as required, and user defined functions and triggers as needed using T-SQL.
  • Designed data reports in Excel, for easy sharing, and used SSRS for report deliverables to aid in statistical data analysis and decision making.
  • Created reports from OLAP sources, sub-reports, bar charts, and matrix reports using SSRS.
  • Developed ad-hoc reports with VLOOKUPs, pivot tables, and macros in Excel and recommended solutions to drive business decision making.
  • Used Excel and PowerPoint on various projects as needed for presentations and summarization of data to provide insight on key business decisions.
  • Designed Ad-hoc reports using SQL and Tableau dashboards, facilitating data driven decisions for business users.
  • Extracted data from different sources performing Data Integrity and quality checks.
  • Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
  • Involved in extensive data validation by writing several complex SQL queries and Involved in back-end testing and worked with data quality issues.
  • Collected, analyzed, and interpreted complex data for reporting and/or performance trend analysis.
  • Performed Data Manipulation using MS Excel Pivot Sheets and produced various charts for creating the mock reports.

Environment: SQL Server, MS Excel 2010, VLOOKUP, T-SQL, SSRS, SSIS, OLAP, PowerPoint
