Sr. Big Data Engineer Resume
Mt Laurel, NJ
SUMMARY:
- Overall 9+ years of professional experience in software systems development and business systems, including design and development work as a Big Data Engineer, Data Modeler, and Data Analyst.
- Good experience in all phases of the SDLC; participated in daily scrum meetings with cross-functional teams.
- Excellent experience in designing and developing data integration and migration solutions in Microsoft Azure.
- Excellent understanding of and hands-on experience with AWS, including S3 and EC2.
- Expert in building enterprise data warehouses and data warehouse appliances from scratch using both the Kimball and Inmon approaches.
- Experience in importing and exporting data using stream-processing platforms such as Flume and Kafka.
- Good experience using SSIS and SSRS to create and manage reports for an organization.
- Proficient in designing and implementing data structures and in using common business intelligence tools for data analysis.
- Extensive experience writing Storm topologies that accept events from Kafka producers and emit them into Cassandra.
- Excellent working knowledge of data modeling tools such as Erwin, PowerDesigner, and ER/Studio.
- Proficient working experience with big data tools such as Hadoop, Azure Data Lake, and Amazon Redshift.
- Strong experience in Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export.
- Excellent technical and analytical skills with a clear understanding of design goals for OLTP development and dimensional modeling for OLAP.
- Strong experience in migrating data warehouses and databases into Hadoop/NoSQL platforms.
- Designing and developing Oracle PL/SQL and shell scripts, data conversions, and data cleansing.
- Participating in requirements sessions to gather requirements along with business analysts and product owners.
- Experience in designing components using UML: Use Case, Class, Sequence, Deployment, and Component diagrams for the requirements.
- Experience implementing a log producer in Scala that watches for application logs and transforms incremental log data.
- Extensive experience writing UNIX shell scripts and automating ETL processes with UNIX shell scripting.
- Strong experience using Excel and MS Access to load data and analyze it based on business needs.
- Experience in designing data marts and creating cubes.
- Experience in Data transformation, Data mapping from source to target database schemas, Data Cleansing procedures.
- Performing extensive data profiling and analysis to detect and correct inaccurate data in databases and to track data quality.
- Experience in Performance Tuning and query optimization techniques in transactional and Data Warehouse Environments.
- Experience in using SSIS in solving complex business problems.
- Proficient in writing DDL and DML commands using SQL Developer and Toad.
- Expertise in performing User Acceptance Testing (UAT) and conducting end user training sessions.
TECHNICAL SKILLS:
Big Data & Hadoop Ecosystem: Hadoop 3.0, HBase 1.2, Hive 2.3, Pig 0.17, Solr 7.2, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cassandra 3.11
Data Modeling Tools: Erwin r9.7, Rational System Architect, IBM InfoSphere Data Architect, ER/Studio v16
BI Tools: Tableau 10, SAP Business Objects, Crystal Reports
Methodologies: Agile, SDLC, Ralph Kimball data warehousing methodology, Joint Application Development (JAD)
RDBMS: Microsoft SQL Server 2017, Teradata 15.0, Oracle 12c, and MS Access
Operating Systems: Microsoft Windows 7/8 and 10, UNIX, and Linux.
Packages: Microsoft Office 2019, Microsoft Project, SAP, Microsoft Visio 2019, SharePoint Portal Server
Version Tool: VSS, SVN, CVS.
OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9
Cloud Platform: AWS, Azure, Google Cloud, CloudStack/OpenStack
Programming Languages: SQL, PL/SQL, UNIX Shell Scripting, Perl, AWK, SED
Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2.
ETL/Data warehouse Tools: Informatica 9.6/9.1, SAP Business Objects XIR3.1/XIR2, Talend, Tableau, and Pentaho.
PROFESSIONAL EXPERIENCE:
Confidential - Mt Laurel, NJ
Sr. Big Data Engineer
Responsibilities:
- As a Big Data Engineer, developed Big Data analytics solutions on a Hadoop-based platform and engaged clients in technical discussions.
- Responsible for building scalable distributed data solutions using Big Data technologies like Apache Hadoop, Shell Scripting, Hive.
- Worked with Microsoft Azure cloud services, Storage Accounts, Azure data storage, and Azure Data Factory.
- Used an Agile methodology for data warehouse development, managed with Kanbanize.
- Exported event weblogs by creating an HDFS sink that deposits the weblogs directly in HDFS.
- Wrote Hadoop jobs to analyze data using Hive and Pig, accessing text-format files, sequence files, and Parquet files.
- Integrated Oozie with Pig, Hive, and Sqoop and developed Oozie workflow for scheduling and orchestrating the ETL process within the Cloudera Hadoop.
- Collaborated with other data modeling team members to ensure design consistency and integrity.
- Worked with Sqoop to import and export data from databases such as MySQL and Oracle into HDFS and Hive.
- Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
- Worked on a POC to perform sentiment analysis of Twitter data using Spark Streaming.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Developed complete end-to-end Big Data processing in the Hadoop ecosystem.
- Developed customized classes for serialization and deserialization in Hadoop.
- Worked closely with SSIS and SSRS developers to explain complex data transformation logic.
- Worked closely with business analysts to gather requirements and translate them into technical documentation.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Created and ran Sqoop jobs with incremental load to populate Hive external tables (a sketch of the table pattern follows this section).
- Worked in MongoDB and UNIX environments to clean up and group non-SQL data and create analysis reports.
- Continuously tuned Hive UDFs and queries for faster performance by employing partitioning and bucketing.
- Created external tables pointing to HBase to access tables with a huge number of columns.
- Extensively used Pig for data cleansing using Pig scripts and Embedded Pig scripts.
- Worked with Cassandra, retrieving data from Cassandra clusters to run queries.
- Extensively used Erwin to develop data models using star schema methodologies.
- Created and maintained MySQL databases, set up users, and maintained backups of the cluster metadata databases.
- Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs.
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
- Imported millions of rows of structured data from relational databases using Sqoop, processed them using Spark, and stored the data in HDFS in CSV format.
- Developed, planned, and migrated servers, relational (SQL) databases, and websites to Microsoft Azure.
- Used Spark Streaming with Scala to receive real-time data from Kafka and store the stream data in HDFS and in NoSQL databases such as HBase and Cassandra.
- Documented the requirements, including the available code to be implemented using Spark, Hive, HDFS, HBase, and Elasticsearch.
- Generated multiple ad-hoc Python tools and scripts to facilitate map generation and data manipulation.
Environment: Big Data, Hadoop 3.0, Agile, Hive 2.3, HDFS, Oracle 12c, HBase 1.2, Flume 1.8, Pig 0.17, Oozie 4.3, SSIS, SSRS, SQL, PL/SQL, Cassandra 3.11, MongoDB, ETL, Sqoop
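A minimal HiveQL sketch of the Sqoop-fed external table pattern referenced above (incremental loads, partitioning, bucketing). The table, column, and path names (weblogs_raw, load_date, user_id, /data/raw/weblogs) are illustrative placeholders, not the actual production schema.

    -- Illustrative only: partitioned, bucketed Hive external table over Sqoop-imported data.
    CREATE EXTERNAL TABLE IF NOT EXISTS weblogs_raw (
        event_id  STRING,
        user_id   STRING,
        event_ts  TIMESTAMP,
        url       STRING
    )
    PARTITIONED BY (load_date STRING)          -- one partition per incremental Sqoop load
    CLUSTERED BY (user_id) INTO 32 BUCKETS     -- bucketing on the join key speeds up joins and sampling
    STORED AS PARQUET
    LOCATION '/data/raw/weblogs';

    -- Register the partition written by the latest incremental import.
    ALTER TABLE weblogs_raw ADD IF NOT EXISTS PARTITION (load_date = '2018-06-01')
    LOCATION '/data/raw/weblogs/load_date=2018-06-01';

Keeping each incremental load in its own partition lets downstream Hive and Pig jobs prune to the latest slice instead of scanning the full history.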
Confidential - Lowell, AR
Data Engineer
Responsibilities:
- Worked as a Data Engineer on several Hadoop Ecosystem components like HBase, Sqoop, Oozie, Hive and Pig with Cloudera Hadoop distribution.
- Worked on managing and reviewing Hadoop log files; tested and reported defects from an Agile methodology perspective.
- Worked on migrating Pig scripts to Spark and Spark SQL to improve performance.
- Extensively involved in writing Oracle PL/SQL stored procedures, functions, and packages.
- Loaded data from different sources (databases and files) into Hive using Talend.
- Worked with NoSQL databases like HBase, creating tables to load large sets of semi-structured data coming from source systems.
- Interviewed business users to gather requirements and documented the requirements.
- Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
- Imported and exported data into HDFS and Hive using Sqoop and Flume.
- Used pattern-matching algorithms to recognize the same customer across different sources, built risk profiles for each customer using Hive, and stored the results in HBase.
- Implemented a proof of concept deploying the product in Amazon Web Services (AWS).
- Developed and maintained stored procedures, implemented changes to database design including tables.
- Ingested data from various sources and processed data-at-rest utilizing Big Data technologies such as Hadoop, HBase, and Hive.
- Developed advanced PL/SQL packages, procedures, triggers, functions, indexes, and collections to implement business logic using SQL Navigator.
- Worked with AWS to implement client-side encryption, as DynamoDB did not support encryption at rest at the time.
- Provided thought leadership for the architecture and design of Big Data analytics solutions for customers, and actively drove Proof of Concept (POC) and Proof of Technology (POT) evaluations to implement Big Data solutions.
- Created integration relational 3NF models that can functionally relate to other subject areas, and was responsible for determining the corresponding transformation rules in the Functional Specification Document.
- Involved in reports development using reporting tools like Tableau.
- Loaded and transformed huge sets of structured, semi-structured, and unstructured data.
- Developed and implemented logical and physical data models using the enterprise modeling tool Erwin.
- Created Hive queries and tables that helped line of business identify trends by applying strategies on historical data before promoting them to production.
- Developed Pig scripts to parse the raw data, populate staging tables and store the refined data in partitioned DB2 tables for Business analysis.
- Designed and developed cubes using SQL Server Analysis Services (SSAS) using Microsoft Visual Studio.
- Performed performance tuning of OLTP and Data warehouse environments using SQL.
- Created data structures to store the dimensions in a way that makes it efficient to retrieve, delete, and insert data.
- Used Hive to analyze data ingested into HBase via Hive-HBase integration and computed various metrics for dashboard reporting (a sketch of the Hive-HBase mapping follows this section).
- Implemented referential integrity using primary key and foreign key relationships.
- Developed staging jobs using data from different sources such as flat files, Excel files, and Oracle databases.
Environment: HBase, Oozie 4.3, Hive 2.3, Sqoop 1.4, SDLC, OLTP, SSAS, SQL, Oracle 12c, PL/SQL, ETL, AWS, Sqoop, Flume
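A minimal sketch of the Hive-HBase integration and risk-profile work mentioned above. All names (customer_risk_profile, the risk: column family, transactions_raw) are illustrative assumptions, not the client's schema: the Hive table is mapped onto HBase through the HBase storage handler, and an aggregate query rolls a per-customer profile into it.

    -- Illustrative only: Hive table mapped onto HBase via the HBase storage handler.
    CREATE TABLE IF NOT EXISTS customer_risk_profile (
        customer_id STRING,       -- becomes the HBase row key
        txn_count   BIGINT,
        avg_amount  DOUBLE,
        risk_band   STRING
    )
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES (
        'hbase.columns.mapping' = ':key,risk:txn_count,risk:avg_amount,risk:risk_band'
    )
    TBLPROPERTIES ('hbase.table.name' = 'customer_risk_profile');

    -- Roll up a simple risk profile per customer and persist it to HBase through Hive.
    INSERT OVERWRITE TABLE customer_risk_profile
    SELECT customer_id,
           COUNT(*)    AS txn_count,
           AVG(amount) AS avg_amount,
           CASE WHEN AVG(amount) > 10000 THEN 'HIGH'
                WHEN AVG(amount) > 1000  THEN 'MEDIUM'
                ELSE 'LOW' END AS risk_band
    FROM   transactions_raw
    GROUP  BY customer_id;

Pushing the rollup through Hive keeps the heavy aggregation in the cluster while the result stays point-readable by row key in HBase for the dashboard.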
Confidential - Hartford, CT
Data Analyst/Data Engineer
Responsibilities:
- Worked as a Sr. Data Analyst/Data Engineer to review business requirement and compose source to target data mapping documents.
- Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
- Designed the HBase schemas based on the requirements and handled HBase data migration and validation.
- Involved in migration of data from existing RDBMSs (Oracle and SQL Server) to Hadoop using Sqoop for data processing.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Connected to Amazon Redshift through Tableau to extract live data for real time analysis.
- Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Imported data from different sources such as HDFS and HBase into Spark RDDs and developed a data pipeline using Kafka and Storm to store data in HDFS.
- Installed and configured Apache Hadoop on multiple nodes in AWS EC2.
- Worked on moving all log files generated from various sources to HDFS for further processing.
- Wrote Hive queries and Scala scripts to analyze data according to business requirements.
- Generated metadata and created Talend jobs and mappings to load the data warehouse and Data Lake.
- Translated business requirements into working logical and physical data models for OLTP & OLAP systems.
- Optimized the performance of queries with modification in T-SQL queries, established joins and created clustered indexes.
- Created HBase tables to store various formats of data coming from different sources.
- Created the single source of truth on the Hadoop file system (HDFS), while enabling transparent data movement and access at various layers.
- Wrote Hive queries to parse the logs and structure them in tabular format to facilitate effective querying of the log data (a sketch follows this section).
- Developed SAS macros for data cleaning, reporting, and routine processing support.
- Embedded SQL queries in Excel and used Excel functions to calculate parameters such as standard deviation.
- Performed data analysis and statistical analysis and generated reports and listings using SAS/SQL, SAS/ACCESS, SAS/Excel, pivot tables, and graphs.
Environment: Erwin 9.5, SAS, SQL, HBase, Scala, T-SQL, AWS, Oozie, Hive 1.9, HDFS, PL/SQL, Excel.
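A minimal sketch of the log-parsing Hive queries mentioned above. The raw line layout, regular expressions, and names (weblogs_raw_lines, weblogs_parsed) are assumptions for illustration; the actual log format differed.

    -- Illustrative only: parse raw web-log lines into columns with regexp_extract.
    CREATE TABLE IF NOT EXISTS weblogs_parsed STORED AS ORC AS
    SELECT regexp_extract(line, '^(\\S+)', 1)                      AS client_ip,
           regexp_extract(line, '\\[([^\\]]+)\\]', 1)              AS request_ts,
           regexp_extract(line, '"(?:GET|POST)\\s+(\\S+)', 1)      AS request_path,
           CAST(regexp_extract(line, '\\s(\\d{3})\\s', 1) AS INT)  AS http_status
    FROM   weblogs_raw_lines;

    -- Once structured, the log data supports ordinary analytical queries, e.g. error rate per path.
    SELECT request_path,
           SUM(CASE WHEN http_status >= 500 THEN 1 ELSE 0 END) / COUNT(*) AS error_rate
    FROM   weblogs_parsed
    GROUP  BY request_path;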
Confidential
Data Analyst/Data Modeler
Responsibilities:
- Worked on translating the business requirements into detailed, production-level technical specifications.
- Involved in regular interactions with Business Analysts and participated in data modeling JAD sessions.
- Developed data mapping documents between Legacy, Production, and User Interface Systems.
- Involved in the creation and maintenance of the Data Warehouse and repositories containing metadata.
- Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms.
- Participated in meetings & JAD sessions to gather and collect requirements from the business end users.
- Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
- Designed both 3NF Data models and dimensional Data models using Star and Snowflake schemas.
- Involved in Normalization/Denormalization techniques for optimum performance in relational and dimensional database environments.
- Developed Master data management strategies for storing reference data.
- Worked with Data Stewards and Business analysts to gather requirements for MDM Project.
- Designed logical, physical, relational and dimensional data models.
- Designed the Data Marts in dimensional data modeling using star and snowflake schemas.
- Involved in the complete SSIS life cycle: creating SSIS packages and building, deploying, and executing the packages in all environments.
- Worked with data compliance and data governance teams to maintain data models, metadata, and data dictionaries, and to define source fields and their definitions.
- Created mappings, mapplets, sessions, and workflows to replace the existing stored procedures.
- Handled performance requirements for databases in OLTP and OLAP models.
- Implemented forward engineering to create tables, views, SQL scripts, and mapping documents (a DDL sketch follows this section).
- Created schema objects such as indexes, views, sequences, triggers, grants, roles, and snapshots.
- Identified the entities and relationship between the entities to develop Conceptual Model using ER/Studio.
- Developed the data warehouse model (Kimball approach) with several data marts and conformed dimensions for the proposed model in the project.
- Worked on attributes and relationships in the reverse engineered model to remove unwanted tables and columns.
- Created data masking mappings to mask the sensitive data between production and test environment.
Environment: ER/Studio v14, PL/SQL, SQL, OLTP, OLAP, 3NF, SSIS
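A minimal sketch of the kind of star-schema DDL produced when forward engineering the dimensional model, as noted above. All object names (dim_date, dim_customer, fact_sales) and column choices are generic placeholders, not the project's actual model.

    -- Illustrative only: two dimensions and one fact table from a forward-engineered model.
    CREATE TABLE dim_date (
        date_key     INT          PRIMARY KEY,   -- surrogate key, e.g. 20180601
        calendar_dt  DATE         NOT NULL,
        fiscal_month VARCHAR(10)  NOT NULL
    );

    CREATE TABLE dim_customer (
        customer_key INT          PRIMARY KEY,   -- surrogate key
        customer_id  VARCHAR(20)  NOT NULL,      -- natural key from the source system
        segment      VARCHAR(30)
    );

    CREATE TABLE fact_sales (
        date_key     INT           NOT NULL REFERENCES dim_date (date_key),
        customer_key INT           NOT NULL REFERENCES dim_customer (customer_key),
        sales_amount DECIMAL(12,2) NOT NULL,
        units_sold   INT           NOT NULL
    );

Surrogate keys on the dimensions and foreign keys on the fact table provide the referential integrity described above while keeping the fact table narrow.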
Confidential
Data Analyst
Responsibilities:
- Worked with Data Analysts to understand Business logic and User Requirements.
- Worked closely with cross-functional data warehouse team members to import data into SQL Server and connected to SQL Server to prepare spreadsheets.
- Created reports for the Data Analysis using SQL Server Reporting Services.
- Created VLOOKUP functions in MS Excel for searching data in large spreadsheets.
- Created SQL queries to simplify migration progress reports and analyses.
- Wrote SQL queries using joins, grouping, nested sub-queries, and aggregation depending on data needed from various relational customer databases.
- Developed stored procedures in SQL Server to consolidate common DML transactions such as insert, update, and delete (a T-SQL sketch follows this section).
- Developed reporting and various dashboards across all areas of the client's business to help analyze the data.
- Cleansed and manipulated data by sub-setting, sorting, and pivoting on need basis.
- Used SQL Server and MS Excel on daily basis to manipulate the data for business intelligence reporting needs.
- Developed stored procedures as required, and user-defined functions and triggers as needed, using T-SQL.
- Designed data reports in Excel, for easy sharing, and used SSRS for report deliverables to aid in statistical data analysis and decision making.
- Created OLAP-based reports, sub-reports, bar charts, and matrix reports using SSRS.
- Developed ad-hoc reports with VLOOKUPs, pivot tables, and macros in Excel, and recommended solutions to drive business decision making.
- Used Excel and PowerPoint on various projects as needed for presentations and summarization of data to provide insight on key business decisions.
- Designed Ad-hoc reports using SQL and Tableau dashboards, facilitating data driven decisions for business users.
- Extracted data from different sources performing Data Integrity and quality checks.
- Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
- Involved in extensive data validation by writing several complex SQL queries, and involved in back-end testing and working through data quality issues.
- Collected, analyzed, and interpreted complex data for reporting and/or performance trend analysis.
- Performed data manipulation using MS Excel pivot tables and produced various charts for the mock reports.
Environment: SQL Server, MS Excel 2010, VLOOKUP, T-SQL, SSRS, SSIS, OLAP, PowerPoint
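A minimal T-SQL sketch of a stored procedure consolidating insert/update DML behind a single call, as described above. The procedure, table, and column names (usp_UpsertCustomer, dbo.Customer) are placeholders, not the client's actual objects.

    -- Illustrative only: consolidate insert/update logic for one entity behind a single procedure.
    CREATE PROCEDURE dbo.usp_UpsertCustomer
        @CustomerId INT,
        @FullName   NVARCHAR(100),
        @Email      NVARCHAR(255)
    AS
    BEGIN
        SET NOCOUNT ON;

        IF EXISTS (SELECT 1 FROM dbo.Customer WHERE CustomerId = @CustomerId)
            UPDATE dbo.Customer
            SET    FullName = @FullName,
                   Email    = @Email
            WHERE  CustomerId = @CustomerId;
        ELSE
            INSERT INTO dbo.Customer (CustomerId, FullName, Email)
            VALUES (@CustomerId, @FullName, @Email);
    END;

Keeping the DML inside a procedure lets the reporting and Excel front ends call one interface instead of issuing raw insert/update statements.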