Sr. Big Data Engineer Resume
Franklin Lakes, NJ
SUMMARY:
- Over 9 years of experience as a Big Data Engineer/Data Engineer/Data Analyst, including designing, developing, and implementing data models for enterprise-level applications and systems.
- Hands-on experience in SQL queries and PL/SQL programming; created new packages and procedures and modified and tuned existing procedures and queries using TOAD.
- Good experience working with different ETL tool environments like SSIS and Informatica, and reporting tool environments like SQL Server Reporting Services (SSRS), Cognos, and Business Objects.
- Experience in Text Analytics, developing Statistical Machine Learning and Data Mining solutions to various business problems, generating data visualizations using R, SAS, and Python, and creating dashboards using tools like Tableau.
- Hands-on experience writing and optimizing SQL queries in Oracle, SQL Server, DB2, Netezza, and Teradata.
- Experienced in using distributed computing architectures such as AWS products (e.g., EC2, Redshift, EMR, Elasticsearch), Hadoop, Python, and Spark, and in effective use of MapReduce, SQL, and Cassandra to solve big data problems.
- Well versed with Data Migration, Data Conversions, and Data Extraction/Transformation/Loading (ETL).
- Hands-on experience in Normalization (1NF, 2NF, 3NF, and BCNF) and Denormalization techniques for effective and optimum performance in OLTP and OLAP environments.
- Solid knowledge of Data Marts, Operational Data Store (ODS), OLAP, and Dimensional Data Modeling with the Ralph Kimball Methodology (Star Schema and Snowflake Modeling for Fact and Dimension tables) using Analysis Services.
- Expertise in Data Modeling, Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import, and Data Export using ETL tools such as Informatica PowerCenter.
- Experience in designing, building, and implementing a complete Hadoop ecosystem comprising MapReduce, HDFS, Hive, Impala, Pig, Sqoop, Oozie, HBase, MongoDB, and Spark.
- Experience with client-server application development using Oracle PL/SQL, SQL*Plus, SQL Developer, TOAD, and SQL*Loader.
- Strong experience architecting highly performant databases using PostgreSQL, PostGIS, MySQL, and Cassandra.
- Extensive experience using ER modeling tools such as Erwin and ER/Studio, as well as Teradata, BTEQ, MLDM, and MDM.
- Experienced in R and Python for statistical computing; also experienced with Spark MLlib, MATLAB, Excel, Minitab, SPSS, and SAS.
- Extensive experience in loading and analyzing large datasets with the Hadoop framework (MapReduce, HDFS, Pig, Hive, Flume, Sqoop).
- Good experience in using SSRS and Cognos in creating and managing reports for an organization.
- Excellent experience with NoSQL databases like MongoDB and Cassandra.
- Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
- Strong experience working with databases like Teradata and proficiency in writing complex SQL and PL/SQL for creating tables, views, indexes, stored procedures, and functions.
- Experience in importing and exporting Terabytes of data between HDFS and Relational Database Systems using Sqoop.
- Good experience working with analysis tools like Tableau for regression analysis, pie charts, and bar graphs.
- Experienced in implementing a log producer in Scala that watches for application logs, transforms incremental logs, and sends them to a Kafka and ZooKeeper based log collection platform (illustrative sketch below).
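A minimal sketch of that log-producer pattern, written here in Python with the kafka-python client purely for illustration (the original implementation was in Scala); the broker address, log path, and topic name are assumptions:

    # Illustrative sketch only: tail an application log and forward new lines to Kafka.
    # Assumes the kafka-python package; broker, path, and topic names are hypothetical.
    import time
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=["broker1:9092"])  # assumed broker address

    def tail(path):
        """Follow a log file and yield new lines as they are appended (incremental logs)."""
        with open(path, "r") as f:
            f.seek(0, 2)                      # start at end of file
            while True:
                line = f.readline()
                if not line:
                    time.sleep(0.5)
                    continue
                yield line

    for record in tail("/var/log/app/application.log"):    # hypothetical log path
        producer.send("app-logs", record.encode("utf-8"))  # hypothetical topic name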
PROFESSIONAL EXPERIENCE:
Confidential - Franklin Lakes, NJ
Sr. Big Data Engineer
Responsibilities:
- As a Sr. Big Data Engineer, worked on Big Data technologies like Apache Hadoop, MapReduce, shell scripting, and Hive.
- Involved in all phases of the SDLC using Agile and participated in daily scrum meetings with cross-functional teams.
- Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
- Involved in all phases of data mining, data collection, data cleaning, developing models, validation and visualization.
- Developed the code to perform data extractions from an Oracle database and load the data into the AWS platform using AWS Data Pipeline.
- Installed and configured Hadoop ecosystem like HBase, Flume, Pig and Sqoop.
- Designed and developed Big Data analytic solutions on a Hadoop-based platform and engaged clients in technical discussions.
- Worked on Hive table creation and partitioning (see the sketch after this list).
- Installed, Configured and Maintained the Hadoop cluster for application development and Hadoop ecosystem components like Hive, Pig, HBase, Zookeeper and Sqoop.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Worked on Hive queries to categorize data of different wireless applications and security systems.
- Implemented AWS cloud computing platform using S3, RDS, DynamoDB, Redshift, and Python.
- Responsible for loading and transforming large sets of structured, semi-structured, and unstructured data.
- Extensively involved in writing PL/SQL, stored procedures, functions and packages.
- Involved in Data Architecture, Data profiling, Data analysis, data mapping and Data architecture artifacts design.
- Responsible for Big data initiatives and engagement including analysis, brainstorming, POC, and architecture.
- Implemented logical and physical relational database and maintained Database Objects in the data model using Erwin.
- Worked with NoSQL databases like HBase, creating tables to load large sets of semi-structured data coming from source systems.
- Developed numerous MapReduce jobs in Scala for data cleansing, with the cleansed data analyzed in Impala.
- Created data pipelines using processor groups and multiple processors in Apache NiFi for flat file and RDBMS sources as part of a POC on Amazon EC2.
- Managed the Metadata associated with the ETL processes used to populate the Data Warehouse.
- Created Hive queries and tables that helped line of business identify trends by applying strategies on historical data before promoting them to production.
- Designed Data Marts by following Star Schema and Snowflake Schema Methodology, using industry leading Data modeling tools like Erwin.
- Designed and developed end-to-end ETL processing from Oracle to AWS using Amazon S3, EMR, and Spark.
- Designed class and activity diagrams using Power Designer and UML tools like Visio.
- Developed a Spark Streaming application to pull data from the cloud into Hive tables.
- Wrote SQL and PL/SQL scripts to extract data from the database to meet business requirements and for testing purposes.
- Manipulated, cleansed, and processed data using Excel, Access, and SQL; responsible for loading, extracting, and validating client data.
- Created a sheet selector to accommodate multiple chart types (pie, bar, line, etc.) in a single dashboard by using parameters.
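A minimal sketch of the Hive table partitioning and Spark-based loading mentioned above, using PySpark; Hive support is assumed to be enabled, and the database, table, and column names are hypothetical:

    # Illustrative sketch only: create a partitioned Hive table and load one day of data.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-partitioning-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Partitioning by load_date keeps daily scans cheap and supports partition pruning.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS analytics.device_events (
            device_id STRING,
            event_type STRING,
            payload STRING
        )
        PARTITIONED BY (load_date STRING)
        STORED AS PARQUET
    """)

    # Allow dynamic partition inserts, then append the staged data.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    staged = spark.table("staging.device_events_raw")  # hypothetical source; columns assumed
                                                        # to match, partition column last
    staged.write.mode("append").insertInto("analytics.device_events")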
Environment: Hive 2.3, MapReduce, Hadoop 3.0, HDFS, Oracle, Spark 2.3, HBase 1.2, Flume 1.8, Pig 0.17, Sqoop 1.4, Oozie 4.3, Python, PL/SQL, NoSQL, SSIS, SSRS, Visio, AWS Redshift, Teradata, SQL, PostgreSQL, EC2, S3, Windows
Confidential - Merrimack, NH
Sr. Data Engineer
Responsibilities:
- Worked as a Big Data implementation engineer within a team of professionals.
- Used Agile methodology for data warehouse development, with work tracked in Kanbanize.
- Worked with Hadoop Ecosystem components like HBase, Sqoop, Zookeeper, Oozie, Hive and Pig with Cloudera Hadoop distribution.
- Developed data pipeline using Spark, Hive and HBase to ingest customer behavioral data and financial histories into Hadoop cluster for analysis.
- Worked with the analysis teams and management teams and supported them based on their requirements.
- Involved in extraction, transformation and loading of data directly from different source systems (flat files/Excel/Oracle/SQL) using SAS/SQL, SAS/macros.
- Generated PL/SQL scripts for data manipulation, validation and materialized views for remote instances.
- Created partitioned tables in Hive, designed a data warehouse using Hive external tables, and created Hive queries for analysis.
- Implemented installation and configuration of a multi-node cluster in the cloud using Amazon Web Services (AWS) on EC2.
- Created and modified several database objects such as Tables, Views, Indexes, Constraints, Stored procedures, Packages, Functions and Triggers using SQL and PL/SQL.
- Created large datasets by combining individual datasets using various inner and outer joins in SAS/SQL and dataset sorting and merging techniques using SAS/Base.
- Extensively worked on Shell scripts for running SAS programs in batch mode on UNIX.
- Wrote Python scripts to parse XML documents and load the data into the database (illustrative sketch after this list).
- Used Python to extract weekly information from XML files.
- Managed Hadoop jobs using Oozie workflow scheduler.
- Worked on AWS to aggregate clean files in Amazon S3 and on Amazon EC2 clusters to deploy files into S3 buckets.
- Provided technical support during delivery of MDM (Master Data Management) components.
- Developed Spark scripts by using Scala shell commands as per the requirement.
- Designed reports in SSRS to create, execute, and deliver tabular reports using shared and specified data sources; also debugged and deployed reports in SSRS.
- Optimized the performance of queries by modifying T-SQL queries, establishing joins, and creating clustered indexes.
- Used Hive, Impala and Sqoop utilities and Oozie workflows for data extraction and data loading.
- Performed File system management and monitoring on Hadoop log files.
- Used Spark API over Hadoop YARN to perform analytics on data in Hive.
- Worked with Data Governance, Data Quality and Metadata Management team to understand project.
- Used Spark SQL to process large volumes of structured data.
- Implemented optimized joins across different data sets to get top claims by state using MapReduce.
- Created HBase tables to store data coming from different sources in various formats.
- Responsible for importing log files from various sources into HDFS using Flume.
- Development of routines to capture and report data quality issues and exceptional scenarios.
- Developed dashboards in Tableau Desktop and published them to Tableau Server, making them accessible to end users.
- Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
- Performed QA on the data and added data sources, snapshots, and caching to the reports.
- Involved in troubleshooting at database levels, error handling and performance tuning of queries and procedures.
- Published Workbooks by creating user filters so that only appropriate teams can view it.
- Worked on SAS Visual Analytics & SAS Web Report Studio for data presentation and reporting.
- Extensively used SAS/Macros to parameterize the reports so that the user could choose the summary and sub-setting variables to be used from the web application.
- Responsible for translating business and data requirements into logical data models in support of enterprise data models, ODS, OLAP, OLTP, and operational data structures.
- Created SSIS packages to migrate data from heterogeneous sources such as MS Excel, flat files, and CSV files.
- Provided thought leadership for the architecture and design of Big Data Analytics solutions for customers, and actively drove Proof of Concept (POC) and Proof of Technology (POT) evaluations to implement Big Data solutions.
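A minimal sketch of the XML-parsing-and-load pattern noted above, using only the Python standard library; SQLite stands in for the actual database driver, and the file, table, and column names are hypothetical:

    # Illustrative sketch only: parse a weekly XML feed and load it into a database table.
    import sqlite3                      # stand-in for the project database driver
    import xml.etree.ElementTree as ET

    conn = sqlite3.connect("weekly.db")
    conn.execute("CREATE TABLE IF NOT EXISTS weekly_metrics (account_id TEXT, metric TEXT, value REAL)")

    tree = ET.parse("weekly_feed.xml")  # hypothetical weekly XML extract
    rows = [
        (rec.findtext("accountId"), rec.findtext("metric"), float(rec.findtext("value")))
        for rec in tree.getroot().iter("record")
    ]
    conn.executemany("INSERT INTO weekly_metrics VALUES (?, ?, ?)", rows)
    conn.commit()
    conn.close()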
Environment: SAS, SQL, Oracle 12c, PL/SQL, UNIX, XML, Python 3.7, AWS, SSRS, T-SQL, Hive 2.3, Sqoop 1.4
Confidential - St. Louis, MO
Data Analyst/Data Engineer
Responsibilities:
- Worked as a Sr. Data Analyst/Data Engineer to review business requirements and compose source-to-target data mapping documents.
- Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms.
- Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
- Responsible for the data architecture design delivery, data model development, review, approval and Data warehouse implementation.
- Designed and developed the conceptual, logical, and physical data models to meet the needs of reporting.
- Involved in designing and developing Data Models and Data Marts that support the Business Intelligence Data Warehouse.
- Implemented logical and physical relational database designs and maintained database objects in the data model using Erwin.
- Responsible for Big data initiatives and engagement including analysis, brainstorming, POC, and architecture.
- Performed Data Mapping and Data Design (Data Modeling) to integrate data across multiple databases into the EDW.
- Designed both 3NF Data models and dimensional Data models using Star and Snowflake schemas.
- Involved in Normalization/Denormalization techniques for optimum performance in relational and dimensional database environments.
- Worked on Data modeling, Advanced SQL with Columnar Databases using AWS.
- Cleansed, extracted, and analyzed business data on a daily basis and prepared ad-hoc analytical reports using Excel and T-SQL.
- Conducted meetings with business and development teams for data validation and end-to-end data mapping.
- Responsible for Metadata Management, keeping up to date centralized metadata repositories using Erwin modeling tools.
- Involved in debugging and tuning PL/SQL code and in tuning and optimizing queries for the SQL database.
- Led data migration from legacy systems into modern data integration frameworks from conception to completion.
- Generated DDL and created the tables and views in the corresponding architectural layers.
- Handled importing of data from various data sources, performed transformations using MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop (see the sketch after this list).
- Involved in performing extensive Back-End testing by writing SQL queries and PL/SQL stored procedures to extract the data from SQL Database.
- Involved in the validation of the OLAP, Unit testing and System Testing of the OLAP Report Functionality and data displayed in the reports.
- Created a high-level industry standard, generalized data model to convert it into logical and physical model at later stages of the project using Erwin and Visio.
- Participated in code/design reviews and provided input into best practices for report and universe development.
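A minimal sketch of the MySQL-to-HDFS Sqoop import referenced above, wrapped in a Python call purely for illustration; the JDBC URL, credentials file, table name, and target directory are hypothetical:

    # Illustrative sketch only: invoke a Sqoop import from MySQL into HDFS.
    import subprocess

    sqoop_cmd = [
        "sqoop", "import",
        "--connect", "jdbc:mysql://dbhost:3306/sales",  # hypothetical source database
        "--username", "etl_user",
        "--password-file", "/user/etl/.mysql.pwd",      # keeps credentials off the command line
        "--table", "orders",
        "--target-dir", "/data/raw/orders",
        "--num-mappers", "4",
    ]
    subprocess.run(sqoop_cmd, check=True)               # raises if the import fails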
Environment: Erwin 9.5, HDFS, HBase, Hadoop, Metadata, MS Visio, SQL Server 2016, SDLC, PL/SQL, ODS, OLAP, OLTP, flat files.
Confidential - Florham Park, NJ
Sr. Data Analyst/Data Modeler
Responsibilities:
- Worked with Business Analysts team in requirements gathering and in preparing functional specifications and translating them to technical specifications.
- Worked with Business users during requirements gathering and prepared Conceptual, Logical and Physical Data Models.
- Worked with supporting business analysis and marketing campaign analytics with data mining, data processing, and investigation to answer complex business questions.
- Developed scripts that automated DDL and DML statements used in creations of databases, tables, constraints, and updates.
- Planned and defined system requirements to Use Case, Use Case Scenario and Use Case Narrative using the UML (Unified Modeling Language) methodologies.
- Gathered analysis report prototypes from business analysts belonging to different business units; participated in JAD sessions involving the discussion of various reporting needs.
- Reverse engineered the existing data marts and identified the data elements (in the source systems), dimensions, facts, and measures required for reports.
- Conducted design discussions and meetings to arrive at an appropriate data warehouse design at the lowest level of grain for each of the dimensions involved.
- Created Entity Relationship Diagrams (ERD), Functional diagrams, Data flow diagrams and enforced referential integrity constraints.
- Involved in designing and developing SQL server objects such as Tables, Views, Indexes (Clustered and Non-Clustered), Stored Procedures and Functions in Transact-SQL.
- Designed a Star schema for sales data involving shared (conformed) dimensions for other subject areas using Erwin Data Modeler (see the sketch after this list).
- Created and maintained Logical Data Model (LDM) for the project. Includes documentation of all entities, attributes, data relationships, primary and foreign key structures, allowed values, codes, business rules, glossary terms, etc.
- Validated and updated the appropriate LDMs against process mappings, screen designs, use cases, the business object model, and the system object model as they evolved and changed.
- Conducted design reviews with the business analysts and content developers to create a proof of concept for the reports.
- Ensured the feasibility of the logical and physical design models.
- Worked on snowflaking the dimensions to remove redundancy.
- Wrote PL/SQL statements, stored procedures, and triggers in DB2 for extracting as well as writing data.
- Defined facts, dimensions and designed the data marts using the Ralph Kimball's Dimensional Data Mart modeling methodology using Erwin.
- Involved in Data profiling and performed Data Analysis based on the requirements, which helped in catching many Sourcing Issues upfront.
- Developed Data mapping, Data Governance, Transformation and Cleansing rules for the Data Management involving OLTP, ODS and OLAP.
- Created data masking mappings to mask the sensitive data between production and test environment.
- Normalized the database based on the new model developed to put them into the 3NF of the data warehouse.
- Used SQL tools like Teradata SQL Assistant and TOAD to run SQL queries and validate the data in warehouse.
- Created SSIS package for daily email subscriptions using the ODBC driver and PostgreSQL database.
- Constructed complex SQL queries with sub-queries and inline views as per the functional needs in the Business Requirements Document (BRD).
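A minimal sketch of a sales star schema with conformed dimensions, of the kind referenced above; it is executed against SQLite purely for illustration, and the table and column names are hypothetical:

    # Illustrative sketch only: one fact table keyed to three shared (conformed) dimensions.
    import sqlite3

    ddl = """
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, calendar_date TEXT, fiscal_period TEXT);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
    CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, store_name TEXT, region TEXT);
    CREATE TABLE fact_sales (
        date_key     INTEGER REFERENCES dim_date(date_key),
        product_key  INTEGER REFERENCES dim_product(product_key),
        store_key    INTEGER REFERENCES dim_store(store_key),
        quantity     INTEGER,
        sales_amount REAL
    );
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(ddl)  # dimensions can be reused (conformed) by other subject areas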
Environment: PL/SQL, Erwin 8.5, MS SQL 2012, OLTP, ODS, OLAP, SSIS, Transact-SQL, Teradata SQL Assistant
Confidential
Data Analyst
Responsibilities:
- Maintained numerous monthly scripts that were executed on a monthly basis to produce reports delivered on time for business review.
- Worked with Data Analysts to understand Business logic and User Requirements.
- Closely worked with cross functional Data warehouse members to import data into SQL Server and connected to SQL Server to prepare spreadsheets.
- Created reports for the Data Analysis using SQL Server Reporting Services.
- Created VLOOKUP functions in MS Excel for searching data in large spreadsheets.
- Created SQL queries to simplify migration progress reports and analyses.
- Wrote SQL queries using joins, grouping, nested sub-queries, and aggregation depending on data needed from various relational customer databases.
- Developed Stored Procedures in SQL Server to consolidate common DML transactions such as insert, update and delete from the database.
- Analyzed data using data visualization tools and reported key features using statistical tools.
- Developed reporting and various dashboards across all areas of the client's business to help analyze the data.
- Cleansed and manipulated data by sub-setting, sorting, and pivoting on need basis.
- Used SQL Server and MS Excel on a daily basis to manipulate data for business intelligence reporting needs.
- Developed stored procedures, user-defined functions, and triggers as needed using T-SQL.
- Designed data reports in Excel, for easy sharing, and used SSRS for report deliverables to aid in statistical data analysis and decision making.
- Created reports from OLAP sources, sub-reports, bar charts, and matrix reports using SSRS.
- Used Excel and PowerPoint on various projects as needed for presentations and summarization of data to provide insight on key business decisions.
- Designed Ad-hoc reports using SQL and Tableau dashboards, facilitating data driven decisions for business users.
- Extracted data from different sources performing Data Integrity and quality checks.
- Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
- Involved in extensive data validation by writing several complex SQL queries, and involved in back-end testing and working with data quality issues.
- Worked on VLOOKUPs, pivot tables, and macros in Excel, developed on an ad-hoc basis.
- Performed data manipulation using MS Excel pivot tables and produced various charts for creating mock reports.
- Worked on creating Excel Reports which includes Pivot tables and Pivot charts.
- Collected, analyzed, and interpreted complex data for reporting and/or performance trend analysis.
Environment: SQL Server, MS Excel 2010, VLOOKUP, T-SQL, SSRS, SSIS, OLAP, MS PowerPoint 2010
TECHNICAL SKILLS:
Big Data & Hadoop Ecosystem: Hadoop 3.0, HBase 1.2, Hive 2.3, Pig 0.17, Solr 7.2, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, StreamSets, Neo4j, Cassandra 3.11
Web Services: SOAP, RESTful
Data Modeling Tools: Erwin R9.7, Rational System Architect, IBM InfoSphere Data Architect, ER/Studio v16, and Oracle 12c
BI Tools: Tableau 10, Tableau server 10, Tableau Reader 10, SAP Business Objects, Crystal Reports
Project Execution Methodologies: Agile, Ralph Kimball and Bill Inmon's data warehousing methodologies, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD)
Databases: Oracle 12c, DB2, SQL Server.
IDEs: Eclipse, RAD, WSAD, NetBeans.
RDBMS: Microsoft SQL Server 2017, Teradata 15.0, Oracle 12c, and MS Access
Operating Systems: Microsoft Windows 7/8 and 10, UNIX, and Linux.
Packages: Microsoft Office 2019, Microsoft Project, SAP, Microsoft Visio 2019, and SharePoint Portal Server
Version Tool: VSS, SVN, CVS.