Sr. Big Data Engineer Resume
West Point, PA
SUMMARY:
- Over 7 years of experience as a Data Engineer/Data Analyst, including designing, developing, and implementing data models for enterprise-level applications and systems.
- Good experience working with ETL tools such as SSIS and Informatica, and reporting tools such as SQL Server Reporting Services (SSRS), Cognos, and Business Objects.
- Hands-on experience writing and optimizing SQL queries in Oracle, SQL Server, DB2, Netezza, and Teradata.
- Experienced in using distributed computing architectures such as AWS products (e.g. EC2, Redshift, EMR, Elasticsearch), Hadoop, Python, and Spark, with effective use of MapReduce, SQL, and Cassandra to solve big data problems.
- Experience in Text Analytics, developing Statistical Machine Learning and Data Mining solutions to various business problems, generating data visualizations using R, SAS, and Python, and creating dashboards using tools like Tableau.
- Well versed in Data Migration, Data Conversion, and Data Extraction/Transformation/Loading (ETL).
- Solid knowledge of Data Marts, Operational Data Store (ODS), OLAP, and Dimensional Data Modeling with the Ralph Kimball Methodology (Star Schema and Snowflake Modeling for Fact and Dimension tables) using Analysis Services.
- Expertise in Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import, and Data Export using multiple ETL tools such as Informatica PowerCenter.
- Experience in designing, building and implementing a complete Hadoop ecosystem comprising MapReduce, HDFS, Hive, Impala, Pig, Sqoop, Oozie, HBase, MongoDB, and Spark.
- Experience with client-server application development using Oracle PL/SQL, SQL*Plus, SQL Developer, TOAD, and SQL*Loader.
- Strong experience architecting highly performant databases using PostgreSQL, PostGIS, MySQL, and Cassandra.
- Extensive experience using ER modeling tools such as Erwin and ER/Studio, as well as Teradata, BTEQ, MLDM, and MDM.
- Experienced in R and Python for statistical computing, with additional experience in MLlib (Spark), MATLAB, Excel, Minitab, SPSS, and SAS.
- Extensive experience in loading and analyzing large datasets with the Hadoop framework (MapReduce, HDFS, Pig, Hive, Flume, Sqoop).
- Good experience in using SSRS and Cognos in creating and managing reports for an organization.
- Excellent experience with NoSQL databases like MongoDB and Cassandra.
- Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
- Strong Experience in working with Databases like Teradata and proficiency in writing complex SQL, PL/SQL for creating tables, views, indexes, stored procedures and functions.
- Experience in importing and exporting Terabytes of data between HDFS and Relational Database Systems using Sqoop.
- Good experience working with analysis tools like Tableau for regression analysis, pie charts, and bar graphs.
TECHNICAL SKILLS:
Data Modeling Tools: ER/Studio 9.7/9.0, Erwin 9.7/9.6
AWS tools: EC2, S3 Bucket, AMI, RDS, Redshift.
Big Data: MapReduce, HBase, Pig, Hive, Impala, Sqoop
Reporting Tools: SSRS, Power BI, Tableau, SSAS, MS-Excel, SAS BI Platform.
Operating Systems: Windows, Unix, Linux
ETL/Data warehouse Tools: Informatica 9.6/9.1, SAP Business Objects XIR3.1/XIR2, Talend, Tableau and Pentaho.
Statistics: Decision Trees, Regression Models, KNN, K-Means Clustering, PCA, Naïve Bayes
Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server, Netezza.
Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Agile, Waterfall Model
PROFESSIONAL EXPERIENCE:
Confidential, West Point, PA
Sr. Big Data Engineer
Responsibilities:
- As a Sr. Big Data Engineer, provided technical expertise in Hadoop technologies as they relate to the development of analytics.
- Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
- Involved in all phases of data mining, data collection, data cleaning, developing models, validation and visualization.
- Designed and developed end-to-end ETL processing from Oracle to AWS using Amazon S3, EMR, and Spark.
- Developed the code to perform Data extractions from Oracle Database and load it into AWS platform using AWS Data Pipeline.
- Installed and configured Hadoop ecosystem components such as HBase, Flume, Pig, and Sqoop.
- Designed and developed Big Data analytic solutions on a Hadoop-based platform and engaged clients in technical discussions.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Implemented AWS cloud computing platform using S3, RDS, DynamoDB, Redshift, and Python.
- Responsible for loading and transforming huge sets of structured, semi-structured, and unstructured data.
- Implemented business logic by writing UDFs and configuring CRON Jobs.
- Extensively involved in writing PL/SQL, stored procedures, functions and packages.
- Created logical and physical data models using Erwin and reviewed these models with business team and data architecture team.
- Led architecture and design of data processing, warehousing, and analytics initiatives.
- Worked with NoSQL databases like HBase in creating tables to load large sets of semi-structured data coming from source systems.
- Responsible for translating business and data requirements into logical data models in support of Enterprise data models, ODS, OLAP, OLTP, and operational data structures.
- Created SSIS packages to migrate data from heterogeneous sources such as MS Excel, flat files, and CSV files.
- Provided thought leadership for architecture and design of Big Data Analytics solutions for customers, and actively drove Proof of Concept (POC) and Proof of Technology (POT) evaluations to implement Big Data solutions.
- Developed numerous MapReduce jobs in Scala for Data Cleansing and Analyzing Data in Impala.
- Created data pipelines using processor groups and multiple processors in Apache NiFi for flat file and RDBMS sources as part of a POC on Amazon EC2.
- Worked closely with the SSIS and SSRS developers to explain complex data transformation logic.
- Managed the Metadata associated with the ETL processes used to populate the Data Warehouse.
- Designed Data Marts by following Star Schema and Snowflake Schema Methodology, using industry leading Data modeling tools like Erwin.
- Developed the Star Schema/Snowflake Schema for proposed warehouse models to meet the requirements.
- Designed class and activity diagrams using Power Designer and UML tools like Visio.
- Developed Spark code using Scala and Spark SQL for faster testing and data processing (see the sketch following this role).
- Developed Spark streaming application to pull data from cloud to Hive table.
- Used Spark SQL to process large volumes of structured data.
- Used Talend for big data integration with Spark and Hadoop.
- Used Microsoft Windows Server and authenticated client-server communication via the Kerberos protocol.
- Assigned names to columns using Scala case classes.
- Created data mapping documents mapping Logical Data Elements to Physical Data Elements and Source Data Elements to Destination Data Elements.
- Wrote SQL scripts and PL/SQL scripts to extract data from the database to meet business requirements and for testing purposes.
- Manipulated, cleansed, and processed data using Excel, Access, and SQL; responsible for loading, extracting, and validating client data.
- Created a sheet selector to accommodate multiple chart types (pie, bar, line, etc.) in a single dashboard by using parameters.
- Published Workbooks by creating user filters so that only appropriate teams can view it.
- Worked on SAS Visual Analytics & SAS Web Report Studio for data presentation and reporting.
- Extensively used SAS/Macros to parameterize the reports so that the user could choose the summary and sub-setting variables to be used from the web application.
- Resolved the data related issues such as: assessing data quality, testing dashboards, evaluating existing data sources.
- Involved in all phases of the SDLC using Agile and participated in daily scrum meetings with cross-functional teams.
Environment: Hive 2.3, MapReduce, Hadoop 3.0, HDFS, Oracle, Spark 2.3, HBase 1.2, Flume 1.8, Pig 0.17, Sqoop 1.4, Oozie 4.3, Python, PL/SQL, NoSQL, SSIS, SSRS, Visio, AWS Redshift, Teradata, SQL, PostgreSQL, EC2, S3, Windows
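Illustrative sketch: the Spark jobs in this role were written in Scala; the following is a minimal PySpark analogue of the pattern described above (reading from a Hive table in the data lake with Spark SQL, naming columns explicitly, and persisting the result to HDFS). The application name, table, columns, and HDFS path are hypothetical placeholders.

```python
# Minimal PySpark analogue of the Spark SQL extraction pattern described above.
# Table names, column names, and the HDFS output path are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("claims-extract")          # hypothetical job name
    .enableHiveSupport()                # read Hive tables registered in the metastore
    .getOrCreate()
)

# Pull only the needed columns from a Hive table in the data lake.
claims = spark.sql("""
    SELECT claim_id, member_id, claim_amount, service_date
    FROM datalake.claims_raw
    WHERE service_date >= '2018-01-01'
""")

# Give the columns explicit names (the Scala version used a case class for this).
claims = claims.toDF("claim_id", "member_id", "claim_amount", "service_date")

# Persist the cleansed extract to HDFS in Parquet format.
claims.write.mode("overwrite").parquet("hdfs:///warehouse/claims/cleansed")

spark.stop()
```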
Confidential - Washington, DC
Sr. Data Analyst/Data Engineer
Responsibilities:
- Worked with the analysis teams and management teams and supported them based on their requirements.
- Involved in extraction, transformation and loading of data directly from different source systems (flat files/Excel/Oracle/SQL/Teradata) using SAS/SQL, SAS/macros.
- Generated PL/SQL scripts for data manipulation, validation and materialized views for remote instances.
- Used Agile (SCRUM) methodologies for Software Development.
- Created and modified several database objects such as Tables, Views, Indexes, Constraints, Stored procedures, Packages, Functions and Triggers using SQL and PL/SQL.
- Created large datasets by combining individual datasets using various inner and outer joins in SAS/SQL and dataset sorting and merging techniques using SAS/Base.
- Developed live reports in a drill-down mode to facilitate usability and enhance user interaction.
- Extensively worked on Shell scripts for running SAS programs in batch mode on UNIX.
- Wrote Python scripts to parse XML documents and load the data in database.
- Used Python to extract weekly information from XML files.
- Developed Python scripts to clean the raw data.
- Worked with the AWS CLI to aggregate clean files in Amazon S3 and with Amazon EC2 clusters to deploy files into S3 buckets (see the sketch following this role).
- Used the AWS CLI with IAM roles to load data into a Redshift cluster.
- Responsible for in-depth data analysis and creation of data extract queries in both Netezza and Teradata databases.
- Extensive development on the Netezza platform using PL/SQL and advanced SQL.
- Validated regulatory finance data and created automated adjustments using advanced SAS Macros, PROC SQL, UNIX (Korn Shell) and various reporting procedures.
- Designed reports in SSRS to create, execute, and deliver tabular reports using shared and specified data sources; also debugged and deployed reports in SSRS.
- Optimized query performance by modifying T-SQL queries, establishing joins, and creating clustered indexes.
- Used Hive, Impala and Sqoop utilities and Oozie workflows for data extraction and data loading.
- Developed routines to capture and report data quality issues and exceptional scenarios.
- Created data mapping documents and data flow diagrams.
- Developed Linux Shell scripts by using Nzsql/Nzload utilities to load data from flat files to Netezza database.
- Generated dual-axis bar charts, pie charts, and bubble charts with multiple measures, using data blending when merging different sources.
- Developed dashboards in Tableau Desktop and published them on to Tableau Server which allowed end users to understand the data on the fly with the usage of quick filters for on demand needed information.
- Created dashboard-style reports using QlikView components like List Box, Slider, Buttons, Charts, and Bookmarks.
- Coordinated with Data Architects and Data Modelers to create new schemas and views in Netezza to improve report execution time, and worked on creating optimized data mart reports.
- Performed QA on the data and added data sources, snapshots, and caching to the reports.
- Involved in troubleshooting at database levels, error handling and performance tuning of queries and procedures.
Environment: SAS, SQL, Teradata, Oracle, PL/SQL, UNIX, XML, Python, AWS, SSRS, TSQL, Hive, Sqoop
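Illustrative sketch: a minimal Python version of the weekly XML-parsing and S3 staging steps described above, using only the standard library and boto3. The file paths, XML element names, and bucket name are hypothetical placeholders.

```python
# Minimal sketch of the XML-parsing and S3 staging steps described above.
# File paths, XML element names, and the bucket name are hypothetical.
import csv
import xml.etree.ElementTree as ET

import boto3


def parse_weekly_xml(xml_path: str, csv_path: str) -> None:
    """Parse a weekly XML extract and write the cleaned rows to CSV."""
    tree = ET.parse(xml_path)
    root = tree.getroot()
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["record_id", "amount", "posted_date"])
        for rec in root.iter("record"):                  # hypothetical element name
            writer.writerow([
                rec.findtext("id", default="").strip(),
                rec.findtext("amount", default="0").strip(),
                rec.findtext("posted_date", default="").strip(),
            ])


def stage_to_s3(csv_path: str, bucket: str, key: str) -> None:
    """Upload the cleaned file to S3 (the AWS CLI equivalent is `aws s3 cp`)."""
    boto3.client("s3").upload_file(csv_path, bucket, key)


if __name__ == "__main__":
    parse_weekly_xml("weekly_extract.xml", "weekly_clean.csv")
    stage_to_s3("weekly_clean.csv", "finance-staging-bucket", "weekly/weekly_clean.csv")
```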
Confidential - Houston, TX
Sr. Data Analyst/Data Modeler
Responsibilities:
- Worked with Business Analysts team in requirements gathering and in preparing functional specifications and translating them to technical specifications.
- Worked with Business users during requirements gathering and prepared Conceptual, Logical and Physical Data Models.
- Planned and defined system requirements to Use Case, Use Case Scenario and Use Case Narrative using the UML (Unified Modeling Language) methodologies.
- Gathered analysis report prototypes from business analysts belonging to different business units; participated in JAD sessions to discuss various reporting needs.
- Reverse engineered the existing data marts and identified the data elements (in the source systems), dimensions, facts, and measures required for reports.
- Conducted design discussions and meetings to arrive at the appropriate data warehouse grain at the lowest level for each of the dimensions involved.
- Created Entity Relationship Diagrams (ERD), Functional diagrams, Data flow diagrams and enforced referential integrity constraints.
- Designed a STAR schema for sales data involving shared dimensions (Conformed) for other subject areas using Erwin Data Modeler.
- Created and maintained Logical Data Model (LDM) for the project. Includes documentation of all entities, attributes, data relationships, primary and foreign key structures, allowed values, codes, business rules, glossary terms, etc.
- Validated and updated the appropriate LDMs to reflect process mappings, screen designs, use cases, the business object model, and the system object model as they evolved and changed.
- Conducted design reviews with the business analysts and content developers to create a proof of concept for the reports.
- Ensured the feasibility of the logical and physical design models.
- Worked on snowflaking the dimensions to remove redundancy.
- Wrote PL/SQL statements, stored procedures, and triggers in DB2 for extracting as well as writing data.
- Defined facts, dimensions and designed the data marts using the Ralph Kimball's Dimensional Data Mart modeling methodology using Erwin.
- Involved in Data profiling and performed Data Analysis based on the requirements, which helped in catching many Sourcing Issues upfront.
- Developed Data mapping, Data Governance, Transformation and Cleansing rules for the Data Management involving OLTP, ODS and OLAP.
- Created data masking mappings to mask the sensitive data between production and test environment.
- Normalized the database based on the newly developed model to bring it into 3NF for the data warehouse.
- Used SQL tools like Teradata SQL Assistant and TOAD to run SQL queries and validate the data in warehouse.
- Created SSIS package for daily email subscriptions to alert Tableau subscription failure using the ODBC driver and PostgreSQL database.
- Designed logical and physical data models, Reverse engineering, Complete compare for Oracle and SQL server objects using Erwin.
- Constructed complex SQL queries with sub-queries and inline views as per the functional needs in the Business Requirements Document (BRD).
- Supported business analysis and marketing campaign analytics with data mining, data processing, and investigation to answer complex business questions.
- Involved in designing and developing SQL server objects such as Tables, Views, Indexes (Clustered and Non-Clustered), Stored Procedures and Functions in Transact-SQL.
- Developed scripts that automated the DDL and DML statements used in creating databases, tables, constraints, and updates (see the sketch following this role).
Environment: PL/SQL, Erwin 8.5, MS SQL 2008, OLTP, ODS, OLAP, SSIS, Tableau, ODBC, Transact-SQL, TOAD, Teradata SQL Assistant
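Illustrative sketch: the DDL/DML automation described above targeted SQL Server and DB2; the sketch below shows the same idea against sqlite3 so that it stays self-contained. The star-schema table and column definitions are hypothetical placeholders.

```python
# Illustrative sketch of a DDL-automation script like the one described above.
# The real work targeted SQL Server/DB2; sqlite3 is used here only so the
# sketch is self-contained. Table and column definitions are hypothetical.
import sqlite3

# Hypothetical star-schema objects: one fact table plus a conformed dimension.
DDL_STATEMENTS = [
    """
    CREATE TABLE IF NOT EXISTS dim_customer (
        customer_key INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL,
        region TEXT
    )
    """,
    """
    CREATE TABLE IF NOT EXISTS fact_sales (
        sales_key INTEGER PRIMARY KEY,
        customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
        sale_amount REAL NOT NULL,
        sale_date TEXT NOT NULL
    )
    """,
    "CREATE INDEX IF NOT EXISTS ix_fact_sales_customer ON fact_sales(customer_key)",
]


def apply_ddl(conn: sqlite3.Connection) -> None:
    """Run each DDL statement in order inside a single transaction."""
    with conn:
        for stmt in DDL_STATEMENTS:
            conn.execute(stmt)


if __name__ == "__main__":
    apply_ddl(sqlite3.connect("warehouse_dev.db"))
```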
Confidential
Data Analyst
Responsibilities:
- Worked with Data Analysts to understand Business logic and User Requirements.
- Closely worked with cross functional Data warehouse members to import data into SQL Server and connected to SQL Server to prepare spreadsheets.
- Created reports for the Data Analysis using SQL Server Reporting Services.
- Created VLOOKUP functions in MS Excel for searching data in large spreadsheets.
- Created SQL queries to simplify migration progress reports and analyses.
- Wrote SQL queries using joins, grouping, nested sub-queries, and aggregation depending on data needed from various relational customer databases.
- Developed Stored Procedures in SQL Server to consolidate common DML transactions such as insert, update and delete from the database.
- Developed reporting and various dashboards across all areas of the client's business to help analyze the data.
- Cleansed and manipulated data by sub-setting, sorting, and pivoting on an as-needed basis.
- Used SQL Server and MS Excel on a daily basis to manipulate data for business intelligence reporting needs.
- Developed the stored procedures as required, and user defined functions and triggers as needed using T-SQL.
- Designed data reports in Excel, for easy sharing, and used SSRS for report deliverables to aid in statistical data analysis and decision making.
- Created reports from OLAP sources, sub-reports, bar charts, and matrix reports using SSRS.
- Developed ad-hoc reports with VLOOKUPs, pivot tables, and macros in Excel, and recommended solutions to drive business decision making (see the sketch following this role).
- Used Excel and PowerPoint on various projects as needed for presentations and summarization of data to provide insight on key business decisions.
- Designed Ad-hoc reports using SQL and Tableau dashboards, facilitating data driven decisions for business users.
- Extracted data from different sources performing Data Integrity and quality checks.
- Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
- Involved in extensive data validation by writing several complex SQL queries; involved in back-end testing and worked on data quality issues.
- Collected, analyzed, and interpreted complex data for reporting and/or performance trend analysis.
- Performed Data Manipulation using MS Excel Pivot Sheets and produced various charts for creating the mock reports.
Environment: SQL Server, MS Excel, VLOOKUP, T-SQL, SSRS, SSIS, OLAP, PowerPoint
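Illustrative sketch: this role used Excel for the VLOOKUP and pivot-table work described above; the following pandas script is an analogue of that workflow, not the method actually used. File and column names are hypothetical placeholders.

```python
# pandas analogue of the Excel VLOOKUP / pivot-table work described above
# (the role itself used Excel); file and column names are hypothetical.
import pandas as pd

# Load the two hypothetical extracts that would have lived in Excel worksheets.
sales = pd.read_csv("sales.csv")          # columns: customer_id, product, amount
customers = pd.read_csv("customers.csv")  # columns: customer_id, region

# VLOOKUP equivalent: bring the region onto each sales row via a left join.
enriched = sales.merge(customers, on="customer_id", how="left")

# Pivot-table equivalent: total amount by region and product.
summary = enriched.pivot_table(
    index="region",
    columns="product",
    values="amount",
    aggfunc="sum",
    fill_value=0,
)

# Write the summary back out for reporting, much like a summary worksheet.
summary.to_csv("region_product_summary.csv")
```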