Sr. Big Data Engineer Resume
Shelton, CT
SUMMARY:
- 8+ years of experience as a Sr. Big Data Engineer, with skills in the analysis, design, development, testing, and deployment of various software applications.
- Extensive experience in technical consulting and end-to-end delivery, covering data modeling, data governance, and the design, development, and implementation of solutions.
- Proficient in developing Entity-Relationship diagrams and Star/Snowflake schema designs.
- Skilled in writing Hive join queries to fetch information from multiple tables and writing multiple jobs to collect output from Hive.
- Excellent experience in system analysis, ER/dimensional modeling, data design, and implementing RDBMS-specific features.
- Hands-on experience in importing, cleaning, transforming, and validating data, and drawing conclusions from the data for decision-making purposes.
- Strong expertise in Amazon AWS services including EC2, DynamoDB, S3, and Kinesis.
- Experience with data modeling and design of both OLTP and OLAP systems.
- Experienced with Big Data ecosystem components such as HBase, Sqoop, ZooKeeper, Oozie, Hive, and Pig on the Cloudera Hadoop distribution.
- Experience in performance tuning of Informatica (sources, mappings, targets, and sessions) and tuning SQL queries.
- In-depth knowledge of the software development life cycle (SDLC): Waterfall, Iterative and Incremental, RUP, evolutionary prototyping, and Agile/Scrum methodologies.
- Good knowledge of implementing various data processing techniques using Apache HBase to handle and format data as required.
- Hands on experience in configuring and working with Flume to load the data from multiple sources directly into HDFS.
- Proficient in creating dashboards/reports using reporting tools such as Tableau and QlikView.
- Involved in data acquisition, data pre-processing, and data exploration for a telecommunications project in Scala.
- Strong experience in migrating data warehouses and databases into Hadoop/NoSQL platforms.
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and ZooKeeper (see the sketch after this list).
- Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
- Experience in writing Storm topologies that accept events from Kafka producers and emit them into Cassandra.
- Highly proficient in data validation and transformation using Python and Hadoop Streaming.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Experience in Dimensional Data Modeling, Star/Snowflake schema, FACT & Dimension tables.
- Excellent experience in installing and running various Oozie workflows and automating parallel job executions.
- Experience working with Microsoft Server tools like SSAS, SSIS and in generating on-demand scheduled reports using SQL Server Reporting Services (SSRS).
- Well experienced in normalization and de-normalization techniques for optimal performance in relational and dimensional database environments.
- Responsible for providing primary leadership in designing, leading, and directing all aspects of UAT testing for the Oracle data warehouse.
- Excellent understanding of Microsoft BI toolset including Excel, Power BI, SQL Server Analysis Services, Visio, Access.
- Experienced with Apache NiFi as an ETL tool for batch and real-time processing.
- Extensive experience in using ER modeling tools such as Erwin and ER/Studio, as well as Teradata and BTEQ.
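The following is a minimal sketch of the Kafka-to-Cassandra messaging pattern noted above, assuming the kafka-python and DataStax cassandra-driver packages; the topic, keyspace, and table names are hypothetical placeholders:

```python
# Minimal sketch of the Kafka-to-Cassandra integration pattern.
# Assumes the kafka-python and DataStax cassandra-driver packages;
# the topic, keyspace, and table names are hypothetical placeholders.
import json

from kafka import KafkaConsumer
from cassandra.cluster import Cluster

consumer = KafkaConsumer(
    "telecom-events",                          # hypothetical topic
    bootstrap_servers=["localhost:9092"],
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("telecom")           # hypothetical keyspace

insert_stmt = session.prepare(
    "INSERT INTO events (event_id, event_type, payload) VALUES (?, ?, ?)"
)

# Consume indefinitely, writing each event into Cassandra.
for message in consumer:
    event = message.value
    session.execute(insert_stmt, (event["id"], event["type"], json.dumps(event)))
```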
TECHNICAL SKILLS:
Big Data & Hadoop Ecosystem: Hadoop 3.0, HBase 1.2, Hive 2.3, Pig 0.17, Apache Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, StreamSets, Neo4j.
Data Modeling Tools: Erwin R9.7, Rational System Architect, IBM InfoSphere Data Architect, ER/Studio v16, and Oracle 12c.
Databases: Oracle 12c, DB2, SQL Server.
RDBMS: Microsoft SQL Server 2017, Teradata 15.0, Oracle 12c, and MS Access
BI Tools: Tableau 10, Tableau server 10, Tableau Reader 10, SAP Business Objects, Crystal Reports.
Project Execution Methodologies: Agile, Ralph Kimball and Bill Inmon data warehousing methodologies, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD)
Packages: Microsoft Office 2019, Microsoft Project, SAP, Microsoft Visio 2019, SharePoint Portal Server
IDEs: Eclipse, RAD, WSAD, NetBeans.
Operating Systems: Microsoft Windows 7/8 and 10, UNIX, and Linux.
Version Control Tools: VSS, SVN, CVS.
PROFESSIONAL EXPERIENCE:
Confidential - Shelton, CT
Sr. Big Data Engineer
Responsibilities:
- As a Sr. Big Data Engineer, responsible for developing, troubleshooting, and implementing programs.
- Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs.
- Installed and configured Big Data ecosystem components such as HBase, Flume, Pig, and Sqoop.
- Primarily involved in the data migration process using Azure, integrating with a Bitbucket repository.
- Participated in daily Scrum meetings and gave daily status reports.
- Involved in all phases of the Software Development Life Cycle (SDLC), following Agile/Scrum methodology.
- Designed and implemented scalable Cloud Data and Analytical architecture solutions for various public and private cloud platforms using Azure.
- Extensively used Pig for data cleansing using Pig scripts and Embedded Pig scripts.
- Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using the CDH4 distribution.
- Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting on the dashboard (see the sketch after this list).
- Worked with NoSQL databases like HBase, creating tables to load large sets of semi-structured data coming from source systems.
- Created and maintained the metadata (data dictionary) for the data models.
- Built Hadoop solutions for big data problems using MR1 and MR2 on YARN.
- Worked on Cassandra, retrieving data from Cassandra clusters to run queries.
- Worked on Apache NiFi as an ETL tool for batch and real-time processing.
- Performed reverse engineering on the existing data model to understand the data flow and business flow.
- Performed data profiling and analysis, applied various data cleansing rules, designed data standards, and designed the relational models.
- Involved in the creation of Microsoft Azure Cloud SQL Servers and Replication Servers.
- Involved in manipulating, cleansing, and processing data using Excel and SQL; responsible for loading, extracting, and validating client data.
- Implemented monitoring and established best practices around the usage of Elasticsearch.
- Utilized Integration Services (SSIS) to produce a Data Mapping and Data Mart for reporting.
- Extracted and loaded data into a Data Lake environment (MS Azure) using Sqoop; the data was then accessed by business users.
- Worked with MDM systems team with respect to technical aspects and generating reports.
- Designed and Developed Oracle PL/SQL and Shell Scripts, Data Import/Export, Data Conversions and Data Cleansing.
- Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard.
- Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
- Implemented highly scalable and reliable distributed data design using NoSQL/Cassandra technology.
- Extracted files from Cassandra through Sqoop and placed in HDFS for further processing.
- Created Hive tables on top of the loaded data and wrote Hive queries for ad-hoc analysis.
- Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
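The following is a minimal sketch of the partitioned-Hive metric queries referenced above, assuming the PyHive package; the host, database, table, and column names are hypothetical placeholders:

```python
# Minimal sketch: query a partitioned/bucketed Hive table for dashboard
# metrics. Assumes the PyHive package; the host, database, table, and
# column names are hypothetical placeholders.
from pyhive import hive

conn = hive.Connection(host="hive-server", port=10000, database="analytics")
cursor = conn.cursor()

# Filtering on the partition column (event_date) lets Hive prune
# partitions so only the relevant slices of the table are scanned.
cursor.execute(
    """
    SELECT event_date,
           COUNT(*)                AS total_events,
           COUNT(DISTINCT user_id) AS unique_users
    FROM web_events
    WHERE event_date >= '2018-01-01'
    GROUP BY event_date
    """
)
for event_date, total_events, unique_users in cursor.fetchall():
    print(event_date, total_events, unique_users)
```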
Environment: Azure, Pig 0.17, Hadoop 3.0, Agile, HBase 1.2, Flume 1.8, Sqoop 1.4, Data Migration, MDM, Oracle 12c, PL/SQL, Hive 2.3, HDFS, NoSQL, SQL, Elasticsearch, Cassandra 3.0, Apache NiFi 1.6, ETL, MongoDB
Confidential - Arlington, VA
Sr. Data Engineer
Responsibilities:
- As a Sr. Data Engineer, responsible for building scalable distributed data solutions using Hadoop.
- Involved in all phases of SDLC and participated in daily scrum meetings with cross teams.
- Used an Agile methodology for data warehouse development, tracked with Kanbanize.
- Installed and configured Hive, wrote Hive UDFs, and provided cluster coordination services through ZooKeeper.
- Used Sqoop to import and export data into the Hadoop Distributed File System (HDFS) for further processing.
- Imported and exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Loaded multiple NoSQL databases including MongoDB, HBase, and Cassandra.
- Extensively worked on shell scripts for running SAS programs in batch mode on UNIX.
- Worked on partitioning Hive tables and running scripts in parallel to reduce script runtimes.
- Worked with the analysis teams and management teams and supported them based on their requirements.
- Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
- Provided technical support during delivery of MDM (Master Data Management) components.
- Implemented partitioning, dynamic partitions, and buckets in Hive for performance benefits and to organize data in a logical fashion.
- Implemented Python scripts to parse XML documents and load the data into databases (see the sketch after this list).
- Performed transformations, cleaning and filtering on imported data using Hive, MapReduce, and loaded final data into HDFS.
- Worked on configuring and managing disaster recovery and backup for Cassandra data.
- Implemented Kafka high-level consumers to get data from Kafka partitions and move it into HDFS.
- Wrote Hadoop jobs for analyzing data using Hive and Pig, accessing text-format files, SequenceFiles, and Parquet files.
- Utilized Hadoop, Hive, and SQL technologies and moved data sets to production for use by business teams in making business decisions.
- Designed and Developed Real time Stream processing Application using Kafka, Scala and Hive to perform Streaming ETL and apply Machine Learning.
- Worked closely with business analyst for requirement gathering and translating into technical documentation.
- Implemented a proof of concept deploying this product in Amazon Web Services AWS.
- Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
- Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
- Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive.
- Used Excel sheets, flat files, and CSV files to generate ad-hoc Tableau reports.
- Involved in reports development using reporting tools like Tableau.
- Wrote SQL scripts to run ad-hoc queries, stored procedures, and triggers, and prepared reports for management.
- Designed and developed Big Data analytic solutions on a Hadoop-based platform and engaged clients in technical discussions.
- Developed MapReduce modules for machine learning & predictive analytics in Hadoop.
- Created and executed SQL scripts to validate, verify and compare the source data to target table data.
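The following is a minimal sketch of the XML-parsing load scripts mentioned above, using only the Python standard library; sqlite3 stands in for the actual target database, and the XML layout and table schema are hypothetical examples:

```python
# Minimal sketch: parse an XML document and load its records into a
# database. Uses only the Python standard library; sqlite3 stands in
# for the actual target database, and the XML layout and table schema
# are hypothetical examples.
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect("records.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS customers (id TEXT PRIMARY KEY, name TEXT, city TEXT)"
)

tree = ET.parse("customers.xml")               # hypothetical input file
for node in tree.getroot().findall("customer"):
    conn.execute(
        "INSERT OR REPLACE INTO customers (id, name, city) VALUES (?, ?, ?)",
        (node.get("id"), node.findtext("name"), node.findtext("city")),
    )
conn.commit()
conn.close()
```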
Environment: Hadoop 3.0, Oracle 12c, Apache Hive 2.3, HDFS, AWS, Sqoop 1.4, SQL, ETL, MapReduce, Tableau, Agile, Scala, Kafka 1.1, SAS, HBase 1.2, MongoDB, Cassandra 3.0.
Confidential - Medford, Oregon
Data Analyst/Data Engineer
Responsibilities:
- Worked with the analysis teams and management teams and supported them based on their requirements.
- Generated PL/SQL scripts for data manipulation, validation and materialized views for remote instances.
- Conducted JAD sessions with stakeholders and software development team to analyze the feasibility of needs.
- Implemented the Big Data solution using Hadoop, Hive, and Informatica to pull/load the data into HDFS.
- Performed Data Analysis and Data Manipulation of source data from SQL Server and other data structures to support the business organization.
- Involved in all phases of data mining, data collection, data cleaning, developing models, validation and visualization.
- Designed and developed Big Data analytic solutions on a Hadoop-based platform and engaged clients in technical discussions.
- Designed and developed ETL jobs to extract data from a Salesforce replica and load it into a data mart in Redshift (see the sketch after this list).
- Designed both 3NF data models for ODS/OLTP systems and dimensional data models using Star and Snowflake schemas.
- Implemented Forward engineering to create tables, views and SQL scripts and mapping documents.
- Wrote PL/SQL statements, stored procedures, and triggers in DB2 for extracting as well as writing data.
- Extensively used SAS procedures such as MEANS and FREQ, along with other statistical calculations, for data validation.
- Optimized and updated UML Models (Visio) and Relational Data Models for various applications.
- Worked with Business Analyst during requirements gathering and business analysis to prepare high level Logical Data Models and Physical Data Models.
- Conducted design discussions and meetings to arrive at the appropriate data mart using the Kimball methodology.
- Worked extensively on data quality (running data profiling and examining profile outcomes) and metadata management.
- Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
- Developed and implemented logical and physical data models using the enterprise modeling tool Erwin.
- Reverse-engineered the data model from database instances and scripts.
- Worked on data mapping and data mediation between the source data table and target data tables using MS Excel.
- Performed data extraction, data analysis, data manipulation and prepared various production and ad-hoc reports to support cost optimization initiatives and strategies.
- Gathered business requirements from the users and transformed and implemented into database schemas.
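The following is a minimal sketch of the Redshift load step in the ETL jobs above, assuming psycopg2 and the standard Redshift COPY-from-S3 pattern; the cluster endpoint, credentials, table, bucket, and IAM role are all hypothetical placeholders:

```python
# Minimal sketch: load extracted Salesforce data into a Redshift data
# mart using the standard COPY-from-S3 pattern. Assumes psycopg2; the
# cluster endpoint, credentials, table, bucket, and IAM role are all
# hypothetical placeholders.
import psycopg2

conn = psycopg2.connect(
    host="example-cluster.redshift.amazonaws.com",  # hypothetical endpoint
    port=5439,
    dbname="datamart",
    user="etl_user",
    password="change-me",
)
with conn, conn.cursor() as cur:
    # COPY is Redshift's bulk-load path: it reads the staged extract
    # directly from S3 in parallel across the cluster's slices.
    cur.execute(
        """
        COPY sales_mart.opportunities
        FROM 's3://example-bucket/salesforce/opportunities/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-copy'
        FORMAT AS CSV
        IGNOREHEADER 1;
        """
    )
```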
Environment: Erwin 9.2, Hadoop 2.3, ETL, PL/SQL, OLTP, Apache Hive 2.1, Informatica, SAS, HDFS, NoSQL, ODS, MS Excel 2016.
Confidential - Peoria, IL
Data Analyst/Data Modeler
Responsibilities:
- As a Sr. Data Modeler/Data Analyst, I was responsible for all data-related aspects of the project.
- Supported business analysis and marketing campaign analytics with data mining, data processing, and investigation to answer complex business questions.
- Designed the ER diagrams, logical model, and physical database per business requirements using Erwin.
- Assisted the project with analytical techniques including data modeling, data mining, and regression.
- Developed scripts that automated the DDL and DML statements used in the creation of databases, tables, constraints, and updates (see the sketch after this list).
- Performed Data Analysis on both source data and target data after transfer to Data Warehouse.
- Developed data mapping, transformation, and cleansing rules for Master Data Management involving OLTP and OLAP.
- Designed and developed the data dictionary and metadata of the models and maintained them.
- Involved in extensive data validation using SQL queries and back-end testing.
- Provided PL/SQL queries to developers as source queries to identify the data, along with the logic for assignments.
- Designed the data marts in dimensional data modeling using star and snowflake schemas.
- Wrote SQL queries using joins, grouping, nested sub-queries, and aggregation depending on data needed from various relational customer databases.
- Analyzed and presented the gathered information in graphical format for ease of use by business managers.
- Worked on Unit Testing for three reports and created SQL Test Scripts for each report as required.
- Performed GAP analysis to analyze the difference between the system capabilities and business requirements.
- Used reverse engineering to connect to existing database and create graphical representation (E-R diagram).
- Developed Data Migration and Cleansing rules for the Integration Architecture using OLTP.
- Designed the Redshift data model and performed Redshift performance improvements/analysis.
- Implemented Forward engineering to create tables, views and SQL scripts and mapping documents.
- Developed complex PL/SQL procedures and packages using views and SQL joins.
- Performed various ad-hoc analyses by extracting data from multiple source systems and creating comprehensive reports for end users.
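The following is a minimal sketch of the kind of DDL/DML automation script described above; sqlite3 stands in for the target RDBMS, and the schema is a hypothetical example:

```python
# Minimal sketch: automate DDL (table creation with constraints) and
# DML (inserts and updates) from one script. sqlite3 stands in for the
# target RDBMS, and the schema is a hypothetical example.
import sqlite3

DDL = """
CREATE TABLE IF NOT EXISTS orders (
    order_id  INTEGER PRIMARY KEY,
    customer  TEXT NOT NULL,
    status    TEXT NOT NULL DEFAULT 'NEW',
    amount    REAL CHECK (amount >= 0)
);
"""

DML = [
    "INSERT INTO orders (order_id, customer, amount) VALUES (1, 'Acme', 250.0)",
    "UPDATE orders SET status = 'SHIPPED' WHERE order_id = 1",
]

with sqlite3.connect("warehouse.db") as conn:
    conn.executescript(DDL)         # apply the DDL in one pass
    for statement in DML:
        conn.execute(statement)     # apply the DML statements in order
```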
Environment: E/R Diagrams, SQL, PL/SQL, OLAP, OLTP, Metadata
Confidential
Data Analyst
Responsibilities:
- Worked with Data Analysts to understand Business logic and User Requirements.
- Worked closely with cross-functional data warehouse team members to import data into SQL Server and connected to SQL Server to prepare spreadsheets.
- Created reports for the Data Analysis using SQL Server Reporting Services.
- Created VLOOKUP functions in MS Excel for searching data in large spreadsheets.
- Created SQL queries to simplify migration progress reports and analyses.
- Developed Stored Procedures in SQL Server to consolidate common DML transactions such as insert, update and delete from the database.
- Used SQL Server and MS Excel on a daily basis to manipulate data for business intelligence reporting needs.
- Validated data to check for proper conversion; performed data cleansing to identify unnecessary data, and data profiling for accuracy, completeness, and consistency.
- Used Python and Tableau to analyze the number of products per customer and sales in a category for sales optimization (see the sketch after this list).
- Designed data reports in Excel, for easy sharing, and used SSRS for report deliverables to aid in statistical data analysis and decision making.
- Created reports from OLAP sources, including subreports, bar charts, and matrix reports, using SSRS.
- Used Excel and PowerPoint on various projects as needed for presentations and summarization of data to provide insight on key business decisions.
- Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
- Involved in extensive data validation by writing several complex SQL queries; involved in back-end testing and worked through data quality issues.
- Collected, analyzed, and interpreted complex data for reporting and/or performance trend analysis.
- Performed data manipulation using MS Excel pivot tables and produced various charts for creating mock reports.
- Responsible for analyzing business requirements and developing reports using PowerPoint and Excel to provide data analysis solutions to business clients.
- Designed and developed weekly and monthly reports using MS Excel techniques (charts, graphs, pivot tables) and PowerPoint presentations.
- Performed Data analysis and Data profiling using complex SQL on various sources systems.
- Provided on-demand ad-hoc reports used to assist in long and short-term budgeting and planning.
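The following is a minimal sketch of the Python analysis feeding the Tableau work above, assuming pandas; the file name and column names (customer, category, product, amount) are hypothetical placeholders:

```python
# Minimal sketch: compute products per customer and sales per category
# from a sales extract, as inputs to Tableau reporting. Assumes pandas;
# the file name and column names are hypothetical placeholders.
import pandas as pd

sales = pd.read_csv("sales_extract.csv")       # hypothetical extract

# Number of distinct products purchased by each customer.
products_per_customer = sales.groupby("customer")["product"].nunique()

# Total sales amount within each product category.
sales_by_category = sales.groupby("category")["amount"].sum()

# Write the aggregates out for Tableau to pick up.
products_per_customer.to_csv("products_per_customer.csv")
sales_by_category.to_csv("sales_by_category.csv")
```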
Environment: SQL Server, SSRS, Business Intelligence, MS Excel 2010, OLAP, OLTP, Tableau.