Sr. Hadoop / Big Data Engineer Resume
New York, NY
SUMMARY:
- Results-driven IT professional with 9+ years of experience in Data Analysis, Data Modeling, and Big Data, applying information technology to business problems.
- Thorough understanding of the end-to-end Software Development Life Cycle process in an Agile environment using Scrum methodologies.
- Excellent understanding of Data Modeling (Dimensional & Relational) concepts such as Star Schema modeling, Snowflake Schema modeling, and Fact and Dimension tables.
- Strong knowledge of developing Big Data solutions spanning data ingestion and data storage.
- Good knowledge of cloud technologies such as Azure and AWS (EMR, S3, Redshift, EC2, DynamoDB).
- Proficient in technical consulting and end-to-end delivery covering data analysis, data modeling, data governance, and the design, development, and implementation of solutions.
- Excellent technical and analytical skills with a clear understanding of the design goals of ER modeling for OLTP and dimensional modeling for OLAP.
- Experience in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
- Experience in NoSQL databases - HBase, Cassandra & MongoDB, database performance tuning & data modeling.
- Proficient in using PL/SQL to write Stored Procedures, Functions, and Triggers.
- Extensive experience in Extraction, Transformation and Loading (ETL) of data from multiple sources into Data Warehouse and Data Mart.
- Effective in addressing complex POCs from the technical end according to business requirements.
- Experience in importing and exporting Terabytes of data between HDFS and Relational Database Systems using Sqoop.
- Extensive experience using ER modeling tools such as Erwin and ER/Studio, as well as Teradata, BTEQ, and MDM.
- Good experience on different ETL tool environments like SSIS, Informatica and reporting tool environments like SQL Server Reporting Services (SSRS), Cognos and Business Objects.
- Good knowledge of using Apache NiFi to automate data movement between different Hadoop systems.
- Good knowledge of Normalization and De-normalization concepts and design methodologies such as Ralph Kimball's and Bill Inmon's Data Warehouse methodologies.
- Experience in Text Analytics, developing various Statistical Machine Learning and Data Mining solutions to business problems.
- Expertise in integration of various data sources like RDBMS, Spreadsheets, Text files, and XML files.
- Excellent experience in the development of Big Data projects using Hadoop, Hive, HDP, Pig, Flume, Storm, and MapReduce open-source tools/technologies.
- Good working knowledge of UNIX commands, such as changing file and group permissions.
- Strong background in mathematics with very good analytical and problem-solving skills.
TECHNICAL SKILLS:
Big Data Ecosystem: Map Reduce, Spark 2.3, HBase 1.2, Hive 2.3, Pig 0.17, Solr 7.2, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hadoop 3.0, Apache Nifi 1.6, Cassandra 3.11
Cloud Management: Amazon Web Services (AWS), Amazon RedShift
OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9
Programming Languages: SQL, PL/SQL, UNIX Shell Scripting
Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2
Testing & Defect Tracking Tools: HP Mercury Quality Center, WinRunner, MS Visio 2016 & Visual SourceSafe
Operating System: Windows 7/8/10, UNIX, Sun Solaris
ETL/Data warehouse Tools: Informatica v10, SAP Business Objects Business Intelligence 4.2 Service Pack 03, Talend, Tableau, and Pentaho
Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Agile, Waterfall Model
Data Modeling Tools: Erwin Data Modeler, Erwin Model Manager, ER Studio v17, and Power Designer 16.6
PROFESSIONAL EXPERIENCE:
Confidential - New York, NY
Sr. Hadoop / Big Data Engineer
Roles & Responsibilities:
- Participated in Sprint review/retrospective meetings and daily Scrum meetings, and gave the daily status report.
- Involved in all phases of the Software Development Life Cycle (SDLC) using methodologies such as Waterfall and Agile Scrum.
- Responsible for developing, troubleshooting and implementing programs.
- Designed and implemented scalable Cloud Data and Analytical architecture solutions for various public and private cloud platforms using Azure.
- Extensively used Pig for data cleansing with standalone and embedded Pig scripts.
- Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using the CDH4 distribution.
- Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs.
- Developed Oozie workflow jobs to execute hive, Sqoop and Map Reduce actions.
- Designed efficient and robust Hadoop solutions for performance improvement and end-user experiences.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Created external tables pointing to HBase to access tables with a huge number of columns.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Created partitioned tables in Hive, designed a data warehouse using Hive external tables, and wrote Hive queries for analysis.
- Extracted and loaded data into Data Lake environment (MS Azure) by using Sqoop which was accessed by business users.
- Involved in converting MapReduce programs into Spark transformations using the Spark Python API (see the sketch after this list).
- Worked with NoSQL databases like HBase in creating tables to load large sets of semi structured data coming from source systems.
- Developed numerous MapReduce jobs in Scala for Data Cleansing and Analyzing Data in Impala.
- Created data pipelines using processor groups and multiple processors in Apache NiFi for flat-file and RDBMS sources as part of a POC on Amazon EC2.
- Implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies using Hadoop, MapReduce, HBase, Hive and Cloud Architecture.
- Involved in importing and exporting data between RDBMS and HDFS using Sqoop.
- Managed data from various file systems into HDFS using UNIX command-line utilities.
- Implemented monitoring and established best practices around the usage of Elasticsearch.
- Worked on a POC to perform sentiment analysis of Twitter data using Spark Streaming.
- Migrated the required data from MySQL into HDFS using Sqoop and imported flat files in various formats into HDFS.
- Built Hadoop solutions for big data problems using MR1 and MR2 on YARN.
- Worked on Cassandra for retrieving data from Cassandra clusters to run queries.
- Worked on Apache Nifi as ETL tool for batch processing and real time processing.
- Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Extracted files from Cassandra through Sqoop and placed in HDFS for further processing.
- Created Hive tables on top of the loaded data and wrote Hive queries for ad-hoc analysis.
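The snippet below is a minimal, hypothetical PySpark sketch of the MapReduce-to-Spark conversion pattern referenced above; the HDFS path, schema, column names, and Hive table name are placeholders, and it assumes a Hive-enabled Spark session rather than reproducing the original production code.

```python
# Minimal PySpark sketch: a MapReduce-style filter + aggregate expressed as
# Spark transformations, with the result loaded into a partitioned Hive table.
# Paths, database, and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("mapreduce-to-spark-sketch")
    .enableHiveSupport()          # assumes a configured Hive metastore
    .getOrCreate()
)

# Read raw delimited data from HDFS (placeholder path and schema).
raw = spark.read.csv(
    "hdfs:///data/raw/events", sep="\t",
    schema="event_id STRING, event_type STRING, event_dt STRING",
)

# Equivalent of the map + reduce phases: drop bad rows, then aggregate.
cleaned = raw.filter(F.col("event_id").isNotNull())
counts = cleaned.groupBy("event_dt", "event_type").agg(F.count("*").alias("cnt"))

# Write into a Hive table partitioned by date for downstream ad-hoc queries.
(counts.write
    .mode("overwrite")
    .partitionBy("event_dt")
    .saveAsTable("analytics.event_counts"))
```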
Environment: Hadoop 3.0, HDFS, Apache Hive 2.3, MapReduce, Apache NiFi 1.6, Apache Pig 0.17, MongoDB, Sqoop 1.4, SQL, Oracle 12c, PL/SQL, Agile, Azure, YARN, Oozie 4.3, ETL
Confidential - Greensboro, NC
Sr. Big Data Engineer
Roles & Responsibilities:
- Provided technical expertise and guidance on Big Data technologies as they relate to the development of analytics.
- Participated in JAD meetings to gather the requirements and understand the End Users System.
- Participated in requirements sessions to gather requirements along with business analysts and product owners.
- Worked in an Agile development methodology as an active member in Scrum meetings.
- Involved in the design, development, and testing phases of the Software Development Life Cycle (SDLC).
- Installed and configured Hive, wrote Hive UDFs, and managed cluster coordination services through ZooKeeper.
- Architected, Designed and Developed Business applications and Data marts for reporting.
- Involved in different phases of the development life cycle, including Analysis, Design, Coding, Unit Testing, Integration Testing, Review, and Release, as per the business requirements.
- Developed Big Data solutions focused on pattern matching and predictive modeling.
- Installed and configured Hadoop Ecosystem components.
- Implemented Installation and configuration of multi-node cluster on Cloud using Amazon Web Services (AWS) on EC2.
- Wrote Hadoop jobs for analyzing data using Hive and Pig, accessing text files, sequence files, and Parquet files.
- Worked on the platform using Hive, Sqoop, and HBase; this effort showcased the benefits of Hortonworks.
- Prepared process flow/activity diagrams for the existing system using MS Visio and re-engineered the design based on business requirements.
- Utilized Hadoop, Hive and SQL technologies and moved data sets to production to be utilized by business teams to make business decisions.
- Designed and developed a real-time stream processing application using Kafka, Scala, and Hive to perform streaming ETL and apply Machine Learning (a simplified sketch follows this list).
- Worked closely with business analyst for requirement gathering and translating into technical documentation.
- Implemented a proof of concept deploying this product in Amazon Web Services AWS.
- Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
- Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
- Handled performance requirements for databases in OLTP and OLAP models.
- Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive.
- Involved in manipulating, cleansing, and processing data using Excel, Access, and SQL, and responsible for loading, extracting, and validating client data.
- Wrote SQL scripts to run ad-hoc queries, Stored Procedures & Triggers and prepare reports to the management.
- Designed components using UML Use Case, Class, Sequence, Deployment, and Component diagrams for the requirements.
- Used Excel sheets, flat files, and CSV files to generate Tableau ad-hoc reports.
- Involved in report development using reporting tools like Tableau.
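Below is a minimal sketch of the streaming-ETL pattern described above, written here with the Spark Python API rather than the Scala used in the original application; the broker address, topic, and column names are hypothetical, and the job assumes the spark-sql-kafka connector package is on the classpath.

```python
# Minimal PySpark Structured Streaming sketch of a Kafka streaming-ETL job.
# Hypothetical broker, topic, and schema; requires the spark-sql-kafka
# connector (e.g. via --packages) to be available at submit time.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StringType, DoubleType

spark = (SparkSession.builder
         .appName("kafka-streaming-etl-sketch")
         .getOrCreate())

schema = (StructType()
          .add("order_id", StringType())
          .add("amount", DoubleType()))

# Read the raw Kafka stream and parse the JSON payload.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "orders")
          .load()
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Simple streaming transformation: running totals per order.
totals = events.groupBy("order_id").agg(F.sum("amount").alias("total_amount"))

# For the sketch, write to the console; a production job would persist to Hive.
query = (totals.writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()
```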
Environment: PL/SQL, Agile, AWS, SQL, HDFS, SSAS, SSRS, Sqoop 1.2, Apache Pig 0.16, Kafka, Scala 1.4, MapReduce, OLAP, OLTP, HBase, Amazon Redshift, MS Visio
Confidential - McLean, VA
Data Analyst/Data Engineer
Roles & Responsibilities:
- Worked with the analysis teams and management teams and supported them based on their requirements.
- Involved in the extraction, transformation, and loading of data directly from different source systems (flat files/Excel/Oracle/SQL/Teradata) using SAS/SQL and SAS macros.
- Generated PL/SQL scripts for data manipulation, validation and materialized views for remote instances.
- Conducted JAD sessions with stakeholders and software development team to analyze the feasibility of needs.
- Extensive experience with Agile and Scrum programming methodologies.
- Implemented the Big Data solution using Hadoop, Hive, and Informatica to pull/load the data into HDFS.
- Performed Data Analysis and Data Manipulation of source data from SQL Server and other data structures to support the business organization.
- Performed data analysis and data profiling on various source systems including Oracle, SQL Server, and DB2.
- Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
- Involved in all phases of data mining, data collection, data cleaning, developing models, validation and visualization.
- Designed and developed Big Data analytic solutions on a Hadoop-based platform and engaged clients in technical discussions.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Designed both 3NF data models for ODS and OLTP systems and dimensional data models using Star and Snowflake schemas.
- Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
- Developed and implemented logical and physical data models using the enterprise modeling tool Erwin.
- Created data models for AWS Redshift, Hive, and HBase from dimensional data models.
- Created data validation rules using SQL to validate the structure and integrity of the extracted data (see the sketch after this list).
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
- Reverse-engineered data models from database instances and scripts.
- Worked on data mapping and data mediation between the source data table and target data tables using MS Access and MS Excel.
- Performed data extraction, data analysis, data manipulation and prepared various production and ad-hoc reports to support cost optimization initiatives and strategies.
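The following is a minimal sketch of SQL-based data validation rules of the kind mentioned above, expressed here through Spark SQL against Hive tables; the database, table, and column names are hypothetical placeholders and the actual rules were driven by the source systems involved.

```python
# Minimal PySpark sketch of SQL data validation rules: null keys, duplicate
# keys, and orphaned fact rows. All object names are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("data-validation-sketch")
         .enableHiveSupport()     # assumes access to the Hive metastore
         .getOrCreate())

checks = {
    # Primary key must be populated.
    "null_keys": """
        SELECT COUNT(*) AS failures
        FROM staging.customer WHERE customer_id IS NULL
    """,
    # Primary key must be unique.
    "duplicate_keys": """
        SELECT COUNT(*) AS failures FROM (
            SELECT customer_id FROM staging.customer
            GROUP BY customer_id HAVING COUNT(*) > 1
        ) d
    """,
    # Every fact row must reference an existing customer.
    "orphan_facts": """
        SELECT COUNT(*) AS failures
        FROM staging.orders o
        LEFT JOIN staging.customer c ON o.customer_id = c.customer_id
        WHERE c.customer_id IS NULL
    """,
}

for name, sql in checks.items():
    failures = spark.sql(sql).collect()[0]["failures"]
    status = "PASS" if failures == 0 else f"FAIL ({failures} rows)"
    print(f"{name}: {status}")
```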
Environment: HDFS, Sqoop, Apache Pig, AWS, Amazon Redshift, Apache Hive, MapReduce, ODS, OLTP, HBase, SQL Server
Confidential - Dallas, TX
Data Analyst/Data Modeler
Roles & Responsibilities:
- Supported business analysis and marketing campaign analytics with data mining, data processing, and investigation to answer complex business questions.
- Designed ER diagrams, logical models, and physical databases for Oracle and Teradata as per business requirements using Erwin.
- Designed a third normal form (3NF) target data model and mapped it to the logical model.
- Developed Data mapping Transformation and Cleansing rules for the Master Data Management involving OLTP and OLAP.
- Used MS Access, MS Excel, Pivot tables and charts, MS PowerPoint, MS Outlook, MS Communicator and User Base to perform responsibilities.
- Designed reports in Access and Excel using advanced functions including, but not limited to, pivot tables and formulas.
- Used SQL and PL/SQL to validate the data going into the data warehouse.
- Wrote complex SQL and PL/SQL testing scripts for back-end testing of the data warehouse application; expert in writing complex SQL/PL-SQL scripts for querying Teradata and Oracle (a simplified reconciliation sketch follows this list).
- Designed and developed the data dictionary and metadata of the models and maintained them.
- Involved in extensive data validation using SQL queries and back-end testing.
- Tested the database to check field size validation, check constraints, and stored procedures, and cross-verified the field sizes defined within the application against the metadata.
- Designed and developed cubes using SQL Server Analysis Services (SSAS) in Microsoft Visual Studio.
- Provided PL/SQL queries to developers as source queries to identify the data, along with the logic to assign it.
- Designed the data marts in dimensional data modeling using star and snowflake schemas.
- Analyzed and presented the gathered information in graphical format for the ease of business managers.
- Developed and maintained the data dictionaries, Naming Conventions, Standards, and Class words Standards Document.
- Performed GAP analysis to analyze the difference between the system capabilities and business requirements.
- Produced Source to target data mapping by developing the mapping spreadsheets.
- Created documentation and test cases, worked with users for new module enhancements and testing.
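Below is a minimal, runnable sketch of the back-end reconciliation checks described above; sqlite3 stands in for Teradata/Oracle purely so the example runs anywhere, and the table and column names are hypothetical rather than taken from the original warehouse.

```python
# Minimal sketch of warehouse back-end testing: row-count and control-total
# reconciliation between a staging (source) and warehouse (target) table.
# sqlite3 is used only so the sketch is self-contained and runnable.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Toy source (staging) and target (warehouse) tables with sample rows.
cur.executescript("""
    CREATE TABLE stg_sales (sale_id INTEGER, amount REAL);
    CREATE TABLE dw_sales  (sale_id INTEGER, amount REAL);
    INSERT INTO stg_sales VALUES (1, 100.0), (2, 250.0), (3, 75.5);
    INSERT INTO dw_sales  VALUES (1, 100.0), (2, 250.0), (3, 75.5);
""")

# Rule 1: row counts must match between source and target.
src_rows = cur.execute("SELECT COUNT(*) FROM stg_sales").fetchone()[0]
tgt_rows = cur.execute("SELECT COUNT(*) FROM dw_sales").fetchone()[0]
print("row count check:", "PASS" if src_rows == tgt_rows else "FAIL")

# Rule 2: control totals (sum of amounts) must reconcile.
src_amt = cur.execute("SELECT SUM(amount) FROM stg_sales").fetchone()[0]
tgt_amt = cur.execute("SELECT SUM(amount) FROM dw_sales").fetchone()[0]
print("amount total check:", "PASS" if src_amt == tgt_amt else "FAIL")

con.close()
```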
Environment: Erwin 9.5, Agile, Oracle 11g, ETL, SSIS, Teradata, SSAS, PL/SQL, OLTP, OLAP, DBA, AWS
Confidential
Data Analyst
Roles & Responsibilities:
- Worked with Data Analysts to understand Business logic and User Requirements.
- Closely worked with cross functional Data warehouse members to import data into SQL Server and connected to SQL Server to prepare spreadsheets.
- Created reports for the Data Analysis using SQL Server Reporting Services.
- Created VLOOKUP functions in MS Excel for searching data in large spreadsheets.
- Created SQL queries to simplify migration progress reports and analyses.
- Wrote SQL queries using joins, grouping, nested sub-queries, and aggregation depending on data needed from various relational customer databases.
- Developed Stored Procedures in SQL Server to consolidate common DML transactions such as insert, update and delete from the database.
- Developed reporting and various dashboards across all areas of the client's business to help analyze the data.
- Cleansed and manipulated data by subsetting, sorting, and pivoting on a need basis (see the sketch after this list).
- Used SQL Server and MS Excel on daily basis to manipulate the data for business intelligence reporting needs.
- Developed stored procedures, user-defined functions, and triggers as needed using T-SQL.
- Designed data reports in Excel, for easy sharing, and used SSRS for report deliverables to aid in statistical data analysis and decision making.
- Created reports from OLAP sources, including subreports, bar charts, and matrix reports, using SSRS.
- Developed ad-hoc reports using VLOOKUPs, pivot tables, and macros in Excel, and recommended solutions to drive business decision making.
- Used Excel and PowerPoint on various projects as needed for presentations and summarization of data to provide insight on key business decisions.
- Designed Ad-hoc reports using SQL and Tableau dashboards, facilitating data driven decisions for business users.
- Extracted data from different sources performing Data Integrity and quality checks.
- Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
- Involved in extensive data validation by writing several complex SQL queries and Involved in back-end testing and worked with data quality issues.
- Collected, analyzed, and interpreted complex data for reporting and/or performance trend analysis.
- Performed Data Manipulation using MS Excel Pivot Sheets and produced various charts for creating the mock reports.
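The following is a minimal pandas sketch of the subset/sort/pivot cleanup pattern described above (the original work was done with Excel and SQL Server rather than Python); the sample data and column names are hypothetical.

```python
# Minimal pandas sketch: cleanse, subset, sort, and pivot, roughly the same
# pattern as an Excel pivot-table workflow. Data and names are hypothetical.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "West"],
    "quarter": ["Q1", "Q2", "Q1", "Q2", "Q2"],
    "revenue": [120.0, 135.5, 98.0, None, 110.0],
})

# Cleanse: drop rows with missing revenue, then subset and sort.
clean = (sales.dropna(subset=["revenue"])
              .query("revenue > 0")
              .sort_values(["region", "quarter"]))

# Pivot: regions as rows, quarters as columns, summed revenue as values.
report = clean.pivot_table(index="region", columns="quarter",
                           values="revenue", aggfunc="sum", fill_value=0)
print(report)
```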
Environment: SQL Server, MS Excel, VLOOKUP, T-SQL, SSRS, SSIS, OLAP, PowerPoint