Hadoop Developer Resume
Buffalo, NY
SUMMARY:
- More than 9 years of combined experience in Data Analysis and Data Science, with proficiency in Data Extraction, Data Modeling, Statistical Modeling, Data Mining, Machine Learning, and Data Visualization.
- Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
- Proficient in Machine Learning algorithms and Predictive Modeling, including Linear Regression, Logistic Regression, Naive Bayes, Decision Tree, Random Forest, Gradient Boosting, SVM, KNN, and K-Means Clustering.
- Involved in the full data science project life cycle, including Data Acquisition, Data Cleansing, Data Manipulation, Feature Engineering, Modeling, Optimization, Testing, and Deployment.
- Experienced in Natural Language Processing (NLP), Text Mining, Topic Modeling, Sentiment Analysis, Association Rules Analysis, and Market Basket Analysis.
- Deep knowledge of Hadoop and Spark, with experience in Big Data tools such as PySpark, Pig, and Hive.
- Experience building machine learning solutions with PySpark for large data sets on Hadoop systems.
- Experience using Amazon Web Services (AWS) cloud services, including EC2, S3, AWS Lambda, and EMR.
- Extracted data from various database sources such as Oracle, SQL Server, and DB2, and regularly used JIRA as the internal issue tracker for project development.
- Experienced in creating ETL packages with SSIS to migrate data from flat files and Excel, clean data, back up data files, and synchronize daily transactions.
- Experience in Data Visualization, including producing tables, graphs, and listings with tools such as Tableau.
- Skilled in analyzing large e-commerce datasets (clickstream, order data, tracking data, competitive price changes, currency fluctuations) to optimize business goals.
- Quick learner in new business domains and software environments, delivering solutions adapted to new requirements and challenges.
TECHNICAL SKILLS:
Statistical Methods: Hypothesis Testing, Exploratory Data Analysis, Confidence Intervals, ANOVA, Principal Component Analysis (PCA), Correlation Analysis
Machine Learning: Linear/Logistic Regression, Naïve Bayes, Decision Tree, Support Vector Machine, K-Means Clustering, Adaptive Boosting, Gradient Boosting, Random Forests, Deep Learning
Hadoop Ecosystem: Hadoop, Spark, MapReduce, Hive, Pig, HDFS
Cloud Services: Amazon Web Services (AWS) EC2/S3/Redshift
Databases: MS SQL Server, Oracle, MongoDB, Teradata
Data Visualization: Tableau, Matplotlib, Seaborn, ggplot2
Languages: Python, R, T-SQL, XML, C#, Java, PL/SQL
PROFESSIONAL EXPERIENCE:
Confidential, Buffalo, NY
Hadoop Developer
Responsibilities:
- Imported data from various data sources into HDFS using Sqoop, applied transformations using Hive and Apache Spark, and loaded the results into Hive tables (a minimal PySpark sketch follows this list).
- Responsible for Cluster maintenance; monitoring, commissioning and decommissioning Data nodes; and troubleshooting, managing & reviewing data backups and log files.
- Developed shell scripts to periodically perform incremental imports of data from third-party APIs into AWS.
- Analyzed the data using HiveQL to identify correlations and used core Java to create Hive UDFs for the project.
- Worked on AWS to create and manage EC2 instances and Hadoop clusters; connected to target databases to retrieve data.
- Utilized AWS CloudWatch to monitor environment instances for operational and performance metrics during load testing.
- Built a POC for loading data from the Linux file system into AWS S3 and HDFS.
- Imported data from the local file system and RDBMS into HDFS using Sqoop, and developed Oozie workflows to automate the tasks of loading data into HDFS.
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
- Collected the logs from the physical machines and the OpenStack controller and integrated into HDFS using Flume.
- Exported analyzed data to an RDBMS using Sqoop so it could be used in Tableau to generate reports, and also used Sqoop to transfer data between the RDBMS and HDFS.
- Involved in creating a data lake by extracting customer data from various data sources into HDFS, including data from Excel, databases, and log data from servers.
- Created MapReduce jobs to perform ETL transformations on transactional and application-specific data sources.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System (HDFS) and Pig to pre-process the data.
- Responsible for the analysis, design, and testing phases and for documenting technical specifications.
- Coordinated effectively with offshore team and managed project deliverables on time.
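A minimal PySpark sketch of the transform-and-load step described above (data landed in HDFS, transformed with Spark, written to a Hive table); the HDFS path, column names, and the table name curated.transactions are hypothetical placeholders, not the project's actual schema.

```python
# Illustrative only: read raw data from HDFS, apply simple transformations,
# and load the result into a Hive table. Paths and table names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("hdfs-to-hive-transform")
    .enableHiveSupport()          # needed to write managed Hive tables
    .getOrCreate()
)

# Read raw records previously landed in HDFS (e.g., by Sqoop)
raw = spark.read.option("header", "true").csv("hdfs:///data/raw/transactions/")

# Basic cleansing/transformation: type casting, filtering, derived column
curated = (
    raw.withColumn("amount", F.col("amount").cast("double"))
       .filter(F.col("amount").isNotNull())
       .withColumn("load_date", F.current_date())
)

# Load into a Hive table for downstream HiveQL analysis
curated.write.mode("append").saveAsTable("curated.transactions")
```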
Confidential, Cincinnati, OH
Data Scientist
Responsibilities:
- Participated in all phases of data acquisition, data cleaning, developing models, validation, and visualization to deliver data science solutions.
- Used machine learning and data mining algorithms to understand patterns in large volumes of data, identify relationships, detect data anomalies, and classify data sets.
- Processed transaction data using Python pandas to identify outliers and inconsistencies, and conducted exploratory data analysis using NumPy.
- Designed and built predictive models using techniques such as linear and logistic regression, support vector machines, ensemble models (random forests and/or gradient boosted trees), neural networks, and clustering techniques.
- Collaborated with data engineers and the operations team to implement the ETL process.
- Optimized SQL queries to perform data extraction to fit the analytical requirements.
- Developed MapReduce modules for machine learning & predictive analytics in Hadoop on AWS.
- Extracted data from the database, copied it into HDFS, and used Hadoop tools such as Hive and Pig Latin to retrieve the data required for building models.
- Designed both 3NF data models for ODS and OLTP systems and dimensional data models using Star and Snowflake schemas.
- Developed data discovery tools for inferring data structures from raw data; these tools became an important part of the ETL pipeline, as they could automatically generate schemas for table creation.
- Created tools for discovering semantic equivalence between database tables, allowing intelligent fusion of related tables.
- Used grid search to evaluate each model and find its best hyperparameters (a minimal sketch follows this list).
- Designed and implemented a recommendation system that used collaborative filtering to recommend courses to different customers, and deployed it to an AWS EMR cluster (an illustrative PySpark ALS sketch also follows this list).
- Designed rich data visualizations to present data in a readable form with Tableau and Matplotlib.
- Used cross-validation to test the models with different batches of data to optimize the models and prevent overfitting.
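A minimal scikit-learn sketch of the grid search and cross-validation steps described above; the synthetic data and the specific parameter grid are placeholder assumptions rather than details from the actual models.

```python
# Illustrative only: grid search with cross-validation to select
# hyperparameters for a random forest classifier. Data and grid are placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Placeholder data standing in for the real feature matrix and labels
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [5, 10, None],
    "min_samples_leaf": [1, 5],
}

# 5-fold cross-validation guards against overfitting to a single split
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="roc_auc",
    n_jobs=-1,
)
search.fit(X_train, y_train)

print("Best params:", search.best_params_)
print("Held-out score:", search.score(X_test, y_test))
```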
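The collaborative-filtering recommender could look roughly like the following PySpark ALS sketch; the column names (user_id, course_id, rating) and the S3 input path are hypothetical stand-ins for the project's actual data.

```python
# Illustrative only: collaborative filtering with ALS in PySpark.
# Column names and the input path are hypothetical placeholders.
from pyspark.ml.recommendation import ALS
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("course-recommender").getOrCreate()

# Expected columns: user_id (int), course_id (int), rating (float)
ratings = spark.read.parquet("s3://example-bucket/ratings/")

als = ALS(
    userCol="user_id",
    itemCol="course_id",
    ratingCol="rating",
    rank=10,
    maxIter=10,
    regParam=0.1,
    coldStartStrategy="drop",   # avoid NaN predictions for unseen users/items
)
model = als.fit(ratings)

# Top 5 course recommendations per user
recommendations = model.recommendForAllUsers(5)
recommendations.show(truncate=False)
```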
Confidential, Minneapolis, MN
Data Analyst
Responsibilities:
- Analyzed business requirements, system requirements, data mapping requirement specifications, and responsible for documenting functional requirements and supplementary requirements in Quality Center.
- Conducted UAT for most iterations by writing Test Cases & signing off with approval.
- Developed Informatica mappings using various transformations and SQL packages to extract, transform, and load data.
- Developed SQL scripts to validate the data loaded into Data Warehouse and Data Mart tables by the Informatica ETL (a minimal validation sketch follows this list).
- Designed a STAR schema for the detailed data marts and plan data marts involving conformed dimensions.
- Created test cases and helped the ETL team perform smoke and regression testing.
- Created Source to Target mappings for Billing, Customer and Product related Dimensions and Facts.
- Created and maintained the Data Model repository as per company standards.
- Conducted design reviews with the business analysts and content developers to create a proof of concept for the reports.
- Performed Detailed Data Analysis (DDA), Data Quality Analysis (DQA) and Data Profiling on source data.
- Ensured the feasibility of the logical and physical design models.
- Collaborated with the Reporting Team to design Monthly Summary Level Cubes to Support the further aggregated level of detailed reports.
- Worked on snowflaking the dimensions to remove redundancy.
- Worked with the Implementation team to ensure a smooth transition from the design to the implementation phase.
- Involved in designing high level ETL architecture for overall data transfer from the OLTP to OLAP with the help of SSIS.
- Developed, managed and validated existing data models including logical and physical models of the data warehouse and source systems utilizing a 3NF model.
- Identified source systems, their connectivity, and related tables and fields, and ensured data suitability for mapping.
- Identified data quality controls, reviewed the Business Requirements Document, and attended project release meetings.
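A minimal Python sketch of the kind of load-validation check described above (the original work was SQL scripts run against the warehouse); the table names, key column, and the in-memory SQLite database standing in for the real DWH connection are hypothetical placeholders.

```python
# Illustrative only: reconcile row counts and check for duplicate business keys
# between a staging table and a data mart table. Table and column names are
# hypothetical; an in-memory SQLite database stands in for the real warehouse.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///:memory:")  # placeholder for the real DWH connection

# Stand-in data: in practice these tables are populated by the ETL load
pd.DataFrame({"billing_id": [1, 2, 3], "amount": [10.0, 20.0, 30.0]}).to_sql(
    "stg_billing", engine, index=False
)
pd.DataFrame({"billing_id": [1, 2, 3], "amount": [10.0, 20.0, 30.0]}).to_sql(
    "dm_billing_fact", engine, index=False
)

checks = {
    "staging_rows": "SELECT COUNT(*) AS n FROM stg_billing",
    "mart_rows": "SELECT COUNT(*) AS n FROM dm_billing_fact",
    "dup_keys": (
        "SELECT COUNT(*) AS n FROM ("
        " SELECT billing_id FROM dm_billing_fact"
        " GROUP BY billing_id HAVING COUNT(*) > 1)"
    ),
}

results = {name: pd.read_sql(sql, engine).iloc[0, 0] for name, sql in checks.items()}

assert results["staging_rows"] == results["mart_rows"], "Row count mismatch"
assert results["dup_keys"] == 0, "Duplicate business keys found in the mart"
print("Validation passed:", results)
```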
Confidential, St. Louis, MO
Business Data Analyst
Responsibilities:
- Developed Business Requirement Document (BRD), Functional Specification Document (FSD) as well as the High-level Project Plan.
- Gathered requirements for Service-Oriented Architecture (SOA) projects.
- Designed and developed Use Cases, Activity and data flow Diagrams, Sequence Diagrams, flow charts using UML.
- Developed an understanding of regulatory and compliance processes for the insurance business.
- Participated in the research, development of business opportunities and brainstorming sessions for ideas within the scope of the project. Contributed to the definition of scope, preparation of work plans and definition of business requirements.
- Designed and developed project document templates and managed SDLC using the RUP methodology. Performed requirement gathering for the System using Requisite Pro.
- Designed and developed all Use Cases, UML models, Activity & Sequence diagrams using Rational Suite of products.
- Created business requirement specifications for Commercial Auto, General Liability, Inland Marine, Professional Liability, and Commercial Property.
- Performed requirements analysis and AS-IS gap analysis.
- Helped create and maintain data mapping documents.
- Performed risk analysis and identified risks, rules, and policies for IT and business teams.
- Facilitated and managed meeting sessions aimed at JAD and project status updates with committee of SMEs from various business areas.
- Conducted interviews with business users to collect requirements & business process info.
- Involved in User Interface (UI) analysis with the business team to validate accuracy.
- Helped the QA team in testing.
- Provided customer support for any issues related to Asset Management tool.
- Functioned as the primary liaison between the business line, operations, and technical areas throughout the project cycle.
- Prepared project reports for management and assisted project managers in the development of weekly and monthly status reports, documented process flows, policies and procedures.
Confidential, Boise, ID
Data Analyst
Responsibilities:
- Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
- Identified business, functional, and technical requirements through meetings and JAD sessions.
- Used date, string, and database functions to manipulate and format source data as part of the data cleansing and data profiling process.
- Attended meetings to understand project requirements and reviewed BRD documents.
- Analyzed the source systems to understand the source data relationships along with deeper understanding of business rules and data integration checks.
- Defined the ETL mapping specification and designed the ETL process to source data from the source systems and load it into DWH tables.
- Designed the logical and physical schema for data marts and integrated the legacy system data into data marts.
- Integrated DataStage metadata with Informatica metadata and created ETL mappings and workflows.
- Designed mappings and resolved performance bottlenecks in source-to-target mappings.
- Developed Mappings using Source Qualifier, Expression, Filter, Look up, Update Strategy, Sorter, Joiner, Normalizer and Router transformations.
- Involved in writing, testing, and implementing triggers, stored procedures and functions at Database level using PL/SQL.
- Developed stored procedures to test the ETL load per batch and provided a performance-optimized solution to eliminate duplicate records (an illustrative de-duplication sketch follows this list).
- Involved with the team on ETL design and development best practices.
- Built and improved relationships with development teams through proactive communication and face-to-face meetings.
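As an illustration of the duplicate-elimination logic mentioned above (the original solution was implemented in PL/SQL stored procedures), here is a small pandas sketch of a keep-latest-row-per-key rule; the column names and sample data are hypothetical.

```python
# Illustrative only: remove duplicate records from a batch before loading,
# keeping the most recent row per business key. Column names are hypothetical.
import pandas as pd

batch = pd.DataFrame(
    {
        "customer_id": [101, 101, 102, 103, 103],
        "order_id":    [1, 1, 2, 3, 3],
        "amount":      [50.0, 50.0, 75.0, 20.0, 25.0],
        "updated_at":  pd.to_datetime(
            ["2024-01-01", "2024-01-02", "2024-01-01", "2024-01-01", "2024-01-03"]
        ),
    }
)

# Equivalent in spirit to ROW_NUMBER() OVER (PARTITION BY key ORDER BY updated_at DESC)
deduped = (
    batch.sort_values("updated_at", ascending=False)
         .drop_duplicates(subset=["customer_id", "order_id"], keep="first")
         .sort_index()
)

print(f"Removed {len(batch) - len(deduped)} duplicate rows")
print(deduped)
```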