
Data Scientist Resume

Charlotte, NC

PROFESSIONAL SUMMARY:

  • Over 8 years of experience in Data Mining with large datasets of Structured (Data Warehouse applications using Informatica, Oracle and Teradata) and Unstructured data, Data Acquisition, Data Validation, Predictive Modeling, Data Visualization, and Business Intelligence.
  • Strong experience in Business, Data Profiling, Data Migration, Data Conversion, Data Quality, Data Integration, Metadata Management Services and Configuration Management.
  • Developed Key Performance Indicator (KPI) dashboards and different types of reports such as Parameterized reports, Drill-Down reports, Drill-Through reports, and Sub-Reports.
  • Analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models by leveraging best-in-class modeling techniques.
  • Hadoop/Big Data technology experience in storage, querying, processing, and analysis of data.
  • Proficient in the design and development of MapReduce programs using Apache Hadoop for analyzing big data as per requirements.
  • Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS.
  • Extensively worked on Python 3.5/2.7 (NumPy, Pandas, Matplotlib, TensorFlow, NLTK and Scikit-learn).
  • Performed configuration, deployment and support of cloud services including Amazon Web Services (AWS).
  • Extensive experience in Text Analytics, developing different Statistical Machine Learning and Data Mining solutions to various business problems and generating data visualizations using R, Python, and Tableau.
  • In-depth knowledge of Hadoop ecosystem components like Pig, Hive, Sqoop, Flume, Oozie.
  • Strong ability to analyze sets of data for signals, patterns, ways to group data to answer questions and solve complex data puzzles.
  • Adept in statistical programming languages like R and Python, including Big Data technologies like Hadoop and Hive.
  • Experience in web search and data collection, web data mining, extracting data from websites, data entry, and data processing.
  • Built servers using AWS, which included importing necessary volumes, launching EC2 instances, creating security groups, auto-scaling, load balancers, Route 53 and SNS as per the architecture.
  • Developed CloudFormation template stacks to automate building new VPCs using JSON files.
  • Good experience with AWS Elastic Block Storage (EBS), different volume types and the use of various types of EBS volumes based on requirements.
  • Extensively worked with major statistical analysis tools such as R, SQL, SAS, and MATLAB.
  • Experience in using HCatalog for Hive, Pig, and HBase. Exposure to NoSQL databases HBase and Cassandra.
  • Strong in performance improvement with very large datasets in SAS, using SAS/SQL for extract, transform and load (ETL) methodology and processes.
  • Have excellent knowledge of Python Collections and multi-threading.
  • Experience in Normalization and De-Normalization techniques for both OLTP and OLAP systems in creating Database Objects like tables, Constraints (Primary key, Foreign Key, Unique, Default), Indexes.
  • Independent, Self-starter, enthusiastic team player with strong adaptability to new technologies
  • Expertise in creating complex SSRS reports against OLTP and OLAP databases.
  • Developed Data Warehouse and Data Mart systems using various RDBMS (Oracle, MS SQL Server, Mainframes, Teradata and DB2).
  • Good experience in Hive partitioning and bucketing, performing different types of joins on Hive tables and implementing Hive SerDe with JSON and Avro (a minimal sketch follows this list).
  • Experience with advanced SAS programming techniques, such as PROC SQL (JOIN/UNION), PROC APPEND, PROC DATASETS, and PROC TRANSPOSE.
  • Developing Logical Data Architecture with adherence to Enterprise Architecture.
  • Experience in bootstrapping and maintaining AWS nodes using Chef on complex hybrid IT infrastructure through VPN and Jump Servers.
  • Good experience in Networking in AWS: VPC, Datacenter-to-Cloud connectivity, Security Groups, Route Tables and ACLs.
  • Worked on Performance Tuning of Hadoop jobs by applying techniques such as Map Side Joins, Partitioning, and Bucketing.
  • Excellent communication and interpersonal skills with the ability to develop creative solutions for challenging business needs.
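
The following is a minimal, illustrative PySpark sketch of the Hive partitioning and bucketing approach mentioned above; the table, column names, and bucket count (events, event_date, user_id, 8 buckets) are hypothetical, not taken from any specific project.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hive-partitioning-sketch")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Read raw JSON events (a Hive table would use a JSON SerDe for the same data).
    events = spark.read.json("s3://example-bucket/raw/events/")

    # Write a Hive table partitioned by date and bucketed by user_id so that
    # date filters prune partitions and user_id joins avoid full shuffles.
    (
        events.write
        .partitionBy("event_date")
        .bucketBy(8, "user_id")
        .sortBy("user_id")
        .format("parquet")
        .mode("overwrite")
        .saveAsTable("analytics.events_bucketed")
    )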

TECHNICAL SKILLS:

Data Analytics Tools: Python (numpy, scipy, pandas, Gensim, Keras), R (Caret, Weka, ggplot), MATLAB.

Analysis & Modelling Tools: Erwin, Sybase Power Designer, Oracle Designer, Rational Rose, ER/Studio, TOAD, MS Visio, SAS, Django, Flask, pip, NPM, Node JS, Spring MVC.

Machine Learning Frameworks: Spark ML, Kafka, Spark MLlib, Scikit-Learn & NLTK.

Big Data Tools: Hadoop, MapReduce, Sqoop, Pig, Hive, NoSQL, Spark, Apache Kafka, Shiny, YARN, Data Frames, pandas, ggplot2, Sklearn, Theano, CUDA, Azure, HDInsight, etc.

ETL Tools: Informatica Power Centre, Data Stage 7.5, Ab Initio, Talend.

Programming Languages: SQL, PL/SQL, T-SQL, XML, HTML, UNIX Shell Scripting, Microsoft SQL Server, Oracle, Python, Scala, C, C++, AWK, JavaScript.

R Packages: dplyr, sqldf, data.table, Random Forest, gbm, caret, elastic net and other Machine Learning packages.

Databases: SQL Server, Linked Servers, DTS Packages, SSIS, PL/SQL, MS SQL Server, Hyperion Essbase, Teradata, DB2 UDB, Netezza, Sybase ASE, Informix, AWS RDS, Cassandra, MongoDB, PostgreSQL.

Project Execution Methodologies: Ralph Kimball and Bill Inmon data warehousing methodology, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD).

Tools & Software: SAS/STAT, SAS/ETS, SAS E-Miner, SPSS, R, Advance R, TOAD, MS Office, BTEQ, Teradata SQL Assistant.

Methodologies: Ralph Kimball, COBOL.

Version Control: Git, SVN.

Reporting Tools: Business Objects XI R2/6.5/5.0/5.1, Cognos Impromptu 7.0/6.0/5.0, Informatica Analytics Delivery Platform, MicroStrategy, SSRS, Tableau.

Operating Systems: Windows 2007/8, UNIX (Sun-Solaris, HP-UX), Windows NT/XP/Vista, MSDOS.

PROFESSIONAL EXPERIENCE:

Confidential, Charlotte, NC

Data Scientist

Responsibilities:

  • Analyzed and prepared data, identifying patterns in the dataset by applying historical models, and collaborated with Senior Data Scientists to understand the data.
  • Performed data manipulation, data preparation, normalization, and predictive modeling. Improved efficiency and accuracy by evaluating models in R.
  • This project focused on customer segmentation through machine learning and statistical modeling, including building predictive models and generating data products to support customer segmentation.
  • Used R and Python programming to improve the models and upgraded the full set of models to improve the product.
  • Built a price elasticity model for various bundled product and service offerings.
  • Developed a predictive causal model using annual failure rate and standard cost basis for the new bundled service offering.
  • Designed and developed analytics, machine learning models, and visualizations that drive performance and provide insights, from prototyping to production deployment, product recommendation, and allocation planning.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, R, and a broad variety of machine learning methods including classification, regression, and dimensionality reduction.
  • Partnered and collaborated with the Sales and Marketing teams and a cross-functional team to frame and answer important data questions, prototyping and experimenting with ML/DL algorithms and integrating them into production systems for different business needs.
  • Worked on multiple datasets containing two billion values of structured and unstructured data about web application usage and online customer surveys.
  • Good hands-on experience with the Amazon Redshift platform.
  • Performed data cleaning, applying backward- and forward-filling methods on the dataset to handle missing values.
  • Designed, built, and deployed a set of Python modeling APIs for customer analytics, which integrate multiple machine learning techniques for various user behavior predictions and support multiple marketing segmentation programs.
  • Segmented the customers based on demographics using K-means clustering (a minimal sketch follows this list).
  • Explored different regression and ensemble models in machine learning to perform forecasting.
  • Presented dashboards to higher management for more insights using Power BI.
  • Used classification techniques including Random Forest and Logistic Regression to quantify the likelihood of each user referring.
  • Worked with different kinds of compression techniques (LZO, Snappy, GZip, etc.) to save data and optimize data transfer over the network.
  • Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDFs, Pig, Sqoop, Zookeeper, and Spark.
  • Developed custom aggregate functions using Spark SQL and performed interactive querying.
  • Connected Tableau from the client end to AWS IP addresses and viewed the end results.
  • Applied boosting methods to the predictive model to improve its efficiency.
  • Designed and implemented end-to-end systems for Data Analytics and Automation, integrating custom visualization tools using R, Tableau, and Power BI.
  • Collaborated with project managers and business owners to understand their organizational processes and help design the necessary reports.
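
The following is a minimal, illustrative sketch of the missing-value handling and K-means segmentation described above; the file name, feature columns, and cluster count are hypothetical stand-ins, not project data.

    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    # Hypothetical customer demographics file and feature columns.
    customers = pd.read_csv("customers.csv")
    features = ["age", "income", "tenure_months"]

    # Handle missing values: forward fill, then backward fill for any
    # gaps remaining at the start of each column.
    customers[features] = customers[features].ffill().bfill()

    # Scale features so no single demographic dominates the distance metric.
    scaled = StandardScaler().fit_transform(customers[features])

    # Segment customers into a fixed number of groups with K-means.
    kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
    customers["segment"] = kmeans.fit_predict(scaled)

    # Profile each segment by its average demographics.
    print(customers.groupby("segment")[features].mean())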

Environment: MS SQL Server, R/R Studio, SQL Enterprise Manager, Python, Redshift, MS Excel, Power BI, Tableau, T-SQL, ETL, MS Access, XML, MS Office, Outlook, SAS E-Miner, K-means.

Confidential, Burlington, MA

Data Scientist

Responsibilities:

  • Used various approaches to collect business requirements and worked with business users on ETL application enhancements, conducting various JRD sessions to meet the job requirements.
  • Performed exploratory data analysis, including calculation of descriptive statistics, detection of outliers, assumptions testing, and factor analysis, in R.
  • Worked exclusively on making applications more scalable and highly available in AWS (load balancing) with full automation.
  • Extracted data from the database using SAS/ACCESS and SAS/SQL procedures and created SAS datasets for statistical analysis, validation, and documentation.
  • Extensive understanding of BI and analytics, focusing on the consumer and customer space.
  • Innovated and leveraged machine learning, data mining and statistical techniques to create new, scalable solutions for business problems.
  • Extensive experience working with Tableau Desktop, Tableau Server, and Tableau Reader in Tableau versions 9.2 and 10 as a Developer and Analyst.
  • Analyzed different types of data to derive insights about relationships between locations and statistical measurements and qualitatively assessed the data using R/R Studio.
  • Performed Data Profiling to assess data quality using SQL across a complex internal database.
  • Improved sales and logistics data quality through data cleaning using NumPy, SciPy, and Pandas in Python.
  • Designed data profiles for processing, including running SQL and PL/SQL queries and using R for Data Acquisition and Data Integrity, consisting of dataset comparisons and dataset schema checks.
  • Used R to generate regression models to provide statistical forecasting
  • Used drill-downs, filter actions and highlight actions to develop dashboards in Tableau.
  • Applied clustering algorithms such as K-means to categorize customers into groups.
  • Performed data management, including creating SQL Server Reporting Services reports to develop reusable code and an automated reporting system, and designed user acceptance tests to give end users an opportunity to provide constructive feedback.
  • Used Tableau to design various charts and tables for data analysis, creating analytical dashboards to showcase the data to managers.
  • Created AMI images of critical EC2 instances as backups using the AWS CLI and GUI (a minimal sketch follows this list).
  • Created AWS CloudFormation templates for creating IAM Roles and end-to-end architecture deployment (creation of EC2 instances and their infrastructure).
  • Isolated customer behavioral patterns by analyzing millions of customer data records over a period and correlating multiple customer attributes.
  • Empowered decision makers with data analysis dashboards using Tableau and Power BI.
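
The following is a minimal, illustrative boto3 sketch of the EC2 AMI backup workflow referenced above (the original work used the AWS CLI and console); the region, tag key, and tag value are hypothetical.

    from datetime import datetime

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Find instances carrying a (hypothetical) backup tag.
    reservations = ec2.describe_instances(
        Filters=[{"Name": "tag:Backup", "Values": ["critical"]}]
    )["Reservations"]

    for reservation in reservations:
        for instance in reservation["Instances"]:
            instance_id = instance["InstanceId"]
            image_name = "{}-backup-{:%Y%m%d%H%M}".format(instance_id, datetime.utcnow())
            # Create the AMI without rebooting the running instance.
            response = ec2.create_image(
                InstanceId=instance_id,
                Name=image_name,
                Description="Automated backup image",
                NoReboot=True,
            )
            print("Created {} for {}".format(response["ImageId"], instance_id))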

Environment: R/R Studio, SAS, SSRS, SSIS, Oracle Database 11g, Oracle BI tools, Tableau, MS Excel, Python, Naive Bayes, SVM, K-means, ANN, Regression, MS Access, SQL Server Management Studio, SAS E-Miner.

Confidential, Charlotte, North Carolina

Data Scientist

Responsibilities:

  • Provided the architectural leadership in shaping strategic, business technology projects, with an emphasis on application architecture.
  • Utilized domain knowledge and application portfolio knowledge to play a key role in defining the future state of large, business technology programs.
  • Participated in all phases of data mining: data collection, data cleaning, developing models, validation, and visualization, and performed gap analysis.
  • Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
  • Created ecosystem models (e.g. conceptual, logical, physical, canonical) required for supporting services within the enterprise data architecture (a conceptual data model for defining the major subject areas used, an ecosystem logical model for defining standard business meaning for entities and fields, and an ecosystem canonical model for defining the standard messages and formats to be used in data integration services throughout the ecosystem).
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python for developing various machine learning algorithms, and utilized algorithms such as linear regression, multivariate regression, Naive Bayes, Random Forests, K-means, and KNN for data analysis (a minimal sketch follows this list).
  • Demonstrated experience in the design and implementation of statistical models, predictive models, enterprise data models, metadata solutions and data life cycle management in both RDBMS and Big Data environments.
  • Analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models by leveraging best-in-class modeling techniques.
  • Worked on database design, relational integrity constraints, OLAP, OLTP, Cubes and Normalization (3NF) and De-normalization of database.
  • Worked on customer segmentation using an unsupervised learning technique, clustering.
  • Worked with various Teradata 15 tools and utilities like Teradata Viewpoint, MultiLoad, ARC, Teradata Administrator, BTEQ and other Teradata utilities.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, Python, and a broad variety of machine learning methods including classification, regression, and dimensionality reduction.
  • Developed Linux shell scripts using NZSQL/NZLOAD utilities to load data from flat files to the Netezza database.
  • Designed and implemented system architecture for an Amazon EC2 based cloud-hosted solution for the client.
  • Tested complex ETL mappings and sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
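
The following is a minimal, illustrative scikit-learn sketch comparing several of the classifiers named above (Naive Bayes, Random Forest, KNN) with cross-validation; the dataset is a stand-in bundled with scikit-learn, not the project data.

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Stand-in dataset for illustration only.
    X, y = load_breast_cancer(return_X_y=True)

    models = {
        "naive_bayes": GaussianNB(),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
        "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    }

    # 5-fold cross-validated accuracy for each candidate model.
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
        print("{}: {:.3f} +/- {:.3f}".format(name, scores.mean(), scores.std()))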

Environment: Erwin r9.6, Python, SQL, Oracle 12c, Netezza, SQL Server, Informatica, Java, SSRS, PL/SQL, T-SQL, Tableau, MLlib, regression, cluster analysis, Scala, NLP, Spark, Kafka, MongoDB, logistic regression, Hadoop, Hive, Teradata, random forest, OLAP, Azure, MariaDB, SAP CRM, HDFS, ODS, NLTK, SVM, JSON, XML, Cassandra, MapReduce, AWS.

Confidential, San Francisco

Data Analyst

Responsibilities:

  • Monitored the database for duplicate records; merged duplicate records and ensured that the information was associated with the correct company records (a minimal sketch follows this list).
  • Standardized company names and addresses and ensured that necessary data fields were populated.
  • Reviewed the database proactively to identify inconsistencies in the data and conducted research using internal and external sources to determine whether information was accurate.
  • Resolved data issues by following up with the end user.
  • Coordinated activities and workflow with other Data Stewards in the firm to ensure data changes were done effectively and efficiently.
  • Extracted data from the database and provided data analysis using SQL to the business user based on the requirements. Created pivots and charts in Excel sheets to report data in the requested format.
  • Used VBA for Excel to automate the data entry forms and help standardize data.
  • Assisted the CRM Analyst with email marketing campaigns, including client publications, newsletters, and announcements.
  • Assisted other Data Stewards with the Data Change Management (DCM) inbox in resolving various tickets created by User Change Requests in the Interaction Database.
  • Developed and created Logical and Physical Database Architecture using Erwin.
  • Designed Star Schemas for the detailed data marts and planned data marts involving shared dimensions.
  • Coordinated with different users in the UAT process.
  • Conducted design reviews with the business analysts and content developers to create a proof of concept for the reports.
  • Ensured the feasibility of the logical and physical design models.
  • Conducted the required gap analysis between the AS-IS submission process and the TO-BE Encounter Data Submission Process.
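
The following is a minimal, illustrative Python sketch of the duplicate-merging and name standardization described above (the work itself used SQL, Excel, and VBA); the file and column names are hypothetical.

    import pandas as pd

    # Hypothetical export of company account records.
    accounts = pd.read_csv("company_accounts.csv")

    # Standardize company names and addresses before comparing records.
    accounts["company_name"] = (
        accounts["company_name"].str.strip().str.upper()
        .str.replace(r"\s+", " ", regex=True)
    )
    accounts["address"] = accounts["address"].str.strip().str.title()

    # Flag records that share a standardized name and address, keep the
    # first occurrence, and report the rest for manual review and merging.
    dupes = accounts[accounts.duplicated(subset=["company_name", "address"], keep="first")]
    deduped = accounts.drop_duplicates(subset=["company_name", "address"], keep="first")

    print("{} duplicate records flagged for review".format(len(dupes)))
    deduped.to_csv("company_accounts_deduped.csv", index=False)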

Environment: MS Outlook, MS Project, MS Word, MS Excel, MS Visio, MS Access, Power MHS, Citrix, Clarity, MS SharePoint.

Confidential

SQL Developer

Responsibilities:

  • Monitored the database for duplicate records; merged duplicate records and ensured that the information was associated with the correct company records.
  • Standardized company names and addresses and ensured that necessary data fields were populated.
  • Reviewed the database proactively to identify inconsistencies in the data and conducted research using internal and external sources to determine whether information was accurate.
  • Resolved data issues by following up with the end user.
  • Coordinated activities and workflow with other Data Stewards in the firm to ensure data changes were done effectively and efficiently.
  • Extracted data from the database and provided data analysis using SQL to the business user based on the requirements. Created pivots and charts in Excel sheets to report data in the requested format.
  • Used VBA for Excel to automate the data entry forms and help standardize data.
  • Assisted the CRM Analyst with email marketing campaigns, including client publications, newsletters, and announcements.
  • Assisted other Data Stewards with the Data Change Management (DCM) inbox in resolving various tickets created by User Change Requests in the Interaction Database.
  • Developed and created Logical and Physical Database Architecture using Erwin.
  • Designed Star Schemas for the detailed data marts and planned data marts involving shared dimensions.
  • Coordinated with different users in the UAT process.
  • Conducted design reviews with the business analysts and content developers to create a proof of concept for the reports.
  • Ensured the feasibility of the logical and physical design models.
  • Conducted the required gap analysis between the AS-IS submission process and the TO-BE Encounter Data Submission Process.

Environment: MS Outlook, MS Project, MS Word, MS Excel, MS Visio, MS Access, Power MHS, Citrix, Clarity, MS SharePoint.

Confidential

Data Architect

Responsibilities:

  • Configured the project on WebSphere 6.1 application servers.
  • Deployed GUI pages using JSP, JSTL, HTML, DHTML, XHTML, CSS, JavaScript, and AJAX.
  • Communicated with other health care information systems using Web Services with the help of SOAP, WSDL, and JAX-RPC.
  • Implemented the online application using Core Java, JDBC, JSP, Servlets, EJB 1.1, Web Services, SOAP, and WSDL.
  • Used SAX and DOM parsers to parse the raw XML documents.
  • Used Singleton, Factory, and DAO design patterns based on the application requirements (a minimal sketch follows this list).
  • Prepared and executed unit test cases.
  • Used RAD as Development IDE for web applications.
  • Involved in fixing bugs and minor enhancements for the front-end modules.
  • Used Log4J logging framework to write Log messages with various levels.
  • Performed functional and technical reviews.
  • Used Microsoft Visio and Rational Rose for designing the Use Case diagrams, Class model, Sequence diagrams, and Activity diagrams for the SDLC process of the application.
  • Ensured quality in the deliverables.
  • Performed maintenance in the testing team for System testing, Integration testing, and UAT.
  • Conducted Design reviews and Technical reviews with other project stakeholders.
  • Implemented the project in Linux environment.
  • Created test plan documents for all back-end database modules.
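
The following is a minimal sketch, in Python rather than the Java actually used on this project, of the Singleton, Factory, and DAO patterns noted above; the class and method names are hypothetical.

    class ConnectionPool:
        """Singleton: one shared pool instance for the whole application."""

        _instance = None

        def __new__(cls):
            if cls._instance is None:
                cls._instance = super().__new__(cls)
            return cls._instance

        def get_connection(self):
            return "connection"  # stand-in for a real database connection


    class PatientDao:
        """DAO: hides persistence details behind simple lookup methods."""

        def __init__(self, pool):
            self.pool = pool

        def find_by_id(self, patient_id):
            conn = self.pool.get_connection()
            # A real implementation would run a parameterized query here.
            return {"id": patient_id, "source": conn}


    def dao_factory(kind):
        """Factory: central place that decides which DAO to hand back."""
        if kind == "patient":
            return PatientDao(ConnectionPool())
        raise ValueError("Unknown DAO type: {}".format(kind))


    dao = dao_factory("patient")
    print(dao.find_by_id(42))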

Environment: R 3.0, Erwin 9.5, Tableau 8.0, MDM, QlikView, MLlib, PL/SQL, Teradata 14.1, JSON, Hadoop (HDFS), MapReduce, Pig, Spark, R Studio, Mahout, Java, Hive, AWS.
