
Data Scientist Resume

Chicago, IL

SUMMARY

  • 6+ years of relevant work experience as a Data Scientist, including deep expertise in Statistical Analysis, Data Mining and Machine Learning using R, Python and SAS.
  • Data-driven and highly analytical, with working knowledge of statistical modeling approaches and methodologies (Clustering, Regression Analysis, Hypothesis Testing, Decision Trees, Machine Learning), business rules and an ever-evolving regulatory environment.
  • Professional working experience in Machine Learning algorithms such as Linear Regression, Logistic Regression, Naive Bayes, Decision Trees, K-Means Clustering and Association Rules.
  • Experience analyzing online user behavior, conversion data (A/B testing), customer journeys and funnel analysis.
  • Strong data analysis skills using business intelligence, SQL and/or MS Office tools.
  • Experience working in Agile/Scrum Methodologies to accelerate Software Development iteration.
  • Experience in applying predictive modeling and machine learning algorithms for analytical reports.
  • Profound analytical and problem-solving skills, along with the ability to understand current business processes and implement efficient solutions to issues/problems.
  • Experience using technology to work efficiently with datasets: scripting, data-cleansing tools and statistical software packages.
  • Strong understanding of how analytics supports a large organization, including the ability to articulate the linkage between business objectives, analytical approaches, findings and business decisions.
  • Excellent analytical skills with demonstrated ability to solve problems.
  • Mastery of R programming and data processing, with working knowledge of SPSS.
  • Hands on experience in writing queries in SQL and R to Extract, Transform and Load (ETL) data from large datasets.
  • Experience with traditional analytics tools (Excel, Tableau and Qlikview).
  • Working experience in statistical analysis using R, SPSS, MATLAB and Excel.
  • Ability to work with large transactional databases across multiple platforms (Teradata, Oracle, HDFS, SAS).
  • High Proficiency in Excel including complex data analysis and manipulation.
  • A results-driven individual with a passion for data/analytics who can work collaboratively with others to solve business problems that drive business growth.
  • Demonstrated leadership and self-direction. Demonstrated willingness to both teach others and learn new techniques.
  • Extensive experience in query languages: SQL, PL/SQL, T-SQL, SAS and HiveQL.
  • Migrated the data platform to AWS (EMR, Redshift, S3, Kinesis) to ingest streaming data and run real-time analytics using Pig, Hive and Spark.
  • Hands-on experience configuring and using Big Data ecosystem components like HDFS, MapReduce, Oozie, Hive, Sqoop, Pig, Flume, HBase and ZooKeeper.
  • Proficient in Data Analysis with sound knowledge in extraction of data from various database sources like MySQL, MSSQL, Oracle, Teradata and other database systems.
  • Experience developing predictive models such as Decision Trees, Interactive Decision Trees, Gradient Boosting, Regression and Neural Networks using SAS Enterprise Miner.
  • Extensive knowledge of advanced SAS/STAT procedures including PROC REPORT, PROC TABULATE, PROC CORR, PROC GLM, PROC ANOVA, PROC LOGISTIC, PROC TTEST, PROC GPLOT, PROC REG, PROC FREQ, PROC MEANS and PROC UNIVARIATE.
  • Automated the process of data extraction, transformation and processing using shell scripts.
  • Experience in developing programs by using SQL, SAS & shell scripts and scheduling the processes to run on a regular basis.
  • Created various visualizations in Tableau using bar charts, line charts, pie charts, maps, scatter plots, heat maps and table reports.
  • Extensive experience with various reporting objects like Facts, Attributes, Hierarchies, Transformations, Filters, Prompts, Calculated Fields, Sets, Groups and Parameters in Tableau.
  • Experience in creating Ad-hoc reports, subscription reports by using Report Builder in SSRS.
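The A/B-testing experience above typically comes down to comparing conversion rates between two variants, for which a two-proportion z-test is the standard tool. A minimal pure-Python sketch (the function name and sample counts are hypothetical, for illustration only):

```python
import math

def ab_test_z(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test for conversion rates; returns (z, two-sided p)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled conversion rate
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value via the normal CDF, expressed with math.erf
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# hypothetical experiment: variant B converts at 13% vs A's 10%
z, p = ab_test_z(200, 2000, 260, 2000)
```

With these counts the lift is significant at the usual 5% level (z ≈ 2.97).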

TECHNICAL SKILLS

Programming: C, C++, Java, R, Python, Mainframes, SAS, Hive, UNIX, JavaScript, HTML, XML

SAS Tools/Procedures: SAS/EG, SAS Enterprise Miner, SAS/BASE, SAS/MACROS, SAS/STAT, SAS/GRAPH, SAS/SQL, SAS/ACCESS, SAS/ODS; PROC PRINT, CONTENTS, MEANS, CHART, PLOT, TABULATE, SUMMARY, SORT, SQL, FORMAT, FREQ, UNIVARIATE, TTEST, IMPORT, EXPORT and DATASETS

Statistical Software: SPSS, R, SAS

Techniques: Machine learning, Regression, Clustering, Data mining.

Predictive Analytics: Decision Trees, Interactive Decision Trees, Regression, Gradient Boosting, Neural Networks

Machine Learning: Naive Bayes, Decision Trees, Regression Models, Random Forests, Time Series, K-Means

Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, SOLR, Sqoop, Flume, Puppet, Oozie, ZooKeeper

NOSQL: HBase, Cassandra, MongoDB, Accumulo

Databases: Microsoft SQL server 2005/2008, Oracle 10g, MySQL, DB2 and Teradata

Operating System: Unix, Mac OS and Windows (95, Vista, XP, 7, 8, 8.1, 10)

BI tools: Qlikview, Tableau, MSBI (SSIS, SSAS, SSRS)

PROFESSIONAL EXPERIENCE

Confidential, Chicago, IL

Data Scientist

Responsibilities:

  • Worked with the Intent Owners to understand the intent and built the Business Intent template with all the required segmentation criteria.
  • Work independently and collaboratively throughout the complete analytics project lifecycle including data extraction/preparation, design and implementation of scalable machine learning analysis and solutions, and documentation of results.
  • Performed statistical analysis to determine peak and off-peak time periods for ratemaking purposes
  • Conducted analysis of customer data for the purposes of designing rates.
  • Identified root causes of problems and facilitated the implementation of cost effective solutions with all levels of management.
  • Applied various machine learning algorithms and statistical modeling techniques such as decision trees, regression models, clustering and SVM to identify volume, using the scikit-learn package in Python.
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
  • Hands on experience in implementing Naive Bayes and skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Principal Component Analysis.
  • Performed K-means clustering, Regression and Decision Trees in R.
  • Partner with technical and non-technical resources across the business to leverage their support and integrate our efforts.
  • Partner with infrastructure and platform teams to configure, tune tools, automate tasks and guide the evolution of internal big data ecosystem; serve as a bridge between data scientists and infrastructure/platform teams.
  • Worked on text analytics with Naive Bayes, creating word clouds and retrieving data from social networking platforms.
  • Pro-actively analyze data to uncover insights that increase business value and impact.
  • Supported various business partners on a wide range of analytics projects, from ad-hoc requests to large-scale cross-functional engagements.
  • Prepared data visualization reports for management using R.
  • Approach analytical problems with an appropriate blend of statistical/mathematical rigor with practical business intuition.
  • Hold a point-of-view on the strengths and limitations of statistical models and analyses in various business contexts and is able to evaluate and effectively communicate the uncertainty in the results.
  • Approach analysis in multiple ways in order to evaluate approaches and compare results.
  • Developed statistical reports with bar charts, box plots and line plots using PROC SGPLOT, PROC GCHART and PROC GBARLINE.
  • Used SAS Procedures like PROC FREQ, PROC SUMMARY, PROC MEANS, PROC SQL, PROC SORT, PROC PRINT, PROC Tabulate, PROC UNIVARIATE, PROC PLOT and PROC REPORT to generate various regulatory and ad-hoc reports.
  • Created reports in the style format (RTF, PDF and HTML) using SAS/ODS.
  • Currently working on XML parsing using Pig, Hive, HDP and Redshift.
  • Built complex formulas in Tableau for various business calculations.
  • Created bar charts compiled from data sets, adding trend lines and forecasts for future trends in financial performance.
  • Compiled interactive dashboards in Tableau Desktop and published them to Tableau Server with Quick Filters, providing on-demand information at the click of a button.
  • Possess good technical skills in SAS programming language for different SAS solutions that are utilized to analyze and generate files, tables, listings, graphs, validations, reports and documentation.
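The K-means clustering referenced above follows Lloyd's algorithm (on the project it was run via R or scikit-learn rather than by hand). A minimal pure-Python sketch with toy 2-D points, using a naive first-k initialization for determinism:

```python
def kmeans(points, k, iters=20):
    """Minimal Lloyd's algorithm for 2-D points; returns (centroids, labels)."""
    centroids = list(points[:k])   # naive init; real code uses k-means++
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest centroid by squared Euclidean distance
        labels = [min(range(k),
                      key=lambda c: (p[0] - centroids[c][0]) ** 2 +
                                    (p[1] - centroids[c][1]) ** 2)
                  for p in points]
        # update step: move each centroid to the mean of its cluster
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return centroids, labels

# two well-separated toy clusters
pts = [(0.1, 0.2), (0.0, 0.0), (0.2, 0.1), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
cents, labs = kmeans(pts, k=2)
```

On these toy points the two tight groups end up with distinct labels after a couple of iterations.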

Environment: Windows 8, SAS v9.4, SAS/EG, Python, SAS/Base, SAS/Access, SAS/Macro, SAS/SQL, SAS/Graph, SAS/STAT, SAS/Connect, MS SQL Server, Oracle, Teradata, MS-Excel, Tableau 8.2, Qlikview 9.0 SR2

Confidential, Plano, TEXAS

Sr. Data Scientist

Responsibilities:

  • Worked with the analysis teams and management teams and supported them by providing variables based on their requirements.
  • Responsible for Retrieving data using SQL Queries and perform Analysis enhancements and documentation of the system.
  • Used R, SAS and SQL to manipulate data, and develop and validate quantitative models.
  • Led brainstorming sessions and proposed hypotheses, approaches and techniques.
  • Created and optimized processes in the Data Warehouse to import, retrieve and analyze data from the CyberLife database.
  • Analyzed data collected in stores (JCL jobs, stored-procedures and queries) and provided reports to the Business team by storing the data in excel/SPSS/SAS file.
  • Performed Analysis and Interpretation of the reports on various findings.
  • Performing Exploratory Data Analysis on the data provided by the Client.
  • Prepared test documents for zaps before and after changes in the Model, Test and Production regions.
  • Responsible for production support, including abend resolution and other production support activities, and compared seasonal trends in the data using Excel.
  • Used advanced Microsoft Excel functions such as pivot tables and VLOOKUP in order to analyze the data and prepare programs.
  • Successfully implemented migration of client's requirement application from Test/DSS/Model regions to Production.
  • Prepared SQL scripts for ODBC and Teradata servers for analysis and modeling.
  • Provided complete assistance of the trends of the financial time series data.
  • Various statistical tests performed for clear understanding to the client.
  • Implemented procedures for extracting Excel sheet data into the mainframe environment by connecting to the database using SQL.
  • Provided training to beginners on the CyberLife system and other basics.
  • Complete support to all regions. (Test/Model/System/Regression/Production).
  • Actively involved in Analysis, Development and Unit testing of the data.
  • Generated reports on more than 100 agent fraud investigation cases based on client requirements, ensuring the data was accurate.
  • Complete delivery assurance of the project.
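The SQL extraction pattern behind these analysis reports looks roughly like the following. This is a hedged sketch: the actual extracts ran against Teradata/ODBC servers, so Python's built-in sqlite3 stands in here for runnability, and the table and column names are hypothetical:

```python
import sqlite3

# In-memory SQLite as a stand-in for the real Teradata/ODBC connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (agent_id TEXT, amount REAL, region TEXT)")
conn.executemany("INSERT INTO claims VALUES (?, ?, ?)",
                 [("A1", 120.0, "Test"), ("A1", 340.0, "Prod"),
                  ("A2", 90.0, "Prod")])

# Aggregate per agent: the shape of a typical analysis/modeling extract
rows = conn.execute("""
    SELECT agent_id, COUNT(*) AS n_claims, SUM(amount) AS total
    FROM claims
    GROUP BY agent_id
    ORDER BY total DESC
""").fetchall()
```

The same GROUP BY / ORDER BY skeleton carries over to Teradata SQL largely unchanged; only the connection layer differs.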

Confidential

Sr. Data Scientist

Responsibilities:

  • Worked with the informatics department, responsible for building predictive models using clinical, survey or administrative data in support of informatics analytical projects.
  • Extensively used Teradata-SQL Assistant and Advanced query tool to write SQL queries.
  • Converted various SQL statements into stored procedures thereby reducing the number of database accesses.
  • Responsible for architecting analytic frameworks for data mining, ETL, analysis, and reporting under the supervision of the Manager.
  • Prepared regular patient reports by collecting samples of Diagnosed Patients using Excel spreadsheets.
  • Cleaned data by analyzing and eliminating duplicate and inaccurate data (outliers) using R.
  • Worked in Agile Environment.
  • Ensured the dataset contained no missing values so it could be used for further analysis.
  • Trained in data science fundamentals and applied them in software used to collect and manage patient data in Excel/SPSS.
  • Assisted in performing statistical analysis of the data and storing them in a database.
  • Worked with Quality Control Teams to develop Test Plan and Test Cases.
  • Involved in designing and implementing the data extraction (XML DATA stream) procedures.
  • Generated graphs and reports using ggplot and RStudio for analyzing models.
  • Generated results and evaluated prediction accuracy.
  • Prepared the final documents and ensured delivery to the client before EOD.
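The duplicate-and-outlier cleaning step described above was done in R on the project; the same idea can be sketched in pure Python. The readings, the function name and the 1.5-SD threshold are illustrative assumptions only (a single-pass z-score cut like this is crude, since a strong outlier inflates the standard deviation):

```python
from statistics import mean, stdev

def drop_outliers(values, z_max=1.5):
    """De-duplicate, then drop points more than z_max SDs from the mean."""
    unique = sorted(set(values))               # eliminate duplicate records
    mu, sd = mean(unique), stdev(unique)
    return [v for v in unique if abs(v - mu) <= z_max * sd]

# hypothetical patient readings, one implausible value
readings = [98.6, 98.7, 98.6, 99.1, 98.4, 104.5]
clean = drop_outliers(readings)
```

Production cleaning would typically use a robust rule (e.g. the IQR fence) rather than a mean/SD cut, for exactly the inflation reason noted above.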

Confidential

Data Scientist

Responsibilities:

  • A highly immersive Data Science program involving Data Manipulation and Visualization, Web Scraping, Machine Learning, Python programming, SQL, GIT, Unix Commands, NoSQL, MongoDB, Hadoop.
  • Setup storage and data analysis tools in Amazon Web Services cloud computing infrastructure
  • Implemented Spark MLlib utilities such as classification, regression, clustering, collaborative filtering and dimensionality reduction.
  • Design, coding, unit testing of ETL package source marts and subject marts using Informatica ETL processes for Oracle database.
  • Developed various QlikView Data Models by extracting and using the data from various sources files, DB2, Excel, Flat Files and Big Data.
  • Handled importing data from various data sources, performed transformations using Hive, Map Reduce, and loaded data into HDFS.
  • Interaction with Business Analyst, SMEs, and other Data Architects to understand Business needs and functionality for various project solutions
  • Identifying and executing process improvements, hands-on in various technologies such as Oracle, Informatica, Business Objects.
  • Designed both 3NF data models for ODS, OLTP systems and dimensional data models using Star and snowflake schemas.
  • Manipulating/mining data from database tables (Redshift, Oracle, Data Warehouse)
  • Utilized a broad variety of statistical and big data packages, including SAS, R, MLlib, Hadoop, Spark, MapReduce and Pig.
  • Interface with other technology teams to extract, transform, and load (ETL) data from a wide variety of data sources
  • Owned the functional and nonfunctional scaling of software systems in my area.
  • Provided input and recommendations on technical issues to BI Engineers, Business & Data Analysts, and Data Scientists.

Confidential

Junior Data Scientist

Responsibilities:

  • Analyzed business needs and developed Technical Specifications based on interaction with Managers and Development Teams. Worked with the analysis teams and management teams and supported them by providing variables based on their requirements. Involved in gathering, understanding and validating the project specifications.
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
  • Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER Studio 9.7.
  • Processed huge datasets (over a billion data points, over 1TB) for data association pairing and provided insights into meaningful data associations and trends.
  • Developed cross-validation pipelines for testing the accuracy of predictions
  • Enhanced statistical models (linear mixed models) for predicting the best products for commercialization using Machine Learning Linear regression models, KNN and K-means clustering algorithms
  • Participated in all phases of data mining; data collection, data cleaning, developing models, validation, visualization and performed Gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, PowerBALL, and SmartView.
  • Implemented Agile Methodology for building an internal application.
  • Focused on integration overlap and Informatica's newer commitment to MDM following the acquisition of Identity Systems.
  • Good knowledge of Hadoop Architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
  • As Architect delivered various complex OLAP databases/cubes, scorecards, dashboards and reports.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, pandas).
  • Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, KNN, Naive Bayes.
  • Used Teradata 15 utilities such as FastExport and MLOAD to handle various data migration/ETL tasks from OLTP source systems.
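The cross-validation pipelines mentioned above rest on splitting the data into k folds and rotating which fold is held out. A minimal index-splitting sketch (real pipelines would use scikit-learn's KFold; the function name here is hypothetical):

```python
def kfold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation over n rows."""
    # distribute the remainder so fold sizes differ by at most one
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

# 10 rows, 5 folds: each row appears in exactly one test fold
splits = list(kfold_indices(10, 5))
```

Model accuracy is then averaged over the k held-out folds, which is what makes the accuracy-of-prediction testing above robust to a lucky single split.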
