
Data Scientist Resume


Madison, WI

PROFESSIONAL SUMMARY:

  • 8+ years of experience in Machine Learning and Data Mining with large data sets of structured and unstructured data, covering Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
  • Adept in statistical programming languages like R and Python, as well as Big Data technologies like Hadoop and Teradata, with strong domain expertise across multiple industries.
  • Expertise in Applied Statistics, Exploratory Data Analysis, and Visualization using tools like R and Tableau, along with various Hadoop environments.
  • Hands-on experience in Supervised Machine Learning for Regression and Classification, and in Unsupervised Learning using K-means and Hierarchical clustering.
  • Data extraction, cleansing, analysis, and reporting using SAS, SQL, and Advanced Excel. Analyzed and developed analytical reports for monthly risk-management presentations to senior management; maintained and enhanced the reports and dashboards.
  • Experience in building Data Integration, Workflow Solutions, and Extract, Transform, and Load (ETL) solutions for data warehousing using SQL Server Integration Services (SSIS).
  • Experience in creating and implementing interactive charts, graphs, and other user interface elements.
  • Strong experience in writing complex SQL queries using PROC SQL. Involved in coding and pulling data from various tables using unions and joins, and producing tables, reports, graphs, and listings using PROC REPORT and PROC GPLOT.
  • Experience in working with multiple databases, including MS SQL Server, MS Access, HBase, and Teradata.
  • Proficient in design and development of various dashboards, reports utilizing Tableau Visualizations according to the end user requirement.
  • Experience in creating Pivot tables, graphs, charts, functions, and macros in Excel.
  • Experience in all stages of the Software Development Life Cycle (SDLC), including unit testing and system integration.
  • Experience in Apache Pig for constructing data flows for Extract, Transform, and Load (ETL) processing and for analyzing data.
  • Worked on Hive to process structured data in the Hadoop environment, querying and analyzing data using HQL.
  • Good experience using HBase to store huge data volumes on top of HDFS and to access random, real-time data in HDFS.
  • Good knowledge of acquiring data using ETL builds from Audible and Amazon data warehousing (EC2/DMART/Redshift).
  • Expertise in writing complex database SQL queries with a focus on PostgreSQL and Redshift
  • Proficient working knowledge in managing the entire data science project life cycle, actively involved in all phases including data acquisition, data cleaning, data engineering, feature scaling, and feature engineering.
  • Strong knowledge of Microsoft SQL Server 2005 server-side development, including SSIS, SSRS, SSMS, and T-SQL (Transact-SQL).
  • Strong testing and validation experience in various prediction and statistical models using ROC plots, K-fold cross-validation, and data visualization (a minimal sketch follows this list).
  • Extensive experience in Text Analytics, developing different Statistical Machine Learning and Data Mining solutions to various business problems, and generating data visualizations using ggplot in R, Python, and Informatica.
  • Deep understanding of Statistical modeling, Multivariate Analysis, model testing, problem analysis, model comparison and validation.
  • Experience working with SAS, Toad, Oracle SQL, Hadoop and R in data analysis and data cleansing.
  • Good awareness of writing complex queries using PL/SQL and T-SQL for extracting data from Oracle and MS SQL databases.
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Excellent skills in data parsing, data manipulation, and data modeling, with methods including describing data contents, computing descriptive statistics, RegEx, split and combine, remap, merge, subset, melt, and reshape.
  • Working knowledge in Hadoop, Hive and NoSQL databases like MongoDB, Cassandra and HBase.
  • Experience with the use of Network Monitoring Systems, databases like MySQL, PostgreSQL, MongoDB.
  • Knowledge of networking protocols and technologies, web services, and version control tools like Git and SVN.
  • Knowledge of working with Proofs of Concept (PoCs) and gap analysis; gathered necessary data for analysis from various sources and prepared it for data exploration using data munging.
  • Good industry knowledge, analytical and problem-solving skills, and the ability to work well within a team as well as individually. Highly creative, innovative, committed, intellectually curious, and business savvy, with effective communication and interpersonal skills.
  • Experienced in creating Test Cases, performing Manual Testing, identifying and tracking defects.
  • Comfortable interacting with business and end users and documenting the development process.
  • Skilled in documenting, tracking and managing issues/risks, supported project metrics analysis, team communication, risk analysis, report generation, and documentation control.
  • Good knowledge of integrating complex Regular Expressions (RegEx) for extracting information within process workflows.
  • Experience in working with various operating systems like Linux (RHEL, Ubuntu, CentOS) and Unix, along with Windows.
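
The resume itself contains no code; the following is a minimal sketch of the K-fold cross-validation with ROC scoring mentioned in the summary above, assuming scikit-learn. The synthetic dataset and logistic-regression model are illustrative stand-ins, not details from the resume.

```python
# Minimal sketch: K-fold cross-validation scored with ROC AUC.
# Assumes scikit-learn; synthetic data stands in for a real dataset.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Hypothetical binary-classification data.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

model = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# ROC AUC averaged over 5 folds estimates out-of-sample performance.
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(f"ROC AUC per fold: {scores.round(3)}, mean = {scores.mean():.3f}")
```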

TECHNICAL SKILLS:

Reporting: Access, SSRS, Crystal Reports, Excel, SQL*Plus

Languages: Python, R, SAS, SQL, MDX, T-SQL, PL/SQL

Scripting Languages: Python, Linux Bash, Apache Pig, Hive, Sqoop

Operating Systems: Windows, Linux (CentOS, Ubuntu, RHEL), and Unix

DWH/BI Tools: SQL Server, Amazon Redshift, Teradata, Tableau, Informatica

Databases: MS SQL Server 2005/2008 R2/2012/2014, Oracle 9i/10g/11g, MS Access 2007/2013, Hive, MySQL, Amazon Redshift, PostgreSQL

NoSQL Databases: HBase, Cassandra, MongoDB, Amazon DynamoDB, Redis

Tools: Toad, Tableau, SQL Developer, Putty, SQL Profiler, Query Analyzer

Other Tools: MS Visual Studio, MS Office, MS Project, MS Visio, MS PowerPoint, MS Lync, SQL*Loader, QuickTest Professional, Quality Center (QC), JIRA

ETL Tools: Informatica, SSIS, Talend

PROFESSIONAL EXPERIENCE:

Confidential, Madison, WI

Data Scientist

Responsibilities:

  • Performed Data Profiling to learn about user behavior and merged data from multiple data sources.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
  • Performed K-means clustering, multivariate analysis, and Support Vector Machines in Python and R.
  • Developed clustering algorithms and Support Vector Machines that improved customer segmentation and market expansion (a segmentation sketch follows this list).
  • Professional Tableau user (Desktop, Online, and Server); experience with Keras and TensorFlow.
  • Data storyteller; mined data from different data sources such as SQL Server, Oracle, Cube databases, Web Analytics, Business Objects, and Hadoop.
  • Provided ad hoc analysis and reports to the executive-level management team.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and Smart View.
  • Designed, implemented, and maintained database models in RDBMS.
  • Implemented analytics delivery on cloud-based visualization platforms such as Business Objects and Amazon Web Services.
  • Served as SPOC Data Scientist and predictive analyst, creating annual and quarterly business forecast reports.
  • Main source of the business regression report.
  • Collaborated with the UI developer to ensure that the models can be seamlessly deployed into the MVP.
  • Designed interactive Tableau KPI dashboards by gathering requirements from business users and slicing data across Amazon Redshift.
  • Created various B2B predictive and descriptive analytics using R and Tableau.
  • Created and automated ad hoc reports.
  • Produced technical documentation of code and processes.
  • Responsible for planning & scheduling new product releases and promotional offers
  • Used pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, and NLTK in Python to develop various machine learning algorithms. Experience working with SAS, Toad, Oracle SQL, Hadoop, and R in data analysis and data cleansing.
  • Worked on unstructured data of NoSQL databases like MongoDB
  • Experienced in Agile methodologies and attended regular SCRUM meetings.
  • Experience working with different data formats like flat files, CSV, Avro, and JSON.
  • Parsed data and produced concise conclusions from raw data in a clean, well-structured, and easily maintainable format.
  • Extracted data from HDFS and prepared data for exploratory analysis using data munging.
  • Worked on Text Analytics, Naive Bayes, Sentiment analysis, creating word clouds and retrieving data from Twitter and other social networking platforms.
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in R and Python.
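
As a concrete illustration of the K-means segmentation work listed above, here is a minimal sketch assuming scikit-learn and pandas; the customer features and values are hypothetical.

```python
# Minimal K-means customer-segmentation sketch (scikit-learn assumed).
# Standardize features first so no single column dominates the distance.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical customer features.
customers = pd.DataFrame({
    "annual_spend":     [200, 5500, 300, 4800, 250, 5100],
    "visits_per_month": [1, 12, 2, 10, 1, 11],
})

X = StandardScaler().fit_transform(customers)

# k=2 suits this toy data; in practice pick k via the elbow method
# or silhouette score.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
customers["segment"] = kmeans.fit_predict(X)
print(customers)
```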

Environment: R, Python, MongoDB, GitHub, SQL Server, HDFS, Pig, RDBMS, CentOS, Talend

Confidential, Jersey City, NJ

Data Scientist/Big Data Analyst

Responsibilities:

  • Performed Logistic Regression, Classification, Random Forests and Clustering in R and Python.
  • Worked on Developing Fraud Detection Platform using various machine learning algorithms in R and Python.
  • Used Hive to store the data and performed data-cleaning steps on huge datasets.
  • Worked on Linear Discriminant Analysis, greedy forward and backward selection for feature selection, and feature-reduction algorithms like Principal Component Analysis (PCA) and Factor Analysis.
  • Implemented Classification using Supervised algorithms like Logistic Regression, Decision trees, KNN, Naive Bayes.
  • Performed Exploratory Data Analysis and Data Visualizations using R, Python and Tableau.
  • Validated the machine learning classifiers using ROC curves and lift charts (a validation sketch follows this list).
  • Worked on customer segmentation using an unsupervised learning technique - clustering.
  • Implemented ensemble models like boosting and bagging.
  • Experience working with Toad, Oracle SQL, Hadoop and R in data analysis and data cleansing.
  • Performed Inventory Analysis with Statistical and Data Visualization Tools.
  • Used regression to perform safety-stock and inventory analysis in R.
  • Performed data visualizations using Tableau and R.
  • Worked on Intuit's AWS innovation project using S3, Data Pipeline, and Redshift.
  • Used SQL to retrieve data from the Oracle database for data analysis and reporting.
  • Performed Market Basket Analysis to identify customer buying patterns and behaviors to better manage sales and inventory.
  • Performed Decision Tree Analysis and Random forests for strategic planning and forecasting.
  • Manipulated and cleaned data using the dplyr, tidyr, and sqldf packages in R.
  • Analyzed historical data to find the trends in system First Allocation Fill and First Pass Fill.
  • Extensively used Xelus for planning high-cost, high-selling parts.
  • Analyzed demand data and inputs from Xelus to determine the safety stock for all parts.
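
A minimal sketch of the feature-reduction, classification, and ROC-validation workflow the bullets above describe, assuming scikit-learn; the synthetic data is a stand-in for the actual fraud-detection datasets.

```python
# Minimal sketch: scale -> PCA -> logistic regression -> ROC validation.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=2000, n_features=30, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=1)

# Reduce 30 features to 10 principal components before classifying.
clf = make_pipeline(StandardScaler(), PCA(n_components=10),
                    LogisticRegression(max_iter=1000))
clf.fit(X_tr, y_tr)

# fpr/tpr trace the ROC curve for plotting; AUC summarizes it.
probs = clf.predict_proba(X_te)[:, 1]
fpr, tpr, _ = roc_curve(y_te, probs)
print(f"test AUC = {roc_auc_score(y_te, probs):.3f}")
```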

Environment: R, Tableau, Oracle BI, Xelus, MySQL, Hadoop, Pig, Hive, SQL Server, Python, GitHub

Confidential, Los Angeles, CA

Big Data Analyst

Responsibilities:

  • Implemented end-to-end systems for Data Analytics, Data Automation and integrated with custom visualization tools using R, Mahout, Hadoop and MongoDB.
  • Performed Market Basket Analysis and implemented decision trees, random forests, and K-fold cross-validation.
  • Responsible for data identification, collection, exploration, and cleaning for modeling, and participated in model development.
  • Facilitated ETL on large data sets utilizing Pig, Hive, and HBase on the Hadoop ecosystem.
  • Visualized, interpreted, and reported findings, and developed strategic uses of data.
  • Analyzed transaction data and developed analytics insights with statistical models using Azure Machine Learning.
  • Selected statistical algorithms (Random Forest, SVM, Bagged CART, GBM, etc.).
  • Interacted with other departments to understand and identify data needs and requirements, and worked with other members of the IT organization to deliver data visualization and reporting solutions addressing those needs.
  • Built tools to analyze, capture, and report Business Intelligence metrics at various stages of the customer journey across the marketing funnel.
  • Responsible for providing reporting, analysis and insightful recommendations to business leaders on key performance metrics pertaining to sales & marketing
  • Strong knowledge of other relational database platforms such as Oracle, DB2, IMS-DB, and NoSQL (Elasticsearch).
  • Used decision trees and random forests to estimate the employee attrition rate (an attrition sketch follows this list).
  • Worked on time-series forecasting using R to meet consumer demand through frequent inventory tracking.
  • Converted raw data to processed data by merging, finding outliers, errors, trends, missing values and distributions in the data.
  • Accomplished Data analysis, Statistical analysis, generated reports, listings and graphs.
  • Provided quantitative support for management decision making; responsible for developing methodology for real-time data processing using data mining, machine learning, and statistical analysis on Hadoop and big data analytics.
  • Developed sampling method to better analyze big data, such as shipment data and real time capturing of inventory data.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Implemented advanced stochastic method based on Hierarchical Bayesian Networks to estimate and forecast travel demand from multiple data sources.
  • Conducted comprehensive scenario analysis to improve business procedure and area planning and Optimized freight network design.
  • Implemented quantitative data analysis to understand multimodal freight behavior in port area.
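
For the attrition analysis mentioned above, here is a minimal random-forest sketch assuming scikit-learn and pandas; the HR columns and values are hypothetical.

```python
# Minimal sketch: random forest for employee-attrition estimation.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical HR data: 1 = employee left, 0 = stayed.
df = pd.DataFrame({
    "tenure_years": [1, 6, 2, 8, 1, 7, 3, 9],
    "satisfaction": [0.3, 0.8, 0.4, 0.9, 0.2, 0.7, 0.5, 0.9],
    "left_company": [1, 0, 1, 0, 1, 0, 1, 0],
})

X = df[["tenure_years", "satisfaction"]]
y = df["left_company"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0, stratify=y)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_tr, y_tr)

# Feature importances hint at what drives attrition.
print(dict(zip(X.columns, rf.feature_importances_.round(3))))
print("holdout accuracy:", rf.score(X_te, y_te))
```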

Environment: R, Python, Mahout, MySQL, Hadoop, Pig, MongoDB, Oracle, DB2

Confidential

Data Analyst

Responsibilities:

  • Analyzed business requirements, system requirements, data mapping requirement specifications, and responsible for documenting functional requirements and supplementary requirements in Quality Center 9.0
  • Involved in developing detailed test plan, test cases and test scripts using Quality Center for Functional and Regression Testing.
  • External interactions with NSCC, DTCC, NFS and various fund companies in which MMLISI is associated.
  • Involved in Data mapping specifications to create and execute detailed system test plans.
  • The data mapping specifies what data will be extracted from an internal data warehouse, transformed and sent to an external entity.
  • Tested the reports using Business Objects functionalities like Queries, Slice and Dice, Drill Down, Cross Tab, Master Detail and Formulae
  • Involved in Teradata SQL Development, Unit Testing and Performance Tuning
  • Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
  • Tested the ETL Informatica mappings and other ETL Processes (Data Warehouse Testing)
  • Tested several stored procedures.
  • Validated several Business Objects reports. Reviewed and tested business requests for data and data usage.
  • Tested the ETL process both before and after data validation.
  • Tested the messages published by ETL tool and data loaded into various databases
  • Responsible for data-mapping testing by writing complex SQL queries using WINSQL (a reconciliation sketch follows this list).
  • Created UNIX scripts for file transfer and file manipulation.
  • Validating the data passed to downstream systems.
  • Worked with Data Extraction, Transformation and Loading (ETL).
  • Involved in testing data mapping and conversion in a server based data warehouse.
  • Involved in testing the UI applications
  • Involved in Security testing for different LDAP roles.
  • Tested whether the reports developed in Business Objects are as per company standards.
  • Used Quality Center to track and report system defects
  • Involved in testing the XML files and checked whether data was parsed and loaded into staging tables.
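
The bullets above describe source-to-target ETL validation done with SQL and WINSQL; the sketch below shows the same reconciliation idea in pandas. The file names and the account_id/balance columns are hypothetical.

```python
# Minimal source-to-target reconciliation sketch in pandas.
import pandas as pd

source = pd.read_csv("source_extract.csv")  # flat-file source
target = pd.read_csv("target_table.csv")    # extract of the loaded table

# 1. Row-count check: every source row should reach the target.
assert len(source) == len(target), "row counts differ"

# 2. Key check: no keys dropped or invented during the load.
missing = set(source["account_id"]) - set(target["account_id"])
extra = set(target["account_id"]) - set(source["account_id"])
print("keys missing from target:", missing or "none")
print("unexpected keys in target:", extra or "none")

# 3. Value check: compare a mapped column after aligning on the key.
src = source.sort_values("account_id").reset_index(drop=True)
tgt = target.sort_values("account_id").reset_index(drop=True)
print("balance mismatches:", int((src["balance"] != tgt["balance"]).sum()))
```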

Environment: Informatica 8.1/7.1, Informix, DB2, Java, Business Objects, SQL, SQL Server 2000/2005, Teradata V2R6 (MLOAD, FLOAD, FAST EXPORT, BTEQ), Teradata SQL Assistant 7.0, Toad, XML, XSLT, IBM AIX 5.3, UNIX, Shell Scripting, WINSQL, UltraEdit, Rumba UNIX Display, Quality Center

Confidential

Data Analyst

Responsibilities:

  • Analyze the Bank's data and business terms from a data quality and integrity perspective
  • Perform root cause analysis on smaller self-contained data analysis tasks that are related to assigned data processes.
  • Write various SQL statements to generate data for analysis.
  • Work to ensure high levels of data consistency between diverse source systems including flat files, XML and SQL Database.
  • Develop and run ad hoc data queries from multiple database types to identify system of records, data inconsistencies, and data quality issues.
  • Perform daily data quality checks and data profiling for accurate and better reporting and analysis.
  • Involved in translating the business requirements into data requirements across different systems.
  • Involved in understanding the customer needs with regards to data, documenting requirements and complex SQL statements to extract the data and packaging data for delivery to customers.
  • Participate in developing enhancements for the current Commercial and Mortgage Securities systems.
  • Write SQL stored procedures and views, and perform in-depth testing of new and existing systems.
  • Manipulate and prepare data, and extract data from the database for business analysts using Tableau.
  • Review normalized schemas for effective and optimum performance tuning queries and data validations in OLTP and OLAP environments.
  • Exploit the power of MS SQL to solve complex business problems through data analysis on large data sets.
  • Use Tableau with T-SQL-queried data for data analysis, report generation, graphics, and statistical analysis.
  • Develop reports using advanced SQL techniques like RANK and ROW_NUMBER.
  • Analyze data using Tableau for automation and to determine business data trends.
  • Transfer data objects and queries from MS Excel to SQL Server (a transfer sketch follows this list).
  • Assist the ETL team in defining source-to-target mappings.
  • Compile and generate reports in a presentable format for the project team.
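
A minimal sketch of the Excel-to-SQL Server transfer described above, assuming pandas, SQLAlchemy, openpyxl, and an ODBC driver; the connection string, workbook, and table names are hypothetical.

```python
# Minimal sketch: load an Excel worksheet into a SQL Server table.
import pandas as pd
from sqlalchemy import create_engine

# Read the worksheet into a DataFrame (openpyxl handles .xlsx files).
df = pd.read_excel("monthly_positions.xlsx", sheet_name="Sheet1")

# Hypothetical SQL Server connection through pyodbc.
engine = create_engine(
    "mssql+pyodbc://user:password@server/dbname"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)

# Write the rows to a staging table; if_exists controls replace/append.
df.to_sql("stg_monthly_positions", engine, if_exists="replace", index=False)
```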

Environment: SQL Server, Tableau, MS Excel, XML, T-SQL
