
Data Scientist Resume



  • 8+ years of experience in IT, including 3+ years as a Data Scientist, with strong technical expertise and business experience.
  • Extensive experience in Analytics and Text Analytics, developing Statistical Machine Learning and Data Mining solutions to various business problems and generating data visualizations and ad hoc reports using SQL, PL/SQL, R 3.2.2, Python 2.7/3.4.3, SAS, and Tableau 9.4/10.
  • Extensively worked on Python 3.5/2.7 (NumPy, Pandas, Matplotlib, NLTK and scikit-learn).
  • Strong experience in Big Data technologies like Spark 1.6, Spark SQL, PySpark, Hadoop 2.x, HDFS, and Hive.
  • Proficient in Statistical Modeling and Machine Learning techniques (Linear and Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian methods, XGBoost) in Forecasting/Predictive Analytics, segmentation methodologies, regression-based models, hypothesis testing, Factor Analysis/PCA, and ensembles.
  • Strong skills in statistical methodologies such as A/B testing, experiment design, hypothesis testing, and ANOVA.
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Strong experience in the analysis, design, development, testing, and implementation of Business Intelligence solutions using Data Warehouse/Data Mart design, ETL, OLAP, BI, and client/server applications.
  • Worked with NoSQL databases including HBase, Cassandra, and MongoDB.
  • Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, PivotTables and OLAP reporting using SQL Server Reporting Services (SSRS).
  • Good understanding of Microsoft SQL Server Management Studio and data load/export utilities like BTEQ, FastLoad, MultiLoad, and FastExport, and of performance tuning for enhancing data extraction speed.
  • Expertise in development of software with major contributions in Data Warehousing and Database Applications using Informatica, Oracle PL/SQL, and UNIX Shell Programming.
  • Proficient in statistical and other tools/languages: R, Python, C, C++, Java, SQL, SAS.
  • Experience with TensorFlow, Caffe, and other Deep Learning frameworks. Solid understanding of RDBMS database concepts, including normalization, and hands-on experience creating database objects such as tables, views, stored procedures, triggers, row-level audit tables, cursors, indexes, user-defined data types, and functions.
  • Hands-on experience in managing all project stages including business development, estimation, requirements determination, gap analysis, business process reengineering, issue resolution, configuration, training, go-live assistance, vendor management and post-implementation support.
  • Worked on facets of data warehousing like Data Modeling, Integration, Reporting, Database Development, Testing, Requirement Gathering, SDLC, Client Interaction, Gap Analysis, and Change Management.
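As a small illustration of the A/B testing and hypothesis-testing skills listed above, here is a minimal pure-Python sketch of Welch's two-sample t statistic; the sample data is hypothetical, and the actual work used R and SAS routines for the full tests:

```python
import math

def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic (unequal variances assumed)."""
    na, nb = len(sample_a), len(sample_b)
    mean_a = sum(sample_a) / na
    mean_b = sum(sample_b) / nb
    # Sample variances with Bessel's correction (n - 1 denominator).
    var_a = sum((x - mean_a) ** 2 for x in sample_a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in sample_b) / (nb - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / na + var_b / nb)

# Hypothetical A/B data: metric values for control vs. variant.
control = [12.0, 11.5, 12.3, 11.8, 12.1]
variant = [12.9, 13.1, 12.7, 13.4, 12.8]
t_stat = welch_t(control, variant)
```

A large |t| relative to the critical value of the t distribution indicates a significant difference between the two groups.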


Tools: HQL, Pig, Apache Spark, Microsoft Visual Studio, Minitab, Arena, SPSS, Microsoft SQL Server Integration Services, Analysis Services, and Reporting Services, Oracle Data Integrator.

Languages: Python 2.7/3.4.3, R 3.2.2, SQL, PL/SQL, SAS, C, C++, Java

Operating system: Windows 7/8/10, Linux

Environments: Jupyter, RStudio, Anaconda, Spyder, Python Console, PyCharm

Statistical Tools: Multivariate regression analysis, SVM, Cluster Analysis, NLP, parametric and non-parametric tests, analysis of variance, polynomial regression, Decision Trees, Random Forest, Apriori, Market Basket Analysis (MBA), Factor Analysis, Principal Component Analysis, K-Nearest Neighbors, Reinforcement Learning

BI Tools: Tableau, Tableau Server, Tableau Reader, SAP Business Objects, OBIEE, QlikView, SAP Business Intelligence, Azure Data Warehouse


Confidential, NY

Data Scientist


  • Responsible for applying machine learning techniques (regression, classification) to predict outcomes.
  • Responsible for design and development of advanced R/Python programs to prepare, transform, and harmonize data sets in preparation for modeling.
  • Created data visualizations with ggplot2 in R to understand annual sales patterns.
  • Applied concepts of probability, distributions, and statistical inference to the given dataset to unearth interesting findings through comparisons, t-tests, F-tests, R-squared, p-values, etc.
  • Applied linear regression, multiple regression, ordinary least squares, mean-variance analysis, the law of large numbers, logistic regression, dummy variables, residuals, the Poisson distribution, Bayes and Naive Bayes, fitting functions, etc. to data with the help of the scikit-learn, SciPy, NumPy, and Pandas modules in Python.
  • Applied clustering algorithms (hierarchical, K-means) with the help of scikit-learn and SciPy, and developed visualizations and dashboards using ggplot2 and Tableau.
  • Used Python and R scripting to wrangle and aggregate a raw dataset consisting of 2+ million records in inconsistent formats, with functions such as is.na() and median() and filters such as which().
  • Reset data frame indices in R for misaligned data and generated qplot() charts for data visualization.
  • Developed large data sets from structured and unstructured data and performed data mining.
  • Partnered with modelers to develop data frame requirements for projects, converting vector data into matrices using the rbind() and cbind() functions.
  • Performed Ad-hoc reporting/customer profiling, segmentation using R/Python.
  • Tracked various campaigns, generating customer profiling analysis and data manipulation.
  • Provided R/SQL programming, with detailed direction, in the execution of data analysis that contributed to the final project deliverables. Responsible for data mining.
  • Analyzed large datasets to answer business questions by generating reports and outcome.
  • Worked in a team of programmers and data analysts to develop insightful deliverables that support data- driven marketing strategies.
  • Executed SQL queries from R/Python on complex table configurations.
  • Retrieving data from database through SQL as per business requirements.
  • Prepared data frames using the gsub() function in R to identify missing data used for production data analysis.
  • Created, maintained, modified, and optimized SQL Server databases and troubleshot server problems.
  • Manipulated data using Base SAS programming.
  • Collected, cleaned, filtered, and transformed data into the specified format.
  • Prepared the workspace for Markdown.
  • Accomplished Data analysis, statistical analysis, generated reports, listings, and graphs.
  • Worked on R and Python to identify business performance via Classification, tree map, and regression models along with visualizing data for interactive understanding and decision-making.
  • Documented all programs and procedures to ensure an accurate historical record of work completed on an assigned project, which improved quality and efficiency of process by 15%.
  • Adhering to best practices for project support and documentation.
  • Developed hypothesis models and validated them using the data.
  • Managed the reporting/dashboarding for the key metrics of the business.
  • Performed data analysis using different analytic and modeling techniques.
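The K-means clustering mentioned above was done with scikit-learn and SciPy; as a rough illustration of the underlying algorithm, a minimal pure-Python 1-D version (the sample values are hypothetical) might look like:

```python
def kmeans_1d(values, k=2, iters=20):
    """Naive 1-D k-means: assign each point to the nearest centroid,
    then recompute each centroid as the mean of its cluster."""
    # Spread initial centroids across the sorted data (simple heuristic).
    centroids = sorted(values)[::max(1, len(values) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Hypothetical annual-sales figures forming two obvious groups.
centers = kmeans_1d([1.0, 1.2, 0.9, 8.0, 8.3, 7.9], k=2)
```

In practice, sklearn.cluster.KMeans handles k-means++ initialization, convergence checks, and multi-dimensional data.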

Environment: R, Python, Tableau, Logistic Regression, Boosted Trees, Recommendation engine using Collaborative Filtering and Market Basket Analysis

Confidential, Reston, VA

Data Scientist


  • Setup storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
  • Used pandas, numpy, Seaborn, scipy, matplotlib, sci-kit-learn, NLTK in Python for developing various machine learning algorithms.
  • Installed and used the Caffe Deep Learning framework.
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
  • Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER Studio 9.7.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and SmartView.
  • Implemented Agile Methodology for building an internal application.
  • Focused on integration overlap and Informatica's newer commitment to MDM with the acquisition of Identity Systems.
  • Implemented big data processing applications to collect, clean, and normalize large volumes of open data using the Hadoop ecosystem (Pig, Hive, and HBase).
  • Utilized graph databases such as Neo4j and Virtuoso to perform network-based analysis and to extract and embed knowledge into predictive models.
  • Implemented an entity extraction algorithm to find entities in short unstructured text (ex: tweets) as well as from long unstructured texts such as research articles by applying Natural Language Processing techniques.
  • Developed a personalization algorithm for social media that utilizes openly available knowledge bases to augment machine learning algorithms with additional features to estimate user’s interest in entities and content.
  • Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
  • Performed data transformation from various sources, data organization, and feature extraction from raw and stored data.
  • Handled importing data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
  • Interaction with Business Analyst, SMEs, and other Data Architects to understand Business needs and functionality for various project solutions
  • Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms for the clients
  • Identifying and executing process improvements, hands-on in various technologies such as Oracle, Informatica, Business Objects.
  • Designed both 3NF data models for ODS, OLTP systems and dimensional data models using Star and Snowflake Schemas.
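The entity extraction described above applied Natural Language Processing techniques; as a toy sketch of the simplest possible heuristic (real pipelines used proper POS tagging and chunking, and the sample text is made up):

```python
import re

def extract_entities(text):
    """Toy entity heuristic: keep capitalized words that are not
    sentence-initial. Real NLP pipelines use tagging and chunking."""
    entities = []
    for sentence in re.split(r"[.!?]\s+", text):
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            word = tok.strip(".,;:!?")
            # Skip the first token: it is capitalized regardless of
            # whether it names an entity.
            if i > 0 and word[:1].isupper() and word[1:].islower():
                entities.append(word)
    return entities

tweet = "We flew from Boston to Seattle. Great weather all week."
found = extract_entities(tweet)
```

This heuristic misses multi-word entities ("New York") and lowercase mentions; it only illustrates the shape of the problem.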

Environment: R 9.0, Informatica 9.0, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose.

Confidential, South Portland, Maine

Data Analytics


  • Gathered, analyzed, documented, and translated application requirements into data models and supported standardization of documentation and the adoption of standards and practices related to data and applications.
  • Participated in Data Acquisition with Data Engineer team to extract historical and real-time data by using Sqoop, Pig, Flume, Hive, MapReduce and HDFS.
  • Wrote user defined functions (UDFs) in Hive to manipulate strings, dates and other data.
  • Performed data cleaning, feature scaling, and feature engineering using the Pandas and NumPy packages in Python.
  • Applied clustering algorithms (hierarchical, K-means) using scikit-learn and SciPy.
  • Performed complex pattern recognition on automotive time-series data and forecast demand through ARMA and ARIMA models and exponential smoothing for multivariate time series.
  • Delivered and communicated research results, recommendations, opportunities to the managerial and executive teams, and implemented the techniques for priority projects.
  • Designed, developed and maintained daily and monthly summary, trending and benchmark reports repository in Tableau Desktop.
  • Generated complex calculated fields and parameters, toggled and global filters, dynamic sets, groups, actions, custom color palettes, statistical analysis to meet business requirements.
  • Implemented visualizations and views such as combo charts, stacked bar charts, Pareto charts, donut charts, geographic maps, sparklines, and crosstabs.
  • Published workbooks and extracted data sources to Tableau Server, implemented row-level security, and scheduled automatic extract refreshes.
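The demand forecasting above used ARMA/ARIMA models and exponential smoothing; as a minimal sketch of the latter (the demand series below is hypothetical), simple exponential smoothing reduces to a single recurrence:

```python
def exp_smooth(series, alpha=0.5):
    """Simple exponential smoothing: the next-period forecast is the
    last smoothed level, level_t = alpha*x_t + (1 - alpha)*level_{t-1}."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

# Hypothetical monthly demand; forecast for the next month.
demand = [100.0, 104.0, 99.0, 107.0, 103.0, 110.0]
forecast = exp_smooth(demand, alpha=0.5)
```

The actual ARIMA fits used dedicated statistical packages; this only shows the smoothing recurrence that underlies the simplest forecaster.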

Environment: Erwin r7.0, SQL Server 2000/2005, Windows XP/NT/2000, Oracle 8i/9i, MS-DTS, UML, UAT, SQL Loader, OOD, OLTP, PL/SQL, MS Visio, Informatica

Confidential, Irvine, California

Data Analytics


  • Involved with Business Analysts team in requirements gathering and in preparing functional specifications and changing them into technical specifications
  • Applied the Agile software development process to establish a business analysis methodology
  • Analyzed the client data and business terms from a data quality and integrity perspective
  • Identified the most suitable source of record and outlined the data required for sales and service
  • Implemented metadata repository, maintained data quality, data cleanup procedures, transformations, data standards, data governance program, scripts, stored procedures, triggers and executed test plans
  • Researched and resolved issues with application teams regarding the integrity of data flow into databases & reporting to the client
  • Created customized SQL Queries using MS SQL Management Studio to pull specified data for analysis and report building in conjunction with Crystal Reports
  • Analyzed transaction level datasets using SQL combined with demographic information to provide customer insights and generate ad-hoc reports
  • Prepared sales forecasts by collecting and analyzing sales data to evaluate current sales goals
  • Designed and developed Use cases, Activity diagrams, Sequence diagrams, OOD using UML and Business Process Modeling
  • Established the logical and physical ER/Studio data models in line with business standards and guidelines
  • Created logical and physical data models using best practices to ensure high data quality and reduced redundancy
  • Designed & developed various ad hoc reports for different teams in Business (Teradata, MS Access, MS Excel)
  • Extracted Mainframe flat files (fixed-width or CSV) onto a UNIX server and then converted them into Teradata tables for user convenience
  • Coordinated meetings with vendors to define requirements and system interaction agreement documentation between client and vendor system
  • Utilized SQL to develop stored procedures, views to create result sets to meet varying reporting requirements
  • Performed data visualization and reporting using Tableau and SSRS
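The ad-hoc SQL reporting described above ran against MS SQL Server and Teradata; the same pattern can be sketched self-contained with Python's built-in sqlite3 module and made-up sales rows:

```python
import sqlite3

# In-memory stand-in for the production transaction table
# (the rows below are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 120.0), ("East", 80.0), ("West", 250.0)],
)

# Ad-hoc aggregate: total sales per region, largest first.
rows = conn.execute(
    "SELECT region, SUM(amount) AS total "
    "FROM sales GROUP BY region ORDER BY total DESC"
).fetchall()
```

The GROUP BY/ORDER BY shape is identical whether the backend is SQLite, SQL Server, or Teradata; only the connection layer changes.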

Environment: Agile methodology, Python, SQL/Server 2008R2/2005 Enterprise, SSRS, SSIS, SSAS, Hadoop, Tableau, MS Access, MS Visio, MS Excel, MS Project, Teradata, ER Studio, Crystal reports, and Business Objects


Data Architect/Data Modeler


  • Worked with large amounts of structured and unstructured data.
  • Knowledge of Machine Learning concepts (Generalized Linear models, Regularization, Random Forest, Time Series models, etc.)
  • Worked in Business Intelligence tools and visualization tools such as Business Objects, Tableau, ChartIO, etc.
  • Deployed GUI pages by using JSP, JSTL, HTML, DHTML, XHTML, CSS, JavaScript, and AJAX.
  • Configured the project on WebSphere 6.1 application servers
  • Implemented the online application by using Core Java, JDBC, JSP, Servlets and EJB 1.1, Web Services, SOAP, WSDL.
  • Handled end-to-end project from data discovery to model deployment.
  • Monitoring the automated loading processes.
  • Communicated with other health care systems by using Web Services with the help of SOAP, WSDL, and JAX-RPC
  • Used Singleton, Factory, and DAO design patterns based on the application requirements
  • Used SAX and DOM parsers to parse the raw XML documents
  • Used RAD as Development IDE for web applications.
  • Preparing and executing Unit test cases
  • Used Log4J logging framework to write Log messages with various levels.
  • Involved in fixing bugs and minor enhancements to the front-end modules.
  • Used Microsoft Visio and Rational Rose for designing the Use Case diagrams, Class model, Sequence diagrams, and Activity diagrams for the SDLC process of the application
  • Doing functional and technical reviews
  • Maintenance in the testing team for System testing/Integration/UAT.
  • Guaranteeing quality in the deliverables.
  • Conducted Design reviews and Technical reviews with other project stakeholders.
  • Was a part of the complete life cycle of the project from the requirements to the production support
  • Created test plan documents for all back-end database modules
  • Implemented the project in Linux environment.

Environment: R, Erwin, Tableau, MDM, QlikView, MLLib, PL/SQL, HDFS, Teradata, JSON, HADOOP (HDFS), MapReduce, PIG, Spark, R Studio, MAHOUT, JAVA, HIVE, AWS.


Data Analyst/Data Modeler


  • Developed Internet traffic scoring platform for ad networks, advertisers, and publishers (rule engine, site scoring, keyword scoring, lift measurement, linkage analysis).
  • Responsible for defining the key identifiers for each mapping/interface.
  • Clients include eBay, Click Forensics, Cars.com, Turn.com, Microsoft, and Looksmart.
  • Implementation of Metadata Repository, Maintaining Data Quality, Data Cleanup procedures, Transformations, Data Standards, Data Governance program, Scripts, Stored Procedures, triggers and execution of test plans.
  • Designed the architecture for one of the first Analytics 3.0 online platforms: all-purpose scoring with on-demand, SaaS, and API services; currently under implementation.
  • Applied web crawling and text mining techniques to score referral domains, generate keyword taxonomies, and assess the commercial value of bid keywords.
  • Developed a new hybrid statistical and data mining technique known as hidden decision trees and hidden forests.
  • Reverse-engineered keyword pricing algorithms in the context of pay-per-click arbitrage.
  • Coordinated meetings with vendors to define requirements and system interaction agreement documentation between client and vendor system.
  • Automated bidding for advertiser campaigns based either on keyword or category (run-of-site) bidding.
  • Created multimillion-keyword bid lists using extensive web crawling, and identified metrics to measure the quality of each list (yield or coverage, volume, and average keyword financial value).
  • Maintained the Enterprise Metadata Library with any changes or updates.
  • Documented data quality and traceability for each source interface.
  • Established standard operating procedures.
  • Generated weekly and monthly asset inventory reports.
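The keyword-list quality metrics named above (yield/coverage and volume) can be sketched as a small function; the keyword and query lists below are invented for illustration:

```python
def keyword_list_metrics(bid_keywords, observed_queries):
    """Volume = number of observed queries matched by the bid list;
    coverage = share of observed queries captured (a rough proxy
    for the yield metric described above)."""
    kw = {k.lower() for k in bid_keywords}
    volume = sum(1 for q in observed_queries if q.lower() in kw)
    return {"volume": volume,
            "coverage": volume / len(observed_queries)}

metrics = keyword_list_metrics(
    ["car insurance", "used cars"],
    ["Car Insurance", "cheap flights", "used cars", "used cars"],
)
```

A production version would also weight each match by its average financial value rather than counting matches uniformly.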

Environment: Erwin r, SQL Server 2000/2005, Windows XP/NT/2000, Oracle, MS-DTS, UML, UAT, SQL Loader, OOD, OLTP, PL/SQL, MS Visio, Informatica.
