Data Scientist Resume

Dallas, TX


  • 8+ years of IT experience as a Data Scientist, with deep expertise in statistical data analysis, including transforming business requirements into analytical models and designing algorithms and strategic solutions that scale across massive volumes of data.
  • Extensive experience in Text Analytics, developing statistical machine learning and data mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau.
  • Hands-on experience implementing LDA and Naïve Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
  • Familiar with Recommendation System design, having implemented Collaborative Filtering, Matrix Factorization, and Clustering methods.
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, developing data mining and reporting solutions that scale across a massive volume of structured and unstructured data.
  • Skilled in data parsing, data ingestion, data manipulation, data architecture, data modeling, and data preparation, with methods including describing data contents, computing descriptive statistics, regex, split and combine, remap, merge, subset, reindex, melt, and reshape.
  • Experienced with Natural Language Processing along with Topic modeling and Sentiment Analysis.
  • Strong skills in statistical methodologies such as A/B testing, experiment design, hypothesis testing, and ANOVA.
  • Extensively worked with Python 3.5/2.7 (NumPy, Pandas, Matplotlib, NLTK, and Scikit-learn).
  • Experience implementing data analysis with various analytic tools, such as Anaconda 4.0, Jupyter Notebook 4.x, R 3.0 (ggplot2, caret, dplyr), and Excel.
  • Solid ability to write and optimize diverse SQL queries, with working knowledge of RDBMS such as SQL Server 2008 and NoSQL databases such as MongoDB 3.2.
  • Skilled in Advanced Regression Modeling, Correlation, Multivariate Analysis, Model Building, Business Intelligence tools and application of Statistical Concepts.
  • Developed predictive models using Decision Tree, Random Forest, Naïve Bayes, Logistic Regression, Social Network Analysis, Cluster Analysis, and Neural Networks.
  • Experienced in Machine Learning and Statistical Analysis with Python Scikit-Learn.
  • Experienced in using Python to manipulate data for loading and extraction, and worked with Python libraries such as Matplotlib, NumPy, SciPy, and Pandas for data analysis.
  • Worked with tools such as R, SAS, MATLAB, and SPSS to develop neural networks and cluster analyses.
  • Strong SQL programming skills, with experience working with functions, packages, and triggers.
  • Experience and technical proficiency in designing and data modeling online applications, and as Solution Lead architecting Data Warehouse/Business Intelligence applications.
  • Good understanding of Teradata SQL Assistant, Teradata Administrator, and data load/export utilities such as BTEQ, FastLoad, MultiLoad, and FastExport.
  • Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, PivotTables and OLAP reporting.
  • Extensive experience in Data Visualization including producing tables, graphs, listings using various procedures and tools such as Tableau.
  • Experienced in Data Integration Validation and Data Quality controls for ETL process and Data Warehousing using MS Visual Studio SSIS, SSAS, and SSRS.
  • Automated recurring reports using SQL and Python and visualized them on BI platforms such as Tableau.
  • Excellent communication skills. Works successfully in fast-paced, multitasking environments, both independently and in collaborative teams; a self-motivated, enthusiastic learner.


Programming Languages: Java 8, Python, R, PowerShell

Databases: SQL, Hive, Pig, SQL Server, MySQL, MS SQL, MS Access, HDFS, HBase, Teradata, Netezza, MongoDB, Cassandra.

Statistical Software: SPSS, R, SAS.

Web Packages: Google Analytics, Adobe Test & Target, Web Trends

Statistical Methods: Time series, regression models, splines, confidence intervals, principal component analysis and dimensionality reduction, bootstrapping

BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports, Informatica 6.1.

Database Design Tools and Data Modeling: MS Visio, ERWIN 4.5/4.0, Star Schema/Snowflake Schema modeling, Fact & Dimension tables, physical & logical data modeling, Normalization and De-normalization techniques.

Big Data / Grid Technologies: Cassandra, Coherence, MongoDB, Zookeeper, Storm, Kafka, Hadoop.

Tools and Utilities: SQL Server Management Studio, SQL Server Enterprise Manager, SQL Server Profiler, DTS, Crystal Reports, Power Pivot, Microsoft Office, Excel Power Pivot, Excel Data Explorer, Tableau, JIRA, Spark MLlib.


Confidential, Dallas, TX.

Data Scientist


  • As an Architect, designed conceptual, logical, and physical models using Erwin and built data marts using hybrid Inmon and Kimball DW methodologies.
  • Defined accountability procedures governing data access, processing, storage, retention, reporting, and auditing to measure contract compliance.
  • Independently coded new programs and designed tables to load and test the programs effectively for the given POCs using Big Data/Hadoop.
  • Designed data models and data flow diagrams using Erwin and MS Visio.
  • As an Architect, implemented an MDM hub to provide clean, consistent data for an SOA implementation.
  • Developed, implemented, and maintained the conceptual, logical, and physical data models using Erwin for forward- and reverse-engineered databases.
  • Developed a fully automated continuous integration system using Git, Jenkins, MySQL and custom tools developed in Python and Bash.
  • Focused the project on customer clustering through ML and statistical modeling, including building predictive models and generating data products to support customer classification and segmentation.
  • Applied various artificial intelligence (AI)/machine learning algorithms and statistical modeling techniques, such as decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, and regression models.
  • Used Principal Component Analysis in feature engineering to analyze high dimensional data.
  • Performed data cleaning, feature scaling, and feature engineering using the Pandas and NumPy packages in Python, and built models using deep learning frameworks.
  • Developed a predictive causal model using annual failure rate and standard cost basis for the new bundled services.
  • Designed and developed analytics, machine learning models, and visualizations that drive performance and provide insights, from prototyping to production deployment, for product recommendation and allocation planning. Prototyped and experimented with ML algorithms and integrated them into production systems for different business needs.
  • Applied machine learning algorithms with Spark MLlib (standalone) and R/Python.
  • Worked on multiple datasets containing 2 billion values of structured and unstructured data about web application usage and online customer surveys.
  • Designed, built, and deployed a set of Python modeling APIs for customer analytics, which integrate multiple machine learning techniques for user behavior prediction and support multiple marketing segmentation programs.
  • Segmented customers based on demographics using K-means clustering.
  • Used classification techniques, including Random Forest and Logistic Regression, to quantify the likelihood of each user referring.
  • Designed and implemented end-to-end systems for data analytics and automation, integrating custom visualization tools using R, Tableau, and Power BI.
  • Led the development and presentation of a data analytics data-hub prototype with the help of other members of the emerging solutions team.
  • Built and published customized interactive reports and dashboards, and scheduled reports, using Tableau Server.
  • Used the Oracle External Tables feature to read data from flat files into Oracle staging tables.
  • Analyzed weblog data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most purchased product on the website, and managed and reviewed Hadoop log files.
  • Used Erwin 9.1 for effective model management, including sharing, dividing, and reusing model information and designs to improve productivity.
  • Designed and developed user interfaces and customization of Reports using Tableau and OBIEE and designed cubes for data visualization, mobile/web presentation with parameterization and cascading.
  • Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
  • Created SSIS Packages using Pivot Transformation, Execute SQL Task, Data Flow Task, etc. to import data into the data warehouse.
  • Developed and implemented SSIS, SSRS and SSAS application solutions for various business units across the organization.

Environment: Erwin 9.x, Teradata, Oracle 12c, Hadoop, HDFS, Pig, Hive, MapReduce, PL/SQL, UNIX, Informatica Power Center, MDM, SQL Server, Netezza, DB2, Tableau, Aginity, Architecture, SAS/Graph, SAS/SQL, SAS/Connect and SAS/Access.

Confidential, Austin, TX.

Data Scientist


  • Provided the architectural leadership in shaping strategic, business technology projects, with an emphasis on application architecture.
  • Utilized domain knowledge and application portfolio knowledge to play a key role in defining the future state of large, business technology programs.
  • Participated in all phases of data mining, including data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
  • Used Talend Open Studio to load files into Hadoop Hive tables and performed ETL aggregations in Hive.
  • Involved in Hadoop Name node metadata backups and load balancing as a part of Cluster Maintenance and Monitoring.
  • Used File System Check (FSCK) to check the health of files in HDFS and used Sqoop to import data from SQL Server to Cassandra.
  • Determined trends and significant data relationships using advanced statistical methods.
  • Implemented techniques such as forward selection, backward elimination, and the stepwise approach to select the most significant independent variables.
  • Applied feature selection and feature extraction (dimensionality reduction) methods to identify significant variables.
  • Used RMSE score, confusion matrix, ROC, cross-validation, and A/B testing to evaluate model performance in both simulated and real-world environments.
  • Performed Exploratory Data Analysis using R. Also involved in generating various graphs and charts for analyzing the data using Python Libraries.
  • Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, NLTK in Python for developing various machine learning algorithms and utilized machine learning algorithms such as linear regression, multivariate regression, naive Bayes, Random Forests, K-means, & KNN for data analysis.
  • Conducted studies, produced rapid plots, and used advanced data mining and statistical modeling techniques to build a solution that optimizes the quality and performance of data.
  • Demonstrated experience in the design and implementation of statistical models, predictive models, enterprise data models, metadata solutions, and data life cycle management in both RDBMS and Big Data environments.
  • Analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models by leveraging best-in-class modeling techniques.
  • Worked on database design, relational integrity constraints, OLAP, OLTP, Cubes and Normalization (3NF) and De-normalization of the database.
  • Developed Linux Shell scripts by using NZSQL/NZLOAD utilities to load data from flat files to Netezza database.
  • Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.

Environment: Erwin r9.6, Python, SQL, Oracle 12c, Netezza, SQL Server, Informatica, Java, SSRS, PL/SQL, T-SQL, Tableau, MLlib, regression, Cluster analysis, Scala NLP, Spark, Kafka, MongoDB, logistic regression, Hadoop, Hive, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML, Cassandra, MapReduce, AWS.

Confidential - Eden Prairie, MN

Data Analyst


  • Communicated and coordinated with the end client to collect data and performed ETL to define a uniform standard format.
  • Queried and retrieved data from the SQL Server database to obtain the sample dataset.
  • In the preprocessing phase, used Pandas to handle missing data, cast data types, and merge or group tables for the EDA process.
  • Used PCA and other Scikit-learn preprocessing techniques, including feature engineering, feature normalization, and label encoding, to reduce the high-dimensional data (>150 features).
  • In the data exploration stage, used correlation analysis and graphical techniques in Matplotlib and Seaborn to gain insights into the patient admission and discharge data.
  • Experimented with predictive models including Logistic Regression, Support Vector Machine (SVC), and Random Forest provided by Scikit-learn, as well as XGBoost, LightGBM, and neural networks built with Keras, to predict showing probability and visit counts.
  • Designed and implemented cross-validation schemes (k-fold, stratified k-fold, and hold-out) and statistical tests to test and verify the models' significance.
  • Implemented, tuned and tested the model on AWS Lambda with the best performing algorithm and parameters.
  • Implemented a hypothesis testing kit for sparse sample data by writing R packages.
  • Collected feedback after deployment and retrained the model to improve performance.
  • Designed, developed and maintained daily and monthly summary, trending and benchmark reports in Tableau Desktop.

Environment: SQL Server 2012/2014, AWS EC2, AWS Lambda, AWS S3, AWS EMR, Linux, Python 3.x (Scikit-learn, NumPy, Pandas, Matplotlib), R, Machine Learning algorithms, Tableau.

Confidential - New York, NY

Data Engineer


  • Conducted one-to-one sessions with business users to gather data for Data Warehouse requirements.
  • Part of the team analyzing database requirements in detail with the project stakeholders through Joint Requirements Development (JRD) sessions.
  • Developed an object model in UML for the Conceptual Data Model using Enterprise Architect.
  • Developed logical and Physical data models using Erwin to design the OLTP system for different applications.
  • Facilitated transition of logical data models into the physical database design and recommended technical approaches for good data management practices.
  • Worked with the DBA group to create a best-fit physical data model with DDL from the logical data model using forward engineering.
  • Involved in the detailed design of data marts using Star Schema and planned data marts involving shared dimensions.
  • Used the Model Manager option in Erwin to synchronize the data models using the ModelMart approach.
  • Gathered various reporting requirements from Business Analysts.
  • Worked on enhancements to the Data Warehouse model using Erwin as per the business reporting requirements.
  • Reverse engineered the reports and identified the data elements (in the source system), dimensions, facts, and measures required for reports.
  • Worked with the ETL team to document the transformation rules for data migration from OLTP to Warehouse environment for reporting purposes.
  • Extensive system study, design, development and testing were carried out in the Oracle environment to meet the customer requirements.
  • Developed Data Migration and Cleansing rules for the Integration Architecture (OLTP, ODS, DW).
  • Used Teradata utilities such as FastExport and MultiLoad for handling various tasks.
  • Involved in migration projects to move data from data warehouses on Oracle/DB2 to Teradata.
  • Developed data mapping documents between Legacy, Production, and User Interface Systems.
  • Generated comprehensive analytical reports by running SQL Queries against current databases to conduct data analysis.
  • Developed and maintained a Data Dictionary to create metadata reports for technical and business purposes using Erwin report designer.
  • Generated ad-hoc reports using Crystal Reports 9 and SQL Server Reporting Services (SSRS).

Environment: Erwin r9.6, DB2, Teradata, SQL Server 2008, Informatica 8.1, Enterprise Architect, Power Designer, MS SSAS, Crystal Reports, SSRS, ER Studio, Lotus Notes, Windows XP, MS Excel, Word and Access.


Data Analyst


  • Developed business reports by writing complex SQL queries using views and volatile tables.
  • Automated and scheduled Teradata SQL scripts in UNIX using Korn shell scripting.
  • Wrote several Teradata SQL queries using Teradata SQL Assistant for ad hoc data pull requests.
  • Implemented indexes, statistics collection, and constraints while creating tables.
  • Created action filters, parameters and calculated sets for preparing dashboards and worksheets in Tableau.
  • Built and published customized interactive reports and dashboards, and scheduled reports, using Tableau Server.
  • Designed and deployed rich graphic visualizations with drill-down, drop-down menu, and parameterized options using Tableau.
  • Created side-by-side bars, scatter plots, stacked bars, heat maps, filled maps, and symbol maps according to deliverable specifications.

Environment: Oracle 9i, MS-Office, Teradata, Tableau 6.1.10.


Data Analyst


  • Worked on the development of data warehouse, Data Lake, and ETL systems using relational and non-relational tools such as SQL and NoSQL.
  • Involved in data mining, transformation, and loading from the source systems to the target system.
  • Supported business areas and database platforms to ensure that logical data model and database design, creation, and generation follow enterprise standards, processes, and procedures.
  • Generated a variety of metadata artifacts.
  • Designed database solutions for applications, including all required database design components and artifacts.
  • Participated in all phases of data mining, including data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and Smart View.
  • Implemented Agile Methodology for building an internal application.
  • Provided maintenance support in the testing team for system, integration, and UAT testing.

Environment: Data Lake, ETL, SQL, NoSQL, UAT, data collection, data cleaning, developing models, validation.
