Data Scientist Resume

Temple, TX

PROFESSIONAL SUMMARY:

  • Around 8 years of experience in Machine Learning, Data mining with large datasets of Structured and Unstructured data, Data Acquisition, Data Validation, Predictive modeling, Data Visualization.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python to develop machine learning models, applying algorithms such as Linear Regression, Multivariate Regression, Naive Bayes, Random Forests, K-Means, and KNN for data analysis.
  • Responsible for the design and development of advanced R/Python programs to prepare, transform, and harmonize data sets for modeling.
  • Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
  • Proficient in Statistical Modeling and Machine Learning techniques (Linear and Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian methods, XGBoost) for Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor analysis/PCA, and Ensembles.
  • Expertise in transforming business requirements into Analytical Models, Designing Algorithms, Building Models, and Developing Data Mining and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Developed Logical Data Architecture with adherence to Enterprise Architecture.
  • Strong experience in Software Development Life Cycle (SDLC) including Requirements Analysis, Design Specification and Testing as per Cycle in both Waterfall and Agile methodologies.
  • Adept in statistical programming languages such as R and Python, and in Big Data technologies such as Hadoop and Hive.
  • Skilled in using dplyr in R and pandas in Python for exploratory data analysis.
  • Experience working with data modeling tools such as Erwin, PowerDesigner, and ER Studio.
  • Experience in designing star and snowflake schemas for Data Warehouse and ODS architectures.
  • Experience in designing visualizations with Tableau and in publishing and presenting dashboards and storylines on web and desktop platforms.
  • Improved fraud prediction performance by using random forest and gradient boosting for feature selection with Python Scikit-learn.
  • Designed and implemented system architecture for Amazon EC2 based cloud-hosted solution for the client.
  • Analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models using best-in-class modeling techniques.
  • Wrote Python modules to extract/load asset data from the MySQL source database.
  • Highly skilled in using Hadoop (Pig and Hive) for basic analysis and extraction of data in the infrastructure to provide data summarization.
  • Highly skilled in using visualization tools such as Tableau, ggplot2, and D3.js for creating dashboards.
  • Extracted data from various database sources such as Oracle, SQL Server, and DB2; regularly used JIRA and other internal issue trackers for project development.
  • Skilled in System Analysis, E-R/Dimensional Data Modeling, Database Design and implementing RDBMS specific features.
  • Experience with Proofs of Concept (PoCs) and gap analysis; gathered the necessary data from different sources and prepared it for exploration using data munging and Teradata.
  • Well experienced in Normalization & De-Normalization techniques for optimum performance in relational and dimensional database environments.

TECHNICAL SKILLS:

Scripting/programming language: R (dplyr, ggplot2, shiny, plotly), Python (NumPy, SciPy, Pandas, Scikit-learn, Matplotlib, NLTK, Beautiful Soup, Selenium, Python IDE), PySpark

Machine learning/Deep learning: Classification, Regression (Linear, Logistic, Elastic Net), Clustering analyses using neural nets (MLP), RF, KNN, SVM, GLM, MLR, Logit, K-means algorithms

Database management systems: RDBMS (Microsoft SQL Server, Oracle DB, Teradata)

Big Data: MySQL, Spark, Hadoop/MapReduce, Hive, Impala

Statistical Analysis Tools: SAS Studio, SAS Enterprise Guide, SAS Enterprise Miner, Python, R, ggplot2, dplyr, cart, scipy, sklearn

Data storage/processing framework: Hadoop and Spark

Data visualization/reporting: Tableau, Power BI and shiny

Operating System: Windows, Unix

Case Tools: Erwin and ER Studio

PROFESSIONAL EXPERIENCE:

Confidential, Temple, TX

Data Scientist

Roles & Responsibilities

  • Worked with various teams on a project to develop an analytics-based solution specifically targeting roaming subscribers.
  • Led a team of 4 data analysts and created multi-dimensional segmentation.
  • Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, Space-Time.
  • Coded R functions to interface with the Caffe deep learning framework.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python to develop various machine learning algorithms.
  • Combined travel prediction and multi-dimensional segmentation to enable operators to conduct highly targeted and personalized roaming services campaigns, leading to significant subscriber uptake.
  • Installed and used the Caffe deep learning framework.
  • Scaled up Machine Learning pipelines to 4,600 processors and 35,000 GB of memory, achieving 5-minute execution.
  • Deployed GUI pages by using JSP, JSTL, HTML, DHTML, XHTML, CSS, JavaScript, AJAX.
  • Developed Python, PySpark, and Hive scripts to filter, map, and aggregate data; used Sqoop to transfer data to and from Hadoop.
  • Configured the project on WebSphere 6.1 application servers.
  • Developed a Machine Learning test-bed with 24 different model learning and feature learning algorithms.
  • Through a thorough systematic search, demonstrated performance surpassing the state of the art (deep learning).
  • Developed on-disk, very large (100GB+), highly complex Machine Learning models.
  • Used SAX and DOM parsers to parse raw XML documents.
  • Used RAD as Development IDE for web applications.
  • Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction; used the resulting engine to increase user lifetime by 45% and triple user conversations for target categories.
  • Used Spark DataFrames, Spark SQL, and Spark MLlib extensively; designed and developed PoCs using Scala, Spark SQL, and MLlib.
  • Redesigned interactive visualization graphs in D3.js.
  • Used Data Quality validation techniques to validate Critical Data Elements (CDE) and identified various anomalies.
  • Worked extensively with the Erwin Data Modeler tool to design data models.
  • Developed various QlikView data models by extracting and using data from various source files, DB2, Excel, flat files, and Big Data.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization; also performed gap analysis.
  • Designed both 3NF data models for ODS and OLTP systems and dimensional data models using star and snowflake schemas.
  • Updated Python scripts to match training data with our database stored in AWS CloudSearch, so that each document could be assigned a response label for further classification.
  • Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
  • Designed and developed Use Case Diagrams, Activity Diagrams, Sequence Diagrams, and Object-Oriented Design (OOD) using UML and Visio.
  • Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.

Environment: AWS, R, Informatica, Python, HDFS, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, MapReduce, Rational Rose, SQL, and MongoDB.

Confidential, San Antonio, TX

Data Scientist

Roles & Responsibilities

  • Performed Data Profiling to learn about behavior across features such as traffic pattern, location, date, and time.
  • Applied various machine learning algorithms and statistical models, such as decision trees, regression models, neural networks, SVM, and clustering, to identify volume using the scikit-learn package in Python and Matlab.
  • Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction; used the resulting engine to increase user lifetime by 45% and triple user conversations for target categories.
  • Developed Spark/Scala and Python code for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows for big data resources. Used K-Means clustering to identify outliers and to classify unlabeled data.
  • Evaluated models using Cross Validation, Log loss function, ROC curves and used AUC for feature selection.
  • Analyzed traffic patterns by calculating autocorrelation at different time lags.
  • Ensured that the model had a low False Positive Rate.
  • Addressed overfitting by applying regularization methods such as L1 and L2.
  • Used Principal Component Analysis in feature engineering to analyze high dimensional data.
  • Created and designed reports that use gathered metrics to infer and draw logical conclusions about past and future behavior.
  • Applied Multinomial Logistic Regression, Random Forest, Decision Tree, and SVM models to classify whether a package would be delivered on time on a new route.
  • Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database.
  • Used MLlib, Spark's machine learning library, to build and evaluate different models.
  • Implemented a rule-based expert system from the results of exploratory analysis and information gathered from people in different departments.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
  • Developed MapReduce pipeline for feature extraction using Hive.
  • Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data. Created various types of data visualizations using Python and Tableau.
  • Communicated the results to the operations team to support decision-making.
  • Collected data needs and requirements by interacting with other departments.
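
The lagged-autocorrelation analysis mentioned above can be illustrated with a minimal sketch. It is not the code used on the project: the function name `autocorrelation` and the `traffic` values are hypothetical, and only the Python standard library is assumed.

```python
from statistics import fmean


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative integer `lag`."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = fmean(series)
    # Denominator: total sum of squared deviations from the mean.
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Numerator: co-movement between the series and itself shifted by `lag`.
    num = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return num / denom


# Hypothetical traffic counts with a repeating 4-step pattern (illustrative only).
traffic = [10, 20, 30, 20] * 6

for lag in (1, 2, 4):
    print(lag, round(autocorrelation(traffic, lag), 3))
```

A value near 1 at a given lag suggests the pattern repeats with that period, while a strongly negative value suggests the series tends to be on opposite sides of its mean at that spacing.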

Environment: Python 2.x, CDH5, HDFS, Hadoop 2.3, Hive, Impala, Linux, Spark, Tableau Desktop, SQL Server 2012, Microsoft Excel, Matlab, Spark SQL, PySpark.

Confidential, Albany, NY

Data Analytics Engineer/Data Scientist

Roles & Responsibilities

  • Provided Configuration Management and Build support for more than 5 different applications, built and deployed to the production and lower environments.
  • Implemented public segmentation with unsupervised machine learning, using the k-means algorithm in PySpark.
  • Explored and extracted data from source XML in HDFS, preparing data for exploratory analysis using data munging.
  • Responsible for data mapping activities from source systems to Teradata.
  • Imported data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Used R and Python for exploratory data analysis, A/B testing, ANOVA, and hypothesis testing to compare and identify the effectiveness of Creative Campaigns.
  • Created clusters to classify Control and test groups and conducted group campaigns.
  • Analyzed and calculated the lifetime cost of everyone in the welfare system using 20 years of historical data.
  • Developed Linux shell scripts using NZSQL/NZLOAD utilities to load data from flat files into the Netezza database.
  • Developed triggers, stored procedures, functions, and packages using cursors and ref cursor concepts associated with the project, in PL/SQL.
  • Created various types of data visualizations using R, Python, and Tableau.
  • Used Python, R, SQL to create Statistical algorithms involving Multivariate Regression, Linear Regression, Logistic Regression, PCA, Random forest models, Decision trees, Support Vector Machine for estimating the risks of welfare dependency.
  • Identified and targeted welfare high-risk groups with Machine learning algorithms.
  • Conducted campaigns and ran real-time trials to quickly determine what works and to track the impact of different initiatives.
  • Developed Tableau visualizations and dashboards using Tableau Desktop.
  • Used graphical Entity-Relationship diagramming to create new database designs through an easy-to-use graphical interface.
  • Created multiple custom SQL queries in Teradata SQL Workbench to prepare the right datasets for Tableau dashboards.
  • Performed analyses such as regression analysis, logistic regression, discriminant analysis, and cluster analysis using SAS programming.
  • Used a metadata tool for importing metadata from the repository, adding new job categories, and creating new data elements.
  • Scheduled the task for weekly updates and running the model in workflow. Automated the entire process flow in generating the analysis and reports.
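
As an illustration of the A/B hypothesis testing used to compare campaign effectiveness, here is a minimal sketch of a two-sided two-proportion z-test. It is an assumed approach, not the project's actual code: the function name `two_proportion_z_test` and the response counts are hypothetical, and only the Python standard library is used.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two response rates.

    conv_a / n_a: responders and group size for the control group.
    conv_b / n_b: responders and group size for the test group.
    Returns (z, p_value).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal, via the complementary
    # error function: p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical campaign counts (illustrative only, not project data).
z, p = two_proportion_z_test(conv_a=200, n_a=1000, conv_b=260, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value would suggest the difference in response rates between the control and test groups is unlikely to be due to chance alone; the normal approximation assumes reasonably large groups.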

Environment: R 3.x, HDFS, Hadoop 2.3, Pig, Hive, Linux, R-Studio, Tableau 10, SQL Server, MS Excel, PySpark.

Confidential, Pittsburgh, PA

Data Analyst

Roles & Responsibilities

  • Analyzed data sources and requirements and business rules to perform logical and physical data modeling.
  • Analyzed and designed best-fit logical and physical data models and relational database definitions using DB2. Generated reports of data definitions.
  • Involved in Normalization/De-normalization, Normal Form and database design methodology.
  • Maintained existing ETL procedures, fixed bugs and restored software to production environment.
  • Developed the code as per the client's requirements using SQL, PL/SQL and Data Warehousing concepts.
  • Involved in Dimensional modeling (Star Schema) of the Data warehouse and used Erwin to design the business process, dimensions and measured facts.
  • Worked with Data Warehouse Extract and load developers to design mappings for Data Capture, Staging, Cleansing, Loading, and Auditing.
  • Developed enterprise data model management process to manage multiple data models developed by different groups
  • Designed and created Data Marts as part of a data warehouse.
  • Wrote complex SQL queries for validating the data against different kinds of reports generated by Business Objects XI R2.
  • Used the Erwin modeling tool to publish a data dictionary, review the model and dictionary with subject matter experts, and generate data definition language (DDL).
  • Coordinated with the DBA to implement database changes and to update data models with changes implemented in Development, QA, and Production. Worked extensively with the DBA and Reporting team to improve report performance through appropriate indexes and partitioning.
  • Developed Data Mapping, Transformation, and Cleansing rules for the Master Data Management Architecture involving OLTP, ODS, and OLAP.
  • Tuned and optimized code using techniques such as dynamic SQL, dynamic cursors, SQL query tuning, and writing generic procedures, functions, and packages.
  • Experienced in GUI, Relational Database Management System (RDBMS), designing of OLAP system environment as well as Report Development.
  • Extensively used SQL, T-SQL, and PL/SQL to write stored procedures, functions, packages, and triggers.
  • Prepared data analysis reports weekly, biweekly, and monthly using MS Excel, SQL, and UNIX.

Environment: ER Studio, Informatica Power Center 8.1/9.1, Power Connect/Power Exchange, Oracle 11g, Mainframes, DB2, MS SQL Server 2008, SQL, PL/SQL, XML, Windows NT 4.0, Tableau, Workday, SPSS, SAS, Business Objects, Unix Shell Scripting, Teradata, Netezza, Aginity.

Confidential

Data Analyst

Roles & Responsibilities

  • Gathered and articulated business requirements from user interviews and converted them into technical specifications. Communicated effectively with SMEs to gather the requirements.
  • Applied regression to Safety Stock and Inventory Analysis using R and created data visualizations using Tableau and R.
  • Used SQL to retrieve data from the Oracle database for data analysis and visualization and performed Inventory Analysis with Statistical and Data Visualization Tools.
  • Followed the RUP based methods using Rational Rose to create Use Cases, Activity Diagrams / State Chart Diagrams, Sequence Diagrams.
  • Designed different types of star schemas for detailed data marts and plan data marts in the OLAP environment.
  • Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, KNN, Naive Bayes.
  • Involved with Data Analysis primarily Identifying Data Sets, Source Data, Source Meta Data, Data Definitions and Data Formats.
  • Performed Decision Tree and Random Forest analysis for strategic planning and forecasting; manipulated and cleaned data using the dplyr and tidyr packages in R.
  • Involved in data analysis and creating data mapping documents to capture source to target transformation rules.
  • Created or modified T-SQL queries per business requirements and worked on creating role-playing dimensions, factless fact tables, and snowflake and star schemas.
  • Wrote, executed, and performance-tuned SQL queries for data analysis and profiling, including complex queries using joins, subqueries, and correlated subqueries.
  • Involved in development and implementation of SSIS, SSRS and SSAS application solutions for various business units across the organization.
  • Developed mappings to load Fact and Dimension tables, SCD Type 1 and SCD Type 2 dimensions and Incremental loading and unit tested the mappings.
  • Wrote test cases, developed Test scripts using SQL and PL/SQL for UAT.
  • Worked with the ETL team to document the Transformation Rules for Data Migration from OLTP to Warehouse Environment for reporting purposes.
  • Transferred data from various OLTP data sources, such as Oracle, MS Access, MS Excel, Flat files, CSV files into SQL Server.
  • Performed data testing, tested ETL mappings (Transformation logic), tested stored procedures, and tested the XML messages.
  • Created Use cases, activity report, logical components to extract business process flows and workflows involved in the project using Rational Rose, UML and Microsoft Visio.

Environment: R, SQL, Tableau, SSRS, Oracle, T-SQL, UNIX Shell Scripting, DB2.
