Sr. Data Scientist/Machine Learning Engineer Resume

Reston, VA


  • Around 8 Years of experience in Machine Learning, Data mining with large datasets of Structured and Unstructured data, Data Acquisition, Data Validation, Predictive modeling, Data Visualization.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python to develop various machine learning algorithms, and applied algorithms such as Linear Regression, Multivariate Regression, Naive Bayes, Random Forests, K-Means, and KNN for data analysis.
  • Responsible for design and development of advanced R/Python programs to prepare, transform, and harmonize data sets in preparation for modeling.
  • Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, neural networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
  • Proficient in Statistical Modeling and Machine Learning techniques (Linear, Logistic, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian, XGBoost) in Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor analysis/PCA, and Ensembles.
  • Expertise in transforming business requirements into Analytical Models, Designing Algorithms, Building Models, and Developing Data Mining and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Developed Logical Data Architecture with adherence to Enterprise Architecture.
  • Strong experience in Software Development Life Cycle (SDLC) including Requirements Analysis, Design Specification and Testing as per Cycle in both Waterfall and Agile methodologies.
  • Adept in statistical programming languages like R and Python, and in Big Data technologies like Hadoop and Hive.
  • Skilled in using dplyr in R and pandas in Python for performing exploratory data analysis.
  • Experience working with data modeling tools like Erwin, Power Designer and ERStudio.
  • Experience in designing star schema and snowflake schema for Data Warehouse and ODS architecture.
  • Experience in designing visualizations using Tableau software and in publishing and presenting dashboards and Storylines on web and desktop platforms.
  • Improved fraud prediction performance by using random forest and gradient boosting for feature selection with Python Scikit-learn.
  • Designed and implemented system architecture for Amazon EC2 based cloud-hosted solution for the client.
  • Analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models using best-in-class modelling techniques.
  • Wrote Python modules to extract/load asset data from the MySQL source database.
  • Highly skilled in using Hadoop (Pig and Hive) for basic analysis and extraction of data in the infrastructure to provide data summarization.
  • Highly skilled in using visualization tools like Tableau, ggplot2, and D3.js for creating dashboards.
  • Extracted data from various database sources such as Oracle, SQL Server, and DB2; regularly used JIRA and other internal issue trackers for project development.
  • Skilled in System Analysis, E-R/Dimensional Data Modeling, Database Design and implementing RDBMS specific features.
  • Worked on proofs of concept (PoCs) and gap analysis; gathered necessary data for analysis from different sources and prepared it for data exploration using data munging and Teradata.
  • Well experienced in Normalization & De-Normalization techniques for optimum performance in relational and dimensional database environments.


Scripting/programming language: R (dplyr, ggplot2, shiny, plotly), Python (NumPy, SciPy, Pandas, Scikit-learn, Matplotlib, NLTK, Beautiful Soup, Selenium, Python IDE), PySpark

Machine learning/Deep learning: Classification, Regression (Linear, Logistic, Elastic Net), Clustering analyses using neural nets (MLP), RF, KNN, SVM, GLM, MLR, Logit, K-means algorithms

Database management systems: RDBMS (Microsoft SQL server, Oracle DB, Teradata)

Big Data: MySQL, Spark, Hadoop/MapReduce, Hive, Impala

Statistical Analysis Tools: SAS Studio, SAS Enterprise Guide, SAS Enterprise Miner, Python, R, ggplot2, dplyr, cart, scipy, sklearn

Data storage/processing framework: Hadoop and Spark

Data visualization/reporting: Tableau, Power BI and shiny

Operating System: Windows, Unix

Case Tools: Erwin & ERStudio


Confidential, Reston, VA

Sr. Data Scientist/Machine Learning Engineer


  • Worked with various teams on a project to develop an analytics-based solution targeting roaming subscribers specifically.
  • Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, Space-Time.
  • Coded R functions to interface with the Caffe deep learning framework.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python for developing various machine learning algorithms.
  • Combination of these elements (travel prediction & multi-dimensional segmentation) would enable operators to conduct highly targeted and personalized roaming services campaigns leading to significant subscriber uptake.
  • Installed and used the Caffe deep learning framework.
  • Scaled up machine learning pipelines to 4,600 processors and 35,000 GB of memory, achieving 5-minute execution.
  • Deployed GUI pages by using JSP, JSTL, HTML, DHTML, XHTML, CSS, JavaScript, AJAX.
  • Developed Python, PySpark, and Hive scripts to filter, map, and aggregate data; used Sqoop to transfer data to and from Hadoop.
  • Configured the project on WebSphere 6.1 application servers
  • Developed a Machine Learning test-bed with 24 different model learning and feature learning algorithms.
  • By thorough systematic search, demonstrated performance surpassing the state-of-the-art (deep learning).
  • Developed in-disk, huge (100GB+), highly complex Machine Learning models.
  • Used SAX and DOM parsers to parse the RAW XML documents
  • Used RAD as Development IDE for web applications.
  • Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction; used the resulting engine to increase user lifetime by 45% and triple user conversations for target categories.
  • Used Spark DataFrames, Spark SQL, and Spark MLlib extensively; designed and developed PoCs using Scala, Spark SQL, and MLlib.
  • Redesigned Interactive Visualization graphs in D3.js
  • Used data quality validation techniques to validate Critical Data Elements (CDE) and identified various anomalies.
  • Worked extensively with the Erwin Data Modeler tool to design data models.
  • Developed various QlikView data models by extracting and using data from various source files, DB2, Excel, flat files, and Big Data.
  • Participated in all phases of data mining, data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
  • Designed both 3NF data models for ODS and OLTP systems and dimensional data models using star and snowflake schemas.
  • Updated Python scripts to match training data with our database stored in AWS CloudSearch, so that each document could be assigned a response label for further classification.
  • Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
  • Designed and developed Use Case, Activity, and Sequence Diagrams and OOD (Object-Oriented Design) using UML and Visio.
  • Interacted with business analysts, SMEs, and other data architects to understand business needs and functionality for various project solutions.

Environment: AWS, R, Informatica, Python, HDFS, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, MapReduce, Rational Rose, SQL, and MongoDB.

Confidential, Plano, TX

Data Scientist/Data analyst


  • Developed the prediction model for crop yield, based on different kinds of field, weather and imagery data.
  • Performed exploratory data analysis and feature engineering to best fit the regression model.
  • Designed a static pipeline in MS Azure for data ingestion and dashboarding. Used MS ML Studio for modeling and MS Power BI for dashboarding.
  • Analyze large datasets to provide strategic direction to the company.
  • Perform quantitative analysis of product sales trends to recommend pricing decisions.
  • Conduct cost and benefits analysis on new ideas.
  • Developed models in Scala and Spark, including user models, prediction models, and sequential algorithms.
  • Used Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn, SciPy, and NLTK in Python for developing various machine learning algorithms.
  • Scrutinize and track customer behavior to identify trends and unmet needs.
  • Develop statistical models to forecast inventory and procurement cycles.
  • Assist in developing internal tools for Data Analysis.
  • Advising on the suitability of methodologies and suggesting improvements.
  • Carrying out specified data processing and statistical techniques.
  • Supplying qualitative and quantitative data to colleagues & clients.
  • Using Informatica and SAS to extract, transform, and load source data from transaction systems.
  • Creating data pipelines using big data technologies like Hadoop, PySpark, etc.
  • Familiar with the Hadoop cluster environment and its resource-management configuration for analysis work; used Python, PySpark, and Hive for analytics and dashboard development.
  • Creating statistical models using distributed and standalone models to build various diagnostic, predictive, and prescriptive solutions.
  • Utilize a broad variety of statistical packages like SAS, R, MLlib, Graphs, Hadoop, Spark, MapReduce, Pig, and others.
  • Refine and train models based on domain knowledge and customer business objectives
  • Deliver or collaborate on delivering effective visualizations to support the client's objectives.
  • Produce solid and effective strategies based on accurate and meaningful data reports and analysis and/or keen observations.
  • Used Teradata utilities such as FastExport and MultiLoad (MLOAD) for data migration/ETL tasks from OLTP source systems to OLAP target systems.
  • Developed web applications using .NET technologies; worked on bug fixes and issues arising in the production environment and resolved them promptly.

Environment: Power BI, MS Azure, HDFS, SAS, Python, PySpark, Informatica, MapReduce, Pig, Hive, Unix, OLAP, OLTP, ODS, NLTK, XML, JSON, etc.

Confidential, Boston, MA

Data Engineer/Data analyst


  • Developed scalable machine learning solutions within a distributed computation framework (e.g. Hadoop, Spark, Storm etc.).
  • Utilizing NLP applications such as topic models and sentiment analysis to identify trends and patterns within massive data sets.
  • Created automated anomaly detection systems and continuously tracked their performance; strong command of data architecture and data modelling techniques.
  • Knowledge in ML & Statistical libraries (e.g. Scikit-learn, Pandas).
  • Knowledge of building predictive models to forecast risks for product launches and operations and to help predict workflow and capacity requirements for TRMS operations.
  • Experience with visualization technologies such as Tableau.
  • Draw inferences and conclusions, and create Dashboards and Visualizations of processed data, identify trends, anomalies
  • Generation of TLFs and summary reports, etc. ensuring on-time quality delivery.
  • Participated in client meetings, teleconferences, and video conferences to keep track of project requirements, commitments made, and their delivery. Solved analytical problems and effectively communicated methodologies and results.
  • Worked closely with internal stakeholders such as business teams, product managers, engineering teams, and partner teams.
  • Data mining using state-of-the-art methods
  • Extending company’s data with third party sources of information when needed
  • Enhancing data collection procedures to include information that is relevant for building analytic systems
  • Processing, cleansing, and verifying the integrity of data used for analysis
  • Doing ad-hoc analysis and presenting results in a clear manner.
  • Created automated metrics using complex databases.
  • Fostered a culture of continuous engineering improvement through mentoring, feedback, and metrics.

Environment: HDFS, Hadoop, Tableau, Scikit-learn, Pandas, Spark, Storm, etc.

Confidential, Boston, MA

Big Data Analyst/Engineer


  • Worked with Ajax API calls to communicate with Hadoop through an Impala connection and SQL to render the required data. These API calls are similar to Microsoft Cognitive API calls.
  • Good grip on Cloudera and HDP ecosystem components.
  • Used Elasticsearch (Big Data) to retrieve data into the application as required.
  • Ran MapReduce programs on the cluster.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Performed statistical modelling with ML to bring insights from data under the guidance of the Principal Data Scientist; data modeling with Pig, Hive, and Impala.
  • Ingestion with Sqoop, Flume.
  • Understanding and implementation of text mining concepts, graph processing and semi structured and unstructured data processing.
  • Analyzed the partitioned and bucketed data and computed various metrics for reporting.
  • Involved in loading data from RDBMS and web logs into HDFS using Sqoop and Flume.
  • Worked on loading the data from MySQL to HBase where necessary using Sqoop.
  • Developed Hive queries for Analysis across different banners.
  • Extracted data from Twitter using Java and the Twitter API. Parsed JSON-formatted Twitter data and uploaded it to the database.
  • Launching Amazon EC2 Cloud Instances using Amazon Images (Linux/ Ubuntu) and Configuring launched instances with respect to specific applications.
  • Exported the result set from Hive to MySQL using Sqoop after processing the data.
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
  • Hands-on experience working with Sequence files, Avro, and HAR file formats and compression.
  • Used Hive to partition and bucket data.
  • Experience in writing MapReduce programs with Java API to cleanse Structured and unstructured data.
  • Wrote Pig Scripts to perform ETL procedures on the data in HDFS.
  • Created HBase tables to store various data formats of data coming from different portfolios.
  • Worked on improving performance of existing Pig and Hive Queries.

Environment: HDFS, HBase, Flume, Sqoop, Pig, Hive, Impala, Java API, MapReduce, Elasticsearch, Amazon EC2, etc.

Confidential, Hopkinton, MA

Engineer - Data Analytics


  • Participated in all phases of research including requirement gathering, data cleaning, data mining, developing model and visualization.
  • Collaborated with Data analyst and others to get insights and understanding of the data.
  • Used R to manipulate and analyze data for the solution. Packages were used for text mining.
  • Performed Data Mining, Data Analytics, Data Collection, and Data Cleaning.
  • Developing Models, Validation, Visualization and performed Gap analysis.
  • Cleansed and transformed the data by treating Outliers, Imputing the missing values.
  • Used predictive analysis to create models of customer behavior that are correlated positively with historical data and use these models to forecast future results.
  • Translated business needs into mathematical models and algorithms and built machine learning algorithms.
  • Tried and implemented multiple models to evaluate predictions and performance.
  • Applied various machine learning algorithms and statistical models, such as Logistic Regression, Decision Trees, and SVM, to identify volume, using R and the Scikit-learn package in Python.
  • Applied boosting to the predictive model to improve its efficiency.
  • Improved efficiency and accuracy by evaluating the model in R.
  • The model achieved 87.4% prediction accuracy.
  • Documented the visualizations and results and submitted to HR management.
  • Presented Dashboards to Higher Management for more Insights using Power BI and Tableau.
  • Working knowledge of MapReduce coding, including Java, Python, Pig programming, Hadoop Streaming, Hive for data analysis of production applications.

Environment: R/RStudio, Python, Tableau, MS SQL Server 2005/2008, MS Access.

Confidential, Washington, DC.

Associate Data Analyst


  • Performed exploratory analysis on research data and transformed raw information into insight and provided recommendations using Tableau, Advanced Excel Macros, PL/SQL and Omniture analytics tool.
  • Derived insights from machine learning algorithms using SAS to analyze web log files and campaign data to recommend and improve promotional opportunities.
  • Tuned PL/SQL query procedures and scheduled cron jobs on an Apache server to update and enhance business logic and processes. Strong SQL skills, with the ability to write complex SQL statements that analyze data and create prototypes.
  • Interacted with clients for system study, requirements gathering, business analysis and scoping for modification in the existing system
  • Developed an automated report generation tool that improved efficiency by 50% and increased revenue by 80%, using technologies such as Java, Python, and shell scripting on an Apache httpd Linux server.
  • Created SQL procedures and functions, and performed query performance tuning and index optimization for database efficiency.
  • Familiarity with several software delivery methodologies (RUP/Agile/Waterfall). Strong knowledge of requirement gathering, use case analysis and user acceptance testing.
  • Troubleshooting, tracking and investigating data related issues and providing KPIs and campaign end metrics with actionable insights using SAS and Tableau.

Environment: PL/SQL, Tableau, Java, Python, Shell scripting, SAS etc.
