
Machine Learning/Data Scientist Resume

O’Fallon, MO


  • Experience in Data Science and Analytics, including Machine Learning, Data Mining, Data Blending, and Statistical Analysis.
  • Over 7 years’ experience with Machine Learning techniques and algorithms (such as k-NN, Naive Bayes, etc.).
  • Experience in AWS (Amazon Web Services): EC2, VPC, IAM, S3, CloudFront, CloudWatch, CloudFormation, Glacier, RDS, Config, Route 53, SNS, SQS, ElastiCache.
  • Extensive full-cycle Azure cloud experience with Big Data, Elasticsearch and Solr, and Machine Learning and Deep Learning development and deployment.
  • Experience in implementing data analysis with various analytic tools, such as Anaconda 4.0, R 3.0 (ggplot2, Caret, dplyr) and Excel.
  • Solid ability to write and optimize diverse SQL queries; working knowledge of RDBMSs like SQL Server 2008.
  • Experience in Big Data technologies like Spark 1.6, Spark SQL, PySpark, Hadoop 2.x, HDFS, and Hive 1.x.
  • Experience in visualization tools like Tableau 9.x and 10.x, including Data Blending for creating dashboards.
  • Proficient in Predictive Modeling, Data Mining Methods, Factor Analysis, ANOVA, Hypothesis Testing, normal distribution, and other advanced statistical and econometric techniques.
  • Developed predictive models using Decision Tree, Random Forest, Naïve Bayes, Logistic Regression, Cluster Analysis, and Neural Networks.
  • Excellent knowledge of Machine Learning, Mathematical Modeling, and Operations Research. Comfortable with R, Python, SAS, MATLAB, and relational databases. Deep understanding of and exposure to the Big Data ecosystem.
  • Expert in creating PL/SQL schema objects like Packages, Procedures, Functions, Subprograms, Triggers, Views, Materialized Views, Indexes, Constraints, Sequences, Exception Handling, Dynamic SQL/Cursors, Native Compilation, Collection Types, Record Types, and Object Types using SQL Developer.
  • Experience building and optimizing Big Data pipelines, architectures, and data sets using Hadoop, Spark, Hive, and Python.
  • Experience implementing machine learning back-end pipelines using Spark MLlib, scikit-learn, pandas, and NumPy.
  • Working knowledge of Extract, Transform, and Load (ETL) components and process flow using Talend.
  • Experience with process mining with Microsoft Visio.
  • Experience with AWS cloud services such as EC2 and S3.
  • Experience building and implementing architecture roadmaps for next-generation Artificial Intelligence solutions for clients.


Programming Languages: Python, SQL, R

Scripting Languages: Python

ETL Tool: Talend

Data Sources: SQL Server, Excel

Data Visualization: Tableau, Power BI, SSRS

Predictive and Machine Learning: Linear Regression, Logistic Regression, Principal Component Analysis (PCA), K-means, Random Forest, Decision Trees, SVM, K-NN, Deep Learning, Time Series Analysis, and Ensemble methods

Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Hive, Spark

Operating System: Linux, Windows, Unix


Confidential, O’Fallon, MO

Machine Learning/Data Scientist


  • Applied effective software development processes to customize and extend computer vision and image processing techniques to solve new problems for Automation Anywhere.
  • Develop and implement innovative data quality improvement tools.
  • Involved in Peer Reviews, Functional and Requirement Reviews.
  • Develop project requirements and deliverable timelines; execute efficiently to meet the plan timelines.
  • Create and support a data management workflow spanning data collection, storage, analysis, training, and validation.
  • Involved in data analysis, primarily identifying data sets, source data, source metadata, data definitions, and data formats.
  • Well experienced in Normalization and De-Normalization techniques for optimum performance in relational and dimensional database environments.
  • Understood requirements and the significance of weld point data and energy efficiency using large datasets.
  • Develop necessary connectors to plug ML software into wider data pipeline architectures.
  • Wrangled large datasets (acquired and cleaned the data) and analyzed trends through visualizations built with Matplotlib and Python.
  • Experience with TensorFlow, Theano, Keras, and other Deep Learning frameworks.
  • Built an artificial neural network using TensorFlow in Python to predict a customer's probability of cancelling their connection (churn prediction).
  • Understanding the business problems and analyzing the data by using appropriate Statistical models to generate insights.
  • Knowledge of Information Extraction and NLP algorithms coupled with Deep Learning.
  • Developed NLP models for topic extraction and sentiment analysis.
  • Identify and assess available machine learning and statistical analysis libraries (including regressors, classifiers, statistical tests, and clustering algorithms).
  • Design and build scalable software architecture to enable real-time / big-data processing.
  • Used Teradata utilities such as FastExport and MultiLoad (MLOAD) for data migration/ETL tasks from OLTP source systems to OLAP target systems.
  • Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from an Oracle database, and used ETL for data transformation.
  • Performed a deep ML analysis of the HTPD/RTPD/LTPD test data to define a model of FBC growth rate across temperature.
  • Developed ML models for projecting ICC of pre-production SLC, MLC, and TLC single- and multi-die memory packages.
  • Used the TensorFlow library in a dual-GPU environment for training and testing neural networks.
  • Taking responsibility for technical problem solving, creatively meeting product objectives and developing best practices.

Environment: R 9.0, RStudio, Machine learning, Informatica 9.0, Scala, Spark, Cassandra, DL, Scikit-learn, Shogun, Data Warehouse, MLlib, Cloudera Oryx, Apache.

Confidential, Wilton, CT

Data Scientist/Machine Learning


  • Analyzed trading mechanisms for real-time transactions and built collateral management tools.
  • Compiled data from various sources to perform complex analysis for actionable results.
  • Utilized machine learning algorithms such as linear regression, multivariate regression, Naive Bayes, Random Forests, K-means, and KNN for data analysis.
  • Measured the efficiency of the Hadoop/Hive environment to ensure SLAs were met.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Prepared a Spark build from source and ran Pig scripts on Spark rather than as MapReduce jobs for better performance.
  • Analyzed the system for new enhancements/functionality and performed impact analysis of the application for implementing ETL changes.
  • Used Sqoop to import data from MySQL into HDFS on a regular basis.
  • Developed scripts and batch jobs to schedule various Hadoop programs. Used TensorFlow to train the model on thousands of examples drawn from the data.
  • Designing, developing and optimizing SQL code (DDL / DML).
  • Building performant, scalable ETL processes to load, cleanse, and validate data.
  • Expertise in Data archival and Data migration, ad-hoc reporting and code utilizing SAS on UNIX and Windows Environments.
  • Tested and debugged SAS programs against the test data.
  • Processed the data in SAS for the given requirement using SAS programming concepts.
  • Imported and exported data between SAS datasets and Excel and delimited text files, such as .TXT (tab-delimited) and .CSV (comma-delimited), using PROC IMPORT and PROC EXPORT for analysis.
  • Expertise in producing RTF, PDF, and HTML files using the SAS ODS facility.
  • Providing support for data processes, including monitoring data, profiling database usage, troubleshooting, tuning, and ensuring data integrity.
  • Participating in the full software development lifecycle, with requirements, solution design, development, QA, implementation, and product support, using Scrum and other agile methodologies.
  • Collaborate with team members and stakeholders in design and development of data environment.
  • Learning new tools and skill sets as needs arise.
  • Preparing associated documentation for specifications, requirements and testing.
  • Optimizing the TensorFlow model for efficiency.
  • Used TensorFlow for text summarization.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Developed Kafka producers and consumers for message handling.
  • Responsible for analyzing multi-platform applications using Python.
  • Used Storm as an automatic mechanism to analyze large amounts of non-unique data points with low latency and high throughput.
  • Developed MapReduce jobs in Python for data cleaning and data processing.

Environment: Machine learning, AWS, MS Azure, Cassandra, SAS, Spark, HDFS, Hive, Pig, Linux, Anaconda Python, MySQL, Eclipse, PL/SQL, SQL connector, SparkML.


SQL Developer


  • Performed ETL development by using queries, SSIS packages, Views and Triggers.
  • Designed, created, and implemented database systems based on end users' requirements.
  • Used SQL on a daily basis to create custom views for data and business analysis.
  • Efficient in using Inner, Outer, and Cross Joins and sub-queries for complex queries involving multiple tables.
  • Created tables, views, stored procedures and functions in SQL Server.
  • Prepared documentation for database applications.
  • Developed database schemas, tables and dictionaries.
  • Automated the mailing of reports and job processing in SQL Server Management Studio (SSMS).
  • Performed calculations and aggregations using inbuilt functions in SSMS.
  • Developed and modified existing SSIS package for loading data from CSV files into database.
  • Designed and developed reports using SSRS and EXCEL.
  • Created and scheduled jobs in SQL Server Management Studio.
  • Automated jobs and mail notifications in SSMS.
  • Tested databases and performed bug fixes.
  • Knowledge of all types of SQL Server constraints, SQL Server database design, and database maintenance.
  • Good knowledge of BI development and deployment of SSIS packages from Access, SQL, and Oracle.

Environment: SQL Server, SSIS, SSRS, SSMS, ETL, Oracle, PL/SQL.


Application Developer


  • Participated in database analysis and design, interacted with business users, and took part in requirements collection.
  • Analyzed requirements and designed and developed the maps for extraction, transformation, and loading of data into the warehouse.
  • Developed complex mappings related to dimension and fact tables.
  • Developed and modified code to make new enhancements or resolve problems, as per customer requirements.
  • Developed several stored procedures and functions using advanced Oracle concepts such as Bulk Binds, Bulk Collects, and Ref Cursors to improve performance.
  • Developed control files for SQL*Loader and PL/SQL scripts to extract, transform, and load (ETL) data into interface tables from different systems and to validate the data.
  • Extensively worked on the ETL mappings, analysis and documentation of OLAP reports requirements.
  • Involved in Performance tuning of the programs and ETL processes.
  • Used DataStage Designer to develop various ETL jobs to extract, cleanse, transform, integrate, and load data.
  • Strong understanding of data modeling in data warehouse environments, such as star and snowflake schemas. Used partitions to further improve performance on tables containing large numbers of columns and rows.
  • Created database objects like tables, views, procedures, functions, packages, materialized views, indexes, and sequences using SQL and PL/SQL.
  • Used data modeling tools for mapping parent-child tables before creating new objects for a specific business requirement.
  • Supported the application in production and exercised strong debugging skills in resolving time-sensitive issues.

Environment: SQL Navigator, Erwin, SQL*Plus, UNIX, Oracle 12c, Windows, Oracle 9i/10g, Forms 6i/10g, SQL*Loader, TOAD.
