
Data Scientist/Machine Learning Engineer Resume

Dayton, OH


  • Over 8 years of experience in Machine learning with large datasets of structured and unstructured data, Predictive modelling, Data analysis, Data acquisition, Data validation and Data visualization.
  • Hands-on experience with Machine Learning algorithms such as Regression Analysis, Clustering, Boosting, Classification, Principal Component Analysis and Data Visualization Tools.
  • Data Scientist with proven expertise in Data Analysis, Machine Learning, and Modeling.
  • Experience in Machine Learning algorithms such as Linear Regression, Logistic Regression, Naive Bayes, Decision Trees, K-Means Clustering and Association Rules.
  • Experience in applying predictive modeling and machine learning algorithms for analytical reports.
  • Experience using technology to work efficiently with datasets such as scripting, data cleaning tools, statistical software packages.
  • Developed predictive models using Decision Tree, Random Forest, Naïve Bayes, Logistic Regression, Cluster Analysis, and Neural Networks.
  • Very Strong in Python, statistical analysis, tools, and modeling.
  • Experienced in Machine Learning and Statistical Analysis with Python Scikit-Learn.
  • Strong programming skills in a variety of languages such as Python, R and SQL.
  • Valuable experience working with large datasets and Deep Learning algorithms with TensorFlow.
  • Worked on various applications using Python integrated IDEs such as Anaconda and PyCharm.
  • Used several Python modules and controls to rapidly build the application.
  • Experience in Data Cleaning, Transformation, Integration, Data Imports and Data Exports.
  • Experienced with machine learning algorithms such as logistic regression, random forest, XGBoost, KNN, SVM, neural networks, linear regression, and k-means.
  • Good Knowledge in Data Validation, Data Cleaning, Data Verification and Identifying data mismatch.
  • Experienced with Machine Learning, Regression Analysis, Clustering, Boosting, Classification, Principal Component Analysis and Data Visualization Tools.
  • Experienced with tuning parameters for different machine learning models to improve performance.
  • Interacted with various clients, teams to update and modify deliverables to meet the business needs.
  • Hands-on experience applying SVM, Random Forest, and K-means clustering.
  • Experienced in writing complex SQL queries, including stored procedures, triggers, joins, and subqueries.
  • Extensive experience in Text Analytics, developing different Statistical Machine Learning models, Data Mining solutions to various business problems and generating data visualizations using R, Python and Tableau.
  • Proficient in Statistical Modeling and Machine Learning techniques (Linear and Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian methods, XGBoost) in Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor analysis/PCA, and Ensembles.
  • Used Python to generate regression models to provide statistical forecasting and applied Clustering Algorithms such as K-Means to categorize customers into certain groups.
  • Performed data manipulation, data preparation, normalization, and predictive modeling. Improved efficiency and accuracy by evaluating model in Python.
  • Worked with SQL Server components: SSIS (SQL Server Integration Services), SSAS (Analysis Services), and SSRS (Reporting Services).
  • Experience building and optimizing big data pipelines, architectures, and data sets with Hadoop, Spark, Hive, and Python.
  • Experience implementing machine learning back-end pipelines with Spark MLlib, Scikit-learn, Pandas, and NumPy.
  • Working knowledge of extract, transform, and load (ETL) components and process flow using Talend.
  • Experience with process mining with Microsoft Visio.
  • Experience with AWS cloud services (EC2, S3).
  • Experience building and implementing architecture roadmaps for next-generation Artificial Intelligence solutions for clients.


Programming Languages: Python, SQL, R

Scripting Languages: Python

ETL Tool: Talend

Data Sources: SQL Server, Excel

Data Visualization: Tableau, Power BI, SSRS

Predictive and Machine Learning: Linear Regression, Logistic regression, Principal Component Analysis (PCA), K-means, Random Forest, Decision Trees, SVM, K-NN, Deep learning, Time Series Analysis and Ensemble methods

Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Hive, Spark

Operating System: Linux, Windows, Unix.


Confidential, Dayton, OH.

Data Scientist/Machine Learning Engineer


  • Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, and Space-Time.
  • Coded R functions to interface with the Caffe deep learning framework.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python to develop various machine learning algorithms.
  • Installed and used the Caffe deep learning framework.
  • Worked with different data formats such as JSON and XML, and applied machine learning algorithms in Python.
  • Set up storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
  • Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools, using R, Mahout, Hadoop, and MongoDB.
  • Worked as Data Architect and IT Architect to understand the movement of data and its storage, and used ER Studio 9.7.
  • Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, Python, and a broad variety of machine learning methods including classification, regression, and dimensionality reduction; utilized the engine to increase user lifetime by 45% and triple user conversations for target categories.
  • Used Spark DataFrames, Spark SQL, and Spark MLlib extensively, and developed and designed POCs using Scala, Spark SQL, and MLlib libraries.
  • Used data quality validation techniques to validate Critical Data Elements (CDE) and identified various anomalies.
  • Worked extensively with the Erwin Data Modeler tool to design the data models.
  • Developed various QlikView data models by extracting and using data from various sources: files, DB2, Excel, flat files, and big data.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization; also performed gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and SmartView.
  • Implemented Agile methodology for building an internal application.
  • Focused on integration overlap and Informatica's newer commitment to MDM with the acquisition of Identity Systems.
  • Good knowledge of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, Pandas).
  • Implemented classification using supervised algorithms such as logistic regression, decision trees, KNN, and Naive Bayes.
  • Designed both 3NF data models for ODS and OLTP systems and dimensional data models using star and snowflake schemas.
  • Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
  • Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
  • Designed and developed use cases, activity diagrams, sequence diagrams, and OOD (object-oriented design) using UML and Visio.
  • Interacted with business analysts, SMEs, and other data architects to understand business needs and functionality for various project solutions.
  • Identified and executed process improvements; hands-on with various technologies such as Oracle, Informatica, and Business Objects.

Environment: AWS, R, Informatica, Python, HDFS, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, MapReduce, Rational Rose, SQL, and MongoDB.

Confidential, New York, NY

Data Scientist/Machine Learning


  • A highly immersive data science program involving data manipulation and visualization, web scraping, machine learning, Python programming, SQL, Git, Unix commands, NoSQL, MongoDB, and Hadoop.
  • Installed and used the Caffe deep learning framework.
  • Worked with different data formats such as JSON and XML, and applied machine learning algorithms in Python.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization; also performed gap analysis.
  • Developed a voice bot using AI (IVR), improving the interaction between humans and the virtual assistant.
  • Implemented an event task to execute the application automatically.
  • Involved in developing Patches & Updates Module.
  • Set up storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python to develop various machine learning algorithms.
  • Development and Deployment using Google Dialogflow Enterprise.
  • Worked as Data Architect and IT Architect to understand the movement of data and its storage, and used ER Studio 9.7.
  • Performed data visualization using Elasticsearch, Kibana, and Logstash in Python.
  • Used Kibana, an open-source plugin for Elasticsearch, for analytics and data visualization.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and SmartView.
  • Performed data cleaning, feature scaling, and feature engineering using the Pandas and NumPy packages in Python, and built models using deep learning frameworks.
  • Applied various machine learning algorithms and statistical modeling techniques, including Decision Tree, Text Analytics, Sentiment Analysis, Naive Bayes, Logistic Regression, and Linear Regression, using Python to determine the accuracy rate of each model.
  • Implemented Agile Methodology for building an internal application.
  • Extracted source data from Oracle tables, MS SQL Server, sequential files, and Excel sheets.
  • Migrated Informatica mappings from SQL Server to Netezza.
  • Fostered a culture of continuous engineering improvement through mentoring, feedback, and metrics.
  • Broad knowledge of programming, and scripting (especially in R / Java / Python)
  • Developing and maintaining Data Dictionary to create metadata reports for technical and business purpose.
  • Predictive modeling using state-of-the-art methods
  • Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS.
  • Parsed and manipulated raw, complex data streams to prepare them for loading into an analytical tool.
  • Built and maintained dashboards and reporting based on the statistical models to identify and track key metrics and risk indicators.
  • Proven experience building sustainable and trusting relationships with senior leaders.
  • Focused on integration overlap and Informatica's newer commitment to MDM with the acquisition of Identity Systems.
  • Performed data analysis using regressions, data cleaning, Excel VLOOKUP, histograms, and the TOAD client; produced data representations of the analysis and suggested solutions for investors.
  • Rapid model creation in Python using pandas, numpy, sklearn, and plot.ly for data visualization. These models are then implemented in SAS where they are interfaced with MSSQL databases and scheduled to update on a timely basis.
  • Gained good knowledge of Hadoop data lake implementation and Hadoop architecture for client business data management.
  • Extracted data from HDFS and prepared data for exploratory analysis using data munging.

Environment: ER Studio 9.7, Tableau 9.03, AWS, Teradata 15, MDM, Git, Unix, Python 3.5.2, MLlib, SAS, regression, logistic regression, Hadoop, NoSQL, Teradata, OLTP, random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML, MapReduce, Google Dialogflow.

Confidential, New York, New York.

Data Analyst/Data Scientist


  • Wrote the conversion rules in the data mapping document and updated them according to the business needs and the impacts identified during the conversion.
  • Actively participated in gathering requirements from various SMEs, Financial Analysts, Risk Analysts, and Portfolio Management via JAD/JAR sessions.
  • Worked on data modeling and produced data mapping and data definition documentation.
  • Performed Data mapping, logical data modeling, created class diagrams and ER diagrams and used SQL queries to filter data.
  • Extracted data from different sources like Oracle and MS Access, Excel, and text files using SAS/Access, SAS SQL procedures and created SAS required datasets using survey programming and/or data processing.
  • Gathered data from the legacy system (DB2) into the present updated system using data warehouse tools such as Informatica.
  • Project experience in Data mining, segmentation analysis, business forecasting and association rule mining using Large Data Sets with Machine Learning.
  • Worked with the BI team to gather report requirements and used Sqoop to export data into HDFS and Hive.
  • Involved in the phases of analytics using R, Python, and Jupyter Notebook.
  • Identified business rules for data migration and performed data administration through data models and metadata.
  • Designed, developed, and tested software to implement the models using SAS statistics procedures.
  • Functioned as the primary liaison between the business line, operations, and the technical areas throughout the project cycle. Worked with ODS and OLAP system.
  • Worked with Market Risk and Operational Risk team which were part of a different initiative.
  • Worked with Internal modeling team for data analysis and profiling.
  • Performed ETL Informatica development tasks such as creating jobs using different stages and debugging.
  • Performed data mapping, logical data modeling, created class diagrams and ER diagrams and used SQL queries to filter data within the Oracle database.
  • Resolved data-related issues such as assessing data quality, consolidating data, and evaluating existing data sources.
  • Worked closely with the Data Architect to review all conceptual, logical, and physical database design models with respect to functions, definition, and maintenance review, and supported data analysis, data quality, and ETL design that feeds the logical data models.
  • Created financial package that supports 3-year financial plan for all AWS cloud services infrastructure expenses.
  • Analyzed the source data coming from various data sources like Mainframe & Oracle.
  • Created data mapping documents mapping Logical Data Elements to Physical Data Elements and Source Data Elements to Destination Data Elements.
  • Defined key facts and dimensions necessary to support the business requirements.
  • Worked on ETL design for various source systems.
  • Developed test cases for System, Integration, Functional, Progression and Regression testing, in PyUnit framework and executed test cases using Python scripts to validate products and software applications.
  • Extracted the source tables into SQL Server and MS Access databases, and extracted the target tables from Oracle and Teradata databases.
  • Mined data from complex perspectives and summarized it into Excel.
  • Designed reports using Tableau, tabular forms, Pivot tables and Charts on Excel.
  • Wrote complex SQL queries for checking the counts and for validating the data at field level.
  • Designed and developed Use Cases, Activity Diagrams, Sequence Diagrams, OOD using UML.
  • Identified, researched, investigated, analyzed, defined, and documented business processes and Use Case Scenarios
  • Worked with project managers supporting Amazon Web Services (AWS) migration.
  • Wrote complex queries to extract and insert data from and into Teradata Databases.
  • Wrote packages to fetch complex data from different tables in remote databases using joins and subqueries.
  • Validated data to check for proper conversion, cleansed data to identify and clean bad data, and profiled data for accuracy, completeness, and consistency.
  • Reviewed all the systems design by assuring adherence to defined requirements.
  • Met with user groups to analyze requirements and proposed changes in design and specifications.
  • Performed flat file conversion from the data warehouse scenario.
  • Created Static and Dynamic Parameters at the report level.
  • Performed migration of Reports (Crystal Reports, and Excel) from one domain to another domain using Import/Export Wizard.
  • Performed a wide range of QA testing to ensure data integrity and coordinated UAT to meet or exceed specified standards and end-user requirements.

Environment: UNIX, Mainframe, DB2, SAS, MS SQL Server, MS Visio, Oracle, Web Logic, MS Project, IIS, XML, Rational RequisitePro, UML, Business Objects, RUP, SSIS, Teradata, AWS, Python, Informatica 8.6.1, Cognos, WinSQL, QTP 9.2, Quality Center 9.2, TOAD, Oracle 9i/10g, PL/SQL, IBM DB2, VBA MS Excel

Confidential, Dublin, CA

Data Analyst


  • Gathered requirements, analyzed and wrote the design documents.
  • Provided End-user Training and documentation for customer reporting services.
  • Attended JAD sessions and created reports for Budgeting, Finance and Project Management departments of Pima County.
  • Performed data profiling in the source systems that are required for Dual Medicare Medicaid Datamart.
  • Documented the complete process flow to describe program development, logic, testing, implementation, application integration, and coding.
  • Involved in defining the trumping rules applied by Master Data Repository
  • Worked with project team representatives to ensure that logical and physical ER/Studio data models were developed in line with corporate standards and guidelines
  • Involved in defining the source to target data mappings, business rules, business and data definitions
  • Responsible for defining the key identifiers for each mapping/interface
  • Responsible for defining the functional requirement documents for each source to target interface.
  • Documented, clarified, and communicated change requests with the requestor and coordinated with the development and testing team.
  • Reverse engineered all the source databases using Embarcadero.
  • Designed and implemented data integration modules for Extract/Transform/Load (ETL) functions
  • Involved in data warehouse and Data mart design
  • Experience with various ETL, data warehousing tools and concepts
  • Good experience with Mainframe enterprise billing systems.
  • Involved in defining the business/transformation rules applied for Dual Medicare Medicaid data.
  • Used data analysis techniques to validate business rules and identify low-quality and missing data in the existing Humana Enterprise data warehouse (EDW).
  • Performed daily data queries and prepared reports on daily, weekly, monthly, and quarterly basis.
  • Used advanced Excel functions to generate spreadsheets and pivot tables.
  • Worked with the users to do the User Acceptance Testing (UAT).
  • Created Technical specifications documents based on the functional design document for the ETL coding to build the data mart.
  • Extensively involved in Data Extraction, Transformation and Loading (ETL process) from Source to target systems.
  • Wrote SQL queries for custom reports
  • Worked on a daily basis with lead Data Warehouse developers to evaluate the impact on the current implementation and the redesign of all ETL logic.
  • Worked on exporting reports in multiple formats including MS Word, Excel, CSV and PDF.

Environment: SQL Server, Oracle 10g and 11g, MS Office, Embarcadero, Teradata, Enterprise Architect, ETL (Extract, Transform, Load), ER Studio, XML, PowerPoint, MS Visio and MS Outlook.


SQL Developer


  • Created and executed SQL Server Integration Service packages to populate data from the various data sources, created packages for different data loading operations for many applications.
  • Created SSIS Packages using SSIS Designer for exporting heterogeneous data from OLE DB Source, Excel Spreadsheet to SQL Server.
  • Defined report layouts including report parameters and wrote queries for drill down reports as per client's requirements using SSRS 2008.
  • Generated multiple Enterprise reports (SSRS/Crystal) from OLTP which included various reports.
  • Migrated DTS packages to SSIS packages and modified the packages with new features of SQL SSIS.
  • Wrote complex SQL queries using joins, subqueries, and correlated subqueries. Expertise in SQL queries for cross-verification of data.
  • Performed backup/restore of database objects such as tables, procedures, triggers, constraints, indexes, and views.
  • As part of optimization, used MAP operations to route UPDATE and INSERT records in warehouse workflows.
  • Reviewed the SQL for missing joins and join constraints, data format issues, mismatched aliases, and casting errors.
  • Responsible for Design, Data Mapping Analysis, Mapping rules.
  • Created dimension model for reporting system by identifying required dimensions and facts using ERWIN.
  • Designed Reusable Transformations using the Transformation Developer & created Mapping Parameters and Variables.
  • Worked with Database Admins to set up and create connections to Teradata, Oracle, and SQL Server databases with DataStage.
  • Involved in Unit testing, System testing to check whether the data loads into target are accurate, which was extracted from different source systems according to the user requirements.
  • Interacted with the Business Analysts and the testing team for the testing of the ETL processes.

Environment: SQL Server 2012/2016 Enterprise Edition, Enterprise Manager, JavaScript, DAX, ERWIN, UML, MS Project, Windows 2003 Server, .NET, C#, ASP.NET, SSIS, SSRS, SSAS
