
Data Scientist/Machine Learning Engineer Resume

New York City, NY


  • 8+ years of experience in data architecture, design, development and testing of business application systems, data analysis, and developing conceptual and logical models and physical database designs for Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) systems.
  • Experienced with machine learning algorithms such as Logistic Regression, KNN, SVM, Random Forest, Neural Networks, Linear Regression, Lasso Regression and K-Means.
  • Experienced working with data modeling tools like Erwin, PowerDesigner and ER/Studio.
  • Experienced in designing star schema and snowflake schema for data warehouse and ODS architecture.
  • Experienced in data modeling and data analysis using dimensional and relational data modeling, star schema/snowflake modeling, fact and dimension tables, and physical and logical data modeling.
  • Experienced in big data analysis and developing data models using Hive, Pig, MapReduce and SQL, with strong data architecting skills for designing data-centric solutions.
  • Very good knowledge of and experience with AWS, Redshift, S3 and EMR.
  • Excellent development experience with SQL and the procedural language (PL) of databases like Oracle, Teradata, Netezza and DB2.
  • Very good knowledge of and working experience with big data tools like Hadoop, Azure Data Lake and AWS Redshift.
  • Expertise in synthesizing machine learning, predictive analytics and big data technologies into integrated solutions.
  • Created machine learning and NLP solutions for big data from scratch on top of Spark using Scala.
  • Extensively experienced in working with structured data using HiveQL, join operations and custom UDFs, and experienced in optimizing Hive queries.
  • Experienced in machine learning and statistical analysis with Python scikit-learn.
  • Experience in using various packages in R and Python like ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, twitteR, NLP, Reshape2, rjson, plyr, pandas, numpy, seaborn, scipy, matplotlib, scikit-learn, BeautifulSoup, Rpy2.
  • Excellent knowledge of machine learning, mathematical modeling and operations research. Comfortable with R, Python, SAS, Weka, MATLAB and relational databases. Deep understanding of and exposure to the big data ecosystem.
  • Expert in creating PL/SQL schema objects like Packages, Procedures, Functions, Subprograms, Triggers, Views, Materialized Views, Indexes, Constraints, Sequences, Exception Handling, Dynamic SQL/Cursors, Native Compilation, Collection Types, Record Type and Object Type using SQL Developer.
  • Experienced in using Python to manipulate data for data loading and extraction, and worked with Python libraries like Matplotlib, NumPy, SciPy and Pandas for data analysis.
  • Hands-on experience in implementing Model View Controller (MVC) architecture using Spring, JDK, Core Java (Collections, OOP concepts), JSP, Servlets, Struts, Hibernate and JDBC.
  • Strong knowledge of Software Development Life Cycle (SDLC) including Waterfall and Agile development.
  • Created data visualizations with Tableau and performed server administrator duties.
  • Strong experience in application development using Java/J2EE technologies, including implementing Model View Controller (MVC) architecture using Spring, JDK 1.6, Core Java (Collections, OOP concepts), JSP, Servlets, Struts, Hibernate, Web Services, AJAX, JDBC, HTML and JavaScript.
  • Worked with complex applications such as R, SAS, MATLAB and SPSS to develop neural networks and cluster analysis.
  • Skilled in performing data parsing, data manipulation and data preparation with methods including describing data contents, computing descriptive statistics, regex, split and combine, remap, merge, subset, reindex, melt and reshape.
  • Experienced in statistical data analysis like Chi-square and T-test, dimensionality reduction methods like PCA and LDA, and feature selection methods.
  • Worked with NoSQL databases including HBase, Cassandra and MongoDB.
  • Experienced in big data with Hadoop, HDFS, MapReduce, and Spark.
  • Extensive experience in development of T-SQL, DTS, OLAP, PL/SQL, Stored Procedures, Triggers, Functions, Packages, performance tuning and optimization for business logic implementation.
  • Experienced using query tools like SQL Developer, PL/SQL Developer, and Teradata SQL Assistant.
  • Excellent in performing data transfer activities between SAS and various databases and data file formats like XLS, CSV, DBF, MDB, etc.
  • Extensively worked with Teradata utilities BTEQ, FastExport and MultiLoad to export and load data to/from different source systems including flat files.
  • Expertise in extracting, transforming and loading data between homogeneous and heterogeneous systems like SQL Server, Oracle, DB2, MS Access, Excel, flat files, etc., using SSIS packages.
  • Experience in UNIX shell scripting, Perl scripting, and automation of ETL processes.
  • Strong experience and knowledge in data visualization with Tableau, creating line and scatter plots, bar charts, histograms, pie charts, dot charts, box plots, time series, error bars, multiple chart types, multiple axes, subplots, etc.
  • Excellent understanding of and working experience with industry standard methodologies like System Development Life Cycle (SDLC) as per Rational Unified Process (RUP) and Agile methodologies.
  • Experience in source systems analysis and data extraction from various sources like flat files, Oracle, IBM DB2 UDB and XML files.
  • Experienced in developing Entity-Relationship diagrams and modeling transactional databases and data warehouses using tools like ERWIN, ER/Studio and PowerDesigner, and experienced with modeling using ERWIN in both forward and reverse engineering cases.
  • Excellent communication skills. Works successfully in fast-paced, multitasking environments both independently and in collaborative teams; a self-motivated, enthusiastic learner.


Languages: Python, C, C++, SQL.

Machine Learning: Regression, Polynomial Regression, Random Forest, Logistic Regression, Decision Trees, Simple/Multiple linear, Classification, Clustering, Kernel SVM, K-Nearest Neighbours (K-NN).

Big Data Technologies: Hadoop, Hive, HDFS, MapReduce, Pig, Kafka.

Tools: Erwin r9.6, 9.5, 9.1, 8.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner.

BI Tools: Tableau, Tableau Server, Tableau Reader, SAP Business Objects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure Data Warehouse.

Packages: ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, twitteR, NLP, Reshape2, rjson, plyr, pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, Beautiful Soup, Rpy2, sqlalchemy.

Web Technologies/Other: Django, Flask, Pyramid, Ajax, HTML5, CSS3, XML, JavaScript, jQuery, JSON, and Bootstrap.

IDEs: PyCharm, Emacs, Eclipse, NetBeans, Sublime, Pystudio, PyScripter.

Databases: SQL, Hive, Impala, Pig, Spark SQL, SQL Server, MySQL, MS Access, HDFS, HBase, Teradata, Netezza, MongoDB, Cassandra, SAP HANA.

Frameworks: Django, Flask, Bootstrap, Tornado, Pyramid.

Web Servers: JBoss 4.0.5, BEA WebLogic, WebSphere, Apache Tomcat 5.5/6.0.

Version Controls: SVN, VSS, CVS, Git, GitHub.

Operating Systems: MS Windows, Linux/Unix, Ubuntu, Sun Solaris.

Building & Design Tools: JIRA, Bugzilla, Jasmine, PyUnit, JUnit.

Methodologies: Agile, Scrum, Waterfall.


Confidential, New York City, NY

Data Scientist/Machine Learning Engineer


  • Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, Space-Time.
  • Coded R functions to interface with the Caffe Deep Learning Framework.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python for developing various machine learning algorithms.
  • Installed and used the Caffe Deep Learning Framework.
  • Worked on different data formats such as JSON and XML and implemented machine learning algorithms in Python.
  • Set up storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
  • Implemented end-to-end systems for data analytics and data automation and integrated them with custom visualization tools using R, Mahout, Hadoop and MongoDB.
  • Worked with data architects and IT architects to understand the movement of data and its storage, using ER/Studio 9.7.
  • Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib and Python with a broad variety of machine learning methods including classification, regression and dimensionality reduction, and utilized the engine to increase user lifetime by 45% and triple user conversations for target categories.
  • Used Spark DataFrames, Spark SQL and Spark MLlib extensively, and developed and designed POCs using Scala, Spark SQL and MLlib libraries.
  • Used data quality validation techniques to validate Critical Data Elements (CDE) and identified various anomalies.
  • Worked extensively with the Erwin Data Modeler tool to design the data models.
  • Developed various QlikView data models by extracting and using data from various sources: files, DB2, Excel, flat files and big data.
  • Participated in all phases of data mining, data collection, data cleaning, model development, validation and visualization, and performed gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, BusinessObjects, Power BI and SmartView.
  • Implemented Agile methodology for building an internal application.
  • Focused on integration overlap and Informatica's newer commitment to MDM with the acquisition of Identity Systems.
  • Good knowledge of Hadoop architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
  • As Architect, delivered various complex OLAP databases/cubes, scorecards, dashboards and reports.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, Pandas).
  • Implemented classification using supervised algorithms like Logistic Regression, Decision Trees, KNN and Naive Bayes.
  • Designed both 3NF data models for ODS and OLTP systems and dimensional data models using star and snowflake schemas.
  • Updated Python scripts to match training data with our database stored in AWS CloudSearch, so that we would be able to assign each document a response label for further classification.
  • Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus and PL/SQL.
  • Designed and developed Use Case, Activity and Sequence diagrams and OOD (Object-Oriented Design) using UML and Visio.
  • Interacted with business analysts, SMEs and other data architects to understand business needs and functionality for various project solutions.
  • Identified and executed process improvements; hands-on in various technologies such as Oracle, Informatica and BusinessObjects.

Environment: AWS, R, Informatica, Python, HDFS, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, MapReduce, Rational Rose, SQL, and MongoDB.

Confidential, Monterey Park

Data Scientist


  • Extracted data from HDFS and prepared data for exploratory analysis using data munging.
  • Built models using statistical techniques like Bayesian HMM and machine learning classification models like XGBoost, SVM, and Random Forest.
  • Participated in all phases of data mining, data cleaning, data collection, model development, validation and visualization, and performed gap analysis.
  • Participated in a highly immersive data science program involving data manipulation and visualization, web scraping, machine learning, Python programming, SQL, Git, MongoDB and Hadoop.
  • Set up storage and data analysis tools in AWS cloud computing infrastructure.
  • Installed and used the Caffe Deep Learning Framework.
  • Worked on different data formats such as JSON and XML and implemented machine learning algorithms in Python.
  • Worked with data architects and IT architects to understand the movement of data and its storage, using ER/Studio 9.7.
  • Developed models in Scala and Spark for users, prediction models and sequential algorithms.
  • Used pandas, NumPy, seaborn, matplotlib, scikit-learn, SciPy and NLTK in Python for developing various machine learning algorithms.
  • Performed data manipulation and aggregation from different sources using Nexus, BusinessObjects, Toad, Power BI and SmartView.
  • Implemented Agile methodology for building an internal application.
  • Focused on integration overlap and Informatica's newer commitment to MDM with the acquisition of Identity Systems.
  • Coded proprietary packages to analyze and visualize SPC file data to identify bad spectra and samples to reduce unnecessary procedures and costs.
  • Programmed a utility in Python that used multiple packages (NumPy, SciPy, pandas).
  • Implemented classification using supervised algorithms like Logistic Regression, Decision Trees, Naive Bayes and KNN.
  • As Architect, delivered various complex OLAP databases/cubes, scorecards, dashboards and reports.
  • Updated Python scripts to match training data with our database stored in AWS CloudSearch, so that we would be able to assign each document a response label for further classification.
  • Used Teradata utilities such as FastExport and MLOAD for data migration/ETL tasks from OLTP source systems to OLAP target systems.
  • Performed data transformation from various sources, data organization and feature extraction from raw and stored data.
  • Validated the machine learning classifiers using ROC curves and lift charts.

Environment: Unix, Python 3.5.2, MLLib, SAS, regression, logistic regression, Hadoop 2.7.4, NoSQL, Teradata, OLTP, random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML and MapReduce.

Confidential, Englewood, Colorado

Data Scientist/R Developer


  • Designed an industry-standard data model specific to the company's group insurance offerings, and translated the business requirements into detailed production-level designs using workflow diagrams, sequence diagrams, activity diagrams and use case modeling.
  • Involved in design and development of the data warehouse environment; served as liaison to business users and technical teams, gathering requirement specification documents and identifying and presenting data sources, targets and report generation.
  • Recommended and evaluated marketing approaches based on quality analytics of customer consumption behavior.
  • Determined customer satisfaction and helped enhance customer experience using NLP.
  • Worked on text analytics, Naive Bayes, sentiment analysis, creating word clouds and retrieving data from Twitter and other social networking platforms.
  • Conceptualized the most-used product module (Research Center) after building a business case for approval, gathering requirements and designing the user interface.
  • Served as a member of the Analytical Group and assisted in the design and development of statistical models for end clients. Coordinated with end users on the design and implementation of e-commerce analytics solutions as per project proposals.
  • Conducted market research for client; developed and designed sampling methodologies, and analyzed the survey data for pricing and availability of clients' products. Investigated product feasibility by performing analyses that include market sizing, competitive analysis and positioning.
  • Successfully optimized Python code for a variety of data mining and machine learning purposes.
  • Facilitated stakeholder meetings and sprint reviews to drive project completion.
  • Successfully managed projects using Agile development methodology.
  • Project experience in data mining, segmentation analysis, business forecasting and association rule mining using large data sets with machine learning.
  • Automated diagnosis of blood loss during accidents by applying machine learning algorithms to vital signs (ECG, HF, GSR, etc.). Demonstrated performance of 94.6%, on par with state-of-the-art models used in industry.

Environment: R, MATLAB, MongoDB, exploratory analysis, feature engineering, K-Means Clustering, Hierarchical Clustering, Machine Learning, Python, Spark (MLlib, PySpark), Tableau, MicroStrategy, SAS, TensorFlow, regression, logistic regression, Hadoop 2.7, OLTP, random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML and MapReduce.

Confidential, New York, New York

Data Analyst


  • Performed data analysis and reporting using MySQL, MS PowerPoint, MS Access and SQL Assistant.
  • Involved in MySQL, MS PowerPoint and MS Access database design, and designed a new database on Netezza for optimized outcomes.
  • Involved in writing T-SQL and working on SSIS, SSRS, SSAS, data cleansing, data scrubbing and data migration.
  • Involved in writing scripts for loading data to the target data warehouse using BTEQ, FastLoad and MultiLoad.
  • Created ETL scripts using regular expressions and custom tools (Informatica, Pentaho, and SyncSort) to ETL data.
  • Developed SQL Service Broker to flow and sync data from MS-I to Microsoft's master data management (MDM).
  • Involved in loading data between Netezza tables using NZSQL utility.
  • Worked on data modeling using dimensional data modeling, star schema/snowflake schema, fact and dimension tables, and physical and logical data modeling.
  • Generated Statspack/AWR reports from Oracle database and analyzed the reports for Oracle wait events, time-consuming SQL queries, tablespace growth and database growth.

Environment: MySQL, MS PowerPoint, MS Access, Netezza, DB2, T-SQL, DTS, SSIS, SSRS, SSAS, ETL, MDM, Teradata, Oracle, Star Schema and Snowflake Schema.


Data Modeler


  • Communicated with other health care information systems using Web Services with the help of SOAP, WSDL and JAX-RPC.
  • Used Singleton, Factory and DAO design patterns based on the application requirements.
  • Used SAX and DOM parsers to parse the raw XML documents.
  • Used RAD as the development IDE for web applications.
  • Prepared and executed unit test cases.
  • Used Log4J logging framework to write Log messages with various levels.
  • Involved in fixing bugs and minor enhancements for the front-end modules.
  • Deployed GUI pages by using JSP, JSTL, HTML, DHTML, XHTML, CSS, JavaScript, AJAX.
  • Configured the project on WebSphere 6.1 application servers.
  • Used Microsoft Visio and Rational Rose for designing the use case diagrams, class model, sequence diagrams and activity diagrams for the SDLC process of the application.
  • Supported the testing team during system testing, integration testing and UAT.
  • Ensured quality in the deliverables.
  • Conducted design reviews and technical reviews with other project stakeholders.
  • Was a part of the complete life cycle of the project from the requirements to the production support.
  • Created test plan documents for all back-end database modules.
  • Implemented the project in a Linux environment.

Environment: R 3.0, Erwin 9.5, Tableau 8.0, MDM, QlikView, MLlib, PL/SQL, HDFS, Teradata 14.1, JSON, HADOOP (HDFS), MapReduce, PIG, Spark, R Studio, MAHOUT, HIVE, AWS.


Data Analyst


  • Worked with internal architects, assisting in the development of current and target state data architectures.
  • Worked with project team representatives to ensure that logical and physical ER/Studio data models were developed in line with corporate standards and guidelines.
  • Involved in defining the business/transformation rules applied for sales and service data.
  • Implemented the metadata repository, transformations, data quality maintenance, data standards, data governance program, scripts, stored procedures, triggers and execution of test plans.
  • Defined the list codes and code conversions between the source systems and the data mart.
  • Involved in defining the source-to-target data mappings, business rules and data definitions.
  • Responsible for defining the key identifiers for each mapping/interface.
  • Remained knowledgeable in all areas of business operations in order to identify system needs and requirements.
  • Performed data quality checks in Talend Open Studio.
  • Maintained the Enterprise Metadata Library with any changes or updates.
  • Documented data quality and traceability for each source interface.
  • Established standards of procedures.
  • Coordinated meetings with vendors to define requirements and system interaction agreement documentation between client and vendor system.

Environment: Windows Enterprise Server 2000, SSRS, SSIS, Crystal Reports, DTS, SQL Profiler, and Query Analyzer.
