We provide IT Staff Augmentation Services!

Data Scientist/ Machine Learning Resume

Dublin, CA


  • Around 6 years of hands on experience and comprehensive industry knowledge of Machine Learning, Statistical Modeling, Data Analytics, Data Modeling, Data Architecture, Data Analysis, Data Mining, Text Mining & Natural Language Processing (NLP), Artificial Intelligence algorithms, Business Intelligence, Analytics Models (like Decision Trees, Linear & Logistic Regression, Hadoop (Hive, PIG), R, Python, Spark, Scala, MS Excel, SQL and Postgre SQL, Erwin.
  • Strong knowledge in all phases of the SDLC (Software Development Life Cycle) from analysis, design, development, testing, implementation and maintenance.
  • Experienced in Data Modeling techniques employing Data warehousing concepts like star/snowflake schema and Extended Star.
  • Expertise in applying data mining techniques and optimization techniques in B2B and B2C industries.
  • Expertise in writing functional specifications, translating business requirements to technical specifications, created/maintained/modified database design document with detailed description of logical entities and physical tables.
  • Excellent knowledge of Machine Learning, Mathematical Modeling and Operations Research. Comfortable with R, Python, SAS and Weka, MATLAB, Relational databases. Deep understanding & exposure of Big Data Eco - system.
  • Expertise in Data Analysis, Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import, and Data Export through the use of multiple ETL tools such as Informatica Power Center.
  • Proficient in Machine Learning, Data/Text Mining, Statistical Analysis & Predictive Modeling.
  • Expertise in data acquisition, storage, analysis, integration, predictive modeling, logistic regression, decision trees, data mining methods, forecasting, factor analysis, cluster analysis, ANOVA and other advanced statistical techniques.
  • Excellent knowledge and experience in OLTP/OLAP System Study with focus on Oracle Hyperion Suite of technology, developing Database Schemas like Star schema and Snowflake schema (Fact Tables, Dimension Tables) used in relational, dimensional and multidimensional modeling, physical and logical Data modeling using Erwin tool,
  • Experienced in building data models using machine learning techniques for Classification, Regression, Clustering and Associative mining.
  • Expert in creating PL/SQL Schema objects like Packages, Procedures, Functions, Subprograms, Triggers, Views, Materialized Views, Indexes, Constraints, Sequences, Exception Handling, Dynamic SQL/Cursors, Native Compilation, Collection Types, Record Type, Object Type using SQL Developer.
  • Working experience in Hadoop ecosystem and Apache Spark framework such as HDFS, Map Reduce, HiveQL, SparkSQL, PySpark.
  • Very good experience and knowledge in provisioning virtual clusters under AWS cloud which includes services like EC2, S3, and EMR.
  • Proficient in data visualization tools such as Tableau, Python Matplotlib, R Shiny to create visually powerful and actionable interactive reports and dashboards.
  • Excellent Tableau Developer, expertise in building, publishing customized interactive reports and dashboards with customized parameters and user-filters using Tableau(9.x/10.x).
  • Experienced in Agile methodology and SCRUM process.
  • Strong business sense and abilities to communicate data insights to both technical and nontechnical clients.


Programming & Scripting Languages: R, C, C++, JAVA, JCL, python, HTML, CSS, JSP, Java Script

MS: Access, Oracle 12c/11g/10g/9i, and Teradata, bigdata, Hadoop, PostgreSQL.

Statistical Software: SPSS, R, SAS.

ETL/BI Tools: Informatica Power Center 9.x, Tableau, Cognos BI 10, MS Excel, SAS, SAS/Macro, SAS/SQL

Data Modelling: Erwin r 9.6, 9.5, 9.1, 8.x, Rational Rose, ER/Studio, MS Visio, SAP Power designer.

Web Packages: Google Analytics, Adobe Test & Target, Web Trends

BigData Ecosystem: HDFS, PIG, MapReduce, HIVE, SQOOP, FLUME, HBase, Storm, Kafka, Elastic Search, Redis, Flume, Storm, Kafka, Elastic Search, Redis, Flume, Scoop.

Statistical Methods: Time Series, regression models, splines, confidence intervals, principal component analysis and Dimensionality Reduction, bootstrapping

BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports, Informatica 6.1.

Database Design Tools and Data Modeling: MS Visio, ERWIN 4.5/4.0, Star Schema/Snowflake Schema modeling, Fact & Dimensions tables, physical & logical data modeling, Normalization and De-normalization techniques, Kimball &Inmon Methodologies

Cloud: AWS, S3, EC2.

Big Data / Grid Technologies: Cassandra, Coherence, Mongo DB, Zookeeper, Titan, Elastic search, Storm, Kafka, Hadoop


Confidential, Dublin, CA

Data Scientist/ Machine Learning


  • Extracted data from HDFS and prepared data for exploratory analysis using data munging
  • Built models using Statistical techniques like Bayesian HMM and Machine Learning classification models like XGBoost, SVM, and Random Forest.
  • Participated in all phases of data mining, data cleaning, data collection, developing models, validation, visualization and performed Gap analysis.
  • A highly immersive Data Science program involving Data Manipulation&Visualization, Web Scraping, Machine Learning, Python programming, SQL, GIT, MongoDB, Hadoop.
  • Setup storage and data analysis tools in AWS cloud computing infrastructure.
  • Installed and used Caffe Deep Learning Framework
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
  • Worked as Data Architects and IT Architects to understand the movement of data and its storage and ER Studio 9.7
  • Used pandas, numpy, seaborn, matplotlib, scikit-learn, scipy, NLTK in Python for developing various machine learning algorithms.
  • Data Manipulation and Aggregation from different source using Nexus, Business Objects, Toad, Power BI and Smart View.
  • Implemented Agile Methodology for building an internal application.
  • Focus on integration overlap and Informatica newer commitment to MDM with the acquisition of Identity Systems.
  • Coded proprietary packages to analyze and visualize SPCfile data to identify bad spectra and samples to reduce unnecessary procedures and costs.
  • Programmed a utility in Python that used multiple packages (numpy, scipy, pandas)
  • Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, Naive Bayes, KNN.
  • As Architect delivered various complex OLAPdatabases/cubes, scorecards, dashboards and reports.
  • Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
  • Used Teradata utilities such as Fast Export, MLOAD for handling various tasks data migration/ETL from OLTP Source Systems to OLAP Target Systems
  • Data transformation from various resources, data organization, features extraction from raw and stored.
  • Validated the machine learning classifiers using ROC Curves and Lift Charts.

Environment: : Unix, Python 3.5.2, MLLib, SAS, regression, logistic regression, Hadoop 2.7.4, NoSQL, Teradata, OLTP, random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML and MapReduce.

Confidential, Pleasanton, CA

Data scientist


  • Designed an Industry standard data Model specific to the company with group insurance offerings, Translated the business requirements into detailed production level using Workflow Diagrams, Sequence Diagrams, Activity Diagrams and Use Case Modeling
  • Involved in design and development of data warehouse environment, liaison to business users and technical teams gathering requirement specification documents and presenting and identifying data sources, targets and report generation.
  • Recommend and evaluate marketing approaches based on quality analytics of customer consuming behavior.
  • Determine customer satisfaction and help enhance customer experience using NLP.
  • Work on Text Analytics, Naive Bayes, Sentiment analysis, creating word clouds and retrieving data from Twitter and other social networking platforms.
  • Conceptualized the most-used product module (Research Center) after building a business case for approval, gathering requirements and designing the User Interface
  • A team member of Analytical Group and assisted in designing and development of statistical models for the end clients. Coordinated with end users for designing and implementation of e-commerce analytics solutions as per project proposals.
  • Conducted market research for client; developed and designed sampling methodologies, and analyzed the survey data for pricing and availability of clients' products. Investigated product feasibility by performing analyses that include market sizing, competitive analysis and positioning.
  • Successfully optimized codes in Python to solve a variety of purposes in data mining and machine learning in Python.
  • Facilitated stakeholder meetings and sprint reviews to drive project completion.
  • Successfully managed projects using Agile development methodology
  • Project experience in Data mining, segmentation analysis, business forecasting and association rule mining using Large Data Sets with Machine Learning.
  • Automated Diagnosis of Blood Loss during Accidents and Applied Machine Learning algorithms to diagnose blood loss from vital signs (ECG, HF, GSR, etc.). Demonstrated performances of 94.6% on par with state-of-the-art models used in industry

Environment: R, MATLAB, MongoDB, exploratory analysis, feature engineering, K-Means Clustering, Hierarchical Clustering, Machine Learning), Python, Spark (MLlib, PySpark), Tableau, Micro Strategy, SAS, Tensor Flow, regression, logistic regression, Hadoop 2.7, OLTP, random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML and MapReduce

Confidential, Tennessee

Data Analyst


  • Worked with BI team in gathering the report requirements and also Sqoop to export data into HDFS and Hive
  • Involved in the below phases of Analytics using R, Python and Jupyter notebook. Data collection and treatment: Analysed existing internal data and external data, worked on entry errors,classification errors and defined criteria for missing values
  • Data Mining: Used cluster analysis for identifying customer segments, Decision trees used for profitable and non-profitable customers, Market Basket Analysis used for customer purchasing behaviour and part/product association.
  • Developed multiple Map Reduce jobs in Java for data cleaning and preprocessing.
  • Assisted with data capacity planning and node forecasting.
  • Installed, Configured and managed Flume Infrastructure .
  • Administrator for Pig, Hive and HBase installing updates patches and upgrades.
  • Worked closely with the claims processing team to obtain patterns in filing of fraudulent claims.
  • Worked on performing major upgrade of cluster from CDH3u6 to CDH4.4.0
  • Developed Map Reduce programs to extract and transform the data sets and results were exported back to RDBMS using Sqoop.
  • Patterns were observed in fraudulent claims using text mining in R and Hive.
  • Exported the data required information to RDBMS using Sqoop to make the data available for the claims processing team to assist in processing a claim based on the data.
  • Developed Map Reduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
  • Created tables in Hive and loaded the structured (resulted from Map Reduce jobs) data
  • Using HiveQL developed many queries and extracted the required information.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Was responsible for importing the data (mostly log files) from various sources into HDFS using Flume
  • Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and PIG to pre-process the data.
  • Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
  • Managed and reviewed Hadoop log files.
  • Tested raw data and executed performance scripts.

Environment: HDFS, PIG, HIVE, Map Reduce, Linux, HBase, Flume, Sqoop, R, VMware, Eclipse, Cloudera, Python.


Python Developer


  • Involved in the design, development and testing phases of application using AGILE methodology.
  • Designed and maintained databases using Python and developed Python based API (Restful Web Service) using Flask, SQLAlchemy and PostgreSQL.
  • Designed and developed the UI of the website using HTML, XHTML, AJAX, CSS and JavaScript.
  • Participated in requirement gathering and worked closely with the architect in designing and modeling.
  • Worked on Restful web services which enforced a stateless client server and support JSON few changes from SOAP to RESTFUL Technology Involved in detailed analysis based on the requirement documents.
  • Involved in writing SQL queries implementing functions, triggers, cursors, object types, sequences, indexes etc.
  • Created and managed all of hosted or local repositories through Source Tree's simple interface of GIT client, collaborated with GIT command lines and Stash.
  • Responsible for setting up Python REST API framework and spring frame work using Django
  • Develop consumer based features and applications using Python, Django, HTML, behaviour Driven Development (BDD) and pair based programming.
  • Designed and developed components using Python with Django framework. Implemented code in python to retrieve and manipulate data.
  • Involved in development of the enterprise social network application using Python, Twisted, and Cassandra.
  • Used Python and Django creating graphics, XML processing of documents, data exchange and business logic implementation between servers. worked closely with back-end developer to find ways to push the limits of existing Web technology.
  • Designed and developed the UI for the website with HTML, XHTML, CSS, Java Script and AJAX
  • Used AJAX&JSON communication for accessing Restful web services data payload.
  • Designed dynamic client-side JavaScript codes to build web forms and performed simulations for web application page.
  • Created and implemented SQL Queries, Stored procedures, Functions, Packages and Triggers in SQL Server.
  • Successfully implemented Auto Complete/Auto Suggest functionality using JQuery, Ajax, Web Service and JSON.
  • Identified and added the report parameters and created the reports based on the requirements using SSRS 2008.
  • Tested and managed the SSIS 2005/2008 and SSIS 2007/8 packages and was responsible for its security.

Environment: Python 2.5, Java/J2EE, Django1.0, HTML,CSS Linux, Shell Scripting, Java Script, Ajax, JQuery, JSON, XML, PostgreSQL, Jenkins, ANT, Maven, Subversion, Python


SQL developer


  • Responsible for the s tudy of SAS Code, SQL Queries, Analysis enhancements and documentation of the system.
  • Used R, SAS, and SQL to manipulate data, and develop and validate quantitative models.
  • Brainstorming sessions and propose hypothesis, approaches, and techniques.
  • Analyzed data collected in stores (JCL jobs, stored-procedures, and queries) and provided reports to the Business team by storing the data in excel/SPSS/SAS file.
  • Performed Analysis and Interpretation of the reports on various findings.
  • Responsible for production support Abend Resolution and other production support activities and comparing the seasonal trends based on the data by Excel.
  • Used advanced Microsoft Excel functions such as pivot tables and VLOOKUPin order to analyze the data.
  • Successfully implemented migration of client's requirement application from Test/DSS/Model regions to p roduction.
  • Prepared SQL scripts for ODBC and Teradata servers for analysis and modeling.
  • Provided complete assistance of the trends of the financial time series data.
  • Various statistical tests performed for clear understanding to the client.
  • Implemented procedures for extracting Excel sheet data into the mainframe environment by connecting to the database using SQL.
  • Complete support to all regions (Test/Model/System/Regression/Production).
  • Actively involved in Analysis, Development, and Unit testing of the data.

Environment: R/R Studio, SQL Enterprise Manager, SAS, Microsoft Excel, Microsoft Access, outlook.

Hire Now