
Data Scientist/ Machine Learning Resume


Dublin, CA

SUMMARY

  • Around 6 years of hands-on experience and comprehensive industry knowledge of Machine Learning, Statistical Modeling, Data Analytics, Data Modeling, Data Architecture, Data Analysis, Data Mining, Text Mining & Natural Language Processing (NLP), Artificial Intelligence algorithms, Business Intelligence, and Analytics Models (like Decision Trees, Linear & Logistic Regression), as well as Hadoop (Hive, Pig), R, Python, Spark, Scala, MS Excel, SQL, PostgreSQL, and Erwin.
  • Strong knowledge of all phases of the SDLC (Software Development Life Cycle): analysis, design, development, testing, implementation and maintenance.
  • Experienced in Data Modeling techniques employing Data Warehousing concepts like star/snowflake schema and Extended Star.
  • Expertise in applying data mining techniques and optimization techniques in B2B and B2C industries.
  • Expertise in writing functional specifications, translating business requirements to technical specifications, and creating, maintaining, and modifying database design documents with detailed descriptions of logical entities and physical tables.
  • Excellent knowledge of Machine Learning, Mathematical Modeling and Operations Research. Comfortable with R, Python, SAS, Weka, MATLAB, and relational databases. Deep understanding of and exposure to the Big Data ecosystem.
  • Expertise in Data Analysis, Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import, and Data Export through the use of multiple ETL tools such as Informatica PowerCenter.
  • Proficient in Machine Learning, Data/Text Mining, Statistical Analysis & Predictive Modeling.
  • Expertise in data acquisition, storage, analysis, integration, predictive modeling, logistic regression, decision trees, data mining methods, forecasting, factor analysis, cluster analysis, ANOVA and other advanced statistical techniques.
  • Excellent knowledge and experience in OLTP/OLAP System Study with a focus on the Oracle Hyperion suite of technologies, developing database schemas like Star schema and Snowflake schema (Fact Tables, Dimension Tables) used in relational, dimensional and multidimensional modeling, and physical and logical data modeling using the Erwin tool.
  • Experienced in building data models using machine learning techniques for Classification, Regression, Clustering and Associative mining.
  • Expert in creating PL/SQL Schema objects like Packages, Procedures, Functions, Subprograms, Triggers, Views, Materialized Views, Indexes, Constraints, Sequences, Exception Handling, Dynamic SQL/Cursors, Native Compilation, Collection Types, Record Type, Object Type using SQL Developer.
  • Working experience in Hadoop ecosystem and Apache Spark framework such as HDFS, Map Reduce, HiveQL, SparkSQL, PySpark.
  • Very good experience and knowledge in provisioning virtual clusters under AWS cloud which includes services like EC2, S3, and EMR.
  • Proficient in data visualization tools such as Tableau, Python Matplotlib, R Shiny to create visually powerful and actionable interactive reports and dashboards.
  • Excellent Tableau Developer with expertise in building and publishing customized interactive reports and dashboards with customized parameters and user filters using Tableau (9.x/10.x).
  • Experienced in Agile methodology and SCRUM process.
  • Strong business sense and abilities to communicate data insights to both technical and nontechnical clients.

TECHNICAL SKILLS

Programming & Scripting Languages: R, C, C++, Java, JCL, Python, HTML, CSS, JSP, JavaScript

Databases: MS Access, Oracle 12c/11g/10g/9i, Teradata, Big Data, Hadoop, PostgreSQL.

Statistical Software: SPSS, R, SAS.

ETL/BI Tools: Informatica Power Center 9.x, Tableau, Cognos BI 10, MS Excel, SAS, SAS/Macro, SAS/SQL

Data Modelling: Erwin r9.6, 9.5, 9.1, 8.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner.

Web Packages: Google Analytics, Adobe Test & Target, Web Trends

Big Data Ecosystem: HDFS, Pig, MapReduce, Hive, Sqoop, Flume, HBase, Storm, Kafka, Elasticsearch, Redis.

Statistical Methods: Time Series, regression models, splines, confidence intervals, principal component analysis and Dimensionality Reduction, bootstrapping

BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports, Informatica 6.1.

Database Design Tools and Data Modeling: MS Visio, ERWIN 4.5/4.0, Star Schema/Snowflake Schema modeling, Fact & Dimension tables, physical & logical data modeling, Normalization and De-normalization techniques, Kimball & Inmon Methodologies

Cloud: AWS, S3, EC2.

Big Data / Grid Technologies: Cassandra, Coherence, MongoDB, ZooKeeper, Titan, Elasticsearch, Storm, Kafka, Hadoop

PROFESSIONAL EXPERIENCE

Confidential, Dublin, CA

Data Scientist/ Machine Learning

Responsibilities:

  • Extracted data from HDFS and prepared data for exploratory analysis using data munging
  • Built models using Statistical techniques like Bayesian HMM and Machine Learning classification models like XGBoost, SVM, and Random Forest.
  • Participated in all phases of data mining, data cleaning, data collection, developing models, validation, visualization and performed Gap analysis.
  • Participated in a highly immersive Data Science program involving Data Manipulation & Visualization, Web Scraping, Machine Learning, Python programming, SQL, Git, MongoDB, Hadoop.
  • Set up storage and data analysis tools in AWS cloud computing infrastructure.
  • Installed and used Caffe Deep Learning Framework
  • Worked on different data formats such as JSON and XML, and applied machine learning algorithms in Python.
  • Worked with Data Architects and IT Architects to understand the movement of data and its storage, using ER/Studio 9.7
  • Used pandas, numpy, seaborn, matplotlib, scikit-learn, scipy, NLTK in Python for developing various machine learning algorithms.
  • Performed data manipulation and aggregation from different sources using Nexus, Business Objects, Toad, Power BI and Smart View.
  • Implemented Agile Methodology for building an internal application.
  • Focused on integration overlap and Informatica's newer commitment to MDM with the acquisition of Identity Systems.
  • Coded proprietary packages to analyze and visualize SPC file data to identify bad spectra and samples to reduce unnecessary procedures and costs.
  • Programmed a utility in Python that used multiple packages (numpy, scipy, pandas)
  • Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, Naive Bayes, KNN.
  • As Architect, delivered various complex OLAP databases/cubes, scorecards, dashboards and reports.
  • Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
  • Used Teradata utilities such as FastExport and MultiLoad (MLOAD) for data migration/ETL tasks from OLTP source systems to OLAP target systems
  • Performed data transformation from various sources, data organization, and feature extraction from raw and stored data.
  • Validated the machine learning classifiers using ROC Curves and Lift Charts.
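
The ROC and lift validation mentioned above can be sketched as follows. This is an illustrative sketch only, not code from the project described: the function names (`roc_points`, `auc`, `lift_at`) and the toy labels and scores are hypothetical, and only the Python standard library is used.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points, sweeping the threshold from high to low score.

    labels: 1 for the positive class, 0 for the negative class.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Consume all items that share the same score as one threshold step.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def lift_at(labels, scores, fraction):
    """Lift in the top `fraction` of cases: positive rate there / overall rate."""
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


# Hypothetical toy data: true classes and classifier scores.
toy_labels = [1, 1, 0, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
toy_auc = auc(roc_points(toy_labels, toy_scores))
toy_lift = lift_at(toy_labels, toy_scores, 0.25)
```

An AUC near 0.5 indicates a classifier no better than chance, and a lift above 1 in the top-scored fraction means that segment contains more positives than a random sample of the same size.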

Environment: Unix, Python 3.5.2, MLlib, SAS, regression, logistic regression, Hadoop 2.7.4, NoSQL, Teradata, OLTP, random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML and MapReduce.

Confidential, Pleasanton, CA

Data scientist

Responsibilities:

  • Designed an industry-standard data model specific to the company with group insurance offerings, and translated the business requirements into production-level detail using Workflow Diagrams, Sequence Diagrams, Activity Diagrams and Use Case Modeling
  • Involved in the design and development of the data warehouse environment; acted as liaison between business users and technical teams, gathering requirement specification documents and identifying and presenting data sources, targets and report generation.
  • Recommended and evaluated marketing approaches based on quality analytics of customer consuming behavior.
  • Determined customer satisfaction and helped enhance customer experience using NLP.
  • Worked on Text Analytics, Naive Bayes, Sentiment analysis, creating word clouds and retrieving data from Twitter and other social networking platforms.
  • Conceptualized the most-used product module (Research Center) after building a business case for approval, gathering requirements and designing the User Interface
  • Was a team member of the Analytical Group and assisted in the design and development of statistical models for the end clients. Coordinated with end users on the design and implementation of e-commerce analytics solutions as per project proposals.
  • Conducted market research for the client; developed and designed sampling methodologies, and analyzed the survey data for pricing and availability of clients' products. Investigated product feasibility by performing analyses that include market sizing, competitive analysis and positioning.
  • Optimized Python code for a variety of data mining and machine learning tasks.
  • Facilitated stakeholder meetings and sprint reviews to drive project completion.
  • Successfully managed projects using Agile development methodology
  • Project experience in Data mining, segmentation analysis, business forecasting and association rule mining using Large Data Sets with Machine Learning.
  • Automated Diagnosis of Blood Loss during Accidents: applied Machine Learning algorithms to diagnose blood loss from vital signs (ECG, HF, GSR, etc.). Demonstrated performance of 94.6%, on par with state-of-the-art models used in industry
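
The Naive Bayes sentiment analysis mentioned above can be illustrated with a minimal multinomial Naive Bayes classifier using Laplace (add-one) smoothing. This is a sketch under stated assumptions, not code from the project described: `train_nb`, `predict_nb` and the four training sentences are hypothetical, and tokenization is a plain whitespace split.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Train on (text, label) pairs; return per-class counts and the vocabulary."""
    class_docs = Counter()             # number of documents per label
    word_counts = defaultdict(Counter) # word frequencies per label
    vocab = set()
    for text, label in docs:
        class_docs[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_docs, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log posterior for `text`."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, float("-inf")
    for label in class_docs:
        # Log prior plus the sum of smoothed log likelihoods.
        score = math.log(class_docs[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            smoothed = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set.
toy_docs = [
    ("great service very happy", "pos"),
    ("love the fast support", "pos"),
    ("terrible service very slow", "neg"),
    ("unhappy with slow support", "neg"),
]
toy_model = train_nb(toy_docs)
```

Calling `predict_nb(toy_model, "slow and terrible")` scores the text against each class and returns the more likely label. Real social media text would need proper tokenization and far more training data than this toy set.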

Environment: R, MATLAB, MongoDB, exploratory analysis, feature engineering, K-Means Clustering, Hierarchical Clustering, Machine Learning, Python, Spark (MLlib, PySpark), Tableau, MicroStrategy, SAS, TensorFlow, regression, logistic regression, Hadoop 2.7, OLTP, random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML and MapReduce

Confidential, Tennessee

Data Analyst

Responsibilities:

  • Worked with the BI team in gathering the report requirements and used Sqoop to export data into HDFS and Hive
  • Involved in the below phases of Analytics using R, Python and Jupyter notebook.
  • Data collection and treatment: Analysed existing internal data and external data, worked on entry errors and classification errors, and defined criteria for missing values
  • Data Mining: Used cluster analysis for identifying customer segments, Decision trees used for profitable and non-profitable customers, Market Basket Analysis used for customer purchasing behaviour and part/product association.
  • Developed multiple Map Reduce jobs in Java for data cleaning and preprocessing.
  • Assisted with data capacity planning and node forecasting.
  • Installed, configured and managed Flume infrastructure.
  • Administered Pig, Hive and HBase, installing updates, patches and upgrades.
  • Worked closely with the claims processing team to obtain patterns in filing of fraudulent claims.
  • Worked on performing major upgrade of cluster from CDH3u6 to CDH4.4.0
  • Developed MapReduce programs to extract and transform the data sets; results were exported back to RDBMS using Sqoop.
  • Observed patterns in fraudulent claims using text mining in R and Hive.
  • Exported the required information to RDBMS using Sqoop to make the data available to the claims processing team to assist in processing a claim based on the data.
  • Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
  • Created tables in Hive and loaded the structured data resulting from MapReduce jobs
  • Developed many queries using HiveQL and extracted the required information.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Was responsible for importing the data (mostly log files) from various sources into HDFS using Flume
  • Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
  • Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
  • Managed and reviewed Hadoop log files.
  • Tested raw data and executed performance scripts.
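
The Market Basket Analysis listed above rests on three measures for a rule "A implies B": support, confidence and lift. The following is a minimal sketch of those calculations for item pairs, not code from the project described: `pair_rules` and the sample baskets are hypothetical, and a production analysis would typically use an Apriori or FP-Growth implementation rather than brute-force pair counting.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.3):
    """Return (lhs, rhs, support, confidence, lift) for frequent item pairs.

    support    = share of baskets containing both items
    confidence = share of baskets with lhs that also contain rhs
    lift       = confidence divided by the overall share of baskets with rhs
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # de-duplicate and fix the pair order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two directed rules.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            lift = confidence / (item_counts[rhs] / n)
            rules.append((lhs, rhs, support, confidence, lift))
    # Strongest associations first.
    return sorted(rules, key=lambda rule: -rule[4])


# Hypothetical toy baskets.
toy_baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
    ["bread", "milk"],
]
toy_rules = pair_rules(toy_baskets, min_support=0.4)
```

A lift above 1 suggests the two items are bought together more often than their individual popularity would predict, while a lift below 1 suggests the opposite.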

Environment: HDFS, PIG, HIVE, Map Reduce, Linux, HBase, Flume, Sqoop, R, VMware, Eclipse, Cloudera, Python.

Confidential

Python Developer

Responsibilities:

  • Involved in the design, development and testing phases of the application using Agile methodology.
  • Designed and maintained databases using Python and developed a Python-based API (RESTful Web Service) using Flask, SQLAlchemy and PostgreSQL.
  • Designed and developed the UI of the website using HTML, XHTML, AJAX, CSS and JavaScript.
  • Participated in requirement gathering and worked closely with the architect in designing and modeling.
  • Worked on RESTful web services that enforced a stateless client-server model and supported JSON, with a few changes from SOAP to RESTful technology. Involved in detailed analysis based on the requirement documents.
  • Involved in writing SQL queries implementing functions, triggers, cursors, object types, sequences, indexes etc.
  • Created and managed all hosted and local repositories through the SourceTree Git client, and collaborated using the Git command line and Stash.
  • Responsible for setting up Python REST API framework and Spring framework using Django
  • Developed consumer-based features and applications using Python, Django, HTML, Behaviour-Driven Development (BDD) and pair programming.
  • Designed and developed components using Python with the Django framework. Implemented code in Python to retrieve and manipulate data.
  • Involved in development of the enterprise social network application using Python, Twisted, and Cassandra.
  • Used Python and Django for creating graphics, XML processing of documents, data exchange and business logic implementation between servers. Worked closely with the back-end developer to find ways to push the limits of existing Web technology.
  • Designed and developed the UI for the website with HTML, XHTML, CSS, JavaScript and AJAX
  • Used AJAX & JSON communication for accessing RESTful web services data payload.
  • Designed dynamic client-side JavaScript code to build web forms and performed simulations for web application pages.
  • Created and implemented SQL Queries, Stored procedures, Functions, Packages and Triggers in SQL Server.
  • Successfully implemented Auto Complete/Auto Suggest functionality using JQuery, Ajax, Web Service and JSON.
  • Identified and added the report parameters and created the reports based on the requirements using SSRS 2008.
  • Tested and managed the SSIS 2005/2008 and SSIS 2007/8 packages and was responsible for their security.

Environment: Python 2.5, Java/J2EE, Django 1.0, HTML, CSS, Linux, Shell Scripting, JavaScript, Ajax, jQuery, JSON, XML, PostgreSQL, Jenkins, ANT, Maven, Subversion

Confidential

SQL developer

Responsibilities:

  • Responsible for the study of SAS code, SQL queries, analysis enhancements and documentation of the system.
  • Used R, SAS, and SQL to manipulate data, and develop and validate quantitative models.
  • Participated in brainstorming sessions and proposed hypotheses, approaches, and techniques.
  • Analyzed data collected in stores (JCL jobs, stored procedures, and queries) and provided reports to the Business team by storing the data in Excel/SPSS/SAS files.
  • Performed analysis and interpretation of the reports on various findings.
  • Responsible for production support abend resolution and other production support activities, and for comparing the seasonal trends based on the data using Excel.
  • Used advanced Microsoft Excel functions such as pivot tables and VLOOKUP in order to analyze the data.
  • Successfully implemented migration of client's requirement application from Test/DSS/Model regions to production.
  • Prepared SQL scripts for ODBC and Teradata servers for analysis and modeling.
  • Provided complete assistance in analyzing the trends of the financial time series data.
  • Performed various statistical tests to give the client a clear understanding.
  • Implemented procedures for extracting Excel sheet data into the mainframe environment by connecting to the database using SQL.
  • Provided complete support to all regions (Test/Model/System/Regression/Production).
  • Actively involved in analysis, development, and unit testing of the data.

Environment: R/RStudio, SQL Enterprise Manager, SAS, Microsoft Excel, Microsoft Access, Outlook.
