
Machine Learning/Data Scientist Resume


San Francisco

PROFESSIONAL SUMMARY:

  • Over 8 years of experience in Data Analysis/Business Analysis, ETL Development, and Project Management.
  • Experience in all phases of diverse technology projects, specializing in Data Science and Machine Learning.
  • Proven expertise in Supervised and Unsupervised Learning techniques (Clustering, Classification, PCA, Decision Trees, KNN, SVM), Predictive Analytics, Optimization Methods, Natural Language Processing (NLP), and Time Series Analysis.
  • Experienced in Machine Learning regression algorithms such as Simple, Multiple, and Polynomial Regression, SVR (Support Vector Regression), Decision Tree Regression, and Random Forest Regression.
  • Experienced in advanced statistical analysis and predictive modeling in structured and unstructured data environments.
  • Strong expertise in Business and Data Analysis, Data Profiling, Data Migration, Data Conversion, Data Quality, Data Governance, Data Lineage, Data Integration, Master Data Management (MDM), Metadata Management Services, and Reference Data Management (RDM).
  • Hands-on experience with Data Science libraries in Python such as Pandas, NumPy, SciPy, Scikit-learn, Matplotlib, Seaborn, Beautiful Soup, Orange, Rpy2, LibSVM, neurolab, and NLTK (see the sketch after this list).
  • Solid understanding of AWS (Amazon Web Services) S3, EC2, RDS, and IAM, as well as Azure ML and Apache Spark/Scala processes and concepts.
  • Good understanding of Artificial Neural Networks and Deep Learning models using the Theano and TensorFlow packages in Python.
  • Experienced in Machine Learning Classification Algorithms like Logistic Regression, K-NN, SVM, Kernel SVM, Naive Bayes, Decision Tree & Random Forest classification.
  • Hands-on experience with R packages and libraries such as ggplot2, Shiny, h2o, dplyr, reshape2, plotly, RMarkdown, ElemStatLearn, caTools, etc.
  • Efficiently accessed data via multiple vectors (e.g. NFS, FTP, SSH, SQL, Sqoop, Flume, Spark).
  • Experience in various phases of the Software Development Life Cycle (Analysis, Requirements Gathering, Designing) with expertise in writing/documenting Technical Design Documents (TDD), Functional Specification Documents (FSD), Test Plans, GAP Analysis, and Source-to-Target mapping documents.
  • Excellent understanding of Hadoop architecture, MapReduce concepts, and the HDFS framework.
  • Strong understanding of project life cycle and SDLC methodologies including RUP, RAD, Waterfall, and Agile.
  • Very good knowledge and understanding of Microsoft SQL Server, Oracle, Teradata, Hadoop/Hive.
  • Strong expertise in ETL, Data warehousing, Operational Data Store (ODS), Data Marts, OLAP and OLTP technologies.
  • Analytical, performance-focused, and detail-oriented professional, offering in-depth knowledge of data analysis and statistics; utilized complex SQL queries for data manipulation.
  • Equipped with experience in utilizing statistical techniques including Correlation, Hypothesis Modeling, and Inferential Statistics, as well as data mining and modeling techniques using Linear and Logistic Regression, Clustering, Decision Trees, and K-means.
  • Expertise in building Supervised and Unsupervised Machine Learning experiments using Microsoft Azure utilizing multiple algorithms to perform detailed predictive analytics and building Web Services models for all types of data: continuous, nominal, and ordinal.
  • Expertise in using Linear & Logistic Regression and Classification Modeling, Decision-trees, Principal Component Analysis (PCA), Cluster and Segmentation analyses, and have authored and co-authored several scholarly articles applying these techniques.
  • Mitigated risk factors through careful analysis of financial and statistical data. Transformed and processed raw data for further analysis, visualization, and modeling.
  • Proficient in researching current processes and emerging technologies that need analytic models, data inputs and outputs, analytic metrics, and user interface needs.
  • Assisted in determining the full domain of the MVP, created and implemented its relevant data model for the App, and worked with App developers to integrate the MVP into the App and any backend domains.
  • Ensured that REST-based APIs, including all CRUD operations, integrate with the App and other service domains.
  • Installed and configured additional services on appropriate AWS EC2, RDS, S3, and/or other AWS service instances.
  • Integrated these services with each other and ensured user access to data, data storage, and communication between the various services.
  • Excellent team player and self-starter with good communication skills.
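
As a concrete illustration of the supervised learning and Python library experience summarized above, the following is a minimal scikit-learn sketch. It uses the library's bundled Iris sample data; the pipeline and parameter choices are illustrative assumptions, not taken from any specific project described here.

```python
# Minimal supervised-learning sketch with scikit-learn (illustrative only).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small sample dataset and hold out a test split.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Scale features, then fit a logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Evaluate held-out accuracy.
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```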

TECHNICAL SKILLS:

Languages: HTML5, DHTML, WSDL, CSS3, C, C++, XML, R/R Studio, SAS Enterprise Guide, SAS, R (Caret, Weka, ggplot), Perl, MATLAB, Mathematica, FORTRAN, DTD, Schemas, JSON, Ajax, Java, Scala

NoSQL Databases: Cassandra, HBase, MongoDB, MariaDB

Software/Libraries: Keras, Caffe, TensorFlow, OpenCV, Scikit-learn, Pandas, NumPy, Microsoft Visual Studio, Microsoft Office.

Development Tools: Microsoft SQL Studio, IntelliJ, Eclipse, NetBeans.

Machine Learning Algorithms: Neural Networks, Decision Trees, Support Vector Machines, Random Forest, Convolutional Neural Networks, Logistic Regression, PCA, K-means, KNN.

Development Methodologies: Agile/Scrum, UML, Design Patterns, Waterfall

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos 7.0/6.0.

BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports, Informatica 6.1.

Database Design Tools and Data Modeling: MS Visio, ERWIN 4.5/4.0, Star Schema/Snowflake Schema modeling, Fact & Dimension tables, physical & logical data modeling, Normalization and De-normalization techniques, Kimball & Inmon Methodologies

PROFESSIONAL EXPERIENCE:

Confidential, San Francisco

Machine Learning/Data Scientist.

Responsibilities:

  • Worked with several R packages including knitr, dplyr, SparkR, Causal Infer, Space-Time.
  • Coded R functions to interface with the Caffe Deep Learning Framework.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python for developing various machine learning algorithms.
  • Installed and used the Caffe Deep Learning Framework.
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
  • Performed data manipulation and aggregation from different sources using Nexus, Business Objects, Toad, Power BI, and Smart View.
  • Implemented Agile Methodology for building an internal application.
  • Focused on integration overlap and Informatica's newer commitment to MDM following the acquisition of Identity Systems.
  • Set up storage and data analysis tools in the Amazon Web Services (AWS) cloud computing infrastructure.
  • Implemented end-to-end systems for Data Analytics and Data Automation and integrated them with custom visualization tools using R, Mahout, Hadoop, and MongoDB.
  • Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER Studio 9.7.
  • Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction, and utilized the engine to increase user lifetime by 45% and triple user conversions for target categories.
  • Used Spark DataFrames, Spark SQL, and Spark MLlib extensively, developing and designing POCs using Scala, Spark SQL, and the MLlib libraries (see the sketch after this list).
  • Used Data Quality Validation techniques to validate Critical Data Elements (CDE) and identified various anomalies.
  • Extensively worked with the data modeling tool Erwin Data Modeler to design the data models.
  • Developed various QlikView data models by extracting and using data from various source files, DB2, Excel, flat files, and big data sources.
  • Participated in all phases of Data Mining, Data Collection, Data Cleaning, Model Development, Validation, and Visualization, and performed Gap Analysis.
  • Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
  • As Architect, delivered various complex OLAP Databases/Cubes, Scorecards, Dashboards, and Reports.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, Pandas).
  • Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, KNN, Naive Bayes.
  • Designed both 3NF data models for ODS, OLTP systems and Dimensional Data Models using Star and Snowflake Schemas.
  • Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
  • Created SQL tables with referential integrity and developed queries using SQL, SQL Plus,and PL/SQL.
  • Designed and developed Use Case, Activity Diagrams, Sequence Diagrams, OOD (Object-oriented Design) using UML and Visio.
  • Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
  • Identifying and executing process improvements, hands-on in various technologies such as Oracle, Informatica, and Business Objects.
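
A minimal PySpark sketch of the Spark DataFrame / MLlib pattern referenced above. The schema, data, and column names are hypothetical, assumed only for illustration.

```python
# Minimal Spark DataFrame + MLlib sketch (hypothetical schema and data).
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler

spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

# Hypothetical training data: two numeric features and a binary label.
df = spark.createDataFrame(
    [(0.5, 1.2, 0), (2.3, 0.1, 1), (1.1, 3.4, 0), (4.0, 2.2, 1)],
    ["f1", "f2", "label"],
)

# Assemble the feature columns into the single vector column MLlib expects.
assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
train = assembler.transform(df)

# Fit a logistic regression model and inspect its predictions.
model = LogisticRegression(featuresCol="features", labelCol="label").fit(train)
model.transform(train).select("label", "prediction").show()

spark.stop()
```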

Environment: AWS, R, Informatica, Python, HDFS, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, MapReduce, Rational Rose, SQL, and MongoDB.

Confidential, Dallas, Texas

Machine Learning/Data Scientist.

Responsibilities:

  • Responsible for applying machine learning techniques (regression/classification) to predict outcomes.
  • Responsible for the design and development of advanced R/Python programs to prepare, transform, and harmonize data sets in preparation for modeling.
  • Identifying and executing process improvements, hands-on in various technologies such as Oracle, Informatica, and Business Objects.
  • Designed the prototype of the Data Mart and documented possible outcomes from it for end users.
  • Involved in business process Modeling using UML.
  • Developed and maintained a data dictionary to create metadata reports for technical and business purposes.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Worked on the Spark tool in combination with ML libraries, eliminating a shotgun approach to understanding customer buying patterns.
  • Responsible for handling Hive queries using Spark SQL, which integrates with the Spark environment (see the sketch after this list).
  • Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
  • Involved with Data Analysis, primarily identifying Data Sets, Source Data, Source Metadata, Data Definitions, and Data Formats.
  • Performed database performance tuning, which included creating indexes, optimizing SQL statements, and monitoring the server.
  • Used Python and Spark to implement different machine learning algorithms, including Generalized Linear Models, SVM, Random Forest, Boosting, and Neural Networks.
  • Independently coded new programs and designed tables to load and test the programs effectively for the given POCs using Big Data/Hadoop.
  • Wrote simple and advanced SQL queries and scripts to create standard and ad-hoc reports for senior managers.
  • Created PL/SQL packages and Database Triggers and developed user procedures and prepared user manuals for the new programs.
  • Participated in Business meetings to understand the business needs & requirements.
  • Prepared the ETL architecture & design document, which covers the ETL architecture, SSIS design, and the extraction, transformation, and loading of Duck Creek data into the dimensional model.
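
A minimal sketch of running a Hive query through Spark SQL, as referenced in the list above. The table and column names are hypothetical.

```python
# Minimal Spark-SQL-over-Hive sketch (hypothetical table and columns).
from pyspark.sql import SparkSession

# Enable Hive support so Spark can read tables from the Hive metastore.
spark = (
    SparkSession.builder
    .appName("hive-sql-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Run a HiveQL aggregation through the Spark SQL engine.
claims_by_state = spark.sql(
    """
    SELECT state, COUNT(*) AS claim_count
    FROM claims  -- hypothetical Hive table
    GROUP BY state
    ORDER BY claim_count DESC
    """
)
claims_by_state.show()
spark.stop()
```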

Environment: Python, MDM, MLlib, PL/SQL, Tableau, SQL Server, Scala, NLP, SSMS, ERP, CRM, Netezza, Cassandra, SQL, SSRS, Informatica, Spark, Azure, R Studio, MongoDB, Java, Hive.

Confidential - NC.

Data Analyst.

Responsibilities:

  • Responsible for applying machine learning techniques (regression/classification) to predict outcomes.
  • Responsible for the design and development of advanced R/Python programs to prepare, transform, and harmonize data sets in preparation for modeling.
  • Developed large data sets from structured and unstructured data and performed data mining.
  • Partnered with modelers to develop data frame requirements for projects.
  • Performed Ad-hoc reporting/customer profiling, segmentation using R/Python.
  • Tracked various campaigns, generating customer profiling analysis and data manipulation.
  • Provided R/SQL programming, with detailed direction, in the execution of data analysis that contributed to the final project deliverables. Responsible for data mining.
  • Analyzed large datasets to answer business questions by generating reports and outcome-driven marketing strategies.
  • Used Python 2.7 to apply time series models to identify fast-growth opportunities for our clients (see the sketch after this list).
  • Analyzed the traffic queries of Baidu search engine using classification algorithm.
  • Assisted to improve the liquidity of our ads model.
  • Based on client and traffic data, assessed the strengths and weaknesses of products.
  • Involved in fixing bugs and minor enhancements for the front-end modules.
  • Created class models, sequence diagrams, and activity diagrams for the SDLC process of the application.
  • Supported the testing team in System, Integration, and UAT testing.
  • Guaranteed quality in deliverables.
  • Conducted Design reviews and Technical reviews with other project stakeholders.
  • Was a part of the complete life cycle of the project from the requirements to the production support.
  • Involved in loading data from RDBMS and weblogs into HDFS using Sqoop and Flume.
  • Worked on loading the data from MySQL to HBase where necessary using Sqoop.
  • Developed Hive queries for Analysis across different banners.
  • Extracted data from Twitter using Java and the Twitter API. Parsed JSON-formatted Twitter data and uploaded it to the database of the existing system.
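
A minimal sketch of the time series modeling step referenced above, written for current Python with pandas and statsmodels rather than the Python 2.7 used at the time; the monthly series is synthetic.

```python
# Minimal time-series sketch with pandas + statsmodels (synthetic data).
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Build a synthetic monthly series with a linear trend plus noise.
rng = np.random.default_rng(0)
idx = pd.date_range("2015-01-01", periods=36, freq="MS")
y = pd.Series(100 + 2.0 * np.arange(36) + rng.normal(0, 3, 36), index=idx)

# Fit a simple ARIMA(1, 1, 1) model and forecast the next quarter.
fit = ARIMA(y, order=(1, 1, 1)).fit()
print(fit.forecast(steps=3))
```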

Environment: R 3.0, Erwin 9.5, Tableau 8.0, MDM, QlikView, MLlib, PL/SQL, HDFS, Teradata 14.1, JSON, MapReduce, MySQL, Spark, R Studio, Mahout.

Confidential - Fremont, CA.

Data Analyst.

Responsibilities:

  • Worked with users to identify the most appropriate source of record required to define the asset data for financing.
  • Performed data profiling in the target DWH.
  • Experience in using OLAP functions such as COUNT, SUM, and CSUM.
  • Performed Data analysis and Data profiling using complex SQL on various sources systems including Oracle and Teradata.
  • Hands-on experience with Sqoop.
  • Developed normalized Logical and Physical database models for designing an OLTP application.
  • Developed new scripts for gathering network and storage inventory data and for making Splunk ingest the data.
  • Imported the customer data into Python using the Pandas library and performed various data analyses, finding patterns in the data that informed key decisions for the company (see the sketch after this list).
  • Created tables in Hive and loaded the structured data (resulting from MapReduce jobs).
  • Developed many queries using HiveQL and extracted the required information.
  • Exported the required information to an RDBMS using Sqoop to make the data available to the claims processing team to assist in processing claims based on the data.
  • Designed and deployed rich graphic visualizations with drill-down and drop-down menu options and parameters using Tableau.
  • Extracted data from the database using SAS/Access and SAS SQL procedures and created SAS data sets.
  • Created Teradata SQL scripts using OLAP functions such as RANK() to improve query performance while pulling data from large tables.
  • Worked on MongoDB database concepts such as locking, transactions, indexes, Sharding, replication, schema design, etc.
  • Performed Data analysis using Python Pandas.
  • Good experience in Agile Methodologies, Scrum stories, and sprints experience in a Python-based environment, along with data analytics and Excel data extracts.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Involved in defining the source-to-target data mappings, business rules, and business and data definitions.
  • Responsible for defining the key identifiers for each mapping/interface.
  • Responsible for defining the functional requirement documents for each source to target interface.
  • Hands-on experience with advanced MS Excel features such as pivot tables and charts for generating graphs.
  • Designed and developed weekly and monthly reports using MS Excel techniques (Charts, Graphs, Pivot tables) and PowerPoint presentations.
  • Strong Excel skills, including pivot tables, VLOOKUP, conditional formatting, and large record sets, as well as data manipulation and cleaning.
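
A minimal pandas sketch of the customer-data analysis pattern referenced above. The file name and columns (customers.csv, signup_date, total_spend, region, segment) are hypothetical.

```python
# Minimal pandas customer-analysis sketch (hypothetical file and columns).
import pandas as pd

# Load customer data; the file and its columns are assumptions.
df = pd.read_csv("customers.csv", parse_dates=["signup_date"])

# Basic profiling: shape and missing values per column.
print(df.shape)
print(df.isna().sum())

# Example pattern-finding: mean spend by region, pivoted by segment.
summary = df.pivot_table(
    values="total_spend", index="region", columns="segment", aggfunc="mean"
)
print(summary)
```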

Environment: Teradata, SAS/Access, SAS SQL, MS Excel, Python Pandas, RDBMS, HiveQL.

Confidential

Data Analyst.

Responsibilities:

  • Used the SAS PROC SQL pass-through facility to connect to Oracle tables and created SAS datasets using various SQL joins such as left join, right join, inner join, and full join.
  • Performed data validation, transforming data from the Oracle RDBMS to SAS datasets.
  • Produced quality customized reports using PROC TABULATE, PROC REPORT styles, and ODS RTF, and provided descriptive statistics using PROC MEANS, PROC FREQ, and PROC UNIVARIATE (see the sketch after this list).
  • Developed SAS macros for data cleaning, reporting, and to support routine processing.
  • Performed advanced querying using SAS Enterprise Guide: calculating computed columns, using filters, and manipulating and preparing data for reporting, graphing, summarization, and statistical analysis, finally generating SAS datasets.
  • Involved in developing, debugging, and validating project-specific SAS programs to generate derived SAS datasets, summary tables, and data listings according to study documents.
  • Created datasets per the approved specifications, collaborated with project teams to complete scientific reports, and reviewed reports to ensure accuracy and clarity.
  • Experienced in working with data modelers to translate business rules/requirements into conceptual/logical dimensional models, and worked with complex de-normalized and normalized data models.
  • Performed different calculations like Quick table calculations, Date Calculations, Aggregate Calculations, String and Number Calculations.
  • Good expertise in building dashboards and stories based on the available data points.
  • Created action filters, user filters, parameters and calculated sets for preparing dashboards and worksheets in Tableau.
  • Expertise in Agile Scrum Methodology to implement project life cycles of report design and development.
  • Combined Tableau visualizations into Interactive Dashboards using filter actions, highlight actions etc. and published them to the web.
  • Gathered business requirements and created business requirement documents (BRD/FRD).
  • Worked closely with business leaders and users to define and design data source requirements and data access; coded, tested, identified, implemented, and documented technical solutions utilizing JavaScript, PHP, and MySQL.
  • Created rich dashboards using Tableau and prepared user stories to create compelling dashboards that deliver actionable insights.
  • Worked with the manager to prioritize requirements and prepared reports on a weekly and monthly basis.
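
The PROC MEANS / PROC FREQ style summaries described above map roughly onto the following pandas sketch, shown in Python for consistency with the other examples in this document; the dataset and column name are hypothetical.

```python
# Rough pandas analogue of PROC MEANS / PROC FREQ (hypothetical data).
import pandas as pd

df = pd.read_csv("study_data.csv")  # hypothetical dataset

# PROC MEANS-style descriptive statistics for numeric variables.
print(df.describe())

# PROC FREQ-style one-way frequency table for a categorical variable.
print(df["treatment_group"].value_counts(dropna=False))
```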

Environment: SQL Server, Oracle 11g/10g, MS Office Suite, PowerPivot, PowerPoint, SAS Base, SAS Enterprise Guide, SAS/MACRO, SAS/SQL, SAS/ODS, SQL, PL/SQL, Visio.

Confidential

Data Analyst.

Responsibilities:

  • Processed data received from vendors and loaded it into the database. The process was carried out on a weekly basis, and reports were delivered bi-weekly. The extracted data had to be checked for integrity (see the sketch after this list).
  • Documented requirements and obtained signoffs.
  • Coordinated between the Business users and development team in resolving issues.
  • Documented data cleansing and data profiling.
  • Wrote SQL scripts to meet the business requirement.
  • Analyzed views and produced reports.
  • Tested cleansed data for integrity and uniqueness.
  • Automated the existing system to achieve faster and more accurate data loading.
  • Generated weekly and bi-weekly reports to be sent to the client business team using Business Objects and documented them as well.
  • Learned to create Business Process Models.
  • Ability to manage multiple projects simultaneously, tracking them toward varying timelines effectively through a combination of business and technical skills.
  • Good understanding of clinical practice management, medical and laboratory billing, and insurance claim processing, with process flow diagrams.
  • Assisted the QA team in creating test scenarios that cover a day in the life of a patient for Inpatient and Ambulatory workflows.
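
A minimal sketch of the weekly vendor load with integrity checks described above. The file, table, and column names are hypothetical, and SQLite stands in for the actual database.

```python
# Minimal vendor-load sketch with basic integrity checks (names hypothetical).
import sqlite3

import pandas as pd

# Read the weekly vendor feed; the file and columns are assumptions.
df = pd.read_csv("vendor_feed.csv")

# Integrity checks: unique keys and no missing required fields.
assert df["record_id"].is_unique, "duplicate record_id values in feed"
assert df[["record_id", "amount"]].notna().all().all(), "missing required fields"

# Load the validated rows into the database.
with sqlite3.connect("vendor.db") as conn:
    df.to_sql("vendor_records", conn, if_exists="append", index=False)
```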

Environment: SQL, data profiling, data loading, QA.
