Sr. Data Scientist/Machine Learning Engineer Resume
San Francisco, CA
SUMMARY:
- 8+ years of IT industry experience encompassing Machine Learning, Data Mining with large structured and unstructured datasets, Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
- Extensive experience in Text Analytics, developing statistical machine learning and data mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau.
- Experience with machine learning techniques and algorithms (such as k-NN, Naive Bayes, etc.).
- Experience with object-oriented programming (OOP) concepts using Python, C++, and PHP.
- Knowledge of advanced SAS programming techniques, such as PROC SQL (JOIN/UNION), PROC APPEND, PROC DATASETS, and PROC TRANSPOSE.
- Integration Architect and Data Scientist experience in Analytics, Big Data, BPM, SOA, ETL, and Cloud technologies.
- Highly skilled in using visualization tools like Tableau, ggplot2, and d3.js for creating dashboards.
- Experience in foundational machine learning models and concepts: regression, random forest, boosting, GBM, NNs, HMMs, CRFs, MRFs, and deep learning.
- Proficient with statistical and general-purpose tools/languages: R, Python, C, C++, Java, SQL, UNIX, the QlikView data visualization tool, and the Anaplan forecasting tool.
- Proficient in integrating various data sources with multiple relational databases (Oracle, MS SQL Server, DB2, Teradata) and flat files into the staging area, ODS, Data Warehouse, and Data Mart.
- Exposure to AI and deep learning platforms/methodologies such as TensorFlow, RNNs, and LSTMs.
- Proficient in statistical methodologies including Hypothesis Testing, ANOVA, Time Series, Principal Component Analysis, Factor Analysis, Cluster Analysis, and Discriminant Analysis.
- Worked with NoSQL databases including HBase, Cassandra, and MongoDB.
- Proficient in Python 2.x/3.x with SciPy Stack packages including NumPy, Pandas, SciPy, Matplotlib, and IPython.
- Extensively worked on statistical analysis tools and adept at writing code in Advanced Excel, R, MATLAB, and Python.
- Implemented deep learning models and numerical computation with data flow graphs using TensorFlow.
- Worked with applications such as R, Stata, Scala, Perl, and SPSS to develop neural networks and cluster analyses.
- Experienced in the full software lifecycle: SDLC, Agile, and Scrum methodologies.
- Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, neural networks, and Principal Component Analysis, with good knowledge of Recommender Systems (a minimal modeling sketch follows this summary).
- Experienced with machine learning algorithms such as logistic regression, random forest, XGBoost, KNN, SVM, neural networks, linear regression, lasso regression, and k-means.
- Strong experience in the Software Development Life Cycle (SDLC), including Requirements Analysis, Design Specification, and Testing, in both Waterfall and Agile methodologies.
- Experience working with data modeling tools like Erwin, Power Designer, and ER Studio.
- Experience with data analytics, data reporting, ad-hoc reporting, graphs, scales, PivotTables, and OLAP reporting.
- Worked with and extracted data from various database sources such as Oracle, SQL Server, DB2, and Teradata.
- Experienced in working with various Python IDEs, including NetBeans, PyCharm, PyScripter, Spyder, PyStudio, PyDev, and Sublime Text.
- Worked on several Python packages, including NumPy, matplotlib, Beautiful Soup, Pickle, PySide, SciPy, wxPython, and PyTables.
- Proficient knowledge of statistics, mathematics, machine learning, recommendation algorithms, and analytics, with an excellent understanding of business operations and analytics tools for effective analysis of data.
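A minimal sketch of the kind of supervised modeling workflow referenced in this summary, assuming scikit-learn's bundled breast-cancer dataset as a stand-in for real project data (all names here are illustrative, not from any engagement):

    # Minimal supervised-learning workflow: split, fit a random forest, evaluate.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)
    print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))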
TECHNICAL SKILLS:
Languages: Python, R
Machine Learning: Regression, Polynomial Regression, Random Forest, Logistic Regression, Decision Trees, Classification, Clustering, Association, Simple/Multiple Linear Regression, Kernel SVM, K-Nearest Neighbors (K-NN)
OLAP/BI/ETL Tools: Business Objects, MS SQL Server 2008/2005 Analysis Services (MS OLAP, SSAS), Integration Services (SSIS), Reporting Services (SSRS), Performance Point Server (PPS), Oracle OLAP, MS Office Web Components (OWC11), DTS, MDX, Crystal Reports, Crystal Enterprise 10 (CMC)
Web Technologies: JDBC, HTML5, DHTML and XML, CSS3, Web Services, WSDL
Data Modeling Tools: Erwin r9.6, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner.
Big Data Technologies: Spark, Hive, HDFS, MapReduce, Pig, Kafka.
Databases: SQL, Hive, Impala, Pig, Spark SQL, SQL Server, MySQL, MS Access, HDFS, HBase, Teradata, Netezza, MongoDB, Cassandra, SAP HANA.
Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio), Tableau, Crystal Reports XI, Business Intelligence, SSRS, Business Objects, Cognos.
Version Control Tools: SVN, GitHub.
Project Execution Methodologies: Ralph Kimball and Bill Inmon data warehousing methodologies, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD).
BI Tools: Tableau, Tableau Server, Tableau Reader, SAP Business Objects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure SQL Data Warehouse.
Operating Systems: Windows, Linux, UNIX, macOS, Red Hat.
Tools and Utilities: SQL Server Management Studio, SQL Server Enterprise Manager, SQL Server Profiler, Import & Export Wizard, Visual Studio .NET, Microsoft Management Console, Visual SourceSafe 6.0, DTS, Crystal Reports, Power Pivot, ProClarity, Microsoft Office, Excel Power Pivot, Excel Data Explorer, Tableau, JIRA, Spark MLlib.
Machine Learning Algorithms: Classification, KNN, Regression, Random Forest, Clustering (K-Means), Neural Nets, SVM, Bayesian Algorithms, Social Media Analytics, Sentiment Analysis, Market Basket Analysis, Bagging, Boosting.
PROFESSIONAL EXPERIENCE:
Confidential, San Francisco, CA
Sr. Data scientist/Machine learning Engineer
Responsibilities:
- Provided the architectural leadership in shaping strategic, business technology projects, with an emphasis on application architecture.
- Utilized domain knowledge and application portfolio knowledge to play a key role in defining the future state of large, business technology programs.
- Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
- Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS; implemented a Python-based distributed random forest via Python streaming.
- Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python to develop various machine learning algorithms, and applied algorithms such as linear regression, multivariate regression, Naive Bayes, Random Forest, K-Means, and KNN for data analysis.
- Spearheaded chatbot development initiative to improve customer interaction with application.
- Developed the chatbot using api.ai.
- Automated CSV-to-chatbot-friendly JSON transformation by writing NLP scripts, cutting development time by 20% (a minimal sketch follows this list).
- Conducted studies and rapid plotting, using advanced data mining and statistical modeling techniques to build a solution that optimizes data quality and performance.
- Demonstrated experience in the design and implementation of statistical models, predictive models, enterprise data models, metadata solutions, and data life cycle management in both RDBMS and Big Data environments.
- Worked on database design, relational integrity constraints, OLAP, OLTP, Cubes and Normalization (3NF) and De-normalization of the database.
- Worked on customer segmentation using an unsupervised learning technique - clustering.
- Worked with various Teradata 15 tools and utilities such as Teradata Viewpoint, MultiLoad, ARC, Teradata Administrator, BTEQ, and other Teradata utilities.
- Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python, along with a broad variety of machine learning methods including classification, regression, and dimensionality reduction.
- Developed Linux shell scripts using NZSQL/NZLOAD utilities to load data from flat files into the Netezza database.
- Designed and implemented system architecture for Amazon EC2 based cloud-hosted solution for the client.
- Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
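A minimal sketch of the CSV-to-JSON transformation described above. The column names ('question', 'answer'), file paths, and the intents-style output schema are hypothetical stand-ins; the actual api.ai payload format differed:

    # Convert a CSV of question/answer pairs into a chatbot-friendly JSON file.
    import csv
    import json

    def csv_to_chatbot_json(csv_path, json_path):
        intents = []
        with open(csv_path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):  # hypothetical 'question'/'answer' columns
                intents.append({
                    "userSays": row["question"].strip(),
                    "response": row["answer"].strip(),
                })
        with open(json_path, "w", encoding="utf-8") as f:
            json.dump({"intents": intents}, f, indent=2)

    # csv_to_chatbot_json("faq.csv", "intents.json")  # hypothetical paths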
Environment: Erwin r9.6, Python, SQL, Oracle, Netezza, SQL Server, SSRS, PL/SQL, T-SQL, Tableau, MLlib, regression, cluster analysis, Scala NLP, Spark, Kafka, MongoDB, logistic regression, Hadoop, PySpark, Teradata, random forest, OLAP, Azure, MariaDB, SAP CRM, HDFS, ODS, NLTK, SVM, JSON, XML, Cassandra, MapReduce, AWS.
Confidential, NJ
Data Scientist
Responsibilities:
- Gathered, analyzed, documented and translated application requirements into data models, supported standardization of documentation and the adoption of standards and practices related to data and applications.
- Worked with several R packages, including knitr, dplyr, SparkR, CausalInfer, and spacetime.
- Coded R functions to interface with the Caffe deep learning framework.
- Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python for developing various machine learning algorithms.
- Installed and used the Caffe deep learning framework.
- Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
- Setup storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
- Implemented end-to-end systems for Data Analytics, Data Automation and integrated with custom visualization tools using R, Mahout, Hadoop and Mongo DB.
- Worked with Data Architects and IT Architects to understand data movement and storage, using ER/Studio 9.7.
- Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods including classification, regression, and dimensionality reduction, and used the engine to increase user lifetime by 45% and triple user conversions for target categories.
- Used Spark DataFrames, Spark SQL, and Spark MLlib extensively, developing and designing POCs using Scala, Spark SQL, and the MLlib libraries.
- Used Data Quality Validation techniques to validate Critical Data Elements (CDE) and identified various anomalies.
- Extensively worked on Data Modeling tools Erwin Data Modeler to design the Data Models.
- Developed various QlikView data models by extracting and using data from various source files, DB2, Excel, flat files, and big data.
- Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
- Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and Smart View.
- Implemented Agile Methodology for building an internal application.
- Good knowledge of Hadoop architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Secondary Name Node, and MapReduce concepts.
- As Architect delivered various complex OLAP Databases/Cubes, Scorecards, Dashboards and Reports.
- Programmed a utility in Python that used multiple packages (SciPy, NumPy, Pandas).
- Implemented classification using supervised algorithms such as Logistic Regression, Decision Trees, KNN, and Naive Bayes (compared in the sketch after this list).
- Designed both 3NF data models for ODS, OLTP systems and Dimensional Data Models using Star and Snowflake Schemas.
- Updated Python scripts to match training data with our database stored in AWS CloudSearch, so that each document could be assigned a response label for further classification.
- Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
- Designed and developed Use Case, Activity, and Sequence Diagrams and performed Object-Oriented Design (OOD) using UML and Visio.
- Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
- Identified and executed process improvements, hands-on in various technologies such as Oracle and Business Objects.
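A minimal sketch of the supervised-classification comparison referenced above, assuming scikit-learn's bundled iris dataset in place of the project data:

    # Compare the four classifiers named above via 5-fold cross-validation.
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)  # stand-in dataset
    models = {
        "logistic regression": LogisticRegression(max_iter=1000),
        "decision tree": DecisionTreeClassifier(random_state=0),
        "knn": KNeighborsClassifier(n_neighbors=5),
        "naive bayes": GaussianNB(),
    }
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5)
        print(f"{name}: mean accuracy {scores.mean():.3f}")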
Environment: R, Python, HDFS, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, MapReduce, Rational Rose, SQL, and MongoDB.
Confidential - Fremont, CA
Data scientist/ R Developer
Responsibilities:
- Designed an industry-standard data model specific to the company's group insurance offerings; translated the business requirements into detailed production-level specifications using Workflow Diagrams, Sequence Diagrams, Activity Diagrams, and Use Case Modeling.
- Involved in the design and development of the data warehouse environment; served as liaison to business users and technical teams, gathering requirement specification documents and identifying data sources, targets, and report-generation needs.
- Recommended and evaluated marketing approaches based on analytics of customer consumption behavior.
- Determined customer satisfaction and helped enhance the customer experience using NLP.
- Worked on text analytics, Naive Bayes, and sentiment analysis; created word clouds; and retrieved data from Twitter and other social networking platforms (a minimal sentiment-classification sketch follows this list).
- Conceptualized the most-used product module (Research Center) after building a business case for approval, gathering requirements, and designing the user interface.
- A team member of Analytical Group and assisted in designing and development of statistical models for the end clients. Coordinated with end users for designing and implementation of e-commerce analytics solutions as per project proposals.
- Conducted market research for client; developed and designed sampling methodologies, and analyzed the survey data for pricing and availability of clients' products. Investigated product feasibility by performing analyses that include market sizing, competitive analysis and positioning.
- Optimized Python code for a variety of data mining and machine learning tasks.
- Facilitated stakeholder meetings and sprint reviews to drive project completion.
- Successfully managed projects using the Agile development methodology.
- Project experience in data mining, segmentation analysis, business forecasting, and association rule mining on large datasets with machine learning.
- Automated diagnosis of blood loss during accidents by applying machine learning algorithms to vital signs (ECG, HF, GSR, etc.); demonstrated performance of 94.6%, on par with state-of-the-art models used in industry.
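A minimal sketch of the Naive Bayes sentiment-classification work described above; the toy labeled texts are invented stand-ins for data retrieved from Twitter:

    # Toy Naive Bayes sentiment classifier over bag-of-words features.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    texts = ["love this product", "terrible customer service",
             "great experience overall", "would not recommend"]
    labels = ["pos", "neg", "pos", "neg"]  # hypothetical labels

    clf = make_pipeline(CountVectorizer(), MultinomialNB())
    clf.fit(texts, labels)
    print(clf.predict(["great experience"]))  # -> ['pos']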
Environment: R, MATLAB, MongoDB, exploratory analysis, feature engineering, K-Means Clustering, Hierarchical Clustering, Machine Learning, Python, Spark (MLlib, PySpark), Tableau, MicroStrategy, SAS, TensorFlow, regression, logistic regression, Hadoop 2.7, OLTP, random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML, and MapReduce.
Confidential - Fairfield, NJ
Data scientist
Responsibilities:
- Performed data profiling to learn about turnover behavior across various features before the hiring decision, when no on-the-job behavioral data is available.
- Performed preliminary data analysis using descriptive statistics and handled anomalies such as removing duplicates and imputing missing values.
- Applied various machine learning algorithms and statistical modeling techniques, such as decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, regression models, social network analysis, neural networks, deep learning, SVM, and clustering, to identify volume, using the scikit-learn package in Python and MATLAB.
- Performed data cleaning and feature selection using the MLlib package in PySpark and worked with deep learning frameworks such as Caffe and Neon.
- Conducted a hybrid of Hierarchical and K-Means cluster analysis using IBM SPSS and identified meaningful segments through a discovery approach.
- Utilized a broad variety of statistical packages, including SAS, R, MLlib, Hadoop, Spark, MapReduce, and others.
- Worked with Spark to improve performance and optimize existing algorithms in Hadoop, using SparkContext, Spark SQL, Spark MLlib, DataFrames, pair RDDs, and Spark on YARN.
- Knowledge of information extraction and NLP algorithms coupled with deep learning.
- Developed NLP models for topic extraction and sentiment analysis.
- Identified and assessed available machine learning and statistical analysis libraries (including regressors, classifiers, statistical tests, and clustering algorithms).
- Evaluated models using cross-validation, the log loss function, and ROC curves, and used AUC for feature selection (see the sketch after this list); also worked with Elastic technologies such as Elasticsearch and Kibana.
- Worked with the NLTK library for NLP data processing and pattern finding.
- Addressed overfitting by implementing regularization methods such as L1 and L2.
- Used MLlib, Spark's machine learning library, to build and evaluate different models.
- Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
- Created data quality scripts using SQL and Hive to validate successful data loads and the quality of the data; created various types of data visualizations using Python and Tableau.
- Communicated results to the operations team to support decision-making.
- Collected data needs and requirements by interacting with other departments.
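A minimal sketch of the evaluation-and-regularization approach described above, comparing L1- vs. L2-penalized logistic regression by cross-validated ROC AUC, with scikit-learn's bundled breast-cancer data standing in for the project data:

    # Compare L1 vs. L2 regularization by cross-validated ROC AUC.
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset
    for penalty in ("l1", "l2"):
        clf = make_pipeline(
            StandardScaler(),  # feature scaling, as in the pandas/NumPy prep above
            LogisticRegression(penalty=penalty, solver="liblinear", C=1.0),
        )
        auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
        print(f"{penalty}: mean ROC AUC {auc.mean():.3f}")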
Environment: Python 2.x, R, CDH5, HDFS, Hive, Linux, Spark, Spark SQL, PySpark, IBM SPSS, Tableau Desktop, SQL Server 2012, Microsoft Excel, MATLAB, SAS, MLlib, Hadoop, MapReduce.
Confidential
Data Analyst
Responsibilities:
- Designed and built dimensions and cubes with Star and Snowflake schemas using SQL Server Analysis Services (SSAS); a small dimensional-modeling sketch follows this list.
- Participated in a JAD session with business users and sponsors to understand and document the business requirements in alignment with the financial goals of the company.
- Performed data analysis and data profiling using complex SQL on various source systems, including Teradata and SQL Server.
- Developed logical and physical data models that capture current-state and future-state data elements and data flows using ER Studio.
- Reviewed and implemented the naming standards for the entities, attributes, alternate keys, and primary keys for the logical model.
- Performed second and third normal form (2NF/3NF) normalization of the ER data model for the OLTP system.
- Worked with data compliance teams, Data governance team to maintain data models, Metadata, Data Dictionaries; define source fields and its definitions.
- Translated business and data requirements into logical data models in support of Enterprise Data Models, ODS, OLAP, OLTP, operational data structures, and analytical systems.
- Designed and modeled the reporting data warehouse, considering current and future reporting requirements.
- Involved in the daily maintenance of the database that involved monitoring the daily run of the scripts as well as troubleshooting in the event of any errors in the entire process.
- Worked with Data Scientists to create a data mart for data-science-specific functions.
- Determined data rules and conducted Logical and Physical design reviews with business analysts, developers, and DBAs.
- Used external loaders such as MultiLoad, TPump, and FastLoad to load data into the Oracle database, and performed analysis, development, testing, implementation, and deployment.
- Reviewed the logical model with application developers, ETL Team, DBAs, and testing team to provide information about the data model and business requirements.
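A small sketch of the star-schema idea above, shown in pandas for brevity (the table and column names are hypothetical; the real models were built in ER Studio and SSAS, not Python):

    # Toy star schema: a product dimension with surrogate keys,
    # plus a fact table that references the dimension by key.
    import pandas as pd

    sales = pd.DataFrame({
        "product": ["widget", "gadget", "widget"],  # hypothetical source data
        "amount": [10.0, 25.0, 12.5],
    })

    dim_product = sales[["product"]].drop_duplicates().reset_index(drop=True)
    dim_product["product_key"] = dim_product.index + 1  # surrogate key

    fact_sales = sales.merge(dim_product, on="product")[["product_key", "amount"]]
    print(dim_product)
    print(fact_sales)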
Environment: Erwin r7.0, Informatica 6.2, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, Requisite Pro, and PL/SQL.
Confidential
SQL developer
Responsibilities:
- Responsible for the study of SAS Code, SQL Queries, Analysis enhancements and documentation of the system.
- Used R, SAS, and SQL to manipulate data, and develop and validate quantitative models.
- Led brainstorming sessions and proposed hypotheses, approaches, and techniques.
- Analyzed data collected in stores (JCL jobs, stored procedures, and queries) and provided reports to the business team by storing the data in Excel/SPSS/SAS files.
- Performed analysis and interpretation of the reports on various findings.
- Responsible for production support, including abend resolution and other production support activities, and compared seasonal trends based on the data in Excel.
- Used advanced Microsoft Excel functions such as pivot tables and VLOOKUP in order to analyze the data.
- Successfully implemented migration of the client's application from the Test/DSS/Model regions to production.
- Prepared SQL scripts for ODBC and Teradata servers for analysis and modeling (a minimal sketch follows this list).
- Provided complete assistance with trend analysis of the financial time series data.
- Performed various statistical tests to give the client a clear understanding of the data.
- Implemented procedures for extracting Excel sheet data into the mainframe environment by connecting to the database using SQL.
- Provided complete support for all regions (Test/Model/System/Regression/Production).
- Actively involved in Analysis, Development, and Unit testing of the data.
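A minimal sketch of pulling ODBC data into an analysis-ready table, in the spirit of the SQL scripts above (the DSN, table, and column names are hypothetical, and a configured ODBC driver is assumed):

    # Pull monthly figures over ODBC and compute a simple 3-month trend.
    import pandas as pd
    import pyodbc  # assumes an ODBC data source is configured

    conn = pyodbc.connect("DSN=finance_dsn")  # hypothetical DSN
    df = pd.read_sql(
        "SELECT txn_month, revenue FROM monthly_revenue ORDER BY txn_month", conn)
    df["rolling_avg"] = df["revenue"].rolling(window=3).mean()  # seasonal trend proxy
    print(df.tail())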
Environment: R/RStudio, SQL Enterprise Manager, SAS, Microsoft Excel, Microsoft Access, Outlook.