
Sr. Data Scientist / Machine Learning Engineer Resume

NYC, NY

SUMMARY

  • 8+ years of experience in IT, with comprehensive industry knowledge of Machine Learning, Artificial Intelligence, Statistical Modeling, Data Analysis, Predictive Analytics, Data Manipulation, Data Mining, Data Visualization, and Business Intelligence.
  • Data Scientist with 5+ years of experience in transforming business requirements into actionable data models, working across the Banking/Financial, Healthcare, Pharmaceutical, and Insurance domains.
  • Hands-on experience with supervised and unsupervised Machine Learning algorithms.
  • Experience in performing Feature Selection, Linear Regression, Logistic Regression, k-Means Clustering, Classification, Decision Trees, Support Vector Machines (SVM), Naive Bayes, K-Nearest Neighbors (KNN), Random Forests, Gradient Descent, and Neural Network algorithms to train and test large data sets (a minimal scikit-learn sketch follows this list).
  • Adept in statistical programming languages such as Python, R, and SAS, as well as Big Data technologies including Apache Hadoop, Hive, HDFS, MapReduce, and NoSQL databases.
  • Expert in Python data extraction and manipulation, making wide use of Python data-analysis libraries such as NumPy, Pandas, and Matplotlib.
  • Skilled in using Apache Hadoop 2.x (Pig and Hive) for basic analysis and extraction of data to provide data summarization; proficient in HiveQL, Spark SQL, and PySpark, with in-depth knowledge of Apache Spark's machine learning library, MLlib.
  • Hands-on experience provisioning virtual clusters on Amazon Web Services (AWS), including Elastic Compute Cloud (EC2), S3, and EMR.
  • Proficient in designing Tableau dashboards, worksheets, and analytical reports, built to end-user requirements, that help users identify critical KPIs and support strategic planning in the organization.
  • Strong familiarity with statistical concepts such as Hypothesis Testing, t-Tests, Chi-Square Tests, ANOVA, Statistical Process Control, Control Charts, Descriptive Statistics, and Correlation Techniques.
  • Extensive work with machine learning libraries such as Seaborn, scikit-learn, and SciPy; familiar with TensorFlow for deep learning and NLTK for natural language processing.
  • Experience in Text Analytics and in developing Statistical Machine Learning and Data Mining solutions to various business problems, generating data visualizations in R and Python.
  • Expert in creating PL/SQL Schema objects like Packages, Procedures, Functions, Subprograms, Triggers, Views, Materialized Views, Indexes, Constraints, Sequences, Exception Handling, Dynamic SQL/Cursors, Native Compilation, Collection Types, Record Type, Object Type using SQL Developer.
  • Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis.
  • Experienced in manipulating large data sets using R packages such as tidyr, tidyverse, dplyr, reshape, lubridate, and caret, and in data visualization using ggplot2.
  • Experience working with Big Data technologies such as Hadoop, MapReduce, HDFS, Apache Spark, Hive, Pig, Sqoop, Flume, and Kafka, with familiarity in Scala programming.
  • Good understanding of Teradata SQL Assistant, Teradata Administrator, and data load/export utilities such as BTEQ, FastLoad, MultiLoad, and FastExport.
  • Knowledge and experience working in Waterfall as well as Agile environments, including the Scrum process, using project-management tools such as ProjectLibre and Jira/Confluence and version-control tools such as Git/GitHub.
  • Exposure towards Azure Data Lake and Azure Storage.
  • Experienced in Data Integration Validation and Data Quality controls for ETL processes and Data Warehousing using MS Visual Studio, SSAS, SSIS, and SSRS.
  • Quick learner with strong business-domain knowledge, able to communicate data insights clearly to both technical and non-technical clients.
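
The supervised-learning work summarized above follows the usual train/evaluate pattern. Below is a minimal, hedged sketch of that workflow using scikit-learn; the data set, features, and parameters are synthetic stand-ins, not drawn from any engagement described here.

    # Minimal train/test workflow with scikit-learn on synthetic data.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # Toy data standing in for real business features and labels.
    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))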

TECHNICAL SKILLS

Languages: Python, R, T-SQL, PL/SQL

Packages/libraries: Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, BeautifulSoup, MLlib, ggplot2, Rpy2, caret, dplyr, RWeka, gmodels, NLP, Reshape2, plyr.

Machine Learning: Linear Regression, Logistic Regression, Decision Trees, Random Forests, Association Rule Mining (Market Basket Analysis), Clustering (K-Means, Hierarchical), Gradient Descent, SVM (Support Vector Machines), Deep Learning (CNN, RNN, ANN) using TensorFlow (Keras).

Statistical Tools: Time Series, Regression models, splines, confidence intervals, principal component analysis, Dimensionality Reduction, bootstrapping

Big Data: Hadoop, Hive, HDFS, MapReduce, Pig, Kafka, Flume, Oozie, Spark

BI Tools: Tableau, Amazon Redshift, Birst

Data Modeling Tools: Erwin, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner

Databases: MySQL, SQL Server, Oracle, Hadoop/HBase, Cassandra, DynamoDB, Azure Table Storage, Netezza

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio), Tableau, Crystal Reports XI, SSRS, IBM Cognos 7.0/6.0.

Version Control Tools: SVN, GitHub

Operating Systems: Windows, Linux, Ubuntu

PROFESSIONAL EXPERIENCE

Confidential, NYC, NY

Sr. Data Scientist/Machine Learning Engineer

Responsibilities:

  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization; performed gap analysis.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python to develop various machine learning algorithms.
  • Utilized NLP applications such as topic models and sentiment analysis to identify trends and patterns within massive data sets (a hedged NLTK sketch follows this list).
  • Scaled Machine Learning pipelines up to 4,600 processors and 35,000 GB of memory, achieving 5-minute execution times.
  • Developed Python, PySpark, and Hive scripts to filter/map/aggregate data, and used Sqoop to transfer data to and from Hadoop (a PySpark sketch also follows this list).
  • Developed a Machine Learning test-bed with several different model learning and feature learning algorithms.
  • Demonstrated performance surpassing the deep-learning state of the art through thorough, systematic search.
  • Developed on-disk, very large (100 GB+), highly complex Machine Learning models.
  • Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, Space-Time.
  • Utilized Spark, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction.
  • Used Spark DataFrames, Spark SQL, and Spark MLlib extensively, designing and developing POCs using Scala, Spark SQL, and the MLlib libraries.
  • Redesigned interactive visualization graphs using Bokeh.
  • Used Data Quality Validation techniques to validate Critical Data Elements (CDEs) and identify various anomalies.
  • Worked extensively with the Erwin Data Modeler tool to design data models.
  • Developed various Tableau data models by extracting and using data from sources including DB2, Excel, flat files, and Big Data platforms.
  • Updated Python scripts to match training data against our database stored in AWS CloudSearch, so that each document could be assigned a response label for further classification.
  • Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
  • Designed and developed Use Case, Activity, and Sequence Diagrams and performed Object-Oriented Design (OOD) using UML and Visio.
  • Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
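
As a concrete illustration of the sentiment-analysis work mentioned above, here is a hedged sketch using NLTK's VADER analyzer; the sample texts are invented for illustration.

    # Sentiment scoring with NLTK's VADER analyzer (illustrative texts).
    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon")  # one-time lexicon download
    sia = SentimentIntensityAnalyzer()

    for text in ["The new plan is fantastic.", "Billing errors are unacceptable."]:
        scores = sia.polarity_scores(text)  # neg/neu/pos plus a compound score
        print(text, "->", scores["compound"])  # compound ranges from -1 to 1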
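
And a hedged sketch of the filter/aggregate pattern the PySpark scripts above follow; the column names and values are hypothetical.

    # Filter then aggregate with PySpark DataFrames (hypothetical schema).
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("usage-rollup").getOrCreate()
    df = spark.createDataFrame(
        [("a1", "NY", 120.0), ("a2", "NY", 80.0), ("a3", "MN", 50.0)],
        ["account_id", "region", "usage_gb"])

    # Keep heavy users, then total usage per region.
    (df.filter(F.col("usage_gb") > 60)
       .groupBy("region")
       .agg(F.sum("usage_gb").alias("total_usage_gb"))
       .show())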

Environment: AWS, R, Python, HDFS, ODS, Oracle 10g, Hive, DB2, Metadata, MS Excel, MapReduce, SQL, and MongoDB.

Confidential, Minneapolis, MN

Data Engineer/Machine Learning Engineer

Responsibilities:

  • Developed a Machine Learning test-bed with 24 different model learning and feature learning algorithms.
  • Responsible for working with various teams on a project to develop an analytics-based solution specifically targeting roaming subscribers.
  • Collaborated with data engineers and the operations team to implement the ETL process; wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
  • Worked on data cleaning and ensured data quality, consistency, and integrity using Pandas and NumPy.
  • Explored and analyzed the customer specific features by using Matplotlib, Seaborn in Python and dashboards in Tableau.
  • Performed data imputation using the scikit-learn package in Python (a preprocessing sketch follows this list).
  • Participated in feature engineering, including feature generation, PCA, feature normalization, and label encoding with scikit-learn preprocessing.
  • Used Python 2.x/3.x (NumPy, SciPy, Pandas, scikit-learn, Seaborn) to develop a variety of models and algorithms for analytic purposes.
  • Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, Space-Time.
  • Coded R functions to interface with the Caffe Deep Learning Framework.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python to develop various machine learning algorithms.
  • The combination of these elements (travel prediction and multi-dimensional segmentation) would enable operators to conduct highly targeted, personalized roaming-services campaigns, leading to significant subscriber uptake.
  • Installed and used the Caffe Deep Learning Framework.
  • Scaled Machine Learning pipelines up to 4,600 processors and 35,000 GB of memory, achieving 5-minute execution times.
  • Deployed GUI pages by using JSP, JSTL, HTML, DHTML, XHTML, CSS, JavaScript, AJAX.
  • Developed Python, PySpark, and Hive scripts to filter/map/aggregate data, and used Sqoop to transfer data to and from Hadoop.
  • Configured the project on WebSphere 6.1 application servers.
  • Demonstrated performance surpassing the deep-learning state of the art through thorough, systematic search.
  • Developed on-disk, very large (100 GB+), highly complex Machine Learning models.
  • Used SAX and DOM parsers to parse the raw XML documents.
  • Used RAD as the development IDE for web applications.
  • Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction; the engine increased user lifetime by 45% and tripled user conversions for target categories.
  • Used Spark DataFrames, Spark SQL, and Spark MLlib extensively, designing and developing POCs using Scala, Spark SQL, and the MLlib libraries.
  • Redesigned interactive visualization graphs in D3.js.
  • Used Data Quality Validation techniques to validate Critical Data Elements (CDEs) and identify various anomalies.
  • Worked extensively with the Erwin Data Modeler tool to design data models.
  • Developed various QlikView data models by extracting and using data from sources including DB2, Excel, flat files, and Big Data platforms.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization; performed gap analysis.
  • Designed both 3NF data models for ODS and OLTP systems and dimensional data models using Star and Snowflake schemas.
  • Updated Python scripts to match training data against our database stored in AWS CloudSearch, so that each document could be assigned a response label for further classification.
  • Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
  • Designed and developed Use Case, Activity, and Sequence Diagrams and performed Object-Oriented Design (OOD) using UML and Visio.
  • Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
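
The imputation, normalization, PCA, and label-encoding steps named above chain naturally in a scikit-learn Pipeline. Below is a hedged sketch on synthetic data; the values and class labels are invented for illustration.

    # Imputation -> scaling -> PCA in one Pipeline, plus label encoding.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.impute import SimpleImputer
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import LabelEncoder, StandardScaler

    X = np.array([[1.0, 2.0, np.nan],
                  [4.0, np.nan, 6.0],
                  [7.0, 8.0, 9.0],
                  [2.0, 3.0, 5.0]])
    y = LabelEncoder().fit_transform(["churn", "stay", "stay", "churn"])

    pipe = Pipeline([
        ("impute", SimpleImputer(strategy="mean")),  # fill missing values
        ("scale", StandardScaler()),                 # feature normalization
        ("pca", PCA(n_components=2)),                # dimensionality reduction
    ])
    X_reduced = pipe.fit_transform(X)
    print(X_reduced.shape)  # (4, 2)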

Environment: AWS, R, Informatica, Python, HDFS, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, MapReduce, Rational Rose, SQL, and MongoDB.
