
Data Scientist/NLP Engineer Resume

San Jose, CA

SUMMARY:

  • 8+ years of experience in IT and comprehensive industry knowledge on Machine Learning, Artificial Intelligence, Statistical Modeling, Data Analysis, Predictive Analysis, Data Manipulation, Data Mining, Data Visualization and Business Intelligence.
  • Proficient at building robust Machine Learning and Deep Learning models, including Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and LSTMs, using TensorFlow and Keras. Adept in analyzing large datasets using Apache Spark, PySpark, Spark ML and Amazon Web Services (AWS).
  • Experience in performing Feature Selection, Linear Regression, Logistic Regression, k-Means Clustering, Classification, Decision Trees, Support Vector Machines (SVM), Naive Bayes, K-Nearest Neighbors (KNN), Random Forest, Gradient Descent, and Neural Network algorithms to train and test large data sets.
  • Adept in statistical programming languages like Python, R and SAS including Big Data technologies like Hadoop, Hive, HDFS, MapReduce and NoSQL Based Databases.
  • Expert in Python data extraction and data manipulation, and in widely used Python libraries like NumPy, Pandas, and Matplotlib for data analysis.
  • Highly skilled in using Hadoop (Pig and Hive) for basic analysis and extraction of data in the infrastructure to provide data summarization; proficient in HiveQL, SparkSQL, and PySpark, with in-depth knowledge of Spark's machine learning library MLlib.
  • Hands-on experience in provisioning virtual clusters on the Amazon Web Services (AWS) cloud, including services like Elastic Compute Cloud (EC2), S3, and EMR.
  • Proficient in designing and creating various Data Visualization Dashboards, worksheets and analytical reports to help users identify critical KPIs and facilitate strategic planning in the organization, utilizing Tableau visualizations according to end-user requirements.
  • Strong familiarity with statistical concepts such as Hypothesis Testing, t-Test, Chi-Square Test, ANOVA, Statistical Process Control, Control Charts, Descriptive Statistics and Correlation Techniques.
  • Extensively worked with other machine learning libraries such as Seaborn, scikit-learn, and SciPy, and familiar with TensorFlow for deep learning and NLTK for natural language processing.
  • Experience in Text Analytics, developing different Statistical Machine Learning, Data Mining solutions to various business problems and generating data visualizations using R, Python.
  • Expert in creating PL/SQL Schema objects like Packages, Procedures, Functions, Subprograms, Triggers, Views, Materialized Views, Indexes, Constraints, Sequences, Exception Handling, Dynamic SQL/Cursors, Native Compilation, Collection Types, Record Type, Object Type using SQL Developer.
  • Hands-on experience in implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis.
  • Experienced in manipulating large data sets using R packages like tidyr, tidyverse, dplyr, reshape, lubridate, and caret, and in data visualization using the ggplot2 package.
  • Experience in working with BigData technologies such as Hadoop, MapReduce jobs, HDFS, Apache Spark, Hive, Pig, Sqoop, Flume, Kafka and familiar with Scala Programming.
  • Good understanding of Teradata SQL Assistant, Teradata Administrator and data load/ export utilities like BTEQ, Fast Load, Multi Load, and Fast Export.
  • Knowledge and experience working in Waterfall as well as Agile environments including the Scrum process and using Project Management tools like ProjectLibre, Jira/Confluence and version control tools such as GitHub/Git.
  • Experienced in Data Integration Validation and Data Quality controls for ETL process and Data Warehousing using MS Visual Studio, SSAS, SSIS and SSRS.
  • Quick learner with strong business domain knowledge who can communicate business data insights easily to technical and nontechnical clients.
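The Logistic Regression and Gradient Descent techniques listed above can be sketched minimally in plain Python (an illustrative toy with a single feature and made-up data, not a production implementation; real work would use scikit-learn's `LogisticRegression`):

```python
import math

def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit one-feature logistic regression by batch gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # sigmoid prediction
            gw += (p - y) * x                          # gradient w.r.t. w
            gb += (p - y)                              # gradient w.r.t. b
        w -= lr * gw / len(xs)
        b -= lr * gb / len(xs)
    return w, b

def predict(w, b, x):
    """Threshold the sigmoid output at 0.5."""
    return 1 if 1.0 / (1.0 + math.exp(-(w * x + b))) >= 0.5 else 0
```

Library implementations add regularization and faster optimizers, but the gradient update above is the same core idea.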

TECHNICAL SKILLS:

Languages: Python, R, T-SQL, PL/SQL

Packages/libraries: Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, BeautifulSoup, MLlib, ggplot2, RWeka, gmodels, NLP, Reshape2, plyr.

Machine Learning: Linear Regression, Logistic Regression, Decision Trees, Random Forest, Association Rule Mining (Market Basket Analysis), Clustering (K-Means, Hierarchical), Gradient Descent, SVM (Support Vector Machines), Deep Learning (CNN, RNN, ANN) using TensorFlow (Keras).

Statistical Tools: Time Series, Regression models, splines, confidence intervals, principal component analysis, Dimensionality Reduction, bootstrapping

Big Data: Hadoop, Hive, HDFS, MapReduce, Pig, Kafka, Flume, Oozie, Spark

BI Tools: Tableau, Amazon Redshift, Birst

Databases: MySQL, SQL Server, Oracle, Hadoop/HBase, Cassandra, DynamoDB, Azure Table Storage

PROFESSIONAL EXPERIENCE:

Confidential, San Jose, CA

Data Scientist/NLP Engineer

Responsibilities:

  • Responsible for developing system models, prediction algorithms, solutions to prescriptive analytics problems, data mining techniques, and/or econometric models.
  • Communicated results to the operations team to support decision-making, and gathered data needs and requirements by interacting with other departments.
  • Designed and built statistical/machine learning systems to solve large-scale customer-focused problems, leveraging statistical methods and applying them to real-world business problems.
  • Performed data profiling to learn about turnover behavior across various features before the hiring decision, when no on-the-job behavioral data is available.
  • Performed preliminary data analysis using descriptive statistics and handled anomalies such as removing duplicates and imputing missing values.
  • Applied various machine learning algorithms and statistical modeling techniques, including decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, regression models, social network analysis, neural networks, deep learning, SVM, and clustering, to identify volume using the scikit-learn package in Python.
  • Performed data cleaning and feature selection using the MLlib package in PySpark and worked with deep learning frameworks such as Caffe and Neon.
  • Conducted a hybrid of Hierarchical and K-Means cluster analysis using IBM SPSS and identified meaningful customer segments through a discovery approach.
  • Built an Artificial Neural Network using TensorFlow in Python to identify a customer's probability of canceling their connection (churn rate prediction).
  • Understood business problems and analyzed the data using appropriate statistical models to generate insights.
  • Developed NLP models for Topic Extraction and Sentiment Analysis.
  • Identify and assess available machine learning and statistical analysis libraries (including regressors, classifiers, statistical tests, and clustering algorithms).
  • Worked with the NLTK library for NLP data processing and pattern finding.
  • Categorized comments into positive and negative clusters from different social networking sites using Sentiment Analysis and Text Analytics.
  • Ensured that the model had a low False Positive Rate; performed text classification and sentiment analysis on unstructured and semi-structured data.
  • Created and designed reports that use gathered metrics to draw logical conclusions about past and future behavior.
  • Used MLlib, Spark's machine learning library, to build and evaluate different models.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
  • Created data quality scripts using SQL and Hive to validate successful data loads and data quality. Created various types of data visualizations using Python and Tableau.
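A toy version of the positive/negative comment categorization described above might look like the following (the word lists are illustrative placeholders, not a real lexicon such as NLTK VADER's):

```python
# Illustrative mini-lexicons; a real system would use a curated lexicon
# or a trained classifier rather than these placeholder word sets.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "unhappy"}

def sentiment(comment):
    """Label a comment by counting positive vs. negative words."""
    words = comment.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, `sentiment("great service love it")` returns `"positive"`; lexicon scoring like this is the simplest baseline against which trained sentiment models are compared.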

Environment: Python 2.x, R, CDH5, HDFS, Hive, Linux, Spark, IBM SPSS, Tableau Desktop, SQL Server 2012, Microsoft Excel, MATLAB, Spark SQL, PySpark.

Confidential, Sterling, VA

Data Scientist/ Neural Network Engineer

Responsibilities:

  • Responsible for developing and deploying risk-based decision tools and building knowledge-based systems to solve large-scale computational problems.
  • Worked on data cleaning and ensured data quality, consistency, integrity using Pandas, NumPy.
  • Explored and analyzed the customer specific features by using Matplotlib, Seaborn in Python and dashboards in Tableau.
  • Performed data imputation using Scikit-learn package in Python.
  • Participated in features engineering such as feature generating, PCA, feature normalization and label encoding with Scikit-learn pre-processing.
  • Develop necessary connectors to plug ML software into wider data pipeline architectures.
  • Identify and assess available machine learning and statistical analysis libraries (including regressors, classifiers, statistical tests, and clustering algorithms).
  • Performed regression (linear, multivariate) analysis using R and plotted regression results using the Shiny framework.
  • Used Python 2.x/3.x (NumPy, SciPy, Pandas, Scikit-learn, Seaborn) to develop a variety of models and algorithms for analytic purposes.
  • Experimented with and built predictive models, including ensemble methods such as gradient boosted trees and a Neural Network with Keras, to predict sales amounts.
  • Analyzed patterns in customers' shopping habits across different locations, categories, and months using time series modeling techniques.
  • Designed rich data visualizations to model data into human-readable form with Tableau and Matplotlib.
  • Collaborated with data engineers and operation team to implement the ETL process, wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
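The feature normalization step mentioned above (z-score standardization, the same transform scikit-learn's `StandardScaler` applies) reduces to a short sketch in plain Python, shown here with a made-up column of values:

```python
import math

def standardize(column):
    """Z-score a numeric column: subtract the mean, divide by the std dev."""
    mean = sum(column) / len(column)
    var = sum((x - mean) ** 2 for x in column) / len(column)
    std = math.sqrt(var) or 1.0  # guard against constant columns
    return [(x - mean) / std for x in column]
```

The standardized column has mean 0 and unit variance, which keeps features on comparable scales before PCA or gradient-based training.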

Environment: Python 2.x/3.x, (Scikit-Learn/SciPy/NumPy/Pandas/Matplotlib/Seaborn), R, Tableau, Machine Learning algorithms (Random Forest, Gradient Boosting tree, Neural network by Keras), GitHub.

Confidential

Machine Learning Engineer

Responsibilities:

  • Designed and developed state-of-the-art deep learning / machine learning algorithms for analyzing image and video data, among others.
  • Experience with TensorFlow, Theano, Keras and other deep learning frameworks.
  • Extensively used open-source tools RStudio (R) and Spyder (Python) for statistical analysis and building machine learning algorithms.
  • Develop project requirements and deliverable timelines; execute efficiently to meet the plan timelines.
  • Analyzed large data sets, applied machine learning techniques, and developed predictive and statistical models, enhancing them by leveraging best-in-class modeling techniques.
  • Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, Caffe, TensorFlow, MLlib, and Python, along with a broad variety of machine learning methods including classification, regression, and dimensionality reduction.
  • Involved with data analysis, primarily identifying data sets, source data, source metadata, data definitions and data formats.
  • Well experienced in normalization and de-normalization techniques for optimum performance in relational and dimensional database environments.
  • Analyzed requirements, the significance of weld point data, and energy efficiency using large datasets.
  • Develop necessary connectors to plug ML software into wider data pipeline architectures.
  • Creating and supporting a data management workflow from data collection, storage, and analysis to training and validation.
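As a minimal illustration of the regression methods referenced above, ordinary least squares for a single feature has a closed form; a framework-independent sketch with invented data:

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope: covariance of x and y over variance of x.
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx  # intercept passes through the means
    return a, b
```

Fitting `fit_line([0, 1, 2, 3], [1, 3, 5, 7])` recovers slope 2 and intercept 1; the multivariate and regularized cases handled by MLlib or scikit-learn generalize this same objective.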

Environment: R 9.0, RStudio, Machine Learning, Informatica 9.0, Scala, Spark, Cassandra, ML, DL, Scikit-learn, Shogun, Data Warehouse, MLlib, Cloudera Oryx, Apache.

Confidential, Chicago, IL

Data Engineer

Responsibilities:

  • Identified problems with customer data and developed cost-effective models through root cause analysis.
  • Analyzed historical sales data and used statistical techniques to recommend an optimal pricing structure.
  • Developed and tested hypotheses in support of research and product offerings, and communicated findings in a clear, precise, and actionable manner to stakeholders.
  • Implemented big data processing applications to collect, clean, and normalize large volumes of open data using Hadoop ecosystem tools such as Pig, Hive, and HBase.
  • Worked with various data formats such as JSON and XML, and performed machine learning techniques using Python and R.
  • Involved in collecting and analyzing internal and external data, correcting data entry errors, and defining criteria for missing values.
  • Developed MapReduce jobs written in Java, using Hive for data cleaning and preprocessing.
  • Exported the required data to an RDBMS using Hadoop Sqoop to make it available to the claims processing team to assist in processing claims based on the data.
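The MapReduce-style cleaning and aggregation described above follows a map / shuffle / reduce pattern; a single-process word-count sketch (illustrative only; the actual jobs ran in Java and Hive on a Hadoop cluster):

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit (word, 1) pairs, lowercasing as a toy cleaning step."""
    for line in records:
        for word in line.lower().split():
            yield word, 1

def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each key's values into a final count."""
    return {key: sum(values) for key, values in groups.items()}
```

Running `reduce_phase(shuffle(map_phase(["Hive hive", "spark"])))` yields `{"hive": 2, "spark": 1}`; Hadoop distributes exactly these three phases across the cluster.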

Environment: Python 2.x, scikit-learn, RStudio, ggplot2, XML, HDFS, Hadoop 2.3, Hive, Impala, Linux, Spark, SQL Server 2014, Microsoft Excel, MATLAB, Spark SQL, PySpark.

Confidential

Data Analyst

Responsibilities:

  • Collaborated with data engineers and operation team to implement the ETL process, wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
  • Explored and analyzed customer-specific features using dashboards in Tableau.
  • Worked on Informatica PowerCenter tools (Source Analyzer, Data Warehousing Designer, Mapping & Mapplet Designer, and Transformation Designer); developed Informatica mappings and tuned mappings for better performance.
  • Extracted data from different flat files, MS Excel, and MS Access, transformed the data based on user requirements using Informatica PowerCenter, and loaded the data into the target by scheduling the sessions.
  • Used dynamic SQL to perform pre- and post-session tasks required while performing extraction, transformation, and loading.
  • Wrangled data, worked on large datasets (acquired and cleaned the data), and analyzed trends by making visualizations using Python.
  • Used Python to develop a variety of models and algorithms for analytic purposes.
  • Analyzed patterns in customers' shopping habits across different locations, categories, and months using time series modeling techniques.
  • Used RMSE/MSE to evaluate different models' performance.
  • Designed rich data visualizations to model data into human-readable form with Tableau and Matplotlib.
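The RMSE/MSE evaluation mentioned above reduces to two short formulas; a plain-Python sketch:

```python
import math

def mse(actual, predicted):
    """Mean squared error: average of squared residuals."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: MSE in the units of the target variable."""
    return math.sqrt(mse(actual, predicted))
```

Because RMSE is in the target's own units, it is usually the more interpretable of the two when comparing model performance.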

Environment: Python, ETL, SQL, Tableau, Matplotlib
