Sr. Data Scientist/Machine Learning Engineer Resume
Minneapolis, Minnesota
SUMMARY
- Around six years of experience in Computer Vision, Machine Learning, and Deep Learning, with experience implementing algorithms in R, Python, Spark, Scala, SQL, and scikit-learn; strong interpersonal, technical, and analytical skills along with a proven ability to lead projects while meeting challenging deadlines.
- Experience in Business Intelligence/Data Warehousing design and architecture, Dimensional Data Modeling, ETL, OLAP Cubes, Reporting, and other BI tools.
- Experience developing Logical Data Architecture with adherence to Enterprise Architecture.
- Integration Architect & Data Scientist experience in Analytics, Big Data, BPM, SOA, ETL, and Cloud technologies.
- Highly skilled in using visualization tools like Tableau, ggplot2 and d3.js for creating dashboards.
- Experience in foundational machine learning models and concepts: regression, random forest, boosting, GBM, NNs, HMMs, CRFs, MRFs, deep learning.
- Proficiency in understanding statistical and other tools/languages - R, Python, C, C++, Java, SQL, UNIX, the QlikView data visualization tool, and the Anaplan forecasting tool.
- Multi-layer perceptron, Convolutional Network, Network in Network, Alexnet
- VGGNet, RNN Pixels, Highway Network, Highway Convolutional Network, Residual Network, Bottleneck Residual Network, ResNeXt (Aggregated Residual Transformations Network), Google Inception (v3).
- Auto Encoder, Variational Auto Encoder, GAN (Generative Adversarial Networks), DCGAN (Deep Convolutional Generative Adversarial Networks).
- Natural Language Processing
- Recurrent Neural Network (LSTM), Bi-Directional RNN (LSTM), Dynamic RNN (LSTM); city name generation; Shakespeare script generation; seq2seq recurrent networks.
- CNN Seq: applied a 1-D convolutional network to classify sequences of words from the IMDB sentiment dataset.
TECHNICAL SKILLS
Machine Learning: Machine learning algorithms, TensorFlow, Keras, TFLearn.
Hardware: Arduino, Raspberry Pi, NVIDIA GPUs for deep learning
R Language Skills: Data Preprocessing, Web Scraping, Data Extraction, dplyr, ggplot2, apply functions, Statistical Analysis, Predictive Analysis, ggplotly, rvest, Data Visualization.
Frameworks: Shogun, Accord Framework/AForge.net, Scala, Spark, Cassandra, DL4J, ND4J, scikit-learn
Development Tools: Mahout, MLlib, H2O, Cloudera Oryx, GoLearn, Apache Singa.
Version Control: TFS, Microsoft Visual SourceSafe, Git; Unit Testing: NUnit, MSUnit.
Software Packages: MS Office 2003/07/10/13, MS Access, Messaging Architectures.
Testing Framework: JUnit, Mockito, Cucumber (Integration), JMeter (Performance).
Web Technologies: Windows API, Web Services, Web API (RESTful), HTML5, XHTML, CSS3, AJAX, XML, XAML, MSMQ, Silverlight, Kendo UI.
Web Servers: IIS 5.0, IIS 6.0, IIS 7.5, IIS ADMIN.
Operating Systems: Windows(PowerShell), Linux(CLI), Mac, Android
Databases: RDBMS - MySQL, MSSQL, SQLite; NoSQL - MongoDB, Cassandra. Other: React.js, React Native, .NET, Android SDK, NodeJS, Django, RMI, Hadoop, Spark.
PROFESSIONAL EXPERIENCE
Confidential, Minneapolis, Minnesota
Sr. Data Scientist/Machine Learning Engineer
Responsibilities:
- Responsible for applying machine learning techniques (regression/classification) to predict outcomes.
- Responsible for design and development of advanced R/Python programs to prepare, transform, and harmonize data sets in preparation for modeling.
- Identified and executed process improvements; hands-on with various technologies such as Oracle, Informatica, and Business Objects.
- Designed the prototype of the Data Mart and documented possible outcomes from it for end users.
- Involved in business process modeling using UML.
- Developed and maintained a data dictionary to create metadata reports for technical and business purposes.
- Handled importing data from various data sources, performed transformations using Hive, Map Reduce, and loaded data into HDFS.
- Worked on Spark tool collaborating with ML libraries in eliminating a shotgun approach to understand customer buying patterns.
- Responsible for handling Hive queries using Spark SQL that integrates with Spark environment.
- Created SQL tables with referential integrity and developed queries using SQL, SQL*PLUS and PL/SQL.
- Involved with Data Analysis, primarily identifying Data Sets, Source Data, Source Metadata, Data Definitions, and Data Formats.
- Performance tuning of the database, including indexes, optimizing SQL statements, and monitoring the server.
- Wrote simple and advanced SQL queries and scripts to create standard and ad hoc reports for senior managers.
- Created PL/SQL packages and Database Triggers and developed user procedures and prepared user manuals for the new programs.
- Participated in Business meetings to understand the business needs & requirements.
- Prepared ETL architecture & design documents covering ETL architecture, SSIS design, and the extraction, transformation, and loading of Duck Creek data into the dimensional model.
- Provided technical & requirements guidance to the team members for ETL/SSIS design.
Environment: Python, MDM, MLLib, PL/SQL, Tableau, Teradata 14.1, JSON, HADOOP (HDFS), MapReduce, SQL Server, MLLib, Scala NLP, SSMS, ERP, CRM, Netezza, Cassandra, SQL, PL/SQL, SSRS, Informatica, PIG, Spark, Azure, R Studio, MongoDB, JAVA, HIVE.
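As a minimal sketch of the referential-integrity and reporting-query work described above, shown here with Python's built-in sqlite3 module; all table and column names are hypothetical illustrations, not schemas from the project:

```python
import sqlite3

# In-memory database for illustration only; table and column names are hypothetical
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled per connection

conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount      REAL NOT NULL
    )""")

conn.execute("INSERT INTO customers VALUES (1, 'Acme')")
conn.execute("INSERT INTO orders VALUES (100, 1, 250.0)")

# An order referencing a nonexistent customer violates the foreign key
fk_rejected = False
try:
    conn.execute("INSERT INTO orders VALUES (101, 99, 10.0)")
except sqlite3.IntegrityError:
    fk_rejected = True

# Ad hoc report: total order amount per customer
rows = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers c JOIN orders o ON o.customer_id = c.customer_id
    GROUP BY c.name
""").fetchall()
print(fk_rejected, rows)
```

The same pattern (declared foreign keys plus aggregate reporting queries) carries over to Oracle PL/SQL, where the constraints are enforced by default.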
Confidential, Denver, CO
Data Scientist
Responsibilities:
- Coded R functions to interface with the Caffe Deep Learning Framework
- Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, spacetime.
- Implemented end-to-end systems for Data Analytics, Data Automation and integrated with custom visualization tools using R, Mahout, Hadoop and MongoDB.
- Gathered all the data required from multiple data sources and created the datasets used in analysis.
- Performed Exploratory Data Analysis and Data Visualizations using R, and Tableau.
- Worked with data governance, data quality, data lineage, and data architects to design various models and processes.
- Implemented an image captioner using an RNN (with LSTM) in conjunction with a CNN and trained it on the MS-COCO dataset. Given an input image, it generates a sentence describing the image. The RNN layers were programmed from scratch in Python.
- Used Python and Spark to implement different machine learning algorithms including Generalized Linear Model, SVM, Random Forest, Boosting and Neural Network
- Independently coded new programs and designed tables to load and effectively test the programs for the given POCs using Big Data/Hadoop.
- Developed and designed POCs using Scala, Spark SQL, and MLlib libraries.
- Analyzed the partitioned and bucketed data and computed various metrics for reporting.
- Involved in loading data from RDBMS and web logs into HDFS using Sqoop and Flume.
- Worked on loading the data from MySQL to HBase where necessary using Sqoop.
- Developed Hive queries for Analysis across different banners.
- Extracted data from Twitter using Java and the Twitter API. Parsed JSON-formatted Twitter data and uploaded it to the database.
Environment: Python, SQL, Oracle 12c, SQL Server, Java, SSRS, PL/SQL, T-SQL, Tableau, MLlib, regression, Cluster analysis, NLP, Spark, Kafka, logistic regression, Hadoop, Hive, Random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, Tableau, XML, MapReduce.
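One of the model families listed above, the Generalized Linear Model, can be sketched as logistic regression fit by batch gradient descent. This single-machine NumPy version on synthetic data is an illustration of the technique, not the project's Spark implementation:

```python
import numpy as np

# Synthetic, linearly separable-ish data (an assumption for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_w = np.array([2.0, -1.0])
y = (X @ true_w + rng.normal(scale=0.1, size=200) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Batch gradient descent on the mean log-loss
w = np.zeros(2)
lr = 0.5
for _ in range(500):
    p = sigmoid(X @ w)            # predicted probabilities
    grad = X.T @ (p - y) / len(y)  # gradient of mean log-loss w.r.t. w
    w -= lr * grad

preds = (sigmoid(X @ w) > 0.5).astype(float)
accuracy = (preds == y).mean()
print(accuracy)
```

In the Spark setting the same model is available out of the box (e.g. MLlib's logistic regression), with the gradient computation distributed across partitions.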
Confidential, Urbana, IL
Deep Learning Software Engineer
Responsibilities:
- Deep Learning Wi-Fi Geolocation (Python, TensorFlow)
- Finds the user's location using deep learning based on signals from nearby Wi-Fi routers
- Trains a machine learning model with location and Wi-Fi signal data
- The trained ML model predicts locations
- Applied current best practices from the deep learning era for setting up train/dev/test sets and analyzing bias/variance.
- Designed and implemented deep learning algorithms for Computer Vision and Image Analytics.
- Implemented deep learning architectures for use in data analysis and classification.
- Developed web & data applications. Managed portfolio of investments.
- Developed a real-time object detector using the Single Shot Detector framework and MobileNet architecture with a Caffe model, using the deep neural network module of OpenCV.
- Modular deep neural network library: developed a simple modular library in Python for designing deep neural networks (fully-connected, convolutional, and recurrent) of arbitrary depth with dropout and batch-normalization layers. Implemented different optimizers, including SGD with momentum, RMSProp, AdaGrad, and Adam.
Environment: Python, C++, MATLAB. Machine learning packages: TensorFlow, scikit-learn, Keras. Good understanding of deep learning architectures like ResNet, DenseNets, R-CNN, Fast R-CNN, Word2vec, etc. Familiar with machine learning algorithms like SVM, Random Forests, Naïve Bayes, regression, RBM, etc. Passionate about reading research papers and staying up-to-date on the latest trends in deep learning. Solid mathematical foundation in linear algebra and probability.
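The SGD-with-momentum optimizer mentioned in the modular-library bullet above follows a standard update rule; this minimal NumPy sketch applies it to a toy quadratic objective (the objective, names, and hyperparameters are illustrative assumptions, not code from the project):

```python
import numpy as np

def sgd_momentum(grad_fn, w0, lr=0.1, momentum=0.9, steps=300):
    """Heavy-ball SGD: the velocity accumulates a decaying sum of past gradients."""
    w = np.asarray(w0, dtype=float)
    v = np.zeros_like(w)
    for _ in range(steps):
        v = momentum * v - lr * grad_fn(w)  # update velocity
        w = w + v                           # take the momentum step
    return w

# Toy objective f(w) = ||w - target||^2, whose gradient is 2 * (w - target)
target = np.array([3.0, -2.0])
w_final = sgd_momentum(lambda w: 2 * (w - target), w0=[0.0, 0.0])
print(w_final)
```

Swapping in RMSProp or Adam changes only how the raw gradient is rescaled before the step; the loop structure stays the same, which is what makes the optimizer pluggable in a modular library.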
Confidential, Minnesota
Data Analytics
Responsibilities:
- Implemented public segmentation with unsupervised machine learning, applying the k-means algorithm using PySpark.
- Explored and extracted data from source XML in HDFS, preparing the data for exploratory analysis using data munging.
- Used R and Python for Exploratory Data Analysis, A/B testing, ANOVA tests, and hypothesis tests to compare and identify the effectiveness of creative campaigns.
- Created various types of charts like Heat Maps, Geocoding, Symbol Maps, Pie Charts, Bar Charts, Tree Maps, Gantt Charts, Circle Views, Line Charts, Area Charts, Scatter Plots, Bullet Graphs, and Histograms in Power BI and Excel to provide better data visualization.
- Used Spark for test data analytics using MLlib and analyzed the performance to identify bottlenecks.
- Worked on different data formats such as JSON, XML and performed machine learning algorithms in R.
- Worked on Linux shell scripts for business process and loading data from different interfaces to HDFS.
- Created MDM, OLAP data architecture, analytical data marts, and cubes optimized for reporting.
- Worked with different sources such as Oracle, Teradata, SQL Server 2012, Excel, flat files, complex flat files, Cassandra, MongoDB, HBase, and COBOL files.
- Generated on-demand and scheduled reports for business analysis and management decision-making using Power BI.
- Performed K-means clustering, Multivariate analysis and Support Vector Machines in R.
- Used Python, R, SQL to create Statistical algorithms involving Multivariate Regression, Linear Regression, Logistic Regression, PCA, Random forest models, Decision trees, Support Vector Machine for estimating the risks of welfare dependency.
- Identified and targeted welfare high-risk groups with Machine learning algorithms.
Environment: Python, MDM, MLLib, PL/SQL, Tableau, Teradata 14.1, JSON, HADOOP (HDFS), MapReduce, SQL Server, MLLib, Scala NLP, SSMS, ERP, CRM, Netezza, Cassandra, SQL, PL/SQL, SSRS, Informatica, PIG, Spark, Azure, R Studio, MongoDB, JAVA, HIVE
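The k-means segmentation described above can be sketched in NumPy. In the project this ran via PySpark (e.g. MLlib's KMeans), so this single-machine version on synthetic two-cluster data is only an illustration of the algorithm:

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm: assign points to nearest center, recompute means."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Distance of every point to every center: shape (n_points, k)
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of its assigned points
        # (keep the old center if a cluster ends up empty)
        centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    return centers, labels

# Two well-separated synthetic clusters around (0, 0) and (5, 5)
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
centers, labels = kmeans(pts, k=2)
print(centers)
```

The distributed version differs mainly in where the assignment step runs: Spark computes the nearest-center labels per partition and aggregates the sums to recompute centers on the driver.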
Confidential
Software Engineer
Responsibilities:
- Developed a website recommendation system using the PageRank algorithm with Spark Java APIs.
- Developed distributed client-server applications using sockets and implemented virtual file server (HTTP file server protocol) using Java virtual machines and Java Sockets.
- Employed distributed publish-subscribe applications using the Apache Kafka framework.
- Created distributed applications using the Single Program Multiple Data (SPMD) model and message-passing programs using point-to-point communication primitives in MPI.
- Implemented a parallel file server using both multithreading and sockets.
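The PageRank computation behind the recommendation system above can be sketched in plain Python (the real system used Spark's Java APIs; the toy link graph and function names here are assumptions for illustration):

```python
def pagerank(links, damping=0.85, iters=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        # Every page keeps a base share, plus damped contributions from in-links
        new_rank = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            if not outs:  # dangling page: spread its rank evenly over all pages
                for q in pages:
                    new_rank[q] += damping * rank[p] / n
            else:
                for q in outs:
                    new_rank[q] += damping * rank[p] / len(outs)
        rank = new_rank
    return rank

# Hypothetical three-page link graph: a -> b, c; b -> c; c -> a
toy = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy)
print(ranks)
```

The Spark version expresses the same iteration as a join between the link table and the current ranks followed by a reduce, which is what makes it scale to real web graphs.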