- 5 years of experience in Data Science
- 5 years of experience in Information Technology
- 4 years of experience in Machine Learning, Artificial Intelligence and Neural Networks
- 6 years of experience writing custom algorithms
- Professional with ten-plus years of experience in all phases of diverse technology projects, specializing in Data Science, Big Data, Azure Machine Learning, Google Cloud, and Tableau on cloud-based infrastructure.
- Worked on analyzing large datasets on distributed databases and developing Machine Learning algorithms to gain operational insights and present them to the leadership.
- Extensively involved in Data preparation, Exploratory analysis, Feature engineering using supervised and unsupervised modeling.
- Experience in building models with deep learning frameworks such as TensorFlow, PyTorch, and Keras, as well as deploying these models behind RESTful APIs.
- Experienced in the full software development life cycle (SDLC) using Agile and Scrum methodologies.
- Proficiency in use of statistical tools and programming languages (R, Python, C, C++, Java, SQL, UNIX).
- Adept in statistical programming languages like R and Python
- Well-versed in linear and non-linear regression and classification predictive modeling algorithms.
- Actively involved in model selection and statistical analysis using SAS and the Gretl statistical tool.
- Created dashboards as part of Data Visualization using Tableau.
- Proficiency in using Spark for big data processing in the Hadoop/Dataproc/EMR ecosystems.
- Performed preliminary data analysis using descriptive statistics and handled anomalies, such as removing duplicates and imputing missing values, using the Talend tool.
- Performed dimensionality reduction using principal component analysis (PCA), autoencoders, and t-SNE.
- Validated consolidated data and developed the models that best fit the data; interpreted data from multiple sources, consolidated it, and performed data cleansing using R, Python, and Spark.
- Applied multiple data mining techniques to derive new insights from the data.
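As a minimal sketch of the dimensionality-reduction step mentioned above (PCA followed by t-SNE), assuming scikit-learn and a synthetic feature matrix in place of the real datasets:

```python
# Illustrative only: synthetic data stands in for the real feature matrix.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))          # 200 samples, 20 features

# PCA: linear projection keeping the top components by explained variance.
pca = PCA(n_components=5, random_state=0)
X_pca = pca.fit_transform(X)

# t-SNE: nonlinear 2-D embedding for visualization, commonly run on the
# PCA-reduced matrix to cut noise and computational cost.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_2d = tsne.fit_transform(X_pca)

print(X_pca.shape, X_2d.shape)
```

Running t-SNE on the PCA output rather than the raw features is a common convention, not a detail stated in the resume.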
Libraries: NLTK, Matplotlib, NumPy, Pandas, scikit-learn, statsmodels, SciPy, TensorFlow, Keras, PyTorch, CNTK, Deeplearning4j, TSA, ggplot2
Data Stores: Large data stores, both SQL and NoSQL; data warehouses, data lakes, Hadoop HDFS
RDBMS: Amazon Redshift, MySQL, MariaDB, PostgreSQL; SQL, PL/SQL, T-SQL
NoSQL: Cassandra, MongoDB, including AWS-hosted deployments
Data Actions: Data query and manipulation in situ with Hive and Spark SQL
Big Data Ecosystems: Hadoop (HBase, Hive, Pig, RHadoop, Spark), Elastic Search, Cloudera Impala, Cloudera/Hortonworks
Cloud Data Systems: AWS (RedShift, Kinesis, EMR), Azure, Google Cloud Platform, IBM
Data Visualization: QlikView, Tableau, PowerBI, R
Software Tools: SAP, SAS, TensorFlow, Keras
Confidential - Atlanta, GA
- Developed Spark (Scala) and Python code for projects in the Hadoop/Hive environment on Linux and Windows for big data resources.
- Implemented convolutional and recurrent neural network architecture to analyze spatial-temporal patterns in data.
- Developed autoregressive integrated moving average filters to model relationships in temporal data.
- Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and R with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction.
- Worked on feature engineering, created dummy variables, removed some of the non-significant variables and selected statistically significant variables.
- Implemented feature engineering on predictors to identify the most important features for the models tested.
- Used a k-NN distance technique, a derivative of clustering, to identify outliers and to classify unlabeled data.
- Facilitated data collection sessions; analyzed and documented data processes, scenarios, and information flows.
- Collaborated with data engineers to implement the ETL process; wrote and optimized SQL queries to extract data from the cloud and merge data from Oracle 12c.
- Used Cloudera Hadoop YARN to perform analytics on data in Hive.
- Implemented big data processing applications to collect, clean, and normalize large volumes of open data using Hadoop ecosystem tools such as Pig, Hive, and HBase.
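The k-NN-distance outlier detection mentioned above can be sketched as follows; this is an illustrative reconstruction with synthetic data and an assumed 99th-percentile cutoff, not the production code:

```python
# Illustrative sketch: flag outliers by distance to the k-th nearest neighbor.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
X[:3] += 8.0                             # plant three obvious outliers

k = 5
# +1 neighbor because each point counts itself as its own nearest neighbor.
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
dist, _ = nn.kneighbors(X)
knn_dist = dist[:, -1]                   # distance to the k-th true neighbor

# Points whose k-NN distance sits far above the bulk are outlier candidates;
# the 99th-percentile threshold here is an assumed choice, not from the source.
threshold = np.percentile(knn_dist, 99)
outliers = np.flatnonzero(knn_dist > threshold)
print(outliers)
```

Points in dense regions have small k-NN distances, so the same score can also support nearest-neighbor classification of unlabeled points.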
Confidential - Atlanta, GA
- Implemented various machine learning algorithms and statistical modeling techniques, such as Decision Trees, Text Analytics, Sentiment Analysis, Naive Bayes, Logistic Regression, and Linear Regression, using Python, and evaluated their performance.
- Interrogated analytical results to assess algorithmic success, robustness, and validity.
- Used a variety of NLP methods for information extraction, topic modeling, parsing, and relationship extraction.
- Developed, deployed, and maintained production NLP models with scalability in mind.
- Implemented Agile Methodology for building an internal application
- Used knowledge databases and language ontologies.
- Wrote a Flask app to call CoreNLP for parts-of-speech and named entity recognition on natural English queries.
- Optimized SQL queries to improve data collection performance; developed uncertainty estimates for the semantic predictions made by a deep convolutional model.
- Derived high-quality information and significant patterns from textual data sources; used document term frequency and TF-IDF (term frequency-inverse document frequency) to identify terms for topic modelling.
- Analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models by leveraging best-in-class modeling techniques.
- Designed, developed, and produced reports that connect quantitative data to insights that drive business change.
Confidential - Irvine, CA
- Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python, and built models using deep learning frameworks.
- Performed data mining and developed statistical models using Python to provide tactical recommendations to the business executives.
- Programmed a utility in Python that used multiple packages (SciPy, NumPy, pandas).
- Integrated R into MicroStrategy to expose metrics determined by more sophisticated and detailed models than natively available in the tool.
- Extended an existing semantic labeling model to perform Markov chain Monte Carlo and provide uncertainties alongside semantic predictions using Bayesian approximation; proposed a new metric to evaluate the quality of the estimated uncertainties. The developed model outperformed the baseline model on the evaluation dataset.
- Designed dashboards with Tableau and provided complex reports, including summaries, charts, and graphs, to interpret findings for the team and stakeholders.
- Worked with data engineers on database design for data science.
- Used Git for version control; tracked changes in files and coordinated work on them among multiple team members.
- Implemented a Python-based distributed random forest.
- Used predictive modeling with tools in SAS, SPSS, R, and Python.
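As a stand-in for the Python-based distributed random forest mentioned above: scikit-learn's implementation parallelized across cores via `n_jobs` (a truly distributed version would use something like Spark MLlib; the dataset below is synthetic, as the real one is not described):

```python
# Minimal sketch: a random forest with tree training parallelized via n_jobs.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# n_jobs=-1 fans tree construction out across all available cores; since the
# trees are independent, the ensemble parallelizes naturally.
clf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0)
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(f"held-out accuracy: {acc:.3f}")
```

The independence of the trees is what makes the algorithm easy to scale out, whether across cores as here or across a cluster.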
Confidential - Irvine, CA
- Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction.
- Worked with complex applications such as R, SAS, MATLAB, and SPSS to develop neural networks and cluster analyses.
- Implemented machine learning algorithms and concepts such as k-means clustering (and its varieties), Gaussian distributions, and decision trees.
- Analyzed data using data visualization tools and reported key features using statistical tools and supervised machine learning techniques to achieve project objectives.
- Analyzed large data sets, applied machine learning techniques, and developed predictive and statistical models.
- Responsible for applying machine learning techniques (regression and classification) to predict outcomes.
- Used key indicators in Python and machine learning concepts such as regression, Bootstrap Aggregation (bagging), and Random Forest.
- Developed and deployed machine learning as a service on Microsoft Azure cloud service.
- Performed supervised, unsupervised, and semi-supervised classification and clustering of users and products.
- Trained data with different classification models such as Decision Trees, SVM, and Random Forest.
- Built predictive models to forecast risks for product launches and operations and to help predict workflow and capacity requirements.
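A hedged sketch of training and comparing the classification model families named above (Decision Trees, SVM, Random Forest), on a synthetic dataset since the real features and targets are proprietary:

```python
# Illustrative comparison of three classifier families via cross-validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=600, n_features=15, random_state=42)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "svm": SVC(kernel="rbf", random_state=42),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# 5-fold cross-validated accuracy gives a like-for-like comparison.
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, s in scores.items():
    print(f"{name}: {s:.3f}")
```

Cross-validated accuracy is used here as the comparison metric; in practice the choice of metric would depend on the business objective.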