- Experienced data scientist with 6+ years of hands-on experience in Deep Learning, ETL, Machine Learning, and Cloud Computing (AWS, Azure, Google).
- Proficient in Python 3, including NumPy, Pandas, Scikit-learn, MXNet, SciPy, TensorFlow, Keras, Matplotlib, Plotly, Seaborn, and PySpark.
- Proficient in Java for deploying MapReduce tasks on the server.
- Hands-on experience with Scala for data pipelines and data streaming.
- Extensive experience with state-of-the-art deep learning algorithms for object detection tasks, including SSD, Mask R-CNN, U-Net, SegNet, and Faster R-CNN.
- Hands-on experience customizing backbone networks such as ResNet-18, ResNet-50, VGG16, DenseNet, and GoogLeNet.
- Expertise in AWS resource management, such as IAM management and EC2, S3, and Redshift configuration.
- Wrote a custom semi-supervised learning framework for logistic regression. Familiar with machine learning algorithms such as Linear/Logistic Regression, Naïve Bayes, Random Forest, Gradient Boosting Machines, SVM, and K-means clustering.
- Involved in various research and industry projects throughout the whole data science life cycle, including data acquisition/crawling, data cleaning, data manipulation, feature engineering, feature selection, data visualization, predictive modeling, model optimization, testing, and deployment.
- Strong in statistical methodologies such as Time Series Analysis, Hypothesis Testing (A/B testing), Principal Component Analysis (PCA), Singular Value Decomposition (SVD), and ANOVA.
- Hands-on experience with sampling and simulation methods such as Gibbs sampling, Metropolis-Hastings sampling, and Monte Carlo simulation.
- Proficient with data visualization tools such as Tableau and Matplotlib/Seaborn in Python for creating interactive, dynamic reports and dashboards.
- Hands-on experience with the Hadoop ecosystem and Apache Spark frameworks such as MapReduce, HDFS, HiveQL, Pig, Spark SQL, and PySpark ML.
- Adept in developing and debugging stored procedures, user-defined functions (UDFs), triggers, indexes, constraints, transactions, and queries using Transact-SQL (T-SQL).
- Practiced in Systems Development Life Cycle (SDLC) methodologies such as Agile, Waterfall, and Scrum.
Languages: Python, R
Machine Learning: Regression, Polynomial Regression, Random Forest, Logistic Regression, Decision Trees, Classification, Clustering, Association, Simple/Multiple Linear Regression, Kernel SVM, K-Nearest Neighbors (K-NN)
OLAP/BI/ETL Tools: Business Objects 6.1/XI, MS SQL Server 2008/2005 Analysis Services (MS OLAP, SSAS), Integration Services (SSIS), Reporting Services (SSRS), PerformancePoint Server (PPS), Oracle 9i OLAP, MS Office Web Components (OWC11), DTS, MDX, Crystal Reports 10, Crystal Enterprise 10 (CMC)
Web Technologies: JDBC, HTML5, DHTML, XML, CSS3, Web Services, WSDL.
Tools: Erwin r9.6, 9.5, 9.1, 8.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner.
Big Data Technologies: Spark, Hive, HDFS, MapReduce, Pig, Kafka.
Databases: SQL, Hive, Impala, Pig, Spark SQL, SQL Server, MySQL, MS Access, HDFS, HBase, Teradata, Netezza, MongoDB, Cassandra, SAP HANA.
Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio), Tableau, Crystal Reports XI, Business Intelligence, SSRS, Business Objects 5.x/6.x, Cognos 7.0/6.0.
Version Control Tools: SVN, GitHub.
Project Execution Methodologies: Ralph Kimball and Bill Inmon data warehousing methodology, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD).
BI Tools: Tableau, Tableau Server, Tableau Reader, SAP Business Objects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure Data Warehouse.
Operating Systems: Windows, Linux, Unix, macOS, Red Hat.
- Analyzed data using SQL, R, Python, Apache Spark and presented analytical reports to management and technical teams.
- Worked with a variety of datasets, including both structured and unstructured data, and participated in all phases of data mining: data cleaning, data collection, variable selection, feature engineering, model development, validation, and visualization.
- Led discussions with users to gather business process requirements and data requirements in order to develop a variety of conceptual, logical, and physical data models.
- Expertise in Business intelligence and Data Visualization tools like Tableau.
- Handled importing data from various data sources, performed transformations using Hive, MapReduce and loaded data into HDFS.
- Validated unsupervised data using NLP techniques such as LSTM, GRU, and other variants.
- Designed and implemented a recommendation system that leveraged statistical analytics data and machine learning models, using collaborative filtering techniques to recommend policies to different customers.
- Created Data Quality Scripts using SQL and Hive (HQL) to validate successful data load and quality of the data.
- Participated in feature engineering, including feature generation, PCA, feature normalization, and label encoding with Scikit-learn preprocessing. Performed data imputation using various methods in the Scikit-learn package in Python.
- Utilized Informatica toolset (Informatica Data Explorer and Data Quality) to inspect legacy data for data profiling.
Environment: SQL Server, Hive, Hadoop Cluster, ETL, Tableau, Teradata, Machine Learning (Logistic regression/ Random Forests/ KNN/ K-Means Clustering / Hierarchical Clustering/ Ensemble methods/ Collaborative filtering), GitHub, MS Office suite, Agile Scrum, JIRA.
Confidential, San Antonio, Texas
- Design and develop state-of-the-art deep-learning/machine-learning algorithms for analyzing image and video data, among others.
- Develop and implement innovative AI and machine learning tools that will be used in the Risk
- Experience with TensorFlow, Caffe, and other Deep Learning frameworks.
- Applied effective software development processes to customize and extend computer vision and image processing techniques to solve new problems for Automation Anywhere.
- Involved in Peer Reviews, Functional and Requirement Reviews.
- Develop project requirements and deliverable timelines; execute efficiently to meet the plan timelines.
- Involved in data analysis, primarily identifying data sets, source data, source metadata, data definitions, and data formats.
- Well experienced in Normalization and De-Normalization techniques for optimum performance in relational and dimensional database environments.
- Analyzed requirements, the significance of weld point data, and energy efficiency using large datasets.
- Develop necessary connectors to plug ML software into wider data pipeline architectures.
- Wrangled data and worked on large datasets (acquired and cleaned the data); analyzed trends by building visualizations with Matplotlib in Python.
- Experience with TensorFlow, Theano, Keras and other Deep Learning Frameworks.
- Built an artificial neural network using TensorFlow in Python to predict customers' probability of cancelling their connections (churn rate prediction).
- Understanding the business problems and analyzing the data by using appropriate Statistical models to generate insights.
- Applied knowledge of information extraction and NLP algorithms coupled with deep learning; developed NLP models for topic extraction and sentiment analysis.
- Used Teradata utilities such as FastExport and MLOAD for data migration/ETL tasks from OLTP source systems to OLAP target systems.
- Performed data analysis by using Hive to retrieve the data from Hadoop cluster, SQL to retrieve data from Oracle database and used ETL for data transformation.
- Performed a deep ML analysis of the HTPD/RTPD/LTPD test data to define a model of FBC growth rate across temperature.
- Built ML models for projecting pre-production SLC, MLC, and TLC single- and multi-die package ICC memory.
- Used the TensorFlow library in a dual-GPU environment for training and testing neural networks.
- Identify and assess available machine learning and statistical analysis libraries (including regressors, classifiers, statistical tests, and clustering algorithms).
Environment: R 9.0, RStudio, Machine Learning, Informatica 9.0, Scala, Spark, Cassandra, DL, Scikit-learn, Shogun, Data Warehouse, MLlib, Cloudera Oryx, Apache.
Confidential, Tempe, AZ
Data Scientist
- Gathered all required data from multiple data sources and created the datasets used in the analysis.
- Performed Exploratory Data Analysis (EDA) and data visualization using R and Tableau.
- Performed thorough EDA, including univariate and bivariate analysis, to understand intrinsic and combined effects.
- Worked with data governance, data quality, data lineage, and data architects to design various models and processes.
- Designed and developed ad hoc reports in response to data requests from business analysts, operations analysts, and project management.
- Worked on data verification and validation to evaluate whether the data generated according to the requirements was appropriate and consistent.
- Used Python, R, and SQL to create statistical algorithms involving multivariate regression, linear regression, logistic regression, PCA, random forest models, decision trees, and support vector machines for estimating the risks of welfare dependency.
- Applied statistical modeling techniques such as decision trees, regression models, and SVM.
- Utilized convolutional neural networks to implement a machine learning image recognition component. Implemented backpropagation to generate accurate predictions.
- Performed Information Extraction using NLP algorithms coupled with Deep Learning (ANN and CNN), Keras and TensorFlow.
- Implemented Apache Spark to speed up convolutional neural network modeling.
PL/SQL, SSAS.
Confidential, San Diego, CA
Data Analyst
- Worked with data governance, data quality, data lineage, and data architects to design various models and processes.
- Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools using Informatica, Tableau, and Business Objects.
- Designed, developed, tested, and maintained Tableau functional reports based on user requirements.
- Designed and deployed rich graphic visualizations using Tableau and converted existing Business Objects reports into Tableau dashboards.
- Interacted with business analysts, SMEs, and other data architects to understand business needs and functionality for various project solutions.
- Used Informatica PowerCenter for extraction, transformation, and loading (ETL) of data from heterogeneous source systems into the target database.
- Created mappings using Designer and extracted data from various sources, transformed data according to the requirement.
- Involved in extracting the data from the Flat Files and Relational databases into staging area.
- Developed Informatica mappings and reusable transformations to facilitate timely loading of data into a star schema.
- Developed Informatica mappings using Aggregator transformations, SQL overrides in Lookups, source filters in Source Qualifiers, and Router transformations to manage data flow into multiple targets.
- Created sessions, extracted data from various sources, transformed it according to the requirements, and loaded it into the data warehouse.
- Used various transformations such as Filter, Expression, Sequence Generator, Update Strategy, Joiner, Router, and Aggregator to create robust mappings in the Informatica PowerCenter Designer.
- Imported various heterogeneous files using the Informatica PowerCenter 8.x Source Analyzer.
- Developed several reusable transformations that were used in other mappings.
- Prepared technical design documents and test cases.
- Involved in unit testing and in resolving the various bottlenecks encountered.
Environment: SAS/Base, SAS/Connect, SAS/UNIX, SAS/ODS, SAS/Macros, SQL, Tableau, MS Excel, PowerPoint, Mainframe, DB2, Teradata, SAS Enterprise Guide.
Data Analyst
- Communicated effectively, both verbally and in writing, with the client team.
- Completed documentation on all assigned systems and databases, including business rules, logic, and processes.
- Created Test data and Test Cases documentation for regression and performance.
- Designed, built, and implemented relational databases.
- Determined changes in physical database by studying project requirements.
- Developed intermediate business knowledge of the functional area and its processes to understand how data is applied to support business functions.
- Facilitated the gathering of moderately complex business requirements by defining the business problem.
- Utilized SPSS statistical software to track and analyze data.
- Optimized data collection procedures and generated reports on a weekly, monthly, and quarterly basis.
- Used advanced Microsoft Excel features, including pivot tables, VLOOKUP, and other Excel functions.
- Successfully interpreted data to draw conclusions for managerial action and strategy.
- Created data chart presentations, coded variables from original data, conducted statistical analysis as required, and provided summaries of the analysis.
- Maintained the data integrity during extraction, manipulation, processing, analysis and storage.