- 7+ years of IT experience as a Data Scientist, with deep expertise in statistical data analysis, including transforming business requirements into analytical models and designing algorithms and strategic solutions that scale across massive volumes of data.
- Experience using R and Python packages such as ggplot2, caret, dplyr, RWeka, rjson, plyr, SciPy, scikit-learn, Beautiful Soup, and rpy2.
- Proficient in Big Data, Hadoop, Hive, MapReduce, Pig, and NoSQL databases such as MongoDB, HBase, and Cassandra.
- Skilled in Advanced Regression Modeling, Time Series Analysis, Statistical Testing, Correlation, Multivariate Analysis, Forecasting, Model Building, Business Intelligence tools and application of Statistical Concepts.
- Professional working experience with machine learning algorithms such as linear regression, logistic regression, Naive Bayes, decision trees, clustering, and Principal Component Analysis.
- Experienced with machine learning algorithms such as logistic regression, random forest, KNN, SVM, neural networks, linear regression, lasso regression, and k-means.
- Experience using Amazon Web Services (AWS) such as EC2 and S3 to work with different virtual machines. Adept at writing R code and T-SQL scripts to manipulate data for data loads and extracts.
- Expert in preprocessing data with Pandas using visualization, data cleaning, and engineering methods such as correlation analysis, imputation, scaling, and handling categorical variables.
- Able to perform web search and data collection, web data mining, data extraction from websites, data entry, and data processing.
- Strong experience using R visualization, QlikView, and Tableau for data analytics and graphical visualization.
- Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, PivotTables and OLAP reporting.
- Experienced in writing complex SQL queries, including stored procedures, triggers, joins, and subqueries.
- Experience working with Python 3.5/2.7 (NumPy, Pandas, Matplotlib, NLTK, and scikit-learn).
- Excellent Tableau developer with expertise in building and publishing customized interactive reports and dashboards with customized parameters and user filters using Tableau 10.1/10.3.
- Experience implementing data analysis with various analytic tools, such as Anaconda 4.0, Jupyter Notebook 4.x, and Alteryx. Implemented bagging and boosting to enhance model performance.
- Comprehensive knowledge of and experience in normalization/de-normalization, data extraction, data cleansing, and data manipulation.
- Solid ability to write and optimize diverse SQL queries; working knowledge of RDBMS such as SQL Server 2008, Oracle, Redshift, and Netezza.
- Expert in Informatica PowerCenter 9.x, 8.x (Designer, Workflow Manager, Workflow Monitor), PowerConnect, and PowerExchange.
- Developed Data Warehouse/Data Mart systems using various RDBMS (Oracle, MS SQL Server, Mainframes, Teradata, and DB2).
- Experienced in the analysis, design, development, testing, and implementation of data warehouse solutions for the financial and retail sectors. Experience working with Agile (Scrum) and Waterfall methodologies.
- Excellent at analyzing and documenting business requirements in functional and technical terminology.
Machine Learning: Logistic Regression, Naive Bayes, Decision Tree, Random Forest, KNN, Linear Regression, Lasso, Ridge, SVM, Regression Tree, K-means
Analytic/Predictive Modeling Tools: Alteryx, KNIME, Statistica, Toad Data Point
Visualization Tools: Tableau, Python (Matplotlib, Seaborn), Google Charts, Highcharts, Plotly, Qlik
Languages: Python, R, Java
ETL: Informatica, Oracle Data Integrator, IBM InfoSphere DataStage, Talend
Tools: TOAD, PL/SQL Developer, SQL*Plus, SSMS
Infrastructure tools: Git, Serena Dimensions, VSS
RDBMS: Oracle 11g/10g/9i/8i, MS SQL Server 2008/2005, Netezza, Redshift
Data Modeling Tools: Oracle Designer, Erwin r7.1/7.2, Toad Data Modeler
Sr. Data Scientist
Confidential, Chevy Chase, MD
- Focused on customer segmentation using machine learning and statistical modeling, including building predictive models and generating data products to support segmentation. Segmented customers by demographics using K-means clustering.
- Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python to develop machine learning models, applying algorithms such as linear regression, multivariate regression, naïve Bayes, K-means, KNN, and random forest.
- Implemented the Hadoop stack and various big data analytics tools, and migrated data from different databases to Hadoop. Explored different regression and ensemble models in machine learning to perform forecasting.
- Developed a pricing model for various bundled product and service offerings to optimize and predict gross margin. Worked with NoSQL databases including HBase, Cassandra, and MongoDB.
- Combined machine learning techniques such as logistic regression, support vector machines, random forest, XGBoost, and neural networks for financial fraud and anomaly detection.
- Developed a predictive causal model using annual failure rate and standard cost basis for the new bundled service offering. Developed Spark Python modules for machine learning and predictive analytics in Hadoop. Built a price elasticity model for various bundled product and service offerings.
- Used Spark Streaming to process data from static data sources such as HBase, PostgreSQL, Cassandra, and MySQL and from streaming data sources such as Kafka and Flume.
- Worked with the Sales and Marketing team for Partner and collaborated with a cross-functional team to frame and answer important data questions, prototyping and experimenting with ML/DL algorithms and integrating them into production systems for different business needs.
- Worked on multiple datasets containing 2 billion values of structured and unstructured data about web application usage and online customer surveys.
- Designed, built, and deployed a set of Python modeling APIs for customer analytics that integrate multiple machine learning techniques for user behavior prediction and support multiple marketing segmentation programs. Worked on the Amazon Redshift platform.
- Created transformation pipelines for preprocessing large amounts of data with methods such as imputation, scaling, and selection.
- Used classification techniques including random forest and logistic regression to quantify the likelihood of each user referring.
- Designed and implemented end-to-end systems for Data Analytics and Automation, integrating custom visualization tools using R, Tableau, and Power BI.
- Managed and coded application development projects using C++ and Python for clinical trials, market research, and capital markets trading risk management systems.
Data Scientist/Machine Learning Engineer
Confidential, Charlotte, NC
- Analyzed and prepared data and identified patterns in the dataset by applying historical models.
- Performed data manipulation, data preparation, normalization, and predictive modeling.
- Used Python to build different models (a single-layer neural network with TensorFlow, SVM, and random forest) for user rating behavior prediction, and implemented natural language processing (NLP).
- Worked with both supervised and unsupervised algorithms; evaluated, tested, and validated the models before selecting the best-fit model for predictions.
- Wrote code as needed to develop reports, analytics, and data visualizations using SQL, data models, Python, MicroStrategy, Google Analytics, and Data Studio.
- Wrote Pig scripts for ETL jobs to acquire data from multiple sources and convert it into a uniform format. Improved efficiency and accuracy by evaluating models in R.
- Involved in developing the enterprise social network application using Python, Twisted, and Cassandra.
- Presented the existing model to stakeholders and provided insights into the model using different visualization methods in Power BI. Used R and Python to improve the model.
- Performed exploratory data analysis using R and generated various graphs and charts for analyzing the data using Python libraries.
- Developed supervised classification models to predict whether users will click on certain ads, using algorithms such as stochastic gradient descent (SGD), logistic regression, random forest, and SVM. Cleaned data and ensured data quality, consistency, and integrity using Pandas and NumPy.
- Performed data cleaning, applying backward and forward filling methods to handle missing values in the dataset. Upgraded the models to improve the product.
- Under the supervision of the Sr. Data Scientist, performed data transformations for rescaling and normalizing variables.
- Developed a predictive model and validated a neural network classification model for predicting the feature label. Presented dashboards to senior management for further insights using Power BI.
Python Engineer/Data Engineer
Confidential, Minneapolis, MN
- Worked as ETL lead developer for consumer and partner business groups.
- Built data integration solutions to meet functional and non-functional requirements.
- Worked with analysts and business users to translate functional specifications into technical designs.
- Participated in the requirement definition, solution development, and implementation phases of data warehouse and reporting projects.
- Imported and exported data into HDFS and Hive using Sqoop. Worked with NoSQL databases such as MongoDB, Cassandra, and HBase.
- Implemented processes and logic to extract, transform, and distribute data across one or more data stores.
- Carried out specified data processing and statistical techniques such as sampling, estimation, hypothesis testing, time series, correlation, and regression analysis using R.
- Performed ETL development using Informatica in a Microsoft SQL Server environment and worked on all aspects of data warehouse reporting. Used R to generate regression models for statistical forecasting.
- Designed and developed ETL specifications, processes, and documentation to produce required data deliverables (data profiling, source-to-target maps, ETL flows, scripts, etc.).
- Supported the ETL environment, including but not limited to automation, job scheduling, monitoring, maintenance, security, and administration.
- Configured and tuned ETL mappings to optimize the data warehousing architecture.
- Troubleshot and resolved data, system, and performance issues.
- Implemented Key Performance Indicator (KPI) Objects, Actions, Hierarchies and Attribute Relationships for added functionality and better performance of SSAS Warehouse.
- Applied various data mining techniques, including linear and logistic regression, classification, and clustering. Performed analysis and design for data analysis and data integration of disparate systems.
- Generated comprehensive analytical reports by running SQL queries against current databases to conduct data analysis. Created new data models for subject areas and data marts with ER Studio.
- Designed rich data visualizations to present data in human-readable form (maps, Tableau, etc.).
- Extensively worked on Mapping Variables, Mapping Parameters, Workflow Variables and Session Parameters for the CDC process during that period.
- Extensively used debugger in identifying bugs in existing mappings by analyzing data flow and evaluating transformations.
- Performed source system analysis, provided input to data modeling, and developed ETL design documents per business requirements.
- Designed, developed, and tested the various mappings, mapplets, worklets, and workflows involved in the ETL process. Implemented web services for Cost Basis Data Propagation Services.
- Improved the performance of existing data warehouse applications to increase system efficiency.
- Designed and developed use case, activity, and sequence diagrams and object-oriented design (OOD) using UML and Visio.
- Designed and developed ETL packages using SSIS to create data warehouses from different tables and file sources such as flat files and Excel files.
- Used different SSIS methods such as derived columns, aggregations, merge joins, count, and conditional split to transform the data.