- Around 8 years of experience in Data Engineering, Data Science, and Analytics.
- Solid quantitative background, excellent analytical and troubleshooting skills; very quick learner.
- Broad knowledge of and experience in relevant programming languages (such as R and Python), including an understanding of concepts from functional and object-oriented programming paradigms.
- Advanced experience with Python (2.x, 3.x) and libraries such as NumPy, Pandas, SciPy, scikit-learn, Matplotlib, Seaborn, and Plotly.
- Actively involved in data pre-processing, including data extraction and data cleaning on large structured and unstructured datasets, Exploratory Data Analysis (EDA), and feature engineering.
- Experience in probability theory and statistics, including the distinction between correlation and causality and how to establish the latter through controlled experimental design (e.g., A/B testing) evaluated with Hypothesis Testing and ANOVA; also experienced with Principal Component Analysis (PCA).
- Proficient in modern applied algorithms (predictive and prescriptive) for extracting insights and making data-driven decisions, including linear and logistic regression, Naïve Bayes, decision trees, neural networks, k-nearest neighbors, random forests, support vector machines, and k-means clustering.
- Expert in creating ETL packages, migrating, cleaning, and backing up data files, and synchronizing daily transactions using Microsoft SSIS packages, Alteryx, and similar tools.
- Proficient in building and publishing interactive reports and dashboards with design customizations in Tableau.
- Proficient in model testing and validation using ROC curves, k-fold cross-validation, confusion matrices, and statistical significance tests.
- Proficient in programming languages such as Java, C#, and C.
- Proficient in SQL and its role in data science.
- Good knowledge of Hadoop, Spark, Airflow, and big data tools such as PySpark, Pig, and Hive.
- Good understanding of collaborative software engineering practices (Agile, DevOps), in which solutions evolve through the effort of self-organizing cross-functional teams.
- Good understanding of end-to-end machine learning infrastructure and frameworks such as TensorFlow and PyTorch.
- Experience in using Docker and Kubernetes.
- Experience with front-end technologies such as HTML5 and CSS.
- Worked with MS Excel using pivot tables, macros, VLOOKUPs, formulas, and reports.
- Extensive experience in the onsite-offshore model, interacting with clients and onsite managers.
- Adaptable to new processes and software programs; able to work professionally with diverse individuals and groups.
Languages: Python, R, HTML, CSS, SQL.
Python Libraries: NumPy, Pandas, SciPy, SciKit-Learn, Matplotlib, Seaborn, Plotly.
Statistics: Hypothesis Testing, A/B Testing, ANOVA, Principal Component Analysis (PCA), Cross-Validation, Correlation.
Machine Learning: Regression analysis, K-NN, Decision Tree, Support Vector Machine (SVM), Artificial Neural Network (ANN), CNN, RNN, K-Means clustering and Hierarchical clustering.
Database: MySQL, SQL Server 2008/2012/2014
Big Data Tools: Hadoop 2.x, Spark 2.x, MapReduce, Hive, HDFS, Pig
ETL: SSIS, Alteryx
Visualization: Tableau, Seaborn, Matplotlib, ggplot2.
Cloud Services: Azure, AWS
Confidential - Norfolk VA
Data Scientist/Machine Learning
- Extensively involved in all phases of data acquisition, data collection, data cleaning, model development, model validation, and visualization to deliver data science solutions.
- Involved in building machine learning models using algorithms such as Linear, Ridge, Lasso, and ElasticNet Regression, Decision Trees, Random Forests, Support Vector Machines (SVM), and K-means, as well as neural networks (ANN, CNN, RNN).
- Conducted statistical and descriptive exploratory data analysis to support Machine Learning activities
- Analyzed A/B tests, filtered noisy data and presented insights effectively to stakeholders in Tableau.
- Automated most daily tasks using Python scripts.
- Performed Statistical Analysis including A/B testing to validate the results.
- Developed a self-learning time series forecasting model for the collections team to forecast net losses for the car portfolio, which increased forecast accuracy by 20%.
- Created a staffing model for the collections team to calculate the optimal number of agents required to make collection calls to delinquent customers.
- Conducted multiple linear regression to measure the impact of new benchmarks and their interaction terms, and used LassoCV to select the most significant features.
- Extracted and integrated customer data from multiple sources using SQL and created a Tableau dashboard for the Sales and Marketing team.
- Worked with Talend to improve existing ETL processes and optimized queries, speeding up daily Billing and CRM data processing by ~40%.
- Involved in CI/CD pipeline management for weekly releases.
- Worked with Scala on supporting utilities for standardizing data pipelines.
- Removed records with missing data during data extraction and transformation using Spark and Airflow.
- Worked on Python OpenStack APIs and used Python scripts to update content in the database.
- Experienced in working with machine learning frameworks such as TensorFlow and PyTorch.
- Experienced in using both Docker and Kubernetes, with strong knowledge of end-to-end machine learning infrastructure.
- Experienced in working with Java microservices.
- Worked on Azure Databricks to process and transform data and to explore it through machine learning models.
- Automated the visualization process for classifying high-risk drivers with Power BI and Alteryx.
- Performed A/B testing in Python to test hypotheses related to high-risk drivers.
- Worked on AWS for data storage and model deployment.
Environment: ANN, CNN, RNN, K-means, Regression, Decision Tree, Random Forests, Linear Regression, SQL, AWS, Alteryx, Power BI.
Confidential - PA
Data Scientist/Data Analyst
- Developed tailored Tableau visualization dashboards and reports, using storytelling techniques, for diverse cross-functional teams to answer important business questions.
- Built predictive scores in Python that identify potential new customer businesses for a marketing campaign.
- Developed ETL jobs for extracting, cleaning, transforming, and loading data from various sources, and performed analysis and performance tuning.
- Good knowledge of building ETL pipelines using Cosmos/SCOPE.
- Experienced in using Python for data wrangling, cleansing, and preparation for analysis, and in creating reports using MSIS.
- Provided analytical solutions that utilize machine learning, using tools such as RStudio, Python, and Jupyter notebooks.
- Championed mining of raw data using Python and transformed metadata into understandable metrics that reveal patterns in customer thinking and support predictions based on that data.
- Ran SSIS ETL jobs to meet data integrity and reliability requirements and migrated the data to higher environments.
- Wrote complex SQL queries and optimized them to pull the required data efficiently.
- Tested the SQL scripts for report and dashboard development.
- Built workflows in Alteryx for data blending, data integration, and advanced data analytics.
- Set up alerts in Alteryx for dashboards to track weekly changes.
- Designed visual interfaces for users to interact with the data.
- Designed dimension and fact tables in Tableau for management dashboard reporting.
- Built dynamic, robust dashboards using Tableau objects such as dimensions, measures, calculated fields, action filters, blending, and joins.
- Conducted smoke testing, data completeness testing, and data validation testing.
- Worked on Microsoft Azure for storage and model deployment.
Environment: Python, ETL, SQL, Machine Learning, R, Alteryx, Microsoft Azure, SSIS, Tableau
Confidential - OH
Data Engineer / Python Developer
- Responsible for gathering requirements, system analysis, design, development, testing and deployment.
- Participated in the complete SDLC process and created business logic using Python.
- Created databases using MySQL and wrote several queries and REST APIs to extract data from them.
- Worked with JSON-based REST web services.
- Used AWS Lambda to run Python code and queries.
- Added support for Amazon S3 and RDS to host static/media files and the database on AWS.
- Used Amazon Comprehend to analyze call transcripts, producing insights on keywords, topics, entities, and sentiment.
- Experienced in working with Teradata.
- Performed data extraction, aggregation, and consolidation within AWS Glue using PySpark.
- Optimized Hive SQL queries and Spark jobs.
- Responsible for estimating cluster size and for monitoring and troubleshooting the Databricks Spark cluster.
- Worked on SQL development and designed the Cassandra schema for the APIs.
- Compared the performance of Spark with Hive and SQL/Oracle by designing and developing POCs in Spark using Scala.
- Responsible for implementing monitoring solutions in Spark and Jenkins.
- Converted and developed new reports/tools from SAS into SQL Server 2012 to automate production of customized reporting requirements.
- Worked on improvement of existing machine learning algorithms to extract data points accurately.
- Responsible for setting up a Python REST API framework using Flask.
- Wrote Python scripts for extracting data from HTML files.
- Effectively communicated with the external vendors to resolve queries.
- Used Git for version control.
Environment: Python, MySQL, Classification, Regression, REST API, HTML, AWS, PySpark, S3, SQL SERVER, Oracle, Hive, Spark
Confidential - PA
- Identified and defined the datasets for the report generation by writing queries and stored procedures
- Developed drill-through, drill-down, linked, sub-, and parameterized reports in SSRS for better business analysis.
- Extracted large volumes of data from various data sources using SSIS packages.
- Designed different types of reports using SSRS for financial analysis.
- Worked on Statistical Analysis of data for purchasing of materials and equipment.
- Handled inventory management by maintaining appropriate safety stock levels.
- Created Auto invoice reports (shipment and backlog reports) and Yearly IT budget reports using Pivot Tables and Slicers in MS Excel by connecting to SQL server database.
- Wrote Python modules to extract/load asset data from the MySQL source database.
- Wrote and executed various MySQL database queries from Python using MySQL Connector/Python and the MySQL database package.
- Understood the requirements and clarified conflicts with the business team.
- Analyzed the existing system, business requirements and functional Specifications.
- Followed coding standards, code versioning and quality process techniques to reduce rework from reviews.
- Tested the application, fixed defects, and documented the required information.
- Worked on Microsoft Azure for data storage and model deployment.
- Ensured all quality related activities are logged and shared.
- Communicated the work progress to the onshore team.
Environment: Python, MySQL, SSRS, SSIS, Regression, Data Visualization, Microsoft Azure
- Performed analysis on claims for fraud, waste, and abuse by identifying abnormal patterns and outliers based on historical data.
- Responsible for analyzing health data and producing, verifying, and interpreting client reports with very little oversight.
- Targeted specific patient populations more effectively using clustering and predictive analytics models.
- Performed analyses of health care data, including medical and pharmacy claims, membership files, and health advisory/coaching interactions, to better understand the quality and level of care delivered and to measure the effectiveness of the organization's services in terms of both clinical quality and financial returns.
- Designed, coded, and implemented Python/R code for analytic studies to fulfill internal and external client requests; maintained, evaluated, and reported on recurring analytic studies. Performed data hygiene and created data validation programs.
- Collaborated with diverse lab groups to provide guidance on experimental design and data interpretation from the computational perspective.
- Provided data analysis to measure the performance of a medication adherence software platform and reported results directly to pharmaceutical clients on a weekly basis; demonstrated, using statistical and predictive methods, that the platform improves patient medication adherence by demographic area.
- Completed ad-hoc queries relating to data investigations, follow-ups on analysis results, and other client requests. Documented tasks accurately and in a timely manner.
- Created Tableau scorecards and dashboards for patient data using stacked bars, bar graphs, scatter plots, geographic maps, and Gantt charts.
- Created numerous visualization dashboards for claims data.
Environment: R, Python, SQL, Classification, Regression, Data Visualization