- Over 6 years of experience as a Data Analyst with strong SQL skills and excellent conceptual and technical ability to create Business Requirements Documents, Functional Specification Documents, Process Flow Diagrams, Detailed Design Documents, System Requirements Specifications, and Functional Requirements Specifications.
- Extensive knowledge of using Statistical Modeling Techniques such as Univariate and Multivariate Regression, Hypothesis Testing (Parametric/Non-parametric), ANOVA, MANOVA, PCA & Cluster Analysis.
- Proficient in Software Development Life Cycle (SDLC), Project Management methodologies and Microsoft SQL Server database management.
- Experience using machine learning algorithms such as Linear and Logistic Regression, Time Series Analysis, SVM, Decision Trees, Random Forest, KNN, Naive Bayes & K-Means.
- Experience using analytics tools such as Python and Excel.
- Involved in defining the source to target data mappings, business rules and definitions.
- Responsible for defining the key identifiers for mapping/ interface.
- Experience with data visualization packages such as Matplotlib and Seaborn.
- Database knowledge of MySQL.
- In-depth knowledge and practical experience of Waterfall, Agile, Scrum, Rational Unified Process, Business Process Reengineering, Unified Modeling Language, System Requirements Specifications and Functional Requirements Specifications, Rapid Application Development, and MS Visio.
- Experienced working as a Data Analyst performing complex Data Profiling, Data Definition, Data Mining, Data Mapping, data validation and analysis, and presenting reports.
- Skilled in Advanced Regression Modelling, Correlation, Multivariate Analysis, Model Building, Business Intelligence tools and application of Statistical Concepts.
- Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
- Addressed overfitting and underfitting by tuning the algorithm's hyperparameters and by using L1 and L2 Regularization.
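A minimal sketch of what that kind of hyperparameter tuning with L1/L2 penalties might look like in scikit-learn (the data below is a hypothetical toy set, not anything from these projects):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Hypothetical toy data standing in for a real feature matrix.
rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Search over the penalty type (L1 vs. L2) and its strength C
# (inverse regularization weight) to balance over-/underfitting.
grid = GridSearchCV(
    LogisticRegression(solver="liblinear"),  # liblinear supports both penalties
    param_grid={"penalty": ["l1", "l2"], "C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
)
grid.fit(X, y)
best = grid.best_params_
```

Cross-validation picks the penalty/strength pair with the best held-out score, which is the usual guard against both over- and underfitting.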
- Experienced in Salesforce.com with Service cloud technology.
- Skilled in using Hadoop (Pig and Hive) for basic analysis and extraction of data in the infrastructure to provide data summarization.
- Used Tableau dashboards to present customer results to the team as well as to individual customers, which helped the support team improve marketing.
- Communicated the results observed from the model to the customer support team so decisions could be made according to each customer's history.
- Knowledge on SAP HANA database.
- Commitment, Self-Confidence, Positive Attitude and able to learn New Technologies.
Programming and Querying Languages: Python, SQL (MySQL & SQL Server), Hive, PySpark, C, C++, SQL Server 2000/2005/2010/2014
Packages and tools: Pandas, NumPy, SciPy, Scikit-Learn, ggplot2.
Machine learning: Linear Regression, Logistic Regression, Decision Trees, Ensembles (Random Forest, Gradient Boosting), Support Vector Machines, Time Series Forecasting, and Dimensionality Reduction
Reporting and Visualization: Matplotlib, Seaborn, ggplot2, Tableau.
Cloud Technologies: Salesforce.com (Service cloud), Azure
Processes/Methodologies: Waterfall, Agile-Scrum, Waterfall-Scrum, Rational Rose, MS Visio, Mockup Screens
Testing Tools: Load Runner, HP QC v10.0, JIRA, SOAPUI
- Interpreted data and analyzed the results using statistical techniques to provide ongoing reports on the effects of different strategies and observe any trends or issues to raise with management.
- Acquired data from primary and secondary data sources to maintain database systems, and identified and interpreted trends and patterns in complex data sets.
- Filtered and cleaned data by reviewing reports and performance indicators to correct any code problems and check the efficiency of the data collection processes in place.
- Worked with management to prioritize business and information needs and identify new processes that will improve the systems in place and define new opportunities.
- Evaluated complex business requirements and data system requirements.
- Monitored the use of data to ensure careful control and effective use to achieve enterprise objectives.
- Assisted in automating and strengthening data management controls by building robust systems with an eye on the long-term and ongoing maintenance and support of the applications
- Performed routine data investigations to profile and analyze data sets to address data quality and integrity.
- Demonstrated an understanding of the use of data to solve a business use case, and created an access strategy to ensure effective implementation across the enterprise.
- Developed and defined data validation, verification, and reconciliation strategies to support data governance, stewardship, and security.
- Experienced in performing data profiling, data mapping, data validation, data manipulation, data analysis.
- Worked across functions to understand the nuances of our data from across the organization and external sources to understand the data and analytical needs of our stakeholders.
- Designed, wrote, and documented complex programs using BI tools (e.g., SQL) for use in gathering statistical data, ensuring that the most efficient procedures are used to access, investigate, and analyze data.
- Developed, analyzed, reported, and interpreted complex data for ongoing activities/projects while ensuring that all data were accurate and available to meet established deadlines.
- Maintained and coordinated existing cross-functional programs, and created and developed new programs.
- Monitored and performed in-depth analysis of data for customers to determine areas of improvement or where investigation is needed
- Exposure to designing and developing reports for data visualization using Python.
- Participated in feature engineering such as feature generation, PCA, feature normalization, and label encoding with Scikit-learn preprocessing; performed data imputation using various methods in the Scikit-learn package in Python.
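A small sketch of those preprocessing steps with scikit-learn (column names and values are hypothetical; `SimpleImputer` is where scikit-learn's imputation lives in current versions):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.impute import SimpleImputer
from sklearn.decomposition import PCA

# Hypothetical frame with a missing value and a categorical column.
df = pd.DataFrame({
    "age":    [25.0, 32.0, np.nan, 41.0],
    "income": [40e3, 55e3, 61e3, 72e3],
    "city":   ["NY", "SF", "NY", "LA"],
})

# Imputation: fill the missing age with the column mean.
num = SimpleImputer(strategy="mean").fit_transform(df[["age", "income"]])

# Feature normalization: zero mean, unit variance per column.
scaled = StandardScaler().fit_transform(num)

# Label encoding for the categorical column.
city_codes = LabelEncoder().fit_transform(df["city"])

# PCA as a feature-generation / dimensionality-reduction step.
components = PCA(n_components=1).fit_transform(scaled)
```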
- Utilized Informatica toolset (Informatica Data Explorer and Data Quality) to inspect legacy data for data profiling.
- Responsible for improving data record capacities and developing improvement plans for vertical management.
- Performed trend analysis to make recommendations on quality-improvement opportunities.
- Developed and automated standard reporting, analysis, and tools to support corporate, functional, and regional management teams.
- Carried out data investigation and analysis by reviewing all steps of the data flow from source to report.
- Used the Z-score, a statistical measure of a value's relationship to the mean of a group of values, expressed in standard deviations from the mean: a Z-score of 0 indicates the data point equals the mean, a Z-score of 1.0 indicates a value one standard deviation from the mean, and positive/negative scores indicate values above/below the mean.
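The calculation described above can be sketched in a few lines (the values are illustrative only):

```python
import numpy as np

# Hypothetical sample values; in practice these came from the data set.
values = np.array([10.0, 12.0, 14.0, 16.0, 18.0])

mean = values.mean()
std = values.std()  # population standard deviation

# z-score: how many standard deviations each value lies from the mean
z_scores = (values - mean) / std

# The middle value equals the mean here, so its z-score is 0;
# values below the mean get negative scores, above get positive.
```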
Environment: Python 3.2, MySQL, Oracle SQL Developer, Jupyter Notebook, Excel 2010, Machine Learning (Logistic Regression/Random Forests/KNN/K-Means Clustering/Hierarchical Clustering/Ensemble methods), Salesforce.com (Service Cloud), MS Office suite, Agile Scrum, JIRA.
Confidential - Greensboro, NC
- Collected data from various data sources, including an Oracle database server and the customer support department, and integrated them into a single data set.
- Responsible for data identification, collection, exploration, cleaning for modeling.
- Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, and Scikit-learn to visualize the data after removing missing values and outliers to fit the model.
- Used Spark 2.0 (PySpark, MLlib) to develop a variety of models and algorithms for analytic purposes.
- Performed Data Mapping between the data sources.
- Detected and treated outliers and missing values using boxplots and predefined Pandas functions.
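The boxplot-style outlier rule behind that step can be sketched with Pandas alone (illustrative numbers; 1.5 × IQR fences, as in a standard boxplot):

```python
import pandas as pd

# Hypothetical series with one obvious outlier.
s = pd.Series([10, 12, 11, 13, 12, 95])

# Same quartile logic a boxplot uses: flag points beyond 1.5 * IQR.
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = s[(s < lower) | (s > upper)]          # flagged for treatment
cleaned = s[(s >= lower) & (s <= upper)]         # kept for modeling
```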
- Loaded the data into database tables using SQL*Loader from text and Excel files.
- Developed data model, SQL Queries, SQL Query tuning process and Schemas.
- Created SQL*Plus reports per the client's various needs and developed business objects.
- Created reports using Business Objects functionality like slice and dice, drill down, cross tab, master detail tables and formulas, etc.
- Used Elasticsearch for name-pattern matching, customized to the requirement.
- Experienced in Elasticsearch capacity planning and cluster maintenance; continuously looked for ways to improve and set a very high bar in terms of quality.
- Used the Kibana plugin to visualize Elasticsearch data.
- Analyzed data by creating SQL Queries and identified fact and dimension tables.
- Worked with dimensionality reduction techniques like PCA.
- Worked with Regression model which includes Random forest regression, Lasso Regression.
- Worked with various classification algorithms including Naïve Bayes, Random forest, Support Vector Machines, Logistic Regression etc.
- Onboarded and migrated test and staging use cases for applications to the AWS cloud with public and private IP ranges, increasing development productivity by reducing test-run times.
- Also worked with K-Nearest Neighbors algorithms for product recommendations, including content-based filtering and collaborative filtering methods.
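A minimal collaborative-filtering sketch in that spirit, using cosine-distance nearest neighbors on a hypothetical user-item rating matrix:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical user-item rating matrix (rows = users, cols = products).
ratings = np.array([
    [5.0, 4.0, 0.0, 0.0],
    [4.0, 5.0, 0.0, 1.0],
    [0.0, 0.0, 5.0, 4.0],
])

# Collaborative filtering: find the user most similar to user 0 by
# cosine distance; their highly rated items become recommendation
# candidates for user 0.
knn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(ratings)
dist, idx = knn.kneighbors(ratings[[0]])
nearest_other_user = idx[0][1]  # idx[0][0] is user 0 itself
```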
- Developed SQL scripts, packages and procedures for Business rules check to implement business rules.
- Performed ETL process design and requirements, Data Mapping and BI reporting requirements.
- Migrated Tableau reports from the Windows server to Amazon AWS cloud workspace.
- Developed Master Detail, Detail reports using tabular and Group above reports.
- Developed shell scripts to automate execution of SQL scripts that check incoming data against master tables, insert valid data into the Customer Management System, and insert invalid data into error tables, which are sent back to the sender with the errors noted.
- Applied clustering algorithms such as K-Means to categorize customer data into groups.
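A small sketch of that kind of segmentation (the customer features here are hypothetical):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer features: [monthly spend, visits per month].
customers = np.array([
    [20.0, 1], [25.0, 2], [22.0, 1],      # low-spend group
    [200.0, 8], [210.0, 9], [195.0, 7],   # high-spend group
])

# Group customers into k segments by feature similarity.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
labels = km.labels_
```

Each customer's cluster label can then drive segment-specific marketing or support decisions.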
- Involved in building Time Series forecasting models.
- Worked with Content-Based Filtering and Collaborative Filtering for recommending products to customers.
- Used regularization techniques such as L1, L2, and Elastic Net to balance the bias-variance tradeoff.
- Used Pyplot, ggplot, seaborn, Matplotlib for visualizing the results.
Environment: Python 3, SQL Server, Tableau, NumPy, Pandas, Spark, SQL, AWS, Matplotlib, Scikit-Learn, Seaborn, ggplot2.
Confidential, Berkeley Heights, NJ
- Communicated and coordinated with other departments to gather business requirements.
- Designed and presented executive dashboards and scorecards to show data trends using Tableau.
- Power user of Python, Excel, and Tableau for University Graduate and Undergraduate Surveys.
- Data wrangling, data cleaning and data processing using Python.
- Responsible to generate surveys in Qualtrics and test survey flows.
- Created new Tableau survey dashboards, and updated and maintained existing dashboards and process documentation.
- Gathering all the data that is required from multiple data sources and creating datasets that will be used in analysis.
- Documented the data requirements and system changes into detailed functional specifications, created data mapping documents from various source data feeds to the target databases.
- Worked with business SMEs and delivery teams to gather requirements and documents in line with Agile methodology, including identification of high-level business needs and preparation of Duck Creek-specific Detailed Requirement docs (DRS), Screen Field Design docs, and Application Workflows.
- Experienced in working with EMR cluster and S3 in AWS cloud.
- Facilitated requirement-gathering meetings with business SMEs and the technical team.
- Performed Exploratory Data Analysis and Data Visualizations using Python.
- In the preprocessing phase, used Pandas and Scikit-Learn to remove or impute missing values, detect outliers, scale features, and apply feature selection (filtering) to eliminate irrelevant features.
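A minimal sketch of the filter-style feature selection mentioned above (toy data; only the first feature actually relates to the target):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical data: feature 0 drives the target, the rest are noise.
rng = np.random.RandomState(1)
X = rng.randn(100, 4)
y = (X[:, 0] > 0).astype(int)

# Filter-style selection: score each feature against the target
# (ANOVA F-score) and keep the top k, dropping irrelevant ones.
selector = SelectKBest(score_func=f_classif, k=1).fit(X, y)
kept = selector.get_support(indices=True)
X_reduced = selector.transform(X)
```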
- Used Python (NumPy, SciPy, Pandas, Scikit-Learn, Seaborn), to develop variety of models and algorithms for analytic purposes.
- Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization; also performed gap analysis.
- Used Pandas library for statistical Analysis.
- Communicated the results to the operations team to support decision-making.
- Collected data needs and requirements by interacting with other departments.
Environment: Python 3.2, MySQL, Jupyter Notebook, AWS, Tableau, Spark, Pandas, NumPy, SciPy, Cloud, Scikit-Learn, ggplot2.
- Performed data mining and leveraged data analytical models to create descriptive and predictive models for a "Test Analytics" platform.
- Analyzed the structured and unstructured datasets available from defect-tracking tools, test-management tools, and other software testing platforms.
- Conducted meetings with the business users from HR and Legal domains to understand the business use case and other reporting requirements.
- Created reports, visualizations in Excel, Tableau and presented the results and insights to the business users.
- Performed Data mapping, SWOT analysis, Gap Analysis, Cost Benefit Analysis, designed new process flow, documented the business process, various business scenarios and activities of the business from the conceptual to contextual level.
- Worked with the data science team to prepare a model for predicting employee turnover.
- Validated data flow from source to the target system using SQL for a new in-house application used for identifying access persons per the SEC requirements.
- Used JIRA and ServiceNow for project and issue tracking.