Sr. Data Analyst/Data Scientist Resume
St Louis, MO
SUMMARY
- Over 8 years of IT experience spanning data analysis, ETL pipelines, data visualization, model evaluation, predictive modeling, data warehousing, and BI reporting, with experience in various phases of the software development life cycle (analysis, requirements gathering, design) and expertise in documenting requirement specifications, functional specifications, data validation, test plans, source-to-target mappings, SQL joins, and data cleansing
- Hands-on experience implementing linear and logistic regression, classification modeling, decision trees, clustering, time series analysis, NLP, dimensionality reduction, CNNs, ANNs, random forests, XGBoost, Naive Bayes, SVMs, association rule mining, and reinforcement learning using Python and R with NumPy, SciPy, Matplotlib, Seaborn, and Pandas
- Extensively worked with very large structured and unstructured datasets, translating business questions into machine learning models and developing data mining and reporting solutions
- Skilled in data preparation, exploratory analysis, feature engineering, and parameter fine-tuning using supervised and unsupervised machine learning models
- Expertise in data cleansing, transformation, describing data contents, computing descriptive statistics, and ETL using Python and R
- Experience performing IT roles for various industry leaders, acquiring a broad range of skills in machine learning algorithms and statistical analysis; proficient in data mining tools such as Python, Alteryx, Splunk, Hive, SQL, Tableau, and the PySpark ecosystem; experienced in staff leadership and development
- Proficient in advising on the use of data for compiling personnel and statistical reports and preparing personnel action documents; adept at finding patterns within data, analyzing data, and interpreting results
- Strong ability to analyze sets of data for signals, patterns, and ways to group data to answer questions and solve complex data puzzles
- Experienced in Advanced Regression Modeling, Time Series Analysis, Statistical Testing, Correlation, Multivariate Analysis, Forecasting, Model Building, Business Intelligence tools and application of Statistical Concepts
- Able to perform web search and data collection, web data mining, database extraction from websites, data entry, and data processing
- Possess functional knowledge in the areas of business process study, requirement capture, analysis, project documentation and training
- Highly competent at researching, visualizing and analyzing raw data to identify recommendations for meeting organizational challenges.
- Expert in data flow between the primary database and various reporting tools; expert in finding trends and patterns within datasets and providing recommendations accordingly
- Ability to apply dimensionality reduction and regularization techniques
- Understanding of Agile methodology and practice
- Solid ability to write and optimize diverse SQL queries; working knowledge of RDBMSs like SQL Server and NoSQL databases like MongoDB
- Strong knowledge and skills in statistical methodologies such as A/B testing, experiment design, hypothesis testing, Z-tests, T-tests, Chi-square independence tests, and ANOVA (a minimal sketch follows this summary)
- Experienced in using various packages in Python 3.5/2.7 and R like ggplot2, caret, dplyr, Pandas, NumPy, SciPy, Scikit-learn, Keras, TensorFlow, OpenCV, PyTorch
- Experience in implementing data analysis with various analytic tools, such as Anaconda 4.0, Jupyter Notebook 4.x, JupyterLab, R 3.0, RStudio, Visual Studio, Spyder, and Excel
- Worked with analytical applications such as R, SAS, MATLAB, and SPSS to develop neural networks and cluster analyses
- Strong business intelligence and analytical skills, with the ability to extract insights and identify risk factors through careful analysis of statistical data
- Extensive experience in data visualization, including producing tables, graphs, and listings using tools such as Tableau and Power BI
- Design and deploy rich graphical visualizations with drill-down and drop-down menu options and parameters using Tableau
- Effective team player with strong communication and interpersonal skills, possessing strong ability to adapt and learn new technologies and new business lines promptly
- Experienced in working with both technical and non-technical team members.
- Experienced in working in a team environment to deliver on-demand services, with the ability to deliver quality solutions under pressure and proactive, strong analytical problem-solving skills
- Participated in portfolio meetings; experienced in preparing high-level design documents, low-level design documents, and detailed technical design documents using case scenarios
- Experience in software development, maintenance, testing, enhancement, production support, and system solutions for the banking, insurance, and telecommunications industries
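As referenced in the statistical methodologies bullet above, the sketch below shows what a typical hypothesis-testing workflow looks like in Python with SciPy. It is a minimal illustration only: the group sizes, effect sizes, and contingency counts are invented for the example.

```python
# Minimal hypothesis-testing sketch (illustrative data only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two-sample t-test: compare a metric between control and treatment groups.
control = rng.normal(loc=10.0, scale=2.0, size=500)
treatment = rng.normal(loc=10.4, scale=2.0, size=500)
t_stat, t_p = stats.ttest_ind(control, treatment)
print(f"t = {t_stat:.3f}, p = {t_p:.4f}")

# Chi-square independence test on an A/B-test contingency table:
# rows = variant, columns = converted / not converted.
observed = np.array([[120, 380],
                     [150, 350]])
chi2, chi_p, dof, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, p = {chi_p:.4f}, dof = {dof}")
```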
TECHNICAL SKILLS
Operating Systems/Virtualization: Windows 9x/NT/2000/XP/2003/2008, Windows Server 2012 virtualization, macOS, UNIX, Linux, AIX, VMware
Statistical Modeling: Descriptive statistics, Excel (VLOOKUP, RAND, pivot tables, Analysis ToolPak), hypothesis testing, regression (linear, random forest, lasso, ridge), classification methods (logistic, multinomial, random forest, XGBoost, decision trees, Naïve Bayes, KNN, SVM), parameter tuning, cross-validation, model evaluation (ROC, AUC, sensitivity, specificity), NLP (text mining), word embeddings (CBOW, word2vec, TF-IDF), deep learning neural networks, computer vision, A/B testing
Programming Languages & Tools: Python, Tableau, Alteryx, SQL, Splunk, JavaScript, HTML, CSS, COBOL, R, Hive, PySpark
Databases: MySQL, NoSQL (MongoDB), VSAM, DB2
Browsers: Internet Explorer, Chrome, Firefox, Netscape Navigator
Presentation Tools: Word, Excel, PowerPoint, Visio
Project Management/ Agile Tools: MS Project, RALLY, JIRA
PROFESSIONAL EXPERIENCE
Sr. Data Analyst/Data Scientist
Confidential, St. Louis MO
Responsibilities:
- Involved in extensive ad hoc reporting, routine operational reporting, and data manipulation to produce routine metrics and dashboards for management
- Created parameters, action filters and calculated sets for preparing dashboards and worksheets in Tableau
- Participated in the end-to-end data mining life cycle, using advanced data mining techniques to extract data from different sources, conducting studies, and generating rapid plots with different visualization tools
- Interacted with other data scientists and architects on custom solutions for data visualization using tools such as Tableau, Splunk, PySpark, and Python packages
- Developed Python modules for machine learning & predictive analytics
- Maintained large data sets, combining data from various sources using Excel, Enterprise, Access, and SQL queries
- Worked on data cleaning, data preparation and feature engineering with Python, including NumPy, SciPy, Matplotlib, Seaborn, Pandas, and Scikit-learn
- Created visualization dashboards in Splunk and Tableau
- Created a deep learning model to predict outages and other important information needed by DevOps teams
- Preprocessed data using PCA and other dimensionality reduction techniques, and created workflows in tools such as Alteryx (a sketch follows this section)
- Performed SQL testing on databases
- Developed SQL scripts using OLAP functions to improve query performance while pulling data from large tables
- Published interactive dashboards and scheduled automatic data refreshes
- Created Splunk queries that helped market analysts spot emerging trends by comparing incremental data with reference tables and historical metrics
- Built machine learning processes with Splunk, building models, testing, and predicting/preventing outages of the buyflow
- Created new scripts for Splunk scripted inputs, collecting system CPU and OS data
- Built a Power BI data visualization dashboard for VoLTE and LTE trending
- Built market dashboards to track the KPIs of VoLTE, LTE & UMTS top offenders
- Monitored and trended VoLTE, LTE, and small cell stats to find and resolve network issues using performance tools such as Splunk and AppDynamics
- Worked on market goals to improve the KPIs of VoLTE, LTE & UMTS top offenders, LTE leakage, and LTE congestion
- Created an automated ETL process using Alteryx, SQL, and Python to combine sales/orders data and outage data, providing insights for decision making and effective targeting of outages and reducing manual process work by up to 90%
- Created multiple Tableau dashboards for Released Offer Tracking, Outage Management Analysis & Customer Behavior
- Created ETL using Alteryx and SQL to combine data from the outage management system, AppDynamics, and Splunk, and provided reporting on alerting setup to calculate response time, increasing the efficiency of the DevOps team
- Created Python and R scripts for tasks such as SFTP processes and managing the Alteryx Gallery
- Contributed to the development of an in-house R Shiny application for improved management and reporting of system outages
- Created Alteryx ETL workflows to replace legacy SQL scripts, reducing runtime and increasing overall efficiency
Environment: Alteryx, Python, R, Tableau, Splunk 9.2, SQL
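A minimal sketch of the PCA-based preprocessing referenced in this section, assuming a scikit-learn workflow; the input file name and column selection are hypothetical placeholders, not the actual outage data.

```python
# PCA preprocessing sketch; "outage_metrics.csv" is a hypothetical input.
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

features = (pd.read_csv("outage_metrics.csv")
              .select_dtypes(include="number")
              .dropna())

# Standardize first: PCA is sensitive to feature scale.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=0.95)),  # keep 95% of the variance
])
reduced = pipeline.fit_transform(features)
print(reduced.shape)
print(pipeline.named_steps["pca"].explained_variance_ratio_)
```

Scaling before PCA keeps high-variance raw features from dominating the components, which is why the two steps are chained in one pipeline.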
Data Analyst
Confidential, Romeoville, IL
Responsibilities:
- Analyzed massive and highly complex data sets, performing ad-hoc analysis and data manipulation
- Wrote reports in the reporting system to extract data for analysis, using filters based on the analysis
- Worked on complex information model, logical relationships, and the data structures that support different jean brands
- Wrote several SQL Queries using Teradata SQL Assistant for Ad Hoc Data Pull request
- Performed statistical data analysis and data visualization using R and Python
- Worked on creating filters and calculated sets for preparing dashboards and worksheets in Tableau
- Identified the risk level and eligibility of new insurance applicants with machine learning algorithms (a sketch follows this section)
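One plausible shape of the applicant risk scoring mentioned in the last bullet, sketched with scikit-learn's logistic regression; the file name, feature columns, and label are hypothetical assumptions, not the actual model.

```python
# Illustrative applicant risk-scoring sketch (all names hypothetical).
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("applicants.csv")              # hypothetical input file
X = df[["age", "bmi", "prior_claims"]]          # hypothetical features
y = df["high_risk"]                             # 1 = high risk, 0 = low risk

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
risk = model.predict_proba(X_test)[:, 1]        # probability of high risk
print("AUC:", roc_auc_score(y_test, risk))
```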
Software Engineer
Confidential
Responsibilities:
- Conducted analysis assessing customer purchasing behavior and discovered customer value with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means and hierarchical clustering (a sketch follows this section)
- Collaborated with data engineers to implement ETL process, wrote and optimized SQL queries to perform data extraction and merging from Oracle
- Performed data integrity checks, data cleaning, exploratory analysis, and feature engineering using R and Python
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Spark SQL in Scala
- Merged data from different sources using Hive joins and performed ad hoc queries
- Developed personalized product recommendation with Machine Learning algorithms, including Gradient Boosting Tree and Collaborative filtering to better meet the needs of existing customers and acquire new customers
- Used Python and Spark to implement different machine learning algorithms, including Generalized Linear Model, Random Forest, SVM, Boosting and Neural Network
- Worked on Spark, using PySpark and Spark SQL to process large volumes of data
- Worked on data cleaning, data preparation and feature engineering with Python, including NumPy, SciPy, Matplotlib, Seaborn, Pandas, and Scikit-learn
- Determined customer satisfaction and helped enhance it using NLP
- Recommended and evaluated marketing approaches based on analysis of customer purchasing behavior
- Performed data visualization and designed dashboards with Tableau, providing complex reports, including charts, summaries, and graphs, to interpret findings for the team and stakeholders
- Identified process improvements that significantly reduce workloads or improve quality
- Provided analytical support to underwriting and pricing by preparing and analyzing data to be used in actuarial calculations
- Gained technical expertise in change management by performing impact analysis for change requests received during the claims processing lifecycle
- Created use case models from policy creation to policy end using UML diagrams such as use case diagrams, sequence diagrams, and flow diagrams in Rational Rose
- Made strategic recommendations for reducing the frequency and severity of losses through the use of a workers' compensation database and/or other claims data reports; designed and developed databases, performed integrated data analyses, and prepared reports
- Frequently used Requirements Traceability Matrix (RTM) for identifying and tracing the linkages among PolicyCenter, BillingCenter and ClaimCenter.
- Performed gap analysis on the traditional claims system and the newly integrated ClaimCenter, and carefully elaborated application enhancement specifications detailing in-scope/out-of-scope items, as-is/to-be process maps, and critical test scenarios
Environment: RStudio, Python, Tableau, SQL Server 2012/2014, Oracle 10g/11g
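A minimal sketch of the RFM segmentation with K-Means described in the first bullet of this section; the orders table, its columns, and the cluster count are invented for illustration.

```python
# RFM segmentation sketch with K-Means (hypothetical table and columns).
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

orders = pd.read_csv("orders.csv", parse_dates=["order_date"])
snapshot = orders["order_date"].max()

# Recency / frequency / monetary features per customer.
rfm = orders.groupby("customer_id").agg(
    recency=("order_date", lambda d: (snapshot - d.max()).days),
    frequency=("order_id", "count"),
    monetary=("amount", "sum"),
)

scaled = StandardScaler().fit_transform(rfm)
rfm["segment"] = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scaled)
print(rfm.groupby("segment").mean())  # profile each segment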
Software Engineer
Confidential
Responsibilities:
- Worked with users to identify the most appropriate source of record required to define asset data for financing
- Implemented Agile Methodologies, Scrum stories and sprints in a Python based environment, along with data analytics and Excel data extracts.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics (a sketch follows this section)
- Involved in defining the source to target data mappings, business rules, business and data definitions
- Performed data analysis and data profiling using complex SQL on various source systems, including Oracle and Teradata
- Developed normalized Logical and Physical database models for designing an OLTP application
- Developed new scripts for gathering network and storage inventory data and ingesting the data into Splunk
- Developed many queries using HiveQL and extracted the required information
- Responsible for defining the key identifiers for each mapping/interface
- Responsible for defining the functional requirement documents for each source to target interface
- Used advanced Excel features such as pivot tables and charts for generating graphs
- Designed and developed weekly and monthly reports using MS Excel techniques (graphs, charts, pivot tables) and PowerPoint presentations
- Involved in the creation of Excel sheets, including VLOOKUP, pivots, conditional formatting, large record sets, data manipulation, and cleaning
- Imported customer data into Python using Pandas and performed various data analyses, finding patterns in the data that informed key decisions for the company
Environment: SQL Server, R connector, Python, R, Tableau
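The Hive trend-comparison queries mentioned above could look roughly like the PySpark/Spark SQL sketch below; the paths, table names, columns, and the 1.5x threshold are all assumptions made for illustration, not the original HiveQL.

```python
# Spark SQL sketch of a fresh-vs-reference trend comparison
# (hypothetical paths, tables, columns, and threshold).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("trend-comparison").getOrCreate()

spark.read.parquet("/data/daily_metrics").createOrReplaceTempView("daily_metrics")
spark.read.parquet("/data/edw_reference").createOrReplaceTempView("edw_reference")

# Flag markets whose daily volume deviates sharply from the historical baseline.
trends = spark.sql("""
    SELECT d.market, d.metric_date, d.volume, r.avg_volume,
           d.volume / r.avg_volume AS ratio
    FROM daily_metrics d
    JOIN edw_reference r ON d.market = r.market
    WHERE d.volume > 1.5 * r.avg_volume
""")
trends.show()
```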