Sr. Data Analyst/ Tableau Developer Resume
Houston, TX
SUMMARY:
- 7+ years of experience compiling statistical and personnel reports, preparing personnel action documents, identifying patterns within data, analyzing data, and interpreting results.
- Strong ability to analyze sets of data for signals, patterns, ways to group data to answer questions and solve complex data puzzles.
- Proficient in Data Acquisition, Storage, Analysis, Integration, Predictive Modeling, Logistic Regression, Decision Trees, Data Mining Methods, Forecasting, Factor Analysis, Cluster Analysis, Neural Networks and other advanced statistical and econometric techniques.
- Adept in writing R and T-SQL scripts to manipulate data for data loads and extracts.
- Proficient in data entry, data auditing, creating data reports & monitoring data for accuracy.
- Skilled in web search and data collection, web data mining, extracting data from websites, and data entry and processing.
- Extensive experience creating MapReduce jobs, running SQL on Hadoop using Hive, building ETL with Pig scripts, and using Flume to transfer unstructured data to HDFS.
- Strong Oracle/SQL Server programming skills, with experience in working with functions, packages and triggers.
- Experience in all phases of data warehouse development, from requirements analysis and design through development, testing, and post-production support.
- Strong in-depth knowledge in doing data analysis, data quality and source system analysis.
- Independent, enthusiastic team player with strong adaptability to new technologies.
- Knowledge of setting up Python REST API frameworks using Django.
- Experienced in using pandas from the NumPy stack in Python to analyze data in .csv files, e.g., calculating error counts for disk monitoring.
- Experienced in interacting with business users to identify their needs and gather requirements.
- Good knowledge of Visualization, Dimensional Modeling and Normalization approaches.
- Experience with Tableau functions and in designing, developing, and implementing statistical/predictive models.
- Experience working with the Git version control system and source code management client tools such as Git Bash, GitHub, Git GUI, and other command-line applications.
- Well versed with Agile/Scrum, Waterfall, and Test-Driven Development methodologies.
- Developed web applications and RESTful web services and APIs using Python Flask, Django.
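As a concrete illustration of the pandas-based disk-monitoring analysis mentioned above, here is a minimal sketch; the file contents and column names (`disk`, `status`) are hypothetical stand-ins, and in practice the data would come from `pd.read_csv("some_log.csv")`:

```python
import io

import pandas as pd

# Hypothetical disk-monitoring log; a real run would read a .csv file
# from disk instead of this in-memory sample.
csv_data = io.StringIO(
    "disk,status\n"
    "sda,ok\n"
    "sdb,error\n"
    "sda,error\n"
    "sdc,ok\n"
)
df = pd.read_csv(csv_data)

# Keep only the error rows and count errors per disk.
error_counts = df[df["status"] == "error"].groupby("disk").size()
print(error_counts.to_dict())  # → {'sda': 1, 'sdb': 1}
```

The same boolean-mask-plus-groupby pattern scales to much larger monitoring logs without code changes.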
TECHNICAL SKILLS:
Databases: MS SQL Server, Oracle, HBase, Amazon Redshift
Statistical Methods: Hypothesis Testing, Exploratory Data Analysis (EDA), Confidence Intervals, Bayes' Theorem, Principal Component Analysis (PCA), Dimensionality Reduction, Cross-Validation, Auto-correlation
Machine Learning: Regression Analysis, Naïve Bayes, Decision Tree, Random Forests, Support Vector Machine, Neural Network, Sentiment Analysis, Collaborative Filtering, K-Means Clustering, KNN, CNN, RNN and AdaBoost
Data Visualization: Tableau, Matplotlib, Seaborn, ggplot2
Reporting Tools: Tableau Suite of Tools 10.x (Desktop, Server, and Online), SQL Server Reporting Services (SSRS)
Hadoop Ecosystem: Hadoop 2.x, Spark 2.x, MapReduce, Hive, HDFS, Pig
Cloud Services: Amazon Web Services (AWS) EC2/S3/Redshift
Operating Systems: Microsoft Windows, Linux (Ubuntu)
Office Tools: Microsoft Office Suite (Word, PowerPoint, Excel)
PROFESSIONAL EXPERIENCE:
Sr. Data Analyst/ Tableau Developer
Confidential - Houston, TX
Responsibilities:
- Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, R, and a broad variety of machine learning methods including classification, regression, and dimensionality reduction; used the resulting engine to increase user lifetime by 45% and triple user conversions for target categories.
- Participated in feature engineering such as feature-interaction generation, feature normalization, and label encoding with scikit-learn preprocessing.
- Used Python 3.x (NumPy, SciPy, pandas, scikit-learn, Seaborn) and Spark 2.0 (PySpark, MLlib) to develop a variety of models and algorithms for analytic purposes.
- Developed and implemented predictive models using machine learning algorithms such as linear regression, classification, multivariate regression, Naive Bayes, Random Forests, K-means clustering, KNN, PCA and regularization for data analysis.
- Ensured solution and technical architectures were documented and maintained while setting standards and offering consultative advice to technical and management teams; recommended the roadmap and approach for implementing the data integration architecture (with cost, schedule, and effort estimates).
- Designed and developed NLP models for sentiment analysis.
- Led discussions with users to gather business process and data requirements to develop a variety of Conceptual, Logical and Physical Data Models. Expert in Business Intelligence and Data Visualization tools: Tableau, MicroStrategy.
- Developed and evangelized best practices for statistical analysis of Big Data.
- Designed and implemented system architecture for Amazon EC2 based cloud-hosted solution for client.
- Designed the enterprise Conceptual, Logical, and Physical Data Models for a Bulk Data Storage System using Embarcadero ER/Studio; the data models were designed in 3NF.
- Worked on machine learning over large-scale data using Spark and MapReduce.
- Collaborated with data engineers and operation team to implement ETL process, wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
- Performed data analysis by using Hive to retrieve the data from Hadoop cluster, SQL to retrieve data from RedShift.
- Explored and analyzed the customer specific features by using SparkSQL.
- Performed data imputation using Scikit-learn package in Python.
- Led the implementation of new statistical algorithms and operators on Hadoop and SQL platforms, utilizing optimization techniques, linear regression, K-means clustering, Naïve Bayes, and other approaches.
- Developed Spark/Scala, SAS and R programs for regular expression (regex) project in the Hadoop/Hive environment with Linux/Windows for big data resources.
- Conducted analysis assessing customer consumption behaviors and discovered customer value with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means Clustering and Hierarchical Clustering.
- Built regression models including Lasso, Ridge, SVR, and XGBoost to predict customer lifetime value.
- Built classification models including Logistic Regression, SVM, Decision Tree, and Random Forest to predict customer churn rate.
- Used F-score, AUC/ROC, confusion matrix, MAE, and RMSE to evaluate model performance.
Environment: AWS RedShift, EC2, EMR, Hadoop Framework, S3, HDFS, Spark (Pyspark, MLlib, Spark SQL), Python 3.x (Scikit-Learn/Scipy/Numpy/Pandas/Matplotlib/Seaborn), Tableau Desktop (9.x/10.x), Tableau Server (9.x/10.x), Machine Learning (Regressions, KNN, SVM, Decision Tree, Random Forest, XGboost, LightGBM, Collaborative filtering, Ensemble), Teradata, Git 2.x, Agile/SCRUM.
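The evaluation metrics used above (confusion matrix, F-score, RMSE) reduce to simple arithmetic; in practice scikit-learn's `metrics` module provides them, but a plain-Python sketch on hypothetical toy labels makes the definitions concrete:

```python
import math

def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

def f_score(y_true, y_pred):
    """Harmonic mean of precision and recall (F1)."""
    tp, fp, fn, _ = confusion_matrix(y_true, y_pred)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def rmse(y_true, y_pred):
    """Root mean squared error of predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Toy churn labels and predictions (hypothetical data, not from the project).
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]
print(confusion_matrix(y_true, y_pred))   # → (2, 1, 1, 2)
print(round(f_score(y_true, y_pred), 3))  # → 0.667
```

With scikit-learn, `sklearn.metrics.f1_score` and `sklearn.metrics.confusion_matrix` replace these helpers directly.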
Sr. Data Analyst/Data Scientist
Confidential - Miami, FL
Responsibilities:
- As part of the development team, designed and developed innovative dashboards, analytics, and reports using SSRS and Tableau 8.2
- Performed data/systems analysis to determine best BI solution (reports, dashboards, scorecards etc.) using Tableau
- Created Stored Procedures to migrate data from Flat File Structure to a Normalized Data structure
- Created test report dashboards that help visualize the status of several thousand apps using SSRS and Tableau 8.2
- Created Tableau and SSRS dashboards showing per-machine app status for each user/organization
- Created ad-hoc reports in SSRS and Tableau 8.2 to visualize the top 10 incompatible apps by user count
- Developed stacked bar charts in SSRS and Tableau 8.2 for top-level management to view the test status of apps across the organization
- Designed/developed generalized dashboards using SSRS and Tableau 8.2, which are helping administrators migrate over 25K machines from one environment to another
- Developed various Dashboards using objects such as Chart Box (Drill Down, Drill up), List, crosstab etc. using Tableau.
- Created complex formulas and calculations within Tableau to meet the needs of complex business logic
- Combined Tableau visualizations into Interactive Dashboards using filter actions, highlight actions etc., and published them to the web
- Developed various data connections from data source to Tableau Desktop for report and dashboard development
- Created Technical Documentation for reports and maintenance/enhancements of existing Reports
- Involved in project planning, scheduling for database module with project managers
- Discussed with business people to understand the rules for defining the Test Status of Apps at the report level
- Created new data sources and replaced existing ones; created schedules and extracted the data into the Tableau Data Engine
- Built and published customized interactive reports and dashboards, with report scheduling, using Tableau Server
- Created action filters, parameters and calculated sets for preparing dashboards and worksheets in Tableau
- Worked extensively with Advance analysis Actions, Calculations, Parameters, Background images and Maps
- Effectively used the data blending feature in Tableau and defined best practices for Tableau report development
- Extensive SSRS experience, including report development, dashboarding, and development of cubes for reporting
- Developed Custom Reports, Ad-hoc Reports by using SQL Server Reporting Services (SSRS)
- Developed Tabular, drill down, parameterized, cascaded and sub-reports using SSRS
- Generated Reports using Global Variables, Expressions in SSRS 2008
- Created parameterized reports and linked reports, with thorough knowledge of report server architecture using SSRS; developed a high-level dashboard using SSRS 2005
- Analysis of data for the purpose of Product Version Mapping
Environment: Tableau 8.2/8.4 Desktop, MS SQL Server 2008R2/2005, SSIS, SSRS, Windows 7, Management Studio
Data Analyst- Python /Tableau Developer
Confidential - Princeton, NJ
Responsibilities:
- Used the Django/Flask frameworks in developing web applications to implement the model-view-controller architecture.
- Developed views and templates with Python and Django's view controller and templating language to create user-friendly website interface.
- Developed statistical and predictive models to support business insights that led to fact- based decision making.
- Implementation of REST web services.
- Retrieved data from various third-party vendors such as Spotify, YouTube, and iTunes.
- Retrieved JSON files from the Spotify API using Python.
- Used Python and Django to interface with the jQuery UI and manage the storage and deletion of content.
- Created a Python/Django-based web application using Python scripting for data processing, MySQL for the database, and HTML/CSS/jQuery and Highcharts for data visualization on the served pages.
- Implemented AWS solutions using S3, RDS, EBS, Elastic Load Balancer, and Auto Scaling groups.
Environment: HTML, CSS, JavaScript, AngularJS, Node.JS, Git, REST API, Mongo DB.
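The third-party JSON ingestion described above can be sketched with the standard library alone; the payload shape and `top_tracks` helper below are hypothetical, loosely modeled on a Spotify-style API response (in production, `urllib.request` or the `requests` library would fetch it over HTTPS):

```python
import json

# Hypothetical JSON payload, loosely shaped like a Spotify API response.
raw = """
{
  "tracks": {
    "items": [
      {"name": "Track A", "popularity": 81},
      {"name": "Track B", "popularity": 64}
    ]
  }
}
"""

def top_tracks(payload, min_popularity=70):
    """Parse an API response and keep track names above a popularity cutoff."""
    data = json.loads(payload)
    items = data["tracks"]["items"]
    return [t["name"] for t in items if t["popularity"] >= min_popularity]

print(top_tracks(raw))  # → ['Track A']
```

Keeping parsing separate from fetching, as here, makes the transformation step testable without network access.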
Data Analyst / Python Developer
Confidential
RESPONSIBILITIES:
- Analyzed customer help data, contact volumes, and other operational data in MySQL to provide insights enabling improvements to help content and customer experience.
- Individually developed, implemented, and managed a data operations platform to support the company's routine work, reduce unnecessary repetitive operations, and significantly improve working efficiency across departments.
- Brought in and implemented updated analytical methods such as regression modeling, classification tree, statistical tests and data visualization techniques with Python
- Deployed machine learning models built using Mahout on a Hadoop cluster
- Maintained and updated existing automated solutions.
- Analyzed historical demand, filtered out outliers/exceptions, identified the most appropriate statistical forecasting algorithm, developed base plans, analyzed variance, proposed improvement opportunities, incorporated demand signals into forecasts, and executed data visualization using the Plotly package in Python.
- Improved data collection and distribution processes by using pandas and numpy packages in Python while enhancing reporting capabilities to provide clear line of sight into key performance trends and metrics.
- Interacted with QA to develop test plans from high-level design documentation
Environment: Python, MySQL, Hadoop (HDFS), Machine Learning Algorithms (Regression, Classification)
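The outlier-filtering step in the demand analysis above can be sketched with a standard IQR rule; the demand values and `k` threshold below are illustrative (the actual work used pandas/NumPy), but the logic is the same:

```python
import statistics

def filter_outliers(values, k=1.5):
    """Drop points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lo <= v <= hi]

# Hypothetical weekly demand with one spike.
demand = [100, 102, 98, 101, 500, 99, 103]
print(filter_outliers(demand))  # → [100, 102, 98, 101, 99, 103]
```

The cleaned series can then feed a forecasting model directly; with pandas, the same fences become a boolean mask over a `Series`.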