Data Analyst Resume
Waltham, MA
PROFESSIONAL SUMMARY:
- Data analyst with 5 years of progressive experience in data analytics, statistical modeling, visualization, and machine learning. Strong collaborator who learns and adapts quickly.
- Experience in data mining with large structured and unstructured datasets, data acquisition, data validation, predictive modeling, and data visualization.
- Experience in data integration, profiling, validation, cleansing, transformation, and visualization using R and Python.
- Theoretical foundations and practical hands-on projects related to (i) supervised learning (linear and logistic regression, boosted decision trees, Support Vector Machines, neural networks, NLP), (ii) unsupervised learning (clustering, dimensionality reduction, recommender systems), (iii) probability & statistics, experiment analysis, confidence intervals, A/B testing, (iv) algorithms and data structures.
- Extensive knowledge of Azure Data Lake and Azure Storage.
- Experience in migration from heterogeneous sources, including Oracle, to MS SQL Server.
- Experience in writing SQL queries and working with various databases (MS Access, MySQL, Oracle DB).
- Hands-on experience in design, management, and visualization of databases using Oracle, MySQL, and SQL Server.
- In-depth knowledge of and hands-on experience with the Big Data / Hadoop ecosystem (MapReduce, HDFS, Hive, Pig, and Sqoop).
- Experience in Apache Spark and Kafka for big data processing, and in functional programming with Scala.
- Created, maintained, and scheduled various reports in Power BI, such as tabular reports and matrix reports.
- Experienced in creating multiple kinds of reports in Power BI and presenting them using Story Points.
- Experience in dimensionality reduction using techniques like PCA and LDA.
- Extensive experience in designing, developing, and delivering business intelligence solutions using Power BI, SQL Server Integration Services (SSIS), and Reporting Services (SSRS).
- Used Power Query to acquire data and Power BI Desktop for designing rich visuals.
- Experience in data analytics and predictive analysis, including classification, regression, and recommender systems.
- Good exposure to factor analysis and to bagging and boosting algorithms.
- Experience in descriptive analysis problems such as frequent pattern mining, clustering, and outlier detection.
- Worked on machine learning algorithms for classification and regression, including KNN, decision tree, Naïve Bayes, logistic regression, SVM, and latent factor models.
- Hands-on experience with Python and libraries such as NumPy, Pandas, Matplotlib, Seaborn, NLTK, scikit-learn, and SciPy.
- Expertise in using the TensorFlow package for machine learning and deep learning in Python.
- Good knowledge of Microsoft Azure SQL, Machine Learning, and HDInsight.
- Good exposure to SAS analytics.
- Good exposure to deep learning with TensorFlow in Python.
- Good knowledge of Natural Language Processing (NLP) and of time series analysis and forecasting using the ARIMA model in Python and R.
- Good knowledge of Tableau and Power BI for interactive data visualizations.
- In-depth understanding of NoSQL databases such as MongoDB and HBase.
- Proficient in designing and developing dashboards and reports using Tableau visualizations such as bar graphs, scatter plots, pie charts, and geographic maps, making use of actions, local and global filters, cascading filters, context filters, quick filters, and parameters according to end-user requirements.
- Good exposure to creating pivot tables and charts in Excel.
- Experience in developing custom reports and different types of tabular, matrix, ad hoc, and distributed reports in multiple formats using SQL Server Reporting Services (SSRS).
TECHNICAL SKILLS:
Languages: Python, R
Packages: NumPy, SciPy, Pandas, Scikit-learn, Matplotlib, Seaborn, ggplot2, caret, dplyr, purrr, readxl, tidyr, RWeka, gmodels, RCurl, C50, twitteR, NLP, reshape2, rjson, plyr, Beautiful Soup, rpy2
Algorithms: Kernel Density Estimation and Non-parametric Bayes Classifier, K-Means, Linear Regression, Neighbors (Nearest, Farthest, Range, k, Classification), Non-Negative Matrix Factorization, Dimensionality Reduction, Decision Tree, Gaussian Processes, Logistic Regression, Naïve Bayes, Random Forest, Ridge Regression, Matrix Factorization/SVD
NLP/Machine Learning/Deep Learning: LDA (Latent Dirichlet Allocation), NLTK, Apache OpenNLP, Stanford NLP, Sentiment Analysis, SVMs, ANN, RNN, CNN, TensorFlow, MXNet, Caffe, H2O, Keras, PyTorch, Theano, Azure ML
Cloud: Google Cloud Platform, AWS, Azure, Bluemix
Web Technologies: JDBC, HTML5, DHTML and XML, CSS3, Web Services, WSDL
Data Modelling Tools: Erwin r9.6/9.5/9.1/8.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner
Big Data Technologies: Hadoop, Hive, HDFS, MapReduce, Pig, Kafka
Databases: SQL, Hive, Impala, Pig, Spark SQL, SQL Server, MySQL, MS Access, HDFS, HBase, Teradata, Netezza, MongoDB, Cassandra
Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio), Tableau, Crystal Reports XI, Business Intelligence, SSRS, Business Objects 5.x/6.x, Cognos 7.0/6.0
ETL Tools: Informatica PowerCenter, SSIS
Version Control Tools: SVN, GitHub
BI Tools: Power BI, Tableau, Tableau Server, Tableau Reader, SAP Business Objects, OBIEE, QlikView, SAP Business Intelligence, Azure Data Warehouse
Operating System: Windows, Linux, Unix, Macintosh, Red Hat
PROFESSIONAL EXPERIENCE:
Confidential, Waltham, MA
Data Analyst
Responsibilities:
- Performed data exploration to analyze patterns and select features using Python SciPy.
- Built factor analysis and cluster analysis models using Python SciPy to classify customers into different target groups.
- Built predictive models, including Support Vector Machine, Random Forest, and Naïve Bayes classifiers, using Python Scikit-Learn to predict the personalized product choice for each client.
- Using R's dplyr and ggplot2 packages, produced extensive graphical visualizations of the overall data, including customized graphical representations of revenue reports and specific item sales statistics.
- Designed and implemented cross-validation and statistical tests, including hypothesis testing, ANOVA, and autocorrelation, to verify the models' significance.
- Designed an A/B experiment for testing the business performance of the new recommendation system.
- Supported MapReduce Programs running on the cluster.
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Configured a Hadoop cluster with a NameNode and slave nodes and formatted HDFS.
- Used Oozie workflow engine to run multiple Hive and Pig jobs.
- Participated in Data Acquisition with Data Engineer team to extract historical and real-time data by using Hadoop MapReduce and HDFS.
- Performed data enrichment jobs to handle missing values, normalize data, and select features using HiveQL.
- Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Analyzed the partitioned and bucketed data and computed various metrics for reporting.
- Involved in loading data from RDBMS and web logs into HDFS using Sqoop and Flume.
- Worked on loading the data from MySQL to HBase where necessary using Sqoop.
- Developed Hive queries for Analysis across different banners.
- Extracted data from Twitter using Java and the Twitter API. Parsed JSON-formatted Twitter data and uploaded it to the database.
- Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured the launched instances for specific applications.
- Developed Hive queries for analysis, and exported the result set from Hive to MySQL using Sqoop after processing the data.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Created HBase tables to store data in various formats coming from different portfolios.
- Worked on improving performance of existing Pig and Hive Queries.
- Created reports and dashboards using D3.js and Tableau 9.x to explain and communicate data insights, significant features, model scores, and the performance of the new recommendation system to both technical and business teams.
- Utilized SQL, Excel, and several marketing/web analytics tools (Google Analytics, AdWords) to complete business and marketing analysis and assessment.
- Used Git 2.x for version control with the Data Engineer team and Data Scientist colleagues.
- Used Agile methodology and the SCRUM process for project development.
Environment: HDFS, Hive, Sqoop, Pig, Oozie, Amazon Web Services (AWS), Python 3.x (SciPy, Scikit-Learn), Tableau 9.x, D3.js, SVM, Random Forests, Naïve Bayes Classifier, A/B experiment, Git 2.x, Agile/SCRUM.
Confidential, Seattle, WA
Data Analyst
Responsibilities:
- Created SSIS Packages to integrate data coming from Text files and Excel files.
- Created Tables, Stored procedures and defined functions. Modified SQL scripts for tuning and scheduling.
- Extensively used joins and Common Table Expressions to simplify complex queries involving multiple tables.
- Developed complex stored procedures and views to generate various drill-through, parameterized, and linked reports using SSRS.
- Responsible for redesigning existing ETL processes as SSIS packages.
- Took end-to-end ownership of migrating the SSIS packages, SQL objects, DB roles, and SSRS and Excel reports from SQL Server 2008 R2 to SQL Server 2014.
- Expert in creating Power Pivot workbooks, Power View reports, and SSRS reports using cubes (MDX) and SQL queries.
- Worked with interactive data visualization products focused on business intelligence, such as Microsoft Power BI.
- Developed confidential proprietary analytical tools and reports with Microsoft Excel, Power Pivot, and PowerPoint.
- Generated dashboards with quick filters, calculated fields, groups, parameters, and sets to handle views more efficiently in Tableau.
- Experience with software development life cycle (SDLC), Agile, Scrum and Project Management Methodologies.
- Published Power BI reports from Power BI Desktop to the Power BI service.
- Built tabular, line chart, and graph visuals for marketing KPI metrics based on fields brought in and transformed using Power Query functions and DAX calculated measures.
- Generated reports utilizing SSRS and Excel with Power Pivot; deployed reports within SharePoint integrated mode.
- Created detailed dashboards utilizing Microsoft Power BI Desktop to improve analysis and decision making, with various sources including Azure, SQL Server, and Excel.
- Managed reports and data sources and scheduled report execution and delivery.
- Managed Team Foundation Server source control, including check-in and check-out.
- Captured long-running jobs and fine-tuned the stored procedures and tables to improve performance.
Environment: MS SQL Server 2008 R2/2014, SSIS 2008 R2/2012, SSRS 2008 R2/2012, Power BI, Redgate, Team Foundation Server
Confidential
Data Analyst
Responsibilities:
- Developed and published dynamic, complex financial and sales reports in Power BI, using DAX to create complex measures and calculated columns.
- Used the bookmarks feature to let users jump to the respective report tabs with one click.
- Imported custom themes and page backgrounds for rich visual effects.
- Performed end-to-end testing on the reports and packages developed to ensure data quality standards were met.
- Created drill-down and drill-through reports in Power BI to navigate to detail and child reports based on hierarchy.
- Utilized Power Query in Power BI to pivot and unpivot the data model for data cleansing and data massaging.
- Created and managed schema objects such as tables, views, indexes, and referential integrity constraints according to user requirements.
- Developed SQL queries for data migration and Reporting Services (SSRS).
- Developed and deployed SSIS packages and configuration files, and scheduled jobs to run the packages to generate data in CSV files.
- Troubleshot report issues and ETL job failures and optimized query performance.
- Created many calculated columns and measures using DAX in Power BI based on report requirements.
- Implemented event handlers and error handling in SSIS packages and notified various user communities of process results.
- Explored data in a variety of ways and across multiple visualizations using Power BI.
- Involved in the complete SSIS life cycle: creating, building, deploying, and executing packages in both the Development and Production environments.
- Involved in creating complex SSAS cubes with multiple fact measure groups and multiple dimension hierarchies based on the OLAP reporting needs.
- Worked with SSAS in the Power BI service and generated reports based on user requirements to monitor regional performance.
- Designed and built high-performance ETL packages to migrate and manipulate data from MS Excel and SQL Server using SSIS.
- Extensively worked on Dimensional modeling, Data cleansing and Data Staging of operational sources using ETL processes.
- Extensively used joins and sub-queries to simplify complex queries involving multiple tables.
Environment: ETL, Power BI, SSIS, SSAS, SSRS, OLAP, OLTP
Confidential
Data Reporting Analyst
Responsibilities:
- Designed SSIS strategies to pull data from different sources, such as SQL Server and flat files, and load it into the destination database.
- Converted VSAM (legacy system) flat files to a SQL Server destination using SSIS packages.
- Defined precedence constraints and implemented various control flow tasks such as Data Flow Task, Foreach Loop Container, For Loop Container, Execute SQL Task, and Send Mail Task.
- Added package configuration, custom package logging, error handling, and email notification to SSIS packages.
- Used various transformations, including Lookup, Conditional Split, Row Count, Derived Column, Data Conversion, and Union All.
- Extensively used SSIS packages to load data from various formats into a warehouse.
- Designed and developed Star Schema in Data warehouse.
- Designed and developed various complex reports, such as drill-down, drill-through, and cross-tab reports, using MS Reporting Services (SSRS).
- Involved in design and deployment of Report Models for generating Ad-Hoc reports as per the client requirements.
- Responsible for report design and layout, including pie charts, graphs, and linked reports.
- Involved in designing parameterized reports and cascading parameterized reports.
- Maintained deployed reports and provided support to their users.
- Involved in designing cubes using SSAS.
- Created stored procedures, functions, triggers, cursors, and SQL scripts using SSMS.
- Created views on database for better Performance and Security.