
Data Analyst/Data Scientist Resume


Minneapolis, Minnesota

SUMMARY:

  • Over 8 years of experience in data mining, predictive modeling, statistical analytics, econometric modeling, and data visualization with large sets of structured and unstructured data.
  • Strong experience in Data Analysis, Data Cleaning, Data Migration, Data Conversion, Data Export and Import, Data Integration.
  • Experience using Python libraries such as NumPy, SciPy, Pandas, Matplotlib, and scikit-learn.
  • Proficient in statistical modeling and machine learning techniques (Linear and Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian methods, XGBoost) in forecasting/predictive analytics, segmentation methodologies, regression-based models, hypothesis testing, factor analysis/PCA, and ensembles.
  • Experienced in integrating various relational and non-relational sources such as Teradata, Oracle, SQL Server, NoSQL, COBOL, XML, and flat files.
  • Performed extensive data profiling and analysis to detect and correct inaccurate data in databases and to track data quality.
  • Experience in performance tuning and query optimization techniques in transactional and data warehouse environments.
  • Experience working with BI visualization tools (Tableau and QlikView).
  • Extensive experience in text analytics, developing statistical machine learning and data mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau.
  • Created reports for users in Tableau by connecting to multiple data sources such as flat files, MS Excel, CSV files, SQL Server, and Oracle.
  • Hands-on experience writing queries in SQL and R to extract, transform, and load (ETL) data from large datasets using data staging.
  • Experienced in evaluating data sources, with a strong understanding of data warehouse/data mart design, ETL, BI, OLAP, and client/server applications.
  • Experience creating partitions, indexes, and indexed views to improve performance, reduce contention, and increase data availability.

TECHNICAL SKILLS:

Programming/Scripting Languages: Python (NumPy, SciPy, Pandas, Matplotlib, scikit-learn, Seaborn), R (ggplot2, Weka, dplyr, knitr, caret)

BI and Visualization: Tableau (Desktop and Server), QlikView, SAP BusinessObjects, OBIEE, Crystal Reports XI, Power BI, SSRS, SPSS

Databases: MS SQL Server, Oracle, MySQL, and NoSQL (Hive, Oracle NoSQL Database)

Machine Learning: Naïve Bayes, Decision Trees, regression models, Random Forests, K-means clustering, Market Basket Analysis, time series, Support Vector Machines

Tools: HQL, Pig, Apache Spark, Microsoft Visual Studio, SPSS, Microsoft SQL Server Integration Services, Microsoft SQL Server Analysis Services, Microsoft SQL Server Reporting Services, Oracle Data Integrator.

ETL tools: Informatica PowerCenter, SSIS, SSAS.

Data Modelling Tools: Erwin 8.0, ER/Studio, SAP PowerDesigner

PROFESSIONAL EXPERIENCE:

Data Analyst/Data Scientist

Confidential, Minneapolis, Minnesota

Responsibilities:

  • Involved in exploratory data analysis using descriptive statistics and data visualization to determine the baseline machine learning algorithms (MLAs).
  • Involved in building and automating a robust, high-accuracy model for the given customer base.
  • Conducted analysis assessing customer consumption behavior and discovered the value of customers with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means and hierarchical clustering (see the segmentation sketch after this list).
  • Coordinated the execution of A/B tests to measure the effectiveness of a personalized recommendation system.
  • Applied the Wilcoxon signed-rank test in R to pre-acquisition and post-acquisition stock performance data across different sectors to test for statistical significance (a Python analogue is sketched after this list).
  • Recommended and evaluated marketing approaches based on analytics of customer consumption behavior.
  • Utilized SQL and HiveQL to query and manipulate data from a variety of sources, including Oracle and HDFS, while maintaining data integrity.
  • Worked on data cleaning, data preparation, and feature engineering in Python using NumPy, SciPy, Pandas, Matplotlib, Seaborn, and scikit-learn.
  • Predicted claim severity to understand future losses and ranked the importance of features.
  • Used Python and Spark to implement machine learning algorithms including Generalized Linear Models, SVM, Random Forest, and boosting.
  • Identified internal and external information sources, building effective working relationships with subject matter experts across research groups within the firm and in the external marketplace.
  • Developed logistic regression models to predict subscription response rate based on customer variables such as past transactions, response to prior mailings, promotions, demographics, and interests.
  • Involved in data preparation using tasks such as the following (a combined Pandas sketch of these steps follows this list):
  • Data reduction - obtained a reduced representation in volume that produces the same or similar analytical results.
  • Data discretization - transformed quantitative data into qualitative data.
  • Data cleaning - filled in missing values, handled noisy data, identified or removed outliers, and resolved inconsistencies.
  • Data integration - integrated multiple databases, data cubes, or files.
  • Data transformation - normalization, standardization, and aggregation.
  • Designed dashboards with Tableau and D3.js and provided complex reports, including summaries, charts, and graphs, to interpret findings for the team and stakeholders.
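
A minimal sketch of the RFM segmentation described above, assuming a hypothetical transactions extract with customer_id, order_date, and amount columns; the real feature set and cluster count were driven by the engagement's data.

    import pandas as pd
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    # Hypothetical transaction extract; column names are illustrative.
    tx = pd.read_csv("transactions.csv", parse_dates=["order_date"])
    snapshot = tx["order_date"].max() + pd.Timedelta(days=1)

    # RFM: recency (days since last order), frequency (order count), monetary (total spend).
    rfm = tx.groupby("customer_id").agg(
        recency=("order_date", lambda d: (snapshot - d.max()).days),
        frequency=("order_date", "count"),
        monetary=("amount", "sum"),
    )

    # Standardize so no single RFM dimension dominates the distance metric.
    X = StandardScaler().fit_transform(rfm)

    # k=4 is an assumption; the elbow method or silhouette scores would guide k in practice.
    rfm["segment"] = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)
    print(rfm.groupby("segment").mean())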
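
The signed-rank comparison above was done in R; below is an equivalent sketch in Python with SciPy, assuming paired pre- and post-acquisition return series for the same stocks (the numbers are illustrative, not project data).

    import numpy as np
    from scipy import stats

    # Hypothetical paired samples: returns for the same stocks before and after acquisition.
    pre = np.array([0.012, -0.004, 0.031, 0.008, -0.015, 0.022])
    post = np.array([0.018, 0.001, 0.027, 0.015, -0.006, 0.030])

    # Wilcoxon signed-rank test: a non-parametric paired test, appropriate when
    # normality of the differences cannot be assumed.
    stat, p_value = stats.wilcoxon(pre, post)
    print(f"W={stat:.3f}, p={p_value:.4f}")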
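
A compact Pandas sketch combining the preparation steps listed above (cleaning, discretization, transformation, integration); file and column names are hypothetical.

    import pandas as pd

    df = pd.read_csv("customers.csv")  # hypothetical input

    # Data cleaning: impute missing numeric values with the median, then clip outliers.
    df["income"] = df["income"].fillna(df["income"].median())
    lo, hi = df["income"].quantile([0.01, 0.99])
    df["income"] = df["income"].clip(lo, hi)

    # Data discretization: quantitative -> qualitative bins.
    df["income_band"] = pd.qcut(df["income"], q=4, labels=["low", "mid", "high", "top"])

    # Data transformation: z-score standardization.
    df["income_z"] = (df["income"] - df["income"].mean()) / df["income"].std()

    # Data integration: join a second hypothetical source on a shared key.
    promos = pd.read_csv("promotions.csv")
    df = df.merge(promos, on="customer_id", how="left")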

Environment: Tableau, Oracle, Teradata, R, Python, Spark, SQL, HiveQL, machine learning algorithms, HDFS.

Data Analyst

Confidential, Omaha, NE

Responsibilities:

  • Applied various machine learning algorithms and statistical modeling techniques such as decision trees, regression models, neural networks, SVM, and clustering to identify volume, using the scikit-learn package in Python and MATLAB.
  • Utilized Apache Spark with Python to develop and execute big data analytics and machine learning applications; executed machine learning use cases under Spark ML and MLlib.
  • Led the technical implementation of advanced analytics projects: defined the mathematical approaches, developed new and effective analytics algorithms, and wrote key pieces of mission-critical source code implementing advanced machine learning algorithms using Caffe, TensorFlow, Scala, Spark, MLlib, R, and other tools and languages as needed.
  • Built analytical data pipelines to port data in and out of Hadoop/HDFS from structured and unstructured sources, and designed and implemented the system architecture for an Amazon EC2 based cloud-hosted solution for the client.
  • Performed K-means clustering, multivariate analysis, and Support Vector Machines in Python and R.
  • Professional Tableau user (Desktop, Online, and Server); experienced with Keras and TensorFlow.
  • Created MapReduce jobs running over HDFS for data mining and analysis using R, loaded and stored data with Pig scripts and R for MapReduce operations, and created various types of data visualizations using R and Tableau.
  • Worked on machine learning over large data using Spark and MapReduce.
  • Performed data analysis by using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database.
  • Developed Spark/Scala and Python code for a regular expression (regex) project in the Hadoop/Hive environment with Linux/Windows for big data resources.
  • Performed multinomial logistic regression, Random Forest, decision tree, and SVM modeling to classify whether a package would be delivered on time on a new route (see the classification sketch after this list).
  • Stored and retrieved data from data-warehouses using Amazon Redshift.
  • Responsible for planning & scheduling new product releases and promotional offers.
  • Used pandas, numpy, seaborn, scipy, matplotlib, scikit-learn, and NLTK in Python to develop various machine learning algorithms.
  • Worked on NoSQL databases such as MongoDB and HBase.
  • Created data quality scripts using SQL and Hive to validate successful data loads and the quality of the data; created various types of data visualizations using Python and Tableau.
  • Worked on data pre-processing and cleaning to perform feature engineering, and applied data imputation techniques for missing values in the dataset using Python.
  • Extracted data from HDFS and prepared it for exploratory analysis using data munging.
  • Worked on text analytics, Naive Bayes, sentiment analysis, word clouds, and retrieving data from Twitter and other social networking platforms (a Naive Bayes sentiment sketch follows this list).
  • Worked on different data formats such as JSON and XML and performed machine learning algorithms in Python.
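
A hedged sketch of the on-time delivery classification above, assuming a hypothetical deliveries extract with illustrative route features and a binary on_time label; the actual feature set came from the project data.

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score

    # Hypothetical dataset: route features plus an on-time label.
    df = pd.read_csv("deliveries.csv")
    X = df[["distance_km", "num_stops", "avg_traffic_index"]]  # illustrative features
    y = df["on_time"]

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Compare a linear baseline against a tree ensemble, as described above.
    for model in (LogisticRegression(max_iter=1000), RandomForestClassifier(n_estimators=200)):
        model.fit(X_train, y_train)
        print(type(model).__name__, accuracy_score(y_test, model.predict(X_test)))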
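
A minimal NLTK Naive Bayes sentiment sketch in the spirit of the text analytics work above; the tiny training set is invented for illustration, where the real labels came from collected tweets.

    from nltk.classify import NaiveBayesClassifier

    # Bag-of-words features: each token present maps to True.
    def feats(text):
        return {w.lower(): True for w in text.split()}

    # Invented labeled examples; real training data came from social media pulls.
    train = [
        (feats("great product fast delivery"), "pos"),
        (feats("love the service"), "pos"),
        (feats("late package terrible support"), "neg"),
        (feats("broken item very disappointed"), "neg"),
    ]

    clf = NaiveBayesClassifier.train(train)
    print(clf.classify(feats("delivery was late and support was terrible")))  # likely 'neg'
    clf.show_most_informative_features(5)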

Environment: Python, MongoDB, JavaScript, SQL Server, HDFS, Pig, Hive, Oracle, DB2, Tableau, ETL (Informatica), SQL, T-SQL, EC2, EMR, Teradata, Hadoop Framework, AWS, Spark SQL, Scala, Spark MLlib, NLP, MATLAB, HBase, Cassandra, R, PySpark, Tableau Desktop, Excel, Linux, CDH5

Data Analyst

Confidential, Omaha, NE

Responsibilities:

  • Actively involved in gathering and analyzing the requirements related to the project; responsible for writing Business Approach Documents and Technical Documents for new projects.
  • Worked with the team in the mortgage domain to implement designs based on free cash flow, acquisition, and capital efficiency.
  • Extensively worked on production support, including deploying SSIS packages to development and production servers and creating, scheduling, and managing SQL jobs.
  • Created SSIS package configurations and maintained their tables by editing the values of the variables as per the requirements.
  • Worked on a data migration project, validating data from three sources row by row and field by field.
  • Using SQL Server Integration Services (SSIS), migrated data from heterogeneous sources and legacy systems (Oracle, Access, Excel) to centralized SQL Server databases to overcome transformation constraints and limitations.
  • Worked on development of an SSIS package to generate individual and consolidated Excel reports, dynamically passing carrier IDs from a SQL table, and placed them in a shared folder for business users (a Python analogue of the per-carrier export is sketched after this list).
  • Used transformations such as SCD, Data Conversion, Conditional Split, Merge Join, Derived Column, Lookup, Cache Transform, and Union All to convert raw data into the required form per the business requirements.
  • Worked on the redesign of a few SSIS packages that involved very complex logic and more than a hundred tables and tasks, and developed simplified versions of the packages.
  • Created rich dashboards using Tableau Desktop and prepared user stories to create compelling dashboards that deliver actionable insights.
  • Performed data refreshes on Tableau Server on a weekly, monthly, and quarterly basis and ensured that the views and dashboards accurately displayed the changes in data.
  • Designed and developed a business intelligence dashboard using Tableau Desktop, allowing executive management to view past, current, and forecast sales data.
  • Created multiple visualization reports/dashboards using dual-axis charts, histograms, filled maps, bubble charts, bar charts, line charts, tree maps, box-and-whisker plots, stacked bars, etc.
  • Experienced in creating users, groups, projects, workbooks, and the appropriate permission sets for Tableau Server logins and security checks.
  • Created advanced analytical dashboards using reference lines, bands, and trend lines.
  • Developed and rendered monthly, weekly, and daily reports as per the requirements, and assigned roles and user access with respect to security in Report Manager.
  • Extensively used Teradata utilities (BTEQ, FastLoad, MultiLoad, TPump) to import/export and load data from Oracle and flat files.
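
The per-carrier Excel export above was built in SSIS; for illustration only, here is a minimal Python sketch of the same loop pattern, assuming hypothetical table, column, and path names (requires SQLAlchemy, pyodbc, and openpyxl).

    import pandas as pd
    from sqlalchemy import create_engine, text

    # Hypothetical connection string; the real package used SSIS connection managers.
    engine = create_engine("mssql+pyodbc://user:pass@server/db?driver=ODBC+Driver+17+for+SQL+Server")

    carrier_ids = pd.read_sql(text("SELECT DISTINCT carrier_id FROM dbo.Carriers"), engine)["carrier_id"]

    # One workbook per carrier, mirroring the dynamic SSIS loop over carrier IDs.
    for cid in carrier_ids:
        df = pd.read_sql(text("SELECT * FROM dbo.Shipments WHERE carrier_id = :cid"), engine, params={"cid": cid})
        df.to_excel(f"//shared/reports/carrier_{cid}.xlsx", index=False)

    # Consolidated report across all carriers.
    pd.read_sql(text("SELECT * FROM dbo.Shipments"), engine).to_excel("//shared/reports/all_carriers.xlsx", index=False)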

Environment: Microsoft SQL Server 2012/2008 R2, Teradata, SSIS/SSAS/SSRS 2012/2008, Microsoft Office, MS Visio, Crystal Reports XI, Visual SourceSafe, Microsoft TFS, BIDS, Visual Studio, Tableau

Business Data Analyst

Confidential

Responsibilities:

  • Involved in all phases of the SDLC (Software Development Life Cycle), from requirement gathering, analysis, and design through development, testing, and maintenance, with timely delivery against aggressive deadlines.
  • Developed and customized stored procedures, functions, packages, and triggers.
  • Used bulk collections for better performance and easier retrieval of data by reducing context switching between the SQL and PL/SQL engines.
  • Handled errors extensively using exception handling, for ease of debugging and for displaying error messages in the application.
  • Involved in ETL data validation, counts, and source-to-target table mapping using SQL queries and back-end testing (see the row-count validation sketch after this list).
  • Involved in writing back-end SQL queries for source and target table validation.
  • Worked with the Informatica tool to understand transformations.
  • Troubleshot performance issues and fine-tuned queries and stored procedures.
  • Worked under the supervision of a DBA and created database objects such as tables, views, sequences, synonyms, table/column constraints, and indexes for enhancements.
  • Involved in the design of the overall database using Entity Relationship diagrams.
  • Involved in code review and unit testing.
  • Involved in Functional Testing, Integration Testing, Regression Testing.
  • Created several database triggers to implement integrity constraints.
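
A minimal Python sketch of the source-to-target row-count validation described above, with hypothetical connection strings and table names; the original checks were written directly in SQL.

    from sqlalchemy import create_engine, text

    # Hypothetical connections; the real validation ran against Oracle source and target schemas.
    source = create_engine("oracle+cx_oracle://user:pass@source_db")
    target = create_engine("oracle+cx_oracle://user:pass@target_db")

    tables = ["CUSTOMERS", "ORDERS", "PAYMENTS"]  # illustrative table list

    for t in tables:
        q = text(f"SELECT COUNT(*) FROM {t}")
        with source.connect() as s, target.connect() as g:
            src_count = s.execute(q).scalar()
            tgt_count = g.execute(q).scalar()
        status = "OK" if src_count == tgt_count else "MISMATCH"
        print(f"{t}: source={src_count} target={tgt_count} -> {status}")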

Environment: Oracle 11g, Informatica PowerCenter 8.5, SQL Developer, SQL, PL/SQL.
