Data Scientist / Sr Data Analyst Resume

Scottsdale, AZ

SUMMARY

  • Over 8 years of IT experience in Data Analysis/Business Analysis, ETL Development, and Project Management.
  • Over 2 years of experience as a Data Scientist in Data Mining, Statistical Data Analysis, Exploratory Data Analysis, and Machine Learning across various forms of data.
  • Proven expertise in Supervised and Unsupervised learning techniques (Clustering, Classification, PCA, Decision Trees, KNN, SVM), Predictive Analytics, Optimization methods, Natural Language Processing (NLP), and Time Series Analysis.
  • Experienced in Machine Learning regression algorithms including Simple, Multiple, and Polynomial Regression, Support Vector Regression (SVR), Decision Tree Regression, and Random Forest Regression.
  • Experienced in Advanced Statistical Analysis and Predictive Modeling in structured and unstructured data environments.
  • Strong expertise in Business and Data Analysis, Data Profiling, Data Migration, Data Conversion, Data Quality, Data Governance, Data Lineage, Data Integration, Master Data Management (MDM), Metadata Management Services, and Reference Data Management (RDM).
  • Hands-on experience with Data Science libraries in Python such as Pandas, NumPy, SciPy, scikit-learn, Matplotlib, LibSVM, and NLTK.
  • Researched the Artificial Intelligence (AI) space to gather requirements, comparing enterprise products such as Google Cloud and AWS as well as vertical offerings such as IBM Watson.
  • Solid understanding of AWS (Amazon Web Services) S3, EC2, RDS, and IAM, along with Azure ML, Apache Spark, and Scala processes and concepts.
  • Experienced in Machine Learning classification algorithms such as Logistic Regression, KNN, SVM, Kernel SVM, Naive Bayes, Decision Tree, and Random Forest classification.
  • Hands-on experience with R packages and libraries such as ggplot2, Shiny, h2o, dplyr, reshape2, plotly, ElemStatLearn, and caTools.
  • Experience in various phases of the Software Development Life Cycle (Analysis, Requirements Gathering, Design) with expertise in writing Technical Design Documents (TDD), Functional Specification Documents (FSD), Test Plans, GAP Analysis, and Source-to-Target mapping documents.
  • Solid understanding of Artificial Neural Networks (ANN) and Deep Learning models in Python using the Theano, TensorFlow, and Keras packages.
  • Excellent understanding of Hadoop architecture, MapReduce concepts, and the HDFS framework.
  • Strong understanding of project life cycle and SDLC methodologies including RUP, RAD, Waterfall and Agile.
  • Strong expertise in ETL, Data warehousing, Operational Data Store (ODS), Data Marts, OLAP and OLTP technologies.
  • Excellent team player and self-starter with good communication skills.
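The regression families named in the summary can be illustrated with a minimal scikit-learn sketch on a small synthetic dataset; the data and model settings here are hypothetical examples, not taken from any of the engagements below.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Hypothetical 1-D dataset: y = 3x plus Gaussian noise.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X.ravel() + rng.normal(0, 1, 100)

# Fit three of the regressor families listed above and compare
# R^2 scores on the training data.
for model in (LinearRegression(), SVR(), DecisionTreeRegressor(max_depth=3)):
    score = model.fit(X, y).score(X, y)
    print(type(model).__name__, round(score, 2))
```

Each regressor exposes the same `fit`/`score` interface, which is what makes swapping model families in an analysis pipeline straightforward.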

TECHNICAL SKILLS

Databases: SQL Server, MS Access, Teradata, Oracle

NoSQL Databases: HBase, MongoDB, Cassandra

Programming Languages: C, C++, MATLAB, R, Python, PowerShell scripting, Java, JavaScript, Scala, Pig

Markup Languages: XML, HTML, DHTML, XSLT, XPath, XQuery, and UML

ETL Tools: Informatica PowerCenter, SSIS

Data Modeling Tools: MS Visio, Rational Rose, Erwin

Testing Tools: HP Quality Center ALM

Big Data Tools: Hadoop, Hive, Apache Spark, Pig

Operating Systems: UNIX, Linux, Windows

Reporting & Visualization: Tableau, Matplotlib, Seaborn, ggplot, SAP Business Objects, Crystal Reports, SSRS, Cognos

PROFESSIONAL EXPERIENCE

Confidential, Scottsdale, AZ

Data Scientist / Sr Data Analyst

Responsibilities:

  • Utilized machine learning algorithms such as linear regression, multivariate regression, Naive Bayes, Random Forests, K-means, and KNN for data analysis.
  • Used the Python Matplotlib package to visualize and graphically analyze the data.
  • Performed data pre-processing and split the identified dataset into training and test sets.
  • Performed data wrangling to clean, transform, and reshape the data using the pandas library.
  • Carried out data cleaning, wrangling, manipulation, and visualization; extracted data from relational databases, performed complex data manipulations, and conducted extensive data checks to ensure data quality.
  • Used the R and Python programming languages to graphically analyze data and perform data mining; also built and analyzed datasets using Python, MATLAB, and R.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Analyzed transaction data and developed analytical insights through statistical modeling and Artificial Intelligence (AI) techniques in Python.
  • Created and operated PowerShell scripts.
  • Analyzed the performance of image segmentation using Convolutional Neural Networks (CNNs).
  • Analyzed the performance of Recurrent Neural Networks (RNNs) on data over time.
  • Used the Python NumPy, SciPy, and Pandas packages for dataset manipulation.
  • Used data quality validation techniques to validate Critical Data Elements (CDEs) and identified various anomalies.
  • Applied NLP techniques (LDA, NMF) to Steam game descriptions and discovered latent features that were incorporated as item side data.
  • Worked extensively with statistical analysis tools; adept at writing code in advanced Excel, R, MATLAB, and Python.
  • Detected and classified container images using deep learning algorithms (NN, ANN, CNN) with Keras and TensorFlow backends in Python.
  • Used the Python scikit-learn, Theano, TensorFlow, and Keras packages to train machine learning models.
  • Performed preliminary research on big data tools such as Spark, Cassandra, and NoSQL databases, assessing their advantages and disadvantages.
  • Extensively used open-source tools such as RStudio (R), Spyder (Python), and Jupyter Notebooks for statistical analysis and building machine learning models.
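The pre-processing, train/test split, and model-training steps described above can be sketched with scikit-learn; the synthetic data, feature count, and accuracy threshold here are illustrative assumptions rather than details from the actual engagement.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for a cleaned dataset (hypothetical features/labels).
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Split the identified dataset into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Fit a random forest classifier and evaluate on the held-out test set.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {acc:.2f}")
```

Holding out a test set before fitting, as above, is what makes the reported accuracy an honest estimate of generalization rather than training performance.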

Environment: R, Python, Spark MLlib, TensorFlow, Keras, Caffe, ETL, Spyder, Jupyter Notebook, Azure ML, Data Quality, RStudio, Tableau, Scala, HDFS, Hive, NoSQL, NLP, NLTK, NumPy, SciPy, h2o, Pandas, AWS (EC2, RDS, S3), Matplotlib, scikit-learn, Shiny, Supervised & Unsupervised Learning, Deep Learning, Artificial Intelligence

Confidential, Wilmington, DE

Data Scientist / Sr Data Analyst

Responsibilities:

  • Responsible for predictive analysis of credit scoring to predict whether credit extended to a new or existing applicant would likely result in profit or loss.
  • Improved classification of bank authentication protocols by 20% by applying clustering methods to transaction data, using Python scikit-learn locally and Spark MLlib at production scale.
  • Extracted data extensively using SQL queries and used R and Python packages for data mining tasks.
  • Performed Exploratory Data Analysis, data wrangling, and algorithm development in R and Python for data mining and analysis.
  • Implemented Natural Language Processing (NLP) methods and pre-trained word2vec models to improve in-app search functionality.
  • Involved in transforming data from legacy tables to HDFS and HBase tables using Sqoop; researched reinforcement learning and control (TensorFlow, Torch) and machine learning models (scikit-learn).
  • Used Python-based data manipulation and visualization tools such as Pandas, Matplotlib, and Seaborn to clean corrupted data before generating business-requested reports.
  • Developed models relying on, but not limited to, Random Forest, logistic and linear regression, stepwise selection, Support Vector Machines, Naive Bayes classifiers, ARIMA/ETS models, and K-Centroid clustering.
  • Built various machine learning classification models (Random Forest, Decision Trees, Naive Bayes).
  • Extracted data from HDFS and prepared it for exploratory analysis using data munging.
  • Extensively used R packages (ggplot2, ggvis, caret, dplyr) on large datasets.
  • Used the R and Python programming languages to graphically analyze data and perform data mining.
  • Performed extensive data mining to find relevant features in an anonymized dataset using R and Python; used an ensemble of XGBoost models (tuned via random search) to make predictions.
  • Explored 5 supervised machine learning algorithms (Regression, Random Forest, SVM, Decision Tree, Neural Network) and used metrics such as precision, adjusted R-squared, and residual splits to select the winning model of the 5.
  • Developed Tableau dashboards from Oracle and SQL Server databases to present data visualizations to the business team.

Environment: R (dplyr, caret, ggplot2), Python (NumPy, Pandas, PySpark, scikit-learn, Matplotlib, NLTK), T-SQL, MS SQL Server, RStudio, Spyder, Jupyter Notebook, TensorFlow, Theano, Caffe, MATLAB, ETL, HDFS, Scala, Shiny, h2o, Oracle, Teradata, Java, Tableau, Supervised & Unsupervised Learning

Confidential, Alpharetta, GA

Sr Data Analyst

Responsibilities:

  • Worked on Extraction, Transformation and Loading (ETL) of data from various sources into Data Warehouses and Data Marts using Informatica PowerCenter (Repository Manager, Designer, Workflow Manager, Workflow Monitor, Metadata Manager), PowerExchange, and PowerConnect as ETL tools on Oracle, DB2, and SQL Server databases.
  • Created data transformations from internal and third-party data sources into data suitable for handheld devices, including XML.
  • Implemented SDLC methodologies including RUP, RAD, Waterfall, and Agile.
  • Conducted GAP analysis to analyze the variance between system capabilities and business requirements.
  • Involved in defining source-to-target data mappings, business rules, and data definitions.
  • Created logical/physical data models in Erwin and worked on loading the tables in the Data Warehouse; documented various Data Quality mapping documents.

Environment: Informatica Analyst 9.6.1, ETL, Agile, Waterfall, XML, Teradata, SharePoint, SSIS, SSRS, Microstrategy, UML, SQL Server, MS Visio, Machine Learning, SQL, ETL, Oracle, Erwin

Confidential, Louisville, KY

Sr Business Data Analyst

Responsibilities:

  • Conducted User Acceptance Testing (UAT) and worked with users and the vendor who built the system.
  • Performed data profiling in the source systems required for the Dual Medicare-Medicaid data mart.
  • Documented the complete process flow to describe program development, logic, testing, implementation, application integration, and coding.
  • Involved in defining the trumping rules applied by the Master Data Repository.
  • Worked with internal architects, assisting in the development of current- and target-state enterprise data architectures.
  • Performed data mining on claims data using complex SQL queries and discovered claims patterns.
  • Responsible for defining the key identifiers for each mapping/interface.
  • Extensively used trend analysis and data discovery tools for in-depth analysis to support top-level management decision-making.
  • Designed and implemented data integration modules for Extract/Transform/Load (ETL) functions.
  • Responsible for performing trend analysis on the Medicaid/Medicare data marts to support better decisions.
  • Gained good experience with mainframe enterprise billing systems; involved in defining the business/transformation rules applied to Dual Medicare-Medicaid data.
  • Used data analysis techniques to validate business rules and identify low-quality and missing data in the existing Humana Enterprise Data Warehouse (EDW).

Environment: SQL Server, Oracle 10g/11g, MS Office, Embarcadero, Netezza, Teradata, Informatica, Data Mining, ER/Studio, Agile, XML

Confidential, Chicago, IL

Business Data Analyst

Responsibilities:

  • Used data analysis techniques to validate business rules and identify low-quality and missing data in the existing State Farm Enterprise Data Warehouse (EDW).
  • Worked with users to identify the most appropriate source of record and profile the data required for sales and service.
  • Documented the complete process flow to describe program development, logic, testing, implementation, application integration, and coding.
  • Involved in generating test cases for Property and Casualty (P&C) insurance at different levels of business.
  • Involved in defining the business/transformation rules applied to ICP data.
  • Defined the list codes and code conversions between the source systems and the data mart.
  • Worked with internal architects, assisting in the development of current- and target-state data architectures.
  • Worked with project team representatives to ensure that logical and physical ER/Studio data models were developed in line with corporate standards and guidelines.
  • Responsible for defining the functional requirement documents for each source-to-target interface.
  • Utilized the Informatica toolset (Informatica Data Explorer and Informatica Data Quality) to analyze legacy data for data profiling.
  • Evaluated data profiling, cleansing, integration, and extraction tools (e.g., Informatica).

Environment: SQL Server, Oracle, MS Office, Agile, Teradata, Informatica, ER/Studio, XML, SQL, Business Objects

Confidential

Business Analyst

Responsibilities:

  • Acted as primary liaison between the clients and the IT department to perform analysis, review, and estimation of client requests; prepared and reviewed requirements and client documentation.
  • Designed and developed Use Cases, Activity Diagrams, Sequence Diagrams and Business Process Modeling.
  • Reviewed Development Plans, Quality Assurance Test Plans and User Documentation to ensure correct interpretation of original specifications.
  • Created and managed Project Templates, Use Case Project Templates, Requirement Types and Traceability Matrix.
  • Conducted Functional Walkthroughs, User Acceptance Testing (UAT), and supervised the development of User Manuals for customers.
  • Wrote T-SQL statements and stored procedures in SQL Server Management Studio for reading and writing data.

Environment: Microsoft Office, Rational Clear Case, XML, MS Visio, MS Excel, MS Access, Manual Testing

Confidential

Business Analyst

Responsibilities:

  • Involved in all the phases of the project, communicated with the business and prepared Technical Specification Document.
  • Actively participated in the development and Peer reviews of new applications and enhancement work.
  • Used SQL queries to compare the database and the periodic reports generated by the application at various levels.
  • Prepared Test plans, Test Cases based on business requirements for functional testing.
  • Analyzed data and created class diagrams and ER diagrams for designing databases. Closely interacted with designers and software developers to understand application functionality and navigational flow, and kept them updated on business users' feedback.
  • Applied Unified Modeling Language (UML) methodologies to design Use Case Diagrams, Activity Diagrams, Sequence Diagram and ER diagrams.
  • Managed defects, executed test cases, tracked product progress, and consolidated status reports on a daily and weekly basis.
  • Documented System Test Summary report for project sponsors and obtained business sign-offs.

Environment: MS Access, MS Excel, MS Visio, UML diagrams, Mainframes, SQL Server
