
Data Scientist/Data Analyst Resume


Atlanta, GA

SUMMARY

  • 6+ years as a Data Scientist/Data Analyst leading the development and deployment of large-scale models using big data tools and architectures such as Apache Spark and Hadoop, with experience developing accurate machine-learning models in Python and R.
  • Experienced in ETL development using Informatica PowerCenter and SSIS.
  • Experience with Cloud Computing technologies such as AWS and Azure.
  • Strong expertise in Apache Hadoop/MapReduce on YARN, Apache Sqoop, Apache Storm, ZooKeeper, Apache Oozie/Spark/H2O.ai/Kafka/PySpark and AWS to run computations on large volumes of data.
  • Designing and developing various machine learning frameworks using Python, R, and Matlab.
  • Collaborated with data engineers to implement ETL processes; wrote and optimized SQL queries to extract data from the Cloud and merge it with data from Oracle 12c.
  • Rich experience managing the entire data science project life cycle, involved in all phases including data extraction, data cleaning, statistical modeling and data visualization with large datasets of structured and unstructured data.
  • Hands-on experience in Machine Learning algorithms such as Linear Regression, GLM, CART, SVM, KNN, LDA/QDA, Naive Bayes, Random Forest, Boosting, K-means Clustering, Hierarchical Clustering, PCA, Feature Selection, Collaborative Filtering, Neural Networks and NLP.
  • Expert knowledge of MS Office, PowerPoint, MS Visio, MS Excel, Origin and Access, including VLOOKUPs, pivot tables, graphs, etc.
  • Proficient in Statistical Modeling, Forecasting/ Predictive Analytics, Segmentation methodologies, Regression based models, Hypothesis testing, Factor analysis/ PCA, Ensembles.
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Professional working experience with Python 2.X / 3.X libraries including Matplotlib, NumPy, SciPy, Pandas, Beautiful Soup, Seaborn, Scikit-learn and NLTK for analysis purposes.
  • Experience in implementing data analysis with various analytic tools, such as Anaconda 4.0 / 2.X (Jupyter Notebook, Spyder), R 2.15 / 3.0 (reshape, ggplot2, dplyr, car, MASS and lme4), SAS 9.3, Matlab 8.0 and Excel.
  • Experience with data visualizations using Python 2.X / 3.X and R 2.15 / 3.0 and generating dashboard with Tableau 8.0 / 9.2 / 10.0.
  • Working experience in Statistical Analysis and Testing including hypothesis testing, ANOVA, Survival Analysis, Longitudinal Analysis, Experiment Design, Sample Size Determination and A/B testing.
  • Hands-on experience in importing and exporting data using Relational Database including Oracle 11g / 12c, MySQL 5.0 and MS SQL Server, and NoSQL database like MongoDB 3.3 / 3.4.
  • Working experience in big data environment like Hadoop Ecosystem 1.X / 2.X including HDFS, MapReduce, Hive 0.11, HBase 0.9, Spark Framework 1.4 / 1.6 / 2.0 including Pyspark, MLlib and SparkSQL.
  • Working experience in version control tools such as Git 2.X to coordinate work on files with multiple team members.
  • Experience working with data modeling tools like Erwin, PowerDesigner and ERStudio.
  • Experience in designing Star Schema, Snow Flake Schema for Data Warehouse, ODS Architecture.
  • Good understanding of Teradata SQLAssistant, Teradata Administrator and data load/ export utilities like BTEQ, FastLoad, MultiLoad, FastExport.
  • Employing various SDLC methodologies such as Agile and SCRUM.
  • Knowledge of working with Proof of Concepts (POC), Gap Analysis, gathering necessary data for analysis from different sources, preparing data for exploration using Data Mining and Teradata.
  • Well experienced in Normalization & De-Normalization techniques for optimum performance in relational and dimensional database environments.
  • Good team player and quick-learner; highly self-motivated person with good communication and interpersonal skills.

TECHNICAL SKILLS

Predictive Modeling Technique: Linear Regression, Logistic Regression, Segmentation and clustering, Decision Trees, Random Forest, Support Vector Machine and K Nearest Neighbor Classification, Feature Engineering

Statistical Methods: Regression models, hypothesis testing and confidence intervals, principal component analysis and dimensionality reduction.

Data Management skills: Reading Raw data files, Merging, Sorting, visualizing, Handling missing values, Handling programming errors, Appending of various datasets

Analytics: R (RStudio, Rcmdr, Rmarkdown, rPython, Shiny, RGoogleAnalytics, rattle, RapidMiner, nlme, gbm, forecast, AST, ggmap, ggplot2, lattice, reshape, reshape2, data.table, dplyr, RODBC, foreign, sqldf, DBI, RMySQL, xlsx, WriteXLS), SAS, Python (NumPy, Pandas, SciPy, scikit-learn, statsmodels, Blaze, Scrapy, machine learning with Theano, NLTK, and visualization with Matplotlib, Seaborn, scikit-image), Big Data (HDFS, Pig, Hive, HBase, Sqoop, Spark), Excel

Cloud Technologies: AWS, Azure

ETL Tools: Informatica PowerCenter (Informatica Designer, Workflow Manager, Workflow Monitor), SSIS

Business Intelligence (BI): MicroStrategy, SSRS

Algorithms: Random Forests, XGBoost, Clustering, Association Rules, Logistic Regression

Software Development: Distributed Systems, REST APIs, Streaming

Database: Hive, Postgres, Access, Oracle, SQL Server, NoSQL (MongoDB)

Visualization: Tableau, Matplotlib, Seaborn, Plotly, Cufflinks and Geographical Plotting

Machine Learning: scikit-learn, TensorFlow, MLlib, H2O.ai, Keras

PROFESSIONAL EXPERIENCE

Confidential - Atlanta, GA

Data Scientist/Data Analyst

Responsibilities:

  • Perform Data Profiling to learn about user behavior and merge data from multiple data sources.
  • Participate in all phases of data mining: data collection, data cleaning, developing models, validation, visualization, and perform Gap analysis.
  • Designing and developing various machine learning frameworks using Python, R, and Matlab.
  • Integrate R into MicroStrategy to expose metrics determined by more sophisticated and detailed models than natively available in the tool.
  • Develop documents and dashboards of predictions in MicroStrategy and present them to the Business Intelligence team.
  • Collaborate with data engineers to implement ETL processes; write and optimize SQL queries to extract data from the Cloud and merge it with data from Oracle 12c.
  • Collect unstructured data from MongoDB 3.3 and complete data aggregation.
  • Perform data integrity checks, data cleaning, exploratory analysis and feature engineering using R 3.4.0.
  • Conduct analysis assessing customer consuming behaviors and discover customer value with RFM analysis; apply customer segmentation with clustering algorithms such as K-Means Clustering and Hierarchical Clustering (a brief sketch follows this list).
  • Develop personalized products recommendation with Machine Learning algorithms, including Collaborative filtering and Gradient Boosting Tree, to better meet the needs of existing customers and acquire new customers.
  • Work on outlier identification with box plots and K-means clustering using Pandas and NumPy.
  • Participate in feature engineering such as feature intersection generation, feature normalization and label encoding with Scikit-learn preprocessing.
  • Use Python 3.0 (numpy, scipy, pandas, scikit-learn, seaborn, NLTK) and Spark 1.6 / 2.0 (PySpark, MLlib) to develop variety of models and algorithms for analytic purposes.
  • Analyze data and perform data preparation by applying the historical model on the data set in Azure ML.
  • Coordinate the execution of A/B tests to measure the effectiveness of the personalized recommendation system.
  • Perform data visualization with Tableau 10 and generate dashboards to present the findings.
  • Recommend and evaluate marketing approaches based on quality analytics of customer consuming behavior.
  • Determine customer satisfaction and help enhance customer experience using NLP.
  • Work on Text Analytics, Naive Bayes, Sentiment analysis, creating word clouds and retrieving data from Twitter and other social networking platforms.
  • Use Git 2.6 to apply version control; track changes in files and coordinate work on the files among multiple team members.
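
A minimal Python sketch of the RFM-style customer segmentation with K-Means described above; the column names, sample values and cluster count are illustrative assumptions, not details from the project.

```python
# Illustrative sketch of RFM-style customer segmentation with K-Means.
# Column names, sample data and the cluster count are hypothetical.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical RFM table: one row per customer
rfm = pd.DataFrame({
    "recency_days": [5, 40, 3, 90, 12, 60],
    "frequency":    [12, 2, 20, 1, 8, 3],
    "monetary":     [540.0, 80.0, 910.0, 35.0, 300.0, 120.0],
})

# Scale features so no single RFM dimension dominates the distance metric
scaled = StandardScaler().fit_transform(rfm)

# Segment customers; k=3 is an arbitrary illustrative choice
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
rfm["segment"] = kmeans.fit_predict(scaled)

# Profile each segment by its average RFM values
print(rfm.groupby("segment").mean())
```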

Environment: R, Matlab, MongoDB, exploratory analysis, feature engineering, K-Means Clustering, Hierarchical Clustering, Machine Learning (Gradient Boosting Tree, NLP), Python (numpy, scipy, pandas, scikit-learn, NLTK), Spark (MLlib, PySpark), Tableau, MicroStrategy, Git

Confidential - Minneapolis, MN

Data Scientist/Data Analyst

Responsibilities:

  • Implemented Data Exploration to analyze patterns and to select features using Python SciPy.
  • Built Factor Analysis and Cluster Analysis models using Python SciPy to classify customers into different Confidential groups.
  • Built predictive models including Support Vector Machine, Random Forests and Naïve Bayes Classifier using Python Scikit-Learn to predict the personalized product choice for each client.
  • Using R’s dplyr and ggplot2 packages, performed an extensive graphical visualization of overall data, including customized graphical representation of revenue reports, specific item sales statistics and visualization.
  • Designed and implemented cross-validation and statistical tests including hypothesis testing, ANOVA and auto-correlation to verify the models’ significance (a brief sketch follows this list).
  • Designed an A/B experiment for testing the business performance of the new recommendation system.
  • Supported MapReduce Programs running on the cluster.
  • Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
  • Configured Hadoop cluster with Namenode and slaves and formatted HDFS.
  • Used Oozie workflow engine to run multiple Hive and Pig jobs.
  • Participated in Data Acquisition with Data Engineer team to extract historical and real-time data by using Hadoop MapReduce and HDFS.
  • Performed Data Enrichment jobs to deal with missing values, normalize data, and select features using HiveQL.
  • Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Analyzed the partitioned and bucketed data and computed various metrics for reporting.
  • Involved in loading data from RDBMS and web logs into HDFS using Sqoop and Flume.
  • Worked on loading the data from MySQL to HBase where necessary using Sqoop.
  • Developed Hive queries for Analysis across different banners.
  • Extracted data from Twitter using Java and the Twitter API; parsed JSON-formatted Twitter data and uploaded it to the database.
  • Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured the launched instances with respect to specific applications.
  • Developed Hive queries for analysis, and exported the result set from Hive to MySQL using Sqoop after processing the data.
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
  • Created HBase tables to store data in various formats coming from different portfolios.
  • Worked on improving performance of existing Pig and Hive Queries.
  • Created reports and dashboards, by using D3.js and Tableau 9.x, to explain and communicate data insights, significant features, models scores and performance of new recommendation system to both technical and business teams.
  • Utilize SQL, Excel and several Marketing/Web Analytics tools (Google Analytics, Bing Ads, AdWords, AdSense, Criteo, Smartly, SurveyMonkey, and Mailchimp) in order to complete business & marketing analysis and assessment.
  • Used Git 2.x for version control with Data Engineer team and Data Scientists colleagues.
  • Used Agile methodology and the SCRUM process for project development.
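
A minimal Python sketch of the cross-validated comparison of the SVM, Random Forest and Naïve Bayes classifiers named above; the synthetic dataset and scoring choice are assumptions, not project details.

```python
# Illustrative sketch: cross-validated comparison of the classifiers named
# above (SVM, Random Forest, Naive Bayes) on synthetic, hypothetical data.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the customer/product-choice data
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

models = {
    "svm": SVC(kernel="rbf", C=1.0),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "naive_bayes": GaussianNB(),
}

# 5-fold cross-validation gives a variance estimate alongside mean accuracy
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```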

Environment: HDFS, Hive, Sqoop, Pig, Oozie, Amazon Web Services (AWS), Python 3.x (SciPy, Scikit-Learn), Tableau 9.x, D3.js, SVM, Random Forests, Naïve Bayes Classifier, A/B experiment, Git 2.x, Agile/SCRUM.

Confidential

Data Analyst (Excel/SSIS/SSRS)

Responsibilities:

  • Interviewed Business Users to gather Requirements and analyzed the feasibility of their needs by coordinating with the project manager and technical lead.
  • Developed a Daily Cash Management Database in Access to support Annual budgets preparation, monthly forecasts, and ad-hoc financial analysis.
  • Developed forms, reports, queries, macros, VBA code and tables to automate data importation and exportation to a system created in MS Access.
  • Generated Tools in MS Excel using VBA Program for users to extract the data automatically from Teradata database and SQL Server.
  • Involved heavily in writing complex SQL queries to pull the required information from Database using Teradata SQL Assistance.
  • Created Tableau Dashboard for the top key Performance Indicators for the top management by connecting various data sources like Excel, Flat files and SQL Database.
  • Created a Heat Map in Tableau showing current service subscribers by color, broken into regions, allowing business users to see where we have the most vs. fewest users and high-value vs. low-value users.
  • Created Tableau scorecards, dashboards using stack bars, bar graphs, scattered plots, geographical maps, Gantt charts using show me functionality.
  • Designed and developed various analytical reports from multiple data sources by blending data on a single worksheet in Tableau Desktop.
  • Manipulated, cleansed and processed data using Excel, Access, SQL and Teradata.
  • Developed basic to complex SQL queries to research, analyze, troubleshoot data and to create business reports.
  • Conducted cash flow analysis to prepare summarized reports on cash in-flows and cash out-flows.
  • Tracked and analyzed, on a per-project basis, all production funding related to original programming to provide annual budgets and quarterly forecasts.
  • Created new database objects like tables, procedures, Functions, Indexes and Views.
  • Designed constraints and rules, and set primary, foreign, unique and default keys in the hierarchical database.
  • Developed stored procedures in SQL Server to standardize DML transactions such as insert, update and delete from the database.
  • Created SSIS packages to load data from flat files, Excel and Access into SQL Server using connection managers (a Python analogue is sketched after this list).
  • Created data transformation task such as BULK INSERT to import data.
  • Created SSRS reports in BI studio and prepared prompt-generated/parameterized reports using SSRS 2008.
  • Created reports from OLAP, sub reports, bar charts and matrix reports using SSRS.
  • Worked on generating various dashboards in Tableau Server using different data sources such as Teradata, Oracle, Microsoft SQL Server and Microsoft Analysis Services.
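
A minimal Python/pandas analogue of the flat-file-and-Excel-to-SQL Server load that the SSIS package performed; the connection string, file names and table names are hypothetical placeholders, not details from the project.

```python
# Python/pandas sketch of the flat-file/Excel-to-SQL-Server load pattern the
# SSIS package implemented; connection string, files and tables are hypothetical.
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical SQL Server connection (requires an ODBC driver to be installed)
engine = create_engine(
    "mssql+pyodbc://user:password@server/reporting_db"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)

sources = {
    "stg_cash_flat": pd.read_csv("daily_cash_extract.csv"),    # flat file
    "stg_cash_excel": pd.read_excel("budget_forecast.xlsx"),   # Excel workbook
}

for table, df in sources.items():
    # Light cleansing before load: trim column names, drop exact duplicates
    df.columns = [c.strip() for c in df.columns]
    df = df.drop_duplicates()
    # Append into the staging table, roughly what the SSIS data flow did
    df.to_sql(table, engine, if_exists="append", index=False)
```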

Environment: MS Excel 2010/2013, MS Access 2010/2013, SQL Server, SSAS, SSIS, SSRS, SharePoint 2010/2013, Tableau, Teradata SQL Assistant

Confidential

Data Analyst (SAS)

Responsibilities:

  • Communicated and coordinated with other departments to collect business requirements.
  • Tackled a highly imbalanced fraud dataset using undersampling with ensemble methods, oversampling, and cost-sensitive algorithms (a brief sketch follows this list).
  • Improved fraud prediction performance by using random forest and gradient boosting for feature selection with Python Scikit-learn.
  • Implemented machine learning model (logistic regression, XGboost) with Python Scikit- learn.
  • Optimized the algorithm with stochastic gradient descent; fine-tuned algorithm parameters with manual tuning and automated tuning such as Bayesian Optimization.
  • Developed a technical brief based on the business brief. This contains detailed steps and stages of developing and delivering the project including timelines.
  • After sign-off from the client on technical brief, started developing the SAS codes.
  • Wrote the data validation SAS code with the help of the UNIVARIATE and FREQ procedures.
  • Summarized the data at the customer level by joining customer transaction and dimension datasets with data from third-party sources.
  • Separately calculated the KPIs for Confidential and Mass campaigns at pre-promo-post periods with respect to their transactions, spend and visits.
  • Also measured the KPIs at MoM (Month on Month), QoQ (Quarter on Quarter) and YoY (Year on Year) with respect to pre-promo-post.
  • Measured the ROI based on the differences in pre-promo-post KPIs.
  • Extensively used SAS procedures like IMPORT, EXPORT, SORT, FREQ, MEANS, FORMAT, APPEND, UNIVARIATE, DATASETS and REPORT.
  • Standardized the data with the help of PROC STANDARD.
  • Implemented cluster analysis (PROC CLUSTER and PROC FASTCLUS) iteratively.
  • Performed Data audit, QA of SAS code/projects and sense check of results.
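
A minimal Python sketch of the imbalanced-fraud workflow described above (majority-class undersampling, random-forest-based feature selection, and a cost-sensitive logistic regression); the synthetic data, sampling ratio and parameters are assumptions, not project details.

```python
# Illustrative sketch of handling an imbalanced fraud dataset: undersample the
# majority class, select features via random-forest importances, then fit a
# class-weighted logistic regression. All data and settings are hypothetical.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Highly imbalanced synthetic data (~2% positive "fraud" class)
X, y = make_classification(n_samples=20000, n_features=30, weights=[0.98],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Undersample the majority class in the training set to a 3:1 ratio
rng = np.random.default_rng(0)
pos = np.where(y_tr == 1)[0]
neg = rng.choice(np.where(y_tr == 0)[0], size=3 * len(pos), replace=False)
idx = np.concatenate([pos, neg])
X_bal, y_bal = X_tr[idx], y_tr[idx]

# Random-forest importances drive feature selection
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=200, random_state=0)
).fit(X_bal, y_bal)

# Cost-sensitive logistic regression on the selected features
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(selector.transform(X_bal), y_bal)
print(classification_report(y_te, clf.predict(selector.transform(X_te))))
```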

Environment: SAS Enterprise Guide, SAS/MACROS, SAS/ACCESS, SAS/STAT, SAS/SQL, ORACLE, MS-OFFICE, Python (scikit-learn, pandas, Numpy), Machine Learning (logistic regression, XGboost), Gradient Descent algorithm, Bayesian optimization, Tableau

Confidential

Jr. Data Analyst (Intern)

Responsibilities:

  • Involved in designing conceptual, logical and physical models using Erwin and building data marts using hybrid Inmon and Kimball DW methodologies.
  • Worked closely with Business team, Data Governance team, SMEs, and Vendors to define data requirements.
  • Used Microsoft Excel for formatting data as tables and for visualizing and analyzing data using methods such as conditional formatting, removing duplicates, pivot and unpivot tables, creating charts, and sorting and filtering data sets.
  • Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
  • Designed the prototype of the Data mart and documented possible outcome from it for end-user.
  • Involved in business process modeling using UML.
  • Developed and maintained a data dictionary to create metadata reports for technical and business purposes.
  • Created SQL tables with referential integrity and developed queries using SQL, SQL*PLUS and PL/SQL.
  • Designed, coded and unit tested ETL packages for source marts and subject marts using Informatica ETL processes for the Oracle database.
  • Designed both 3NF data models for ODS and OLTP systems and dimensional data models using Star and Snowflake schemas.

Environment: Informatica, ODS, OLTP, Oracle 10g, OLAP, MS Excel, Mainframes, MS Visio, Rational Rose, Requisite Pro
