
Data Scientist/Data Analyst Resume


Boston, MA

SUMMARY:

  • 6+ years of experience in Machine Learning, Data Mining, Data Architecture, Data Modeling, Data Analysis, and NLP with large sets of structured and unstructured data, along with Data Acquisition, Data Validation, Predictive Modeling, Data Visualization, Web Crawling, and Web Scraping. Adept in statistical programming languages such as R and Python and in Big Data technologies including Hadoop, Hive, HDFS, MapReduce, and NoSQL databases.
  • Proficient in managing the entire data science project life cycle and actively involved in all phases, including data acquisition, data cleaning, data engineering, feature scaling, feature engineering, statistical modeling (decision trees, regression models, neural networks, SVM, clustering), dimensionality reduction using Principal Component Analysis and Factor Analysis, testing and validation using ROC plots and K-fold cross-validation, and data visualization (a minimal sketch of this workflow follows this summary).
  • Very good experience and knowledge in provisioning virtual clusters on the AWS cloud, including services such as EC2, S3, and EMR.
  • Excellent understanding of Hadoop cluster architecture, including MapReduce (MRv1), YARN (MRv2), HDFS, Pig, Hive, Impala, HBase, Spark, Sqoop, Flume, Oozie, and ZooKeeper.
  • Deep understanding of statistical modeling, multivariate analysis, model testing, problem analysis, model comparison, and validation.
  • Excellent knowledge of Machine Learning, Mathematical Modeling, and Operations Research. Comfortable with R, Python, SAS, Weka, MATLAB, and relational databases. Deep understanding of and exposure to the Big Data ecosystem.
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Experienced in Data Modeling techniques employing Data warehousing concepts like star/snowflake schema and Extended Star.
  • Excellent working experience and knowledge in Hadoop eco-system like HDFS, MapReduce, Hive, Pig, MongoDB, Cassandra, HBase.
  • Expert in creating PL/SQL Schema objects like Packages, Procedures, Functions, Subprograms, Triggers, Views, Materialized Views, Indexes, Constraints, Sequences, Exception Handling, Dynamic SQL/Cursors, Native Compilation, Collection Types, Record Type, Object Type using SQL Developer.
  • Excellent knowledge and experience in OLTP/OLAP System Study with focus on Oracle Hyperion Suite of technology, developing Database Schemas like Star schema and Snowflake schema (Fact Tables, Dimension Tables) used in relational, dimensional and multidimensional modeling, physical and logical Data modeling using Erwin tool.
  • Expertise in data parsing, manipulation, and preparation, including describing data contents, computing descriptive statistics, regex operations, split and combine, remap, merge, subset, reindex, melt, and reshape.
  • Experienced in data mining and in loading and analyzing unstructured data (XML, JSON, and flat file formats) in Hadoop.
  • Experienced in using various packages in R and python like ggplot2, caret, dplyr, Rweka, gmodels, RCurl, tm, C50, twitteR, NLP, Reshape2, rjson, plyr, pandas, numpy, seaborn, scipy, matplotlib, scikit-learn, Beautiful Soup, Rpy2.
  • Extensive experience in Text Analytics, generating data visualizations using R, Python and creating dashboards using tools like Tableau.
  • Analyzed data and performed data preparation by applying historical models to data sets in Azure ML.
  • Excellent hands-on experience with big data tools like Hadoop, Spark, Hive, Pig, Impala, PySpark, and Spark SQL.
  • Experienced with Teradata RDBMS utilities, including FastLoad, FastExport, MultiLoad, TPump, Teradata SQL Assistant, and BTEQ.
  • Expertise in Excel macros, pivot tables, VLOOKUPs, and other advanced functions, with experience working in Agile/Scrum software environments.
  • Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis.
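
The workflow referenced in the summary above (feature scaling, PCA for dimensionality reduction, model fitting, and K-fold cross-validation against a ROC metric) can be outlined with a small scikit-learn pipeline. The snippet below is a hedged sketch only: the input file, column names, and the choice of logistic regression are illustrative assumptions, not details taken from the projects described here.

```python
# Illustrative sketch: scale features, reduce dimensionality with PCA,
# fit a classifier, and validate with 5-fold cross-validation (ROC AUC).
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv("customer_data.csv")             # hypothetical input file
X, y = df.drop(columns=["target"]), df["target"]  # numeric features, binary label

pipe = Pipeline([
    ("scale", StandardScaler()),                  # feature scaling
    ("pca", PCA(n_components=10)),                # dimensionality reduction
    ("clf", LogisticRegression(max_iter=1000)),   # any of the listed models could go here
])

scores = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")  # K-fold validation
print("Mean ROC AUC: %.3f" % scores.mean())
```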

TECHNICAL SKILLS:

Languages: Java 8, Python, R

Packages: ggplot2, caret, dplyr, RWeka, gmodels, RCurl, C50, twitteR, NLP, reshape2, rjson, plyr, pandas, NumPy, Seaborn, SciPy, matplotlib, scikit-learn, Beautiful Soup, Rpy2.

Web Technologies: HTML, CSS, JavaScript, jQuery, Bootstrap, AngularJS

Machine Learning Algorithms: Decision Tree, SVM, KNN, K-Means, EM, Apriori, PageRank, AdaBoost, Deep Learning

Data Modeling Tools: Erwin r9.6/9.5/9.1/8.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner

Big Data Technologies: Hadoop, Hive, HDFS, MapReduce, Pig, Kafka

Databases: SQL, Hive, Impala, Pig, Spark SQL, SQL Server, MySQL, MS Access, HDFS, HBase, Teradata, Netezza, MongoDB, Cassandra.

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio), Tableau, Crystal Reports XI, Business Intelligence, SSRS, Business Objects 5.x/6.x, Cognos 7.0/6.0.

ETL Tools: Informatica PowerCenter, SSIS.

Version Control Tools: SVN, GitHub

Project Execution Methodologies: Ralph Kimball and Bill Inmon data warehousing methodology, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD).

BI Tools: Tableau, Tableau Server, Tableau Reader, SAP Business Objects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure Data Warehouse

Project Management: PMP, Lean Manufacturing, Six Sigma, Agile Methodology, Scrum Master

Operating Systems: Windows, Linux, Unix, macOS, Red Hat

Analysis Tools: Python (Pandas, Numpy, scikit-learn, matplotlib), R, SAS, Tableau, Advanced MS Excel, A/B testing

Databases: Oracle, MySQL, Microsoft SQL Server, MS Access

Big Data Technologies: Hadoop, HDFS, MapReduce, Sqoop, Hive, Spark

Cloud: Azure, Amazon Web Services (AWS)

Application & System: Linux

Web Development: HTML, CSS, JavaScript, jQuery, Bootstrap, AngularJS

EXPERIENCE:

Confidential

Data Scientist/Data Analyst

Responsibilities:

  • Design and perform analyses to highlight, address, and resolve operational concerns using statistical predictive indicators and visualization reports.
  • Built cross-sell opportunities across different products based on existing demand in a particular market, using SQL joins to create a report that presented all customers as a single entity, and generated graphs and visualizations in Excel.
  • Design an A/B experiment for testing the business performance of the new recommendation system.
  • Create machine learning models with Python and scikit-learn that assisted the trading team in their trading strategies.
  • Optimize parameters using grid search and cross-validation, and develop a deep learning algorithm using Keras and feed-forward networks (see the sketch following this list).
  • Use Python libraries including NumPy, Pandas, SciPy, scikit-learn, Matplotlib, Keras, and TensorFlow.
  • Build data visualizations such as heat maps and time series plots using Python libraries like Matplotlib and Seaborn.
  • Maintain and enhance the existing algorithmic framework to cope with ever-changing market dynamics and business requirements, and back-tested several in-house algorithms with data-driven and statistics-driven approaches using time series.
  • Support MapReduce Programs running on the cluster.
  • Evaluate business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
  • Configure Hadoop cluster with Namenode and slaves and formatted HDFS.
  • Use Oozie workflow engine to run multiple Hive and Pig jobs.
  • Participate in Data Acquisition with Data Engineer team to extract historical and real-time data by using Hadoop MapReduce and HDFS.
  • Perform data enrichment jobs to handle missing values, normalize data, and select features using HiveQL.
  • Develop multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Involved in loading data from RDBMS and web logs into HDFS using Sqoop and Flume.
  • Work on loading the data from MySQL to HBase where necessary using Sqoop.
  • Develop Hive queries for Analysis across different banners.
  • Extract data from Twitter using Java and the Twitter API; parse JSON-formatted Twitter data and upload it to a database.
  • Launch Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configure launched instances for specific applications.
  • Develop Hive queries for analysis, and export the result set from Hive to MySQL using Sqoop after processing the data.
  • Analyze the data by performing Hive queries and running Pig scripts to study customer behavior.
  • Create HBase tables to store various data formats of data coming from different portfolios.
  • Work on improving performance of existing Pig and Hive Queries.
  • Analyze cross-platform data from energy and restaurant companies; create and automate daily reports with SQL and Tableau dashboards including charts, calculated fields, and statistical functions.
  • Mine data from SAP and analyze order completion status to ensure each order is delivered within 48 hours.
  • Analyze daily and monthly order trends and use time series models to forecast demand for all SKUs.
  • Independently develop Python scripts to clean data faster and generate better reports and charts.
  • Extract useful columns and rows from large datasets and clean data using the Python Pandas library.
  • Combine different datasets, grouped products by station and analyzed the overall allocation.
  • Develop a user-friendly GUI to help internal non-technical users work with the datasets.
  • Develop line-balance automation Python scripts, improving output from 150 orders/h to 260 orders/h.
  • Create custom SQL queries for data analysis and data validation, such as checking for duplicates and null values.
  • Swap low-run SKUs with high-run SKUs based on the trend, reducing the QC reject rate from 1% to 0.1%.
  • Generate monthly/quarterly KPI dashboards using Tableau: heat maps, box plots, scatter plots, pie charts, and bar charts.
  • Primary activities include designing the technology roadmap for Tableau, product installation and implementation, development of insightful dashboards, serving as the SME point of contact for Tableau, and delivery of solutions adhering to BI industry best practices.
  • Proficient in integrating data from different sources into the SQL environment and reporting on it in the Tableau environment.
  • Responsible for the design, development and production support of interactive data visualizations used across the project.
  • Administer users, user groups, and scheduled instances for reports in Tableau, and documented the upgrade plan.
  • Involved in creating Tableau dashboards and stories as needed using Tableau Desktop and Tableau Server, including stacked bars, bar graphs, scatter plots, geographical maps, and Gantt charts via the Show Me functionality.
  • Create interactive data visualizations in Tableau, using relational and aggregate data sources.
  • Excellent knowledge in RDBMS concepts and constructs along with Database Objects creation such as Tables, User Defined Data Types, Indexes, Stored Procedures, Views, User Defined Functions, Cursors and Triggers etc.
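
As referenced in the grid-search bullet above, the parameter-tuning and feed-forward modeling steps might look roughly like the sketch below. It is illustrative only: the synthetic data, the random-forest parameter grid, and the network sizes are assumptions and do not reflect the actual trading models.

```python
# Illustrative sketch: grid search with cross-validation, followed by a small
# feed-forward network in Keras. Data and hyperparameter values are made up.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from tensorflow import keras

X = np.random.rand(1000, 20)                 # placeholder feature matrix
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)    # placeholder binary target

# Exhaustive search over a small hyperparameter grid, 5-fold cross-validated.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [5, 10, None]},
    cv=5,
    scoring="roc_auc",
)
grid.fit(X, y)
print("Best parameters:", grid.best_params_)

# Simple feed-forward (dense) network for the same binary task.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
model = keras.Sequential([
    keras.Input(shape=(X.shape[1],)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[keras.metrics.AUC()])
model.fit(X_train, y_train, validation_data=(X_val, y_val),
          epochs=10, batch_size=32, verbose=0)
```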

Confidential - Boston, MA

Data Scientist/Data Analyst

Responsibilities:

  • Review suspicious activity and complex fraud cases to help identify and resolve fraud risk trends and issues.
  • Clearly and thoroughly document investigation findings and conclusions.
  • Perform offline analysis of customer data to tune rules, expose patterns, research anomalies, reduce false positives, and build executive and project-level reports.
  • Identify meaningful insights from chargeback data. Interpret and communicate findings from analysis to engineers, product and stakeholders.
  • Analyze high-volume data to investigate, identify and report trends linked to fraudulent transactions.
  • Utilize Sqoop to ingest real-time data. Used the analytics libraries scikit-learn, MLlib, and MLxtend.
  • Extensively use Python's multiple data science packages like Pandas, NumPy, matplotlib, Seaborn, SciPy, Scikit-learn and NLTK.
  • Performed Exploratory Data Analysis, trying to find trends and clusters.
  • Built models using techniques like Regression, Tree based ensemble methods, Time Series forecasting, KNN, Clustering and Isolation Forest methods.
  • Work on data that was a combination of unstructured and structured data from multiple sources and automate the cleaning using Python scripts.
  • Extensively perform large data reads/writes to and from CSV and Excel files using pandas.
  • Tasked with maintaining RDDs using Spark SQL.
  • Communicate and coordinate with other departments to collect business requirements.
  • Tackle a highly imbalanced fraud dataset using undersampling with ensemble methods, oversampling, and cost-sensitive algorithms.
  • Improved fraud prediction performance by using random forest and gradient boosting for feature selection with Python Scikit-learn.
  • Implemented machine learning models (logistic regression, XGBoost) with Python scikit-learn (a sketch of this approach follows this list).
  • Optimize the algorithm with stochastic gradient descent; fine-tuned algorithm parameters with manual tuning and automated tuning such as Bayesian optimization.
  • Develop a technical brief based on the business brief. This contains detailed steps and stages of developing and delivering the project including timelines.
  • After sign-off from the client on technical brief, started developing the SAS codes.
  • Write data validation SAS code using the UNIVARIATE and FREQ procedures.
  • Summarize the data at the customer level by joining customer transaction and dimension datasets with data from third-party sources.
  • Separately calculate the KPIs for Target and Mass campaigns over the pre-, promo-, and post-periods with respect to their transactions, spend, and visits.
  • Also measure the KPIs at MoM (Month on Month), QoQ (Quarter on Quarter) and YoY (Year on Year) with respect to pre-promo-post.
  • Measure the ROI based on the differences pre-promo-post KPIs.
  • Extensively use SAS procedures like IMPORT, EXPORT, SORT, FREQ, MEANS, FORMAT, APPEND, UNIVARIATE, DATASETS and REPORT.
  • Standardize the data with the help of PROC STANDARD.
  • Work extensively with data governance team to maintain data models, Metadata and dictionaries.
  • Use Python to preprocess data and attempt to find insights.
  • Iteratively rebuild models dealing with changes in data and refining them over time.
  • Create and publish multiple dashboards and reports using Tableau server.
  • Extensively use SQL queries for legacy data retrieval jobs.
  • Tasked with migrating the Django database from MySQL to PostgreSQL.
  • Gain expertise in Data Visualization using matplotlib, Bokeh and Plotly.
  • Responsible for maintaining and analyzing large datasets used to analyze risk by domain experts.
  • Develop Hive queries that compared new incoming data against historic data. Built tables in Hive to store large volumes of data.
  • Use the big data tools Spark (Spark SQL, MLlib) to conduct real-time analysis of credit card fraud on AWS.
  • Perform Data audit, QA of SAS code/projects and sense check of results.
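
A rough sketch of the imbalance-handling approach mentioned in this list (undersample the majority class, rank features with a random forest, then fit a gradient-boosted model) is shown below. The dataset, column names, sampling ratio, and hyperparameters are all hypothetical placeholders.

```python
# Hypothetical sketch: balance a fraud dataset by undersampling, select
# features via random-forest importances, then fit XGBoost.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

df = pd.read_csv("transactions.csv")          # hypothetical transactions file
fraud = df[df["is_fraud"] == 1]
legit = df[df["is_fraud"] == 0]
# Undersample the majority class to a 5:1 ratio (illustrative choice).
balanced = pd.concat([fraud, legit.sample(n=len(fraud) * 5, random_state=0)])

X = balanced.drop(columns=["is_fraud"])
y = balanced["is_fraud"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Rank features by random-forest importance and keep the top 20.
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)
top = X.columns[rf.feature_importances_.argsort()[::-1][:20]]

# Gradient-boosted trees; scale_pos_weight is a simple cost-sensitive knob
# for the residual class imbalance.
xgb = XGBClassifier(n_estimators=400, max_depth=6, scale_pos_weight=5)
xgb.fit(X_train[top], y_train)
pred = xgb.predict_proba(X_test[top])[:, 1]
print("Hold-out ROC AUC: %.3f" % roc_auc_score(y_test, pred))
```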

Environment: Spark, Hadoop, AWS, SAS Enterprise Guide, SAS/MACROS, SAS/ACCESS, SAS/STAT, SAS/SQL, Oracle, MS Office, Python (scikit-learn, pandas, NumPy), Machine Learning (logistic regression, XGBoost), gradient descent, Bayesian optimization, Tableau, Securonix.

Confidential - San Jose, CA

Data Analyst/Data Scientist

Responsibilities:

  • Create and enhance the Technical Specification Document (TSD) and Customer Requirement Document (CRD) through constant interaction with the Manager and Tech Lead.
  • Created the data model with the required facts and dimensions.
  • Used Pandas, NumPy, and scikit-learn in Python to develop various machine learning models such as random forest and stepwise regression.
  • Worked with the NLTK library in Python for sentiment analysis of customer product reviews scraped from third-party websites (an illustrative snippet follows this list).
  • Used cross-validation to test the models with different batches of data to optimize the models and prevent overfitting.
  • Created and Maintained Teradata Databases, Users, Tables, Views, Macros, Triggers and Stored Procedures.
  • Imported and exported data from an Access database and built SQL queries for data manipulation and reporting in Access, plus VLOOKUP in Excel.
  • Created and updated Crystal Reports to client specifications using SQL.
  • Developed a database in MySQL to manipulate data with monthly updates and created reports in Crystal Reports.
  • Extensively used Visio to create Use Case Diagrams and Activity Diagrams.
  • Effectively used data blending feature in tableau to connect different databases like Oracle, MS SQL Server.
  • Designed business intelligence dashboards using Tableau Desktop and published them to Tableau Server, allowing executive management to view current and past sales performance trends across geographic locations.
  • Conducted analysis on various tools available at the client to recommend the best option for each project, for example Informatica Data Explorer, Informatica Data Quality, PowerCenter, MicroStrategy, etc.
  • Prepared BI Interactive Dashboards using calculations, parameters in Tableau.
  • Expertise in connecting to Oracle and SQL databases and troubleshooting.
  • Used trend lines, statistics, log axes, groups, hierarchies, and sets to create detail-level summary reports and dashboards using KPIs.
  • Provide data-driven models and analyze data to drive the business and make key business decisions.
  • Successfully setup global Build-to-Stock (BTS) warehousing and logistic process to meet demand growth.
  • Built a performance management system to integrate with the supplier scorecards.
  • Successfully transferred 19 router and 14 wireless access point products from new product to mass production.
  • Performed daily analysis of on time delivery (OTD), investigating root cause and applying countermeasures.
  • Prepared manufacturing and reliability test plans to ensure products fully meet Cisco requirement guidelines.
  • Managed several vendors and continuously drove the end-to-end yield to 90% during the development stage.
  • Minimized the single-source material percentage and maintained an 88% multiple-source rate in the BOM risk report.
  • Led internal and external meetings, analyzed defect data, developed corrective actions for customer complaints.
  • Led numerous cost reduction activities from concept phase to implementation, tracked the results and saved an average $1M per year.
  • Creation of Jobs and scheduling of flows through Management Console.
  • Used Excel sheets, flat files, and CSV files to generate Tableau ad-hoc reports.
  • Creation of ER diagrams and database design for the project.
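
The sentiment-analysis bullet in this list refers to scoring scraped product reviews with NLTK; the snippet below is an illustrative version using NLTK's VADER analyzer. The review strings are invented, and the real pipeline would have involved additional scraping and preprocessing steps.

```python
# Illustrative sentiment scoring of product reviews with NLTK's VADER analyzer.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")   # one-time download of the VADER lexicon
sia = SentimentIntensityAnalyzer()

reviews = [  # made-up examples standing in for scraped review text
    "The router setup was painless and the wireless range is excellent.",
    "Access point keeps dropping the connection, very disappointed.",
]
for text in reviews:
    scores = sia.polarity_scores(text)          # neg / neu / pos / compound
    label = "positive" if scores["compound"] >= 0 else "negative"
    print(f"{label:8s} {scores['compound']:+.2f}  {text}")
```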
