
Data Scientist Resume


New York

SUMMARY:

  • Around 7 years of experience as a Data Scientist and Data Analyst, including experience in statistical analysis, data mining and machine learning using RStudio, Python and SQL
  • Worked with Business Intelligence tools for dashboards, such as Tableau, Qlik tools and Microsoft Power BI.
  • Worked with statistical models, including but not limited to clustering, regression (linear/logistic), hypothesis testing, decision trees, random forests, K-Means clustering, association rules and others
  • Created data models for both big data and transactional databases. Architected data lakes and data warehouses
  • Worked closely with clients to identify business requirements and technical requirements.
  • Assisted in planning, designing and implementing AWS (S3) cloud in Windows and Linux environments
  • Experienced in the Agile software development process. Used tracking tools like JIRA and RTC
  • Worked with Big Data toolkits like SparkML. Used R, Python (Anaconda), SPSS, Google Analytics and Excel.
  • Used Hive and Python on Spark to form tables and code Machine Learning models. Programmed in C/C++
  • Wrote SQL queries and R code to perform Extract, Transform and Load (ETL)
  • Worked with Python packages and libraries such as numpy, scipy, pandas, matplotlib and PIL.
  • Worked with SQL Server and Oracle databases. Performed data quality assessments.
  • Worked with both structured and unstructured data sets.
  • Strong understanding of complex business challenges; experienced in designing scientific solutions and manipulating large data sets using cutting-edge machine learning and statistical modeling techniques
  • Experience working with various kinds of data files such as images (.jpeg, .jpg, .png) and audio (.mp4, .wav).
  • Successfully collaborated with cross-functional, cross-regional (virtual) and cross-cultural teams
  • A good team player and listener, good at following instructions and at the same time an independent thinker.
  • Knowledge of Microsoft Azure, HBase and MongoDB. Proficient in MS Office

TECHNICAL SKILLS:

Computer Skills: R, Python, Java, SQL, SAS Base, basic proficiency in Scala, Stata, Matlab, Hadoop, MapReduce, C++, C

Tools: MS Azure ML, Cloudera, Spark, SAS EM, Eclipse, Erwin, IPython, SQL Server, Spring Framework, MySQL, Oracle, Redshift, Tableau, MS Excel, MS PowerPoint, QlikView, SAP, Microsoft Power BI and Business Objects

PROFESSIONAL EXPERIENCE:

Confidential, New York

Data Scientist

Responsibilities:

  • Worked independently and collaboratively throughout the complete analytics project lifecycle, including data extraction/preparation, design and implementation of scalable machine learning analysis and solutions, and documentation of results (in an Agile environment)
  • Formulated predictive models to forecast order volumes by product category and color and style choices by season, so that departmental buyers could make educated, data-driven decisions
  • Developed and implemented customized Data Quality (DQ) Matrices to improve data integrity of both raw and aggregated data
  • Worked with different data types in R such as vectors, lists, matrices, arrays and data frames
  • Read and wrote data from and to various .csv, .xml and .json files in both RStudio and a Python IDE (Anaconda).
  • Applied various machine learning algorithms and statistical modeling techniques such as decision trees, Naïve Bayes, Principal Component Analysis, regression models, C4.5, artificial neural networks, clustering and SVM to identify volume, using R and the scikit-learn package in Python.
  • Formulated a customized cost estimation model using ARIMA, Monte Carlo simulation and time series analysis to predict sales price, purchasing cost, cost of making (CM) and sales
  • Used Pentaho to perform ETL and created star schemas and snowflake schemas for new data sets to map into the existing transactional databases (Oracle and Amazon Redshift)
  • Designed the enterprise conceptual, logical and physical data models for the ‘Bulk Data Storage System’ using Embarcadero ER Studio; the data models were designed in 3NF. Used Hive on Spark for on-the-fly data modeling and data warehousing
  • Used Tableau and Business Objects (SAP) for Business Intelligence tasks (data cleaning, sanitizing, analyzing and creating dashboards) and presented the results to clients at Confidential
  • Used different types of charts, including but not limited to pie charts, bar diagrams, square diagrams and heat maps, to create dashboards that helped management understand and decide on the local and international distribution of vendors, manufacturing units, warehouses, cost-effective logistics and others
  • Created roles for EC2, S3 and EBS resources to communicate within the team using IAM
  • Built collaborative relationships with cross-functional team members
  • Used JIRA as a tracking tool

Confidential, Virginia

Data Scientist (Consultant)

Responsibilities:

  • Architected Data Quality Matrix (identified and mapped dimensions, sub-dimensions and criteria) and implemented Quality process for Confidential Data framework to formulate well integrated data
  • Developed an easily understandable scoring system for representing data quality assessments
  • Developed search algorithms (brute-force search, Fibonacci search technique, binary search methodology, regression analysis, decision trees, SVM and others) to search for imports of banned items and to detect existing and prospective importers
  • Expanded footprints with the client specifically in Machine Learning, Predictive Modelling, Advanced Analytics and Systems integration, especially for APHIS and IPHIS projects.
  • Coordinated with the enterprise-wide Data Correlation working group through gathering, design and testing activities for the data correlation matching algorithm. Additional responsibilities included leading working sessions across the user community and briefing the development contractor on the requirements and design.
  • Used Business Intelligence and data visualization tools: Tableau, MicroStrategy and QlikView
  • Participated in writing a business plan to create and expand a range of Big Data, Data Fusion, and Information Management & Data Analytics capabilities across the firm.
  • The business plan detailed new technology capabilities such as extraction, data ingestion, entity resolution, geospatial data management and specialized data warehouse appliances such as Cloudera’s Hadoop
  • Built advanced statistical models using Bayesian learning techniques, pattern recognition and outlier detection algorithms, and predictive modeling methods including clustering, decision trees, GMM, regression analysis, Fuzzy C-means and K-Nearest Neighbors using advanced statistical tools like R and SAS
  • Performed ETL on SQL and NoSQL databases including Oracle, SQL Server and Hadoop-based DC
  • Used Pig for data modeling and warehousing, and used Sqoop and Flume for streaming data into Accumulo
  • Assisted in designing an Earned Value Management (EVM) measurement process to complete the Level of Effort (LOE) and Work Breakdown Structure (WBS) for project planning purposes and to track the EV against LOE
  • Worked with large and complex databases containing billions of records
  • Assisted in the architecture of a Master Data Management framework.
  • Drafted and edited artifacts like the Data Tagging Workbook, Schema, ERD, Data Dictionary and others
  • Used Jupyter Notebook on top of the Spark Shell environment for Python coding
  • Read data from various file types including .html, .csv and .sas7bdat using SAS, R and Python.
  • Used RTC as a tracking tool

Confidential, New York

Senior Data Analyst

Responsibilities:

  • Interpreted and translated Business and System Requirements and interacted with users, product owners and developers.
  • Administered business intelligence systems and performed business data analysis, visualization and reporting.
  • Performed end-to-end financial data flow testing as well as test data collection, integration and QA.
  • Created predictive models to identify credit card fraud and target customer groups based on regression, logit regression, decision trees, bootstrap, ANN and others.
  • Formulated the model through residual analysis, R-squared value analysis, normal distribution analysis and so on
  • Collected data from various sources, performed data cleaning and created synthetic data to address missing values
  • Communicated closely with the client (business team/Product Owner) to understand business requirements and translate them into test cases and test steps to implement and facilitate Test-Driven Development (TDD)
  • Analyzed business flow of the application and used MS Project and JIRA as tracking tools
  • Wrote SQL (2008) queries for RDBMS (Oracle 10g and SQL) to update tables, pull data and perform analysis.
  • Used Informatica to perform ETL processing and interfaced with data systems to perform focused reviews.
  • Performed API testing with XML scripts using SoapUI and REST Client. Ran ad hoc queries with PL/SQL
  • Collaborated with the Project Manager to determine data-driven reporting needs and tracking measures
  • Performed content management by updating content, comparing versions, merging files, editing change requests, generating reports and charts, and creating and organizing requirements using Pulse and knowledge link
  • Created a Traceability Matrix for business requirements and test cases
  • Used Cognos 8 BI and MicroStrategy to analyze and present data analytics and predictive models
  • Used FusionCharts and spreadsheets to present data visualizations to the business entity.
