
Data Scientist Resume

SUMMARY

  • Over 8 years of experience in Data Analysis, Decision Trees, Random Forest, Data Profiling, Data Integration, Data Governance, Data Migration, Metadata Management, Master Data Management, and Configuration Management.
  • Experience in various phases of the Software Development Lifecycle (analysis, requirements gathering, design) with expertise in documenting requirement specifications, functional specifications, data validation, test plans, source-to-target mappings, SQL joins, and data cleansing.
  • Extensive experience in Text Analytics, developing statistical Machine Learning and Data Mining solutions to various business problems, and generating data visualizations using R and Python.
  • Executive experience performing IT roles for various industry leaders, acquiring a deep range of skills in staff leadership and development.
  • Experienced with Machine Learning algorithms such as Logistic Regression, KNN, SVM, Random Forest, Neural Networks, Linear Regression, Lasso Regression, and K-Means.
  • Experienced working with data modelling tools such as Erwin, Power Designer, and ER/Studio. Expertise in synthesizing Machine Learning, Predictive Analytics, and Big Data technologies into integrated solutions.
  • Built Machine Learning and NLP solutions for Big Data from scratch on top of Spark using Scala. Experienced in developing Entity-Relationship diagrams and modelling transactional databases and data warehouses using ERWIN, ER/Studio, and PowerDesigner, including ERWIN forward and reverse engineering.
  • Skilled in Advanced Regression Modelling, Time Series Analysis, Statistical Testing, Correlation, Multivariate Analysis, Forecasting, Model Building, Business Intelligence tools, and the application of statistical concepts.
  • Proficient in data entry, data auditing, creating data reports, and monitoring data for accuracy; able to perform web search and data collection, web data mining, database extraction from websites, data entry, and data processing.
  • Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, Pivot Tables, and OLAP reporting.
  • Expert in data flow between the primary database and various reporting tools; expert in finding trends and patterns within datasets and providing recommendations accordingly.
  • Ability to use dimensionality reduction and regularization techniques (a minimal sketch follows this list).
  • Proficient in data mining tools such as R, SAS, Python, SQL, Excel, and the Big Data Hadoop ecosystem.
  • Independently handle Hadoop administration, both locally and in the cloud, in Linux environments.
  • Extract and model datasets from a variety of data sources, such as Teradata and Snowflake, for ad-hoc analysis; fair understanding of Agile methodology and practice.
  • Working knowledge of application design, architecture, and development.
  • Experienced in complete SDLC and STLC with end-user interaction for functional specification, system analysis, and unit regression testing; participated in system integration testing
  • Experienced in working in a team environment to deliver on-demand services; able to deliver quality solutions under pressure; pro-active, with strong analytical problem-solving skills.
  • Participated in portfolio meetings; experienced in preparing high-level design documents, low-level design documents, and detailed technical design documents using use-case scenarios.
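
The dimensionality reduction and regularization skills noted above can be illustrated with a minimal, hypothetical scikit-learn sketch; the synthetic data, component count, and alpha value below are placeholders rather than details of any engagement described in this resume.

```python
# Illustrative sketch: PCA (dimensionality reduction) feeding a Lasso
# (L1-regularized) regression inside a scikit-learn pipeline.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# Synthetic placeholder data standing in for a real feature matrix and target.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 20))
y = X[:, :3].sum(axis=1) + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = Pipeline([
    ("scale", StandardScaler()),    # standardize features before PCA
    ("pca", PCA(n_components=10)),  # reduce dimensionality
    ("lasso", Lasso(alpha=0.01)),   # regularized linear model
])
model.fit(X_train, y_train)
print("Test R^2:", model.score(X_test, y_test))
```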

TECHNICAL SKILLS

Data Modelling Tools: Erwin, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner.

Machine Learning: Regression, Polynomial Regression, Random Forest, Logistic Regression, Decision Trees, Classification, Clustering, Association, Simple/Multiple Linear Regression, Kernel SVM, K-Nearest Neighbours (K-NN).

R Packages: dplyr, sqldf, data.table, randomForest, gbm, caret, elastic net, and assorted other Machine Learning packages.

Programming Languages: Java, Oracle PL/SQL, Python, SQL, T-SQL, UNIX Shell Scripting, Bash, HTML5.

Scripting Languages: Python (NumPy, SciPy, pandas, Gensim, Keras), R (caret, Weka, ggplot2), XML, JSON

Big Data Technologies: Hadoop, Hive, HDFS, MapReduce, Pig, Kafka, Spark, HBase.

Reporting Tools: Crystal Reports XI, Business Intelligence, SSRS, Business Objects, Cognos, and Tableau.

ETL: Informatica PowerCenter, SSIS.

BI Tools: Tableau, Tableau Server, Tableau Reader, SAP Business Objects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure Data Warehouse

Tools: MS Office Suite (Word, Excel, MS Project, and Outlook), Spark MLlib, ScalaNLP, MariaDB, Azure, SAS.

Operating Systems: Windows, UNIX, MS-DOS, Sun Solaris.

Databases: Oracle, Teradata, DB2 UDB, MS SQL Server, Netezza, Sybase ASE, Informix, AWS RDS, Cassandra, MongoDB, PostgreSQL.

Project Execution Methodologies: Ralph Kimball and Bill Inmon data warehousing methodologies, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD)

PROFESSIONAL EXPERIENCE

Confidential

Data Scientist

Responsibilities:

  • Involved in extensive ad-hoc reporting, routine operational reporting, and data manipulation to produce routine metrics and dashboards for management.
  • Created parameters, action filters and calculated sets for preparing dashboards and worksheets in Tableau.
  • Interacted with other data scientists and architects to build custom solutions for data visualization using tools such as Tableau and Python packages.
  • Involved in running MapReduce jobs for processing millions of records.
  • Wrote complex SQL queries using joins and OLAP functions such as COUNT, CSUM, and RANK.
  • Built and published customized interactive reports and dashboards, with report scheduling, using Tableau Server.
  • Worked with different data formats such as JSON and XML and applied Machine Learning algorithms in Python.
  • Used pandas, NumPy, seaborn, matplotlib, scikit-learn, SciPy, and NLTK in Python for developing various Machine Learning algorithms.
  • Utilized Apache Spark with Python to develop and execute Big Data analytics and Machine Learning applications; executed Machine Learning use cases under Spark ML and MLlib (a sketch follows this list).
  • Designed and developed NLP models for sentiment analysis.
  • Designed and provisioned the platform architecture to execute Hadoop and Machine Learning use cases on AWS cloud infrastructure (EMR and S3).
  • Worked on Machine Learning over large data sets using Spark and MapReduce.
  • Applied various Machine Learning algorithms and statistical models such as decision trees, regression models, neural networks, SVM, and clustering to identify volume, using the scikit-learn package in Python and MATLAB.
  • Developed Python programs to manipulate data read from various Teradata tables and consolidate it into a single CSV file.
  • Performed statistical data analysis and data visualization using Python.
  • Worked on creating filters and calculated sets for preparing dashboards and worksheets in Tableau.
  • Created data models in Splunk using pivot tables by analyzing vast amounts of data and extracting key information to suit various business requirements.
  • Created new scripts for Splunk scripted inputs, collecting system CPU and OS data.
  • Implemented data refreshes on Tableau Server for biweekly and monthly increments based on business changes, to ensure that views and dashboards displayed the changed data accurately.
  • Developed normalized Logical and Physical database models for designing an OLTP application.
  • Knowledgeable in the AWS environment for loading data files from on-premises systems to a Redshift cluster.
  • Performed SQL Testing on AWS Redshift databases
  • Developed Teradata SQL scripts using OLAP functions such as RANK and RANK OVER to improve query performance while pulling data from large tables.
  • Developed and implemented SSIS, SSRS and SSAS application solutions for various business units across the organization.
  • Designed the Data Marts in dimensional data modelling using star and snowflake schemas.
  • Analyzed data sets with SAS programming, R, and Excel.
  • Published interactive dashboards and scheduled automatic data refreshes.
  • Maintained large data sets, combining data from various sources using Excel, SAS Enterprise and SAS Grid, Access, and SQL queries.
  • Created Hive queries that helped market analysts spot emerging trends by comparing incremental data with Teradata reference tables and historical metrics.
  • Design and development of ETL processes using Informatica ETL tools for dimension and fact file creation.
  • Developed and automated solutions for a new billing and membership Enterprise Data Warehouse, including ETL routines, tables, maps, materialized views, and stored procedures, using Informatica and Oracle PL/SQL toolsets.
  • Performed analysis of implementing Spark using Scala and wrote Spark sample programs using PySpark.
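
As a hedged illustration of the Spark ML use cases referenced in this section, the sketch below shows a minimal PySpark pipeline; the S3 path, column names, and choice of logistic regression are assumptions for demonstration only, not details taken from the project.

```python
# Illustrative sketch: a PySpark ML pipeline (feature assembly + logistic
# regression) of the general kind described above. Inputs are hypothetical.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("ml-use-case-sketch").getOrCreate()

# Hypothetical input: a JSON extract with numeric features and a binary label.
df = spark.read.json("s3://example-bucket/events/*.json")

assembler = VectorAssembler(inputCols=["feature_a", "feature_b", "feature_c"],
                            outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")

pipeline = Pipeline(stages=[assembler, lr])
train, test = df.randomSplit([0.8, 0.2], seed=42)

model = pipeline.fit(train)
predictions = model.transform(test)
predictions.select("label", "prediction", "probability").show(5)
```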

Environment: - SQL/Server, Oracle, MS-Office, Teradata, Informatica, ER Studio, XML, R connector, Python, R, Tableau 9.2

Confidential

Data Scientist

Responsibilities:

  • Prepared the workspace for Markdown; performed data analysis and statistical analysis and generated reports, listings, and graphs.
  • Found outliers, anomalies, and trends in the given data sets.
  • Assisted in migrating data with Data Pump using the Export/Import utility tool.
  • Implemented various performance tuning techniques in ETL and Teradata BTEQ for efficient development and performance.
  • Used Simple Storage Service (S3) for snapshots and configured the S3 lifecycle of application and database logs, including deleting old logs and archiving logs based on the retention policy of the apps and databases.
  • Built models using statistical techniques such as Bayesian HMM and Machine Learning classification models such as XGBoost, SVM, and Random Forest (a sketch follows this list).
  • Setup storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
  • Created the logical data model from the conceptual model and converted it into the physical database design using ERWIN 9.6.
  • Piped and processed massive data streams in distributed computing environments such as Hadoop to facilitate analysis (ETL).
  • Made extensive use of the Python matplotlib package and Tableau to visualize and graphically analyse the data; performed data pre-processing and split the identified data set into training and test sets using other Python libraries.
  • Analysed data using SQL, R, Java, Scala, Python, Apache Spark and presented analytical reports to management and technical teams.
  • Oversaw the use of Python NumPy, SciPy, pandas, matplotlib, and stats packages to perform dataset manipulation, data mapping, data cleansing, and feature engineering; built and analysed datasets using R and Python.
  • Extensively worked with statistical analysis tools; adept at writing code in advanced Excel, R, and Python.
  • Built multi-layer Neural Networks in Python with scikit-learn, Theano, TensorFlow, and Keras to implement machine learning models, exported them to protobuf, and performed integration with the client's application.
  • Applied predictive analytics and machine learning algorithms to forecast key metrics, delivered as designed dashboards on AWS (S3/EC2 cloud platforms) and the Django platform for the company's core business.
  • Researched Big Data techniques such as Spark, Cassandra, and NoSQL databases, and assessed their advantages and disadvantages for the particular goals of the project.
  • Primarily used the open-source tools Spyder (Python) and RStudio (R) for statistical analysis and machine learning; involved in defining source-to-target data mappings, business rules, and data definitions.
  • Designed and developed Oracle PL/SQL and shell scripts, data import/export, data conversions, and data cleansing.
  • Responsible for the development of target data architecture, design principles, quality control, and data standards for the organization.
  • Worked with the DBA to create a best-fit Physical Data Model from the Logical Data Model using forward engineering in Erwin.
  • Produced quality reports for management for decision making.
  • Participated in all phases of research including data collection, data cleaning, data mining, developing models and visualizations.
  • Redefined many attributes and relationships and cleansed unwanted tables/columns using SQL queries.
  • Utilized Spark SQL API in PySpark to extract and load data and perform SQL queries.
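
A minimal, hypothetical sketch of the classification modelling mentioned above, comparing XGBoost against SVM and Random Forest baselines; it uses a bundled scikit-learn dataset and illustrative hyperparameters rather than project data.

```python
# Illustrative sketch: XGBoost, SVM, and Random Forest classifiers trained and
# scored on a public dataset (a stand-in for the real modelling data).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

models = {
    "xgboost": XGBClassifier(n_estimators=200, max_depth=4),
    "svm": SVC(kernel="rbf", C=1.0),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    print(name, accuracy_score(y_test, clf.predict(X_test)))
```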

Environment: - ETL, Teradata BTEQ, S3, XGBOOST, SVM, Random Forest, AWS, Oracle PL/SQL, Erwin 9.6, DBA, SQL, Shell Script, HMM, Spark SQL, PySpark.

Confidential - Princeton, NJ

Data Scientist

Responsibilities:

  • Used Tableau to automatically generate reports. Worked with partially adjudicated insurance flat files, internal records, third-party data sources, JSON, XML, and more.
  • Experienced in building models by using Spark (PySpark, Spark SQL, Spark MLLib, and Spark ML).
  • Experienced in cloud services such as AWS EC2, EMR, RDS, and S3 to assist with big data tools, solve data storage issues, and work on deployment solutions.
  • Worked with several R packages, including knitr, dplyr, SparkR, CausalInfer, and spacetime.
  • Performed Exploratory Data Analysis and Data Visualizations using R, and Tableau.
  • Implemented end-to-end systems for Data Analytics and Data Automation, integrated with custom visualization tools, using R, Mahout, Hadoop, and MongoDB.
  • Knowledge extraction from notes using NLP (Python, NLTK, MLlib, PySpark); a minimal sketch follows this list.
  • Independently coded new programs and designed tables to load and test them effectively for the given POCs using Big Data/Hadoop.
  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Built and optimized data mining pipelines for NLP and text analytics to extract information.
  • Coded R functions to interface with the Caffe Deep Learning framework.
  • Worked in the Amazon Web Services cloud computing environment.
  • Interacted with other departments to understand and identify data needs and requirements, and worked with other members of the IT organization to deliver data visualization and reporting solutions that address those needs.
  • Performed proper EDA with univariate and bivariate analysis to understand intrinsic and combined effects.
  • Designed data models and data flow diagrams using Erwin and MS Visio.
  • Performed data cleaning and imputation of missing values using R.
  • Developed, Implemented & Maintained the Conceptual, Logical & Physical Data Models using Erwin for Forward/Reverse Engineered Databases.
  • Worked with the Hadoop ecosystem covering HDFS, HBase, YARN, and MapReduce.
  • Created customized business reports and shared insights with management.
  • Took up ad-hoc requests from different departments and locations.
  • Used Hive to store the data and perform data cleaning steps for huge datasets.
  • Created dashboards and visualizations on a regular basis using ggplot2 and Tableau.
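
A minimal sketch of the NLTK-based knowledge extraction referenced in this section; the sample note and the choice of named-entity chunking as the extraction step are illustrative assumptions, not the project's actual approach.

```python
# Illustrative sketch: pulling named entities out of a free-text note with NLTK.
import nltk

# One-time model downloads (newer NLTK releases may also need the *_tab variants).
for pkg in ["punkt", "averaged_perceptron_tagger", "maxent_ne_chunker", "words"]:
    nltk.download(pkg, quiet=True)

note = "Patient reported chest pain; Dr. Smith at Princeton Medical ordered an ECG."

tokens = nltk.word_tokenize(note)   # split the note into tokens
tagged = nltk.pos_tag(tokens)       # part-of-speech tags
tree = nltk.ne_chunk(tagged)        # chunk tagged tokens into entities

# Collect the named-entity subtrees (everything except the root "S" node).
entities = [" ".join(word for word, tag in subtree.leaves())
            for subtree in tree.subtrees()
            if subtree.label() != "S"]
print(entities)
```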

Environment: - R 3.0, Erwin 9.5, Tableau 8.0, MDM, Qlikview, MLLib, PL/SQL, HDFS, Teradata 14.1, JSON, HADOOP (HDFS), Map Reduce, PIG, Spark, R Studio, MAHOUT, JAVA, HIVE, AWS.

Confidential

Data Scientist

Responsibilities:

  • Worked with large amounts of structured and unstructured data.
  • Knowledge of Machine Learning concepts (Generalized Linear Models, Regularization, Random Forest, Time Series models, etc.).
  • Worked with Business Intelligence and visualization tools such as Business Objects, Tableau, ChartIO, etc.
  • Deployed GUI pages by using JSP, JSTL, HTML, DHTML, XHTML, CSS, JavaScript, and AJAX.
  • Configured the project on WebSphere 6.1 application servers.
  • Implemented the online application by using Core Java, JDBC, JSP, Servlets and EJB 1.1, Web Services, SOAP, WSDL.
  • Handled end-to-end project from data discovery to model deployment.
  • Monitored the automated loading processes.
  • Communicated with other healthcare systems using Web Services with the help of SOAP, WSDL, and JAX-RPC.
  • Used Singleton, Factory, and DAO design patterns based on the application requirements.
  • Used SAX and DOM parsers to parse the raw XML documents
  • Used RAD as Development IDE for web applications.
  • Used Log4J logging framework to write Log messages with various levels.
  • Involved in fixing bugs and minor enhancements for the front-end modules.
  • Used Microsoft Visio and Rational Rose for designing the Use Case diagrams, Class model, Sequence diagrams, and Activity diagrams for the SDLC process of the application.
  • Performed functional and technical reviews.
  • Supported the testing team for system testing, integration testing, and UAT.
  • Ensured quality in the deliverables.
  • Conducted Design reviews and Technical reviews with other project stakeholders.
  • Was a part of the complete life cycle of the project, from requirements to production support.
  • Created test plan documents for all back-end database modules
  • Implemented the project in Linux environment.

Environment: - R 3.0, Erwin 9.5, Tableau 8.0, MDM, QlikView, MLlib, PL/SQL, HDFS, Teradata 14.1, JSON, HADOOP (HDFS), MapReduce, PIG, Spark, R Studio, MAHOUT, JAVA, HIVE, AWS.

Confidential

Data Analyst

Responsibilities:

  • Involved in migration projects to migrate data from data warehouses on Oracle/DB2 to Teradata.
  • Used Microsoft Visio and Rational Rose for designing the Use Case diagrams, Class model, Sequence diagrams, and Activity diagrams for the SDLC process of the application.
  • Worked with other teams to analyse customers and marketing parameters.
  • Conducted Design reviews and Technical reviews with other project stakeholders.
  • Was a part of the complete life cycle of the project, from requirements to production support.
  • Created test plan documents for all back-end database modules
  • Used MS Excel, MS Access, and SQL to write and run various queries.
  • Used a traceability matrix to trace the requirements of the organization.
  • Recommended structural changes and enhancements to systems and databases.
  • Supported the testing team for system testing, integration testing, and UAT.
  • Ensured quality in the deliverables.

Environment: - Teradata SQL Assistant, Teradata Loading utilities (Bteq, Fast Load, Multi Load), Python, UNIX, Tableau, MS Excel, MS Power Point, Business Objects, Oracle.

Confidential

Data Analyst

Responsibilities:

  • Responsible for applying machine learning techniques (regression/classification) to predict outcomes.
  • Responsible for the design and development of advanced R/Python programs to prepare, transform, and harmonize data sets in preparation for modelling.
  • Developed large data sets from structured and unstructured data and performed data mining.
  • Partnered with modellers to develop data frame requirements for projects.
  • Performed Ad-hoc reporting/customer profiling, segmentation using R/Python.
  • Tracked various campaigns, generating customer profiling analysis and data manipulation.
  • Provided R/SQL programming, with detailed direction, in the execution of data analysis that contributed to the final project deliverables; responsible for data mining.
  • Analyzed large datasets to answer business questions by generating reports and outcome-driven marketing strategies.
  • Used Python to apply time series models to identify fast-growth opportunities for our clients (a sketch follows this list).
  • Involved in fixing bugs and minor enhancements for the front-end modules.
  • Designed the Class model, Sequence diagrams, and Activity diagrams for the SDLC process of the application.
  • Supported the testing team for system testing, integration testing, and UAT.
  • Conducted Design reviews and Technical reviews with other project stakeholders.
  • Was a part of the complete life cycle of the project, from requirements to production support.
  • Involved in loading data from RDBMS and weblogs into HDFS using Sqoop and Flume.
  • Worked on loading the data from MySQL to HBase where necessary using Sqoop.
  • Developed Hive queries for Analysis across different banners.
  • Extracted data from Twitter using Java and the Twitter API; parsed JSON-formatted Twitter data and loaded it into the existing system's database.
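
A minimal sketch of the Python time series modelling mentioned above, using a statsmodels ARIMA model on synthetic monthly data; the model order and forecast horizon are illustrative assumptions rather than details of the client work.

```python
# Illustrative sketch: fit a simple ARIMA model and forecast six periods ahead.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Synthetic monthly series with trend plus noise (placeholder for client data).
rng = np.random.default_rng(0)
idx = pd.date_range("2015-01-01", periods=48, freq="MS")
series = pd.Series(100 + np.arange(48) * 2.5 + rng.normal(scale=3.0, size=48), index=idx)

model = ARIMA(series, order=(1, 1, 1))   # AR(1), first differencing, MA(1)
fitted = model.fit()
print(fitted.forecast(steps=6))          # six months ahead
```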

Environment: - R 3.0, Erwin 9.5, Tableau 8.0, MDM, QlikView, MLlib, PL/SQL, HDFS, Teradata 14.1, JSON, MapReduce, MySQL, Spark, R Studio.
