Data Scientist Resume
Ridgefield, NJ
PROFILE:
- Data scientist with 7+ years of professional experience and solid capabilities in exploratory data analysis and statistical analysis using R, Python, and SQL. Expertise in designing predictive supervised and unsupervised models using machine learning.
- Ability to answer the organization's most pressing questions utilizing data science principles.
- Understanding of variable distributions, detection of anomalies and outliers, and identification of correlations.
- Machine learning skills to analyze real-world datasets, validate findings through testing, perform feature selection, and tune algorithms for maximum performance. Coding skills to design optimized algorithms and web development skills to build effective GUIs.
- Successfully delivered solutions in the healthcare, banking, education, and IT domains.
- Effective team player with great oral and written communication skills.
- Evaluate business data sets to identify their characteristics and similarities and find ways they can be integrated for better results.
- Interacted with business stakeholders and gained strong knowledge of business processes and systems.
- Updated the data dictionary by creating 230 new terms and organizing existing terms. Created Tableau visuals.
- Worked closely with the Chief Data Officer to design strategies for Data Democratization 2020 Plan.
- Earned 3/4 performance ratings throughout the internship, and our intern team won the best project award.
- Participated in code reviews with peers and managers to ensure that each increment adhered to the original vision described in the user story and to all standard resource libraries and architecture patterns as appropriate.
- Experience with statistical analysis packages (Python, R) and A/B testing; developed, validated, evaluated, deployed, and optimized modelling techniques and algorithms that support many aspects of the business.
- Deep understanding of statistical modeling, multivariate analysis, model testing, problem analysis, model comparison, and validation.
- Experienced in developing conceptual, logical, and physical data models using UML and IE notations for Online Transactional Processing (OLTP) and Online Analytical Processing (OLAP) systems using Erwin, ER Studio, Enterprise Architect, and PowerDesigner.
- Experience in statistical modelling, data mining (sequential pattern mining), data visualization (outflow diagrams, Sankey diagrams, trend analysis), and machine learning using R.
SKILLS:
- R
- Python
- Tableau
- Jupyter Notebook
- MS Excel
- C#
- JavaScript
- Java
- Amazon Web Services
- SQL
- SPSS
- Predictive Analytics
- Machine Learning
- Algorithms
- Hadoop MapReduce
- SAP BI
- Apache Spark
- Deep Learning
- Apache Kafka
- MongoDB
- Statistical Analysis System (SAS)
- Hadoop Distributed File System (HDFS)
- TensorFlow
EMPLOYMENT HISTORY:
Data Scientist
Confidential, Ridgefield, NJ
Responsibilities:
- Acted as the single point of contact between organization management and the appropriate client or groups, from solution planning and sizing to fulfillment. Used association rule mining to identify products in high demand among customers.
- Collected data from various databases and cleaned it for statistical analysis and modeling.
- Integrated Kafka with Spark Streaming for high-speed data processing (see the streaming sketch after this list).
- In-depth knowledge of neural network architectures such as Convolutional Neural Networks, using TensorFlow to handle very large data sets and build predictive models on them.
- Extensive hands-on experience and high proficiency with structured, semi-structured, and unstructured data, using a broad range of data science programming languages and big data tools including R, Python, SQL, and Hadoop MapReduce.
- Worked closely with cross-functional teams to encourage statistical best practices with respect to experimental design and data analysis.
- Developed scripts and ad-hoc tests to ascertain data validity and correct attribute calculation.
- Performed statistical and predictive analysis on corporate market data to identify trends and buy-sell opportunities.
- Optimized data access by efficient SQL coding for high throughput and low latency.
- Experience working with Amazon Web Services to store and retrieve geospatial data from S3.
- Analyzed high volume, high dimensional client and survey data from different sources using SAS and R.
- Performed data mining, data analysis, statistical modelling (e.g., regression), and predictive analytics to extract insights about the sales industry.
- Used clustering and K-NN algorithms to categorize business expenses and improve budgeting. Calculated WCSS (within-cluster sum of squares) to determine the optimal number of clusters for the predictive classification model (see the K-Means sketch after this list).
- Closely monitored the operating and financial results against plans and budgets.
- Used R to develop regression modeling for data analysis.
- Increased the speed and confidence of the learning algorithm by combining statistical methods; provided expertise and assistance in integrating advanced analytics into ongoing business processes.
- Parsed data, producing concise conclusions from raw data in a clean, well-structured and easily maintainable format.
- Interpreted complex simulation data using statistical methods.
- Designed various reports using pivot tables and pivot charts (bar, pie, line) in SPSS.
- Built machine learning models on an independent AWS EC2 server to enhance data quality.
- Established strategic goals by gathering pertinent business, financial, service, and operations information.
- Generated ad-hoc or management specific reports using SSRS and Excel.
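Illustrative sketch of the Kafka-to-Spark Streaming integration above, as a minimal PySpark Structured Streaming job (the broker address, topic name, and event schema are hypothetical placeholders, not details of the actual project):

    # Minimal PySpark Structured Streaming sketch: read JSON events from a Kafka
    # topic and print the parsed rows. Requires the spark-sql-kafka package.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

    schema = StructType([
        StructField("product_id", StringType()),
        StructField("amount", DoubleType()),
    ])

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "localhost:9092")  # hypothetical broker
           .option("subscribe", "transactions")                  # hypothetical topic
           .load())

    parsed = (raw.select(from_json(col("value").cast("string"), schema).alias("event"))
                 .select("event.*"))

    query = parsed.writeStream.format("console").outputMode("append").start()
    query.awaitTermination()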
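A minimal sketch of the WCSS (elbow-method) calculation used to choose the number of clusters, assuming scikit-learn; the feature matrix here is random placeholder data rather than the actual expense features:

    # Elbow-method sketch: fit K-Means for several values of k and record the
    # within-cluster sum of squares (exposed by scikit-learn as inertia_).
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    X = np.random.rand(500, 4)                   # placeholder expense feature matrix
    X_scaled = StandardScaler().fit_transform(X)

    wcss = []
    for k in range(1, 11):
        km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
        wcss.append(km.inertia_)                 # WCSS for this value of k

    # Choose k where the WCSS curve bends (the "elbow"); here we just print the curve.
    for k, score in zip(range(1, 11), wcss):
        print(k, round(score, 2))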
Environment: Microsoft SQL Server, IBM SPSS, MS Excel, Python, Amazon Web Services, Hadoop MapReduce, RStudio, Statistical Analytics, Predictive Analytics, Machine Learning, Tableau, SSRS
Data Scientist
Confidential, Atlanta, GA
Responsibilities:
- Derived critical insights from big data and enabled strategic decision-making.
- Enabled ultra-personalized offer generation by merging data from a number of datasets such as behavioral targeting, previous purchase patterns, social media, itineraries, location tracking records, and predictive analytics.
- Performed univariate, bivariate and multivariate analysis of approx. 4900 tuples using bar charts, box plots and histograms.
- Participated in feature engineering such as feature creation, feature scaling, and one-hot encoding with scikit-learn (see the feature-engineering sketch after this list).
- Enabled revenue/yield management, which aims at maximizing the short-term expected profitability of assets by modelling and forecasting demand and optimizing prices and product availability.
- Applied K-Means clustering as an exploratory data analysis tool to develop geo-demographic customer segmentation models.
- Developed and deployed different machine learning algorithms for customer insight, target marketing, risk management (e.g., predicting equipment failure), and revenue management.
- Used spatial analysis to identify potential customers and discover additional factors about current customers for predictive modelling purposes. Implemented multi-criterion decision-making logic.
- Implemented Natural Language Processing to quickly route customers and customer service agents to the information they need.
- Conducted sentiment analysis by extracting unstructured and structured streaming data from social media platforms such as Twitter, Facebook, and LinkedIn (see the sentiment-scoring sketch after this list).
- Developed Text Mining applications for Opinion Mining and Document Retrieval.
- Familiar with predictive models using ensemble methods such as bagging, boosting, and Random Forest to improve the efficiency of the predictive model.
- Used the longitude and latitude coordinates of customer buildings to identify and map, in Google Maps, buildings that could be potential future customers.
- Hands-on experience with Spark machine learning techniques such as classification, regression, clustering, collaborative filtering, and dimensionality reduction.
- Designed & developed customized interactive reports and dashboards using Tableau, Matplotlib and ggplot.
- Demonstrated ability to comprehend complex models in SAS and port them effectively into a different scripting language.
- Saved thousands of dollars in AWS big data cloud infrastructure costs by selecting the proper instance types for Redshift and EMR.
- Communicated results and educated others through presentations of insightful visualizations.
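Illustrative sketch of the scikit-learn feature-engineering step mentioned above (feature scaling plus one-hot encoding); the column names and values are placeholders, not the project's actual fields:

    # Feature-engineering sketch: scale numeric columns and one-hot encode a
    # categorical column with scikit-learn's ColumnTransformer.
    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    df = pd.DataFrame({
        "spend": [120.0, 89.5, 240.0],           # placeholder numeric feature
        "trips": [3, 1, 7],                      # placeholder numeric feature
        "channel": ["web", "mobile", "web"],     # placeholder categorical feature
    })

    pre = ColumnTransformer([
        ("num", StandardScaler(), ["spend", "trips"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["channel"]),
    ])

    X = pre.fit_transform(df)                    # feature matrix ready for modelling
    print(X)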
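A minimal sentiment-scoring sketch for the social media text described above; the library choice (NLTK's VADER) and the sample posts are assumptions for illustration, since the original work does not specify the tooling:

    # Sentiment-analysis sketch: score short posts with NLTK's VADER analyzer.
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)   # one-time lexicon download
    sia = SentimentIntensityAnalyzer()

    posts = [
        "Love the new rewards program!",         # placeholder social media text
        "Support never answered my question.",
    ]
    for text in posts:
        scores = sia.polarity_scores(text)       # neg / neu / pos / compound scores
        print(round(scores["compound"], 3), text)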
Environment: Python, R, Tableau, Spark MLlib, Natural Language Processing, Amazon Web Services, Predictive Analytics, K-Means Clustering, Statistical Analysis System (SAS).
Data Analyst
Confidential, Chicago, IL
Responsibilities:
- Created actions, action filters, parameters, and calculated sets for preparing dashboards and worksheets.
- Communicated results and educated others through presentations of insightful visualizations.
- Interacted with the end-user (client) to gather business requirements.
- Converted and loaded data from flat files into temporary tables in an Oracle database using SQL*Loader.
- Extensively used PL/SQL in writing database packages, stored procedures, functions and triggers in Oracle 10g.
- Developed SQL scripts involving complex joins for reporting purposes.
- Fine-tuned SQL queries and PL/SQL blocks for maximum efficiency and fast response times using Oracle hints and explain plans.
- Used Teradata as a source and a target for a few mappings.
- Developed Functional and Regression Testing scenarios based on XML and XSD schema validations.
- Loaded data from MS Access database to SQL Server 2005 using SSIS (creating staging tables and then loading the data).
- Developed various SQL scripts and anonymous blocks to load data into SQL Server 2005.
- Created procedures, functions and views in SQL Server 2005.
- Developed ad hoc reports using Crystal Reports for performance analysis by business users.
- Exported reports into various formats like XML, PDF, HTML, and Excel using Crystal Reports XI.
- Involved extensively in unit testing, integration testing, system testing and User Acceptance Testing.
- Responsible for the reports deployment through SSRS on the Reports Server.
- Tuned the stored procedures and UDFs used by the reports.
- Participated in weekly end-user meetings to discuss data quality, performance issues, ways to improve data accuracy, new requirements, etc.
Environment: Agile, Oracle 10g, PL/SQL, TOAD 9.5, DB2, Crystal reports, Teradata, SQL Server 2005, SSIS, SSRS.
Data Analyst
Confidential
Responsibilities:
- Worked with users to identify the most appropriate source of record and profiled the data required for sales and service.
- Documented the complete process flow to describe program development, logic, testing, implementation, application integration, and coding.
- Involved in defining the business/transformation rules applied for sales and service data.
- Defined the list codes and code conversions between the source systems and the data mart.
- Worked with internal architects, assisting in the development of current- and target-state data architectures.
- Involved in defining the source to target data mappings, business rules, business and data definitions.
- Worked with Basic Teradata Query Tool to submit SQL statements, import and export data, and generate reports in Teradata.
- Responsible for defining the key identifiers for each mapping/interface. Responsible for defining the functional requirement documents for each source to target interface.
- Documented, clarified, and communicated change requests with the requester and coordinated with the development and testing teams.
- Coordinated meetings with vendors to define requirements and system interaction agreement documentation between client and vendor system.
Environment: Teradata, MS Excel, MS Access, Oracle 10g, UNIX, Windows XP, SQL, PL/SQL.
Data Analyst
Confidential
Responsibilities:
- Involved in building, debugging and running forms.
- Involved in data profiling, cleansing, data loading, and extraction functions using SQL*Loader.
- Designed and developed all the tables, views for the system in Oracle.
- Created SSIS packages to load data from text files to the staging server and then from the staging server to the data warehouse.
- Implemented ETL using SQL Server Integration Services (SSIS), applying business logic and data cleaning on the staging server to maintain parent-child relationships.
- Delivered reports and ad-hoc analysis focused on client behavior and profiling using SQL and Excel (see the reporting sketch after this list).
- Worked on control flow tasks such as Execute SQL Task, File System Task, and Data Flow Task, and used different data sources and destinations with Derived Column and Lookup transformations within the Data Flow Task.
- Designed SSIS packages to extract data from different sources such as SQL Server 2008, MS Excel, and MS Access, transform it, and then load it into dimension and fact tables in the data warehouse.
- Developed SSIS packages using a Foreach Loop container in the control flow to process all Excel files within a folder, a File System Task to move each file into an archive after processing, and an Execute SQL Task to insert transaction log data into a SQL table.
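Illustrative Python equivalent of the SQL-and-Excel ad-hoc reporting flow referenced above; this is a sketch only, since the actual work used SQL and Excel directly, and the connection string, table, and file name are hypothetical:

    # Ad-hoc reporting sketch: run an aggregation query and export the result to Excel.
    # Writing the .xlsx file requires the openpyxl package.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("sqlite:///client_activity.db")        # hypothetical connection

    query = """
        SELECT client_id, COUNT(*) AS sessions, SUM(amount) AS total_spend
        FROM transactions
        GROUP BY client_id
    """
    report = pd.read_sql(query, engine)                           # aggregated client profile
    report.to_excel("client_profile_report.xlsx", index=False)    # hand-off to Excel users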
Environment: MS Excel, MS Access, UNIX, Windows XP, SQL, PL/SQL, SSIS & SSRS, Oracle 10g