Data Scientist Resume
Pennsylvania
PROFESSIONAL SUMMARY:
- Over six years of experience in Machine Learning and Data Mining with large structured and unstructured datasets, Data Acquisition, Data Validation, Predictive Modeling, Data Visualization, writing Business and Functional Requirement documents, and QA/UAT testing of web and mobile applications.
- Extensive experience in Text Analytics, developing Statistical, Machine Learning, and Data Mining solutions to various business problems and generating data visualizations using R, Python, and Tableau.
- Able to analyze and extract relevant information from large amounts of data to automate self-monitoring, self-diagnosis, and optimization of key processes.
- Outstanding proficiency in statistical and analytical tools and languages such as R, Python, and MATLAB.
- Expertise in transforming business requirements into analytical models, designing algorithms, building models, developing Data Mining, Artificial Intelligence and reporting solutions.
- Excellent experience with the Extract, Transform, and Load (ETL) process, using tools such as DataStage, Informatica, Data Integrator, and SSIS for data migration and data warehousing projects.
- Proficient in using visualization tools like Tableau, ggplot2 for creating dashboards.
TECHNICAL SKILLS:
Languages: Python, R, Java, Matlab, SAS
Database technologies: PostgreSQL, SQL Server
Hadoop Technologies: Hive, Pig, HDFS, MapReduce, Spark
NO SQL Databases: Cassandra, MongoDB
Software/Libraries: Keras, Caffe, TensorFlow, OpenCV, scikit-learn, pandas, NumPy, Microsoft Visual Studio, Plotly, statsmodels
Data Visualization: Tableau, Visualization packages
Machine Learning: Regression, Classification, Clustering, Association, Simple Linear Regression, Multiple Linear Regression, Decision Trees, Random forest, Logistic Regression, K-NN, SVM, Recommendation system, Association Rules, Apriori, ARIMA, EWMA, TF-IDF
PROFESSIONAL EXPERIENCE:
Confidential, Pennsylvania
Data Scientist
Responsibilities:
- Developed Spark Python modules for machine learning & predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
- Implemented Porter stemming (Natural Language Toolkit) and an NLP bag-of-words model (CountVectorizer) to prepare the data; the resulting clusters were visualized in Tableau using legends.
- Developed an NLP sentiment analyzer to automate the classification of positive and negative reviews, using NLTK for text processing.
- Extracted data from a Hadoop data lake and Excel; analyzed, cleaned, sorted, and merged data, then built reports and dashboards using Base SAS, SAS Macros, SQL, Hive, SAS VA, and Excel.
- Conducted studies and rapid plotting, using advanced data mining and statistical modeling techniques to build a solution that optimizes data quality and performance.
- Demonstrated experience in the design and implementation of statistical models, predictive models, enterprise data models, metadata solutions, and data lifecycle management in both RDBMS and Big Data environments.
- Developed a deep neural network with 5 hidden layers using the H2O package in R, predicting particle signals with 70% accuracy.
- Implemented Natural Language Processing (NLP) tools such as NLTK and Stanford's CoreNLP suite.
- Stored and retrieved data from data-warehouses using Amazon Redshift and designed and implemented system architecture for Amazon EC2 based cloud-hosted solution for the client.
- Developed simple to complex MapReduce jobs using Hive and Pig, and wrote multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Analyzed large datasets, applied machine learning techniques, and developed and enhanced predictive and statistical models by leveraging best-in-class modeling techniques.
- Implemented Python solutions using libraries such as matplotlib for charts and graphs, MySQLdb for database connectivity, pandas DataFrames, and NumPy.
- Developed predictive models and deployed them with interactive visualizations in Seaborn, matplotlib, and Tableau.
- Involved in process mining to bridge the gap between data mining and business process modeling.
- Created and maintained SSIS packages and SSRS reports; used SourceTree and Git for version control.
- Built a Random Forest regression model in R for time-series prediction and connected it to Tableau via an external ODBC connection.
- Generated interactive bar graphs of the forecasted sales in Tableau.
- Wrangled data: acquired and cleaned large datasets and analyzed trends with visualizations built in matplotlib and Python.
- Experience with TensorFlow, Theano, Keras, and other deep learning frameworks.
- Built an artificial neural network in TensorFlow (Python) to predict each customer's probability of canceling service (churn prediction).
- Analyzed business problems and the underlying data using appropriate statistical models to generate insights.
- Predicted which products were prone to back orders and which were likely to be canceled.
- Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, KNN, Naive Bayes.
- Designed both 3NF data models for ODS, OLTP systems and Dimensional Data Models using Star and Snowflake Schemas.
- Updated Python scripts to match training data with our database stored in AWS CloudSearch, so that each document could be assigned a response label for further classification.
- Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
- Identified and executed process improvements, working hands-on with technologies such as Oracle, Informatica, and BusinessObjects.
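As an illustration of the bag-of-words text preparation described above, here is a minimal sketch. It is not the original pipeline (which used NLTK's PorterStemmer and scikit-learn's CountVectorizer); the naive suffix-stripping stemmer and the sample sentence are stand-ins for illustration only.

```python
from collections import Counter

SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(token: str) -> str:
    """Crude stand-in for a Porter stemmer: strip one common suffix."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def bag_of_words(text: str) -> Counter:
    """Tokenize on whitespace, strip punctuation, lowercase, stem, count."""
    tokens = (t.strip(".,!?").lower() for t in text.split())
    return Counter(naive_stem(t) for t in tokens if t)

counts = bag_of_words("Loved the loving staff, loved the service!")
print(counts["lov"])  # → 3  ("loved" and "loving" collapse to one stem)
```

The point of stemming before counting is exactly what the number shows: inflected variants of the same word fold into a single feature, so downstream classifiers and clustering see one dimension instead of several near-duplicates.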
Confidential, Albuquerque, NM
Business/ Data Analyst
Responsibilities:
- Understood business needs and created BRDs and FSDs to meet them.
- Interfaced with the stakeholders and facilitated communication between client and the IT department.
- Involved in creating targeted questionnaires and interviews for SMEs, clients, and business users to gather requirements.
- Involved in Feasibility and Risk analysis to identify the business critical and high-risk areas of the application.
- Worked as a liaison between business users and developers to submit requirements and changes and to clarify questions and issues.
- Extensive experience organizing and documenting requirements, preparing use cases, and writing business documents and reports during the pre-testing phase.
- Documented swim-lane diagrams and workflows using MS Visio, and created forms libraries and document libraries.
- Responsible for tracking milestones for intranet migration projects of portals and web sites.
- Specialized experience generating, gathering, and compiling data from a variety of business data systems (Oracle, Microsoft Access, MySQL, SQL Server, Tableau) to inform management on improving the effectiveness of business/program operations, processes, projects, and systems.
- Developed queries on the existing databases to provide ad-hoc reports using SQL for QA testing, Reporting / Data validation and verification.
- Managed the planning and development of design and procedures for metrics reports.
- Presented tools and recommendations to directors and executive staff.
- Improved data flow for over 3,000 claims and corrected substantial financial out-of-balance conditions.
- Managed the end-to-end process for updating and verifying special-orders data.
- Conducted ETL performance tuning, troubleshooting, support, and capacity estimation.
- Assisted with user Acceptance testing of systems (UAT), with developing and maintaining quality procedures, and with ensuring that appropriate documentation was in place.
- Advanced proficiency in the Microsoft Office suite, particularly Microsoft Excel.
- Efficiently organized information in a logical manner so that it could be easily retrieved.
Confidential
Machine Learning/ Data Scientist
Responsibilities:
- Participated in Business meetings to understand the business needs & requirements.
- Extracted data from a Hadoop data lake and Excel; analyzed, cleaned, sorted, and merged data, then built reports and dashboards using Base SAS, SAS Macros, SQL, Hive, SAS VA, and Excel.
- Performed machine learning techniques (regression and classification) to predict outcomes.
- Used Python to generate regression models for statistical forecasting and applied clustering algorithms such as K-Means to segment customers into groups.
- Analyzed large datasets, applied machine learning techniques, and developed and enhanced predictive and statistical models by leveraging best-in-class modeling techniques.
- Developed predictive models and deployed them with interactive visualizations in Seaborn, matplotlib, and Plotly.
- Wrangled data: acquired and cleaned large datasets and analyzed trends with visualizations built in matplotlib and Python.
- Experience creating data visualizations for KPIs per business requirements for various departments.
- Extracted data to create value-added datasets using Python, R, SAS, Azure, and SQL, analyzing customer behavior to target specific customer segments and surface hidden insights in support of project objectives.
- Proficient in gathering and analyzing business requirements, with experience documenting System Requirement Specifications (SRS) and Functional Requirement Specifications (FRS).
- Conducted studies and rapid plotting, using advanced data mining and statistical modeling techniques to build a solution that optimizes data quality and performance.
- Demonstrated experience in the design and implementation of statistical models, predictive models, enterprise data models, metadata solutions, and data lifecycle management in both RDBMS and Big Data environments.
- Used OpenCV and TensorFlow to analyze images and create face and object recognition models.
- Preprocessed and filtered the collected image dataset with Python and OpenCV for use in ML applications.
- Leveraged OpenCV to implement and train machine learning models that identify and recognize images, providing image recognition capabilities for clients.
- Experience with TensorFlow, Theano, Keras, and other deep learning frameworks.
- Built an artificial neural network in TensorFlow (Python) to predict each customer's probability of canceling service (churn prediction).
- Analyzed business problems and the underlying data using appropriate statistical models to generate insights.
- Used Predictive Analytics to analyze the shopping behavior of the customers.
- Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, KNN, Naive Bayes.
- Updated Python scripts to match training data with our database stored in AWS CloudSearch, so that each document could be assigned a response label for further classification.
- Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
- Strong knowledge of SWIFT and global cash payments.
- Good understanding of AML standards and experience working on sanctions screening.
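A minimal sketch of the K-Means customer segmentation mentioned above. This is not the production code (which would typically use scikit-learn's KMeans); the two-feature customer points here are hypothetical, e.g. (recency, spend).

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain-Python K-Means: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k),
                      key=lambda i: sum((a - b) ** 2
                                        for a, b in zip(p, centroids[i])))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = tuple(sum(dim) / len(cluster)
                                     for dim in zip(*cluster))
    return centroids, clusters

# Two obvious groups: low spenders near (1, 1), high spenders near (9, 9).
customers = [(1, 1), (1, 2), (2, 1), (9, 9), (9, 8), (8, 9)]
centroids, clusters = kmeans(customers, k=2)
print(sorted(len(c) for c in clusters))  # → [3, 3]
```

On well-separated data like this, the algorithm converges to the natural two-group split regardless of which points are drawn as initial centroids; on real customer data, feature scaling and multiple restarts matter.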
Confidential
Data Analyst/ BI Developer
Responsibilities:
- Gathered requirements and analyzed data sources; assisted in the design and implementation of the database.
- Created SQL Server reports and designed polished layouts for parameterized, performance, and ad-hoc reports.
- Extensively used Joins and sub-Queries to simplify complex queries involving multiple tables.
- Worked with stored procedures, queries, triggers, functions, indexes, and user-defined functions.
- Used SSIS and T-SQL stored procedures to transfer data from source databases to the staging area and finally transfer into the data warehouse.
- Developed and deployed data transfers packages to the appropriate destinations using SSIS.
- Created SQL server configurations for SSIS packages.
- Expert in transforming complex business logic into Database design and maintaining it by using SQL tools like Stored Procedures, User Defined Functions, and Views.
- Expert in Developing SSIS Packages to Extract, Transform and Load (ETL) data into the Data warehouse from Heterogeneous databases such as Oracle, DB2, Sybase and MS Access.
- Used the SAS PROC SQL pass-through facility to connect to Oracle tables and created SAS datasets using various SQL joins, such as left, right, inner, and full joins.
- Performed data validation and transformed data from the Oracle RDBMS into SAS datasets.
- Produced quality customized reports using PROC TABULATE, PROC REPORT with styles, and ODS RTF, and provided descriptive statistics using PROC MEANS, PROC FREQ, and PROC UNIVARIATE.
- Developed SAS macros for data cleaning, reporting, and support of routine processing.
- Performed advanced querying using SAS Enterprise Guide: calculated computed columns, applied filters, and manipulated and prepared data for reporting, graphing, summarization, and statistical analysis, finally generating SAS datasets.
- Involved in developing, debugging, and validating project-specific SAS programs to generate derived SAS datasets, summary tables, and data listings according to study documents.
- Created datasets per the approved specifications, collaborated with project teams to complete scientific reports, and reviewed reports to ensure accuracy and clarity.
- Experienced in working with data modelers to translate business rules and requirements into conceptual/logical dimensional models, and worked with complex de-normalized and normalized data models.
- Performed calculations such as quick table calculations, date calculations, aggregate calculations, and string and number calculations.
- Good expertise in building dashboards and stories based on the available data points.
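To illustrate the join semantics behind the PROC SQL and ETL work above, here is a small plain-Python sketch (the actual work used SAS and SSIS; the tables, column names, and rows here are hypothetical).

```python
# Sample "tables": orders with a foreign key, and a customer lookup.
orders = [
    {"order_id": 1, "cust_id": "A", "amount": 100},
    {"order_id": 2, "cust_id": "B", "amount": 250},
    {"order_id": 3, "cust_id": "C", "amount": 75},
]
customers = {"A": "Alice", "B": "Bob"}  # note: no row for "C"

# Inner join: keep only orders whose cust_id has a matching customer row.
inner = [{**o, "name": customers[o["cust_id"]]}
         for o in orders if o["cust_id"] in customers]

# Left join: keep every order; unmatched lookups become None (SQL NULL).
left = [{**o, "name": customers.get(o["cust_id"])} for o in orders]

print(len(inner), len(left))  # → 2 3
```

The difference in row counts is the practical reason join choice matters in validation work: a left join preserves unmatched source rows (surfacing them as nulls to investigate), while an inner join silently drops them.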